The RapidMiner Text Extension adds all operators necessary for statistical text analysis. Texts from different data sources can be loaded and transformed with different filtering techniques for analysis.
Category
Text Analytics Software
Features
•Statistical text analysis
•Load texts from many different data sources
•Filter texts with different techniques, then analyze the text data
•Supports several text formats including plain text, HTML, or PDF as well as other data sources
•Standard filters for tokenization, stemming, stopword filtering, or n-gram generation
License
Proprietary Software
Price
Free
Free Trial
Available
Users Size
Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1000 employees)
Company
RapidMiner
What is best?
•Statistical text analysis
•Load texts from many different data sources
•Filter texts with different techniques, then analyze the text data
•Supports several text formats including plain text, HTML, or PDF as well as other data sources
What are the benefits?
•Analyze your text data
•Different filtering techniques
PAT Rating™ (Editor Rating / Aggregated User Rating)
Ease of use: 7.7 / 7.6
Features & Functionality: 7.7 / 8.1
Advanced Features: 7.7 / 7.0
Integration: 7.7 / 6.2
Performance: 7.7 / 7.8
Customer Support: 7.7 / 9.2
Implementation: 9.1
Renew & Recommend: 6.8
Bottom Line
The RapidMiner Text Extension adds all operators necessary for statistical text analysis. It supports several text formats, including plain text, HTML, and PDF, and provides standard filters for tokenization, stemming, stopword filtering, and n-gram generation.
Editor Rating: 7.7
Aggregated User Rating: 7.7 (22 ratings)
RapidMiner is an open source data mining framework that offers many operators, which can be combined into a process. A graphical user interface (GUI) lets users connect the operators to each other in the process view. The main function of a process is to analyze the data retrieved at its beginning. Many packages are available for RapidMiner, such as text processing, Weka extension, parallel processing, web mining, reporting extension, series processing, PMML, community, and R extension packages.
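The idea of operators chained into a process can be illustrated with a small Python analogy. This is not RapidMiner's API (processes are built graphically in the GUI); the `Operator` and `Process` classes and the three example operators are hypothetical stand-ins that only show how each operator's output feeds the next, starting from data retrieved at the beginning.

```python
from typing import Any, Callable, List


class Operator:
    # Hypothetical stand-in for an operator: a named step that
    # transforms its input and passes the result on.
    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func

    def apply(self, data: Any) -> Any:
        return self.func(data)


class Process:
    # Hypothetical stand-in for a process: operators connected in
    # sequence, so each one receives the previous operator's output.
    def __init__(self, operators: List[Operator]) -> None:
        self.operators = operators

    def run(self, data: Any) -> Any:
        for op in self.operators:
            data = op.apply(data)
        return data


# Example process: retrieve data first, then filter it, then analyze it.
process = Process([
    Operator("Retrieve", lambda _: [3, 1, 4, 1, 5, 9, 2, 6]),
    Operator("Filter", lambda xs: [x for x in xs if x > 2]),
    Operator("Analyze", lambda xs: sum(xs) / len(xs)),
])

print(process.run(None))  # 5.4
```

Keeping each operator a plain function makes the order of the connections the only thing that defines the process, which mirrors how operators are wired together in the process view.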
RapidMiner Text Mining Extension
The RapidMiner Text Extension adds all operators necessary for statistical text analysis. Texts from different data sources can be loaded and transformed with different filtering techniques for analysis. The extension supports several text formats, including plain text, HTML, and PDF, and provides standard filters for tokenization, stemming, stopword filtering, and n-gram generation.
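To show what the standard filters do to a text, here is a minimal plain-Python sketch of a tokenize, stopword filter, stem, and n-gram chain. It is an illustration only, not the extension's implementation: the regex tokenizer, the tiny stop word list, the naive suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), and all function names are assumptions.

```python
import re
from typing import List

# A tiny stop word list chosen for this example (an assumption; real lists are longer).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    # Keep runs of letters as tokens, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def filter_stopwords(tokens: List[str]) -> List[str]:
    # Drop every token that appears in the stop word list.
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token: str) -> str:
    # Naive suffix stripping, a stand-in for a real stemmer such as
    # Porter's, and not the extension's own stemmer.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens: List[str], n: int) -> List[str]:
    # Join each run of n consecutive tokens with an underscore.
    return ["_".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


text = "Analysts are loading texts and filtering the tokens"
tokens = [crude_stem(t) for t in filter_stopwords(tokenize(text))]
print(tokens)             # ['analyst', 'load', 'text', 'filter', 'token']
print(ngrams(tokens, 2))  # ['analyst_load', 'load_text', 'text_filter', 'filter_token']
```

Generating n-grams after stopword filtering and stemming, as here, yields pairs of content words; running it before those steps would instead keep phrases such as "the_tokens".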
RapidMiner Text Analytics
The Text Processing package can be installed and updated through the Update RapidMiner menu item under the Help menu. The Text Mining extension handles documents with a special class, called the Document class, which stores the whole document together with additional meta information.
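The Python dataclass below is not the extension's Document class; it is a hypothetical sketch of the same idea, keeping the full text together with its meta information. The field names, the `with_metadata` helper, and the metadata keys are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Document:
    # The complete document text, stored unchanged.
    text: str
    # Additional meta information; the keys used below are hypothetical.
    metadata: Dict[str, str] = field(default_factory=dict)

    def with_metadata(self, key: str, value: str) -> "Document":
        # Return a copy with one extra metadata entry; the original is untouched.
        return Document(self.text, {**self.metadata, key: value})


doc = Document("RapidMiner loads texts from many sources.", {"source": "example.txt"})
tagged = doc.with_metadata("language", "en")
print(tagged.metadata)  # {'source': 'example.txt', 'language': 'en'}
```

Returning a new object instead of mutating the original keeps earlier steps' view of a document stable when later steps attach more metadata.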
In text mining, a document is split into tokens, which can then be used to classify the complete document. Tokenization is the process of breaking a stream of text into phrases, words, symbols, or other meaningful elements called tokens. Applying a tokenizer results in a sheet containing the tokens in the order in which they were found in the document. Each token carries a number identifying the larger unit it was created from: for example, each word token of a sentence contains the number of that sentence, and each sentence token of a document contains the number of that document. The Tokenizer class can also be extended easily to create custom tokenizers.

Further features include stop word elimination, filtering, and stemming, a technique closely related to lemmatisation that reduces words to their stem, base, or root form.
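The numbering scheme described above, where each word token records its sentence and each sentence token records its document, can be sketched as a two-level tokenizer. The `Token` record, the regex-based sentence and word splitting, and the `tokenize_documents` function are assumptions for illustration, not the extension's Tokenizer API.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str
    parent: int  # number of the unit (sentence or document) the token came from


def tokenize_documents(documents: List[str]) -> Tuple[List[Token], List[Token]]:
    sentence_tokens: List[Token] = []
    word_tokens: List[Token] = []
    sentence_no = 0  # sentences are numbered across all documents
    for doc_no, doc in enumerate(documents):
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if not sentence:
                continue
            # Each sentence token records the number of its document.
            sentence_tokens.append(Token(sentence, doc_no))
            # Each word token records the number of its sentence,
            # in the order the words appear.
            for word in re.findall(r"\w+", sentence):
                word_tokens.append(Token(word, sentence_no))
            sentence_no += 1
    return sentence_tokens, word_tokens


sentences, words = tokenize_documents(["Text is split. Tokens follow.", "Another doc."])
print([(s.text, s.parent) for s in sentences])
# [('Text is split.', 0), ('Tokens follow.', 0), ('Another doc.', 1)]
print([(w.text, w.parent) for w in words])
# [('Text', 0), ('is', 0), ('split', 0), ('Tokens', 1), ('follow', 1), ('Another', 2), ('doc', 2)]
```

Numbering sentences globally makes every sentence number unique, so a word token alone is enough to find its sentence and, through that sentence token, its document; numbering sentences per document would be an equally valid choice.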