RapidMiner Text Mining Extension

Overview
Synopsis

The RapidMiner Text Extension adds all operators necessary for statistical text analysis. Texts can be loaded from different data sources and transformed with different filtering techniques before the text data is analyzed.

Category

Text Analytics Software

Features

•Statistical text analysis
•Load texts from many different data sources
•Transform texts with different filtering techniques, then analyze your text data
•Supports several text formats, including plain text, HTML, and PDF, as well as other data sources
•Standard filters for tokenization, stemming, stopword filtering, and n-gram generation

License

Proprietary Software

Price

Free

Free Trial

Available

User Size

Small (<50 employees), Medium (50 to 1000 employees), Enterprise (>1000 employees)

Company

RapidMiner

What is best?

•Statistical text analysis
•Load texts from many different data sources
•Transform texts with different filtering techniques, then analyze your text data
•Supports several text formats, including plain text, HTML, and PDF, as well as other data sources

What are the benefits?

•Analyze your text data
•Different filtering techniques

PAT Rating™

Criterion                   Editor Rating   Aggregated User Rating
Ease of use                 7.7             7.6
Features & Functionality    7.7             8.1
Advanced Features           7.7             7.0
Integration                 7.7             6.2
Performance                 7.7             7.8
Customer Support            7.7             9.2

Implementation: 9.1
Renew & Recommend: 6.8

Bottom Line

The RapidMiner Text Extension adds all operators necessary for statistical text analysis. It supports several text formats, including plain text, HTML, and PDF. It also provides standard filters for tokenization, stemming, stopword filtering, and n-gram generation.

Editor Rating: 7.7
Aggregated User Rating: 7.7 (22 ratings)

RapidMiner is an open source data mining framework that offers many operators, which can be combined into a process. A graphical user interface (GUI) lets users connect the operators with each other in the process view. The main function of a process is to analyze the data that is retrieved at the beginning of the process. Many packages are available for RapidMiner, such as text processing, Weka extension, parallel processing, web mining, reporting extension, series processing, PMML, community, and R extension packages.

RapidMiner Text Mining Extension

The RapidMiner Text Extension adds all operators necessary for statistical text analysis. Texts can be loaded from different data sources and transformed with different filtering techniques before the text data is analyzed. The extension supports several text formats, including plain text, HTML, and PDF. It also provides standard filters for tokenization, stemming, stopword filtering, and n-gram generation.
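
To make the filter chain concrete, here is a minimal sketch in plain Python of the four standard steps named above: tokenization, stopword filtering, stemming, and n-gram generation. It is not RapidMiner code and does not use the extension's operators. The function names, the tiny stopword list, and the crude suffix-stripping stemmer are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "is", "are", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def process(text, n=2):
    """Run the whole chain and return the stemmed tokens followed by their n-grams."""
    tokens = [naive_stem(t) for t in remove_stopwords(tokenize(text))]
    return tokens + ngrams(tokens, n)


print(process("The miners are loading texts"))
```

In a real project the hand-written stemmer would usually be replaced by an established algorithm such as Porter stemming; the order of the steps is what the sketch is meant to show.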

RapidMiner Text Analytics

The Text Processing package can be installed and updated through the Update RapidMiner menu item under the Help menu. The Text Mining extension uses a special class for handling documents, called the Document class. This class stores the whole document together with additional meta information.

In text mining, a document is split into tokens, which can then be used to classify the complete document. Tokenization is the process of breaking a stream of text into phrases, words, symbols, or other meaningful elements called tokens. Applying a tokenizer results in a sheet containing the tokens in the order in which they were found in the document. Each token carries a number identifying the more general unit from which it was created. For example, each word token of a particular sentence contains the number of that sentence, and each sentence token of a document contains the number of that document. The Tokenizer class can also be extended easily to create custom tokenizers. Other features include stop word removal, filtering, and stemming, a technique related to lemmatisation that reduces words to their stem, base, or root form.
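
The numbering scheme described above, where each token records the unit it came from, can be sketched in plain Python as follows. This is an illustration only, not the extension's Tokenizer or Document class: the `Token` dataclass, its field names, and the regular expressions used for splitting are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str    # "sentence" or "word"
    parent: int  # number of the more general unit this token was created from


def split_sentences(document, doc_number):
    """Split a document into sentence tokens that remember their document number."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [Token(p, "sentence", doc_number) for p in parts if p]


def split_words(sentences):
    """Split sentence tokens into word tokens that remember their sentence number."""
    words = []
    for sentence_number, sentence in enumerate(sentences):
        for w in re.findall(r"\w+", sentence.text):
            words.append(Token(w, "word", sentence_number))
    return words


sentences = split_sentences("Text mining is fun. Tokens keep order!", doc_number=0)
for word in split_words(sentences):
    print(word.parent, word.text)
```

Because the tokens are produced in document order and each one keeps its parent number, a later step can regroup words by sentence, or sentences by document, without re-reading the original text.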
