Top 11 Free Software for Text Analysis, Text Mining, Text Analytics

Text analytics is the process of converting unstructured text data into meaningful data. This article lists free and open source software for text analysis, text mining and text analytics, in no particular order: KH Coder, Carrot2, GATE, tm, Gensim, Natural Language Toolkit, RapidMiner, Unstructured Information Management Architecture, OpenNLP, KNIME, Orange-Textable, LPU and Apache Mahout. Text analysis applications scan a set of documents written in a natural language, then either model the document set for predictive classification or populate a database or search index with the extracted information.

You may also like to review the Text Analysis, Text Mining, Text Analytics proprietary software list:

Top 30 software for Text Analysis, Text Mining, Text Analytics

 

Here is a list of free and open source software for text analysis, text mining and text analytics:

1. KH Coder

KH Coder is an application for quantitative content analysis, text mining and corpus linguistics. It can handle Japanese, English, French, German, Italian, Portuguese and Spanish language data. Given raw text as input, it offers search and statistical analysis functions such as KWIC (keyword in context) concordances, collocation statistics, co-occurrence networks, self-organizing maps, multidimensional scaling, cluster analysis and correspondence analysis.
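To make the KWIC idea concrete, here is a minimal sketch in plain Python. It is not KH Coder's implementation or interface; the function name `kwic`, the `width` parameter and the sample sentence are made up for illustration. It simply lists every occurrence of a keyword together with a few words of context on each side.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Hypothetical helper for illustration only; tokenization is a
    simple lowercase word split, which real tools refine per language.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = "Text mining finds patterns in text. Mining text at scale needs good tools."
for left, kw, right in kwic(sample, "text", width=2):
    # Right-align the left context so the keywords line up in a column
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a central column, as the loop does, is what makes recurring neighbours easy to spot by eye.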

2. Carrot2

Carrot2 is an open source framework for clustering text and search results. It can automatically cluster small collections of documents, search results or document abstracts into thematic categories. Apart from two specialized search results clustering algorithms, Carrot2 also offers ready-to-use components for fetching search results from various sources, including the Google API, Bing API, eTools Meta Search, Lucene, Solr, and more.

3. GATE

GATE, the General Architecture for Text Engineering, is an open source toolbox for natural language processing and language engineering. It is used for all sorts of language processing tasks and applications, including voice of the customer, cancer research, drug research, decision support, recruitment, web mining, information extraction and semantic annotation.
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System). This is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named entity transducer and a coreference tagger. ANNIE can be used as-is to provide basic information extraction functionality, or as a starting point for more specific tasks.
Languages currently handled in GATE are English, Spanish, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian and Russian.

4. tm (Text Mining Infrastructure in R)

The tm package provides a framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation and eases the use of heterogeneous text formats in R. The package provides native support for reading several classic file formats such as plain text, PDF and XML files. There is also a plug-in mechanism to handle additional file formats. The data structures and algorithms can be extended to fit custom demands.

5. Gensim

Gensim is a Python library for scalable statistical semantics: it analyzes plain text documents for semantic structure and retrieves semantically similar documents. The algorithms in Gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised.
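As a rough illustration of retrieving similar documents, the sketch below ranks documents against a query using bag-of-words vectors and cosine similarity, in plain Python. This is a deliberately simpler stand-in, not Gensim's API and not Latent Semantic Analysis or Latent Dirichlet Allocation: those methods go further by learning latent topics, so they can relate documents that share no words at all. The function names and the toy documents are assumptions made for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word occurrences (a very simple document vector)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def most_similar(query, documents):
    """Return (score, document index) pairs, best match first."""
    q = bag_of_words(query)
    scores = [(cosine(q, bag_of_words(d)), i) for i, d in enumerate(documents)]
    return sorted(scores, reverse=True)

# Toy corpus, invented for this example
docs = [
    "human computer interaction survey",
    "graph of trees and minors",
    "user interface for computer systems",
]
ranking = most_similar("computer interface", docs)
print([index for _, index in ranking])
```

The second document shares no words with the query and therefore scores zero here; a topic model trained on a larger corpus is what would be needed to relate texts that use different vocabulary.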

6. Natural Language Toolkit (NLTK)

Natural Language Toolkit (NLTK) is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. NLTK is available for Windows, Mac OS X, and Linux.

7. RapidMiner

RapidMiner is an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. Its Text Processing Extension adds text mining capabilities to the data mining environment.

8. Unstructured Information Management Architecture (UIMA)

Unstructured Information Management Architecture (UIMA) is a component framework for analyzing unstructured content such as text, audio and video. It was originally developed by IBM.
UIMA enables applications to be decomposed into components, for example “language identification” => “language specific segmentation” => “sentence boundary detection”, and so on. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. UIMA also provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
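The component chain described above can be sketched in a few lines of plain Python. This is only an analogy for the decomposition idea, not UIMA's API (UIMA components implement framework-defined interfaces and are described by XML descriptors). The function names, the shared dictionary and the toy language heuristic are all assumptions made for this example.

```python
import re

def identify_language(doc):
    # Toy heuristic (assumption): call it English if common stopwords appear
    words = set(re.findall(r"[a-z]+", doc["text"].lower()))
    doc["language"] = "en" if words & {"the", "is", "and", "of"} else "unknown"
    return doc

def segment(doc):
    # Stand-in for language specific segmentation: one regex for every
    # language, splitting into words and single punctuation marks
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc

def detect_sentences(doc):
    # Close a sentence whenever a terminating punctuation token is seen
    sentences, current = [], []
    for tok in doc["tokens"]:
        current.append(tok)
        if tok in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    doc["sentences"] = sentences
    return doc

def run_pipeline(text, components):
    """Pass one shared document record through each component in order."""
    doc = {"text": text}
    for component in components:
        doc = component(doc)
    return doc

result = run_pipeline(
    "UIMA is a framework. It chains components.",
    [identify_language, segment, detect_sentences],
)
print(result["language"], result["sentences"])
```

Because every component reads from and writes to the same record, a stage can be swapped or added without touching the others, which is the main appeal of this kind of decomposition.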

9. OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.

10. KNIME Text Processing

The KNIME Text Processing feature lets users read, process, mine and visualize textual data in a convenient way. It provides functionality from natural language processing (NLP), text mining and information retrieval.

11. Orange-Textable

Textable is an add-on for the Orange data mining software package. It enables users to build data tables on the basis of text data, by means of a flexible and intuitive interface. In particular, it lets users import text data from various sources; apply systematic recoding operations; apply analytical processes such as segmentation and annotation; select unit subsets manually, automatically or randomly; and build concordances and collocation lists.

12. LPU

LPU stands for Learning from Positive and Unlabeled data. It is a text learning or classification system that learns from a set of positive documents and a set of unlabeled documents, without labeled negative documents. This type of learning differs from classic text learning and classification, in which both positive and negative training documents are required.

13. Apache Mahout

Apache Mahout is a project of the Apache Software Foundation that aims to create scalable machine learning algorithms that are free to use under the Apache license. Mahout contains implementations for clustering, categorization and collaborative filtering, which can run on top of Apache Hadoop using the map/reduce paradigm. It supports three main use cases: recommendation mining, which takes users' behavior and from that tries to find items users might like; clustering, which takes text documents and groups them into sets of topically related documents; and classification, which learns from existing categorized documents what documents of a specific category look like and assigns unlabelled documents to the correct category.
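For readers unfamiliar with the map/reduce paradigm mentioned above, here is a minimal single-machine sketch in plain Python that counts terms across documents, the kind of statistic that clustering and classification typically build on. It does not use Mahout or Hadoop; the function names and the three toy documents are assumptions made for illustration. On a real cluster, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Map step: emit a (term, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values for one term into a total count."""
    return key, sum(values)

# Toy documents, invented for this example
docs = {
    "d1": "apache mahout scales",
    "d2": "mahout runs on hadoop",
    "d3": "hadoop scales",
}
mapped = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Each map call needs only its own document and each reduce call needs only the values for its own key, which is why the work can be spread across many machines.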

Author: [email protected]
