Top 27 Free Software for Text Analysis, Text Mining, Text Analytics


Text analytics is the process of converting unstructured text data into meaningful data. The top 27 free software tools for text analysis, text mining, and text analytics include QDA Miner Lite, KH Coder, TAMS Analyzer, Carrot2, CAT, GATE, tm, Gensim, Natural Language Toolkit, RapidMiner, Unstructured Information Management Architecture, OpenNLP, KNIME, Orange-Textable, LPU, Apache Mahout, Pattern, LingPipe, S-EM, LibShortText, VisualText, Twinword, Apache Stanbol, Datumbox API, Aika, Distributed Machine Learning Toolkit and Coh-Metrix. These are some of the key providers of free and open source text analytics software, in no particular order. Text analysis applications scan a set of documents written in a natural language. They then either model the document set for predictive classification purposes or populate a database or search index with the extracted information.

You may also like to review the Text Analysis, Text Mining, Text Analytics proprietary software list:

Top software for Text Analysis, Text Mining, Text Analytics


1.QDA Miner Lite

QDA Miner Lite is free computer-assisted qualitative data analysis software (CAQDAS) from Provalis Research. It can be used to analyze textual data such as interview and news transcripts and open-ended responses, as well as still images. It offers basic CAQDAS features, including importing documents from plain text, RTF, HTML, and PDF, as well as data stored in Excel, MS Access, CSV, and tab-delimited text files. Other features include importing from other qualitative coding software, intuitive coding using codes organized in a tree structure, and the ability to add comments (or memos) to coded segments, cases, or the whole project.

The software also offers a fast Boolean text search tool for retrieving and coding text segments; code frequency analysis with bar charts, pie charts, and tag clouds; coding retrieval with Boolean and proximity operators; export of tables to XLS, tab-delimited, CSV, and Word formats; and export of graphs to BMP, PNG, JPEG, and WMF formats.




2.GATE

GATE is the General Architecture for Text Engineering, an open source toolbox for natural language processing and language engineering. It is used for all sorts of language processing tasks and applications, including voice of the customer, cancer research, drug research, decision support, recruitment, web mining, information extraction, and semantic annotation.

GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System). It is a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named entities transducer, and a coreference tagger. ANNIE can be used as-is to provide basic information extraction functionality, or as a starting point for more specific tasks.

Languages currently handled in GATE are English, Spanish, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, and Russian.




3.TAMS Analyzer

TAMS (Text Analysis Markup System) is a convention for identifying themes in texts such as web pages, interviews, and field notes. It was designed for use in ethnographic and discourse research. TAMS Analyzer is a program for Macintosh OS X that works with TAMS: you assign ethnographic codes to passages of a text by selecting the relevant text and double-clicking the name of the code in a list. It then lets you extract, analyze, and save coded information.



4.Carrot2

Carrot2 is an open source framework for clustering text and search results. It can automatically cluster small collections of documents, search results, or document abstracts into thematic categories. Apart from two specialized search results clustering algorithms, Carrot2 also offers ready-to-use components for fetching search results from various sources, including the Google API, Bing API, eTools Meta Search, Lucene, Solr, and more.





5.CAT

CAT is a free service of the Qualitative Data Analysis Program. It lets you efficiently code raw text data sets, annotate coding with shared memos, manage team coding permissions via the Web, create unlimited collaborator sub-accounts, and assign multiple coders to specific tasks. With CAT you can also measure inter-rater reliability, adjudicate valid and invalid coder decisions, report validity by dataset, code, or coder, and export coding in RTF, CSV, or XML format.
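
To make the idea of inter-rater reliability concrete, here is a minimal sketch in plain Python. It assumes Cohen's kappa as the statistic, which is only one common choice; the description above does not say which measure CAT reports, and the coder data is invented for illustration.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same six text segments.
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]
print(round(cohen_kappa(coder_1, coder_2), 3))  # prints 0.667
```

A value near 1 means the coders agree far more often than chance would predict, while a value near 0 means their agreement is about what chance alone would give.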


6.KH Coder

KH Coder is an application for quantitative content analysis, text mining, or corpus linguistics. It can handle Japanese, English, French, German, Italian, Portuguese, and Spanish language data.
Given raw texts as input, it offers search and statistical analysis functions such as KWIC (keyword in context), collocation statistics, co-occurrence networks, self-organizing maps, multidimensional scaling, cluster analysis, and correspondence analysis. These functions rely on back-end tools such as Stanford POS Tagger, Snowball stemmer, MySQL, and R.
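
As an illustration of what a KWIC (keyword in context) search does, here is a small plain-Python sketch. It is not KH Coder's implementation; the function name `kwic`, the tokenization rule, and the sample text are assumptions made for this example.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

sample = ("Text mining turns raw text into data. "
          "Good text analysis starts with clean text.")
# Print each hit with its left context right-aligned, concordance style.
for left, word, right in kwic(sample, "text", window=2):
    print(f"{left:>20} | {word} | {right}")
```

Lining up every occurrence of a word with its neighbors makes it easy to see at a glance how the word is used across a corpus.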

7.tm (Text Mining Infrastructure in R)

The tm package provides a framework for text mining applications within R. It offers functionality for managing text documents, abstracts the process of document manipulation, and eases the usage of heterogeneous text formats in R. The package provides native support for reading several classic file formats such as plain text, PDFs, or XML files. There is also a plug-in mechanism to handle additional file formats. The data structures and algorithms can be extended to fit custom demands.





8.Gensim

Gensim is a Python library for scalable statistical semantics: it analyzes plain-text documents for semantic structure and retrieves semantically similar documents. The algorithms in Gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation, and Random Projections, discover the semantic structure of documents by examining statistical word co-occurrence patterns within a corpus of training documents. These algorithms are unsupervised.
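
The retrieval idea can be sketched without any library. The example below does not use Gensim's API and is not Latent Semantic Analysis: it is a deliberately simpler stand-in that compares raw word-count vectors by cosine similarity. The function names and sample documents are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, documents):
    """Return the index of the document closest to the query."""
    query_vec = bag_of_words(query)
    scores = [cosine_similarity(query_vec, bag_of_words(d)) for d in documents]
    return max(range(len(documents)), key=scores.__getitem__)

docs = [
    "human machine interface for computer applications",
    "graph of trees and paths in minors",
    "user survey of computer system response time",
]
print(most_similar("computer interface design", docs))  # prints 0
```

Methods such as LSA go a step further by projecting these count vectors into a lower-dimensional space, so that documents can match even when they share related rather than identical words.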


9.Natural Language Toolkit (NLTK)

Natural Language Toolkit (NLTK) is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. NLTK is available for Windows, Mac OS X, and Linux.



10.RapidMiner

RapidMiner is an integrated environment for machine learning, data mining, text mining, predictive analytics, and business analytics. Its Text Processing Extension adds text mining capabilities to the platform.




11.Unstructured Information Management Architecture (UIMA)

Unstructured Information Management Architecture (UIMA) is a component framework for analyzing unstructured content such as text, audio, and video. It was originally developed by IBM.

UIMA enables applications to be decomposed into components, for example “language identification” => “language specific segmentation” => “sentence boundary detection”, and so on. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework also provides capabilities to wrap components as network services, and it can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.
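
The decomposition into chained components can be pictured with a short plain-Python sketch. This is not UIMA's Java API: the component functions, the shared dictionary, and the stop-word heuristic are all hypothetical and only mirror the three example stages named above.

```python
import re

def identify_language(doc):
    """Toy language guess from a few stop words (hypothetical heuristic)."""
    words = set(re.findall(r"\w+", doc["text"].lower()))
    doc["language"] = "de" if words & {"der", "und", "nicht"} else "en"
    return doc

def segment_tokens(doc):
    """Split the text into word tokens; a real component would vary by language."""
    doc["tokens"] = re.findall(r"\w+", doc["text"])
    return doc

def detect_sentences(doc):
    """Split on sentence-final punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
    doc["sentences"] = [s for s in parts if s]
    return doc

def run_pipeline(text, components):
    """Pass one shared document record through each component in order."""
    doc = {"text": text}
    for component in components:
        doc = component(doc)
    return doc

pipeline = [identify_language, segment_tokens, detect_sentences]
result = run_pipeline("UIMA analyzes text. It also handles audio and video.", pipeline)
print(result["language"], len(result["tokens"]), result["sentences"])
```

Because every stage reads from and adds to the same record, stages can be swapped or reordered without touching the others, which is the main appeal of a component architecture.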





12.OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.


13. KNIME Text Processing

The KNIME Text Processing feature lets you read, process, mine, and visualize textual data in a convenient way. It provides functionality from natural language processing (NLP), text mining, and information retrieval.



14.Orange Textable

Textable is an add-on for the Orange data mining software package. It enables users to build data tables on the basis of text data, by means of a flexible and intuitive interface. In particular, it lets users import text data from various sources, apply systematic recoding operations, apply analytical processes such as segmentation and annotation, select unit subsets manually, automatically, or randomly, and build concordances and collocation lists.



15.LPU

LPU stands for Learning from Positive and Unlabeled data. It is a text learning or classification system that learns from a set of positive documents and a set of unlabeled documents, without labeled negative documents. This type of learning is different from classic text learning and classification, in which both positive and negative training documents are required.


16.Apache Mahout

Apache Mahout is a project of the Apache Software Foundation that aims to create scalable machine learning algorithms that are free to use under the Apache license. Mahout contains implementations for clustering, categorization, and collaborative filtering, which can run on top of Apache Hadoop using the map/reduce paradigm. Three use cases are supported: recommendation mining, which takes users' behavior and tries to find items users might like; clustering, which takes text documents and groups them into sets of topically related documents; and classification, which learns from existing categorized documents what documents of a specific category look like and assigns unlabelled documents to the correct category.
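
The recommendation mining use case can be sketched in a few lines of plain Python. This is not Mahout's API or its distributed Hadoop implementation; item co-occurrence counting is just one simple approach chosen for illustration, and the user histories are invented.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(histories):
    """Count how often each ordered pair of items shares a user's history."""
    pairs = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs

def recommend(user, histories, top_n=2):
    """Score unseen items by co-occurrence with items the user already has."""
    seen = set(histories[user])
    pairs = build_cooccurrence(histories)
    scores = Counter()
    for (a, b), count in pairs.items():
        if a in seen and b not in seen:
            scores[b] += count
    # Sort by score, then by name, so ties resolve deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical user behavior: which articles each user has read.
histories = {
    "ann": ["nlp", "clustering"],
    "bob": ["nlp", "clustering", "hadoop"],
    "cat": ["clustering", "hadoop"],
    "dan": ["nlp", "sentiment"],
}
print(recommend("ann", histories))  # prints ['hadoop', 'sentiment']
```

Counting pairs is an embarrassingly parallel step, which hints at why this family of algorithms maps well onto the map/reduce paradigm mentioned above.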



17.Pattern

Pattern is a web mining module for the Python programming language. It provides tools for data mining (Google, Twitter, and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis, and canvas visualization.



18.LingPipe

LingPipe is a toolkit for processing text using computational linguistics. It is used for tasks such as finding the names of people, organizations, or locations in news, automatically classifying Twitter search results into categories, and suggesting correct spellings of queries. LingPipe provides a Java API with source code and unit tests, along with multi-lingual, multi-domain, multi-genre models.



19.S-EM

S-EM is a text learning or classification system that learns from a set of positive and unlabeled examples, with no negative examples. It is based on a “spy” technique, naive Bayes, and the EM algorithm.



20.LibShortText

LibShortText is an open source tool for short-text classification and analysis. It can handle the classification of titles, questions, sentences, and short messages, and it is more efficient than general text-mining packages. On a typical computer, processing and training on 10 million short texts takes only around half an hour. An interactive tool for error analysis is included. Because each short text contains only a few words, LibShortText can show the details behind the prediction for each text.


21.VisualText

VisualText is an integrated development environment for building information extraction systems, natural language processing systems, and text analyzers. It features NLP++, a C++-like programming language for quickly elaborating grammars, patterns, heuristics, and knowledge.



22.Twinword

Twinword provides text analysis APIs that can understand and associate words in the same way as humans do. Features include context and topic extraction, online consumer sentiment analysis for brands and products, and personalized and targeted e-commerce and advertising platforms.


23.Apache Stanbol

Apache Stanbol provides a set of reusable components for semantic content management, extending traditional content management systems with semantic services. Other use cases include direct usage from web applications for tag extraction and suggestion, text completion in search fields, ‘smart’ content workflows, and email routing based on extracted entities, topics, etc.


24.Datumbox API

Datumbox API offers a large number of off-the-shelf Classifiers and Natural Language Processing services which can be used in a broad spectrum of applications including: Sentiment Analysis, Topic Classification, Language Detection, Subjectivity Analysis, Spam Detection, Reading Assessment, Keyword and Text Extraction and more.



25.Aika

Aika is a text-mining algorithm. It combines various ideas from the field of machine learning, such as artificial neural networks, frequent pattern mining, and grammar induction. Aika does not use a predefined dictionary; instead, it aims to derive syllables and their occurrences within words from raw text.


26.Distributed Machine Learning Toolkit

Distributed Machine Learning Toolkit is a flexible framework that provides a unified interface for data parallelization, a hybrid data structure for big model storage, model scheduling for big model training, and automatic pipelining for high training efficiency.



27.Coh-Metrix

Coh-Metrix is a system for computing cohesion and coherence metrics for written and spoken texts. It allows readers, writers, educators, and researchers to instantly gauge the difficulty of written text for the target audience.



You may also like to review the Top Qualitative Data Analysis Software proprietary software list:
Top Qualitative Data Analysis Software

You may also like to review the Top Free Qualitative Data Analysis Software list:
Top Free Qualitative Data Analysis Software

6 Reviews
  • May 22, 2014 at 9:54 am

    Have you looked at the free, open source, web-based ?

  • February 16, 2015 at 7:46 pm

    DiscoverText is a freemium software with many powerful text analytics features that is free for 30 days and a core set of coding (labeling/annotation) that remain free after the 30 day trial expires.

  • Amnon Meyers
    April 9, 2015 at 4:12 pm

    VisualText at has been here for 15 years, and is a one-stop shop for developing the most accurate and complete NLP solutions. Free for non-commercial use (that is, till you are actually deploying or reaping revenue from your analyzers).
    NLP++ is one of the only programming languages for NLP.

    Check out the new website at

    Amnon Meyers
    Text Analysis International, Inc

  • June 21, 2015 at 9:30 pm

    I would like to recommend Twinword’s Text Analysis APIs.

    Check out the website for a list of APIs for different functions of text analysis at:


  • July 25, 2015 at 11:23 am

    Coh-Metrix, a theoretically grounded, computational linguistics facility that analyzes texts on multiple levels of language and discourse (Graesser et al., 2014; Graesser, McNamara, Louwerse, & Cai, 2004; D. S. McNamara, Graesser, McCarthy, & Cai, 2014).

  • August 26, 2016 at 5:38 am

    I have used Natural Language Processing on review based texts to find insights…We found that when trying to identify issues or areas of concerns, we wrote queries to identify the Top 25 Negative Noun Tokens in Sentences and include the related sentences after Natural Language Processing. We then grouped those sentences for tagging in an interactive tree (tree of sentences). We were able to identify the top issues affecting consumers, very quickly; because of the refined sample size (Top 25 Tokens). We would repeat this effort with each week of new data…slowly becoming the knowledge experts in the source domain. As the unique issues started to dry up we instituted a dynamic filtering system where every keyword in a sentence became a filter. We could shuffle the results with each click, spinning the results. We also implemented the ability to combine those keywords and flip them for even more complex dynamic filters. And then we also started an automatic favourite keyword identification system so that on subsequent weeks of results, I knew which keywords/favs were able to pull back the targeted results we were after. So for those looking to find the top negative issues, this may be a plan of attack in the identification of issues, something you could include in your own system. I have incorporated these tools into to see this in action. Hope this helps someone when trying to identify the insights from customer feedback.
