50 Top Free Data Mining Software


50 Top Free Data Mining Software: Data mining is the computational process of discovering patterns in large data sets, using methods from artificial intelligence, machine learning, statistical analysis, and database systems, with the goal of extracting information from a data set and transforming it into an understandable structure for further use. Orange, Weka, Rattle GUI, Apache Mahout, SCaViS, RapidMiner, R, ML-Flex, Databionic ESOM Tools, Natural Language Toolkit, SenticNet API, ELKI, UIMA, KNIME, Vowpal Wabbit, GNU Octave, CMSR Data Miner, Mlpy, MALLET, Shogun, Scikit-learn, LIBSVM, LIBLINEAR, Lattice Miner, Dlib, Jubatus, KEEL, Gnome-datamine-tools, Alteryx Project Edition, OpenNN, ADaM, ROSETTA, ADaMSoft, Anaconda, yooreeka, AstroML, streamDM, jHepWork, TraMineR, ARMiner, arules, CLUTO and TANAGRA are some of the top free data mining software, in no particular order.

You may also like to review the top free data analysis software list:
Top Free Data Analysis Software

You may also like to review the top proprietary data mining software list:
Top Data Mining Software

Top Free Data Mining Software



1.Orange

Orange is a component-based data mining and machine learning software suite written in Python. It is an open source data visualization and analysis tool for novices and experts alike. Data mining can be done through visual programming or Python scripting. It is packed with features for data analytics and offers a range of visualizations, from scatterplots, bar charts and trees to dendrograms, networks and heatmaps. Orange remembers the user's choices, suggests the most frequently used combinations, and intelligently chooses which communication channels between widgets to use. Orange uses common Python open source libraries for scientific computing, such as numpy, scipy and scikit-learn, while its graphical user interface operates within the cross-platform Qt framework. The default installation includes a number of machine learning, preprocessing and data visualization algorithms in six widget sets: data, visualize, classify, regression, evaluate and unsupervised. Additional functionality is available as add-ons for bioinformatics, data fusion and text mining.


Dataiku DSS is a collaborative data science platform that enables teams to explore, prototype, build, and deliver their own data products more efficiently. Dataiku DSS provides an interactive visual interface where users can point, click, and build, or use languages like SQL, to wrangle data, build models, re-run workflows, visualize results, and get up-to-date insights on demand. It provides tools for quickly drafting data preparation and modeling steps, lets users leverage their favorite ML libraries (scikit-learn, R, MLlib, H2O, and so on), and supports automating their work in a customizable interface.







2.Weka

Weka is a suite of machine learning software applications written in the Java programming language. The name stands for Waikato Environment for Knowledge Analysis. It is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka.




3.Rattle GUI

Rattle GUI is a free and open source software providing a graphical user interface (GUI) for Data Mining using the R statistical programming language. Rattle provides considerable data mining functionality by exposing the power of the R Statistical Software through a graphical user interface.




4.Apache Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on collaborative filtering, clustering and classification. Many of the implementations use the Apache Hadoop platform. Mahout also provides algorithms for Scala + Apache Spark, H2O and Apache Flink, as well as Samsara, a vector math experimentation environment with R-like syntax that works at scale.



5.SCaViS

SCaViS is a cross-platform Java data analysis framework developed at Argonne National Laboratory. SCaViS can be used to plot functions and data in 2D and 3D, perform statistical tests, data mining, numeric computations, function minimization and linear algebra, and solve systems of linear and differential equations. Linear, non-linear and symbolic regression are also available.





6.RapidMiner

RapidMiner provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. RapidMiner is used for business and industrial applications, research, education, training, rapid prototyping, and application development. It has more than 600 enterprise customers and more than 250,000 active users.





7.R

R is a language and environment for statistical computing and graphics.





8.ML-Flex

ML-Flex is a software package that enables users to integrate with third party machine learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.


9.Databionic ESOM Tools

Databionic ESOM Tools is a suite of programs to perform data mining tasks like clustering, visualization, and classification with Emergent Self Organizing Maps (ESOM).


10.NLTK (Natural Language Toolkit)

NLTK, the Natural Language Toolkit, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.


11.SenticNet API

SenticNet API is a semantic and affective resource for opinion mining and sentiment analysis.



12.ELKI

ELKI is a university research project with advanced cluster analysis and outlier detection methods written in the Java language. ELKI provides a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. In ELKI, data mining algorithms and data management tasks are separated, which allows each to be evaluated independently.



13.UIMA

UIMA stands for Unstructured Information Management Architecture. UIMA is a component framework for analyzing unstructured content such as text, audio and video, and was originally developed by IBM. UIMA enables applications to be decomposed into components. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them.



14.KNIME

KNIME, the Konstanz Information Miner, is a user friendly and comprehensive data analytics framework which offers capabilities for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting.



15.Chemicalize.org

Chemicalize.org is a chemical structure miner and web search engine.

16.Vowpal Wabbit

Vowpal Wabbit is an open source, fast, out-of-core learning system (both a library and a program), developed originally at Yahoo! Research and currently at Microsoft Research. Vowpal Wabbit is notable as an efficient, scalable implementation of online machine learning, with support for a number of machine learning reductions, importance weighting, and a selection of different loss functions and optimization algorithms.
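To make "online learning with importance weighting" a little more concrete, here is a minimal sketch in plain Python. It is not Vowpal Wabbit's code or API: the class `OnlineLogisticLearner`, its method names, the learning rate and the toy stream are all assumptions made for illustration. The learner sees one example at a time, updates its weights with a single gradient step on the logistic loss, and scales that step by the example's importance.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticLearner:
    """Hypothetical helper: one gradient step per example, logistic loss."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        # Probability that the label is 1 under the current weights.
        return sigmoid(sum(w * v for w, v in zip(self.weights, x)))

    def learn_one(self, x, label, importance=1.0):
        # Gradient of the logistic loss w.r.t. the weights is (p - label) * x.
        # The importance weight simply scales the size of the step.
        error = self.predict_proba(x) - label
        step = self.learning_rate * importance * error
        self.weights = [w - step * v for w, v in zip(self.weights, x)]


# Toy stream (an assumption): feature vector is [bias, value],
# label is 1 when value > 0. Examples near the boundary get importance 2.
learner = OnlineLogisticLearner(n_features=2)
for _ in range(50):
    for value in (-2.0, -1.0, 1.0, 2.0):
        label = 1 if value > 0 else 0
        importance = 2.0 if abs(value) == 1.0 else 1.0
        learner.learn_one([1.0, value], label, importance)

print(learner.predict_proba([1.0, 2.0]), learner.predict_proba([1.0, -2.0]))
```

Because each example is used once and then discarded, the data never has to fit in memory, which is the idea behind "out-of-core" learning.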



17.GraphLab

GraphLab is a graph-based, high performance, distributed computation framework written in C++. It is used for a broad range of data mining tasks, where it is reported to outperform other abstractions by orders of magnitude.



18.GNU Octave

GNU Octave is a high level programming language, primarily intended for numerical computations. It provides a command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with MATLAB.


19.CMSR Data Miner

CMSR Data Miner Suite provides an integrated environment for predictive modeling, segmentation, data visualization, statistical data analysis, and rule-based model evaluation. It also provides an integrated analytics and rule-engine environment for advanced power users.



20.Mlpy

Mlpy is an open source Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library. Mlpy provides a wide range of state-of-the-art machine learning methods for supervised and unsupervised problems, and it aims at a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency.




21.MALLET

MALLET is an integrated collection of Java code useful for statistical natural language processing, document classification, cluster analysis, information extraction, topic modeling and other machine learning applications to text.



22.Shogun

Shogun is a free, open source toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems. The focus of Shogun is on kernel machines such as support vector machines for regression and classification problems. Shogun also offers a full implementation of Hidden Markov models.



23.Scikit-learn

Scikit-learn is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.




24.LIBSVM and LIBLINEAR

LIBSVM and LIBLINEAR are two popular open source machine learning libraries, both developed at the National Taiwan University. LIBSVM implements the SMO algorithm for kernelized support vector machines (SVMs), supporting classification and regression. LIBLINEAR implements linear SVMs and logistic regression models trained using a coordinate descent algorithm.
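As a rough illustration of what "a linear SVM trained by coordinate descent" can look like, the sketch below implements a small dual coordinate descent loop for a hinge-loss linear SVM in plain Python. It is a teaching sketch under stated assumptions, not LIBLINEAR's code or interface: the function name `train_linear_svm`, the parameter defaults and the toy data are made up for this example, and the bias is handled by appending a constant feature.

```python
import random


def train_linear_svm(samples, labels, C=1.0, epochs=50, seed=0):
    """Dual coordinate descent for a hinge-loss linear SVM (no explicit bias).

    Each dual variable alpha[i] is optimized in turn while the others are
    held fixed, and the weight vector w = sum(alpha[i] * y[i] * x[i]) is
    kept up to date incrementally.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    alpha = [0.0] * len(samples)
    # Diagonal of the Gram matrix: x_i . x_i for every sample.
    q_diag = [sum(v * v for v in x) for x in samples]
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if q_diag[i] == 0.0:
                continue  # an all-zero sample cannot move w
            x, y = samples[i], labels[i]
            # Gradient of the dual objective with respect to alpha[i].
            gradient = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            # Newton step on this one coordinate, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - gradient / q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                w = [wj + delta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy, linearly separable data (an assumption); last column is a constant
# 1.0 that plays the role of the bias term.
samples = [
    [2.0, 2.0, 1.0], [3.0, 1.0, 1.0], [2.5, 3.0, 1.0],
    [-2.0, -1.0, 1.0], [-1.0, -3.0, 1.0], [-3.0, -2.0, 1.0],
]
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(samples, labels)
print([predict(w, x) for x in samples])
```

Updating one coordinate at a time keeps every step cheap (one dot product and one vector update), which is one reason this family of methods suits large, sparse linear problems.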



25.Lattice Miner

Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of concept lattices. Lattice Miner also allows the drawing of nested line diagrams.
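For readers new to formal concept analysis, the following plain-Python sketch enumerates the formal concepts of a tiny context by brute force. It does not use Lattice Miner; the context (three animals and two attributes) and all function names are assumptions chosen for illustration. A formal concept is a pair (extent, intent) where the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent.

```python
from itertools import combinations

# Toy formal context (an assumption): object -> set of attributes it has.
context = {
    "duck": {"flies", "swims"},
    "eagle": {"flies"},
    "trout": {"swims"},
}


def objects_having(attributes, context):
    """All objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def common_attributes(objects, context, all_attributes):
    """All attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Brute-force enumeration; only practical for very small contexts."""
    all_attributes = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = objects_having(frozenset(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((extent, intent))
    return concepts


concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering these concepts by inclusion of their extents gives the concept lattice that tools in this category draw as a line diagram.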



26.Dlib

Dlib is a general purpose cross platform open source software library written in the C++ programming language. Its design is heavily influenced by ideas from design by contract and component-based software engineering.



27.Jubatus

Jubatus is an open source framework for online machine learning and distributed computing. Jubatus has many features, such as classification, recommendation, regression, anomaly detection and graph mining.



28.KEEL

KEEL (Knowledge Extraction based on Evolutionary Learning) is a suite of machine learning software tools developed under the Spanish National Project. KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms in order to assess the behavior of the algorithms.



29.Gnome datamine tools

Gnome datamine tools is a growing collection of tools packaged to provide a freely available single collection of data mining tools.


30.Modular toolkit for Data Processing (MDP)

The Modular toolkit for Data Processing (MDP) is a library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software.
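The "pipeline analogy" can be sketched in a few lines of plain Python. This is an illustration of the idea only, not MDP's API: the names `CenterStep`, `ScaleStep`, `Pipeline`, `fit` and `apply` are hypothetical. Each step learns its parameters from the data it receives and then transforms that data before handing it to the next step.

```python
from statistics import mean, pstdev


class CenterStep:
    """Hypothetical step: subtract each column's mean."""

    def fit(self, rows):
        self.means = [mean(col) for col in zip(*rows)]

    def apply(self, rows):
        return [[v - m for v, m in zip(row, self.means)] for row in rows]


class ScaleStep:
    """Hypothetical step: divide each column by its standard deviation."""

    def fit(self, rows):
        # Fall back to 1.0 so constant columns do not cause division by zero.
        self.scales = [pstdev(col) or 1.0 for col in zip(*rows)]

    def apply(self, rows):
        return [[v / s for v, s in zip(row, self.scales)] for row in rows]


class Pipeline:
    """Chains steps so that each one is fitted on the previous step's output."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows):
        for step in self.steps:
            step.fit(rows)
            rows = step.apply(rows)

    def apply(self, rows):
        for step in self.steps:
            rows = step.apply(rows)
        return rows


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
pipeline = Pipeline([CenterStep(), ScaleStep()])
pipeline.fit(data)
standardized = pipeline.apply(data)
print(standardized)
```

The benefit of this design is that new processing stages can be added or reordered without changing the code that trains and runs the chain.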



31.Fityk

Fityk is a curve fitting and data analysis application, predominantly used to fit analytical, bell-shaped functions to experimental data. It is positioned to fill the gap between general plotting software and programs specific for one field.





32.Pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.



33.PyBrain

PyBrain is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks, and a variety of predefined environments to test and compare your algorithms.


34. MiningMart

MiningMart processes data from relational databases. MiningMart currently supports PostgreSQL, MySQL and Oracle.


35.Alteryx Project Edition

Alteryx Project Edition comes with over 150 tools to blend, cleanse and analyze data. Project Edition includes a number of predictive (R language) drag-and-drop tools that can be built into analytic workflows. Some of the most useful of these are the tools related to A/B testing, which can be used to pilot a change such as a new menu, a promotion, or a new web layout.



36.OpenNN

OpenNN is an open source class library written in C++ which implements neural networks. The library is intended for advanced users, with high C++ and machine learning skills. OpenNN provides an effective framework for the research and development of data mining and predictive analytics algorithms and applications.




37.ADaM

Algorithm Development and Mining System (ADaM) is used to apply data mining technologies to remotely-sensed and other scientific data. The mining and image processing toolkits consist of interoperable components that can be linked together in a variety of ways for application to diverse problem domains.



38.DataMelt

DataMelt is a free mathematics software package which can be used for numeric computation, statistics, symbolic calculations, data analysis and data visualization.


39.ROSETTA

ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. ROSETTA is designed to support the overall data mining and knowledge discovery process: from initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns.



40.ADaMSoft

ADaMSoft is a free and open source data mining software package developed in Java. It contains data management methods and can create ready-to-use reports. It can read data from several sources and write the results in different formats.



41.Anaconda

Anaconda is an open data science platform powered by Python. The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science. There is also access to over 720 packages that can easily be installed with conda, the package, dependency and environment manager that is included in Anaconda. The packages cover stats, data mining, machine learning, deep learning, simulation and optimization, geospatial, text and NLP, graph and network, and image analysis. Featured packages include: NumPy, SciPy, pandas, scikit-learn, Numba, PyTables, h5py, Matplotlib, Jupyter (formerly IPython), Spyder, Qt/PySide, VTK, Numexpr, Cython, Theano, scikit-image, NLTK, NetworkX, IRKernel, dplyr, shiny, ggplot2, tidyr, caret, nnet.




42.yooreeka

yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The algorithms covered are clustering (hierarchical, both agglomerative and divisive, and partitional), classification (Bayesian, decision trees, neural networks, rule based), recommendation (collaborative filtering, content based) and search (PageRank, DocRank and personalization).
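Of the algorithms named above, PageRank is compact enough to sketch in a few lines. The version below is a generic power-iteration sketch in plain Python, not yooreeka's implementation; the function name `pagerank`, the damping factor of 0.85, the iteration count and the four-page link graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration on a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy link graph (an assumption): "c" is linked to by every other page.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

rank = pagerank(links)
for page, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(page, round(score, 4))
```

The scores always sum to 1, so they can be read as the long-run share of visits each page would receive from a surfer who follows links at random and occasionally jumps to an arbitrary page.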



43.AstroML

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.



44.jHepWork

jHepWork is an environment for scientific computation, data analysis and data visualization. It is fully multiplatform, written 100% in Java, and integrated with the Jython (Python) scripting language.




45.streamDM

streamDM is new open source software for mining big data streams using Spark Streaming, started at Huawei Noah’s Ark Lab. Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources.



46.TraMineR

TraMineR is an R package for mining, describing and visualizing sequences of states or events. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories.



47.ARMiner

ARMiner is a client-server data mining application specialized in finding association rules. ARMiner was developed at UMass/Boston as a software engineering project.



48.arules

arules provides the infrastructure for representing, manipulating and analyzing transaction data and patterns in frequent itemsets and association rules.
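As a small worked example of frequent itemsets and association rules, the plain-Python sketch below mines a toy basket data set with a level-wise (Apriori-style) search. It does not use arules, which is an R package; the transactions, thresholds and function names here are assumptions made for illustration. Support is the fraction of transactions containing an itemset, and the confidence of a rule A -> B is support(A and B) divided by support(A).

```python
from itertools import combinations

# Toy market-basket data (an assumption).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets one item at a time, pruning early."""
    items = sorted(set().union(*transactions))
    frequent = {}
    current = [frozenset([item]) for item in items]
    while current:
        size = len(current[0]) + 1
        survivors = []
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
                survivors.append(candidate)
        # Join surviving itemsets into candidates that are one item larger.
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        # Keep a candidate only if all of its smaller subsets are frequent.
        current = [
            c for c in joined
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Rules (lhs, rhs, support, confidence) derived from frequent itemsets."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                confidence = s / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, confidence))
    return rules


frequent = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.9)
for lhs, rhs, s, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), s, confidence)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are, which is what keeps the search tractable on larger data.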



49.CLUTO

CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. CLUTO is well suited for clustering data sets arising in many diverse application areas, such as information retrieval and customer purchasing transactions.



50.TANAGRA

TANAGRA is data mining software for academic and research purposes. It offers several data mining methods from the areas of exploratory data analysis, statistical learning, machine learning and databases. TANAGRA contains supervised learning methods but also other paradigms such as clustering, factorial analysis, parametric and nonparametric statistics, association rules, and feature selection and construction algorithms.




5 Reviews
  • Mike
    March 17, 2014 at 9:23 am

    Hello bud, on your data mining softwares witch 1 would u recommend for email mining? Thank you

  • Phoenix
    April 1, 2014 at 11:50 pm

    Do any of these have non-English capabilities?

  • Venkatesh
    July 29, 2014 at 12:52 am

    Hi buddy! Are there any attempts to do cloud based data analytics softwares? I think such a thing can solve the problem Phoenix had mentioned.

  • K R Chin
    January 25, 2015 at 6:14 pm

    I’d like to know if there are any data mining programs which could be used to predict terrorist activities or analyze material movements (shipping, purchases, and orders) to search for indicators of suspicious activity.

    I’m a security consultant and advisor, this sort of information would be useful in my consultations.

  • Mahrez
    March 5, 2015 at 4:00 pm

    Hi KR Chin,

    To predict any activity you need to know which variables you want to base your prediction on. You also need a historical data to run your predictive analysis and find the possible correlations between different event. I know that somewhere in the US the police uses crime predictions based on historical criminality data (new Orleans if I am not mistaken)…bottom line : you need data to get the info ! have fun 🙂
