50 Top Free Data Mining Software

Data mining is the computational process of discovering patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems, with the goal of extracting information from a data set and transforming it into an understandable structure for further use. Orange Data mining, R Software Environment, Weka Data Mining, SpagoBI Business Intelligence, Anaconda, Shogun, DataMelt, Natural Language Toolkit, Apache Mahout, GNU Octave, RapidMiner Starter Edition, GraphLab Create, Lavastorm Analytics Engine, Scikit-learn, ELKI, KNIME Analytics Platform Community, Apache UIMA, LIBLINEAR, CMSR Data Miner, Rattle GUI, TANAGRA, Fityk, ROSETTA, DataPreparator, Alteryx Project Edition, Pandas, OpenNN, mlpy, KEEL, Dataiku DSS Community, Vowpal Wabbit, CLUTO, MiningMart, Dlib, TraMineR, Databionic ESOM, MALLET, streamDM, Jubatus, ADaM, Chemicalize.org, Modular toolkit for Data Processing, ML-Flex, Sentic API, ADaMSoft, and LIBSVM are some of the top free data mining software.

 

Sisense

Sisense empowers even the most non-technical user to access data and build interactive dashboards and business intelligence reports. Sisense provides a variety of dashboard widgets to pinpoint the best visualization for your data, such as geographical maps, gauges to measure KPIs, line charts to determine trends, scatter plots to see correlations, and pie charts for clear comparisons. Sisense also lets you customize the dashboard layout with drag-and-drop features, placing each widget exactly where you want for optimal presentation.


Top Free Data Mining Software

1

Orange Data mining

Orange is an open source data visualization and analysis tool, developed at the Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, together with the open source community. Data mining is done through visual programming or Python scripting. The tool has components for machine learning, add-ons for bioinformatics and text mining, and it is packed with features for data analytics. Orange is also a Python library: Python scripts can run in a terminal window, in integrated environments like PyCharm and PythonWin, or in shells like iPython. Orange consists of a canvas interface onto which the user places…

Bottom Line

Orange is an open source data visualization and analysis tool, where data mining is done through visual programming or Python scripting. The tool has components for machine learning, add-ons for bioinformatics and text mining and it is packed with features for data analytics.
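
For readers who prefer scripting to the canvas, the same components are exposed as a Python library. Below is a minimal sketch assuming Orange 3 is installed (for example via pip as the orange3 package); exact class names can differ slightly between releases.

    import Orange  # assumes Orange 3 (e.g. pip install orange3)

    data = Orange.data.Table("iris")               # sample dataset bundled with Orange
    learner = Orange.classification.TreeLearner()  # a classification tree learner
    model = learner(data)                          # fit the model on the data table
    print(model(data[0]))                          # predict the class of the first row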


2

R Software Environment


R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Some of the functionalities include an effective data handling and storage facility, a suite of operators for calculations on arrays (in particular matrices), a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either directly at the computer or on hardcopy, and a well-developed, simple and effective programming language which includes conditionals,…

Bottom Line

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. R is an integrated suite of software facilities for data manipulation, calculation and graphical display.


3

Weka Data Mining

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka features include machine learning, data mining, preprocessing, classification, regression, clustering, association rules, attribute selection, experiments, workflow and visualization. Weka is written in Java and developed at the University of Waikato, New Zealand. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes. Weka provides access to SQL databases…

Bottom Line

Weka is a collection of machine learning algorithms for data mining tasks. Weka features include machine learning, data mining, preprocessing, classification, regression, clustering, association rules, attribute selection, experiments, workflow and visualization. Weka is written in Java, developed at the University of Waikato, New Zealand.


4

SpagoBI Business Intelligence

SpagoBI is an Open Source Business Intelligence suite, which offers a large range of analytical functions, a functional semantic layer and a set of advanced data visualization features including geospatial analytics. The modules of the SpagoBI suite are SpagoBI Server, SpagoBI Studio, SpagoBI Meta and SpagoBI SDK. SpagoBI Server is the main module of the suite, offering the core and analytical functionalities. It provides two conceptual models, called the Analytical Model and the Behavioural Model, as well as administration tools and cross-platform services. SpagoBI Studio allows the developer to design and modify analytical documents such as reports, charts,…

Bottom Line

The modules of the SpagoBI suite are SpagoBI Server, SpagoBI Studio, SpagoBI Meta and SpagoBI SDK. SpagoBI Server is the main module of the suite, offering the core and analytical functionalities. It provides two conceptual models, called the Analytical Model and the Behavioural Model, as well as administration tools and cross-platform services.


5

Anaconda

Anaconda is an open data science platform powered by Python. The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science. There is also access to over 720 packages that can easily be installed with conda, the package, dependency and environment manager that is included in Anaconda. It includes the most popular Python, R and Scala packages for statistics, data mining, machine learning, deep learning, simulation and optimization, geospatial analysis, text and NLP, graph and network analysis, and image analysis. Featured packages include: NumPy, SciPy,…

Bottom Line

Anaconda Distribution gives superpowers to people that change the world with high performance, cross-platform Python and R that includes the best innovative data science from open source.


6

Shogun

Shogun is a free, open source toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems. The focus of Shogun is on kernel machines such as support vector machines for regression and classification problems. Shogun also offers a full implementation of Hidden Markov models. The toolbox makes it easy to combine multiple data representations, algorithm classes, and general purpose tools. This enables both rapid prototyping of data pipelines and extensibility in terms of new algorithms. It now offers features that span the whole space of machine learning methods, including many classical methods in classification, regression, dimensionality…

Bottom Line

Shogun also offers a full implementation of Hidden Markov models. The toolbox makes it easy to combine multiple data representations, algorithm classes, and general purpose tools, enabling both rapid prototyping of data pipelines and extensibility in terms of new algorithms.


7

DataMelt

DataMelt, or DMelt, is software for numeric computation, statistics, analysis of large data volumes ("big data") and scientific visualization. The program can be used in many areas, such as natural sciences, engineering, modeling and analysis of financial markets. DMelt is a computational platform: it can be used with different programming languages on different operating systems. Unlike other statistical programs, it is not limited to a single programming language. DMelt can be used with several scripting languages, such as Python/Jython, BeanShell, Groovy and Ruby, as well as with Java. It is a comprehensive package that includes more than 30,000 Java classes for computation…

Bottom Line

DataMelt, or DMelt, is software for numeric computation, statistics, analysis of large data volumes ("big data") and scientific visualization. The program can be used in many areas, such as natural sciences, engineering, modeling and analysis of financial markets.


8

Natural Language Toolkit

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics, plus comprehensive API documentation, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK…

Bottom Line

NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open source, community-driven project.
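
As an illustration of the pipeline described above, the short sketch below tokenizes and part-of-speech tags a sentence and performs a WordNet lookup; it assumes NLTK is installed and downloads the required corpora and models on first run (resource names may differ in very recent NLTK releases).

    import nltk
    from nltk.corpus import wordnet

    # one-time downloads of the tokenizer model, tagger model and WordNet corpus
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")
    nltk.download("wordnet")

    text = "NLTK makes it easy to tokenize, tag and analyse text."
    tokens = nltk.word_tokenize(text)   # ['NLTK', 'makes', 'it', 'easy', ...]
    tagged = nltk.pos_tag(tokens)       # [('NLTK', 'NNP'), ('makes', 'VBZ'), ...]
    print(tagged[:4])
    print(wordnet.synsets("mine")[0].definition())  # lexical lookup in WordNet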


9

Apache Mahout

The Apache Mahout project's goal is to build an environment for quickly creating scalable, performant machine learning applications. Apache Mahout is a simple and extensible programming environment and framework for building scalable algorithms and contains a wide variety of premade algorithms for Scala with Apache Spark, H2O, and Apache Flink. It also includes Samsara, a vector math experimentation environment with R-like syntax which works at scale. Apache Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm. While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are…

Bottom Line

Apache Mahout introduces a new math environment called Samsara, for its theme of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized.


10

GNU Octave

GNU Octave is a high-level language intended for numerical computations. Through its command line interface, users can solve linear and nonlinear problems numerically and perform other numerical experiments using a language that is mostly compatible with Matlab. The software offers a powerful mathematics-oriented syntax with built-in plotting and visualization tools; it is free software which runs on GNU/Linux, macOS, BSD, and Windows; and it is compatible with many Matlab scripts. Octave can be run in several ways - in GUI mode, as a console, or invoked as…

Bottom Line

Executable versions of GNU Octave for GNU/Linux systems are provided by the individual distributions. Distributions known to package Octave include Debian, Ubuntu, Fedora, Gentoo, and openSUSE.


11

RapidMiner Starter Edition

RapidMiner Studio provides a wealth of functionality to speed and optimize data exploration, blending and cleansing tasks, reducing the time spent importing and wrangling your data. RapidMiner provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and supports all steps of the machine learning process including data preparation, results visualization, model validation and optimization. Hundreds of machine learning, text analytics, and predictive modeling algorithms, plus automation and process control features, help you build better…

Bottom Line

The Starter Edition includes RapidMiner Studio (limited to 10,000 data rows), RapidMiner Server (2 GB RAM) and RapidMiner Radoop (limited to a single user).


12

GraphLab Create

GraphLab Create is a machine learning platform for building intelligent, predictive applications, covering cleaning the data, developing features, training a model, and creating and maintaining a predictive service. These intelligent applications provide predictions for use cases including recommenders, sentiment analysis, fraud detection, churn prediction and ad targeting. Trained models can be deployed on Amazon Elastic Compute Cloud (EC2) and monitored through Amazon CloudWatch. They can be queried in real time via a RESTful API, and the entire deployment pipeline is visible through a visual dashboard. The time from prototyping to production is dramatically reduced for GraphLab Create users. Dato is also…

Bottom Line

GraphLab Create is a machine learning platform for building intelligent, predictive applications, covering cleaning the data, developing features, training a model, and creating and maintaining a predictive service.


13

Lavastorm Analytics Engine

Lavastorm is a visual data discovery solution that allows users to rapidly integrate diverse data, easily discover elusive insights, and continuously detect anomalies, outliers, or patterns. Lavastorm Analytics Engine provides self-service capability for business users and rapid development capabilities for IT users in the areas of integration, analytics, and business control. Features include the ability to acquire, transform, combine, and enrich data from virtually any source, including Big Data sources, without intensive modeling, pre-planning, or scripting. The solution discovers data issues, such as incompleteness, inconsistent formats and inaccuracy, and automates the evaluation and cleansing process. Lavastorm Analytics Engine uses the visual analytic…

Bottom Line

Lavastorm Analytics Engine provides self-service capability for business users and rapid development capabilities for IT users in the areas of integration, analytics, and business control.


14

Scikit-learn

Scikit-learn is an open source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Classification: identifying which category an object belongs to (applications: spam detection, image recognition; algorithms: SVM, nearest neighbors, random forest). Regression: predicting a continuous-valued attribute associated with an object (applications: drug response, stock prices; algorithms: SVR, ridge regression). Clustering: automatic grouping of similar objects into sets (applications: customer segmentation, grouping experiment outcomes).…

Bottom Line

Scikit-learn features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
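
The classification and clustering use cases above translate directly into a few lines of code. This is a minimal, self-contained sketch using scikit-learn's bundled iris dataset, a random forest classifier and k-means clustering.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Classification: fit a random forest and score it on held-out data
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

    # Clustering: group the same observations into 3 clusters without using the labels
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])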


15

ELKI

The ELKI framework is written in Java and built around a modular architecture. Most currently included algorithms belong to clustering, outlier detection and database indexes. A key concept of ELKI is to allow the combination of arbitrary algorithms, data types, distance functions and indexes and evaluate these combinations. When developing new algorithms or index structures, the existing components can be reused and combined. ELKI is modeled around a database core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases). This database core provides nearest neighbor search, range/radius search, and distance…

Bottom Line

ELKI is modeled around a database core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases).


16

KNIME Analytics Platform Community

KNIME Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform offers a vast arsenal of native nodes, community contributions, and tool integrations for any data scientist.

Bottom Line

A vast arsenal of native nodes, community contributions, and tool integrations makes KNIME Analytics Platform the perfect toolbox for any data scientist.


17

Apache UIMA

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places and organizations, or relations, such as works-for or located-at. UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow…

Bottom Line

UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.


18

LIBLINEAR

LIBLINEAR is an open source library that is used by data scientists, developers and end users to perform large scale linear classification. Its easy to use command line tools and library calls enable LIBLINEAR to be used for logistic regression and linear support vector machines. LIBLINEAR uses the same data format as LIBSVM, the general purpose SVM solver, and also has similar usage. LIBLINEAR provides several language interfaces that can be used by data scientists and developers. The interfaces provided…

Bottom Line

LIBLINEAR is an open source library that comes with easy to use command tools and library calls that enable developers, end users, and data scientists perform large scale linear classification.


19

CMSR Data Miner

StarProbe Data Miner, or CMSR Data Miner Suite, is software which provides an integrated environment for predictive modeling, segmentation, data visualization, statistical data analysis, and rule-based model evaluation. For advanced power users, an integrated analytics and rule-engine environment is also provided. The software's features include deep learning modeling and RME-EP, a powerful expert system shell rule engine that supports predictive models such as neural networks, self-organizing maps, decision trees, regression and so on. It has been developed to use SQL-like expressions which users can learn very easily and quickly. Also, RME-EP expert system rules can be written by non-IT…

Bottom Line

StarProbe Data Miner or CMSR Data Miner Suite is software which provides an integrated environment for predictive modeling, segmentation, data visualization, statistical data analysis, and rule-based model evaluation.


20

Rattle GUI

Rattle is Free Open Source Software and the source code is available from the Bitbucket repository. Rattle gives the user the freedom to review the code, use it for whatever purpose the user likes, and to extend it however they like, without restriction. Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets. One of the most important features is that all of the user’s interactions…

Bottom Line

Rattle - the R Analytical Tool To Learn Easily - is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.


21

TANAGRA

Tanagra is free data mining software for academic and research purposes. It provides several data mining methods from exploratory data analysis, statistical learning, machine learning and databases. It is a successor of SIPINA, which means that various supervised learning algorithms are provided, especially an interactive and visual construction of decision trees. Tanagra is very powerful because it contains not only supervised learning but also other paradigms such as clustering, factorial analysis, parametric and nonparametric statistics, association rules, feature selection and construction algorithms. The main goal of the project is to give researchers and students easy-to-use data mining software, and the second goal is…

Bottom Line

TANAGRA is an "open source project" as every researcher can access to the source code, and add his own algorithms, as far as he agrees and conforms to the software distribution license.The main purpose of Tanagra project is to give researchers and students an easy-to-use data mining software, conforming to the present norms of the software development in this domain (especially in the design of its GUI and the way to use it), and allowing to analyse either real or synthetic data.


22

Fityk

Fityk is a program for data processing and nonlinear curve fitting. It is primarily used by scientists who analyse data from powder diffraction, chromatography, photoluminescence and photoelectron spectroscopy, infrared and Raman spectroscopy, and other experimental techniques. It is mainly used to fit peaks, i.e. bell-shaped functions (Gaussian, Lorentzian, Voigt, Pearson VII, bifurcated Gaussian, EMG, Doniach-Sunjic, etc.), but it is suitable for fitting any curve to 2D (x,y) data. Fityk offers users the following features: an intuitive graphical interface (and also a command line interface), support for many data file formats thanks to the xylib library, dozens of built-in functions and support for…

Bottom Line

Fityk is used by scientists who analyse data from powder diffraction, chromatography, photoluminescence and photoelectron spectroscopy, infrared and Raman spectroscopy, and other experimental techniques.


23

ROSETTA

ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. It is designed to support the overall data mining and knowledge discovery process: from initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns. The toolkit is not geared towards any particular application domain; it is intended as a general-purpose tool for discernibility-based modeling. A highly intuitive GUI environment is offered, in which data-navigational abilities are emphasized. The main orientation of the GUI is…

Bottom Line

ROSETTA is designed to support the overall data mining and knowledge discovery process: From initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns.


24

DataPreparator

DataPreparator is a free software tool which is designed to assist with common tasks of data preparation (or data preprocessing) in data analysis and data mining. DataPreparator offers features such as character removal, text replacement, date conversion, remove selected attributes, move selected attributes, equal width, equal frequency, equal frequency from grouped data, delete records containing missing values, remove attributes containing missing values, impute missing values, predict missing values from model (dependence tree, Naive Bayes model), include missing value patterns, Z-score method, Box-plot method, create binary attributes, replace nominal values by indices, reduce number of labels, decimal, linear, hyperbolic tangent, soft-max,…

Bottom Line

DataPreparator includes operators for cleaning, discretization, numeration, scaling, attribute selection, missing values, outliers, statistics, visualization, balancing, sampling, row selection, and several other tasks.


25

Alteryx Project Edition

Alteryx Analytics provides analysts with the unique ability to easily prep, blend, and analyze all of their data using a repeatable workflow, then deploy and share analytics at scale for deeper insights in hours, not weeks. Analysts love the Alteryx Analytics platform because they can connect to and cleanse data from data warehouses, cloud applications, spreadsheets, and other sources, easily join this data together, then perform analytics (predictive, statistical, and spatial) using the same intuitive user interface, without writing any code. Save time by automating common data prep, blending and analysis; easily combine multiple data sources in a repeatable…

Bottom Line

With Alteryx Designer, users can: Prepare, blend and analyze all data using a repeatable workflow; Run predictive, spatial and statistical analytics in one intuitive interface; Combine data from sources like Hadoop, Oracle, Salesforce, Excel and more; Dig into multiple data sources for deeper insights in hours, not weeks; Maximize Forecasting; Take advantage of simplified Time Series analysis to improve the accuracy of user forecasting capabilities.


26

Pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pandas is a NumFOCUS sponsored project, which helps ensure the success of the development of pandas as a world-class open-source project and makes it possible to donate to the project. The easiest way to get pandas is to install it via conda; builds for osx-64, linux-64, linux-32, win-64 and win-32 are available for Python 2.7, Python 3.4, and Python 3.5. The latest release is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large…

Bottom Line

Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form.
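
The label-based alignment and missing-data handling mentioned in the bottom line look like this in practice (a small sketch, assuming a standard pandas installation):

    import pandas as pd

    # Two series with partially overlapping labels
    s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
    s2 = pd.Series([10.0, 20.0], index=["b", "d"])

    # Arithmetic aligns on labels automatically; labels present in only one
    # series produce NaN (missing) values in the result
    total = s1 + s2
    print(total)             # a: NaN, b: 12.0, c: NaN, d: NaN

    # Integrated missing-data handling: fill the gaps or drop them
    print(total.fillna(0.0))
    print(total.dropna())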


27

OpenNN

OpenNN is an open source class library written in C++ programming language which implements neural networks, a main area of machine learning research. The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties. The main advantage of OpenNN is its high performance. It is developed in C++ for better memory management and higher processing speed, and implements CPU parallelization by means of OpenMP and GPU acceleration with CUDA. OpenNN has been written in ANSI C++. This means that the library can be built…

Bottom Line

The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties.


28

mlpy

mlpy, short for Machine Learning Python, is a Python module for machine learning built on top of NumPy/SciPy (the Python-based ecosystem of open-source software for mathematics, science, and engineering) and the GNU Scientific Library (a numerical library for C and C++ programmers that provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting). A wide range of state-of-the-art machine learning methods is provided for supervised and unsupervised problems, and mlpy aims to find a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. It provides high-level functions and classes allowing, with few lines…

Bottom Line

mlpy is multiplatform, it works with Python 2 and 3 and it is Open Source, distributed under the GNU General Public License version 3.


29

KEEL

KEEL (Knowledge Extraction based on Evolutionary Learning) is an open source (GPLv3) Java software tool that can be used for a large number of different knowledge data discovery tasks. KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms. It contains a wide variety of classical knowledge extraction algorithms, preprocessing techniques (training set selection, feature selection, discretization, imputation methods for missing values, among others), computational intelligence based learning algorithms, hybrid models, statistical methodologies for contrasting experiments…

Bottom Line

KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms.


30

Dataiku DSS Community

Dataiku DSS is the collaborative data science software platform for teams of data scientists, data analysts, and engineers to explore, prototype, build, and deliver their own data products more efficiently. Dataiku develops a unique advanced analytics software solution that enables companies to build and deliver their own data products more efficiently. Dataiku DSS offers a collaborative, team-based user interface for data scientists and beginner analysts, a unified framework for both development and deployment of data projects, and immediate access to all the features and tools required to design data products from scratch. The visual interface of Dataiku…

Bottom Line

The visual interface of Dataiku DSS empowers people with a less technical background to learn the data mining process, and build projects from raw data to predictive application, without having to write a single line of code.


31

Vowpal Wabbit

The Vowpal Wabbit (VW) project is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research. Support is available through the mailing list. There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation. There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function (several are available),…

Bottom Line

There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it has reached a state where it may be useful to others as a platform for research and experimentation.


32

CLUTO

CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. It is well suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology. CLUTO's distribution consists of both stand-alone programs and a library through which an application program can directly access the various clustering and analysis algorithms implemented in CLUTO. The software has several features, such as multiple classes of clustering algorithms (partitional, agglomerative, and graph-partitioning based) and multiple similarity/distance functions (Euclidean distance, cosine, correlation coefficient, extended Jaccard,…

Bottom Line

CLUTO is well-suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology. CLUTO's distribution consists of both stand-alone programs and a library via which an application program can directly access the various clustering and analysis algorithms implemented in CLUTO.


33

MiningMart

MiningMart aims to reduce the time spent on data preprocessing. The MiningMart project targets new techniques that give decision-makers direct access to information stored in databases, data warehouses, and knowledge bases. The main goal is to support users in making intelligent choices by offering the following: operators for preprocessing with direct database access; use of machine learning for preprocessing; detailed documentation of successful cases; high quality discovery results; scalability to very large databases; and techniques that automatically select or change representations. MiningMart's basic idea is to store best practice cases of preprocessing chains that were developed by experienced users. The data…

Bottom Line

MiningMart users choose a case and apply the corresponding transformation and learning chain to their application.


34

Dlib

Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. It is free of charge, which means users can use it in any application. Major features of Dlib include documentation (complete and precise documentation for every class and function, with many example programs) and high quality portable code (good unit test coverage, tested on MS Windows,…

Bottom Line

It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments.


35

TraMineR

TraMineR is an R package (R being a free software environment for statistical computing and graphics which compiles and runs on a wide variety of platforms such as UNIX, Windows and MacOS) for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. Its primary goal is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. Most of its features also apply to many other kinds of categorical sequence data. These features include: handling of longitudinal data and conversion between various sequence formats; plotting sequences (density plot, frequency…

Bottom Line

Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. However, most of its features also apply to many other kinds of categorical sequence data.


36

Databionic ESOM

The Databionic ESOM Tools are a suite of programs for performing data mining tasks such as visualization, clustering, and classification of high-dimensional data with emergent self-organizing maps (ESOM), using databionic principles, either interactively or automatically. Features include ESOM training with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels, as well as U-Matrix visualizations, explorative data analysis and clustering, ESOM classification, and the creation of U-Maps. The Databionics ESOM Tools also contain…

Bottom Line

Training of ESOM with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels. Visualization of high dimensional dataspace with U-Matrix, P-Matrix, Component Planes, SDH, and more.


37

MALLET

MALLET (MAchine Learning for LanguagE Toolkit) is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Sophisticated tools for document classification are provided: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. It also provides tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields, and all…

Bottom Line

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.


38

streamDM

streamDM is open source software for mining big data streams using Spark Streaming, developed at Huawei Noah's Ark Lab and licensed under the Apache Software License v2.0. Big Data stream learning is challenging because data may not keep the same distribution over the lifetime of the stream, and learning algorithms need to be very efficient: each example in a stream can be processed only once, or examples need to be summarized with a small memory footprint. Spark Streaming, which makes building scalable, fault-tolerant streaming applications easy, is an extension of the…

Bottom Line

Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is an extensible and programmable framework for massive distributed processing of datasets, called Resilient Distributed Datasets (RDD).


39

Jubatus

Jubatus supports basic tasks including classification, regression, clustering, nearest neighbor search, outlier detection, and recommendation. Jubatus is the first open source platform for online distributed machine learning on Big Data streams. Jubatus uses a loose model sharing architecture for efficient training and sharing of machine learning models, defining three fundamental operations (Update, Mix, and Analyze) in a similar way to the Map and Reduce operations in Hadoop. In addition, Jubatus supports scalable machine learning processing: it can handle 100,000 or more data points per second using commodity hardware clusters. It is designed for clusters of commodity, shared-nothing hardware.…

Bottom Line

Jubatus uses a loose model sharing architecture for efficient training and sharing of machine learning models, by defining three fundamental operations; Update, Mix, and Analyze, in a similar way with the Map and Reduce operations in Hadoop.


40

ADaM

The Algorithm Development and Mining System (ADaM) developed by the Information Technology and Systems Center at the University of Alabama in Huntsville is used to apply data mining technologies to remotely-sensed and other scientific data. The mining and image processing toolkits consist of interoperable components that can be linked together in a variety of ways for application to diverse problem domains. ADaM has over 100 components that can be configured to create customized mining processes. Preprocessing and analysis utilities aid users in applying data mining to their specific problems. New components can easily be added to adapt the system to…

Bottom Line

ADaM's component architecture is designed to take advantage of emerging computational environments such as the Web and information Grids.


41

Chemicalize.org

Chemicalize provides an instant cheminformatics solution. It is a powerful online platform for chemical calculations, search, and text processing. Calculation view provides structure-based predictions for any molecule structure. Available calculations include elemental analysis, names and identifiers, pKa, logP/logD, as well as solubility. Search view lets you perform text-based and structure-based searches against the Chemicalize database to find web page sources and associated structures of the results. You can even combine text-based and structural queries to achieve advanced search capabilities. Web viewer displays any web page with chemical structures highlighted on it. Recognized formats are IUPAC names, common names, InChI, and SMILES…

Bottom Line

Search view lets you perform text-based and structure-based searches against the Chemicalize database to find web page sources and associated structures of the results.


42

Modular toolkit for Data Processing

The Modular toolkit for Data Processing (MDP) is a library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. From the user’s perspective, MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures. Given a set of input data, MDP takes care of successively training or executing all nodes in the network. This allows the user to specify complex algorithms as a series of simpler…

Bottom Line

MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures.
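
As a sketch of the node-and-flow idea (assuming the MDP package is installed; node names below follow MDP's documented mdp.nodes namespace, and call signatures may differ slightly between versions):

    import numpy as np
    import mdp  # Modular toolkit for Data Processing

    x = np.random.random((1000, 20))  # 1000 observations, 20 variables

    # Combine two nodes into a flow: PCA down to 5 components, then Slow Feature Analysis
    flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5), mdp.nodes.SFANode()])
    flow.train(x)    # trains the trainable nodes in sequence on the data
    y = flow(x)      # executes the whole processing sequence on (new) data
    print(y.shape)   # (1000, 5)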


43

ML-Flex

ML-Flex uses machine-learning algorithms to derive models from independent variables, with the purpose of predicting the values of a dependent (class) variable. For example, machine-learning algorithms have long been applied to the Iris data set, introduced by Sir Ronald Fisher in 1936, which contains four independent variables (sepal length, sepal width, petal length, petal width) and one dependent variable (species of Iris flowers = setosa, versicolor, or virginica). Deriving prediction models from the four independent variables, machine-learning algorithms can often differentiate between the species with near-perfect accuracy. One important aspect to consider in performing a machine-learning experiment is the validation…

Bottom Line

Machine-learning algorithms have been developed in a wide variety of programming languages and offer many incompatible ways of interfacing to them. ML-Flex makes it possible to interface with any algorithm that provides a command-line interface.


44

Sentic API

Sentic API provides the semantics and sentics (the denotative and connotative information) associated with the concepts of SenticNet 4, a semantic network of commonsense knowledge that contains 50,000 nodes (words and multiword expressions) and thousands of connections (relationships between nodes). Sentic API is available in 40 different languages and lets users selectively access the latest version of the knowledge base online. Since polarity detection is the most common sentiment analysis task, Sentic API provides two fine-grained commands for it. For polarity detection intended as a binary classification problem (positive vs. negative), polarity can be obtained with…

Bottom Line

Sentic API provides the denotative and connotative information associated with the concepts of SenticNet 4 in 40 languages.


45

ADaMSoft

ADaMSoft is a free and open-source system for data management, data and web mining, statistical analysis. ADaMSoft offers procedures such as Principal component analysis, Text mining, Web Mining, Analysis of three ways time arrays, Linear regression with fuzzy dependent variable, Utility, Synthesis table, Import a data table (file) in ADaMSoft (create a dictionary), Charts, Neural network (MLP), Association measures for qualitative variables, Linear algebra, Evaluate the results of function approximation, Data Management, Function fitting, Error localization and data imputation, Decision trees, Statistics on quantitative variables, Record linkage, Evaluate the result of classification models, Cluster analysis (k-means method), Correspondence analysis, Data…

Bottom Line

ADaMSoft stands for: Data Analysis and Statistical Modeling software (in italian: Analisi Dati e Modelli Statistici) which performs Principal component analysis, Text mining, Web Mining, Analysis of three ways time arrays, Linear regression with fuzzy dependent variable, Utility, Synthesis table, Import a data table (file) in ADaMSoft (create a dictionary), Charts and Neural network (MLP).


46

LIBSVM

LIBSVM is a library for Support Vector Machines (SVMs). LIBSVM offers tools such as Multi-core LIBLINEAR, Distributed LIBLINEAR, LIBLINEAR for Incremental and Decremental Learning, LIBLINEAR for One-versus-one Multi-class Classification, Large-scale rankSVM, LIBLINEAR for more than 2^32 instances/features (experimental), Large linear classification when data cannot fit in memory, Weights for data instances, Fast training/testing for polynomial mappings of data, Cross Validation with Different Criteria (AUC, F-score), Cross Validation using Higher-level Information to Split Data, LIBSVM for dense data, LIBSVM for string data, Multi-label classification, LIBSVM Extensions at Caltech, Feature selection tool, LIBSVM data sets, SVM-toy based on Javascript, SVM-toy in 3D,…

Bottom Line

LIBSVM involves training a data set to obtain a model, using the model to predict information of a testing data set and can also output probability estimates for SVC and SVR.
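
A small sketch of that train-then-predict workflow through LIBSVM's Python interface (svmutil). Depending on how LIBSVM was installed, the module may be importable as libsvm.svmutil or simply svmutil; the option strings are standard LIBSVM training parameters.

    from libsvm.svmutil import svm_train, svm_predict  # or: from svmutil import ...

    # Four toy instances: class labels plus sparse feature dictionaries {index: value}
    y = [1, 1, -1, -1]
    x = [{1: 0.9, 2: 0.1}, {1: 0.8, 2: 0.3}, {1: 0.1, 2: 0.9}, {1: 0.2, 2: 0.8}]

    # Train a C-SVC (-s 0) with an RBF kernel (-t 2) and cost parameter C = 1
    model = svm_train(y, x, "-s 0 -t 2 -c 1")

    # Predict on the training points; returns predicted labels, accuracy and decision values
    labels, accuracy, values = svm_predict(y, x, model)
    print(labels, accuracy[0])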


47. SenticNet API

SenticNet API is a semantic and affective resource for opinion mining and sentiment analysis.

48. Lattice Miner

Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of concept lattices. Lattice Miner also allows the drawing of nested line diagrams.

49. Gnome datamine tools

Gnome datamine tools is a growing collection of data mining tools packaged to provide a single, freely available collection.

50. yooreeka

Yooreeka is a library for data mining, machine learning, soft computing, and mathematical analysis. The algorithms covered include clustering (hierarchical agglomerative, divisive, and partitional), classification (Bayesian, decision trees, neural networks, rule based), recommendation and collaborative filtering (content based), search (PageRank and DocRank), and personalization.

51. AstroML

AstroML is a Python module for machine learning and data mining built on NumPy, SciPy, scikit-learn, and matplotlib. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in Python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.

52. jHepWork

jHepWork is an environment for scientific computation, data analysis and data visualization. It is fully multiplatform, written in 100% Java, and integrated with the Jython (Python) scripting language.

53. ARMiner

ARMiner is a client-server data mining application specialized in finding association rules. ARMiner was developed at UMass/Boston as a software engineering project.

54. arules

arules is an R package that provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

You may also like to review the top free data analysis software list:
Top Free Data Analysis Software

You may also like to review the top proprietary data mining software list:
Top Data Mining Software

Top Free Data Mining Software at a Glance

Free data mining software, ranked by PAT Index™:

1. Orange Data mining
2. R Software Environment
3. Weka Data Mining
4. Anaconda
5. Shogun
6. DataMelt
7. Natural Language Toolkit
8. Apache Mahout
9. RapidMiner Starter Edition
10. GNU Octave
11. Scikit-learn
12. ELKI
13. Apache UIMA
14. LIBLINEAR
15. CMSR Data Miner
16. Rattle GUI
17. TANAGRA
18. Fityk
19. DataPreparator
20. ROSETTA
21. Dataiku DSS Community
22. Knowage
23. Alteryx Project Edition
24. Pandas
25. OpenNN
26. KEEL
27. Vowpal Wabbit
28. mlpy
29. CLUTO
30. Dlib
31. MiningMart
32. TraMineR
33. Databionic ESOM
34. Chemicalize.org
35. streamDM
36. Jubatus
37. ADaM
38. Sentic API
39. Modular toolkit for Data Processing
40. ADaMSoft
41. ML-Flex
42. MALLET
43. LIBSVM
44. GraphLab Create
45. Lavastorm Analytics Engine
46. KNIME Analytics Platform Community
6 Reviews
  • Mike
    March 17, 2014 at 9:23 am

    Hello bud, on your data mining softwares witch 1 would u recommend for email mining? Thank you

  • Phoenix
    April 1, 2014 at 11:50 pm

    Do any of these have non-English capabilities?

  • Venkatesh
    July 29, 2014 at 12:52 am

    Hi buddy! Are there any attempts to do cloud based data analytics softwares? I think such a thing can solve the problem Phoenix had mentioned.

  • K R Chin
    January 25, 2015 at 6:14 pm

    I’d like to know if there are any data mining programs which could be used to predict terrorist activities or analyze material movements (shipping, purchases, and orders) to search for indicators of suspicious activity.

    I’m a security consultant and advisor, this sort of information would be useful in my consultations.

  • Mahrez
    March 5, 2015 at 4:00 pm

    Hi KR Chin,

    To predict any activity you need to know which variables you want to base your prediction on. You also need a historical data to run your predictive analysis and find the possible correlations between different event. I know that somewhere in the US the police uses crime predictions based on historical criminality data (new Orleans if I am not mistaken)…bottom line : you need data to get the info ! have fun 🙂

  • February 17, 2017 at 11:50 am

    See AdvancedMiner by Algolytics. They provide free/community version http://algolytics.com/products/advancedminer/
