Predictive Analytics
Now Reading
43 Top Free Data Mining Software
6

43 Top Free Data Mining Software

43 Top Free Data Mining Software
4.5 (89.42%) 206 ratings

Data Mining is the computational process of discovering patterns in large data sets involving methods using the artificial intelligence, machine learning, statistical analysis, and database systems with the goal to extract information from a data set and transform it into an understandable structure for further use.

In today business market, the level of engagement between customers and companies, services or even product has changed. The companies have made their presence online prominent by becoming easily accessible through social platforms such as Facebook, Twitter, and WhatsApp. These platforms provide valuable data which is unstructured. That is a reason why most companies require Data Mining tools.

Data mining software allows different business to collect the information from a different platform and use the data for various purposes such as market evaluation and analysis. Data mining help the user to keep track of all the important data and make use of the data to improve the business. In addition, the software has become important in making informed decisions in a business setting.

Data mining software help explore the unknown patterns that are significant to the success of the business. The actual data mining task is an automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as cluster analysis, unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining).

Top Free Data Mining Software: Orange Data mining, Anaconda, R Software Environment, Scikit-learn, Weka Data Mining, Shogun, DataMelt, Natural Language Toolkit, Apache Mahout, GNU Octave, GraphLab Create, ELKI, Apache UIMA, KNIME Analytics Platform Community, TANAGRA, Rattle GUI, CMSR Data Miner, OpenNN, Dataiku DSS Community, DataPreparator, LIBLINEAR, Chemicalize.org, Vowpal Wabbit, mlpy, Dlib, CLUTO, TraMineR, ROSETTA, Pandas, Fityk, KEEL, ADaMSoft, Sentic API, ML-Flex, Databionic ESOM, MALLET, streamDM, ADaM, MiningMart, Modular toolkit for Data Processing, Jubatus, LIBSVM, Arcadia Data Instant are some of the top free data mining software.

What are Data Mining Software?

Data mining is the process of identifying patterns, analyzing data and transforming unstructured data into structured and valuable information that can be used to make informed business decisions. Data Mining Software allows the organization to analyze data from a wide range of database and detect patterns.

The Data Mining Tools main aim is to find data, extract data, refine data, distribute the information and monetize it. Data Mining is important because It extracts insights from data whether structured or unstructured. Structured data refers to data that has been organized into columns and rows for efficient modification.

Most of the organisations that handle a large amount of data use data mining approaches where machines learning algorithms are used. Data mining is a method used to extract hidden unstructured data from large volume databases. It identifies any hidden correlations, patterns and trends and indicates them.

Data mining cannot be purely be identified as statistical but as an interdisciplinary science that comprises computer science and mathematics algorithms depicted by a machine.

You may like to read: Top Data Mining Software

  • Easy to use interface: Data mining software has easy to use GUI that allow quick analysis of data.
  • Preprocessing: Data preprocessing is an important step in data mining as it is a process that involves the transformation of raw data into an understandable format. It involves data cleaning where missing values and inconsistency are resolved. Data integration and transformation are also stepping in Data Preprocessing.
  • Scalable processing: data mining software allow scalable processing. This is from a single user system to a large organization processing. In other words, the software us scalable on the number of users and the size of data to be processed.
  • High Performance: Data mining software boost performance capabilities through high-performance data mining nodes, especially in companies that deal with a large amount of data. The mining tools develop an environment that leads to a faster generation of business results.
  • Anomaly detection: The identification of unusual data records, that might be interesting or data errors that require further investigation.
  • Association rule learning: Searches for relationships between variables..
  • Clustering: The task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
  • Classification: The task of generalizing known structure to apply to new data.
  • Regression: Attempts to find a function which models the data with the least error that is, for estimating the relationships among data or datasets.
  • Data Summarization: Data mining tools should be able to compress data into an informative representation. Often, methods such as tabulation are the common techniques used to summarize large dataset. The software provides interactive data preparation tools.


Top Free Data Mining Software

Top Free Data Mining Software


You may like to read: Top Data Mining Software

Sisense

Sisense empower the most non-technical user with the ability to access data and build interactive dashboards and business intelligence reports. Sisense provides a variety of dashboard widgets to pinpoint the best visualization for your data, such as: geographical maps, gauges to measure KPIs, line charts to determine trends, scatter plots to see correlations, and pie charts for clear comparisons.Sisense enables to customize dashboard layout with drag-and-drop features to place each widget exactly where you want for optimal representation.

Try Sisense Watch Demo

Top Free Data Mining Software

Orange Data mining, Anaconda, R Software Environment, Scikit-learn, Weka Data Mining, Shogun, DataMelt, Natural Language Toolkit, Apache Mahout, GNU Octave, GraphLab Create, ELKI, Apache UIMA, KNIME Analytics Platform Community, TANAGRA, Rattle GUI, CMSR Data Miner, OpenNN, Dataiku DSS Community, DataPreparator, LIBLINEAR, Chemicalize.org, Vowpal Wabbit, mlpy, Dlib, CLUTO, TraMineR, ROSETTA, Pandas, Fityk, KEEL, ADaMSoft, Sentic API, ML-Flex, Databionic ESOM, MALLET, streamDM, ADaM, MiningMart, Modular toolkit for Data Processing, Jubatus, LIBSVM, Arcadia Data Instant are some of the top free data mining software.
Free Data Mining Software
PAT Index™
 
Orange-Survey plot
 
 
R
 
 
Weka Data Visualiser
 
 
 
 
 
 
GraphLab
 
ELKI
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Sisense for Cloud Data Teams

Sisense for Cloud Data Teams formerly Periscope Data is an end-to-end BI and analytics solution that lets you quickly connect your data, then analyze, visualize and share insights. Periscope Data can securely connect and join data from any source, creating a single source of truth for your organization. Perform BI reporting and advanced analytics operations all from one integrated platform. Communicate insights more effectively by selecting from Periscope Data’s wide range of visualization options (including standard charts, statistical plots, maps and more) and instantly share real-time insights via direct linking, email or Slack.With Periscope Data you can also incorporate Natural Language Processing into your data analysis.

Free Sisense for Cloud Data Teams

1

Orange Data mining

Orange

Orange is an open source data visualization and analysis tool. Orange is developed at the Bioinformatics Laboratory at the Faculty of Computer and Information Science, University of Ljubljana, Slovenia, along with open source community. Data mining is done through visual programming or Python scripting. The tool has components for machine learning, add-ons for bioinformatics and text mining and it is packed with features for data analytics. Orange is a Python library. Python scripts can run in a terminal window, integrated environments like PyCharm and PythonWin, or shells like iPython. Orange consists of a canvas interface onto which the user places…

Overview
Features

• Open Source
• Interactive Data Visualization
• Visual Programming
• Supports Hands-on Training and Visual Illustrations
• Add-ons Extend Functionality

Price

Free

Website
What is best?

• Open Source
• Interactive Data Visualization
• Visual Programming

What are the benefits?

•For everyone- beginners and professionals
•Execute simple and complex data analysis
•Create beautiful and interesting graphics

Bottom Line

Orange is an open source data visualization and analysis tool, where data mining is done through visual programming or Python scripting. The tool has components for machine learning, add-ons for bioinformatics and text mining and it is packed with features for data analytics.

9.5
Editor Rating
8.3
Aggregated User Rating
120 ratings
You have rated this

Orange Data mining

2

Anaconda

Anaconda

Anaconda is an open data science platform powered by Python. The open source version of Anaconda is a high performance distribution of Python and R and includes over 100 of the most popular Python, R and Scala packages for data science. There is also access to over 720 packages that can easily be installed with conda, the package, dependency and environment manager, that is included in Anaconda. Includes the most popular Python, R & Scala packages for stats, data mining, machine learning, deep learning, simulation & optimization, geospatial, text & NLP, graph & network, image analysis. Featured packages include: NumPy,…

Overview
Features

• Analytics Workflows
• Analytics Interaction
• High Performance Distribution
• Data Engineering
• Advanced Analytics
• High Performance Scale Up
• Reproducibility
• Analytics Deployment

Price

Contact for Pricing

Website
What is best?

• Analytics Workflows
• Analytics Interaction
• High Performance Distribution

What are the benefits?

• Accelerate streamline of data science workflow from ingest through deployment
• Connect all data sources to extract the most value from data
• Create, collaborate and share with the entire team

Bottom Line

Anaconda Distribution gives superpowers to people that change the world with high performance, cross-platform Python and R that includes the best innovative data science from open source.

7.7
Editor Rating
7.9
Aggregated User Rating
24 ratings
You have rated this

Anaconda

3

R Software Environment

R

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Some of the functionalities include an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either directly at the computer or on hardcopy, and well developed, simple and effective programming language which includes conditionals,…

Overview
Features

• Open Source - Free Software
• Provides a wide variety of Statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and Graphical Techniques
• Effective data handling and storage facility
• Suite of operators for calculations on arrays, in particular matrices
• Large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis and display either on-screen or on hardcopy
• Well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities

Price

Free

Website
What is best?

• Open Source - Free Software
• Provides a wide variety of Statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and Graphical Techniques
• Effective data handling and storage facility

What are the benefits?

• Brings analytics to your data
• Runs on a wide variety of platforms- UNIX, Windows, MacOS
• Widely used statistical software

Bottom Line

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

9.1
Editor Rating
7.4
Aggregated User Rating
21 ratings
You have rated this

R Software Environment

4

Scikit-learn

Scikit-learn

Scikit-learn is an open source machine learning library for the Python programming language.It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Classification : Identifying to which category an object belongs to Applications: Spam detection, Image recognition. Algorithms: SVM, nearest neighbors, random forest. Regression : Predicting a continuous-valued attribute associated with an object. Applications: Drug response, Stock prices. Algorithms: SVR, ridge regression. Clustering :Automatic grouping of similar objects into sets. Applications: Customer segmentation, Grouping experiment outcomes.…

Overview
Bottom Line

Scikit-learn features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

7.6
Editor Rating
7.8
Aggregated User Rating
5 ratings
You have rated this

Scikit-learn

5

Weka Data Mining

Weka

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka features include machine learning, data mining, preprocessing, classification, regression, clustering, association rules, attribute selection, experiments, workflow and visualization. Weka is written in Java, developed at the University of Waikato, New Zealand. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes Weka provides access to SQL databases…

Overview
Features

• Data Pre-Processing
• Data Classification
• Data Regression
• Data Clustering
• Data Association rules
• Data Visualization

Price

Free

Website
What is best?

• Data Pre-Processing
• Data Classification
• Data Regression

What are the benefits?

•Portable
•Free to use
•Easy to use

Bottom Line

Weka is a collection of machine learning algorithms for data mining tasks. Weka features include machine learning, data mining, preprocessing, classification, regression, clustering, association rules, attribute selection, experiments, workflow and visualization. Weka is written in Java, developed at the University of Waikato, New Zealand.

9.1
Editor Rating
6.9
Aggregated User Rating
32 ratings
You have rated this

Weka Data Mining

6

Shogun

Shogun

Shogun is a free, open source toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems. The focus of Shogun is on kernel machines such as support vector machines for regression and classification problems. Shogun also offers a full implementation of Hidden Markov models. The toolbox seamlessly allows to easily combine multiple data representations, algorithm classes, and general purpose tools. This enables both rapid prototyping of data pipelines and extensibility in terms of new algorithms. It now offers features that span the whole space of Machine Learning methods, including many classical methods in classification, regression,…

Overview
Features

• Free software, community-based development and machine learning education
• Supports many languages from C++, Python, Octave, R, Java, Lua, C#, Ruby, Etc.
• Runs natively under Linux/Unix, Macos, and Windows
• Provides efficient implementation of all standard ml algorithms
• Libsvm/Liblinear, Svmlight, Libocas, Libqp, Vowpalwabbit, Tapkee, Slep, Gpml and more

Price

Free

Website
What is best?

• Free software, community-based development and machine learning education
• Supports many languages from C++, Python, Octave, R, Java, Lua, C#, Ruby, Etc.
• Runs natively under Linux/Unix, Macos, and Windows

What are the benefits?

•Completely free to use
•Goes on many operating systems
•Works on different platforms

Bottom Line

Shogun also offers a full implementation of Hidden Markov models.The toolbox seamlessly allows to easily combine multiple data representations, algorithm classes, and general purpose tools. This enables both rapid prototyping of data pipelines and extensibility in terms of new algorithms.

7.6
Editor Rating
7.7
Aggregated User Rating
4 ratings
You have rated this

Shogun

7

DataMelt

DataMelt

DataMelt, or DMelt, is a software for numeric computation, statistics, analysis of large data volumes ("big data") and scientific visualization. The program can be used in many areas, such as natural sciences, engineering, modeling and analysis of financial markets. DMelt is a computational platform. It can be used with different programming languages on different operating systems. Unlike other statistical programs, it is not limited by a single programming language. DMelt can be used with several scripting languages, such as Python/Jython, BeanShell, Groovy, Ruby, as well as with Java. Most comprehensive software. It includes more than 30,000 Java classes for computation…

Overview
Features

•DMelt with all jar libraries and IDE. Mixed GPL and non-GPL licences (180 MB size)
•Online manual (basic introduction)
•Access to Java API of DMelt core library (600 classes)
•Community forum and bug tracker
•Updates of separate jar files via DMelt IDE NO YES YES
•Full version of DMelt manual
•Access to Java API (30,000 classes) with full search
•Access to Image gallery with code examples
•Web access to more than 500 DMelt examples with searchable database

Price

Many features are free. For all the features the user must pay for memebership.

Website
What is best?

•DMelt with all jar libraries and IDE. Mixed GPL and non-GPL licences (180 MB size)
•Online manual (basic introduction)
•Access to Java API of DMelt core library (600 classes)

What are the benefits?

•Access to Java API of DMelt core library
•Community forum and bug tracker
•Access to Image gallery with code examples

Bottom Line

DataMelt, or DMelt, is a software for numeric computation, statistics, analysis of large data volumes ("big data") and scientific visualization. The program can be used in many areas, such as natural sciences, engineering, modeling and analysis of financial markets.

7.5
Editor Rating
6.6
Aggregated User Rating
10 ratings
You have rated this

DataMelt

8

Natural Language Toolkit

Natural Language Toolkit

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics, plus comprehensive API documentation, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK…

Overview
Features

•Feature structure types
•Parsing feature structure strings
•Feature paths
•Reentrance
•Text classification

Price

Free

What is best?

•Feature structure types
•Parsing feature structure strings
•Feature paths

What are the benefits?

• Tokenization
• Stemming
• Tagging

Bottom Line

NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open source, community-driven project.

7.6
Editor Rating
7.4
Aggregated User Rating
7 ratings
You have rated this

Natural Language Toolkit

9

Apache Mahout

Apache Mahout

The Apache Mahout project’s goal is to build an environment for quickly creating scalable performant machine learning applications. Apache Mahout is a simple and extensible programming environment and framework for building scalable algorithms and contains a wide variety of premade algorithms for Scala and Apache Spark, H2O, Apache Flink. It also used Samsara which is a vector math experimentation environment with R-like syntax which works at scale. Apache™ Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop and using the MapReduce paradigm. While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are…

Overview
Features

•Collaborative filtering
•Clustering
•Classification
•Frequent itemset timing
•Distributed Algebraic optimizer
•R-Like DSL Scala API
•Linear algebra operations

Price

Free

What is best?

•Distributed Algebraic optimizer
•R-Like DSL Scala API
•Linear algebra operations

What are the benefits?

• Access to extensible programming framework
• Build scalable algorithms
• Access many premade algorithms

Bottom Line

Apache Mahout introduces a new math environment called Samsara, for its theme of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized.

7.5
Editor Rating
8.8
Aggregated User Rating
5 ratings
You have rated this

Apache Mahout

10

GNU Octave

GNU Octave

GNU Octave represents a high level language intended for numerical computations. Because of its command line interface, users can solve linear and nonlinear problems numerically and perform other numerical experiments through a language that is mostly compatible with Matlab. This software has features such as powerful mathematics-oriented syntax with built-in plotting and visualization tools, it is free software which runs on GNU/Linux, macOS, BSD, and Windows, compatible with many Matlab scripts. A syntax which is largely compatible with Matlab is the Octave syntax. It can be run in several ways - in GUI mode, as a console, or invoked as…

Overview
Features

•High level language intended for numerical computations
•Solving linear and nonlinear problems numerically
•Powerful mathematics-oriented syntax
•Runs on GNU/Linux, macOS, BSD, and Windows
•Freely redistributable

Price

Free

Website
What is best?

•High level language intended for numerical computations
•Solving linear and nonlinear problems numerically
•Powerful mathematics-oriented syntax

What are the benefits?

•Drop-in compatible with Matlab scripts
•Comprehensive help installation
•Built-in plotting and visualization tools

Bottom Line

Executable versions of GNU Octave for GNU/Linux systems are provided by the individual distributions. Distributions known to package Octave include Debian, Ubuntu, Fedora, Gentoo, and openSUSE.

7.5
Editor Rating
7.7
Aggregated User Rating
9 ratings
You have rated this

GNU Octave

11

GraphLab Create

GraphLab Create is a machine learning platform to build intelligent, predictive application involving cleaning the data, developing features, training a model, and creating and maintaining a predictive service. These intelligent applications provide predictions for use cases including recommenders, sentiment analysis, fraud detection, churn prediction and ad targeting. Trained models can be deployed on Amazon Elastic Compute Cloud (EC2) and monitored through Amazon CloudWatch. They can be queried in real-time via a RESTful API and the entire deployment pipeline is seen through a visual dashboard. The time from prototyping to production is dramatically reduced for GraphLab Create users. Dato is also…

Overview
Features

• Scalable Data Structures
• Deep Learning
• Image Analytics
• Model Optimization
• Feature Engineering
• Nearest Neighbors
• Regression
• Machine Learning Visualizations
• Anomaly Detection
• Text Analytics
• Clustering
• C++ SDK Plugin Architecture
• Classification
• Pattern Mining
• Graph Analytics

Price

Contact for Pricing

What is best?

• Scalable Data Structures
• Deep Learning
• Image Analytics

What are the benefits?

• Predict the likelihood that customers will churn, understand the influential factors, and take action to prevent it from happening
• Transform images for tagging, search, and feature extraction
• Compose and share data pipelines

Bottom Line

GraphLab Create is a machine learning platform to build intelligent, predictive application involving cleaning the data, developing features, training a model, and creating and maintaining a predictive service.

7.6
Editor Rating
8.6
Aggregated User Rating
5 ratings
You have rated this

GraphLab Create

12

ELKI

ELKI

The ELKI framework is written in Java and built around a modular architecture. Most currently included algorithms belong to clustering, outlier detection and database indexes. A key concept of ELKI is to allow the combination of arbitrary algorithms, data types, distance functions and indexes and evaluate these combinations. When developing new algorithms or index structures, the existing components can be reused and combined. ELKI is modeled around a database core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases). This database core provides nearest neighbor search, range/radius search, and distance…

Overview
Features

• Open source data mining software
• High performance and scalability
• Simple visualization window
• Data management tasks
• Standard Java API

Price

Free

Website
What is best?

• Open source data mining software
• High performance and scalability
• Simple visualization window

What are the benefits?

• JAVA data mining software
• Allows R code
• Data mining and data management are worked as separate tasks

Bottom Line

ELKI is modeled around a database core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases).

7.5
Editor Rating
8.0
Aggregated User Rating
4 ratings
You have rated this

ELKI

13

Apache UIMA

Apache UIMA

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at UIMA enables applications to be decomposed into components, for example "language identification" => "language specific segmentation" => "sentence boundary detection" => "entity detection (person/place names etc.)". Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow…

Overview
Features

•Infrastructe
•Components
•Frameworks
•Annotators
•Tooling

Price

Free

What is best?

•Infrastructe
•Components
•Frameworks

What are the benefits?

• Development source code issue management
• Tooling
• Servers

Bottom Line

UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

7.6
Editor Rating
6.5
Aggregated User Rating
4 ratings
You have rated this

Apache UIMA

14

KNIME Analytics Platform Community

KNIME Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. A vast arsenal of native nodes, community contributions, and tool integrations makes KNIME Analytics Platform the perfect toolbox for any data scientist. https://www.youtube.com/watch?v=fw0Vb2gLsgA

Overview
Features

• Powerful Analytics
• Data & Tool Blending
• Open Platform
• Over 1000 Modules and Growing
•Connectors for all major file formats and databases
•Support for a wealth of data types: XML, JSON, images, documents, and many more
•Native and in-database data blending & transformation
•Math & statistical functions
•Advanced predictive and machine learning algorithms
•Workflow control
•Tool blending for Python, R, SQL, Java, Weka, and many more
•Interactive data views & reporting

Price

Free

What is best?

•Native and in-database data blending & transformation
•Math & statistical functions
•Advanced predictive and machine learning algorithms

What are the benefits?

• Churn analysis
• Social media sentiment analysis
• Credit scoring

Bottom Line

A vast arsenal of native nodes, community contributions, and tool integrations makes KNIME Analytics Platform the perfect toolbox for any data scientist.

8.5
Editor Rating
8.7
Aggregated User Rating
5 ratings
You have rated this

KNIME Analytics Platform Community

15

TANAGRA

TANAGRA

Tanagra represents free data mining software for academic and research purposes. It provides several data mining methods from exploratory data analysis, statistical learning, machine learning and databases area. It is a successor of SIPINA which means that various supervised learning algorithms are provided, especially an interactive and visual construction of decision trees. Because it contains supervised learning but also other paradigms such as clustering, factorial analysis, parametric and nonparametric statistics, association rule, feature selection and construction algorithms, Tanagra is very powerful. The main goal of this project is giving researchers and student’s easy-to-use data mining software and second goal is…

Overview
Features

•Free data mining software for academic and research purposes
•Provides several data mining methods from exploratory data analysis, statistical learning, machine learning and databases area
•Acts more as an experimental platform
•Open source project

Price

Free

Website
What is best?

•Free data mining software for academic and research purposes
•Provides several data mining methods from exploratory data analysis, statistical learning, machine learning and databases area
•Acts more as an experimental platform

What are the benefits?

• Easy to use data mining software
• Interactive utilization
• A wide set of data sources

Bottom Line

TANAGRA is an "open source project" as every researcher can access to the source code, and add his own algorithms, as far as he agrees and conforms to the software distribution license.The main purpose of Tanagra project is to give researchers and students an easy-to-use data mining software, conforming to the present norms of the software development in this domain (especially in the design of its GUI and the way to use it), and allowing to analyse either real or synthetic data.

7.5
Editor Rating
7.8
Aggregated User Rating
7 ratings
You have rated this

TANAGRA

16

Rattle GUI

Rattle GUI

Rattle is Free Open Source Software and the source code is available from the Bitbucket repository. Rattle gives the user the freedom to review the code, use it for whatever purpose the user likes, and to extend it however they like, without restriction. Rattle is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets. One of the most important features is that all of the user’s interactions…

Overview
Features

•File Inputs
•Statistics
•Statistical tests
•Clustering
•Modeling
•Evaluation
•Charts
•Transformations

Price

Free

Website
What is best?

•File Inputs
•Statistics
•Statistical tests

What are the benefits?

• Learn and develop skills in R
• Provides ease of use
• Build your own models

Bottom Line

Rattle - the R Analytical Tool To Learn Easily - is a popular GUI for data mining using R. It presents statistical and visual summaries of data, transforms data that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.

7.6
Editor Rating
6.9
Aggregated User Rating
3 ratings
You have rated this

Rattle GUI

17

CMSR Data Miner

CMSR Data Miner

StarProbe Data Miner or CMSR Data Miner Suite is software which provides an integrated environment for predictive modeling, segmentation, data visualization, statistical data analysis, and rule-based model evaluation. For advanced power users integrated analytics and rule-engine environment is also provided. This software has many features such as: deep learning modeling RME-EP which represents very powerful expert system shell rule engine, supporting predictive modeling such as neural network, self organizing maps, decision tree, regression etc. It has been developed to use SQL-like expressions which users can learn very easily and quickly. Also, RME-EP expert system rules can be written by non-IT…

Overview
Features

• Deep Learning Modeling (RME-EP).
• Neural network (multi-hidden layer deep neural network support).
• Neural clustering and segmentation (Self Organizing Maps: SOM).
• (Cramer) Decision tree classification and Segmentation.
• Hotspot drill-down and profiling analysis.
• Regression.
• Business rules - Predictive expert systems shell rule engine.
• Rule-based predictive model evaluation.
• Powerful charts: 3D bars, bars, histograms, histobars, scatterplots, boxplots ...
• Segmentation and gains analysis.
• Response and profit analysis.
• Correlation analysis.
• Cross-sell Basket Analysis.
• Drill-down statistics.
• Cross tables with deviation/hotspot analysis.
• Groupby tables with deviation/hotspot analysis.
• SQL query/batch tools.
• Statistics: Mono, Bi, ANOVA, ...
• Database scoring: Apply predictive/segmentation models to database records).
• Connect to all major relational DBMS through JDBC.
• Data Miner optimized for MicroSoft MS SQL Server, MySQL, PostgreSQL, MS Office Access.
• Deploy/publish predictive models on Android phones and Android tablets with MyDataSay app. For downloads, click here.
• Database table import/export tools (Support character strings, integer and real numbers).
• Super fast and Big data (upto 2 billion records).
• Runs on multiple OS: Windows, Linux, Mac OS X, AIX, Solaris, HPUX.
"

Price

1 year free license for evaluation
Free academic use

What is best?

• Deep Learning Modeling (RME-EP).
• Neural network (multi-hidden layer deep neural network support).
• Neural clustering and segmentation (Self Organizing Maps: SOM).

What are the benefits?

• Perform statistical data analysis quickly
• Drill down into the most complex data
• Access visualization tools for data results

Bottom Line

StarProbe Data Miner or CMSR Data Miner Suite is software which provides an integrated environment for predictive modeling, segmentation, data visualization, statistical data analysis, and rule-based model evaluation.

7.6
Editor Rating
4.1
Aggregated User Rating
7 ratings
You have rated this

CMSR Data Miner

18

OpenNN

OpenNN

OpenNN is an open source class library written in C++ programming language which implements neural networks, a main area of machine learning research. The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties. The main advantage of OpenNN is its high performance. It is developed in C++ for better memory management and higher processing speed, and implements CPU parallelization by means of OpenMP and GPU acceleration with CUDA. OpenNN has been written in ANSI C++. This means that the library can be built…

Overview
Features

• Unified Modeling Language (UML)
• OpenNN is based on the multilayer perceptron
• The loss index

Price

Free

Website
What is best?

• Unified Modeling Language (UML)
• OpenNN is based on the multilayer perceptron
• The loss index

What are the benefits?

•Technology evaluation
•Proof of concept
•Design and implementation

Bottom Line

The library implements any number of layers of non-linear processing units for supervised learning. This deep architecture allows the design of neural networks with universal approximation properties.

7.6
Editor Rating
9.5
Aggregated User Rating
11 ratings
You have rated this

OpenNN

19

Dataiku DSS Community

Dataiku DSS is the collaborative data science software platform for teams of data scientists, data analysts, and engineers to explore, prototype, build, and deliver their own data products more efficiently. Dataiku develops the unique advanced analytics software solution that enables companies to build and deliver their own data products more efficiently. Dataiku DSS is a collaborative and team-based user interface for data scientists and beginner analysts, to a unified framework for both development and deployment of data projects, and to immediate access to all the features and tools required to design data products from scratch. The visual interface of Dataiku…

Overview
Features

•Data connectors
•Data transformation
•Transformation engines
•Data Visualization
•Data Mining
•Machine Learning

Price

Free

What is best?

•Data connectors
•Data transformation
•Transformation engines

What are the benefits?

•Connect to more than 25 data storage systems
•Extend with plugins
•Visualize and re-run Workflows

Bottom Line

The visual interface of Dataiku DSS empowers people with a less technical background to learn the data mining process, and build projects from raw data to predictive application, without having to write a single line of code.

7.5
Editor Rating
7.0
Aggregated User Rating
7 ratings
You have rated this

Dataiku DSS Community

20

DataPreparator

DataPreparator

DataPreparator is a free software tool which is designed to assist with common tasks of data preparation (or data preprocessing) in data analysis and data mining. DataPreparator offers features such as character removal, text replacement, date conversion, remove selected attributes, move selected attributes, equal width, equal frequency, equal frequency from grouped data, delete records containing missing values, remove attributes containing missing values, impute missing values, predict missing values from model (dependence tree, Naive Bayes model), include missing value patterns, Z-score metho. Box-plot method, create binary attributes, replace nominal values by indices, reduce number of labels, decimal, linear, hyperbolic tangent, soft-max,…

Overview
Features

• Data access from text files, relational databases, and Excel workbooks
• Handling of large volumes of data (since data sets are not stored in the computer memory, with the exception of Excel workbooks and result sets of some databases where database drivers do not support data streaming)
• Stand alone tool, independent of any other tools
• User friendly graphical user interface
• Operator chaining to create sequences of preprocessing transformations (operator tree)
• Creating of model tree for test/execution data

Price

• Free

What is best?

• Data access from text files, relational databases, and Excel workbooks
• Handling of large volumes of data (since data sets are not stored in the computer memory, with the exception of Excel workbooks and result sets of some databases where database drivers do not support data streaming)
• Stand alone tool, independent of any other tools

What are the benefits?

• Provides a variety of techniques for data cleaning, transformation, and exploration
• Chaining of preprocessing operators into a flow graph (operator tree)
• Handling of large volumes of data (since data sets are not stored in the computer memory)

Bottom Line

DataPreparator includes operators for cleaning, discretization, numeration, scaling, attribute selection, missing values, outliers, statistics, visualization, balancing, sampling, row selection, and several other tasks.

7.6
Editor Rating
9.6
Aggregated User Rating
3 ratings
You have rated this

DataPreparator

21

LIBLINEAR

LIBLINEAR

LIBLINEAR is an open source library that is used by data scientists, developers and end users to perform large scale linear classification. The easy to use command tools and library calls enables LIBLINEAR to be used by data scientists and developers to perform logistics, regression and linear support for vector machine. With LIBLINEAR developers and data scientists are able to same data format as the one in LIBSVM found in LINLINEAR general purpose SVM solver which also has similar usage. LINLINEAR presents several machine language interfaces that can be used by data scientists and developers. The machine language interfaces presented…

Overview
Features

• Multi-class classification: 1) one-vs-the rest. 2) Crammer& Singer
• Cross validation for model evaluation
• Automatic parameter selection
• Probability estimates (logistic regression only)
• Weights for unbalanced data
• MATLAB/Octave, Java, Python, Ruby interfaces

Price

Contact for Pricing

Website
What is best?

• Cross validation for model evaluation
• Automatic parameter selection
• Probability estimates (logistic regression only)

What are the benefits?

• Speedups the training on shared memory systems
• Supports L2-regularized classifiers
• Supports L2-loss linear SVR and L1-loss linear SVR

Bottom Line

LIBLINEAR is an open source library that comes with easy to use command tools and library calls that enable developers, end users, and data scientists perform large scale linear classification.

7.6
Editor Rating
9.1
Aggregated User Rating
1 rating
You have rated this

LIBLINEAR

22

Chemicalize.org

Chemicalize.org

Chemicalize provides instant cheminformatics solution. It is a powerful online platform for chemical calculations, search, and text processing. Calculation view provides structure-based predictions for any molecule structure. Available calculations include elemental analysis, names and identifiers, pKa, logP/logD, as well as solubility. Search view lets you perform text-based and structure-based searches against the Chemicalize database to find web page sources and associated structures of the results. You can even combine text-based and structural queries to achieve advanced search capabilities. Web viewer displays any web page with chemical structures highlighted on it. Recognized formats are IUPAC names, common names, InChI, and SMILES…

Overview
Features

•Calculations
•Chemical search
•Webpage annotation
•Compliance checker

Price

Free. Some features will cost. Visit webstie for further payment details.

What is best?

•Calculations
•Chemical search
•Webpage annotation

What are the benefits?

•Calculations
•Chemical Search
•Web Page Annotation

Bottom Line

Search view lets you perform text-based and structure-based searches against the Chemicalize database to find web page sources and associated structures of the results.

7.6
Editor Rating
8.2
Aggregated User Rating
2 ratings
You have rated this

Chemicalize.org

23

Vowpal Wabbit

Vowpal Wabbit

The Vowpal Wabbit (VW) project is a fast out-of-core learning system sponsored by Microsoft Research and (previously) Yahoo! Research. Support is available through the mailing list. There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation. There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function (several are available),…

Overview
Features

•Input format
•Speed
•Scalability
•Feature pairing

Price

Free

What is best?

•Input format
•Speed
•Scalability

What are the benefits?

•Input format
•Speed
•Scalability

Bottom Line

There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation

7.5
Editor Rating
8.1
Aggregated User Rating
3 ratings
You have rated this

Vowpal Wabbit

24

mlpy

mlpy

Mlpy know as Machine Learning Python represents a python method for machine learning built on top of NumPy/SciPy (Python-based ecosystem of open-source software for mathematics, science, and engineering) and the GNU Scientific Libraries (represents numerical library for C and C++ programmers where a wide range of mathematical routines such as random number generators, special functions and least-squares fitting are provided). Wide range of state-of-the-art machine learning methods are provided for supervised and unsupervised problems and mlpy is aimed at finding a reasonable compromise among modularity, maintainability, reproducibility, usability and efficiency. It provides high-level functions and classes allowing, with few lines…

Overview
Features

•Python method for machine learning
•Provides high-level functions and classes
•Works with Python 2 and 3
•Open Source
•Compatible with PyInstaller

Price

Free

Website
What is best?

•Python method for machine learning
•Provides high-level functions and classes
•Works with Python 2 and 3

What are the benefits?

• Provides a wide range of machine learning methods
• Perform many statistical analyses
• Easy to manipulate data

Bottom Line

mlpy is multiplatform, it works with Python 2 and 3 and it is Open Source, distributed under the GNU General Public License version 3.

7.6
Editor Rating
8.0
Aggregated User Rating
3 ratings
You have rated this

mlpy

25

Dlib

Dlib is a modern C++ toolkit which contains machine learning algorithms and tools in order of creating complex software in C++ for solving real world problems. It is used in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. It is free of any charges which mean that users can use it in any app. Major features of Dlib is: documentation – it provides complete and precise documentation for every class and function, lots of example programs are provided; high quality portable code – good unit test coverage, tested on MS Windows,…

Overview
Features

•Contains machine learning algorithms and tools in order of creating complex software in C++ for solving real world problems
•Provides complete and precise documentation for every class and function
•High quality portable code
•Graphical model inference algorithms

Price

Free

What is best?

•Contains machine learning algorithms and tools in order of creating complex software in C++ for solving real world problems
•Provides complete and precise documentation for every class and function
•High quality portable code

What are the benefits?

• Documentation for every class and function
• Debugging modes that check documented preconditions for functions
• Good unit test coverage

Bottom Line

It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments.

7.6
Editor Rating
8.1
Aggregated User Rating
2 ratings
You have rated this

Dlib

26

CLUTO

CLUTO

Cluto is software package intended for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. It is well-suited for clustering data sets, arisen in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology. CLUTO's distribution consists of both stand-alone programs and a library via which an application program can access directly the various clustering and analysis algorithms implemented in CLUTO. This software has several features such as multiple classes of clustering algorithms – partitional, agglomerative, & graph-partitioning based; multiple similarity/distance functions – Euclidean distance, cosine, correlation coefficient, extended Jaccard,…

Overview
Features

•Intended for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters
•Multiple classes of clustering algorithms
•Multiple methods for effectively summarizing the clusters
•Can scale to very large datasets

Price

Free

Website
What is best?

•Intended for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters
•Multiple classes of clustering algorithms
•Multiple methods for effectively summarizing the clusters

What are the benefits?

• Can scale to very large datasets containing hundreds of thousands of objects and tens of thousands of dimensions.
• Multiple methods for effectively summarizing the clusters.
• Extensive cluster visualization capabilities and output options.

Bottom Line

CLUTO is well-suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology.CLUTO's distribution consists of both stand-alone programs and a library via which an application program can access directly the various clustering and analysis algorithms implemented in CLUTO.

7.6
Editor Rating
8.4
Aggregated User Rating
2 ratings
You have rated this

CLUTO

27

TraMineR

TraMineR

TraMineR represents R-package (free software environment for statistical computing and graphics which compiles and runs on a wide variety of platforms such as UNIX platforms, Windows and MacOS) intended for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. Analysis of biographical longitudinal, data such as data describing careers or family trajectories, in the social sciences is its primary goal. This platform has many features that can apply in many other kinds of categorical sequence data. These features include: handling of longitudinal data and conversion between various sequence formats; plotting sequences (density plot, frequency…

Overview
Features

•Intended for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data
•Individual longitudinal characteristics of sequences
•Sequence transversal characteristics by age point
•Parallel coordinate plot of event sequences
•Identifying most discriminating event subsequences

Price

Free

Website
What is best?

•Intended for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data
•Individual longitudinal characteristics of sequences
•Sequence transversal characteristics by age point

What are the benefits?

• Visualize sequence data sets
• Explore the sequence data set by computing and visualizing descriptive statistics
• Build a typology of transitions from school to work

Bottom Line

Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories. However, most of its features also apply to many other kinds of categorical sequence data.

7.6
Editor Rating
7.1
Aggregated User Rating
3 ratings
You have rated this

TraMineR

28

ROSETTA

ROSETTA

ROSETTA is a toolkit for analyzing tabular data within the framework of rough set theory. It is designed for supporting the overall data mining and knowledge discovery process: From initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns. This toolkit is not specifically towards any particular application domain, it is intended as a general-purpose tool for discernibility-based modeling. Highly intuitive GUI environment is offered and in this environment data-navigational abilities are emphasized. The main orientation of GUI is…

Overview
Features

•Toolkit for analyzing tabular data within the framework of rough set theory
•Intended as a general-purpose tool for discernibility-based modeling
•Import/export – partial integration with DBMSs via ODBC
•Completion of decision tables with missing values

Price

Free

Website
What is best?

•Toolkit for analyzing tabular data within the framework of rough set theory
•Intended as a general-purpose tool for discernibility-based modeling
•Import/export – partial integration with DBMSs via ODBC

What are the benefits?

• Import/export
• Preprocessing
• Computation

Bottom Line

ROSETTA is designed to support the overall data mining and knowledge discovery process: From initial browsing and preprocessing of the data, via computation of minimal attribute sets and generation of if-then rules or descriptive patterns, to validation and analysis of the induced rules or patterns.

7.6
Editor Rating
8.0
Aggregated User Rating
2 ratings
You have rated this

ROSETTA

29

Pandas

pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Pandas is a NUMFocus sponsored project. This will help ensure the success of development of pandas as a world-class open-source project, and makes it possible to donate to the project. Best way to get pandas is to install via conda Builds for osx-64,linux-64,linux-32,win-64,win-32 for Python 2.7, Python 3.4, and Python 3.5 are all available. This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large…

Overview
Features

•New .agg() API for Series/DataFrame similar to the groupby-rolling-resample API’s,
•Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() methodThe .ix indexer has been deprecated,
•Panel has been deprecated
•Addition of an IntervalIndex and Interval scalar type,
•Improved user API when accessing levels in .groupby(),
•Improved support for UInt64 dtypes, A new orient for JSON serialization, orient='table' that uses the Table Schema spec,
•Experimental support for exporting DataFrame.style formats to Excel
•Window Binary Corr/Cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated,
•Support for S3 handling now uses s3fs,
•Google BigQuery support now uses the pandas-gbq library
•Switched the test framework to use pytest

Price

Free

Website
What is best?

•Improved user API when accessing levels in .groupby(),
•Improved support for UInt64 dtypes, A new orient for JSON serialization, orient='table' that uses the Table Schema spec,
•Experimental support for exporting DataFrame.style formats to Excel

What are the benefits?

• Perform fast, efficient data manipulation
• Access easy-to-use data structures
• Intelligent alignment of data

Bottom Line

Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form.

7.6
Editor Rating
8.2
Aggregated User Rating
4 ratings
You have rated this

Pandas

30

Fityk

Fityk

Fityk is a program for data processing and nonlinear curve fitting. It is primarily used by scientists who analyse data from powder diffraction, chromatography, photoluminescence and photoelectron spectroscopy, infrared and Raman spectroscopy, and other experimental techniques and also used to fit peaks – bell-shaped functions (Gaussian, Lorentzian, Voigt, Pearson VII, bifurcated Gaussian. EMG, Doniach-Sunjic, etc.), but it is suitable for fitting any curve to 2D (x,y) data. Fityk has the following features for users; intuitive graphical interface (and also command line interface), support for many data file formats, thanks to the xylib library, dozens of built-in functions and support for…

Overview
Features

• Intuitive graphical interface (and also command line interface),
• Support for many data file formats, thanks to the xylib library,
• Dozens of built-in functions and support for user-defined functions,
• Equality constraints,
• Ftting systematic errors of the x coordinate of points
• Manual, graphical placement of peaks and auto-placement using peak detection algorithm,
• Various optimization methods
• Handling series of datasets,
• Automation with macros (scripts) and embedded Lua for more complex scripting
• Open source licence (GPLv2+).

Price

•1 month subscription: $115 (≈ €90)
•1 year subscription: $199 (≈ €150)
•2 years subscription: $299 (≈ €225)

Website
What is best?

• Handling series of datasets,
• Automation with macros (scripts) and embedded Lua for more complex scripting
• Open source licence (GPLv2+).

What are the benefits?

• Operate on an intuitive graphical interface
• Support for many data file formats
• Access to dozens of built-in functions

Bottom Line

Fityk is used by scientists who analyse data from powder diffraction, chromatography, photoluminescence and photoelectron spectroscopy, infrared and Raman spectroscopy, and other experimental techniques.

7.6
Editor Rating
8.2
Aggregated User Rating
4 ratings
You have rated this

Fityk

31

KEEL

KEEL

KEEL (Knowledge Extraction based on Evolutionary Learning) is an open source (GPLv3) Java software tool that can be used for a large number of different knowledge data discovery tasks. KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms. It contains a wide variety of classical knowledge extraction algorithms, preprocessing techniques (training set selection, feature selection, discretization, imputation methods for missing values, among others), computational intelligence based learning algorithms, hybrid models, statistical methodologies for contrasting experiments…

Overview
Features

•Evolutionary Algorithms (EAs)
•Data pre-processing algorithms
•Statistical library
•User-friendly interface, oriented to the analysis of algorithms.
•Allows to create experiments in on-line mode, aiming an educational support in order to learn the operation of the algorithms included.
•Knowledge Extraction Algorithms Library. The main employment lines are:
•Different evolutionary rule learning models have been implemented
•Fuzzy rule learning models with a good trade-off between accuracy and interpretability.
•Evolution and pruning in neural networks, product unit neural networks, and radial base models.
•Genetic Programming: Evolutionary algorithms that use tree representations for extracting knowledge.
•Algorithms for extracting descriptive rules based on patterns subgroup discovery have been integrated.
•Data reduction (training set selection, feature selection and discretization). EAs for data reduction have been included.

Price

Free

Website
What is best?

•Allows to create experiments in on-line mode, aiming an educational support in order to learn the operation of the algorithms included.
•Knowledge Extraction Algorithms Library. The main employment lines are:
•Different evolutionary rule learning models have been implemented

What are the benefits?

• Data management
• Design of experiments
• Design of imbalanced experiments

Bottom Line

KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms.

7.6
Editor Rating
8.0
Aggregated User Rating
2 ratings
You have rated this

KEEL

32

ADaMSoft

ADaMSoft

ADaMSoft is a free and open-source system for data management, data and web mining, statistical analysis. ADaMSoft offers procedures such as Principal component analysis, Text mining, Web Mining, Analysis of three ways time arrays, Linear regression with fuzzy dependent variable, Utility, Synthesis table, Import a data table (file) in ADaMSoft (create a dictionary), Charts, Neural network (MLP), Association measures for qualitative variables. Linear algebra, Evaluate the results of function approximation, Data Management, Function fitting, Error localization and data imputation, Decision trees, Statistics on quantitative variables, Record linkage, Evaluate the result of classification models, Cluster analysis (k-means method), Correspondence analysis, Data…

Overview
Features

• Use the same package in different platforms; you just need to have installed the Java Runtime Environment
• Obtain a single product for Data Integration, Analytical ETL, Data Analysis, Reporting,...
• Consider a powerful syntax to recode, modify, transform your data, that is based on the Java language, enriched with many functions that access data sets
• Easily access to the most common data sources and associated to them proper meta data
• Use hundreds of statistical procedures to analyze your data, to visualize their internal relations, etc.

Price

• Free

Website
What is best?

• Use the same package in different platforms; you just need to have installed the Java Runtime Environment
• Obtain a single product for Data Integration, Analytical ETL, Data Analysis, Reporting,...
• Consider a powerful syntax to recode, modify, transform your data, that is based on the Java language, enriched with many functions that access data sets

Bottom Line

ADaMSoft stands for: Data Analysis and Statistical Modeling software (in italian: Analisi Dati e Modelli Statistici) which performs Principal component analysis, Text mining, Web Mining, Analysis of three ways time arrays, Linear regression with fuzzy dependent variable, Utility, Synthesis table, Import a data table (file) in ADaMSoft (create a dictionary), Charts and Neural network (MLP).

7.5
Editor Rating
Aggregated User Rating
2 ratings
You have rated this

ADaMSoft

33

Sentic API

Sentic API provides the semantics and sentics such as the denotative and connotative information associated with the concepts of SenticNet 4, a semantic network of commonsense knowledge that contains 50,000 nodes in words and multiword expressions and thousands of connections in relationships between nodes. Sentic API is available in 40 different languages and lets users selectively access the latest version of the knowledge base online. Since polarity detection is the most common sentiment analysis task, Sentic API provides two fine-grained commands for it.

Overview
Features

• Denotative and connotative information
• Return only semantics, sentics, moodtags, and polarity
• Available in 40 different languages
• Provides the semantics

Price

• Free

What is best?

• Denotative and connotative information
• Return only semantics, sentics, moodtags, and polarity,

What are the benefits?

• Provides the sentics
• Provides two fine-grained commands for polarity
• Also accessible online through a python package

Bottom Line

Sentic API provides the denotative and connotative information associated with the concepts of SenticNet 4 in 40 languages.

7.6
Editor Rating
6.2
Aggregated User Rating
2 ratings
You have rated this

Sentic API

34

ML-Flex

ML-Flex

ML-Flex uses machine-learning algorithms to derive models from independent variables, with the purpose of predicting the values of a dependent (class) variable. For example, machine-learning algorithms have long been applied to the Iris data set, introduced by Sir Ronald Fisher in 1936, which contains four independent variables (sepal length, sepal width, petal length, petal width) and one dependent variable (species of Iris flowers = setosa, versicolor, or virginica). Deriving prediction models from the four independent variables, machine-learning algorithms can often differentiate between the species with near-perfect accuracy. One important aspect to consider in performing a machine-learning experiment is the validation…

Overview
Features

•Configuring Algorithms
•Creating an Experiment File
•List of Experiment Settings
•Running an Experiment
•List of Command-line Arguments
•Executing Experiments Across Multiple Computers
•Modifying Java Source Code
•Creating a New Data Processor
•Third-party Machine Learning Software
Integrating with Third-party Machine Learning Software

Price

Free

Website
What is best?

•Configuring Algorithms
•Creating an Experiment File
•List of Experiment Settings

What are the benefits?

•Flexible processing of multiple data sets
•Delivering experiments across multiple systems
•Integrates with third-party machine learning software

Bottom Line

Machine-learning algorithms have been developed in a wide variety of programming languages and offer many incompatible ways of interfacing to them. ML-Flex makes it possible to interface with any algorithm that provides a command-line interface.

7.6
Editor Rating
8.2
Aggregated User Rating
2 ratings
You have rated this

ML-Flex

35

Databionic ESOM

Databionic ESOM

The Databionics ESOM Tools offer many data mining tasks using emergent self-organizing maps (ESOM). Visualization, clustering, and classification of high-dimensional data using databionic principles can be performed interactively or automatically. Its features include ESOM training, U-Matrix visualizations, explorative data analysis and clustering, ESOM classification, and creation of U-Maps. The Databionic ESOM Tools is a suite of programs to perform data mining tasks like clustering, visualization, and classification with Emergent Self-Organizing Maps (ESOM). Features include training of ESOM with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels. The Databionics ESOM Tools also contain…

Overview
Features

•Different initialization methods
•Training algorithms
•Distance functions
•Parameter cooling strategies ESOM grid topologies
•Neighborhood kernels.

Price

Free

What is best?

•Different initialization methods
•Training algorithms
•Distance functions

What are the benefits?

•Creation of classifier and automated application to new data
•Creation of non-redundant U-Maps
•Training with different initialization methods

Bottom Line

Training of ESOM with different initialization methods, training algorithms, distance functions, parameter cooling strategies, ESOM grid topologies, and neighborhood kernels. Visualization of high dimensional dataspace with U-Matrix, P-Matrix, Component Planes, SDH, and more.

7.6
Editor Rating
8.3
Aggregated User Rating
2 ratings
You have rated this

Databionic ESOM

36

MALLET

MALLET

MALLET known as Machine Learning for LanguagE Toolkit is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Sophisticated tools for document classification are provided - efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. It also provides tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields and all…

Overview
Features

•Java-based package for statistical natural language processing, document classification, clustering, topic modeling, •Information extraction, and other machine learning applications to text
•Provides tools for sequence tagging
•Routines for transforming text documents into numerical representations
•Add-on package called GRRM
•Open Source Software

Price

Free

Website
What is best?

•Java-based package for statistical natural language processing, document classification, clustering, topic modeling, •Information extraction, and other machine learning applications to text
•Provides tools for sequence tagging
•Routines for transforming text documents into numerical representations

What are the benefits?

• Perform document classification easily
• Transform text to numerical representations
• Optimize numerical representations

Bottom Line

MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

7.6
Editor Rating
7.4
Aggregated User Rating
2 ratings
You have rated this

MALLET

37

streamDM

streamDM

streamDM is an open source software for mining big data streams that uses Spark Streaming, developed at Huawei Noah's Ark Lab. This software is licensed under Apache Software License v2.0. Today, Big Data Stream learning is more challenging because data may not keep the same distribution over the lifetime of the stream. Learning algorithms needs to be very efficient because each example that comes in a stream can be processed once or these examples needs to be summarized with a small memory footprint. Spark Streaming, which makes building scalable fault – tolerant streaming applications easy, is an extension of the…

Overview
Features

•Open source software for mining big data streams
•Spark Streaming extension
•Implemented methods CluStream; Hoeffding Decision Trees; bagging; Stream KM ++; HyperplaneGenerator;

Price

Free

Website
What is best?

•Open source software for mining big data streams
•Spark Streaming extension
•Implemented methods CluStream; Hoeffding Decision Trees; bagging; Stream KM ++; HyperplaneGenerator;

What are the benefits?

• Open source software for mining big data streams
• Spark Streaming extension
• Implemented methods CluStream;Hoeffding Decision Trees;bagging;Stream KM ++; HyperplaneGenerator.

Bottom Line

Spark Streaming is an extension of the core Spark API that enables stream processing from a variety of sources. Spark is a extensible and programmable framework for massive distributed processing of datasets, called Resilient Distributed Datasets (RDD).

7.6
Editor Rating
8.3
Aggregated User Rating
1 rating
You have rated this

streamDM

38

ADaM

ADaM

The Algorithm Development and Mining System (ADaM) developed by the Information Technology and Systems Center at the University of Alabama in Huntsville is used to apply data mining technologies to remotely-sensed and other scientific data. The mining and image processing toolkits consist of interoperable components that can be linked together in a variety of ways for application to diverse problem domains. ADaM has over 100 components that can be configured to create customized mining processes. Preprocessing and analysis utilities aid users in applying data mining to their specific problems. New components can easily be added to adapt the system to…

Overview
Features

• Component Architecture
• Distributed Services
• Custom Applications
• Grid-enabled Services

Price

Freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use ADaM only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval; on site download

Website
What is best?

• Component Architecture
• Distributed Services
• Custom Applications

What are the benefits?

•Data Mining and Image Processing Toolkits
•Component Architecture
•Distributed Services

Bottom Line

ADaM's component architecture is designed to take advantage of emerging computational environments such as the Web and information Grids.

7.6
Editor Rating
7.8
Aggregated User Rating
2 ratings
You have rated this

ADaM

39

MiningMart

MiningMart

MiningMart can help to reduce this time. The MiningMart project aims at new techniques that give decision-makers direct access to information stored in databases, data warehouses, and knowledge bases. The main goal is to support users in making intelligent choices by offering following objectives: Operators for preprocessing with direct database access; Use of machine learning for the preprocessing; Detailed documentation of successful cases; High quality discovery results; Scalability to very large databases and Techniques that automatically select or change representations. MiningMart’s basic idea is to store best practice cases of preprocessing chains that where developed by experienced users. The data…

Overview
Features

• Operators for preprocessing with direct database access
• Use of machine learning for the preprocessing
• Detailed documentation of successful cases
• High quality discovery results
• Scalability to very large databases
• Techniques that automatically select or change representations.

Price

Free

Website
What is best?

• Operators for preprocessing with direct database access
• Use of machine learning for the preprocessing
• Detailed documentation of successful cases

What are the benefits?

• Speed up data pre-processing
• Access to detailed documentation of successful cases
• Access to high quality discovery results

Bottom Line

MiningMart users choose a case and apply the corresponding transformation and learning chain to their application.

7.6
Editor Rating
6.7
Aggregated User Rating
2 ratings
You have rated this

MiningMart

40

Modular toolkit for Data Processing

Modular toolkit for Data Processing

The Modular toolkit for Data Processing (MDP) is a library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software. From the user’s perspective, MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures. Given a set of input data, MDP takes care of successively training or executing all nodes in the network. This allows the user to specify complex algorithms as a series of simpler…

Overview
Features

• Modular toolkit for Data Processing (MDP)
• Implementation of new supervised and unsupervised learning algorithms easy and straightforward
• Valid educational tool

Price

Free

What is best?

• Modular toolkit for Data Processing (MDP)
• Implementation of new supervised and unsupervised learning algorithms easy and straightforward
• Valid educational tool

What are the benefits?

• Access simpler data processing steps
• Build more complex data processing software
• Perform parallel implementation of basic nodes and flows

Bottom Line

MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures.

7.6
Editor Rating
8.3
Aggregated User Rating
2 ratings
You have rated this

Modular toolkit for Data Processing

41

Jubatus

Jubatus

Jubatus supports basic tasks including classification, regression, clustering, nearest neighbor, outlier detection, and recommendation. Jubatus is the first open source platform for online distributed machine learning on the data streams of Big Data. Jubatus uses a loose model sharing architecture for efficient training and sharing of machine learning models, by defining three fundamental operations. Update, Mix, and Analyze, in a similar way with the Map and Reduce operations in Hadoop. In addition, Jubatus supports scalable machine learning processing. It can handle 100000 or more data per second using commodity hardware clusters. It is designed for clusters of commodity, shared-nothing hardware.…

Overview
Features

•Scalable
•Real-Time
•Difference from Hadoop and Mahout
•Deep-Analysis

Price

Free

Website
What is best?

•Scalable
•Real-Time
•Difference from Hadoop and Mahout

What are the benefits?

•Scalable
•Real-Time
•Deep-Analysis

Bottom Line

Jubatus uses a loose model sharing architecture for efficient training and sharing of machine learning models, by defining three fundamental operations; Update, Mix, and Analyze, in a similar way with the Map and Reduce operations in Hadoop.

7.6
Editor Rating
8.2
Aggregated User Rating
2 ratings
You have rated this

Jubatus

42

LIBSVM

LIBSVM

LIBSVM is a library for Support Vector Machines (SVMs). LIBSVM offers tools such as Multi-core LIBLINEAR, Distributed LIBLINEAR, LIBLINEAR for Incremental and Decremental Learning, LIBLINEAR for One-versus-one Multi-class Classification, Large-scale rankSVM, LIBLINEAR for more than 2^32 instances/features (experimental), Large linear classification when data cannot fit in memory, Weights for data instances. Fast training/testing for polynomial mappings of data, Cross Validation with Different Criteria (AUC, F-score), Cross Validation using Higher-level Information to Split Data, LIBSVM for dense data, LIBSVM for string data, Multi-label classification, LIBSVM Extensions at Caltech, Feature selection tool, LIBSVM data sets, SVM-toy based on Javascript, SVM-toy in 3D,…

Overview
Features

• Different SVM formulations
• Efficient multi-class classification
• Cross validation for model selection
• Probability estimates
• Various kernels (including precomputed kernel matrix)
• Weighted SVM for unbalanced data
• Both C++ and Java sources
• GUI demonstrating SVM classification and regression
• Python, R, MATLAB, Perl, Ruby, Weka, Common LISP, CLISP, Haskell, OCaml, LabVIEW, and PHP interfaces. C# .NET code and CUDA extension is available.
• It's also included in some data mining environments: RapidMiner, PCP, and LIONsolver.
• Automatic model selection which can generate contour of cross validation accuracy.

Price

• Free

Website
What is best?

• Different SVM formulations
• Efficient multi-class classification
• Cross validation for model selection

What are the benefits?

• Solving SVM optimization problems,
• Solving theoretical convergence,
• Solving multi-class classification,

Bottom Line

LIBSVM involves training a data set to obtain a model, using the model to predict information of a testing data set and can also output probability estimates for SVC and SVR.

7.6
Editor Rating
9.4
Aggregated User Rating
1 rating
You have rated this

LIBSVM

43

Arcadia Data Instant

Arcadia Data Instan uses smart acceleration to enable ultra-fast analytics and BI with agile drag-and-drop access. Arcadia Data Instant provides an in-cluster execution engine for scale-out performance on Apache Hadoop and other modern data platforms with no data movement. Arcadia Data Instant supports visualizations on Apache Kafka. Through this, users have an excellent platform to download a kit quickly and get started with exploring visualizations of Kafka topics. The key features offered by Arcadia Data Instant include connect, discover, model, visualise, interact, manage, scale, optimize, security, share and publish, and advanced analytics. The connect feature allows accessing data inside Hadoop…

Overview
Features

• The discover feature provides browse data sources, structure and content, with full granularity and transparency
• Set hierarchies and logical datasets, for blending visualizations across sources
• The visualize feature provides easy to use familiar web-based self-service drag and drop authoring
• Flow and funnel algorithms that make it easy to measure correlation
• Create semantic relationships across multiple sources
• Assemble dashboards and applications of visuals that show the user’s work

Price

Contact for pricing

What is best?

• The discover feature provides browse data sources, structure and content, with full granularity and transparency
• Set hierarchies and logical datasets, for blending visualizations across sources
• The visualize feature provides easy to use familiar web-based self-service drag and drop authoring

What are the benefits?

• Provides an in-cluster execution engine for scale-out performance on Apache Hadoop
• Achieve linear scalability of records with native in-cluster execution
• Simplifies deployment and monitoring with certified integration

Bottom Line

Arcadia Data Instant is an email marketing platform that provides an in-cluster execution engine for scale-out performance on Apache Hadoop and other modern data platforms with no data movement.

7.6
Editor Rating
7.9
Aggregated User Rating
3 ratings
You have rated this

Arcadia Data Instant

You may like to read: Top Data Mining Software

What are Data Mining Software?

Data mining is the process of identifying patterns, analyzing data and transforming unstructured data into structured and valuable information that can be used to make informed business decisions. Data Mining Software allows the organization to analyze data from a wide range of database and detect patterns.

What are the top Free Data Mining Software?

Orange Data mining, Anaconda, R Software Environment, Scikit-learn, Weka Data Mining, Shogun, DataMelt, Natural Language Toolkit, Apache Mahout, GNU Octave, GraphLab Create, ELKI, Apache UIMA, KNIME Analytics Platform Community, TANAGRA, Rattle GUI, CMSR Data Miner, OpenNN, Dataiku DSS Community, DataPreparator, LIBLINEAR, Chemicalize.org, Vowpal Wabbit, mlpy, Dlib, CLUTO, TraMineR, ROSETTA, Pandas, Fityk, KEEL, ADaMSoft, Sentic API, ML-Flex, Databionic ESOM, MALLET, streamDM, ADaM, MiningMart, Modular toolkit for Data Processing, Jubatus, LIBSVM, Arcadia Data Instant are some of the top free data mining software.

6 Reviews
  • Mike
    March 17, 2014 at 9:23 am

    ADDITIONAL INFORMATION
    Hello bud, on your data mining softwares witch 1 would u recommend for email mining? Thank you

  • Phoenix
    April 1, 2014 at 11:50 pm

    ADDITIONAL INFORMATION
    Do any of these have non-English capabilities?

  • Venkatesh
    July 29, 2014 at 12:52 am

    ADDITIONAL INFORMATION
    Hi buddy! Are there any attempts to do cloud based data analytics softwares? I think such a thing can solve the problem Phoenix had mentioned.

  • K R Chin
    January 25, 2015 at 6:14 pm

    ADDITIONAL INFORMATION
    I’d like to know if there are any data mining programs which could be used to predict terrorist activities or analyze material movements (shipping, purchases, and orders) to search for indicators of suspicious activity.

    I’m a security consultant and advisor, this sort of information would be useful in my consultations.

  • Mahrez
    March 5, 2015 at 4:00 pm

    ADDITIONAL INFORMATION
    Hi KR Chin,

    To predict any activity you need to know which variables you want to base your prediction on. You also need a historical data to run your predictive analysis and find the possible correlations between different event. I know that somewhere in the US the police uses crime predictions based on historical criminality data (new Orleans if I am not mistaken)…bottom line : you need data to get the info ! have fun 🙂

  • February 17, 2017 at 11:50 am

    ADDITIONAL INFORMATION
    See AdvancedMiner by Algolytics. They provide free/community version http://algolytics.com/products/advancedminer/

What's your reaction?
Love It
26%
Very Good
44%
INTERESTED
13%
COOL
3%
NOT BAD
5%
WHAT !
6%
HATE IT
3%
About The Author
imanuel