Scikit-learn is an open source machine learning library for the Python programming language.It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Classification : Identifying to which category an object belongs to Applications: Spam detection, Image recognition. Algorithms: SVM, nearest neighbors, random forest. Regression : Predicting a continuous-valued attribute associated with an object. Applications: Drug [...]
Shogun is a free, open source toolbox written in C++. It offers numerous algorithms and data structures for machine learning problems. The focus of Shogun is on kernel machines such as support vector machines for regression and classification problems. Shogun also offers a full implementation of Hidden Markov models.The toolbox seamlessly allows to easily combine multiple data representations, algorithm classes, and general purpose tools. This enables both rapid prototyping of data pipelines and extensibility in terms of new algorithms.
It now offers features that span the whole space of Machine Learning methods, [...]
The ELKI framework is written in Java and built around a modular architecture. Most currently included algorithms belong to clustering, outlier detection and database indexes. A key concept of ELKI is to allow the combination of arbitrary algorithms, data types, distance functions and indexes and evaluate these combinations. When developing new algorithms or index structures, the existing components can be reused and combined.
ELKI is modeled around a database core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases). This database core provides nearest [...]
Dataiku DSS is the collaborative data science platform that enables teams to explore, prototype, build, and deliver their own data products more efficiently. Dataiku DSS provides an interactive visual interface where they can point, click, and build or use languages like SQL to data wrangle, model, easily re-run workflows, visualize results, and get up-to-date insights on demand.
Dataiku DSS provides tools to draft data preparation and modelisation in seconds, that wish to leverage their favorite ML libraries (scikitlearn, R, MLlib, H2O, and so on), and that rely on automating their work in a completely customizable [...]
SpagoBI Business Intelligence : SpagoBI is an Open Source Business Intelligence suite, which offers a large range of analytical functions, a functional semantic layer and a set of advanced data visualization features including geospatial analytics. The modules of SpagoBI suite are SpagoBI Server, SpagoBI Studio, SpagoBI Meta and SpagoBI SDK.SpagoBI Server is the main module of the suite, offering the core and analytical functionalities. It provides two conceptual models which are called Analytical Model and Behavioural Model, administration tools and cross-platform services.SpagoBI Business Intelligence
SpagoBI Studio [...]
RapidMiner : RapidMiner provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics and is used for business and industrial applications as well as for research, education, training, rapid prototyping, and application development. RapidMiner supports all steps of the data mining process including results visualization, validation and optimization.RapidMiner uses a client/server model with the server offered as Software as a Service or on cloud infrastructures. RapidMiner provides data mining and machine learning procedures including: data loading and transformation, data preprocessing and [...]
R Software Environment : R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Some of the functionalities include an effective data handling and storage facility, a suite of operators for calculations on arrays, in particular matrices, a large, coherent, integrated collection of intermediate tools for data analysis, graphical facilities for data analysis and display either directly at the computer or on hardcopy, and well developed, simple and effective [...]
Weka Data Mining : Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka features include machine learning, data mining, preprocessing, classification, regression, clustering, association rules, attribute selection, experiments, workflow and visualization. Weka is written in Java, developed at the University of Waikato, New Zealand. All of Weka’s techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes Weka provides [...]