Predictive Analytics
Now Reading
What is Data Mining ?
0

What is Data Mining ?

What is Data Mining ?
4.4 (88.75%) 48 ratings

What is Data Mining ? Data Mining is the computational process of discovering patterns, trends and behaviors, in large data sets using artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Data mining is considered as a synonym for another popularly used term, known as KDD, knowledge discovery in databases. Data mining is an essential step in the process of predictive analytics. The process of mining and extraction of useful information from the existing data is an interdisciplinary work involving mathematicians, statisticians, computer programmers.

Data Mining

Data Mining

Data Mining Definition

The proper use of the term data mining is data discovery. But the term is used commonly for collection, extraction, warehousing, analysis, statistics, artificial intelligence, machine learning, and business intelligence. Statistics provide sufficient tools for data analysis and machine learning deals with the different learning methodologies. Before a data can be mined it needs to be cleaned. This removes the errors and ensures consistency. Data mining methods are generalization, characterization, classification, clustering, association, evolution, pattern matching, data visualization, and meta rule guided mining.

The knowledge discovery in databases is defined in various different themes.

Data Mining Definition- Simplified

(1) pre processing, (2) data mining, and (3) results validation.

Data Mining Process

Data Mining Definition- Simplified

Knowledge Discovery in Databases (KDD) process definition:

(1) Selection
(2) Pre processing
(3) Transformation
(4) Data Mining
(5) Interpretation/Evaluation.

Cross Industry Standard Process for Data Mining (CRISP-DM) definition:

(1) Business Understanding
(2) Data Understanding
(3) Data Preparation
(4) Modeling
(5) Evaluation
(6) Deployment

Data Mining Overall Plan

SAS Institute overall plan for data mining is known as SEMMA. This plan has 5 steps which are as follows: sample, explore, modify, model, and assess.

Step 1: Sample
Extract a portion of a large data set big enough to contain the significant information yet small enough to manipulate quickly.

Step 2: Explore
Search speculatively for unanticipated trends and anomalies so as to gain understanding and ideas.

Data Mining Knowledge Discovery

Data Mining Overall Plan

Step 3: Modify
Create, select, and transform the variables to focus the model construction process.

Step 4: Model
Search automatically for a variable combination that reliably predicts a desired outcome.

Step 5: Assess
Evaluate the usefulness and reliability of findings from the data mining process.

Tasks in Data Mining

The tasks in data mining are either automatic or semi automatic analysis of large volume of data which are extracted to check for previously unknown interesting patterns. These are cluster analysis, anomaly detection on unusual records and dependencies check using the association rule mining. This usually involves using database techniques such as spatial indices. These patterns thus identified provides a summary of the input data, and can used in further analysis or in machine language and predictive analytics. The use of data mining methods for samples of data are known as data dredging, data fishing, and data snooping .Mining techniques are employed in different kinds of databases, including relational, transaction, object-oriented, spatial, and active databases.

Data mining involves six common classes of tasks:

1.Anomaly detection- This is the Outlier or deviation detection, where the identification of unusual data records or data errors that require further investigation are identified.

2.Association rule learning- This is also called dependency modeling which searches for relationships between variables. Also known as market basket analysis.

3.Clustering – Clustering is the task of discovering groups and structures in the data that are in some way or other similar without using known structures in the data.

4.Classification- Classification is the task of generalizing known structure to apply to new data.

5.Regression- Regression is the task with an objective to find a function which models the data with the least error.

6.Summarization- Summarization tasks provides a more compact representation of the data set, including visualization and report generation.

Data Mining Software

Orange Data mining, R Software Environment, RapidMiner, Weka Data Mining, KNIME, SpagoBI Business Intelligence, Anaconda, Shogun, ELKI, Scikit-learn, CMSR Data Miner, Fityk, mlpy, Dlib, Rattle GUI, GNU Octave, Pandas, Natural Language Toolkit, OpenNN, TANAGRA, Alteryx Project Edition, Apache UIMA, Vowpal Wabbit, MALLET, CLUTO, streamDM, DataMelt, Chemicalize.org, Jubatus, MiningMart, Databionic ESOM, Apache Mahout, TraMineR, ROSETTA, KEEL, ADaM, ML-Flex, Modular toolkit for Data Processing, Dataiku, SenticNet API, LIBSVM and LIBLINEAR, Lattice Miner, Gnome datamine tools, yooreeka, AstroML, jHepWork, ARMiner, arules are some of the top free data mining software.

Top Free Data Mining Software

Orange

Orange

RapidMiner, SAS Enterprise Miner, KNIME, Dataiku, IBM SPSS Modeler, Neural Designer, Oracle Data Mining ODM, Angoss Predictive Analytics, TIBCO Spotfire, AdvancedMiner, STATISTICA, Alteryx Analytics, Analytic Solver, OpenText Big Data Analytics, Portrait Predictive Analytics, GMDH Shell, Viscovery Software Suite, Microsoft SQL Server Integration Services, PolyAnalyst, GhostMiner, Civis Platform, TIMi Suite, pSeven, Lavastorm Analytics Engine, LIONoso, Rapid Insight Veera, HP Vertica Advanced Analytics, Rialto, Teradata Warehouse Miner, FICO Data Management Solutions, Salford Systems SPM, QIWare, Valo, Think Enterprise Data Miner  are some of the top data mining software.

Top Data Mining Software
Rapidminer

Rapidminer

 

Orange Data mining, R Software Environment, Weka Data Mining, Tableau Public, Arcadia Data, Microsoft R, ITALASSI, Shogun, Trifacta, ELKI, Scikit-learn, Data Applied, Lavastorm Analytics Engine, Gephi, DataMelt, TANAGRA, Julia, RapidMiner Starter Edition, SciPy, KNIME Analytics Platform Community, Dataiku DSS Community, Google Fusion Tables, Massive Online Analysis, NodeXL, DataPreparator, NetworkX, NumPy, OpenRefine, DataWrangler, PAW, ILNumerics, ROOT, DataCracker, Scilab, FreeMat, Ipython, jMatLab, SymPy, Fluentd, EasyReg, Matplotlib

More Information on Predictive Analysis Process

Predictive Analytics Process Flow

Predictive Analytics Process Flow

For more information of predictive analytics process, please review the overview of each components in the predictive analytics process: data collection (data mining), data analysis, statistical analysis, predictive modeling and predictive model deployment.

What's your reaction?
Love It
36%
Very Good
9%
INTERESTED
36%
COOL
5%
NOT BAD
14%
WHAT !
0%
HATE IT
0%
About The Author
imanuel