Predictive Analytics


What is Statistical Analysis?

Statistical analysis is the study of the collection, organization, analysis, interpretation and presentation of data. It begins with identifying the process or population under consideration. The population can be a collection of observations of a process made at various times, known as a time series, in which the data from each observation serves as a member of the overall group.

Statistical analysis comprises descriptive statistics and inferential statistics. Descriptive statistics summarize the data under consideration by describing, graphically or numerically, what was observed in the sample. For continuous data, the usual numerical descriptors are the mean and the standard deviation; for categorical data, frequency and percentage are more useful.
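The two kinds of numerical descriptors can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the function names and sample data are hypothetical, and it assumes the sample standard deviation (dividing by n - 1, as `statistics.stdev` does) rather than the population version (`statistics.pstdev`).

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Summarize continuous data with its mean and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample SD, divides by n - 1
    }


def describe_categorical(labels):
    """Summarize categorical data as {category: (frequency, percentage)}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    order_values = [12.0, 15.5, 9.0, 20.0, 13.5]
    sales_channels = ["web", "store", "web", "phone", "web"]
    print(describe_continuous(order_values))
    print(describe_categorical(sales_channels))
```

Use `statistics.pstdev` instead if the data cover the whole population rather than a sample.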

Inferential statistics uses patterns in the sample data to draw inferences about the population represented, while taking randomness into account. In its simplest form, inference is hypothesis testing, which answers yes/no questions about the data. Inference can also extend to forecasting and prediction, to extrapolation and interpolation of time series or spatial data, and to data mining.
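One way to see how a hypothesis test turns sample data into a yes/no answer while accounting for randomness is a permutation test. The sketch below swaps in that technique (it is not one named in this article) for a difference between two group means; the function names, the 0.05 significance level and the fixed random seed are illustrative assumptions.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, n_permutations=10_000, alpha=0.05, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (p_value, reject_null). reject_null is the yes/no answer to
    "do the two groups differ in mean at significance level alpha?".
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(_mean(sample_a) - _mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return p_value, p_value < alpha
```

Raising `n_permutations` makes the estimated p-value more precise at the cost of run time.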


Statistical Tests and Procedures

Some of the statistical tests and procedures used in predictive analytics are:

• Analysis of variance (ANOVA): ANOVA models are used to analyze the differences between group means and the variation among and between the groups.

• Chi-squared test: A hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.

• Correlation: Any statistical relationship (dependence) between two random variables or two sets of data.

• Factor analysis: Describes the variability among observed, correlated variables in terms of unobserved variables called factors.

• Mann–Whitney U: A nonparametric test of whether one of two populations tends to have larger values than the other.

• Mean square weighted deviation (MSWD): A measure of goodness of fit.

• Pearson product-moment correlation coefficient: A measure of the degree of linear dependence between two variables.

• Regression analysis: A set of methods for estimating the relationships among variables.

• Spearman's rank correlation coefficient: A nonparametric measure of statistical dependence between two variables, based on the ranks of their values; it assesses how well their relationship can be described by a monotonic function.

• Student's t-test: Used to determine whether the means of two sets of data are significantly different from each other.

• Time series analysis: Methods for analyzing a time series, which is a sequence of data points measured at successive points in time.

• The k-nearest neighbors algorithm (k-NN) is a non-parametric method for classification and regression that predicts an object's value or class membership from the k closest training examples in the feature space.

• The majority classifier takes non-anomalous data and incorporates it within its calculations. This ensures that the results produced by the predictive modeling system are as valid as possible.

• The group method of data handling (GMDH) is a family of algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models. GMDH is used in fields such as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition.

• Logistic regression is a technique in which unknown values of a discrete variable are predicted from known values of one or more continuous and/or discrete variables.

• Uplift modeling is a technique for modeling the change in probability caused by an action.

• The naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions.

• Support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns; they are used for classification and regression analysis.
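For the ANOVA entry in the list above, the one-way F statistic compares the variation between group means with the variation within groups. This is a minimal standard-library sketch with a hypothetical function name; it stops at the F statistic and its degrees of freedom, which would then be compared against an F distribution (for example with a statistics package) to obtain a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: a list of lists, one list of observations per group.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        (x - mean) ** 2 for group, mean in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within
```

For the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], the between-group mean square is 27 and the within-group mean square is 1, so F = 27 with (2, 6) degrees of freedom.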
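The chi-squared test in the list above is often applied to a contingency table to test whether two categorical variables are independent. A minimal sketch, assuming a table given as a list of rows of observed counts (the function name and example table are hypothetical): the statistic is compared against a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom, and for one degree of freedom the 5% critical value is about 3.841.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    table: list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

For the table [[10, 20], [20, 10]] every expected count is 15, giving a statistic of about 6.67 with 1 degree of freedom; that exceeds 3.841, so independence would be rejected at the 5% level.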
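The Pearson and Spearman coefficients in the list above differ mainly in what they correlate: Pearson uses the raw values and measures linear dependence, while Spearman applies the same formula to ranks (tied values get their average rank) and so captures any monotonic relationship. A minimal sketch with hypothetical function names; Python 3.10 and later also provide `statistics.correlation` for the Pearson case.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's formula applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For x = [1, 2, 3, 4, 5] and y = x squared, Spearman's coefficient is exactly 1 (the relationship is perfectly monotonic), while Pearson's is about 0.98 (it is not perfectly linear).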
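As a small instance of the regression analysis entry in the list above, ordinary least squares with a single predictor estimates the line y = intercept + slope * x. This is a minimal sketch with hypothetical names; multiple predictors, diagnostics and standard errors are out of scope.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope = covariance-style cross-product sum over the spread of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predict y for a new x value using a fitted line."""
    return intercept + slope * x_new
```

For x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the fit recovers slope 2 and intercept 1, so the prediction at x = 10 is 21.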
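Student's two-sample t-test from the list above can be sketched in its classic form, which assumes the two groups have equal variances and pools them. The function name is hypothetical, and the sketch returns only the t statistic and its degrees of freedom; turning these into a p-value requires the t distribution, for example from a statistics package.

```python
import math
import statistics


def students_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled (equal) variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, dof
```

For the samples [1, 2, 3] and [4, 5, 6] this gives t ≈ -3.67 with 4 degrees of freedom; since |t| exceeds the two-sided 5% critical value of about 2.776 for 4 degrees of freedom, the means would be judged significantly different.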
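One basic tool of the time series analysis mentioned in the list above is the sample autocorrelation, which measures how strongly a series is related to a lagged copy of itself. This minimal sketch uses the common estimator that divides by the total sum of squared deviations; the function name is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag.

    Uses the common estimator: lagged cross-products of deviations from the
    mean, divided by the total sum of squared deviations.
    """
    n = len(series)
    mean = sum(series) / n
    total_ss = sum((x - mean) ** 2 for x in series)
    lagged = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return lagged / total_ss
```

An alternating series such as [1, -1, 1, -1] has a lag-1 autocorrelation of -0.75 and a lag-2 autocorrelation of 0.5, reflecting its period of two.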
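The k-NN entry in the list above can be sketched as a brute-force classifier: compute the Euclidean distance from the query to every training example, keep the k closest, and return the most common label among them. Names and data are hypothetical; the regression variant would average the neighbors' values instead of voting, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    training: list of (feature_tuple, label) pairs.
    """
    # Brute force: sort every training example by Euclidean distance to the query.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the tied label that appears first among the (distance-sorted) neighbors.
    return votes.most_common(1)[0][0]
```

Sorting the whole training set is fine for small examples; larger datasets usually rely on spatial indexes such as k-d trees.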
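For the naive Bayes entry in the list above, a categorical version with Laplace (add-one) smoothing shows where the independence assumption enters: each feature's conditional probability is multiplied in separately (summed as logarithms to avoid underflow). The function names, the tuple-based model and the smoothing parameter `alpha` are assumptions of this sketch; numeric features would typically use a Gaussian or other likelihood instead.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies.

    rows: list of tuples of categorical feature values; labels: class per row.
    """
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest posterior score for `row`."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum_i log P(feature_i = value | class): the "naive"
        # step treats features as independent given the class.
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = feature_counts[label][i][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Smoothing keeps a feature value never seen with a class from forcing that class's probability to zero.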

Top Free Statistical Software

• SAS University Edition
• GNU PSPP
• Statistical Lab
• Develve
• Shogun
• DataMelt
• GNU Octave
• SOFA Statistics
• Dataplot
• SciPy
• Zelig
• Scilab
• Gretl
• OpenStat
• Past
• MacAnova
• MaxStat Lite version
• SageMath
• Epi Info
• NIMBLE
• Arc
• ADaMSoft
• CumFreq
• OpenMx
• Salstat
• Statcato
• Stan
• IDAMS
• OpenEpi
• BV4.1
• pbdR
• GNU Data Language
• Dap
• Simfit
• First Bayes
• MicrOsiris
• Ploticus
• NCAR Command Language
• Perl Data Language
• Yorick
• EasyReg
• IVEware
• ViSta
• StatCVS
• WinBUGS
• JAGS
• WINPEPI
• ADMB

Top Statistical Software

• IBM SPSS Modeler
• Minitab
• Develve
• XLSTAT
• Forecast Pro
• Analyse-it
• SmartPLS
• Regression Analysis of Time Series
• SAS Visual Statistics
• Stata
• AcaStat
• MATLAB
• EViews
• JMP
• Mathematica
• Qlucore
• MedCalc
• NCSS
• EasyFit
• MaxStat
• Data Desk
• StatPlus
• GAUSS
• Statgraphics Centurion
• TurboStats
• NLOGIT
• Analytica
• SigmaPlot
• PolyAnalyst
• GeneXproTools
• WinSPC
• GraphPad InStat
• UNISTAT
• StatsDirect
• Statwing
• StatXact
• statistiXL
• Statistix
• Number Analytics
• LIMDEP
• SUDAAN
• PASS
• NLREG
• ESBStats
• Genedata Analyst
• Origin
• Maple
• SuperCROSS

More Information on the Predictive Analytics Process

Predictive Analytics Process Flow

For more information on the predictive analytics process, review the overview of each component of the process: data collection (data mining), data analysis, statistical analysis, predictive modeling and predictive model deployment.


What are Statistical Tests and Procedures?

Some of the statistical tests and procedures used in predictive analytics are analysis of variance (ANOVA), the chi-squared test, correlation, factor analysis, the Mann–Whitney U test, mean square weighted deviation (MSWD), the Pearson product-moment correlation coefficient, regression analysis, Spearman's rank correlation coefficient, Student's t-test and time series analysis, among many others.
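The Mann–Whitney U statistic named in this list can be computed directly from its definition: compare every value in one sample with every value in the other, counting ties as one half. This brute-force sketch (hypothetical function name) takes O(n·m) time and returns only the two U statistics; for a p-value, small samples use exact tables and large samples a normal approximation, typically via a statistics package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (a, b) with a > b, plus one half for each tie.
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b
```

For [1, 2, 3] against [4, 5, 6] this gives U_a = 0 and U_b = 9: every value in the second sample is larger, the most extreme outcome possible for samples of this size.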