Tools for Discovering Patterns in Data

Elder Research is presenting a 2-day course, “Tools for Discovering Patterns in Data: Extracting Value from Tables, Text, and Links,” on September 22-23 in Charlottesville, Virginia. Drawing on more than 20 years of experience, Dr. John Elder will explain techniques employed by experts to solve challenging problems.

This course describes powerful analytic methods for classification and estimation drawn from Statistics and Data Mining. Dr. Elder will explain leading algorithms, compare their merits, and demonstrate their effectiveness on practical applications. He will also review classical statistical techniques and outline how they are modified and combined into modern methods.

The course emphasizes practical advice and the essential techniques of Resampling, Visualization, and Ensembles. Actual scientific and business examples illustrate techniques employed by expert analysts. Key aspects of mining text and links will also be outlined, and major strengths of leading commercial software tools for Data Mining will be compared.

Course Outline

I. Pattern Discovery: An Overview

Inducing Models from Data: Benefits and Dangers
Example Projects from Science and Business
Characteristics of successful projects
Leading Software Tools and Vendors

II. Classical Statistical Techniques (brief review)

Regression
Discriminant Analysis & Principal Components
Nearest Neighbors & Kernels

III. Modern Methods

Neural & Polynomial Networks
Decision Trees & MARS (Regression Splines)

IV. Key General Tools

Scientific Visualization: Grand Tour, Projection Pursuit, limitations
Bootstrapping/Resampling: Essential!
Bayes' Rule
Optimization: local and global
Overfit Control: Complexity Penalty, Smoothing, Shrinking, Generalized Degrees of Freedom

V. Data Trouble-Shooting

Case Diagnostics (Outlying, Influential, Leverage, & Missing points)
Feature Creation and Selection

VI. Text Mining

Stemming, Collocation, & Association Networks
Statistical vs. Language-dependent methods
“Bag of Words” & Vector Space
Focused Crawling & Active Learning

VII. Social Network Analysis

The power of the "network effect"
Visualization & modeling tools and examples

VIII. Comparing and Combining Algorithms

Adaptive model structure
Matching an algorithm to your application
Experimental test results
Combining models to improve accuracy
Bayesian Model Averaging
Bagging & Boosting
Why Ensembles work

IX. Top 10 Data Mining Mistakes

Lack data
Focus on Training
Rely on 1 technique
Ask the wrong question
Listen (only) to the data
Future leakage
Discount pesky cases
Extrapolate
Answer every inquiry
Sample without care
Believe the best model

Intended Audience

Those who work with data and wish to understand and use recent developments in predictive analytics. At the conclusion of this course, you should be able to discern the basic strengths of competing methods and select the appropriate tools for your applications.

Space is limited, so reserve your place now! To register, call 434-973-7673 or download a registration form. You can also register online.

The course is presented by John F. Elder IV, Ph.D., head of a top data mining consulting team based in Charlottesville, Virginia, and Washington, DC. Founded in 1995, Elder Research, Inc. focuses on commercial, investment, and security applications of advanced analytics, including stock selection, text mining, social networks, image recognition, biometrics, process optimization, drug efficacy, credit scoring, and fraud detection.