DATA SCIENCE COURSE CONTENT:
1.Descriptive Statistics and Probability Distributions:
Introduction to Statistics
Different Types of Variables
Measures of Central Tendency with examples
Measures of Dispersion
Probability & Distributions
Probability Basics
Binomial Distribution and its properties
Poisson distribution and its properties
Normal distribution and its properties
2.Inferential Statistics and Testing of Hypothesis
Sampling methods
Different methods of estimation
Testing of Hypothesis & Tests
Analysis of Variance
3.Covariance & Correlation
->> Predictive Modeling Steps and Methodology with Live example:
Data Preparation
Exploratory Data Analysis
Model Development
Model Validation
Model Implementation
4.Supervised Techniques:
->> Multiple linear Regression
Linear Regression - Introduction - Applications
Assumptions of Linear Regression
Building Linear Regression Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis etc.)
Validation of Linear Regression Models (Re-running vs. Scoring)
Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time case study from the Manufacturing and Telecom industries: estimating future revenue using the models
->> Logistic Regression - Introduction - Applications
Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
Building Logistic Regression Model
Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification etc.)
Validation of Logistic Regression Models (Re-running vs. Scoring)
Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, drivers etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time case study: predicting customer churn in the Banking and Retail industries
->> Partial Least Squares Regression
Partial Least Squares Regression - Introduction - Applications
Difference between Linear Regression and Partial Least Squares Regression
Building PLS Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time example: identifying the key factors driving revenue
5.Variable Reduction Techniques
->> Factor Analysis
->> Principal Component Analysis
Assumptions of PCA
Working Mechanism of PCA
Types of Rotations
Standardization
Positives and Negatives of PCA
6.Supervised Techniques - Classification:
->> CHAID
->> CART
->> Difference between CHAID and CART
->> Random Forest
Decision tree vs. Random Forest
Data Preparation
Missing data imputation
Outlier detection
Handling imbalanced data
Random Record selection
Random Forest R parameters
Random Variable selection
Optimal number of variables selection
Calculating Out Of Bag (OOB) error rate
Calculating Out of Bag Predictions
->> A couple of real-time use cases from the Telecom and Retail industries: identifying churn.
7.Unsupervised Techniques:
->> Segmentation for Marketing Analysis
Need for segmentation
Criterion of segmentation
Types of distances
Clustering algorithms
Hierarchical clustering
K-means clustering
Deciding number of clusters
Case study
->> Business Rules Criteria
->> Real time use case to identify the Most Valuable revenue generating Customers.
8.Time series Analysis:
->> Time Series Components (Trend, Seasonality, Cyclicity and Level) and Decomposition
->> Basic Techniques
Averages
Smoothing etc.
->> Advanced Techniques
AR Models
ARIMA
UCM
->> Hybrid Model
->> Understanding Forecasting Accuracy - MAPE, MAD, MSE etc
->> A couple of use cases: forecasting the future sales of products
9.Text Analytics
->> Gathering text data from web and other sources
Processing raw web data
Collecting Twitter data with the Twitter API
->> Naive Bayes Algorithm
Assumptions of Naïve Bayes
Processing of Text data
Handling Standard and Text data
Building Naïve Bayes Model
Understanding standard model metrics
Validation of the Models (Re-running vs. Scoring)
->> Sentiment analysis
Goal Setting
Text Preprocessing
Parsing the content
Text refinement
Analysis and Scoring
->> Healthcare industry use case: identifying patient sentiment about a specified hospital using data extracted from Twitter.
10.Visualization Using Tableau:
->> Live connectivity from R to Tableau
Generating the Reports and Charts
Data Science Online Training Course Content
Module:1 – Descriptive & Inferential Statistics
1. Turning Data into Information
• Data Visualization
• Measures of Central Tendency
• Measures of Variability
• Measures of Shape
• Covariance, Correlation
• Using Software-Real Time Problems
2.Probability Distributions
• Probability Distributions: Discrete Random Variables
• Mean, Expected Value
• Binomial Random Variable
• Poisson Random Variable
• Continuous Random Variable
• Normal distribution
• Using Software-Real Time Problems
3.Sampling Distributions
• Central Limit Theorem
• Sampling Distributions for Sample Proportion, p-hat
• Sampling Distribution of the Sample Mean, x-bar
• Using Software-Real Time Problems
4.Confidence Intervals
• Statistical Inference
• Constructing confidence intervals to estimate a population Mean, Variance, Proportion
• Using Software-Real Time Problems
5.Hypothesis Testing
• Hypothesis Testing
• Type I and Type II Errors
• Decision Making in Hypothesis Testing
• Hypothesis Testing for a Mean, Variance, Proportion
• Power in Hypothesis Testing
• Using Software-Real Time Problems
6.Comparing Two Groups
• Comparing Two Groups
• Comparing Two Independent Means, Proportions
• Pairwise Testing for Means
• Two Variances Test (F-Test)
• Using Software-Real Time Problems
7. Analysis of Variance (ANOVA)
• One-Way and Two-way ANOVA
• ANOVA Assumptions
• Multiple Comparisons (Tukey, Dunnett)
• Using Software-Real Time Problems
8.Association Between Categorical Variables
• Two Categorical Variables Relation
• Statistical Significance of Observed Relationship / Chi-Square Test
• Calculating the Chi-Square Test Statistic
• Contingency Table
• Using Software-Real Time Problems
Module:2 – Applied Regression Methods
1.Simple Linear Regression(SLR)
Prerequisite Mathematics
The Simple Linear Regression Model
What is The Common Error Variance?
The Coefficient of Determination
Hypothesis Test for the Population Correlation Coefficient
Using Software-Real Time Problems
2.SLR Model Evaluation
Inference for the Population Intercept and Slope
The Analysis of Variance (ANOVA) table and the F-test
Equivalent linear relationship tests
Decomposing the Error
The Lack of Fit F-test
Using Software-Real Time Problems
3.SLR Estimation & Prediction
Confidence Interval for the Mean Response
Prediction Interval for a New Response
Using Software-Real Time Problems
4.SLR Model Assumptions
Model Assumptions Diagnostics
Using Software-Real Time Problems
5.Multiple Linear Regression(MLR)
The Multiple Linear Regression Model
Using Software-Real Time Problems
6.MLR Model Evaluation
The General Linear Test
Sequential (or Extra) Sums of Squares
The Hypothesis Tests for the Slopes
Partial R-squared
Lack of Fit Testing in the Multiple Regression Setting
Using Software-Real Time Problems
7.MLR Estimation, Prediction & Model Assumptions
Confidence Interval for the Mean Response
Prediction Interval for a New Response
Model Assumptions Diagnostics
Using Software-Real Time Problems
8.Categorical Predictors
Coding Qualitative Variables
Additive Effects
Interaction Effects
Using Software-Real Time Problems
9.Data Transformations
Using Software-Real Time Problems
10. Model Building
Forward Selection/Backward Elimination
Stepwise Regression
Adjusted R-Sq, Mallows Cp, PRESS, AIC, BIC, SBC, AICC
Outliers and Influential Data Points
Cook's Distance/DFBETAS/DFFITS
Using Software-Real Time Problems
Module:3 – Applied Time Series Analysis
1. Time Series Basics
• Overview
• ACF and AR(1) Model
2. MA Models, PACF
• Moving Average Models (MA models)
• PACF
• Using Software-Real Time Problems
3. ARIMA models
• Non-seasonal ARIMA
• Diagnostics
• Forecasting
• Using Software-Real Time Problems
4. Seasonal Models
• Seasonal ARIMA
• Identifying Seasonal Models
• Using Software-Real Time Problems
5. Smoothing and Decomposition Methods
• Decomposition Models
• Smoothing Time Series
• Using Software-Real Time Problems
6. Periodogram
• Periodogram
• Using Software-Real Time Problems
7. Regression with ARIMA errors; CCF; 2 Time Series
• Linear Regression Models with Autoregressive Errors
• CCF and Lagged Regressions
• Using Software-Real Time Problems
Module:4 – Machine Learning
1.Introduction
• Application Examples
• Supervised Learning
• Unsupervised Learning
2.Regression Shrinkage Methods
• Ridge Regression
• Lasso
• Using Software-Real Time Problems
3.Classification
• Logistic Regression
• Discriminant Analysis
• Nearest-Neighbor Methods
• Using Software-Real Time Problems
4. Tree-based Methods
• The Basics of Decision Trees
• Regression Trees
• Classification Trees
• Ensemble Methods
• Bagging, Boosting, Bootstrap, Random Forests
• Using Software-Real Time Problems
5. Neural Networks
• Introduction
• Single Layer Perceptron
• Multi-layer Perceptron
• Feed-Forward and Backpropagation
• Using Software-Real Time Problems
6.Support Vector Machine
• Support Vector Classifier
• Support Vector Machine
• SVMs with More than Two Classes
• Using Software-Real Time Problems
7.Dimension Reduction Methods
• Principal Components Regression (PCR)
• Partial Least Squares (PLS)
• Using Software-Real Time Problems
8.Association rules
• Market Basket Analysis
• Using Software-Real Time Problems
Module:5 – SAS/R Programming
1.Base SAS
• Working with SAS program syntax
• Examining SAS data sets
• Accessing SAS libraries
• Producing Detail Reports
• Sorting and grouping report data
• Enhancing reports
• Formatting Data Values
• Creating user-defined formats
• Reading SAS Data Sets
• Customizing a SAS data set
• Handling missing data
• Manipulating Data
• Combining SAS Data Sets
• Creating Summary Reports
• Controlling Input and Output
• Summarizing Data
• Reading Raw Data Files
• Data Transformations
• Debugging Techniques
• Using the PUTLOG statement
• Processing Data Iteratively
• Restructuring a Data Set
• Creating and Maintaining Permanent Formats
2.SAS SQL
• Basic Queries
• Sub-Queries
• Joins (SQL)
• Operators
• Creating Tables and Views
• Managing Tables
3. SAS Macros
• Macro Variables
• Definitions
• Data Step and SQL Interfaces
4. R Programming
• RCMDR Package
• Rattle Package