Business Analytics: Advanced
SIMPLE & MULTIPLE LINEAR REGRESSION
Descriptive Statistics: presenting, organizing and summarizing data.
Inferential Statistics: drawing conclusions about a population based on data observed in a sample.
Standardization of scores
Normal Distribution
Hypothesis Testing
Basic R codes
Outline of the Supervised Learning Program
Day | Topic (Professionals)
1 | Introduction to Analytics and its Applications
2 | Basics of Data/Statistics/R (Analytical tool) - I
3 | Basics of Data/Statistics/R/Alteryx Demo (Analytical tool) - II
4 | Linear Regression
5 | Logistic Regression
6 | Clustering
7 | Decision Tree
8 | Time Series Modelling
9 | Practical Session on Use Cases
10 | Market Basket Analysis
11 | Text Mining
12 | Data Visualization
Plan:
Program duration: 3 months
Every Saturday & Sunday from 10 am to 1 pm IST
8 weeks of support after the completion of the program (12 hrs, based on pre-booked appointments)
Changes in dates will be notified in advance as needed
Correlation: correlation signifies the strength and direction of a linear relationship between two variables (it captures only linear relationships).
Population correlation is denoted by ρ (rho).
Sample correlation is denoted by r.
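As a quick illustration, correlation can be computed in R (a minimal sketch with made-up advertising and sales vectors, not the course data):

advertising <- c(10, 12, 15, 18, 20, 22, 25)        # hypothetical monthly ad spend
sales       <- c(100, 110, 125, 138, 145, 155, 170) # hypothetical monthly sales
cor(advertising, sales)       # sample correlation r (Pearson by default)
cor.test(advertising, sales)  # tests H0: rho = 0 and gives a confidence interval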
The simple linear regression model is $Y = b_0 + b_1 X + e$, where:
• Y = dependent variable
• X = independent variable
• b0 = intercept or constant
• b1 = slope or coefficient of X
• e = error term
Terms used in regression analysis
Explained variance = R2 (coefficient of determination).
Unexplained variance = residuals (error).
Adjusted R-Square = R2 reduced to take into account the sample size and the number of independent variables in the regression model (it becomes smaller as we have fewer observations per independent variable).
Standard Error of the Estimate (SEE) = a measure of the
accuracy of the regression predictions. It estimates the
variation of the dependent variable values around the
regression line.
Total Sum of Squares (SST) = total amount of variation that exists to be explained by the independent variables. SST = the sum of SSE and SSR.
Sum of Squared Errors (SSE) = the variance in the dependent
variable not accounted for by the regression model = residual.
The objective is to obtain the smallest possible sum of squared
errors as a measure of prediction accuracy.
Sum of Squares Regression (SSR) = the amount of
improvement in explanation of the dependent variable
attributable to the independent variables.
Total variation is made up of two parts: $SST = SSE + SSR$, where

$SST = \sum (y - \bar{y})^2$   $SSE = \sum (y - \hat{y})^2$   $SSR = \sum (\hat{y} - \bar{y})^2$

Where:
$\bar{y}$ = average value of the dependent variable
$y$ = observed values of the dependent variable
$\hat{y}$ = estimated value of y for the given x value
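This decomposition can be verified in R (a sketch using the built-in mtcars data rather than the course data):

fit  <- lm(mpg ~ wt, data = mtcars)   # simple linear regression on built-in data
y    <- mtcars$mpg                    # observed values
yhat <- fitted(fit)                   # estimated values
SST  <- sum((y - mean(y))^2)          # total variation
SSE  <- sum((y - yhat)^2)             # unexplained (residual) variation
SSR  <- sum((yhat - mean(y))^2)       # explained variation
all.equal(SST, SSE + SSR)             # TRUE: the parts add up
sigma(fit)                            # residual standard error (the SEE defined above)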
Coefficient of Determination
The coefficient of determination is the ratio of the variation in the dependent variable explained by the regression model to the total variation. For simple linear regression, $R^2 = r^2$, where:
R2 = coefficient of determination
r = simple correlation coefficient
Coefficient of determination (R2) and Adjusted R2
The coefficient of determination (R2) can also be used to test the significance of the coefficients collectively, apart from using the F-test.

$R^2 = \frac{SST - SSE}{SST} = \frac{SSR}{SST} = \frac{\text{Sum of Squares explained by regression}}{\text{Total Sum of Squares}}$
The drawback of using the coefficient of determination is that its value always increases as the number of independent variables increases, even if the marginal contribution of the incoming variable is statistically insignificant.
To take care of this drawback, the coefficient of determination is adjusted for the number of independent variables taken. This adjusted measure is called adjusted R2.
Adjusted R2 is given by the following formula:

$R_a^2 = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2)$

where
n = number of observations
k = number of independent variables
$R_a^2$ = adjusted R2
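The formula is easy to check in R against the value lm() reports (a sketch, again using the built-in mtcars data):

fit <- lm(mpg ~ wt + hp, data = mtcars)  # k = 2 independent variables
n   <- nrow(mtcars)                      # number of observations
k   <- 2                                 # number of independent variables
R2  <- summary(fit)$r.squared
1 - (n - 1) / (n - k - 1) * (1 - R2)     # manual adjusted R-squared
summary(fit)$adj.r.squared               # matches R's reported value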
Multiple Linear Regression
“Multiple regression” is a technique that allows additional
factors to enter the analysis separately so that the effect of
each can be estimated.
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$

Where:
Y = the variable that we are trying to predict (DV)
$X_1, \dots, X_k$ = the variables that we are using to predict Y (IVs)
$b_0$ = the intercept
$b_1, \dots, b_k$ = the slopes (coefficients of $X_1, \dots, X_k$)
e = the regression residual (error term)
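In R, such a model is fitted with a single lm() call (a sketch on the built-in mtcars data; the variable names are not from the course case study):

multi.model <- lm(mpg ~ wt + hp + disp, data = mtcars) # three independent variables
summary(multi.model) # b0 and b1..b3 estimates, each with a t-test for significance
coef(multi.model)    # just the estimated coefficients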
Assumptions in Multiple Regression Analysis
Correlation among the independent variables is called multicollinearity, which creates problems for the t-tests of statistical significance of the coefficients.
High correlation among the independent variables suggests the presence of multicollinearity, but low correlation values do not rule out its presence.
where
n = number of observations
$R^2_{resid}$ = coefficient of determination when the residuals are regressed on the independent variables
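A standard multicollinearity diagnostic, which the use-case code below also relies on, is the variance inflation factor (VIF); a minimal sketch using the car package on built-in data:

library(car)                                   # provides vif()
fit <- lm(mpg ~ wt + hp + disp, data = mtcars) # model with several predictors
vif(fit) # VIF per predictor; values above 5 (or 10) commonly flag multicollinearity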
CASE STUDY
This is monthly data of sales and advertising cost for a particular product.
We need to predict, given new data for advertising cost, how much the sales will be.
R code - Simple Linear Regression
setwd("E:\\Unnati\\Webinar\\Linear Regression") # Setting the working directory
# simple.model is assumed to have been fitted earlier in the session, e.g.:
# sales.data   <- read.csv("sales.csv")                     # hypothetical file name
# simple.model <- lm(Sales ~ Advertising, data = sales.data)
# Checking assumptions: residuals should be approximately normally distributed
qqnorm(simple.model$residuals)       # Q-Q plot: points near a straight line indicate normality
hist(simple.model$residuals)         # histogram: should look roughly bell-shaped
shapiro.test(simple.model$residuals) # Shapiro-Wilk test: p > 0.05 fails to reject normality
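For completeness, here is a self-contained sketch of the whole workflow on simulated data (the actual sales file from the case study is not shown in the slides):

set.seed(42)
adv   <- runif(24, 10, 50)                # 24 months of simulated advertising cost
sales <- 50 + 3 * adv + rnorm(24, sd = 5) # simulated linear relationship plus noise
simple.model <- lm(sales ~ adv)           # fit the simple linear regression
summary(simple.model)                     # coefficients, R-squared, residual standard error
predict(simple.model, newdata = data.frame(adv = 30)) # predicted sales for a new ad cost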
Use Case – Multi-variate Regression
The Walmart data set contains 200 entries of customer experience, collected by a survey.
Using the independent variables, we need to build a regression model on some training data and validate it on testing data. This model will be used for predicting customer satisfaction (a sketch of the split-and-validate step follows).
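A minimal sketch of that split-and-validate step (walmart.data is loaded in the code below; the dependent-variable name satisfaction is hypothetical):

set.seed(1)
idx        <- sample(nrow(walmart.data), floor(0.7 * nrow(walmart.data))) # 70/30 split
train.data <- walmart.data[idx, ]
test.data  <- walmart.data[-idx, ]
# train.model <- lm(satisfaction ~ ., data = train.data) # hypothetical DV name
# pred <- predict(train.model, newdata = test.data)      # validate on held-out data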
R code – Multivariate Regression cont.
# Setting the working directory
setwd("E:\\Courses_material\\Modules\\Foundation_Final\\Day 4\\use case")
########################## Multi-variate Regression ##################################
# Importing and viewing the dataset
library(sas7bdat) # provides read.sas7bdat()
walmart.data <- read.sas7bdat(file="walmart.sas7bdat")
View(walmart.data) # the dataset contains 200 observations and 14 variables.
# If the VIF value is greater than 5 for any variable (the threshold may differ from case to case), then multicollinearity is present.
# vif(train.model) # from the car package; train.model assumed fitted above
# In our case multicollinearity is present, as the values are more than 5 for some variables.
# Now we will go for stepwise regression and let R decide which variables to select
train.model1 <- step(train.model, direction = "both") # AIC-based stepwise selection
summary(train.model)  # full model
summary(train.model1) # reduced model after stepwise selection
# Now checking the assumptions of linear regression
hist(train.model1$residuals) # residuals should look approximately normal
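Besides normality, homoscedasticity of the residuals is usually checked as well (a sketch, assuming train.model1 from above):

plot(fitted(train.model1), residuals(train.model1), # residuals vs fitted values
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2) # residuals should scatter randomly around zero with constant spread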
Quiz
2) It is observed that there is a very high correlation between math test scores and amount of physical exercise
done by a student on the test day. What can you infer from this?
1. High correlation implies that after exercise the test scores are high.
2. Correlation does not imply causation.
3. Correlation measures the strength of linear relationship between amount of exercise and test scores.
A) Only 1
B) 1 and 3
C) 2 and 3
D) All the statements are true
3) If the correlation coefficient (r) between scores in a math test and amount of physical exercise by a student is
0.86, what percentage of variability in math test is explained by the amount of exercise?
A) 86%
B) 74%
C) 14%
D) 26%
4) A regression analysis between weight (y) and height (x) resulted in the following least squares line: y = 120 + 5x. This implies that if the height is increased by 1 inch, the weight is expected to
A) increase by 1 pound
B) increase by 5 pounds
C) increase by 125 pounds
D) None of the above
THANK YOU