Project 2 - Advanced Statistics


Table of Contents

1. Project Objective

2. Assumptions

3. Exploratory Data Analysis

3.1 Environment Set up and Data Import

3.1.1 Install necessary Packages and Invoke Libraries

3.1.2. Set up working Directory

3.1.3 Import and Read the Dataset

4. Data Summary, Univariate, Bivariate analysis, graphs

4.1. Histogram

4.2. Boxplot

4.3 Scatterplot

5. Simple Linear Regression Analysis

5.1- 5.11 Model 1 Analysis to Model 11 Analysis

6. Analysis based on Multicollinearity

7. KMO Test

8. Factor Interpretation

9. Principal Component Analysis

10. Factor Component Analysis – Names

11. Multiple Regression Analysis

12. Output Interpretation in Business Terms

13. Validity of the Model

14. Prediction of the model – Backtracking Model


1. Project Objective

The objective of this report is to explore the data set (“Factor-Hair-Revised”) in R and generate
insights about it. This exploration report consists of the following:
• Importing the dataset in R
• Understanding the structure of the dataset
• Graphical exploration
• Exploratory Data Analysis
• Multicollinearity
• Simple Linear Regression
• PCA / Factor Analysis
• Multiple Linear Regression
• Model output and validity

2. Assumptions

The data is assumed to be normally distributed.

3. Exploratory Data Analysis

We follow a step-by-step approach. A typical data exploration activity consists of the following
steps:

3.1 Environment Set up and Data Import

3.1.1 Install necessary Packages and Invoke Libraries

Necessary packages need to be installed and their libraries invoked using the library() command.
Please refer to the R code for the complete set of packages.
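A minimal sketch of how this can be done; the package list below is an assumption based on the analyses performed later in the report (the accompanying R code has the authoritative list):

# Install (once) and load the packages used later in the analysis
packages <- c("psych", "car", "corrplot")
new_pkgs <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_pkgs) > 0) install.packages(new_pkgs)

library(psych)     # KMO test, Bartlett test, principal()
library(car)       # regression diagnostics
library(corrplot)  # correlation matrix visualisation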

3.1.2 Set up working Directory

Setting a working directory at the start of the R session makes importing and exporting data and
code files easier. The working directory is the location/folder on the PC where the data, code,
etc. related to the project are kept.

Please refer R Code.
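A minimal sketch; the folder path below is a placeholder, not the actual project location:

# Set the working directory to the project folder (placeholder path)
setwd("C:/Projects/AdvancedStatistics")
getwd()   # confirm the current working directory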

3.1.3 Import and Read the Dataset

The given dataset is in .csv format. Hence, the command ‘read.csv’ is used for importing the file.

Please refer R Code.
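A minimal sketch, assuming the file is named after the dataset (the actual file name may differ):

# Import the dataset; header = TRUE keeps the variable names from the first row
hair <- read.csv("Factor-Hair-Revised.csv", header = TRUE)
dim(hair)    # expected: 100 rows, 13 columns (ID + 12 variables)
str(hair)    # inspect variable types
head(hair)   # preview the first few rows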

4. Data Summary, Univariate, Bivariate analysis, graphs

Following are the observations from the summary (see the sketch after this list):

• All 12 variables are numeric and 1 variable (ID) is an integer.
• There are no missing values.
• Outliers are possible in Sales Force Image as per the summary.
• Customer Satisfaction is the dependent variable.
• The working data will not include the ID variable, as it is just a sequence number.
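A minimal sketch of the checks behind these observations; hair is the assumed name of the imported data frame:

working_data <- hair[, -1]    # drop the ID column (just a sequence number)
summary(working_data)         # per-variable summaries; flags potential outliers
sum(is.na(working_data))      # total missing values (expected: 0)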

4.1. Histogram for all the 12 variables:

Satisfaction is the dependent variable; histograms are plotted for all the variables. The data
(ratings on a scale of 1-10) was given by 100 customers for each of the variables depicted in the
histograms.
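A minimal sketch of how the histograms can be produced, using the assumed working_data frame from above:

# Histogram for each of the 12 variables on one page
par(mfrow = c(3, 4))
for (v in names(working_data)) {
  hist(working_data[[v]], main = v, xlab = "Rating (1-10)", col = "lightblue")
}
par(mfrow = c(1, 1))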

4.2: Boxplot Graph

Outliers can be seen in Ecommerce, Sales Force Image and Order billing.
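A minimal sketch for the boxplots, using the same assumed working_data frame:

# Boxplots of all variables side by side to highlight outliers
boxplot(working_data, las = 2, col = "lightgreen",
        main = "Boxplots of all variables")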

4.3: Bivariate Analysis

The scatter plots suggest correlation between certain pairs of variables (see the sketch after
this list):

• Ecommerce with Sales Force Image
• Technical Support with Warranty and Claims
• Complaint Resolution with Order Billing and Delivery Speed
• Product Line with Delivery Speed
• Competitive Pricing and Product Line, to some extent
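A minimal sketch of the bivariate exploration, again assuming the working_data frame:

# Scatter-plot matrix of all variables to inspect pairwise relationships
pairs(working_data, main = "Scatter-plot matrix", pch = 19, cex = 0.5)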

5. Simple Linear Regression:

Customer Satisfaction is the dependent variable.

The models below regress the dependent variable on each independent variable, one at a time.
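A minimal sketch of how each simple regression is fitted; the variable names follow the model output below, and the data-frame name is an assumption:

# Example: Model 1 - Customer Satisfaction on Product Quality
model1 <- lm(Customer.Satisfaction ~ Product.Quality, data = hair)
summary(model1)   # coefficients, R-squared and overall F-test p-value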

5.1 – Model 1 Customer Satisfaction and Product Quality

lm(formula = Customer.Satisfaction ~ Product.Quality)

Residuals:
Min 1Q Median 3Q Max
-1.88746 -0.72711 -0.01577 0.85641 2.25220

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.67593 0.59765 6.151 1.68e-08 ***
Product.Quality 0.41512 0.07534 5.510 2.90e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.047 on 98 degrees of freedom


Multiple R-squared: 0.2365, Adjusted R-squared: 0.2287
F-statistic: 30.36 on 1 and 98 DF, p-value: 2.901e-07
Multiple R-squared is 23.65% and Adjusted R-squared is around 22.87%, so the two variables are not
highly correlated. But the model is valid since the p-value is much less than 0.05.

5.2 – Model 2 Customer Satisfaction and Ecommerce

lm(formula = Customer.Satisfaction ~ Ecommerce)

Residuals:
Min 1Q Median 3Q Max
-2.37200 -0.78971 0.04959 0.68085 2.34580

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.1516 0.6161 8.361 4.28e-13 ***
Ecommerce 0.4811 0.1649 2.918 0.00437 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.149 on 98 degrees of freedom


Multiple R-squared: 0.07994, Adjusted R-squared: 0.07056
F-statistic: 8.515 on 1 and 98 DF, p-value: 0.004368

Multiple R-squared is 7.99% and Adjusted R-squared is around 7.06%, which signifies the two
variables are only weakly correlated. But the model is valid since the p-value is much less than 0.05.

5.3 – Model 3 Customer Satisfaction and Technical Support.

Multiple R-squared is 1.27% and Adjusted R-squared is around 0.26%, which signifies the two
variables are not correlated. The model is not valid since the p-value (0.2647) is more than 0.05.

lm(formula = Customer.Satisfaction ~ Technical.Support)

Residuals:
Min 1Q Median 3Q Max
-2.26136 -0.93297 0.04302 0.82501 2.85617

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.44757 0.43592 14.791 <2e-16 ***
Technical.Support 0.08768 0.07817 1.122 0.265
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.19 on 98 degrees of freedom


Multiple R-squared: 0.01268, Adjusted R-squared: 0.002603
F-statistic: 1.258 on 1 and 98 DF, p-value: 0.2647

5.4 – Model 4 Customer Satisfaction and Complaint Resolution.

Multiple R-squared is 36.39% and Adjusted R-squared is around 35.74%, which signifies the two
variables are moderately correlated. The model is valid since the p-value is much less than 0.05.

lm(formula = Customer.Satisfaction ~ `Complain Resolution`)

Residuals:
Min 1Q Median 3Q Max
-2.40450 -0.66164 0.04499 0.63037 2.70949

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.68005 0.44285 8.310 5.51e-13 ***
`Complain Resolution` 0.59499 0.07946 7.488 3.09e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9554 on 98 degrees of freedom


Multiple R-squared: 0.3639, Adjusted R-squared: 0.3574
F-statistic: 56.07 on 1 and 98 DF, p-value: 3.085e-11

5.5 – Model 5 Customer Satisfaction and Advertising.

Multiple R-squared is 9.28% and Adjusted R-squared is around 8.36%, which signifies the two
variables are only weakly correlated. The model is valid since the p-value (0.00206) is less than 0.05.

lm(formula = Customer.Satisfaction ~ Advertising)

Residuals:
Min 1Q Median 3Q Max
-2.34033 -0.92755 0.05577 0.79773 2.53412

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.6259 0.4237 13.279 < 2e-16 ***
Advertising 0.3222 0.1018 3.167 0.00206 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.141 on 98 degrees of freedom


Multiple R-squared: 0.09282, Adjusted R-squared: 0.08357
F-statistic: 10.03 on 1 and 98 DF, p-value: 0.002056

5.6 – Model 6 Customer Satisfaction and Product Line.

Multiple R-squared is 30.31% and Adjusted R-squared is around 29.6%, which signifies the two
variables are moderately correlated. The model is valid since the p-value is much less than 0.05.

lm(formula = Customer.Satisfaction ~ Product.Line)

Residuals:
Min 1Q Median 3Q Max
-2.3634 -0.7795 0.1097 0.7604 1.7373

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.02203 0.45471 8.845 3.87e-14 ***
Product.Line 0.49887 0.07641 6.529 2.95e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1 on 98 degrees of freedom


Multiple R-squared: 0.3031, Adjusted R-squared: 0.296
F-statistic: 42.62 on 1 and 98 DF, p-value: 2.953e-09

5.7 – Model 7 Customer Satisfaction and Sales Force Image.

Multiple R-squared is 25.02% and Adjusted R-squared is around 24.26%, which signifies the two
variables are moderately correlated. The model is valid since the p-value is much less than 0.05.

lm(formula = Customer.Satisfaction ~ Sales.Force.Image)

Residuals:
Min 1Q Median 3Q Max
-2.2164 -0.5884 0.1838 0.6922 2.0728

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.06983 0.50874 8.000 2.54e-12 ***
Sales.Force.Image 0.55596 0.09722 5.719 1.16e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.037 on 98 degrees of freedom


Multiple R-squared: 0.2502, Adjusted R-squared: 0.2426
F-statistic: 32.7 on 1 and 98 DF, p-value: 1.164e-07

5.8 – Model 8 Customer Satisfaction and Competitive pricing.

lm(formula = Customer.Satisfaction ~ Competitive.Pricing)

Residuals:
Min 1Q Median 3Q Max
-1.9728 -0.9915 -0.1156 0.9111 2.5845

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.03856 0.54427 14.769 <2e-16 ***
Competitive.Pricing -0.16068 0.07621 -2.108 0.0376 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.172 on 98 degrees of freedom


Multiple R-squared: 0.04339, Adjusted R-squared: 0.03363
F-statistic: 4.445 on 1 and 98 DF, p-value: 0.03756

Multiple R-squared is 4.34% and Adjusted R-squared is around 3.36%, which signifies the two
variables are only weakly (and negatively) correlated. The model is still valid since the p-value (0.0376) is less than 0.05.

5.9 – Model 9 Customer Satisfaction and Warranty & Claims.

Multiple R-squared is 3.15% and Adjusted R-squared is around 2.16%, which signifies the two
variables are not correlated. The model is not valid since the p-value (0.0772) is more than 0.05.

lm(formula = Customer.Satisfaction ~ `Warranty&Claims`)

Residuals:
Min 1Q Median 3Q Max
-2.36504 -0.90202 0.03019 0.90763 2.88985

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.3581 0.8813 6.079 2.32e-08 ***
`Warranty&Claims` 0.2581 0.1445 1.786 0.0772 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.179 on 98 degrees of freedom


Multiple R-squared: 0.03152, Adjusted R-squared: 0.02164
F-statistic: 3.19 on 1 and 98 DF, p-value: 0.0772

5.10 – Model 10 Customer Satisfaction and Order and Billing.

lm(formula = Customer.Satisfaction ~ Order.Billing)

Residuals:
Min 1Q Median 3Q Max
-2.4005 -0.7071 -0.0344 0.7340 2.9673

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0541 0.4840 8.377 3.96e-13 ***
Order.Billing 0.6695 0.1106 6.054 2.60e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.022 on 98 degrees of freedom


Multiple R-squared: 0.2722, Adjusted R-squared: 0.2648
F-statistic: 36.65 on 1 and 98 DF, p-value: 2.602e-08

Multiple R-squared is 27.22% and Adjusted R-squared is around 26.48%, which signifies the two
variables are moderately correlated. The model is valid since the p-value is much less than 0.05.

5.11 – Model 11 Customer Satisfaction and Delivery Speed.

lm(formula = Customer.Satisfaction ~ Delevery.Speed)

Residuals:
Min 1Q Median 3Q Max
-2.22475 -0.54846 0.08796 0.54462 2.59432

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2791 0.5294 6.194 1.38e-08 ***
Delevery.Speed 0.9364 0.1339 6.994 3.30e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9783 on 98 degrees of freedom


Multiple R-squared: 0.333, Adjusted R-squared: 0.3262
F-statistic: 48.92 on 1 and 98 DF, p-value: 3.3e-10

Multiple R-squared is 33.3% and Adjusted R-squared is around 32.62%, which signifies the two
variables are moderately correlated. The model is valid since the p-value is much less than 0.05.

6. Analysis based on Multicollinearity:

As an early indicator we compute the correlation matrix in R and observe that some pairs of
variables are highly correlated. The pairs listed below have correlations closer to 1, indicating
potential multicollinearity (see the sketch after this list):

• Sales Force Image and Ecommerce – 0.79
• Technical Support and Warranty and Claims – 0.80
• Complaint Resolution with Order Billing – 0.76, and with Delivery Speed – 0.87
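A minimal sketch of the correlation check. Here depvardata is assumed to be the data frame holding the 11 independent variables (the same object name used in the KMO and principal() calls later in this report):

# Correlation matrix of the 11 independent variables, rounded for readability
cor_matrix <- round(cor(depvardata), 2)
cor_matrix
# Optional visual check of strongly correlated pairs (corrplot package)
corrplot(cor_matrix, method = "number")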

6.1 Bartlett test:

$chisq
[1] 619.2726

$p.value
[1] 1.79337e-96

$df
[1] 55

The Bartlett test checks whether the correlation matrix is significantly different from an identity
matrix. As the p-value is much less than 0.05, the data is suitable for dimension reduction.
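A minimal sketch of the Bartlett test using the psych package, with the assumed depvardata object from above:

# Bartlett's test of sphericity: H0 is that the correlation matrix is an identity matrix
cortest.bartlett(cor(depvardata), n = nrow(depvardata))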

7. KMO test – Kaiser-Meyer-Olkin test.

To check if the sample size is adequate we run the KMO test.

As the overall MSA (0.65 as per the KMO output) is more than 0.5, we consider the given sample
adequate; there is no need for additional samples.

Kaiser-Meyer-Olkin factor adequacy


Call: KMO(r = depvardata)
Overall MSA = 0.65
MSA for each item =
ProdQual Ecom TechSup CompRes Advertising ProdLine SalesFImage ComPricing WartyClaim
0.51 0.63 0.52 0.79 0.78 0.62 0.62 0.75 0.51
OrdBilling DelSpeed
0.76 0.67

8. Factor interpretation

8.1 Extract Eigen Values:

EV = Value$values
> EV
 [1] 3.42697133 2.55089671 1.69097648 1.08655606 0.60942409 0.55188378 0.40151815
 [8] 0.24695154 0.20355327 0.13284158 0.09842702

As per the above output, 4 of the 11 eigenvalues are above 1.

As per the Kaiser normalisation rule we will therefore consider 4 factors.

8.2 Elbow rule:

As per the elbow rule in the scree plot, 5 factors could be taken. But we will follow the Kaiser
rule and consider only 4 factors.
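A minimal sketch of extracting the eigenvalues and drawing the scree plot; Value and EV follow the object names used above, and depvardata is the assumed predictor data frame:

# Eigenvalues of the correlation matrix of the 11 independent variables
Value <- eigen(cor(depvardata))
EV <- Value$values
EV
# Scree plot: look for the "elbow" and for eigenvalues above 1 (Kaiser rule)
plot(EV, type = "b", xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)   # Kaiser cut-off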

9. Principal Component Analysis.

Below are the results of the principal component analysis without rotation. We observe that the 11
independent variables are reduced to 4 factors, which together explain 80% of the variance.
principal(r = depvardata, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
Product.Quality 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecommerce 0.31 0.71 0.31 0.28 0.78 0.223 2.1
Technical.Support 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
Complain Resolution 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
Product.Line 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
Sales.Force.Image 0.38 0.75 0.31 0.23 0.86 0.141 2.1
Competitive.Pricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
Warranty&Claims 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
Order.Billing 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
Delevery.Speed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4

PC1 PC2 PC3 PC4


SS loadings 3.43 2.55 1.69 1.09
Proportion Var 0.31 0.23 0.15 0.10
Cumulative Var 0.31 0.54 0.70 0.80
Proportion Explained 0.39 0.29 0.19 0.12
Cumulative Proportion 0.39 0.68 0.88 1.00

Mean item complexity = 1.9


Test of the hypothesis that 4 components are sufficient.

The root mean square of the residuals (RMSR) is 0.06


with the empirical chi square 39.02 with prob < 0.0018

Fit based upon off diagonal values = 0.9

To make the factors more interpretable, we rotate the components (varimax) so that the loadings
move closer to 0 or 1, without affecting the communalities.
Principal Components Analysis
Call: principal(r = depvardata, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
RC1 RC2 RC3 RC4 h2 u2 com
Product.Quality 0.00 -0.01 -0.03 0.88 0.77 0.232 1.0
Ecommerce 0.06 0.87 0.05 -0.12 0.78 0.223 1.1
Technical.Support 0.02 -0.02 0.94 0.10 0.89 0.107 1.0
Complain Resolution 0.93 0.12 0.05 0.09 0.88 0.119 1.1
Advertising 0.14 0.74 -0.08 0.01 0.58 0.424 1.1
Product.Line 0.59 -0.06 0.15 0.64 0.79 0.213 2.1
Sales.Force.Image 0.13 0.90 0.08 -0.16 0.86 0.141 1.1
Competitive.Pricing -0.09 0.23 -0.25 -0.72 0.64 0.359 1.5
Warranty&Claims 0.11 0.05 0.93 0.10 0.89 0.108 1.1
Order.Billing 0.86 0.11 0.08 0.04 0.77 0.234 1.1
Delevery.Speed 0.94 0.18 0.00 0.05 0.91 0.086 1.1

RC1 RC2 RC3 RC4


SS loadings 2.89 2.23 1.86 1.77
Proportion Var 0.26 0.20 0.17 0.16
Cumulative Var 0.26 0.47 0.63 0.80
Proportion Explained 0.33 0.26 0.21 0.20
Cumulative Proportion 0.33 0.59 0.80 1.00

Mean item complexity = 1.2


Test of the hypothesis that 4 components are sufficient.

The root mean square of the residuals (RMSR) is 0.06


with the empirical chi square 39.02 with prob < 0.0018

Fit based upon off diagonal values = 0.97


The communality values (h2 column) have not changed after rotation.
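A minimal sketch of the two principal() calls whose output is shown above (psych package, assumed depvardata object):

# Unrotated solution: 4 components retained as per the Kaiser rule
pca_unrotated <- principal(r = depvardata, nfactors = 4, rotate = "none")
# Varimax-rotated solution: same communalities, cleaner loading pattern
pca_rotated <- principal(r = depvardata, nfactors = 4, rotate = "varimax")
print(pca_rotated$loadings, cutoff = 0.3)   # suppress small loadings for readability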

10. Factor Component Analysis.

We will name the factors according to the variables that load most strongly on each rotated
component (see the sketch after this list):

1. RC1 = This factor relates to Delivery Speed, Complaint Resolution and Order Billing.
Name: Post.sale
2. RC2 = This factor relates to Sales Force Image, Ecommerce and Advertising.
Name: Marketing
3. RC3 = This factor relates to Technical Support and Warranty & Claims.
Name: Support
4. RC4 = This factor relates to Product Quality, Competitive Pricing (negatively) and Product
Line.
Name: Product.attributes
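A minimal sketch of extracting the factor scores and naming them for the multiple regression. The positional mapping of RC1–RC4 to the names above and the Satisfaction column name are assumptions:

# Factor scores for each of the 100 respondents (one column per rotated component)
scores <- as.data.frame(pca_rotated$scores)
names(scores) <- c("Post.sale", "Marketing", "Support", "Product.attributes")
# Combine with the dependent variable (column name assumed) for the regression
reg_data <- data.frame(scores, Customer.Satisfaction = hair$Satisfaction)
head(reg_data)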

11. Multiple Linear Regression Analysis
lm(formula = `Customer Satisfaction` ~ Post.sale + Marketing +
Support + Product.attributes)

Residuals:
Min 1Q Median 3Q Max
-1.6346 -0.5021 0.1368 0.4617 1.5235

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.91813 0.07087 97.617 < 2e-16 ***
Post.sale 0.61799 0.07122 8.677 1.11e-13 ***
Marketing 0.50994 0.07123 7.159 1.71e-10 ***
Support 0.06686 0.07120 0.939 0.35
Product.attributes 0.54014 0.07124 7.582 2.27e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7087 on 95 degrees of freedom


Multiple R-squared: 0.6607, Adjusted R-squared: 0.6464
F-statistic: 46.25 on 4 and 95 DF, p-value: < 2.2e-16

The fitted equation is ŷ = 6.91813 + 0.61799·X1 + 0.50994·X2 + 0.06686·X3 + 0.54014·X4, where
X1 = Post.sale, X2 = Marketing, X3 = Support and X4 = Product.attributes.

12. Output Interpretation in business terms

We take the four new factors as the independent variables and Customer Satisfaction as the
dependent variable:

• R-squared: In the above model the R-squared is 66.07%, which is quite significant: 66.07% of
the variation in Customer Satisfaction is explained by the four factors together.
• P(F > 46.25): the model p-value of 2.2e-16 is much less than the 5% significance level. Hence
we reject the null hypothesis; at least one beta is non-zero, so we accept the alternative
hypothesis. There is evidence that the regression model exists in the population.
• The regression has 4 degrees of freedom and the total degrees of freedom is 99 (100
observations - 1). Hence the error (residual) has 99 - 4 = 95 degrees of freedom.
• Adjusted R-squared, which adjusts R-squared for the degrees of freedom used by each added
variable, is 64.64%.
• The individual coefficients of three of the four factors are highly significant, as their
p-values are less than the 5% alpha level; the Support factor is not significant.

13. Validity of the Model

1. We split the data into Train and Test sets in the ratio 70:30 (see the sketch below).
2. The Train data (70% of the data) is used for model development and the remaining 30% for
validation purposes.
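A minimal sketch of the split and refit; the seed is an assumption, and reg_data is the factor-score data frame built earlier:

# Reproducible 70:30 split into Train and Test
set.seed(123)   # assumed seed, for reproducibility
train_idx <- sample(seq_len(nrow(reg_data)), size = round(0.7 * nrow(reg_data)))
Train <- reg_data[train_idx, ]
Test  <- reg_data[-train_idx, ]
# Refit the four-factor model on the Train data only
LM_Model <- lm(Customer.Satisfaction ~ Post.sale + Marketing + Support + Product.attributes,
               data = Train)
summary(LM_Model)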
Model Development:
The R-squared is 72.45% and the p-value is much less than the alpha level, so the model is valid.
lm(formula = lm(`Customer Satisfaction` ~ Post.sale + Marketing +
Support + Product.attributes), data = Train)

Residuals:
Min 1Q Median 3Q Max
-1.41367 -0.51245 0.02767 0.46999 1.55876

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.89801 0.07918 87.121 < 2e-16 ***
Post.sale 0.53704 0.08348 6.433 1.63e-08 ***
Marketing 0.60431 0.07481 8.078 1.92e-11 ***
Support 0.05046 0.07916 0.637 0.526
Product.attributes 0.51975 0.08093 6.422 1.70e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.66 on 66 degrees of freedom
Multiple R-squared: 0.7245, Adjusted R-squared: 0.7078
F-statistic: 43.39 on 4 and 66 DF, p-value: < 2.2e-16

The new equation is ŷ = 6.89801 + 0.53704·X1 + 0.60431·X2 + 0.05046·X3 + 0.51975·X4.

14. Prediction of the Model:

We use the remaining 30% of the data (Test) to generate predictions from the model.

With a confidence level of 95%, each prediction comes with a fitted value, a lower limit and an
upper limit. As an analyst we can choose any of these values as per the organisation's rules.

Predtest = predict(LM_Model, newdata = Test,interval = "confidence")

Predtest

Backtracking of the Model:


Actual values are shown in blue and predicted values in red.
The two series move closely together; hence the regression model is valid.
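A minimal sketch of the backtracking comparison; the colour choices follow the text above, and Predtest/LM_Model are the objects used earlier:

# Compare actual vs predicted satisfaction on the Test set
Predtest <- predict(LM_Model, newdata = Test, interval = "confidence")
plot(Test$Customer.Satisfaction, type = "l", col = "blue",
     xlab = "Test observation", ylab = "Customer Satisfaction",
     main = "Backtracking: actual (blue) vs predicted (red)")
lines(Predtest[, "fit"], col = "red")
legend("topleft", legend = c("Actual", "Predicted"),
       col = c("blue", "red"), lty = 1, bty = "n")

If the two lines track each other closely, as described above, the model generalises well to the held-out data.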
