Linear regression was used to predict yearly customer spending on an e-commerce platform based on engagement metrics. The key predictors were average session length, time spent on the app and website, and length of membership. The data was cleaned by removing NA values and duplicate rows. Variables like customer email and avatar that did not predict spending were removed. Two linear regression models were created - one using all engagement predictors and one using just average session length. Both aimed to uncover patterns in customer behavior and spending to help optimize the customer experience and increase revenue.

Predictive Modelling with Linear Regression

Project 1 (Multivariate Statistics, STAT8031 - Fall 2023 - Section 1)
Salma Shaheen

Student ID: 8913789

Table of Contents

1. Data Set: E-Commerce Customers
2. Initial Modeling for E-commerce Dataset
3. Diagnostics for E-commerce Dataset
4. Model Selection for E-commerce Dataset
5. Predictions and Summary

Data Set: E-commerce Customers
The dataset appears to be a simulated collection of customer data from an e-commerce setting,
likely intended for analysis or modeling. It contains 500 observations, each representing a
distinct customer, and several variables describing different aspects of each customer's
interactions and behavior on the e-commerce platform. Here is a detailed breakdown of the
dataset:

1. Email: This variable contains the email addresses of the customers, serving as unique
identifiers for each customer in the dataset.
2. Address: This variable includes the physical addresses of the customers, providing additional
contact information.
3. Avatar: This variable represents the chosen avatars or profile pictures of the customers,
potentially for personalization or identification purposes.
4. Avg. Session Length: This variable records the average length of time each customer spends
per session on the e-commerce platform.
5. Time on App: This variable represents the time spent by each customer using the
e-commerce application.
6. Time on Website: This variable indicates the time spent by each customer on the
e-commerce website.
7. Length of Membership: This variable denotes the duration for which each customer has
been a member of the e-commerce platform.
8. Yearly Amount Spent: This variable serves as the quantitative response, representing the
total amount of money spent by each customer on the e-commerce platform within a year.

The dataset includes both quantitative variables, such as time metrics and the amount spent, and
qualitative variables, such as email addresses, addresses, and avatars.

In the e-commerce dataset, the following variables can be considered for regression analysis:
• Avg. Session Length
• Time on App
• Time on Website
• Length of Membership
• Yearly Amount Spent

These variables are suitable for regression analysis as they are quantitative in nature, providing
measurable insights into customer behavior and engagement on the e-commerce platform. Utilizing
these variables in a regression analysis can help in understanding the relationships between
customer engagement metrics and the amount spent yearly on the platform. By examining how
changes in these variables influence the yearly amount spent, businesses can gain valuable insights
into customer preferences, behavior patterns, and the effectiveness of the e-commerce platform in
driving customer spending.

In the e-commerce dataset, there is at least one categorical variable: the "Avatar" variable, which
represents the avatar or profile picture chosen by each customer. Categorical variables take values
from a set of distinct categories or groups rather than from a numeric scale.

Analyzing the impact of this categorical variable on customer behavior or the amount spent might
provide insights into the potential influence of visual representations on customer engagement and
spending patterns, and into the role of visual cues or personalization in customer interactions.
While it might be possible to include this variable as a predictor in a regression analysis, doing
so might not be straightforward.

Using a categorical predictor in a regression model typically requires encoding it as dummy
(indicator) variables, that is, binary variables that represent the presence or absence of each
category. (This is distinct from logistic regression, which models a categorical response.) The
"Avatar" variable, as it stands, may not be directly usable for this type of analysis without
appropriate preprocessing.
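As a rough illustration of the dummy encoding described above, the sketch below uses a toy data frame. The three category labels and the spending values are hypothetical and are not taken from the real "Avatar" column.

```r
# Toy data: a hypothetical three-level categorical predictor (not the real Avatar values)
toy <- data.frame(
  avatar = factor(c("A", "B", "C", "A", "B", "C")),
  spent  = c(480, 510, 530, 470, 520, 540)
)

# model.matrix() expands the factor into indicator columns;
# the first level ("A") is absorbed into the intercept as the baseline
X <- model.matrix(spent ~ avatar, data = toy)
print(X)

# lm() applies the same encoding automatically when given a factor
fit <- lm(spent ~ avatar, data = toy)
print(coef(fit))
```

Each non-baseline coefficient is the difference in mean spending between that category and the baseline. A variable with many distinct levels would add one coefficient per extra level, which is one reason such a predictor may not be practical.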

I hope to predict the following using this dataset:

• The primary aim is to predict the yearly amount spent by customers based on their engagement
metrics and membership duration.
• By using predictors such as the time spent on the app and website, session length, and length
of membership, it becomes possible to uncover patterns and trends that drive customer spending
behavior.
• Understanding the factors that influence customer spending can offer valuable insights for
e-commerce businesses to optimize their platform, improve user experience, and tailor
marketing strategies to boost customer engagement and maximize revenue.

Through comprehensive analysis and modeling techniques, it is possible to uncover hidden patterns
within the data, such as identifying customer segments with varying spending habits, preferences, or
levels of engagement. By uncovering these insights, businesses can refine their marketing strategies,
streamline user experiences, and enhance customer satisfaction, ultimately leading to increased
customer loyalty and higher overall revenue generation. Understanding the dynamics of customer
spending within the e-commerce platform can provide businesses with a competitive edge in a
rapidly evolving and highly competitive online marketplace.

Initial Modeling for E-commerce Dataset

For the e-commerce dataset, the "Yearly Amount Spent" serves as the dependent (response)
variable.

Based on the e-commerce dataset, several variables seem crucial as predictors for understanding
customer spending behavior. The "Time on App" and "Time on Website" are likely to be significant,
as they represent the duration of customer interactions on these platforms, indicating the level of
engagement. The "Length of Membership" is another essential predictor as it can provide insights
into customer loyalty and the potential impact of long-term engagement on spending. The “average
session length” is another useful predictor in understanding customer behavior and potentially
predicting the yearly amount spent by customers on the e-commerce platform. The average session
length reflects the duration of time customers spend during each interaction or session, providing
insights into their engagement and activity levels. Longer average session lengths might indicate
higher levels of customer involvement, interest, or satisfaction with the platform, potentially leading
to increased spending.

Additionally, considering possible non-linear relationships, it might be beneficial to include the
squared or cubed terms of these variables to account for any curvilinear patterns. Interaction terms
between the "Time on App" and "Time on Website" could capture any combined effect of these
variables on the yearly amount spent.
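A minimal sketch of how squared and interaction terms can be written in an R formula is given below. The data are simulated, and the variable names `app`, `web` and `spent` are stand-ins chosen for illustration, not columns of the project data.

```r
set.seed(1)
n <- 200
# Simulated stand-ins for two engagement variables (hypothetical values)
app <- runif(n, 8, 15)
web <- runif(n, 33, 40)
# Response built with a curved app effect and an app-by-web interaction
spent <- 100 + 5 * app + 1.5 * app^2 + 0.2 * app * web + rnorm(n, sd = 5)
sim <- data.frame(app, web, spent)

# I() protects ^ so it means arithmetic squaring inside a formula;
# app:web adds the product (interaction) term
fit <- lm(spent ~ app + I(app^2) + web + app:web, data = sim)
print(coef(fit))
```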

On the other hand, variables such as the customer's email, address, or avatar might not be directly
useful for predicting spending behavior and could be excluded from the model to simplify the
analysis and avoid unnecessary complexity.

Cleaning Data:
First, I imported the data into R and gave the data frame a shorter name for convenience.

    DT = E_commerece_Customers

Next, I noticed many "NA" entries (possibly introduced when I converted the data file to
Excel), so I removed the rows containing "NA" values with the following code:

    library(dplyr)
    Remove_NA = na.omit(DT)
    Remove_NA

To make sure that no rows are duplicated, I used the following command:

    Remove_duplicate = distinct(Remove_NA)
    Remove_duplicate

Variables such as the customer's email, address, and avatar are unlikely to be directly useful for
predicting spending behavior, so I removed these columns and stored the result as Remove_columns.
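The column-removal command is not reproduced in the text. The sketch below shows one way the three cleaning steps could look in base R on a toy data frame with hypothetical values. It uses `unique()` in place of `dplyr::distinct()`; with dplyr, the last step might instead be written with `select()`, but that is an assumption about the original code.

```r
# Toy data frame with hypothetical values, mimicking the column layout described above
toy <- data.frame(
  Email   = c("a@x.com", "b@x.com", "b@x.com", "c@x.com"),
  Address = c("1 Main St", "2 Oak Ave", "2 Oak Ave", NA),
  Avatar  = c("Blue", "Green", "Green", "Red"),
  `Time on App` = c(12.1, 11.4, 11.4, 13.0),
  `Yearly Amount Spent` = c(500, 480, 480, 530),
  check.names = FALSE
)

no_na    <- na.omit(toy)     # drop rows containing any NA
no_dupes <- unique(no_na)    # base-R analogue of dplyr::distinct()

# Drop the identifier columns that are not used as predictors
id_cols <- c("Email", "Address", "Avatar")
Remove_columns <- no_dupes[, !(names(no_dupes) %in% id_cols), drop = FALSE]

str(Remove_columns)
```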

I renamed the data again and used the following code to check that it is in the expected form,
with 500 observations and five variables:

    DTC = Remove_columns
    DTC
    head(DTC)
    str(DTC)
    nrow(DTC)

Modeling:

Linear regression model of Yearly Amount Spent on Avg. Session Length, Time on App,
Time on Website and Length of Membership:

The following code fits the regression:

    model1 = lm(`Yearly Amount Spent` ~ `Avg. Session Length` + `Time on App` +
                `Time on Website` + `Length of Membership`, data = DTC)
    model1

The coefficients are:

    Intercept     Avg. Session Length    Time on App    Time on Website    Length of Membership
    -1051.5943    25.7343                38.7092        0.4367             61.5773
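To show how these coefficients turn into a prediction, the sketch below applies them to a single hypothetical customer. The input values are made up for illustration and are in whatever units the dataset records.

```r
# Coefficients as reported for model1
b <- c(intercept = -1051.5943, session = 25.7343, app = 38.7092,
       website = 0.4367, membership = 61.5773)

# Hypothetical customer (values chosen for illustration only);
# the leading 1 multiplies the intercept
x <- c(1, session = 33, app = 12, website = 37, membership = 3.5)

# Predicted yearly spending is the sum of coefficient-times-input products
predicted <- sum(b * x)
print(predicted)
```

Holding the other predictors fixed, each coefficient is the expected change in Yearly Amount Spent for a one-unit increase in that predictor.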

Linear regression model of Yearly Amount Spent on Avg. Session Length:

The following code fits the regression line:

    m1 = lm(`Yearly Amount Spent` ~ `Avg. Session Length`, data = DTC)
    m1

The coefficients are:

    Intercept    Slope
    -438.56      28.37

This indicates that for each one-unit increase in Avg. Session Length, Yearly Amount Spent
increases by 28.37 on average.

The code for the plot, along with the regression line, is:

    plot(DTC$`Avg. Session Length`, DTC$`Yearly Amount Spent`,
         main = "Yearly Amount vs Average Length", xlab = "Average Length",
         ylab = "Yearly Amount")
    abline(lm(`Yearly Amount Spent` ~ `Avg. Session Length`, data = DTC), col = "blue")

The result is:

This relationship seems to be nonlinear, so the model after a suitable transformation (adding a
squared term) is:

    DTCSqrModel1 = lm(`Yearly Amount Spent` ~ `Avg. Session Length`
                      + I(`Avg. Session Length`^2), data = DTC)
    coef(DTCSqrModel1)

The coefficients are:

    Intercept      Avg. Session Length    I(Avg. Session Length^2)
    334.6321065    -18.4748455            0.7090415
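Whether a squared term is worth keeping can be checked more formally than by eye. The sketch below compares a straight-line fit with a quadratic fit using an F-test for nested models. It uses simulated data, so the numbers say nothing about the e-commerce dataset.

```r
set.seed(2)
x <- runif(150, 30, 36)
# Simulated response with genuine curvature (hypothetical data)
y <- 300 - 18 * x + 0.7 * x^2 + rnorm(150, sd = 3)

linear    <- lm(y ~ x)
quadratic <- lm(y ~ x + I(x^2))

# F-test for the nested models: a small p-value favours keeping the squared term
cmp <- anova(linear, quadratic)
print(cmp)
```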

Linear regression model of Yearly Amount Spent on Time on App:

The following code fits the regression line:

    m2 = lm(`Yearly Amount Spent` ~ `Time on App`, data = DTC)
    m2

The coefficients are:

    Intercept    Slope
    19.21        39.83

This indicates that for each one-unit increase in Time on App, Yearly Amount Spent increases by
39.83 on average.

The code for the plot is:

    plot(DTC$`Time on App`, DTC$`Yearly Amount Spent`,
         main = "Yearly Amount vs Time on App", xlab = "Time on App",
         ylab = "Yearly Amount")
    abline(lm(`Yearly Amount Spent` ~ `Time on App`, data = DTC), col = "blue")

The result is:

This relationship seems to be nonlinear, so the model after a suitable transformation (adding a
squared term) is:

    DTCSqrModel2 = lm(`Yearly Amount Spent` ~ `Time on App`
                      + I(`Time on App`^2), data = DTC)
    coef(DTCSqrModel2)

The coefficients are:

    Intercept     Time on App    I(Time on App^2)
    217.111103    6.646338       1.381877

Linear regression model of Yearly Amount Spent on Time on Website:

The following code fits the regression line:

    m3 = lm(`Yearly Amount Spent` ~ `Time on Website`, data = DTC)
    m3

The coefficients are:

    Intercept    Slope
    506.9961     -0.2073

This indicates that for each one-unit increase in Time on Website, Yearly Amount Spent decreases
by 0.2073 on average.

The code for the plot is:

    plot(DTC$`Time on Website`, DTC$`Yearly Amount Spent`,
         main = "Yearly Amount vs Time on Website", xlab = "Time on Website",
         ylab = "Yearly Amount")
    abline(lm(`Yearly Amount Spent` ~ `Time on Website`, data = DTC), col = "blue")

The result is:

This relationship seems to be nonlinear, so the model after a suitable transformation (adding a
squared term) is:

    DTCSqrModel3 = lm(`Yearly Amount Spent` ~ `Time on Website`
                      + I(`Time on Website`^2), data = DTC)
    coef(DTCSqrModel3)

The coefficients are:

    Intercept      Time on Website    I(Time on Website^2)
    2289.685154    -96.467231         1.298474

Linear regression model of Yearly Amount Spent on Length of Membership:

The following code fits the regression line:

    m4 = lm(`Yearly Amount Spent` ~ `Length of Membership`, data = DTC)
    m4

The coefficients are:

    Intercept    Slope
    272.40       64.22

This indicates that for each one-unit increase in Length of Membership, Yearly Amount Spent
increases by 64.22 on average.

The code for the plot is:

    plot(DTC$`Length of Membership`, DTC$`Yearly Amount Spent`,
         main = "Yearly Amount vs Length of Membership", xlab = "Length of Membership",
         ylab = "Yearly Amount")
    abline(lm(`Yearly Amount Spent` ~ `Length of Membership`, data = DTC), col = "blue")

The result is:

This relationship seems to be linear.

Diagnostics for E-commerce Dataset
To run diagnostics on the linear regression model (model1) in R, we will use residual analysis
(a residuals vs fitted values plot), a histogram and a quantile-quantile (QQ) plot to check the
major assumptions of the linear model.

The residuals vs fitted plot:

The code in R is:

    model1Resids = model1$residuals
    model1Resids
    model1fitted = model1$fitted.values
    model1fitted
    plot(model1fitted, model1Resids, main = "Scatter Plot", xlab = "Fitted Values",
         ylab = "Residuals")

The resulting plot is:

Figure 1 (Residuals vs Fitted Values)

The homoscedasticity assumption is violated. The variance (i.e. spread) of the residuals decreases as
the predicted values increase. Further, the model as constructed is having a hard time predicting
certain observations in the data, leading to outliers.
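A visual reading of the residual plot can be backed up with a numeric check. The sketch below is a minimal base-R version of a studentized Breusch-Pagan-style test on simulated data whose noise is deliberately non-constant. It is an illustration of the idea, not a result for model1.

```r
set.seed(3)
n <- 300
x <- runif(n, 1, 10)
# Simulated data whose noise grows with x (deliberately heteroscedastic)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor
aux <- lm(resid(fit)^2 ~ x)

# Studentized (Koenker) Breusch-Pagan statistic: n * R^2,
# compared with a chi-squared distribution with 1 degree of freedom here
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
cat("statistic:", bp_stat, " p-value:", bp_p, "\n")
```

A small p-value suggests that the residual variance changes with the predictor, which is what a funnel shape in the residuals vs fitted plot indicates.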

Histogram:

The code in R is:

    hist(model1Resids, main = "Histogram of Residuals", xlab = "Residuals",
         ylab = "Frequency")

The result is:

Figure 2 (Histogram)

The histogram of the residuals might appear normal at first glance, but it is actually long-tailed.
A long-tailed distribution generates outliers more often than a normal distribution does. This
suggests that our residuals are not normally distributed.

Quantile-Quantile plot:

The code in R is:

    qqnorm(model1Resids)
    qqline(model1Resids)   # reference line for comparison

The result is:

The QQ plot suggests the same as the histogram. If the residuals were normal, the points would
fall close to a straight line. Instead, we see the effects of the outliers.
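To see what a long-tailed QQ plot looks like next to a well-behaved one, the sketch below draws both from simulated samples and adds a Shapiro-Wilk test as a numeric complement. The samples are artificial and stand in for residuals.

```r
set.seed(4)
normal_res <- rnorm(500)        # well-behaved residuals
heavy_res  <- rt(500, df = 3)   # long-tailed residuals (Student t, 3 df)

# QQ plots with a reference line; heavy tails bend away from the line at both ends
op <- par(mfrow = c(1, 2))
qqnorm(normal_res, main = "Normal"); qqline(normal_res)
qqnorm(heavy_res, main = "Long-tailed"); qqline(heavy_res)
par(op)

# Shapiro-Wilk test: small p-values indicate departure from normality
p_normal <- shapiro.test(normal_res)$p.value
p_heavy  <- shapiro.test(heavy_res)$p.value
print(c(normal = p_normal, heavy = p_heavy))
```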

Model Selection for E-commerce Dataset
Now, we will consider the problem of choosing among several different models.

First, we train a linear model with response Yearly Amount Spent and a single predictor, Avg.
Session Length, using 10-fold cross-validation.

The code is:

    library(stargazer)
    library(caret)
    library(leaps)
    set.seed(12)

    small_model = train(form = `Yearly Amount Spent` ~ `Avg. Session Length`, data = DTC,
                        method = "lm", trControl = trainControl(method = "cv", number = 10))
    small_model

The resampling results are:

    RMSE        R squared    MAE
    74.01713    0.1314221    57.38752
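For readers unfamiliar with what 10-fold cross-validation does behind the scenes, the sketch below carries it out by hand in base R. It uses simulated data with made-up parameters rather than DTC, and it is not caret's internal code, only the same idea.

```r
set.seed(12)
n <- 500
# Simulated predictor and response (hypothetical; not the e-commerce data)
x <- rnorm(n, mean = 33, sd = 1)
y <- -400 + 27 * x + rnorm(n, sd = 70)
sim <- data.frame(x, y)

k <- 10
# Randomly assign each row to one of k folds of equal size
fold <- sample(rep(1:k, length.out = n))

rmse_per_fold <- sapply(1:k, function(i) {
  train_set <- sim[fold != i, ]
  test_set  <- sim[fold == i, ]
  fit  <- lm(y ~ x, data = train_set)        # fit on the other k-1 folds
  pred <- predict(fit, newdata = test_set)   # predict the held-out fold
  sqrt(mean((test_set$y - pred)^2))
})

# The cross-validated RMSE is the average of the per-fold values
print(mean(rmse_per_fold))
```

Because every observation is predicted by a model that never saw it, the averaged error is a fairer estimate of out-of-sample performance than the error on the training data.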

Next, we train a linear model with response Yearly Amount Spent and every other variable
as a predictor, using 10-fold cross-validation.

The code is:

    Model2 = train(form = `Yearly Amount Spent` ~ `Avg. Session Length` +
                   `Time on App` + `Time on Website` + `Length of Membership`,
                   data = DTC, method = "lm",
                   trControl = trainControl(method = "cv", number = 10))
    Model2

The resampling results are:

    RMSE        R squared    MAE
    9.963529    0.9832007    7.955738

Next, we will use best subset selection for Yearly Amount Spent and plot the result on the
adjusted R-squared ("adjr2") scale.

The code is:

    subsetmodel = regsubsets(`Yearly Amount Spent` ~ `Avg. Session Length` +
                             `Time on App` + `Time on Website` + `Length of Membership`,
                             data = DTC, nvmax = 5)
    plot(subsetmodel, scale = "adjr2")
    summary(subsetmodel)

    bestM = which.max(summary(subsetmodel)$adjr2)
    bestM

The plot is:

Since bestM is 3, the best subset contains three predictors. The code for the new model is:

    ModelN = train(form = `Yearly Amount Spent` ~ `Avg. Session Length` +
                   `Time on App` + `Length of Membership`,
                   data = DTC, method = "lm",
                   trControl = trainControl(method = "cv", number = 10))
    ModelN

The resampling results are:

    RMSE        R squared    MAE
    9.950781    0.9841474    7.938899

The RMSE and MAE are slightly lower than for the previous model (9.950781 vs 9.963529 and
7.938899 vs 7.955738), so this is the better model.
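The adjusted R-squared criterion used in the best subset step penalizes extra predictors, which is why a model with fewer variables can come out ahead. The sketch below computes it by hand on simulated data. The helper `adj_r2` and the variable `noise_var` are hypothetical names introduced for this illustration.

```r
set.seed(5)
n <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
noise_var <- rnorm(n)                 # predictor unrelated to the response
y <- 1 + 2 * x1 - x2 + rnorm(n)

adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- length(resid(fit))
  p  <- length(coef(fit)) - 1         # number of predictors, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

small <- lm(y ~ x1 + x2)
big   <- lm(y ~ x1 + x2 + noise_var)

# Plain R^2 never decreases when a predictor is added; adjusted R^2 can
print(c(r2_small = summary(small)$r.squared, r2_big = summary(big)$r.squared))
print(c(adj_small = adj_r2(small), adj_big = adj_r2(big)))
```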

Predictions and Summary
To calculate predicted values, we will use the following code:

    predicted_values = predict(ModelN, newdata = DTC)
    predicted_values

Furthermore, we can compare these with the actual values using this code:

    comparison = data.frame(Actual = DTC$`Yearly Amount Spent`,
                            Predicted = predicted_values)

Thus, from the results we observe:

• The predicted values are the model's estimated responses for the provided data points.
• RMSE (root mean squared error) indicates the typical size of the prediction error, in the
units of the response variable; lower values mean more accurate predictions.
• R-squared measures the proportion of the variance in the dependent variable that is
predictable from the independent variables.
• MAE (mean absolute error) is the average of the absolute errors between predicted and observed
values.
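The three metrics listed above can be computed directly from a pair of actual and predicted vectors. The sketch below uses five made-up values. Taking R-squared as the squared correlation between actual and predicted values is an assumption about how caret's default summary defines it.

```r
# Hypothetical actual and predicted values, for illustration only
actual    <- c(500, 450, 620, 580, 410)
predicted <- c(510, 440, 600, 590, 430)

err  <- actual - predicted
rmse <- sqrt(mean(err^2))           # root mean squared error
mae  <- mean(abs(err))              # mean absolute error
r2   <- cor(actual, predicted)^2    # squared correlation (assumed caret default)

print(c(RMSE = rmse, Rsquared = r2, MAE = mae))
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest.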
