
Simple$MultipleLinearRegression RobinTeotia

The document discusses using simple and multiple linear regression models to analyze a dataset containing information on 50 startups in order to determine which factors like R&D spending, administration costs, and marketing spending most impact profit levels and provide recommendations to a startup founder on where to focus investments. Simple linear regression found marketing spending alone did not sufficiently explain profit variation, while multiple linear regression identified R&D spending as promising for increasing profit but found administration spending was not advisable.

Uploaded by

hani.sharma324
Copyright
© © All Rights Reserved

Simple and Multiple Linear Regression

BUSI650-Robin Teotia
BUSINESS PROBLEM
Your Pointy-Haired Boss is going to launch a new startup in the USA.
He wants you (a Strategic Business Analyst) to carry out data-modeling
research that gives clear insight into which business domain he should
prioritize in his financial investment plan to earn a good profit.

He also gave you hints about the domains he is planning to invest in,
such as research and marketing.

You are Captain Sparrow, a Strategic Business Analyst with a
graduate degree in business analysis from UCW. You like taking on
new challenges with data and getting messy with it. You took
responsibility for the data-modeling research from your pointy-
haired boss and started working on it.

As a result of your research, you found a dataset of 50 startups in
the USA, and the journey of modeling begins…
This dataset contains data collected in the USA on 50 business startups.
• 4 features
• All data are quantitative

Target/Dependent Variable
• Profit

Predictor/Independent Variables
• R&D Spend,
• Administration, and
• Marketing Spend
Visualization and Exploratory Data Analysis
Correlation Coefficient
• a number between −1 and +1 calculated so
as to represent the linear dependence of
two variables or sets of data.
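
The correlation coefficient described above can be computed by hand. The following is a minimal sketch in plain Python; the function name `pearson_r` and the sample values are hypothetical and are not taken from the 50-startups dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and the two sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only
marketing = [10.0, 20.0, 30.0, 40.0, 50.0]
profit = [12.0, 19.0, 33.0, 38.0, 52.0]
r = pearson_r(marketing, profit)
print(f"correlation: {r:.3f}")
```

A value close to +1 or −1 indicates a strong linear relationship; a value near 0 indicates a weak one.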
Data Visualization…
Simple Linear Regression Model…
• A straight-line model with one independent variable is called a
first-order linear model or a simple linear regression model.

Estimating the coefficients…
• We estimate β0 by b0 and β1 by b1, the y-intercept and slope of the
least-squares (regression) line given by:
• ŷ = b0 + b1x
• Note: this is an application of the least-squares method, which
produces the straight line that minimizes the sum of the squared
differences between the points and the line.

Profit = β0 + β1 × Marketing Spend + ε

Estimation of the parameters

Profit = 6003.55 + 0.25 × Marketing Spend

Thus, for a one-unit ($1) increase in marketing spend, profit
increases by $0.25 on average.
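
The least-squares estimates b0 and b1 have a closed form for the one-predictor case. The sketch below illustrates it in plain Python; the function name `fit_simple_ols` and the data are hypothetical (points placed on an exact line), so it does not reproduce the notebook's fit.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through the means
    return b0, b1

# Hypothetical data lying exactly on y = 5 + 0.25 * x
x = [0.0, 100.0, 200.0, 300.0]
y = [5.0, 30.0, 55.0, 80.0]
b0, b1 = fit_simple_ols(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

The slope b1 is read the same way as in the slide: the average change in the target for a one-unit increase in the predictor.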


Multiple Linear Regression Model…
• a generalization of simple linear regression,
• it makes it possible to relate one (dependent) variable to several
(independent) variables through a function that is linear in its
parameters.

• Estimation of the parameters β0,…,βp by the method of least squares
is based on the same principle as in simple linear regression, but
applied to p dimensions.
• Instead of finding the best-fitting line, we find the p-dimensional
plane that passes closest to the coordinate points (yi,xi1,…,xip).

We interpret βj as the average effect on Y of a one unit increase in Xj , holding all other predictors fixed.

Profit = β0 + β1 × R&D Spend + β2 × Administration + β3 × Marketing Spend + ε

Estimation of the parameters

Profit = 50122.19 + 0.80 × R&D Spend - 0.03 × Administration + 0.03 × Marketing Spend
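
With several predictors, the same least-squares idea can be sketched with NumPy. The helper name `fit_ols` and the synthetic, noise-free data are assumptions made for illustration; the coefficients are chosen to resemble the slide but the data are not the 50-startups file.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y ~ 1 + X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic predictors standing in for R&D, administration and marketing spend
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))
# Noise-free target built from known (hypothetical) coefficients
y = 10.0 + 0.8 * X[:, 0] - 0.03 * X[:, 1] + 0.03 * X[:, 2]
coef = fit_ols(X, y)
print(np.round(coef, 2))
```

Because the target has no noise, the fit recovers the coefficients used to build it; on real data the estimates carry sampling error.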


Performance of the regression model

Coefficient of determination (R2)
The R2 value is the statistical measure of the percentage of variance
in the dependent variable around the mean that is explained by the
model.
R-squared values range from 0 to 1 and are commonly stated as
percentages from 0% to 100%.

R2 = 1 - RSS / TSS
RSS = Σ (yi - ŷi)²   (sum of squares of residuals)
TSS = Σ (yi - ȳ)²    (total sum of squares)

Mean squared error (MSE)
The MSE measures how far the predictions are from the actual values on
average, as the mean of the squared differences.

MSE = (1/n) Σ (yi - ŷi)²

n = number of observations
yi = actual value
ŷi = predicted value
ȳ = mean of the actual values

Results
Simple linear regression (Marketing Spend only)
R-squared: 55.92%
Mean Squared Error: 701870011.19 (about 701 million)

Multiple linear regression
R-squared: 95.07%
Mean Squared Error: 78417126.01 (about 78 million)
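
Both metrics follow directly from the residuals. The following sketch computes them in plain Python from RSS, TSS and n as defined above; the function names and the four sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """R2 = 1 - RSS / TSS."""
    mean_y = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    tss = sum((a - mean_y) ** 2 for a in actual)                # total sum of squares
    return 1 - rss / tss

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
r2 = r_squared(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(f"R2 = {r2:.2f}, MSE = {mse:.2f}")
```

Note that MSE is expressed in squared units of the target, which is why the values reported for profit are so large.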
You prepare your report after modelling the data from the 50 startups and
present your findings to your Pointy-Haired Boss.

You presented:
1. Spending on marketing alone would not be beneficial (from the simple
linear regression, R2 is around 56%, which means the model explains only
56% of the variance in the dependent variable (Profit) around the mean).
2. This means there are other factors/features that contribute to
the profit.
3. Spending on R&D is promising for increasing profit (since its p-value
is statistically significant).
4. Spending on administration is not advisable (its p-value is very high,
so we fail to reject the null hypothesis that the slope of
administration is 0).
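
The slope tests behind points 3 and 4 can be sketched as follows. This is an illustration under stated assumptions: the function name `slope_tests` and the synthetic data are hypothetical, and the p-values use a normal approximation to the t distribution (statistical packages report exact t-based values, so numbers will differ slightly for small samples).

```python
import math
import numpy as np

def slope_tests(X, y):
    """Return coefficients, t-statistics and approximate two-sided p-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]            # observations minus estimated parameters
    sigma2 = resid @ resid / dof         # residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))           # standard errors of the coefficients
    t_stats = coef / se                  # tests H0: coefficient = 0
    # Assumption: normal approximation, p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]
    return coef, t_stats, p_values

# Synthetic data: the first predictor matters, the second has a true slope of 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=200)
coef, t_stats, p_values = slope_tests(X, y)
for name, p in zip(["intercept", "x1", "x2"], p_values):
    print(f"{name}: p = {p:.4f}")
```

A small p-value leads to rejecting the null hypothesis that the slope is 0; a large one means the data give no evidence that the predictor is related to the target once the others are in the model.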
Your Own Thoughts:
• After seeing the above summary, you may think of
removing administration from the MLR model,
• and then fitting the MLR again to see how the
model performs.
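
One way to try this is to fit the model with and without the predictor and compare R2. The sketch below uses synthetic data in which the middle column has no effect on the target; the helper name `ols_r2` and all values are hypothetical, not results from the startups data.

```python
import numpy as np

def ols_r2(X, y):
    """Fit y ~ 1 + X by least squares and return R2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (resid @ resid) / tss

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
# Column 1 (standing in for administration) is not used to build the target
y = 5 + 0.8 * X[:, 0] + 0.03 * X[:, 2] + rng.normal(scale=0.1, size=50)

r2_full = ols_r2(X, y)                 # all three predictors
r2_reduced = ols_r2(X[:, [0, 2]], y)   # predictor in column 1 removed
print(f"full: {r2_full:.4f}, reduced: {r2_reduced:.4f}")
```

R2 never increases when a predictor is removed, so the question is whether the drop is negligible; if it is, the simpler model is usually preferred.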
Appendix…
Assumptions for Simple Linear Regression and Multiple Linear Regression

• Linearity of the relationships between the dependent and independent variables
• Independence of the observations
• Normality of the residuals
• Homoscedasticity of the residuals
• No influential points (outliers)

But there is one more condition for multiple linear regression:
• No multicollinearity: Multicollinearity arises when there is a strong
linear correlation between the independent variables, conditional on the other
variables in the model.
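
A common diagnostic for multicollinearity is the variance inflation factor (VIF), which is not part of the slides and is shown here only as an illustration. For each predictor j, regress it on the other predictors and compute VIF = 1 / (1 - R2_j). The function name `vif` and the synthetic columns are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors (with intercept)
        design = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors

rng = np.random.default_rng(3)
a = rng.normal(size=100)
b = rng.normal(size=100)                 # unrelated to a
c = a + 0.05 * rng.normal(size=100)      # nearly a copy of a
vifs = vif(np.column_stack([a, b, c]))
print([round(v, 1) for v in vifs])
```

Values near 1 suggest little collinearity; a frequently used rule of thumb treats values above roughly 5 to 10 as a warning sign.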
Appendix…

Link to the Google Colab notebook

Programming language used: Python

Link:
https://colab.research.google.com/drive/1GiiE1t9GfdCqGKYsTrcrgFfrrnBMZjjY?usp=sharing
Data:
raw.githubusercontent.com/RobinTeotia/Data-Science-Projects/main/50_Startups.csv
