Multiple Linear Regression

In the previous topic, we learned about Simple Linear Regression, where a single
independent/predictor variable (X) is used to model the response variable (Y). But in many
cases the response variable is affected by more than one predictor variable; for such cases,
the Multiple Linear Regression algorithm is used.
Multiple Linear Regression is therefore an extension of Simple Linear Regression: it uses
more than one predictor variable to predict the response variable. We can define it as:
Multiple Linear Regression is one of the important regression algorithms which models the
linear relationship between a single dependent continuous variable and more than one
independent variable.
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
Some key points about MLR:
o For MLR, the dependent or target variable (Y) must be continuous/real-valued, but the
predictor or independent variables may be continuous or categorical.
o Each feature variable should have a linear relationship with the dependent variable.
o MLR fits a regression hyperplane (rather than a single line) through a multidimensional
space of data points.
MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple
predictor variables x1, x2, x3, ..., xn. Extending the Simple Linear Regression equation to n
predictors gives:
$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n \qquad \text{(a)}$$
Where,
Y = Output/Response variable
b0 = Intercept of the model
b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, ..., xn = Independent/feature variables
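To make equation (a) concrete, here is a minimal sketch that evaluates it for several observations at once with NumPy. The coefficient values and observations below are hypothetical, chosen only so the arithmetic is easy to follow; each row of X is one observation and each column one predictor.

```python
import numpy as np

# Hypothetical coefficients for a model with three predictors (n = 3)
b0 = 2.0                        # intercept
b = np.array([0.5, -1.0, 3.0])  # b1, b2, b3

# Two hypothetical observations; columns are x1, x2, x3
X = np.array([[1.0, 2.0, 0.5],
              [4.0, 0.0, 1.0]])

# Equation (a): Y = b0 + b1*x1 + b2*x2 + b3*x3, evaluated for every row at once
Y = b0 + X @ b
print(Y)  # [2. 7.]
```

The matrix product X @ b computes b1*x1 + b2*x2 + b3*x3 for each row, so adding b0 gives exactly the right-hand side of equation (a).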
Assumptions for Multiple Linear Regression:
o A linear relationship should exist between the Target and predictor variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the independent
variables) in the data.
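One common way to check the multicollinearity assumption (not part of the original tutorial) is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below is a hypothetical helper written with plain NumPy least squares and tried on synthetic data in which x3 is almost a copy of x1; a frequently quoted rule of thumb treats VIF values above about 5 or 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = observations).

    Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing column j on all other columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add an intercept column
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: x3 is nearly identical to x1, x2 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

v = vif(X)
print(v)  # x1 and x3 get very large VIFs; x2 stays close to 1
```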
Implementation of Multiple Linear Regression model using Python:
To implement MLR in Python, we will work through the following problem:
Problem Description:
We have a dataset of 50 start-up companies. It contains five columns: R&D Spend,
Administration Spend, Marketing Spend, State, and Profit for a financial year. Our goal is to
create a model that can estimate which company is likely to make the highest profit, and which
factor affects a company's profit the most.
Since we need to predict Profit, it is the dependent variable, and the other four variables are
independent variables. Below are the main steps of building the MLR model:
1. Data Pre-processing Steps
2. Fitting the MLR model to the training set
3. Predicting the result of the test set
Step-1: Data Pre-processing Step:
The very first step is data pre-processing, which we have already discussed in this tutorial. This
process contains the steps below:
o Importing libraries: First, we import the libraries that will help in building the
model. Below is the code for it:
    # importing libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
o Importing dataset: Now we will import the dataset (50_CompList), which contains all
the variables. Below is the code for it:
    # importing datasets
    data_set = pd.read_csv('50_CompList.csv')
Output: The loaded dataset contains five variables, of which four are continuous (R&D Spend,
Administration Spend, Marketing Spend, and Profit) and one (State) is categorical.
o Extracting dependent and independent variables:
    # Extracting independent and dependent variables
    x = data_set.iloc[:, :-1].values
    y = data_set.iloc[:, 4].values
Output:
Out[5]:
array([[165349.2, 136897.8, 471784.1, 'New York'],
[162597.7, 151377.59, 443898.53, 'California'],
[153441.51, 101145.55, 407934.54, 'Florida'],
[144372.41, 118671.85, 383199.62, 'New York'],
[142107.34, 91391.77, 366168.42, 'Florida'],
[131876.9, 99814.71, 362861.36, 'New York'],
[134615.46, 147198.87, 127716.82, 'California'],
[130298.13, 145530.06, 323876.68, 'Florida'],
[120542.52, 148718.95, 311613.29, 'New York'],
[123334.88, 108679.17, 304981.62, 'California'],
[101913.08, 110594.11, 229160.95, 'Florida'],
[100671.96, 91790.61, 249744.55, 'California'],
[93863.75, 127320.38, 249839.44, 'Florida'],
[91992.39, 135495.07, 252664.93, 'California'],
[119943.24, 156547.42, 256512.92, 'Florida'],
[114523.61, 122616.84, 261776.23, 'New York'],
[78013.11, 121597.55, 264346.06, 'California'],
[94657.16, 145077.58, 282574.31, 'New York'],
[91749.16, 114175.79, 294919.57, 'Florida'],
[86419.7, 153514.11, 0.0, 'New York'],
[76253.86, 113867.3, 298664.47, 'California'],
[78389.47, 153773.43, 299737.29, 'New York'],
[73994.56, 122782.75, 303319.26, 'Florida'],
[67532.53, 105751.03, 304768.73, 'Florida'],
[77044.01, 99281.34, 140574.81, 'New York'],
[64664.71, 139553.16, 137962.62, 'California'],
[75328.87, 144135.98, 134050.07, 'Florida'],
[72107.6, 127864.55, 353183.81, 'New York'],
[66051.52, 182645.56, 118148.2, 'Florida'],
[65605.48, 153032.06, 107138.38, 'New York'],
[61994.48, 115641.28, 91131.24, 'Florida'],
[61136.38, 152701.92, 88218.23, 'New York'],
[63408.86, 129219.61, 46085.25, 'California'],
[55493.95, 103057.49, 214634.81, 'Florida'],
[46426.07, 157693.92, 210797.67, 'California'],
[46014.02, 85047.44, 205517.64, 'New York'],
[28663.76, 127056.21, 201126.82, 'Florida'],
[44069.95, 51283.14, 197029.42, 'California'],
[20229.59, 65947.93, 185265.1, 'New York'],
[38558.51, 82982.09, 174999.3, 'California'],
[28754.33, 118546.05, 172795.67, 'California'],
[27892.92, 84710.77, 164470.71, 'Florida'],
[23640.93, 96189.63, 148001.11, 'California'],
[15505.73, 127382.3, 35534.17, 'New York'],
[22177.74, 154806.14, 28334.72, 'California'],
[1000.23, 124153.04, 1903.93, 'New York'],
[1315.46, 115816.21, 297114.46, 'Florida'],
[0.0, 135426.92, 0.0, 'California'],
[542.05, 51743.15, 0.0, 'New York'],
[0.0, 116983.8, 45173.06, 'California']], dtype=object)
As we can see in the above output, the last column contains categorical values, which cannot be
used directly to fit the model, so we need to encode this variable.
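The extraction step can be tried without the CSV file. The sketch below builds a small hypothetical DataFrame with the same column layout as 50_CompList.csv (the numbers are made up, not the real dataset) and applies the same iloc slicing. Note that x comes out as an object array because the State column holds strings, which is why the encoding step is needed.

```python
import pandas as pd

# Hypothetical stand-in for 50_CompList.csv with the same column layout
# (values are made up for illustration; they are not the real dataset)
data_set = pd.DataFrame({
    "R&D Spend": [1000.0, 2000.0, 3000.0],
    "Administration Spend": [500.0, 600.0, 700.0],
    "Marketing Spend": [100.0, 0.0, 300.0],
    "State": ["New York", "California", "Florida"],
    "Profit": [5000.0, 7000.0, 9000.0],
})

# All columns except the last -> independent variables
x = data_set.iloc[:, :-1].values
# Column index 4 (Profit) -> dependent variable
y = data_set.iloc[:, 4].values

print(x.shape, x.dtype)  # (3, 4) object
print(y)                 # [5000. 7000. 9000.]
```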
Encoding Dummy Variables:
We have one categorical variable (State), which cannot be used directly in the model, so we
encode it as numbers. Label encoding alone (for example, with LabelEncoder) would map the
states to 0, 1 and 2, which imposes an order that does not exist and can produce a wrong model.
Instead, we use one-hot encoding, which creates one dummy (0/1) variable per state. In current
scikit-learn versions, OneHotEncoder accepts string categories directly, and the
categorical_features argument used in older scikit-learn code has been removed, so the State
column (index 3) is selected with ColumnTransformer. Below is the code for it:
    # Categorical data: one-hot encode the State column (index 3)
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder
    ct = ColumnTransformer([('state', OneHotEncoder(), [3])],
                           remainder='passthrough', sparse_threshold=0)
    # Dummy columns come first (California, Florida, New York), followed by
    # the three spend columns; convert everything to float for the model
    x = ct.fit_transform(x).astype(float)
Here we encode only one independent variable, State, since the other variables are
continuous.
Output: The State column has been converted into dummy variables (0 and 1), with one dummy
column per state. Comparing with the original dataset, the first column corresponds to
California, the second column to Florida, and the third column to New York.
Note: We should not include all the dummy variables at the same time; the number of dummy
columns kept must be one less than the number of categories. Otherwise the dummy columns
always sum to 1 and duplicate the intercept, a problem known as the dummy variable trap.
o Now, we write a single line of code to avoid the dummy variable trap:
    # avoiding the dummy variable trap:
    x = x[:, 1:]
If we do not remove the first dummy variable, the dummy columns together with the intercept
become perfectly collinear, which introduces multicollinearity into the model.
After this line, the first dummy column (California) has been removed.
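The dummy variable trap can be seen directly as a rank deficiency. In this sketch (hypothetical State values, not the real dataset), an intercept column plus all three dummy columns has rank 3 instead of 4, because the dummies always sum to the intercept column. Dropping the first dummy, as x[:, 1:] does, restores full rank. As an alternative not used in the original tutorial, OneHotEncoder's drop="first" option (available since scikit-learn 0.21) drops the first category itself.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical State values (categories sort as California, Florida, New York)
states = np.array([["New York"], ["California"], ["Florida"],
                   ["New York"], ["Florida"], ["California"]])

# All three dummy columns: each row sums to 1, duplicating the intercept column
full = OneHotEncoder().fit_transform(states).toarray()
with_intercept = np.column_stack([np.ones(len(full)), full])
print(np.linalg.matrix_rank(with_intercept))  # 3, not 4: perfectly collinear

# Dropping the first dummy (what x = x[:, 1:] does) removes the redundancy
reduced = np.column_stack([np.ones(len(full)), full[:, 1:]])
print(np.linalg.matrix_rank(reduced))  # 3: all three columns are independent

# OneHotEncoder can drop the first category itself
dropped = OneHotEncoder(drop="first").fit_transform(states).toarray()
print(np.allclose(dropped, full[:, 1:]))  # True
```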
o Now we will split the dataset into a training set and a test set. The code for this is given below:
    # Splitting the dataset into training and test set.
    from sklearn.model_selection import train_test_split
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
The above code splits the dataset into a training set (80% of the rows, i.e. 40 companies) and a
test set (20%, i.e. 10 companies). In the Spyder IDE, you can inspect both sets using the
Variable Explorer.

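To confirm the 40/10 split sizes without the real data, this sketch runs the same train_test_split call on hypothetical arrays shaped like the encoded start-up data (50 rows and 5 feature columns: 2 remaining dummy columns plus 3 spend columns). The values themselves are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 50 rows x 5 feature columns, like the encoded dataset
x = np.arange(250, dtype=float).reshape(50, 5)
y = np.arange(50, dtype=float)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(x_train.shape, x_test.shape)  # (40, 5) (10, 5)
```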
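Note: We do not apply feature scaling here. Ordinary least squares, which LinearRegression uses,
is not affected by the scale of the features: rescaling a feature only rescales its coefficient,
so the predictions stay the same.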
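A quick way to see why scaling is unnecessary for plain least squares is to fit the same synthetic data twice, once raw and once standardized, and compare predictions. The data below is synthetic (two features on very different scales), not the start-up dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
# Synthetic features on very different scales (hypothetical data)
X = np.column_stack([rng.uniform(0, 200_000, 100), rng.uniform(0, 1, 100)])
y = 3.0 + 0.002 * X[:, 0] + 50.0 * X[:, 1] + rng.normal(0, 1, 100)

raw = LinearRegression().fit(X, y)

# Standardize each column, then refit
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = LinearRegression().fit(X_scaled, y)

# The coefficients differ, but the predictions agree up to floating-point error
print(raw.coef_, scaled.coef_)
print(np.allclose(raw.predict(X), scaled.predict(X_scaled)))  # True
```

Scaling does matter for methods such as gradient descent or regularized regression, but not for the closed-form least-squares fit used here.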
Step-2: Fitting our MLR model to the Training set:
Now that the dataset is prepared, we fit our regression model to the training set, just as we
did for the Simple Linear Regression model. The code for this is:
    # Fitting the MLR model to the training set:
    from sklearn.linear_model import LinearRegression
    regressor = LinearRegression()
    regressor.fit(x_train, y_train)
Output:
Out[9]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Now, we have successfully trained our model using the training dataset. In the next step, we
will test the performance of the model using the test dataset.
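After fitting, the learned parameters of equation (a) are available on the model: intercept_ holds b0 and coef_ holds b1, ..., bn. This sketch fits a model to synthetic, noise-free data generated from Y = 10 + 2*x1 - 3*x2 (a hypothetical equation), recovers those coefficients, and checks that predict() is just b0 + b1*x1 + ... + bn*xn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from a known equation: Y = 10 + 2*x1 - 3*x2 (no noise)
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 10, size=(30, 2))
y_train = 10 + 2 * x_train[:, 0] - 3 * x_train[:, 1]

regressor = LinearRegression()
regressor.fit(x_train, y_train)

# intercept_ is b0; coef_ holds b1..bn
print(regressor.intercept_, regressor.coef_)  # approximately 10.0 and [ 2. -3.]

# predict() evaluates b0 + b1*x1 + ... + bn*xn for each row
manual = regressor.intercept_ + x_train @ regressor.coef_
print(np.allclose(manual, regressor.predict(x_train)))  # True
```

On the real start-up data, comparing the size of each coefficient is one starting point for judging which spend affects profit the most, although coefficients on different scales are not directly comparable.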
Step-3: Prediction of Test set results:
The last step is checking the performance of the model, which we do by predicting the test set
results. For prediction, we create a y_pred vector. Below is the code for it:
    # Predicting the Test set result
    y_pred = regressor.predict(x_test)
Executing the above code creates a new vector, y_pred, which appears in the Variable Explorer.
We can test our model by comparing the predicted values with the test set values.
Comparing the predicted results with the test set values index by index shows how well the
model performs. For example, the first index has a predicted profit of $103,015 and a real
(test) profit of $103,282. The difference is only $267, which is a good prediction; with this,
our model is complete.
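Instead of checking values one index at a time in the Variable Explorer, the predicted and real values can be placed side by side in a table. This sketch does that with pandas on synthetic data (two hypothetical predictors plus noise), using the same split settings as above; the column names predicted, actual and abs_diff are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): two predictors plus noise
rng = np.random.default_rng(7)
x = rng.uniform(0, 100, size=(50, 2))
y = 5 + 1.5 * x[:, 0] + 0.8 * x[:, 1] + rng.normal(0, 2, 50)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# Predicted and real values next to each other, with the absolute difference
comparison = pd.DataFrame({"predicted": y_pred, "actual": y_test,
                           "abs_diff": np.abs(y_pred - y_test)})
print(comparison.round(2))
```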
o We can also check the score for the training and test datasets. Below is the code for it:
    print('Train Score: ', regressor.score(x_train, y_train))
    print('Test Score: ', regressor.score(x_test, y_test))
Output: The score is:
Train Score: 0.9501847627493607
Test Score: 0.9347068473282446
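The score returned by LinearRegression is the coefficient of determination (R²), not accuracy.
These values mean the model explains about 95% of the variance in profit on the training
dataset and about 93% on the test dataset.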
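The R² value that score() reports can be computed by hand as 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations from the mean. This sketch uses a tiny hypothetical example and compares a hand-written helper with scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Small hypothetical example
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(r_squared(y_true, y_pred))  # 0.975
print(r2_score(y_true, y_pred))   # should match r_squared
```

Here SS_tot = 20 and SS_res = 0.5, so R² = 1 - 0.5/20 = 0.975.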
Note: In the next topic, we will see how we can improve the performance of the model using
the Backward Elimination process.
Applications of Multiple Linear Regression:
There are mainly two applications of Multiple Linear Regression:
o Measuring how strongly each independent variable affects the prediction
o Predicting the impact of changes in the independent variables
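Both applications follow from equation (a): in a linear model, raising one predictor x_j by some amount while holding the others fixed changes the prediction by exactly b_j times that amount. This sketch shows it with a model fitted to synthetic, noise-free data from the hypothetical equation Y = 4 + 0.7*x1 + 2.0*x2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (hypothetical): Y = 4 + 0.7*x1 + 2.0*x2, no noise
rng = np.random.default_rng(3)
X = rng.uniform(0, 50, size=(40, 2))
y = 4 + 0.7 * X[:, 0] + 2.0 * X[:, 1]
model = LinearRegression().fit(X, y)

# A hypothetical baseline observation, and the same observation with x1 raised by 10
baseline = np.array([[20.0, 5.0]])
changed = baseline.copy()
changed[0, 0] += 10.0

# In a linear model the change in prediction is b1 * 10
delta = model.predict(changed)[0] - model.predict(baseline)[0]
print(delta, 10.0 * model.coef_[0])  # both approximately 7.0
```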