0% found this document useful (0 votes)
16 views

Multivariate Linear Regression

The document shows code for analyzing a home price dataset using pandas and scikit-learn. It cleans the data by filling NaN values, then performs linear regression to predict home prices based on features like area, bedrooms, and age of home.

Uploaded by

jaymehta1444
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views

Multivariate Linear Regression

The document shows code for analyzing a home price dataset using pandas and scikit-learn. It cleans the data by filling NaN values, then performs linear regression to predict home prices based on features like area, bedrooms, and age of home.

Uploaded by

jaymehta1444
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

3/10/24, 4:35 PM MlYtLec3.

ipynb - Colaboratory

keyboard_arrow_down Yt lec
import pandas as pd
import numpy as np
import math
from sklearn import linear_model

df = pd.read_csv("homeprices.csv")
df

area bedrooms age price

0 2600 3.0 20 550000

1 3000 4.0 15 565000

2 3200 NaN 18 610000

3 3600 3.0 30 595000

4 4000 5.0 8 760000

5 4100 6.0 8 810000

Next steps: Generate code with df


toggle_off View recommended plots

med = math.floor(df.bedrooms.median()) # finding the median of the col bedrooms


med

df.bedrooms = df.bedrooms.fillna(med) # filling all the NaN values with the median
df

area bedrooms age price

0 2600 3.0 20 550000

1 3000 4.0 15 565000

2 3200 4.0 18 610000

3 3600 3.0 30 595000

4 4000 5.0 8 760000

5 4100 6.0 8 810000

https://colab.research.google.com/drive/1mWjym7fs-JIEhrgZV1pu0Ey6ErZKC5EL#scrollTo=79BoMxPt-T8E&printMode=true 1/5
3/10/24, 4:35 PM MlYtLec3.ipynb - Colaboratory

Next steps: Generate code with df


toggle_off View recommended plots

reg = linear_model.LinearRegression()
reg.fit(df[['area','bedrooms', 'age']], df.price)

▾ LinearRegression
LinearRegression()

reg.coef_

array([ 112.06244194, 23388.88007794, -3231.71790863])

reg.intercept_

221323.00186540396

reg.predict([[3000,3,40]])

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
array([498408.25158031])

reg.predict([[2500,4,5]])

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
array([578876.03748933])

keyboard_arrow_down Exercise
import pandas as pd
import numpy as np
import math
from sklearn import linear_model

df1 = pd.read_csv("hiring.csv")
df1

https://colab.research.google.com/drive/1mWjym7fs-JIEhrgZV1pu0Ey6ErZKC5EL#scrollTo=79BoMxPt-T8E&printMode=true 2/5
3/10/24, 4:35 PM MlYtLec3.ipynb - Colaboratory

experience test_score(out of 10) interview_score(out of 10) salary($)

0 NaN 8.0 9 50000

1 NaN 8.0 6 45000

2 five 6.0 7 60000

3 two 10.0 10 65000

4 seven 9.0 6 70000

5 three 7.0 10 62000

6 ten NaN 7 72000

7 eleven 7.0 8 80000

Next steps: Generate code with df1


toggle_off View recommended plots

df1.experience = df1.experience.fillna("zero") # we replace all the NaN values in 'experience' col with 'zero'
df1

experience test_score(out of 10) interview_score(out of 10) salary($)

0 zero 8.0 9 50000

1 zero 8.0 6 45000

2 five 6.0 7 60000

3 two 10.0 10 65000

4 seven 9.0 6 70000

5 three 7.0 10 62000

6 ten NaN 7 72000

7 eleven 7.0 8 80000

Next steps: Generate code with df1


toggle_off View recommended plots

pip install word2number

Requirement already satisfied: word2number in /usr/local/lib/python3.10/dist-packages (1.1)

from word2number import w2n

https://colab.research.google.com/drive/1mWjym7fs-JIEhrgZV1pu0Ey6ErZKC5EL#scrollTo=79BoMxPt-T8E&printMode=true 3/5
3/10/24, 4:35 PM MlYtLec3.ipynb - Colaboratory
df1.experience = df1.experience.apply(w2n.word_to_num) # .apply() applies a function to arguments. w2n.word_to_num converts every text number into numeric value
df1

experience test_score(out of 10) interview_score(out of 10) salary($)

0 0 8.0 9 50000

1 0 8.0 6 45000

2 5 6.0 7 60000

3 2 10.0 10 65000

4 7 9.0 6 70000

5 3 7.0 10 62000

6 10 NaN 7 72000

7 11 7.0 8 80000

Next steps: Generate code with df1


toggle_off View recommended plots

mean = math.floor(df1['test_score(out of 10)'].mean()) # finding out the mean of all the values present in the 'test_score' col
print("The mean is :",mean)

df1['test_score(out of 10)'] = df1['test_score(out of 10)'].fillna(mean) # replacing the NaN values with the mean
df1

The mean is : 7
experience test_score(out of 10) interview_score(out of 10) salary($)

0 0 8.0 9 50000

1 0 8.0 6 45000

2 5 6.0 7 60000

3 2 10.0 10 65000

4 7 9.0 6 70000

5 3 7.0 10 62000

6 10 7.0 7 72000

7 11 7.0 8 80000

Next steps: Generate code with df1


toggle_off View recommended plots

reg1 = linear_model.LinearRegression()
reg1.fit(df1[['experience','test_score(out of 10)','interview_score(out of 10)']], df1['salary($)'])

https://colab.research.google.com/drive/1mWjym7fs-JIEhrgZV1pu0Ey6ErZKC5EL#scrollTo=79BoMxPt-T8E&printMode=true 4/5
3/10/24, 4:35 PM MlYtLec3.ipynb - Colaboratory

▾ LinearRegression
LinearRegression()

reg1.coef_

array([2922.26901502, 2221.30909959, 2147.48256637])

reg1.intercept_

14992.65144669314

reg1.predict([[2,9,6]])

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
array([53713.86677124])

reg1.predict([[12,10,10]])

/usr/local/lib/python3.10/dist-packages/sklearn/base.py:439: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
warnings.warn(
array([93747.79628651])

https://colab.research.google.com/drive/1mWjym7fs-JIEhrgZV1pu0Ey6ErZKC5EL#scrollTo=79BoMxPt-T8E&printMode=true 5/5

You might also like