Supervised Learning For Data Science...

This document discusses various supervised learning techniques including linear regression, multiple linear regression, and making predictions on new data. It loads data on tips and salaries, explores outliers, builds linear regression models to predict tip amounts and salaries based on features, evaluates the models, and makes predictions on new data using the multiple linear regression model. Feature engineering techniques like label encoding, one-hot encoding, and removing insignificant features are applied to build an optimal model.

Uploaded by

shivaybhargava33


Supervised Learning

Example Log Transformation

Import Libraries
import seaborn as sns
import numpy as np
import pandas as pd

Load Data Set and make a copy

tips = sns.load_dataset('tips')
# Use .copy() so the log transform below does not modify the original frame
tips1 = tips.copy()
tips1

Create Box plot to check outliers

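sns.boxplot(data=tips1, x='day', y='total_bill')

Create distribution plot

# distplot is deprecated in recent seaborn releases; histplot with kde=True is the replacement
sns.histplot(tips1['total_bill'], kde=True)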

Apply log Transformation to address outliers

tips1['total_bill'] = np.log10(tips1['total_bill'])
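
The effect of the log transformation can be illustrated on a few made-up numbers. The sketch below uses hypothetical bill amounts (not values from the tips data set) to show how log10 pulls a large outlier closer to the rest of the data, and that the transform can be undone with a power of 10.

```python
import numpy as np

# Hypothetical bill amounts with one large outlier (not from the tips data set)
bills = np.array([10.0, 12.0, 15.0, 20.0, 200.0])

log_bills = np.log10(bills)

# Ratio of the largest to the smallest value before and after the transform
raw_spread = bills.max() / bills.min()
log_spread = log_bills.max() / log_bills.min()

# The transform is invertible: 10 ** log10(x) recovers x
recovered = 10 ** log_bills

print(raw_spread, log_spread)
print(np.allclose(recovered, bills))
```

Because the transform is invertible, values saved after the transformation can be converted back to the original scale when needed.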

Create box plot and check outliers again

sns.boxplot(data=tips1, x='day', y='total_bill')

Create distribution plot

sns.histplot(tips1['total_bill'], kde=True)

Save the result in .xlsx

tips1.to_excel('C:\\Noble\\Training\\DS Temporary Files\\tips.xlsx')

Simple Linear Regression
Import the Libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

Load the Data Set

os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')

os.getcwd()

df1= pd.read_csv('Salary_Data.csv')
print (df1)

Create the graph to check the trend

plt.plot(df1["YearsExperience"], df1["Salary"])
plt.show()

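Split the data into x and y - Independent and Dependent variables

# All columns except the last are features; .values returns NumPy arrays
x = df1.iloc[:, :-1].values
print(x)
# The column at index 1 (Salary) is the target
y = df1.iloc[:, 1].values
print(y)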

Split the Data – Train Test split

from sklearn.model_selection import train_test_split
x_train, x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)
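
Without a fixed seed, each call to train_test_split produces a different random split, so the model coefficients and scores change between runs. A minimal sketch on made-up arrays (the names x_demo and y_demo are illustrative) shows how passing random_state makes the split repeatable:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with a single feature
x_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2

# Two calls with the same random_state return identical splits
a_train, a_test, _, _ = train_test_split(x_demo, y_demo, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(x_demo, y_demo, test_size=0.2, random_state=42)

same_split = np.array_equal(a_test, b_test)
print(a_train.shape, a_test.shape, same_split)
```

With test_size=0.2, two of the ten rows go to the test set and eight to the training set.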

Model fitting
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x_train, y_train)

Prediction
y_pred= reg.predict(x_test)
print (y_pred)

y = mx + c (Coefficient and Intercept Values)

m = slope (coefficient), c = intercept
from sklearn.metrics import r2_score
print('Coefficient', reg.coef_)
print('Intercept', reg.intercept_)
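
The relationship y = mx + c can be checked directly against a fitted model. The sketch below fits a model on made-up points that lie exactly on y = 3x + 5 (hypothetical data, not the salary file) and rebuilds a prediction by hand from coef_ and intercept_:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical points lying exactly on the line y = 3x + 5
x_demo = np.array([[1.0], [2.0], [3.0], [4.0]])
y_demo = 3.0 * x_demo.ravel() + 5.0

demo_reg = LinearRegression().fit(x_demo, y_demo)
m = demo_reg.coef_[0]       # slope
c = demo_reg.intercept_     # intercept

# Rebuild the prediction by hand and compare with predict()
new_x = np.array([[10.0]])
manual = m * new_x.ravel() + c
print(m, c, manual, demo_reg.predict(new_x))
```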
Accuracy of the model (R² score)
r2_score(y_test, y_pred)
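
The value returned by r2_score is the coefficient of determination, R² = 1 - SS_res / SS_tot, rather than a classification-style accuracy. A small sketch with made-up actual and predicted values reproduces it by hand:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))
```

A score of 1 means the predictions match the actual values exactly; a score near 0 means the model does no better than always predicting the mean.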

Final Result in Data Frame

# y_pred holds predictions for the test rows only, so pair it with x_test and y_test
x_final = pd.DataFrame(x_test, columns=['Experience'])
y_final = pd.DataFrame(y_test, columns=['Salary'])
y_pred_final = pd.DataFrame(y_pred, columns=['Salary Prediction'])
result = pd.concat([x_final, y_final, y_pred_final], axis=1)
print(result)
result.to_excel("C:\\Noble\\Training\\DS Temporary Files\\Simple Regression.xlsx")

Create a Graph with predicted numbers
plt.scatter(x_train, y_train)
plt.plot(x_train, reg.predict(x_train), 'red')

Predicted graph on test data

plt.scatter(x_test, y_test)
plt.plot(x_train, reg.predict(x_train), 'red')

Prediction for new set of data

y_pred = reg.predict([[12], [9.6], [8.5], [2.5]])
print(y_pred)
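
The nested brackets in the call above matter: predict expects a two-dimensional input with one row per sample and one column per feature. A minimal sketch, using a model fitted on made-up salary figures (hypothetical, not the Salary_Data.csv values), shows how a flat list of experience values can be reshaped into that form:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying on salary = 35000 + 5000 * years
train_years = np.array([[1.0], [2.0], [3.0], [4.0]])
train_salary = np.array([40000.0, 45000.0, 50000.0, 55000.0])
demo_reg = LinearRegression().fit(train_years, train_salary)

# A flat list of new values must become a column of shape (n_samples, 1)
new_years = np.array([12, 9.6, 8.5, 2.5]).reshape(-1, 1)
new_salary = demo_reg.predict(new_years)

print(new_years.shape)
print(new_salary)
```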

Linear Regression Prediction with Data Frame

Import Libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt

Change directory

os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')

os.getcwd()
Load Data Set

df1= pd.read_csv('Salary_Data.csv')
print (df1)

Plot Graph
plt.plot(df1["YearsExperience"], df1["Salary"])
plt.show()

X and Y as Data Frame

x = df1.iloc[:,:-1]
print (x)
y = df1.iloc[:,1]
print (y)

Train Test Split

from sklearn.model_selection import train_test_split

x_train, x_test,y_train,y_test = train_test_split(x,y,test_size=0.2)

Linear Regression

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(x_train, y_train)
Prediction
y_pred= reg.predict(x_test)
print (y_pred)

Coefficient and Intercept

print ('Coefficient', reg.coef_)
print ('Intercept', reg.intercept_)

Accuracy (R² score)
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

Export data to Excel

# Predict for every row so the predictions align with x and y by index
y_pred_final = pd.DataFrame(reg.predict(x), columns=['Salary Prediction'])

result = pd.concat([x, y, y_pred_final], axis=1)
print(result)
result.to_excel("C:\\Noble\\Training\\DS Temporary Files\\Simple Regression.xlsx")

Multiple Linear Regression
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from sklearn.metrics import r2_score

Load Data Set

os.chdir('C:\\Noble\\Training\\Top Mentor\\Training\\Data Set\\')
df1=pd.read_csv('50_Startups.csv')
df1

Split x and y
x = df1.iloc[:,:-1].values
print (x)

y = df1.iloc[:,4].values
print (y)

Label Encoding
from sklearn.preprocessing import LabelEncoder
Label = LabelEncoder()
# Column 3 holds the State names; replace them with integer codes
x[:, 3] = Label.fit_transform(x[:, 3])
print(x)

One Hot Encoding

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Expand column 3 into one dummy column per state; the other columns pass through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
x = np.array(ct.fit_transform(x))
print(x)
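
To see what one-hot encoding produces, the sketch below applies OneHotEncoder to a small made-up column of state names (hypothetical rows, not the 50_Startups.csv data). Each category becomes its own 0/1 column, ordered alphabetically by category name:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical state column with one value per row
states = np.array([['New York'], ['California'], ['Florida'], ['California']])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(states)

# fit_transform returns a sparse matrix by default; convert it to a dense array
dense = encoded.toarray()
categories = list(encoder.categories_[0])

print(categories)
print(dense)
```

Each row contains exactly one 1, in the column of its state. The same alphabetical ordering explains why the dummy columns appear as California, Florida, New York ahead of the passthrough columns.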

Print X as Data Frame

print (pd.DataFrame(x))

Split the data as train, test split

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

Create the Model

from sklearn.linear_model import LinearRegression
reg= LinearRegression()
reg.fit(x_train,y_train)

Predictions
y_pred= reg.predict(x_test)
print (y_pred)

Print Result
result = pd.concat([pd.DataFrame(y_pred),pd.DataFrame(y_test)], axis =1)
print (result)
Print Y and Prediction in one data frame - Concat
y_pre= pd.DataFrame(y_pred, columns =['Prediction'])
y_te = pd.DataFrame(y_test,columns= ['Actual'])
x_te = pd.DataFrame(x_test,columns= ['CF','FR','New Y','R&D','Admin','Mark'])
result = pd.concat([x_te,y_te,y_pre], axis =1)
print (result)

Accuracy (R² score)
r2_score(y_test, y_pred)

Regression Coefficient
reg.coef_

Regression Intercept
reg.intercept_

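Ordinary Least Squares Method

# statsmodels needs a numeric array; x is an object array after encoding
x = x.astype('float64')
import statsmodels.api as sm
# No separate constant is added: the three state dummy columns sum to 1 and act as the intercept
reg_ols = sm.OLS(endog=y, exog=x)
reg_ols = reg_ols.fit()
print(reg_ols.summary())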

Tune the Model by removing columns with a P Value greater than 0.05
Print the Data Frame
pd.DataFrame(x)

Create the OLS model after removing the variable with the maximum P Value – remove column 4 (Admin)

x_opt = x[:, [0, 1, 2, 3, 5]]
import statsmodels.api as sm
reg_ols = sm.OLS(endog=y, exog=x_opt)
reg_ols = reg_ols.fit()
print(reg_ols.summary())

Create the OLS model after removing the variable with the maximum P Value – remove the last column (Mark)

x_opt = x[:, [0, 1, 2, 3]]
import statsmodels.api as sm
reg_ols = sm.OLS(endog=y, exog=x_opt)
reg_ols = reg_ols.fit()
print(reg_ols.summary())

All the variables with P Value > 0.05 have been removed; create the model again with the new data set

Train test Split

from sklearn.model_selection import train_test_split
xopt_train, xopt_test, y_train, y_test = train_test_split(x_opt, y, test_size=0.2, random_state=42)

Create Model

from sklearn.linear_model import LinearRegression

reg= LinearRegression()
reg.fit(xopt_train,y_train)

Prediction
yopt_pred= reg.predict(xopt_test)
print (yopt_pred)

Print Result
result = pd.concat([pd.DataFrame(yopt_pred),pd.DataFrame(y_test)], axis =1)
print (result)

Print Original Data Frame with Predicted Value

yopt_pre= pd.DataFrame(yopt_pred, columns =['Prediction'])

y_te = pd.DataFrame(y_test,columns= ['Actual'])
x_te = pd.DataFrame(x_test,columns= ['CF','FR','New Y','R&D','Admin','Mark'])
result = pd.concat([x_te,y_te,yopt_pre], axis =1)
print (result)

Check Accuracy
r2_score(y_test, yopt_pred)

Prediction for All 50 records

yfull_pred = reg.predict(x_opt)
print(yfull_pred)

Accuracy (R² score; includes the training rows, so it is not a held-out estimate)
r2_score(y, yfull_pred)

Create the Model with only the R&D Spend column

# Column 3 is R&D Spend; the 3:4 slice keeps x_opt two-dimensional
x_opt = x[:, 3:4]
x_opt

Train Test Split

from sklearn.model_selection import train_test_split
xopt_train, xopt_test, y_train, y_test = train_test_split(x_opt, y, test_size=0.2, random_state=42)

Print Shape
print (xopt_train.shape)

Create Model with one column

from sklearn.linear_model import LinearRegression
freg= LinearRegression()
freg.fit(xopt_train,y_train)
Prediction and Check accuracy
yone_pred= freg.predict(x_opt)
r2_score(y, yone_pred)

Print the result as Graph

import seaborn as sns
# Predicted profit on the x-axis against actual profit on the y-axis
sns.regplot(x=yone_pred, y=y, scatter_kws={"color": "b"}, line_kws={"color": "r"}, ci=None)

Prediction for New Data Set

Load new Data Set

df_Predict=pd.read_csv('50_Startups_Predictions.csv')
df_Predict

Count Number of Records

df_Predict.count()

Create Array
x_Predict = df_Predict.values
print (x_Predict)

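Label Encoding
# Reuse the encoder fitted on the training data so each state keeps the same code
x_Predict[:, 3] = Label.transform(x_Predict[:, 3])
print(x_Predict)

One Hot Encoding
# Reuse the fitted ColumnTransformer so the columns match the training layout (CF, FR, New Y, R&D, Admin, Mark)
x_Predict = np.array(ct.transform(x_Predict)).astype('float64')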

Print X Values
print (pd.DataFrame(x_Predict))
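
Label codes are only meaningful relative to the encoder that produced them. The sketch below, on made-up state lists, shows that an encoder fitted afresh on new data can assign different codes than the encoder fitted on the training data, which is why the training encoder is normally reused with transform:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical training and new-data state values
train_states = np.array(['California', 'Florida', 'New York'])
new_states = np.array(['New York', 'Florida'])

# Encoder fitted on the training states, then reused on the new data
fitted = LabelEncoder().fit(train_states)
reused = fitted.transform(new_states)

# A fresh encoder fitted only on the new data numbers the states from 0 again
refit = LabelEncoder().fit_transform(new_states)

print(reused, refit)
```

Here the reused encoder keeps New York as 2 and Florida as 1, while the refitted encoder assigns 1 and 0, so a model trained on the first coding would misread the second.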

Generate Predicted Values

xone_Predict= x_Predict[:,3:4]
yone_Predict= freg.predict(xone_Predict)
print (yone_Predict)

Display the result as Data Frame – with X

yone_Predict= pd.DataFrame(yone_Predict, columns =['Prediction'])

x_Predict = pd.DataFrame(x_Predict,columns= ['CF','FR','New Y','R&D','Admin','Mark'])
result = pd.concat([x_Predict,yone_Predict], axis =1)
print (result)

Display the result with Actual Input Data Set

# yone_Predict is already a data frame with a 'Prediction' column
result = pd.concat([df_Predict, yone_Predict], axis=1)
print(result)
