Supervised Learning For Data Science...
Model fitting
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x_train, y_train)
Prediction
y_pred= reg.predict(x_test)
print (y_pred)
Import Libraries
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
Change directory
df1= pd.read_csv('Salary_Data.csv')
print (df1)
Plot Graph
plt.plot(df1["YearsExperience"], df1["Salary"])
plt.show()
x = df1.iloc[:,:-1]
print (x)
y = df1.iloc[:,1]
print (y)
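The fitting and prediction steps in this walkthrough use x_train, x_test, y_train and y_test, which are not defined anywhere in the text. The following is a minimal sketch of one way to produce them, assuming scikit-learn's train_test_split. The synthetic data, test_size=0.2 and random_state=0 are illustrative assumptions, not values from the source.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary_Data.csv (assumed: one feature, one target).
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=(30, 1))                        # years of experience
y = 25000 + 9000 * x[:, 0] + rng.normal(0, 3000, size=30)   # salary

# Hold out 20% of the rows for testing; ratio and seed are assumptions.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
print(x_train.shape, x_test.shape)
```

With the real data, x and y would be the values taken from df1 above rather than generated arrays.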
Linear Regression
Accuracy
from sklearn.metrics import r2_score
r2_score(y_test,y_pred)
Split x and y
x = df1.iloc[:,:-1].values
print (x)
y = df1.iloc[:,4].values
print (y)
Label Encoding
from sklearn.preprocessing import LabelEncoder
Label = LabelEncoder()
x[:,3]= Label.fit_transform(x[:,3])
print (x)
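LabelEncoder maps each state to an integer, which a linear model then treats as an ordered quantity. The six column names used later in the result frame ('CF', 'FR', 'New Y', 'R&D', 'Admin', 'Mark') suggest the state column was also expanded into three dummy columns, although that step is not shown. Here is a sketch of one way to do it with pandas.get_dummies. The sample rows and the column name State are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; values and column names are illustrative only.
df = pd.DataFrame({
    "R&D": [100.0, 200.0, 300.0],
    "Admin": [10.0, 20.0, 30.0],
    "Mark": [1.0, 2.0, 3.0],
    "State": ["New York", "California", "Florida"],
})

# One 0/1 column per state (sorted alphabetically), placed before the numeric columns.
dummies = pd.get_dummies(df["State"], dtype=float)
x = pd.concat([dummies, df.drop(columns="State")], axis=1).values
print(x.shape)
```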
Predictions
y_pred= reg.predict(x_test)
print (y_pred)
Print Result
result = pd.concat([pd.DataFrame(y_pred),pd.DataFrame(y_test)], axis =1)
print (result)
Print Y and Prediction in one data frame - Concat
y_pre= pd.DataFrame(y_pred, columns =['Prediction'])
y_te = pd.DataFrame(y_test,columns= ['Actual'])
x_te = pd.DataFrame(x_test,columns= ['CF','FR','New Y','R&D','Admin','Mark'])
result = pd.concat([x_te,y_te,y_pre], axis =1)
print (result)
Accuracy
r2_score(y_test, y_pred)
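The "accuracy" reported here is the coefficient of determination, R² = 1 - SS_res / SS_tot, not a classification accuracy. The sketch below computes it by hand and compares it with r2_score. The four actual and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_test = np.array([3.0, 5.0, 7.0, 9.0])   # illustrative actual values
y_pred = np.array([2.5, 5.5, 7.0, 8.0])   # illustrative predictions

ss_res = np.sum((y_test - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_test, y_pred))
```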
Regression Coefficient
reg.coef_
Regression Intercept
reg.intercept_
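For a fitted LinearRegression, each prediction is the dot product of the feature row with coef_ plus intercept_. The sketch below checks this on synthetic data. The data and the true coefficients are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with three features (illustrative only).
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 3))
y = 4.0 + x @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, size=20)

reg = LinearRegression().fit(x, y)

# Rebuild the predictions from the fitted parameters.
manual = x @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(x)))
```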
Tune the Model by removing State Column (P Value Greater than 0.05)
Print the Data Frame
pd.DataFrame(x)
Fit the OLS model after removing the variable with the highest p-value: remove column 4
x_opt = x[:, [0, 1, 2, 3, 5]]
import statsmodels.api as sm
reg_ols = sm.OLS(endog=y, exog=x_opt)
reg_ols = reg_ols.fit()
print(reg_ols.summary())
Fit the OLS model again after removing the variable with the highest p-value: remove the last column
x_opt = x[:, [0, 1, 2, 3]]
import statsmodels.api as sm
reg_ols = sm.OLS(endog=y, exog=x_opt)
reg_ols = reg_ols.fit()
print(reg_ols.summary())
All the variables with a p-value greater than 0.05 have been removed; create the model again with the new data set
Create Model
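The code for this step is not shown, but the prediction that follows needs xopt_train, xopt_test and a model refitted on the reduced columns. This is a minimal sketch of what the step might look like, assuming the same train_test_split and LinearRegression workflow used earlier. The synthetic x_opt and y, test_size=0.2 and random_state=0 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the reduced feature matrix x_opt (four columns kept).
rng = np.random.default_rng(2)
x_opt = rng.normal(size=(50, 4))
y = 10.0 + x_opt @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(0, 0.5, size=50)

# Split the reduced features; ratio and seed are assumptions.
xopt_train, xopt_test, y_train, y_test = train_test_split(
    x_opt, y, test_size=0.2, random_state=0
)

# Refit the model on the reduced training set before predicting.
reg = LinearRegression()
reg.fit(xopt_train, y_train)
yopt_pred = reg.predict(xopt_test)
print(xopt_train.shape, yopt_pred.shape)
```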
Prediction
yopt_pred= reg.predict(xopt_test)
print (yopt_pred)
Print Result
result = pd.concat([pd.DataFrame(yopt_pred),pd.DataFrame(y_test)], axis =1)
print (result)
Check Accuracy
r2_score(y_test, yopt_pred)
Accuracy
r2_score(y, yfull_pred)
Print Shape
print (xopt_train.shape)
Create Array
x_Predict = df_Predict.values
print (x_Predict)
Label Encoding
Label_Predict = LabelEncoder()
x_Predict[:,3]= Label_Predict.fit_transform(x_Predict[:,3])
print (x_Predict)
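One caution with fitting a fresh LabelEncoder on the prediction data: the integer assigned to each state depends on which states are present, so the codes may not match those the model was trained on. The sketch below contrasts a refitted encoder with reusing the training encoder through transform. The state lists are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

train_states = np.array(["New York", "California", "Florida", "New York"])
new_states = np.array(["New York", "Florida"])

# Learn the mapping once, on the training data.
Label = LabelEncoder()
Label.fit(train_states)

refit_codes = LabelEncoder().fit_transform(new_states)   # new, different mapping
reused_codes = Label.transform(new_states)               # same mapping as training
print(refit_codes, reused_codes)
```

Reusing the fitted encoder keeps the codes consistent, but transform raises an error for a state that was not seen during fitting.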
Print X Values
print (pd.DataFrame(x_Predict))