Machine Learning - Manual
It is used for:
Application of Python:
x = 123 # integer
x = 3.14 # float
x = "hello" # string
x = [0,1,2] # list
x = (0,1,2) # tuple
Program code:
# Python3 program to add two numbers in int data types
num1 = 15
num2 = 12
# Adding the two numbers
total = num1 + num2
print("Sum of", num1, "and", num2, "is", total)
Output:
Sum of 15 and 12 is 27
Assignment problem:
Finding the area of a rectangle using the float data type
Program code:
length = 1.10
width = 2.20
area = length * width
# round() hides the floating-point representation error (2.4200000000000004)
print("The area is:", round(area, 2))
Output:
The area is: 2.42
Strings, lists, tuples in Python
Program code:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7 ]
print("list1[0]:", list1[0])
print("list2[1:5]:", list2[1:5])
Output:
list1[0]: physics
list2[1:5]: [2, 3, 4, 5]
Tuples:
Program code:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)
print("tup1[0]:", tup1[0])
print("tup2[1:5]:", tup2[1:5])
Output:
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)
Strings:
Program code:
var1 = 'Hello World!'
var2 = "Python Programming"
print("var1[0]:", var1[0])
print("var2[1:5]:", var2[1:5])
Output:
var1[0]: H
var2[1:5]: ytho
Importing libraries for Machine learning applications in Python
Syntax:
math.sqrt(x)
Parameter:
x is any number such that x>=0
Returns:
It returns the square root of the number
passed in the parameter.
Program:
# Python3 program to demonstrate the
# sqrt() method
Output:
0.0
2.0
1.8708286933869707
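A minimal sketch of sqrt() usage that is consistent with the three values printed above. The inputs 0, 4 and 3.5 are an assumption inferred from that output, not taken from the original listing.

```python
import math

# Assumed inputs, inferred from the printed output
inputs = (0, 4, 3.5)

# math.sqrt() always returns a float, even for integer arguments
roots = [math.sqrt(value) for value in inputs]

for root in roots:
    print(root)
```

Passing a negative number to math.sqrt() raises a ValueError, which is why the parameter is restricted to x >= 0.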
Program:
# Python program using NumPy
# for some basic mathematical
# operations
import numpy as np
Output:
219
[29 67]
[[19 22]
 [43 50]]
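A short sketch of NumPy dot products that produces the output above. The arrays x, y, v and w are assumptions chosen to match that output; the original inputs are not shown.

```python
import numpy as np

# Assumed inputs: two 2x2 matrices and two vectors of length 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(np.dot(v, w))  # inner product of two vectors
print(np.dot(x, v))  # matrix-vector product
print(np.dot(x, y))  # matrix-matrix product
```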
Program:
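# Python program using Pandas for
# arranging a given set of data
# into a table
# importing pandas as pd
import pandas as pd
# Illustrative data (assumed; the original data set is not shown)
data = {"subject": ["physics", "chemistry"],
        "year": [1997, 2000]}
data_table = pd.DataFrame(data)
print(data_table)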
Output:
Program:
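# Python program using Matplotlib for forming a linear plot
import matplotlib.pyplot as plt
# Illustrative data (assumed; the original data is not shown)
x = [0, 1, 2, 3, 4]
plt.plot(x, [2 * v for v in x], label='y = 2x')
# Add a legend
plt.legend()
plt.show()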
Output:
Data pre-processing in Python:
Program:
# Data Preprocessing
output:
Data Frame:
Interpolate and extrapolate missing data and categorical data
Real-world data are rarely clean and homogeneous. They tend to be incomplete, noisy and
inconsistent, so an important task of a data scientist is to preprocess the data by filling in missing
values. Missing values must be handled because they can lead to wrong predictions or classifications
from whichever model is used.
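The two ideas named in this section can be sketched with NumPy alone. This is an illustration under stated assumptions, not the manual's own program: the numeric column, the city labels and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical numeric column with two missing readings marked as NaN
values = np.array([10.0, np.nan, 14.0, 16.0, np.nan, 20.0])
missing = np.isnan(values)

# Option 1: replace each missing entry with the mean of the observed entries
mean_filled = np.where(missing, np.nanmean(values), values)

# Option 2: linearly interpolate between the neighbouring observed entries
positions = np.arange(len(values))
interpolated = values.copy()
interpolated[missing] = np.interp(positions[missing],
                                  positions[~missing],
                                  values[~missing])

# Categorical data: one-hot encode a hypothetical column of labels
cities = np.array(["Delhi", "Mumbai", "Delhi", "Chennai"])
categories, codes = np.unique(cities, return_inverse=True)
one_hot = np.eye(len(categories), dtype=int)[codes]

print(mean_filled)
print(interpolated)
print(categories)
print(one_hot)
```

Mean filling keeps the column average unchanged, while interpolation suits ordered data such as time series, where neighbouring values are informative.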
Program:
# Data Preprocessing
Output:
Linear Regression
Y = a+bX
X: input training data (univariate: one input variable, or parameter)
Y: labels for the data (supervised learning)
a: intercept
b: coefficient of X
For example: X is years of experience based on which Y (Salary) may be
estimated by a linear relationship.
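As an illustration of Y = a + bX, the intercept a and coefficient b can be estimated by ordinary least squares. The sketch below uses hypothetical experience and salary figures and is not the data set used in the program file.

```python
import numpy as np

# Hypothetical data: years of experience (X) and salary in thousands (Y)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([30.0, 35.0, 40.0, 45.0, 50.0])

# Least-squares estimates: b is the covariance of X and Y divided by the
# variance of X, and a makes the line pass through the point of means
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

# Estimated salary for 6 years of experience
predicted = a + b * 6.0
print(a, b, predicted)
```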
File1: simple_linear_regression.py
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
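To show what the split and the (commented-out) feature scaling steps do, here is a NumPy-only sketch on hypothetical data. It mimics the idea of train_test_split and StandardScaler but does not reproduce scikit-learn's exact shuffling, so the selected rows will differ.

```python
import numpy as np

# Hypothetical data set: 30 samples with one feature
rng = np.random.default_rng(0)
X = np.arange(1, 31, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 5.0

# Shuffle the row indices, then hold out one third of them for testing
indices = rng.permutation(len(X))
n_test = len(X) // 3
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Standardise with the mean and standard deviation of the training set only,
# then apply the same transformation to the test set
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train.shape, X_test.shape)
```

Computing the scaling statistics on the training set alone keeps information about the test set from leaking into the model.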
Results:
VIVA QUESTIONS:
5. What is the ideal ratio for dividing data into training and test sets in a
simple linear regression model?
6. What is the difference between test data, training data and prediction data
in linear regression?
7. What is regression?
9. What is the use of the Pandas library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Multiple linear regression
Theory:
The goal of any data analysis is to extract accurate estimates from raw
information. One of the most important and common questions is whether
there is a statistical relationship between a response variable (Y) and
explanatory variables (Xi).
Example: A data scientist wants to buy a car. He uses a multivariate
regression model to estimate the price of the car as a function of engine
size, horsepower, peak RPM, length, width and height.
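A minimal sketch of fitting a response to several explanatory variables by least squares. The numbers are hypothetical and generated from y = 1 + 2*x1 + 3*x2, so the fit should recover those coefficients; np.linalg.lstsq is used here in place of the scikit-learn regressor from the program file.

```python
import numpy as np

# Hypothetical explanatory variables x1 and x2 (one row per observation)
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 2.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Response generated from y = 1 + 2*x1 + 3*x2
y = np.array([6.0, 8.0, 13.0, 18.0, 26.0])

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem A @ coef ~= y
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(coef)
```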
Files:
50_Startups.csv
multiple_linear_regression.py
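import numpy as np
import pandas as pd
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
# Encoding categorical data
# (column index 3 is assumed to hold the categorical variable)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer([('encoder', OneHotEncoder(), [3])], remainder = 'passthrough', sparse_threshold = 0)
X = ct.fit_transform(X).astype(float)
# Avoiding the dummy variable trap by dropping the first dummy column
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
# (test_size = 0.2 is an assumption; the original split is not shown)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""
# Fitting Multiple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)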
Results:
VIVA QUESTIONS:
6. What is the difference between test data, training data and prediction data
in linear regression?
7. What is regression?
9. What is the use of the NumPy library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Polynomial Regression
Aim: To perform polynomial regression on given data set
Position_Salaries.csv
polynomial_regression.py
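# Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
X=pd.DataFrame(X)
y = dataset.iloc[:, 2].values
y=pd.DataFrame(y)
# Splitting the dataset into the Training set and Test set
# (not applied here: both models below are fitted on the full data set)
# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)"""
# Fitting Linear Regression to the dataset
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Fitting Polynomial Regression to the dataset
# (degree = 4 is an assumption; the original value is not shown)
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
# Visualising the Linear Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Visualising the Polynomial Regression results (for higher resolution and smoother curve)
X_grid = np.arange(X.values.min(), X.values.max(), 0.1).reshape(-1, 1)
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, lin_reg_2.predict(poly_reg.transform(X_grid)), color = 'blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
# Predicting a new result with Linear Regression (predict() expects a 2-D input)
lin_reg.predict([[6.5]])
# Predicting a new result with Polynomial Regression
lin_reg_2.predict(poly_reg.transform([[6.5]]))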
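The idea behind the program above is that polynomial regression is ordinary linear regression on expanded features (1, x, x^2, ...). A NumPy-only sketch on hypothetical data generated from y = 1 + 2x^2; np.vander plays the role that PolynomialFeatures plays in the program, and degree 2 is chosen only for this illustration.

```python
import numpy as np

# Hypothetical position levels and salaries following y = 1 + 2*x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1.0 + 2.0 * x ** 2

# Expand x into the columns [1, x, x^2]
X_poly = np.vander(x, 3, increasing=True)

# Fit a linear model on the expanded features
poly_coef, _, _, _ = np.linalg.lstsq(X_poly, y, rcond=None)

# For comparison, fit a straight line using only the columns [1, x]
line_coef, _, _, _ = np.linalg.lstsq(X_poly[:, :2], y, rcond=None)

poly_error = np.sum((X_poly @ poly_coef - y) ** 2)
line_error = np.sum((X_poly[:, :2] @ line_coef - y) ** 2)
print(poly_coef, poly_error, line_error)
```

The straight line leaves a large squared error on curved data, while the quadratic fit reduces it to nearly zero.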
Results:
VIVA QUESTIONS:
5. What is the ideal ratio for dividing data into training and test sets in a
polynomial regression model?
7. What is regression?
9. What is the use of the Pandas library, and when do we use it?
10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Naïve Bayes algorithm (classification)
Aim: To classify a given data set using the Naïve Bayes algorithm
Theory:
A Naive Bayes model is easy to build and particularly useful for very large
data sets. Despite its simplicity, Naive Bayes can perform competitively with,
and on some problems outperform, more sophisticated classification methods.
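To make the model concrete, here is a from-scratch sketch of Gaussian Naive Bayes in NumPy. It assumes each feature is normally distributed within a class and independent of the other features. The data and the helper names fit_gaussian_nb and predict_gaussian_nb are hypothetical; the program file uses scikit-learn's GaussianNB instead.

```python
import numpy as np

# Hypothetical training data: two features per sample, classes 0 and 1
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2],
                    [6.0, 7.0], [6.5, 6.8], [7.0, 7.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    params = {}
    for label in np.unique(y):
        rows = X[y == label]
        # A small constant keeps the variance strictly positive
        params[label] = (len(rows) / len(X), rows.mean(axis=0), rows.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    """Return, for every row of X, the class with the largest log posterior."""
    labels = list(params)
    scores = []
    for label in labels:
        prior, mean, var = params[label]
        # Sum of per-feature Gaussian log densities (the independence assumption)
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_likelihood)
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]

model = fit_gaussian_nb(X_train, y_train)
predictions = predict_gaussian_nb(model, np.array([[1.2, 2.1], [6.8, 7.1]]))
print(predictions)
```

Working with log probabilities avoids numerical underflow when many small likelihoods are multiplied together.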
Files:
Social_Network_Ads.csv
naive_bayes.py
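# Naive Bayes
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
# (test_size = 0.25 is an assumption; the original split is not shown)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Fitting Naive Bayes to the Training set
classifier = GaussianNB()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
# Visualising the Training set results
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
                     np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, colors = ('red', 'green'), levels = [-0.5, 0.5, 1.5])
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ('red', 'green')[i], label = j)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
# Visualising the Test set results
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(X_set[:, 0].min() - 1, X_set[:, 0].max() + 1, 0.01),
                     np.arange(X_set[:, 1].min() - 1, X_set[:, 1].max() + 1, 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, colors = ('red', 'green'), levels = [-0.5, 0.5, 1.5])
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ('red', 'green')[i], label = j)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()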
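The confusion matrix cm computed in the program counts how often each true class was predicted as each class. A small sketch of how such a matrix and the accuracy are obtained, using hypothetical labels rather than results from the data set:

```python
import numpy as np

# Hypothetical true and predicted labels for ten test samples
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# Rows index the true class, columns the predicted class
# (the same layout that sklearn.metrics.confusion_matrix uses)
cm_manual = np.zeros((2, 2), dtype=int)
for true_label, predicted_label in zip(y_true, y_hat):
    cm_manual[true_label, predicted_label] += 1

# Correct predictions lie on the diagonal
accuracy = np.trace(cm_manual) / cm_manual.sum()
print(cm_manual)
print(accuracy)
```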
Results:
Viva questions:
1. Can we use the Naïve Bayes algorithm to predict the winner of a dice game?
4. What is the ideal ratio for dividing data into training and test sets in a
Naïve Bayes model?
8. What is the use of the Pandas library, and when do we use it?
9. What is the use of the NumPy library, and when do we need it in a Naïve
Bayes model?