Machine Learning - Manual

The document provides an overview of the Python programming language, describing how it was created, commonly used data types like integers, floats, strings, lists and tuples, and how Python can be applied to tasks like web development, software development, mathematics, and system scripting. It also demonstrates several Python programs for performing basic math operations, manipulating different data types, and introducing commonly used Python libraries for machine learning like NumPy, Pandas, Matplotlib, and Scikit-learn.

Uploaded by

Rahul Punna

Introduction:

Python is a popular programming language. It was created by Guido van
Rossum and released in 1991.

It is used for:

• web development (server-side),
• software development,
• mathematics,
• system scripting.

Applications of Python:

• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify
files.
• Python can be used to handle big data and perform complex
mathematics.
• Python can be used for rapid prototyping, or for production-ready
software development.

List of some different variable types:

x = 123 # integer
x = 3.14 # float
x = "hello" # string
x = [0,1,2] # list
x = (0,1,2) # tuple

1. Python program to add two numbers of the "int" data type

Program code:
# Python 3 program to add two numbers of the int data type

num1 = 15
num2 = 12

# Add the two numbers (named "total" so the built-in sum() is not shadowed)
total = num1 + num2

# Print the values
print("Sum of {0} and {1} is {2}".format(num1, num2, total))

output:
Sum of 15 and 12 is 27

Assignment problem:
Find the area of a rectangle using the "float" data type

Program code:
length = 1.10
width = 2.20
area = length * width
# Round to two decimals: the raw float product prints as 2.4200000000000004
print("The area is: {:.2f}".format(area))

Output:
The area is: 2.42
Strings, lists, tuples in Python

Accessing Values in Lists:

To access values in a list, use square brackets with an index to get the single
value at that position, or with a slice to get a range of values.

Program code:
list1 = ['physics', 'chemistry', 1997, 2000]
list2 = [1, 2, 3, 4, 5, 6, 7]
print("list1[0]:", list1[0])
print("list2[1:5]:", list2[1:5])
Output:
list1[0]: physics
list2[1:5]: [2, 3, 4, 5]

Basic List Operations:

Lists respond to the + and * operators much like strings do: + means
concatenation and * means repetition, except that the result is a new list,
not a string.
In fact, lists support all of the general sequence operations that strings
support, such as indexing, slicing, len() and membership tests with in.
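
The behaviour of + and * on lists can be tried with a short sketch. The list contents below are made up for illustration.

```python
# Concatenation (+) and repetition (*) both build new lists.
a = [1, 2, 3]
b = [4, 5]

joined = a + b        # elements of a followed by elements of b
repeated = b * 3      # b repeated three times

print(joined)         # [1, 2, 3, 4, 5]
print(repeated)       # [4, 5, 4, 5, 4, 5]
print(a)              # the original list is unchanged: [1, 2, 3]

# Other general sequence operations also work on lists.
print(len(joined))    # 5
print(3 in a)         # True
```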

Tuples:

A tuple is an immutable sequence of Python objects. Tuples are sequences,
just like lists. The main difference is that a tuple cannot be changed after
it is created, whereas a list can. Tuples use parentheses, whereas lists use
square brackets. Creating a tuple is as simple as writing values separated by
commas. Optionally, you can put these comma-separated values between
parentheses.

Program code:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 6, 7)

print("tup1[0]:", tup1[0])
print("tup2[1:5]:", tup2[1:5])

Output:
tup1[0]: physics
tup2[1:5]: (2, 3, 4, 5)
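
The difference between a mutable list and an immutable tuple can be seen in a minimal sketch. The names point, values and pair are chosen only for this illustration.

```python
point = (1, 2, 3)
values = [1, 2, 3]

values[0] = 10          # lists can be changed in place

try:
    point[0] = 10       # tuples cannot: this raises TypeError
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# Parentheses are optional when creating a tuple.
pair = 4, 5
print(type(pair).__name__)   # tuple
```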

Strings:

Program code:
var1 = 'Hello World!'
var2 = "Python Programming"

print("var1[0:5]:", var1[0:5])
print("var2[1:5]:", var2[1:5])

Output:
var1[0:5]: Hello
var2[1:5]: ytho
Importing libraries for Machine learning applications in Python

Best Python libraries for Machine Learning:

Machine Learning, as the name suggests, is the science of programming
computers so that they are able to learn from different kinds of data. A more
general definition given by Arthur Samuel is: "Machine Learning is the field
of study that gives computers the ability to learn without being explicitly
programmed." It is typically used to solve many kinds of real-life problems.
In the past, people performed Machine Learning tasks by manually coding all
the algorithms and the mathematical and statistical formulas. This made the
process time-consuming, tedious and inefficient. Today it is much easier and
more efficient thanks to various Python libraries, frameworks and modules.
Python is one of the most popular programming languages for this task and it
has replaced many languages in the industry. One of the reasons is its vast
collection of libraries.
Python libraries that are used in Machine Learning include:
• math
• NumPy
• Pandas
• Matplotlib
• TensorFlow
• SciPy
• Keras
• PyTorch
• Scikit-learn
• Theano

1. Python math function

sqrt() is a function in Python's built-in math module that returns the square
root of a number.

Syntax:

math.sqrt(x)

Parameter:
x is any number such that x >= 0
Returns:
It returns the square root of the number
passed in the parameter, as a float.

Program:
# Python3 program to demonstrate the
# sqrt() method

# import the math module
import math

# print the square root of 0
print(math.sqrt(0))

# print the square root of 4
print(math.sqrt(4))

# print the square root of 3.5
print(math.sqrt(3.5))

Output:

0.0
2.0
1.8708286933869707
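
The requirement x >= 0 matters because math.sqrt raises ValueError for a negative argument. The sketch below shows one way to guard against that. The helper safe_sqrt is hypothetical and not part of the math module.

```python
import math

def safe_sqrt(x):
    """Return math.sqrt(x), or None when x is negative (hypothetical helper)."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt rejects negative numbers
        return None

print(safe_sqrt(9))     # 3.0
print(safe_sqrt(-1))    # None
```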

2. Python numpy function

NumPy is a very popular Python library for processing large multi-dimensional
arrays and matrices, and it provides a large collection of high-level
mathematical functions to operate on them. It is very useful for fundamental
scientific computations in Machine Learning, in particular for linear algebra,
Fourier transforms and random number generation. Higher-level libraries such
as TensorFlow can work with NumPy arrays when manipulating tensors.

Program:
# Python program using NumPy
# for some basic mathematical
# operations

import numpy as np

# Creating two arrays of rank 2
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Creating two arrays of rank 1
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors
print(np.dot(v, w), "\n")

# Matrix and Vector product
print(np.dot(x, v), "\n")

# Matrix and matrix product
print(np.dot(x, y))

Output:
219

[29 67]

[[19 22]
 [43 50]]
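
A common point of confusion is the difference between the matrix product and element-wise multiplication. The sketch below reuses the same two matrices. The @ operator is used here as an alternative to np.dot for 2-D arrays.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

matrix_product = x @ y        # same result as np.dot(x, y) for 2-D arrays
elementwise = x * y           # multiplies matching entries only

print(matrix_product)
print(elementwise)
```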

3. Python pandas function

Pandas is a popular Python library for data analysis. It is not directly
related to Machine Learning, but a dataset must be prepared before training,
and this is where Pandas comes in handy: it was developed specifically for
data extraction and preparation. It provides high-level data structures and a
wide variety of tools for data analysis, including many built-in methods for
grouping, combining and filtering data.
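
The grouping, filtering and combining operations mentioned above can be sketched as follows. The two small tables and their column names are made up for illustration.

```python
import pandas as pd

# Made-up sales records, used only for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "units": [10, 5, 7, 12],
})
prices = pd.DataFrame({
    "region": ["North", "South"],
    "unit_price": [2.0, 3.0],
})

# Grouping: total units per region.
totals = sales.groupby("region")["units"].sum()

# Filtering: keep only rows with at least 10 units.
large_orders = sales[sales["units"] >= 10]

# Combining: attach the unit price of each region to every sale.
combined = sales.merge(prices, on="region")

print(totals)
print(large_orders)
print(combined)
```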

Program:
# Python program using Pandas for
# arranging a given set of data
# into a table

# importing pandas as pd
import pandas as pd

data = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252, 1357, 52.98]}

data_table = pd.DataFrame(data)
print(data_table)
Output:

4. Python Matplotlib function

Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly
related to Machine Learning. It comes in handy when a programmer wants to visualize the patterns
in the data. It is primarily a 2D plotting library used for creating 2D graphs and plots. A module
named pyplot makes plotting easy, as it provides features to control line styles, font properties,
axis formatting, etc. It offers various kinds of graphs and plots for data visualization, such as
histograms, error charts and bar charts.

Program:
# Python program using Matplotlib
# for forming a linear plot

# importing the necessary packages and modules
import matplotlib.pyplot as plt
import numpy as np

# Prepare the data
x = np.linspace(0, 10, 100)

# Plot the data
plt.plot(x, x, label='linear')

# Add a legend
plt.legend()

# Show the plot
plt.show()

Output:
Data pre-processing in Python:

• Pre-processing refers to the transformations applied to our data before
feeding it to the algorithm.
• Data pre-processing is a technique used to convert raw data into a clean
data set. In other words, data gathered from different sources is collected
in a raw format that is not suitable for analysis.

Program:
# Data Preprocessing

# Importing the libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset


dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

output:

Data Frame:
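
Since Data.csv itself is not reproduced here, the split into features (X) and target (y) can be sketched on a small made-up table. The column names and values below are assumptions for illustration only, not the contents of the real file.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv: three feature columns and one target column.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44, 27, 30],
    "Salary": [72000, 48000, 54000],
    "Purchased": ["No", "Yes", "No"],
})

X = dataset.iloc[:, :-1].values   # every column except the last
y = dataset.iloc[:, -1].values    # the last column only

print(X.shape)   # (3, 3)
print(y)
```

Using -1 for the target column gives the same result as the fixed index 3 whenever the table has exactly four columns.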
Interpolate and extrapolate missing data and Categorical data
Data in the real world is rarely clean and homogeneous. Typically it is incomplete, noisy and
inconsistent, and an important task of a data scientist is to preprocess the data by filling in
missing values. Missing values must be handled because they could lead to wrong predictions or
classifications by any model being used.

Missing values could be marked as: NaN, an empty string, ?, -1, -99, -999
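
Before missing values can be filled, markers such as ? or -999 first have to be converted to NaN. The sketch below does this with pandas on a made-up series and then fills the gaps in two ways: with the mean, and by linear interpolation. It assumes that -999 is a sentinel rather than a real measurement.

```python
import pandas as pd

# Made-up readings in which "?" and -999 stand for missing values.
raw = pd.Series([10.0, "?", 14.0, -999, 18.0])

# Step 1: convert the markers to NaN.
clean = pd.to_numeric(raw, errors="coerce")   # "?" becomes NaN
clean = clean.mask(clean == -999)             # -999 becomes NaN

# Step 2a: fill the gaps with the mean of the known values.
mean_filled = clean.fillna(clean.mean())

# Step 2b: or interpolate linearly between neighbouring values.
interpolated = clean.interpolate()

print(mean_filled.tolist())    # [10.0, 14.0, 14.0, 14.0, 18.0]
print(interpolated.tolist())   # [10.0, 12.0, 14.0, 16.0, 18.0]
```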

Program:
# Data Preprocessing

# Importing the libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset


dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
# (Imputer was removed from scikit-learn; SimpleImputer is its replacement)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy = 'mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

Output:
Linear Regression

Aim: To perform linear regression on a given dataset

Theory:

Linear Regression is a machine learning algorithm based on supervised
learning. It performs a regression task: regression models a target prediction
value based on independent variables. It is mostly used for finding out the
relationship between variables and for forecasting. Regression models differ
in the kind of relationship they assume between the dependent and independent
variables, and in the number of independent variables they use.

Linear regression predicts the value of a dependent variable (Y) from a given
independent variable (X). In other words, this regression technique finds a
linear relationship between X (input) and Y (output).

Y = a + bX
X: input training data (univariate: one input variable)
Y: labels for the data (supervised learning)
a: intercept
b: coefficient of X
For example, X may be years of experience, from which Y (salary) can be
estimated by a linear relationship.
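
The intercept a and coefficient b in Y = a + bX can be estimated with the ordinary least-squares formulas. The following is a minimal sketch using NumPy on made-up experience and salary figures. It is not the scikit-learn program of this experiment, only an illustration of what the fit computes.

```python
import numpy as np

# Made-up data: years of experience (X) and salary in thousands (Y).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([35.0, 40.0, 45.0, 50.0, 55.0])

# Least-squares estimates for Y = a + bX.
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

print("intercept a =", a)                    # 30.0
print("coefficient b =", b)                  # 5.0
print("predicted Y for X = 6:", a + b * 6)   # 60.0
```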

File1: simple_linear_regression.py

# Simple Linear Regression

# Importing the libraries


import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset


dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3,
random_state = 0)

# Feature Scaling
"""from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""

# Fitting Simple Linear Regression to the Training set


from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results


y_pred = regressor.predict(X_test)

# Visualising the Training set results


plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results


plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
File2: Salary_Data.csv

Results:

VIVA QUESTIONS:

1. Can we use Simple Linear Regression to predict the winner of a football
game?

2. What is the class used in Python to create a simple linear regressor?

3. What is the formula used in a simple linear regression model?

4. What does parameter B in Y = Ax + B indicate in a simple linear regression
model?

5. What is the ideal ratio for dividing training and testing data in a simple
linear regression model?

6. What is the difference between test data, train data and prediction data
in linear regression?

7. What is regression?

8. What is the use of the Simple linear regression algorithm?

9. What is the use of the Pandas library, and when do we use it?

10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Multiple linear regression

Aim: To perform multiple linear regression on a given data set

Theory:
The goal of any data analysis is to extract accurate estimates from raw
information. One of the most important and common questions is whether there
is a statistical relationship between a response variable (Y) and explanatory
variables (Xi).

Multiple linear regression is one of the simplest Machine Learning algorithms.
It belongs to the class of supervised learning algorithms, i.e. those used
when we are provided with a training dataset.

Example: A data scientist wants to buy a car and uses a multiple linear
regression model to estimate its price. The price is estimated as a function
of engine size, horsepower, peak RPM, length, width and height.
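
With several explanatory variables, the model has one coefficient per variable plus an intercept. The sketch below estimates them by least squares with NumPy. The two features and the price formula are made up, and np.linalg.lstsq is used here for illustration in place of the scikit-learn regressor.

```python
import numpy as np

# Made-up data: price = 5 + 2 * engine_size + 3 * horsepower (no noise).
features = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
])
price = 5.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Add a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(len(features)), features])

# Solve the least-squares problem design @ coeffs ~ price.
coeffs, *_ = np.linalg.lstsq(design, price, rcond=None)
print(coeffs)   # close to [5, 2, 3]
```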

Files:

50_Startups.csv

multiple_linear_regression.py

# Multiple Linear Regression

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values

y = dataset.iloc[:, 4].values

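# Encoding categorical data
# (OneHotEncoder no longer accepts categorical_features; ColumnTransformer
# encodes column 3 and places its dummy columns first)

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer([('encoder', OneHotEncoder(), [3])], remainder = 'passthrough', sparse_threshold = 0)

X = np.array(ct.fit_transform(X), dtype = float)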

# Avoiding the Dummy Variable Trap

X = X[:, 1:]

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,


random_state = 0)

# Feature Scaling

"""from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)

sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)"""

# Fitting Multiple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

# Predicting the Test set results

y_pred = regressor.predict(X_test)

Results:

Prediction values of y: (for test data of x)


1 103015
2 132582
3 132448
4 71976.1
5 178537
6 116161
7 67851.7
8 98791.7
9 113969
10 167921

VIVA QUESTIONS:

1. Can we use Multiple Linear Regression to predict the winner of a football
game?

2. What is the class used in Python to create a multiple linear regressor?

3. What is the formula used in a multiple linear regression model?

5. What is the ideal ratio for dividing training and testing data in a
multiple linear regression model?

6. What is the difference between test data, train data and prediction data
in linear regression?

7. What is regression?

8. What is the use of the Multiple linear regression algorithm?

9. What is the use of the NumPy library, and when do we use it?

10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Polynomial Regression
Aim: To perform polynomial regression on a given data set

Theory: Polynomial regression is a form of regression analysis in which the
relationship between the independent variable x and the dependent variable y is
modelled as an nth degree polynomial in x. Polynomial regression fits a
nonlinear relationship between the value of x and the
corresponding conditional mean of y, denoted E(y | x), and has been used to
describe nonlinear phenomena.
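
The idea of fitting an nth degree polynomial can be sketched with NumPy's polyfit on made-up data that follows a quadratic curve. This only illustrates the concept and is not the scikit-learn program of this experiment.

```python
import numpy as np

# Made-up data following y = 1 + 2x + 3x^2 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Fit polynomials of degree 1 and 2; coefficients are returned highest power first.
linear_fit = np.polyfit(x, y, deg=1)
quadratic_fit = np.polyfit(x, y, deg=2)
print(quadratic_fit)

# Largest residual of each fit: the straight line cannot follow the curve.
linear_error = np.max(np.abs(np.polyval(linear_fit, x) - y))
quadratic_error = np.max(np.abs(np.polyval(quadratic_fit, x) - y))
print(linear_error, quadratic_error)
```

The degree 2 fit recovers the coefficients 3, 2 and 1 up to rounding, while the degree 1 fit leaves a visible error.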

Polynomial vs linear regression


Files:

Position_Salaries.csv

polynomial_regression.py

# Polynomial Regression

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Position_Salaries.csv')

X = dataset.iloc[:, 1:2].values

X=pd.DataFrame(X)

y = dataset.iloc[:, 2].values

y=pd.DataFrame(y)

# Splitting the dataset into the Training set and Test set

"""from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"""

# Feature Scaling

"""from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)"""

# Fitting Linear Regression to the dataset

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()

lin_reg.fit(X, y)

# Fitting Polynomial Regression to the dataset

from sklearn.preprocessing import PolynomialFeatures


poly_reg = PolynomialFeatures(degree = 4)

X_poly = poly_reg.fit_transform(X)

poly_reg.fit(X_poly, y)

lin_reg_2 = LinearRegression()

lin_reg_2.fit(X_poly, y)

# Visualising the Linear Regression results

plt.scatter(X, y, color = 'red')

plt.plot(X, lin_reg.predict(X), color = 'blue')

plt.title('Truth or Bluff (Linear Regression)')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

# Visualising the Polynomial Regression results

plt.scatter(X, y, color = 'red')

plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')

plt.title('Truth or Bluff (Polynomial Regression)')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

# Visualising the Polynomial Regression results (for higher resolution and smoother curve)

# np.asarray makes this work whether X is a DataFrame or a NumPy array
X_grid = np.arange(np.asarray(X).min(), np.asarray(X).max(), 0.1)

X_grid = X_grid.reshape((len(X_grid), 1))

plt.scatter(X, y, color = 'red')


plt.plot(X_grid, lin_reg_2.predict(poly_reg.fit_transform(X_grid)), color = 'blue')

plt.title('Truth or Bluff (Polynomial Regression)')

plt.xlabel('Position level')

plt.ylabel('Salary')

plt.show()

# Predicting a new result with Linear Regression
# (predict expects a 2-D array: one row per sample, one column per feature)

lin_reg.predict([[6.5]])

# Predicting a new result with Polynomial Regression

lin_reg_2.predict(poly_reg.transform([[6.5]]))

Results:
VIVA QUESTIONS:

1. Can we use Polynomial Linear Regression to predict the winner of a dice
game?

2. What is the class used in Python to create a polynomial linear regressor?

3. What is the formula used in a polynomial linear regression model?

4. What does parameter B in Y = Ax^2 + B indicate in a polynomial linear
regression model?

5. What is the ideal ratio for dividing training and testing data in a
polynomial linear regression model?

6. What is the difference between supervised and unsupervised machine
learning?

7. What is regression?

8. What is the use of the Polynomial linear regression algorithm?

9. What is the use of the Pandas library, and when do we use it?

10. What is the use of the Matplotlib library, and when do we need it in a
linear regression model?
Naïve Bayes algorithm (classification)
Aim: To classify a given data set with the Naïve Bayes algorithm

Theory:

Naive Bayes is a classification technique based on Bayes' Theorem with an
assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature. For example, a fruit may be
considered to be an apple if it is red, round, and about 3 inches in diameter.
Even if these features depend on each other or upon the existence of the other
features, the classifier treats all of these properties as contributing
independently to the probability that this fruit is an apple, and that is why
it is known as 'Naive'.

A Naive Bayes model is easy to build and particularly useful for very large
data sets. Despite its simplicity, Naive Bayes can in some cases perform as
well as or better than more sophisticated classification methods.
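
The independence assumption means that the score of a class is its prior probability multiplied by the likelihood of each observed feature. The sketch below works this out for the fruit example in plain Python. All probabilities are made-up numbers chosen for illustration, and the function posterior is a hypothetical helper, not part of scikit-learn.

```python
# Made-up probabilities, used only to illustrate the calculation.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9},
    "other": {"red": 0.3, "round": 0.4},
}

def posterior(observed):
    """Posterior probability of each class given the observed features."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            score *= likelihoods[label][feature]   # naive independence assumption
        scores[label] = score
    total = sum(scores.values())                   # normalise so the results sum to 1
    return {label: score / total for label, score in scores.items()}

result = posterior(["red", "round"])
print(result)   # "apple" gets about 0.857, "other" about 0.143
```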

Files:

Social_Network_Ads.csv

naive_bayes.py

# Naive Bayes

# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values

y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set

#from sklearn.cross_validation import train_test_split

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25,


random_state = 0)

# Feature Scaling

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)

# Fitting Naive Bayes to the Training set

from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()

classifier.fit(X_train, y_train)

# Predicting the Test set results

y_pred = classifier.predict(X_test)

# Making the Confusion Matrix


from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

# Visualising the Training set results

from matplotlib.colors import ListedColormap

X_set, y_set = X_train, y_train

X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

# Colour every grid point by the class the classifier predicts for it
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())

plt.ylim(X2.min(), X2.max())

# Plot the data points of each class in the matching colour
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Naive Bayes (Training set)')

plt.xlabel('Age')

plt.ylabel('Estimated Salary')

plt.legend()

plt.show()
# Visualising the Test set results

from matplotlib.colors import ListedColormap

X_set, y_set = X_test, y_test

X1, X2 = np.meshgrid(
    np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
    np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))

# Colour every grid point by the class the classifier predicts for it
plt.contourf(X1, X2,
             classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))

plt.xlim(X1.min(), X1.max())

plt.ylim(X2.min(), X2.max())

# Plot the data points of each class in the matching colour
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)

plt.title('Naive Bayes (Test set)')

plt.xlabel('Age')

plt.ylabel('Estimated Salary')

plt.legend()

plt.show()
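
A confusion matrix such as the cm computed in the program above can be summarised as a single accuracy figure. The sketch below builds a 2 x 2 confusion matrix by hand with NumPy from made-up labels and computes the accuracy from it. With scikit-learn, confusion_matrix(y_test, y_pred) uses the same layout of rows for true classes and columns for predicted classes.

```python
import numpy as np

y_test = np.array([0, 0, 1, 1, 1, 0])   # made-up true labels
y_pred = np.array([0, 1, 1, 1, 0, 0])   # made-up predictions

# Rows are true classes, columns are predicted classes.
cm = np.zeros((2, 2), dtype=int)
for truth, guess in zip(y_test, y_pred):
    cm[truth, guess] += 1

# Correct predictions lie on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print("accuracy =", accuracy)
```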

Results:
Viva questions:

1. Can we use the Naïve Bayes algorithm to predict the winner of a dice game?

2. What is the class used in Python to create a Naïve Bayes classifier?

3. What is the formula used in a Naïve Bayes model?

4. What is the ideal ratio for dividing training and testing data in a Naïve
Bayes model?

6. Is classification supervised or unsupervised machine learning?

7. What is the Naïve Bayes algorithm?

8. What is the use of the Pandas library, and when do we use it?

9. What is the use of the NumPy library, and when do we need it in a Naïve
Bayes model?

10. What is supervised learning?
