Lab Manual of ML
1. Implement data pre-processing
2. Deploy Simple Linear Regression
3. Simulate Multiple Linear Regression
4. Implement Decision Tree
5. Deploy Random forest classification
6. Simulate Naïve Bayes algorithm
9. Simulate Artificial Neural Network
10. Implement the Genetic Algorithm code
Practical No. 1
Aim: Implement data pre-processing
• Pre-processing refers to the transformations applied to the data before it is fed to the algorithm.
• Data preprocessing is a technique for converting raw data into a clean data set. Data gathered from different sources arrives in a raw format that is not suitable for analysis.
Need of Data Preprocessing
• To get better results from a model in a Machine Learning project, the data must be in the proper format. Some Machine Learning models need information in a specific format. For example, many Random Forest implementations do not accept null values, so null values in the raw data set must be handled before the algorithm is run.
• The data set should also be formatted so that more than one Machine Learning or Deep Learning algorithm can be run on the same data set and the best of them chosen.
Steps:
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding Missing Data
5. Encoding Categorical Data
6. Splitting dataset into training and test set
7. Feature scaling
Python Code:
import numpy as np
import pandas as pd
data_set = pd.read_csv('Data.csv')
Output:
Figure 1: Dataset uploaded
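# Independent variables: every column except the last; dependent variable: column 3
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 3].values
# Imputer that replaces missing numeric values with the column mean
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')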
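from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Encode the categorical column 0 of x as integer labels
label_encoder_x = LabelEncoder()
x[:, 0] = label_encoder_x.fit_transform(x[:, 0])
# ct is not defined in the surviving code; a one-hot encoder for column 0 is assumed here
ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
x = np.array(ct.fit_transform(x), dtype=float)  # np.float was removed in NumPy 1.24
# Encode the dependent variable
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)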
Figure 5: Feature scaling
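The code for steps 6 and 7 is not preserved above, so the following is only a minimal sketch of how splitting and feature scaling are commonly done with scikit-learn. The small array of ages and salaries, the 25% test size, and the name `st_x` are assumptions made for illustration, not values from this manual.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features (age, salary) and binary labels, invented for illustration
x = np.array([[25, 30000], [32, 42000], [47, 65000], [51, 72000],
              [38, 50000], [29, 36000], [44, 61000], [56, 80000]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Step 6: hold out 25% of the rows as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Step 7: fit the scaler on the training set only, then apply it to both sets
st_x = StandardScaler()
x_train_scaled = st_x.fit_transform(x_train)
x_test_scaled = st_x.transform(x_test)

print(x_train_scaled.mean(axis=0))  # approximately 0 for each column
print(x_train_scaled.std(axis=0))   # approximately 1 for each column
```

Fitting the scaler on the training rows alone keeps information about the test rows out of the model, which is why `transform` rather than `fit_transform` is applied to the test set.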
Practical No. 2
Aim: Deploy Simple Linear Regression
Regression searches for relationships among variables.
For example, you can observe several employees of some company and try to understand how
their salaries depend on the features, such as experience, level of education, role, city they
work in, and so on.
This is a regression problem where data related to each employee represent one observation.
The presumption is that the experience, education, role, and city are the independent features,
while the salary depends on them.
Similarly, you can try to establish a mathematical dependence of the prices of houses on their
areas, numbers of bedrooms, distances to the city center, and so on.
In regression analysis, you usually consider some phenomenon of interest and have
a number of observations. Each observation has two or more features. Following the
assumption that (at least) one of the features depends on the others, you try to establish a relation
among them.
In other words, you need to find a function that maps some features or variables to others
sufficiently well.
The dependent features are called the dependent variables, outputs, or responses.
The independent features are called the independent variables, inputs, or predictors.
Regression problems usually have one continuous and unbounded dependent variable. The
inputs, however, can be continuous, discrete, or even categorical data such as gender,
nationality, brand, and so on.
Simple Linear Regression is carried out in the five steps below:
1. Importing the dataset.
2. Splitting the dataset into a training set and a testing set (each set has two parts, X and y). Normally, the testing set should be 5% to 30% of the dataset.
3. Visualizing the training set and testing set to double check (this step is optional).
4. Initializing the regression model and fitting it using the training set (both X and y).
5. Predicting with the fitted model.
import pandas as pd
data_set = pd.read_csv('Salary_Data.csv')
Figure 1: output of Dataset uploaded
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values
Step 4: Fitting the Simple Linear Regression to the Training Set
The prediction step creates two vectors: y_pred, which holds the predictions for the test dataset, and x_pred, which holds the predictions for the training set.
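The fitting and prediction code did not survive in this copy, so here is a minimal sketch of those two steps using scikit-learn's `LinearRegression`. The synthetic years and salaries stand in for `Salary_Data.csv`, and the 20% test size is an assumption; only the names `y_pred` and `x_pred` come from the text above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary_Data.csv: salary = 25000 + 9000 * years + noise
rng = np.random.default_rng(0)
years = rng.uniform(1, 10, size=30)
salary = 25000 + 9000 * years + rng.normal(0, 2000, size=30)

x = years.reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
y = salary

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit the model on the training set
regressor = LinearRegression()
regressor.fit(x_train, y_train)

y_pred = regressor.predict(x_test)    # predictions for the test set
x_pred = regressor.predict(x_train)   # predictions for the training set

print('Slope:', regressor.coef_[0], 'Intercept:', regressor.intercept_)
```

Because the data was generated from a known line, the fitted slope should land near 9000, which gives a quick sanity check on the procedure.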
Figure 6: Prediction variables
Practical Outcomes:
• Model the relationship between two variables, such as income and expenditure or experience and salary.
• Forecast new observations, such as the weather according to temperature or the revenue of a company according to its investments in a year.
Practical No. 3
Aim: Simulate Multiple Linear Regression
• For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
• Each feature variable must have a linear relationship with the dependent variable.
• MLR tries to fit a regression line through a multidimensional space of data points.
“Multiple Linear Regression is one of the important regression algorithms which models the
linear relationship between a single dependent continuous variable and more than one
independent variable.”
Python Code:
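import numpy as nm
import pandas as pd
data_set = pd.read_csv('50_Startups.csv')
# One-hot encode the categorical column; x and the transformer ct come from code missing from the source
x = nm.array(ct.fit_transform(x), dtype=float)  # nm.float was removed in NumPy 1.24
# Label encoding applies to categorical targets only; the MLR target is continuous,
# so the original encoding of y is left commented out
# labelencoder_y = LabelEncoder()
# y = labelencoder_y.fit_transform(y)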
Figure 3: Conversion of categorical data to numerical
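Step 4: Splitting training data and testing data
from sklearn.model_selection import train_test_split
# The start of this call is missing from the source; test_size is unknown, so the scikit-learn default (0.25) applies
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)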
Step 6: Prediction of Test set results
# Predicting the Test set result
y_pred = regressor.predict(x_test)
We can also check the score for the training dataset and the test dataset. Below is the code for it:
print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
Practical Outcomes:
• A linear relationship should exist between the target and the predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
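One simple way to check the multicollinearity assumption is to inspect the pairwise correlations between predictors. The sketch below uses made-up predictors (`rd_spend`, `marketing`, `admin`) and an illustrative threshold of 0.8; neither comes from this manual, and other diagnostics exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors: the third is almost a copy of the first
rd_spend = rng.normal(50, 10, n)
marketing = rng.normal(30, 5, n)
admin = rd_spend + rng.normal(0, 0.5, n)   # nearly collinear with rd_spend

features = np.column_stack([rd_spend, marketing, admin])
names = ['rd_spend', 'marketing', 'admin']

# Pairwise Pearson correlations between predictors (columns are variables)
corr = np.corrcoef(features, rowvar=False)

# Report pairs whose absolute correlation exceeds a chosen threshold
threshold = 0.8  # illustrative cut-off, not a fixed rule
flagged = [(names[i], names[j], corr[i, j])
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(corr[i, j]) > threshold]
print(flagged)
```

A flagged pair suggests that one of the two predictors could be dropped or combined before fitting the regression.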
Applications of MLR
Practical No. 4
Aim: Implement Decision Tree
Python Code:
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_test = sc.transform(X_test)
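# Plot the decision regions and the data points (X_set, y_set, X1 and X2 come from plotting setup that is missing from the source)
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
Output: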
Practical Outcomes:
• Decision Trees usually mimic human thinking when making a decision, so they are easy to understand.
• The logic behind a decision tree can be easily understood because it shows a tree-like structure.
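To make the tree-like structure visible, a fitted tree can be printed as text. This is a minimal sketch with made-up data in which the label depends only on age; the feature names, the `entropy` criterion, and `max_depth=2` are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: a customer buys (1) when age > 40
rng = np.random.default_rng(0)
age = rng.integers(18, 65, size=200)
salary = rng.integers(20000, 100000, size=200)
X = np.column_stack([age, salary])
y = (age > 40).astype(int)

classifier = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)
classifier.fit(X, y)

# Print the learned rules as an indented, tree-like text diagram
rules = export_text(classifier, feature_names=['age', 'salary'])
print(rules)
```

Each indented line in the printout is one test on a feature, so the path from the top to a leaf reads like a chain of if/else decisions.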
Practical No. 5
Aim: Deploy Random forest classification
Step-2: Build the decision trees associated with the selected data points (Subsets).
Step-3: Choose the number N for decision trees that you want to build.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority votes.
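The idea of building several trees on random subsets and letting them vote can be sketched by hand. The example below is a simplified illustration, not the library's own implementation: the synthetic data, the choice of 15 trees, and the use of bootstrap samples for the subsets are all assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical two-feature data: class 1 when the features sum to more than 0
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

n_trees = 15          # N, the number of trees to build (odd, so votes cannot tie)
trees = []
for _ in range(n_trees):
    # Draw a random subset of rows (with replacement) and fit one tree on it
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Each tree votes; the majority class wins
votes = np.array([tree.predict(X_test) for tree in trees])   # shape (n_trees, n_test)
y_pred = (votes.mean(axis=0) > 0.5).astype(int)

accuracy = (y_pred == y_test).mean()
print('Accuracy:', accuracy)
```

In practice `RandomForestClassifier` handles the sampling and voting internally; the loop here only makes those steps explicit.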
Python Code:
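import pandas as pd
# Load the dataset; the features are columns 2 and 3 and the target is the last column
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values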
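# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature scaling (sc is assumed to be a StandardScaler; its definition is missing from the source)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training the Random Forest Classification model on the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier()  # the original hyperparameters are missing from the source; defaults are used
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)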
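# Visualising the Training set results (X_set, y_set, X1 and X2 come from plotting setup that is missing from the source)
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()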
Output:
Practical Outcomes:
• The feature variables of the dataset should contain actual values so that the classifier predicts accurate results rather than guessed ones.
• The predictions from each tree must have very low correlations.
• It takes less training time than many other algorithms.
• It predicts output with high accuracy and runs efficiently even on large datasets.
• It can also maintain accuracy when a large proportion of the data is missing.
There are mainly four sectors where Random Forest is used:
1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can
be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Practical No. 6
Aim: Simulate Naïve Bayes algorithm
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not to play on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:
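For a dataset like this, the decision can be sketched with Bayes' theorem applied to a single weather feature. The records below are invented for illustration, and `posterior` is a hypothetical helper, not a library function. With only one feature, the "naive" independence assumption does not yet come into play.

```python
from collections import Counter

# Hypothetical (outlook, play) records, invented for illustration
records = (
    [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2
    + [("Overcast", "Yes")] * 4
    + [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
)

def posterior(records, outlook):
    """Return P(play | outlook) for every class via Bayes' theorem."""
    total = len(records)
    class_counts = Counter(play for _, play in records)
    # Evidence: P(outlook)
    evidence = sum(1 for o, _ in records if o == outlook) / total
    result = {}
    for label, count in class_counts.items():
        prior = count / total                                    # P(play)
        likelihood = sum(1 for o, p in records
                         if o == outlook and p == label) / count  # P(outlook | play)
        result[label] = likelihood * prior / evidence
    return result

probs = posterior(records, "Sunny")
print(probs)
decision = max(probs, key=probs.get)
print("Play?", decision)
```

The class with the larger posterior probability is the prediction, which is the same rule a Naive Bayes classifier applies after multiplying the likelihoods of several features.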
Python Code:
# Naive Bayes
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
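# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature scaling (sc is assumed to be a StandardScaler; its definition is missing from the source)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training and prediction are missing from the source; a GaussianNB classifier is assumed
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB().fit(X_train, y_train)
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)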
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Output:
Practical Outcomes:
• The random forest algorithm is a machine learning algorithm that is easy to use and flexible. It uses ensemble learning, which enables organizations to solve regression and classification problems.
• It is a good fit for developers because it reduces the risk of overfitting. It is a resourceful tool for making the accurate predictions needed in strategic decision making in organizations.
Practical No. 7
The working of K-NN can be explained on the basis of the algorithm below:
Python Code:
# K-Nearest Neighbors (K-NN)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
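# Splitting the dataset into the Training set and Test set (the call is missing from the source; arguments follow the other practicals)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature scaling (sc is assumed to be a StandardScaler; its definition is missing from the source)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training the K-NN model on the Training set (hyperparameters are missing from the source; defaults are used)
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier().fit(X_train, y_train)
y_pred = classifier.predict(X_test)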
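# Visualising the results (X_set and y_set are defined in code that is missing from the source)
from matplotlib.colors import ListedColormap
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()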
Output:
Practical Outcomes:
• Since the KNN algorithm requires no training before making predictions, new data can be added seamlessly, without a retraining step.
• KNN is very easy to implement. Only two parameters are required: the value of K and the distance function (e.g. Euclidean, Manhattan, etc.).
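The two parameters named above map directly onto scikit-learn's `KNeighborsClassifier` arguments `n_neighbors` and `metric`. The sketch below uses eight made-up points and K = 3, both chosen only for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical points: class 0 near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [5, 5], [6, 5], [5, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# The two choices: K (n_neighbors) and the distance function (metric)
for metric in ('euclidean', 'manhattan'):
    classifier = KNeighborsClassifier(n_neighbors=3, metric=metric)
    classifier.fit(X, y)   # fit stores the training data rather than learning weights
    print(metric, classifier.predict([[0.5, 0.5], [5.5, 5.5]]))
```

On data this well separated both distance functions give the same answer; the choice matters more when features have different scales or when classes overlap.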
Applications:
• KNN is widely used in almost all industries, such as healthcare, financial services,
eCommerce, political campaigns, etc.
• Healthcare companies use the KNN algorithm to determine if a patient is susceptible
to certain diseases and conditions.
• Financial institutions predict credit card ratings or qualify loan applications and the
likelihood of default with the help of the KNN algorithm.
• Political analysts classify potential voters into separate classes based on whom they
are likely to vote for.
Practical No. 8
Python Code:
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
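# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature scaling (sc is assumed to be a StandardScaler; its definition is missing from the source)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Training the model on the Training set (an SVC is assumed from the SVM outcomes listed after this code; its settings are missing from the source, so defaults are used)
from sklearn.svm import SVC
classifier = SVC()
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Visualising the Test set results (the plotting code is missing from the source)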
Practical Outcomes:
• SVM works well when there is a clear margin of separation between the classes.
• It is effective in high dimensional spaces.
• It is effective in cases where the number of dimensions is greater than the number of samples.
• It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
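The point about support vectors can be checked on a fitted model through the `support_vectors_` attribute of scikit-learn's `SVC`. The two synthetic clusters and the linear kernel below are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical, well-separated clusters around (-2, -2) and (2, 2)
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)), rng.normal(2, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

classifier = SVC(kernel='linear')
classifier.fit(X, y)

# Only the support vectors are kept for the decision function
print('Training points:', len(X))
print('Support vectors:', len(classifier.support_vectors_))
print('Prediction for (1.5, 1.5):', classifier.predict([[1.5, 1.5]])[0])
```

When the margin is clear, the number of support vectors is a small fraction of the training set, which is where the memory saving comes from.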
Applications:
• Face detection – SVMs classify parts of the image as face and non-face and create a square boundary around the face.
• Text and hypertext categorization – SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, based on a generated score that is compared with a threshold value.
• Classification of images – SVMs provide better search accuracy for image classification than traditional query-based searching techniques.
• Bioinformatics – This includes protein classification and cancer classification. SVMs are used to classify genes, to classify patients on the basis of their genes, and for other biological problems.
• Protein fold and remote homology detection – SVM algorithms are applied to protein remote homology detection.
• Handwriting recognition – SVMs are widely used to recognize handwritten characters.
• Generalized predictive control (GPC) – SVM-based GPC is used to control chaotic dynamics with useful parameters.
Practical No. 9
Aim: Simulate Artificial Neural Network
Python Code:
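from numpy import random  # assumed import; the original import line is missing from the source

class NeuralNet(object):
    def __init__(self):
        # Generate random numbers
        random.seed(1)

    # Train the neural network and adjust the weights each time.
    # (The training method itself is missing from the source.)

    # The neural network thinks.
    # (The prediction method itself is missing from the source.)

net = NeuralNet()  # net is a placeholder; the original variable name is missing from the source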
Output:
Practical Outcomes:
• Know the main provisions of neuromathematics.
• Know the main types of neural networks.
• Know and apply the methods of training neural networks.
• Know the applications of artificial neural networks.
• Be able to formalize a problem and solve it using a neural network.
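As one illustration of formalizing a problem and solving it with a neural network, the sketch below trains a single-layer network with a sigmoid output by gradient descent. The class name `TinyNet`, the toy task, and the 10000 iterations are assumptions chosen for this example and are not the manual's original code.

```python
import numpy as np

class TinyNet:
    """Single-layer network: 3 inputs, 1 sigmoid output (hypothetical example)."""

    def __init__(self, seed=1):
        rng = np.random.default_rng(seed)
        # Random starting weights in [-1, 1)
        self.weights = rng.uniform(-1, 1, size=(3, 1))

    @staticmethod
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def think(self, inputs):
        # Forward pass: weighted sum followed by the sigmoid activation
        return self.sigmoid(inputs @ self.weights)

    def train(self, inputs, targets, iterations=10000):
        for _ in range(iterations):
            output = self.think(inputs)
            error = targets - output
            # Gradient step: error scaled by the sigmoid slope output * (1 - output)
            self.weights += inputs.T @ (error * output * (1 - output))

# Toy task: the target equals the first input column
X = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T

net = TinyNet()
net.train(X, y)
print(net.think(np.array([[1, 0, 0]], dtype=float)))  # close to 1
```

The network never sees the input `[1, 0, 0]` during training, so a prediction near 1 shows that it has picked up the rule rather than memorized the four rows.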
Applications of ANN
a. Classification of data:
Based on a set of data, our trained neural network predicts whether it is a dog or a cat.
b. Anomaly detection:
Given the details of a person's transactions, it can say whether a transaction is fraudulent or not.
c. Speech recognition:
We can train our neural network to recognize speech patterns. Example: Siri, Alexa, Google Assistant.
d. Audio generation:
Given audio files as inputs, it can generate new music based on various factors like genre, singer, and others.
e. Time series analysis:
f. Spell checking:
We can train a neural network that detects misspelled words and can also suggest words with a similar meaning. Example: Grammarly.
g. Character recognition:
h. Machine translation:
We can develop a neural network that translates one language into another language.
i. Image processing:
We can train a neural network to process an image and extract pieces of information from it.
Practical No. 10
Aim: Implement the Genetic Algorithm code
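The Genetic Algorithm is one of the advanced algorithms in the field of computer science. It is inspired by the human genetic process of passing genes from one generation to another. It is generally used for optimization, is heuristic in nature, and can be applied in various places, e.g. solving NP problems, game theory, code-
Here are quick steps for how the genetic algorithm works:
1. Initialize a random population.
2. Compute a fitness score for each member of the population.
3. Select the fittest members as parents.
4. Apply crossover to the selected parents.
5. Apply mutation to the result.
6. Repeat steps 2 to 5 for the chosen number of generations.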
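The loop of initialization, fitness evaluation, selection, crossover, and mutation can be sketched on a toy problem before it is applied to feature selection. The example below maximizes the number of switched-on bits in a chromosome (often called OneMax); the function names, population size, and rates are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(chromosome):
    # Toy objective (OneMax): count the genes that are switched on
    return int(chromosome.sum())

def evolve(pop_size=20, n_genes=16, n_parents=10, mutation_rate=0.05, n_gen=40):
    # 1. Initialization: random bit strings
    population = [rng.integers(0, 2, size=n_genes).astype(bool) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(n_gen):
        # 2. Fitness evaluation, best first
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        # 3. Selection: keep the fittest as parents
        parents = population[:n_parents]
        # 4. Crossover: single-point crossover between neighbouring parents
        children = []
        for i in range(pop_size - n_parents):
            a, b = parents[i % n_parents], parents[(i + 1) % n_parents]
            point = rng.integers(1, n_genes)
            children.append(np.concatenate([a[:point], b[point:]]))
        # 5. Mutation: flip each child gene with probability mutation_rate
        for child in children:
            flips = rng.random(n_genes) < mutation_rate
            child[flips] = ~child[flips]
        population = parents + children
    return best_per_generation

history = evolve()
print(history[0], '->', history[-1])
```

Because the parents are carried over unchanged, the best fitness in this sketch can never decrease from one generation to the next.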
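import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
cancer = load_breast_cancer()
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names'])
label = cancer["target"]
# The start of this call is missing from the source; df is assumed as the feature table
X_train, X_test, y_train, y_test = train_test_split(df, label, test_size=0.30, random_state=101)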
# training a logistic regression model
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
print("Accuracy = " + str(accuracy_score(y_test, predictions)))

# defining various steps required for the genetic algorithm
def initilization_of_population(size, n_feat):
    population = []
    for i in range(size):
        chromosome = np.ones(n_feat, dtype=bool)  # np.bool was removed in NumPy 1.24
        chromosome[:int(0.3*n_feat)] = False
        np.random.shuffle(chromosome)
        population.append(chromosome)
    return population
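def fitness_score(population):
    # The loop header, the sort step, and the start of the return are missing from the source and are restored here
    scores = []
    for chromosome in population:
        logmodel.fit(X_train.iloc[:, chromosome], y_train)
        predictions = logmodel.predict(X_test.iloc[:, chromosome])
        scores.append(accuracy_score(y_test, predictions))
    scores, population = np.array(scores), np.array(population)
    inds = np.argsort(scores)
    return list(scores[inds][::-1]), list(population[inds, :][::-1])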
def selection(pop_after_fit, n_parents):
    population_nextgen = []
    for i in range(n_parents):
        population_nextgen.append(pop_after_fit[i])
    return population_nextgen
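def crossover(pop_after_sel):
    # Copy the list so that appending children does not modify the caller's list
    population_nextgen = list(pop_after_sel)
    for i in range(len(pop_after_sel)):
        # Copy the parent so that the crossover does not overwrite it in place
        child = pop_after_sel[i].copy()
        child[3:7] = pop_after_sel[(i+1) % len(pop_after_sel)][3:7]
        population_nextgen.append(child)
    return population_nextgen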
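def mutation(pop_after_cross, mutation_rate):
    population_nextgen = []
    for i in range(0, len(pop_after_cross)):
        # The loop body is missing from the source; bit-flip mutation is assumed here
        chromosome = pop_after_cross[i].copy()
        flips = np.random.rand(len(chromosome)) < mutation_rate
        chromosome[flips] = ~chromosome[flips]
        population_nextgen.append(chromosome)
    # print(population_nextgen)
    return population_nextgen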
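# The end of the signature and the start of the body are missing from the source; the restored parts are assumptions
def generations(size, n_feat, n_parents, mutation_rate, n_gen, X_train, X_test, y_train, y_test):
    best_chromo = []
    best_score = []
    population_nextgen = initilization_of_population(size, n_feat)
    for i in range(n_gen):
        scores, pop_after_fit = fitness_score(population_nextgen)
        print(scores[:2])
        pop_after_sel = selection(pop_after_fit, n_parents)
        pop_after_cross = crossover(pop_after_sel)
        population_nextgen = mutation(pop_after_cross, mutation_rate)
        best_chromo.append(pop_after_fit[0])
        best_score.append(scores[0])
    return best_chromo, best_score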
Output:
Practical Outcomes:
Applications:
• Robotics
Genetic algorithms are widely used in robotics. They are being used to create learning robots that behave like humans and perform tasks such as cooking meals or doing laundry.
• Traffic and Shipment Routing (Travelling Salesman Problem)
This is a famous problem, and solutions to it have been adopted by many sales-based companies because they save time and money. Genetic algorithms are one way to solve it.
• Engineering Design
Engineering design relies heavily on computer modeling and simulation to make the design cycle fast and economical. Genetic algorithms have been used to optimize designs and provide robust solutions.