0% found this document useful (0 votes)
70 views

CP4252 Machine Learning lab manual

Uploaded by

muthu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
70 views

CP4252 Machine Learning lab manual

Uploaded by

muthu
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 37

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BONAFIDECERTIFICATE

Name of the Student:

Register No.

This is to certify that this is a bonafide record of the workdone by the above student with
RollNo. of Semester B.E
Degree in in
the during Laboratory
the academic year 2022 – 2023.

Staff-In-Charge Head of the Dept.

Date:

Submitted for the Practical Examination held on

Internal Examiner External Examiner


CS3461- OPERATING SYSTEMS LABORATORY
INDEX

EX.NO. DATE NAME OF THE EXPERIMENT MARKS SIGN


PROGRAM 1:

IMPLEMENT NAIVE BAYES THEOREM TO CLASSIFY THE ENGLISH TEXT:

AIM:
To Implement naive bayes theorem to classify the English text

DESCRIPTION:

The challenge of text classification is to attach labels to bodies of text, e.g., tax document, medical
form, etc. based on the text itself. For example, think of your spam folder in your email. How does your
email provider know that a particular message is spam or “ham” (not spam)? We’ll take a look at one
natural language processing technique for text classification called Naive Bayes.

SOURCE CODE:

import pandas as pd

fromsklearn.model_selection import train_test_split

fromsklearn.feature_extraction.text import CountVectorizer

fromsklearn.naive_bayes import MultinomialNB

fromsklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('document.csv', names=['message', 'label'])

print("Total Instances of Dataset: ", msg.shape[0])

msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})

X = msg.message

y = msg.labelnum

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

count_v = CountVectorizer()

Xtrain_dm = count_v.fit_transform(Xtrain)

Xtest_dm = count_v.transform(Xtest)

df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
clf = MultinomialNB()

clf.fit(Xtrain_dm, ytrain)

pred = clf.predict(Xtest_dm)

print('Accuracy Metrics:')

print('Accuracy: ', accuracy_score(ytest, pred)) print('Recall: ', recall_score(ytest, pred)) print('Precision: ',

precision_score(ytest, pred))

print('Confusion Matrix: \n', confusion_matrix(ytest, pred))

document.csv:
I love this sandwich,pos This

is an amazingplace,pos

I feel very good about these beers,pos

This is my best work,pos

What an awesome view,pos

I do not like this restaurant,neg

I am tired of this stuff,neg

I can't deal with this,neg He

is my sworn enemy,neg My

boss is horrible,neg

This is an awesome place,pos

I do not like the taste of this juice,neg

I love to dance,pos

I am sick and tired of this place,neg

What a great holiday,pos

That is a bad locality to stay,neg

We will have good fun tomorrow,pos I


went to my enemy's house today,neg

OUTPUT:
Total Instances of Dataset: 18

Accuracy Metrics: Accuracy: 0.6

Recall: 0.6666666666666666

Precision: 0.6666666666666666

Confusion Matrix:

[[1 1]

[1 2]]
VIVA QUESTIONS & ANSWERS:

1. How Naive Bayes algorithm works?

Let’s understand it using an example. Below I have a training data set of weather and
corresponding target variable ‘Play’ (suggesting possibilities of playing). Now, we need to classify
whether players will play or not based on weather condition. Let’s follow the below steps to perform it.

Step 1: Convert the data set into a frequency table

Step 2: Create Likelihood table by finding the probabilities like Overcast probability = 0.29 and
probability of playing is 0.64.

Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class
with the highest posterior probability is the outcome of prediction.

Problem:: Players will play if weather is sunny. Is this statement is correct?

We can solve it using above discussed method of posterior probability.

P(Yes | Sunny) = P( Sunny | Yes) * P(Yes) / P (Sunny)

Here we have P (Sunny |Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, P( Yes)= 9/14 = 0.64

Now, P (Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which has higher probability.
Naive Bayes uses a similar method to predict the probability of different class based on various attributes.
This algorithm is mostly used in text classification and with problems having multiple classes.

2. Applications of Naive Bayes Algorithms:

 Real time Prediction: Naive Bayes is an eager learning classifier and it is sure fast. Thus, it could be
used for making predictions in real time.

 Multi class Prediction: This algorithm is also well known for multi class prediction feature. Here we
can predict the probability of multiple classes of target variable.

 Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers mostly used in text
classification (due to better result in multi class problems and independence rule) have higher success
rate as compared to other algorithms. As a result, it is widely used in Spam filtering (identify spam e-
mail) and Sentiment Analysis (in social media analysis, to identify positive and negative customer
sentiments).

 Recommendation System: Naive Bayes Classifier and Collaborative Filtering together builds a
Recommendation System that uses machine learning and data mining techniques to filter unseen
information and predict whether a user would like a given resource or not.
PROGRAM 2:

TO IMPLEMENT CLASSIFICATION WITH K-NEAREST NEIGHBORS

AIM:

Write a program to implement classification with K-Nearest Neighbors

DESCRIPTION:

Write a program to implement classification with K-Nearest Neighbors. In this question, you

will use the scikit-learn’s KNN classifier to classify real vs. fake news headlines. The aim of

this question is for you to read the scikit-learn API and get comfortable with training

/validation splits. Use California Housing Datasets.


Distance Metrics

K-Nearest-Neighbour Algorithm:

L-Load the data


M-Initialize the value of k
N-For getting the predicted class, iterate from 1 to total number of training data points

1. Calculate the distance between test data and each row of training data. Here we will use Euclidean
distance as our distance metric since it’s the most popular method. The other metrics that can be
used are Chebyshev, cosine, etc.
2. Sort the calculated distances in ascending order based on distance values
3. Get top k rows from the sorted array.
4. Get the most frequent class of these rows i.e Get the labels of the selected K entries.
5. Return the predicted class. If regression, return the mean of the K labels If classification,
return the mode of the K labels.
Confusion matrix:

Note;

• Class 1 : Positive
• Class 2 : Negative

• Positive (P) : Observation is positive (for example: is an apple).


• Negative (N) : Observation is not positive (for example: is not an apple).
• True Positive (TP) : Observation is positive, and is predicted to be positive.
• False Negative (FN) : Observation is positive, but is predicted negative. (Also known as a "Type II
error.".
• True Negative (TN) : Observation is negative, and is predicted to be negative.
• False Positive (FP) : Observation is negative, but is predicted positive. (Also known as a "Type I error.").

ACCURACY = TP+TN
------------------
TP+TN+FP+FN

RECALL = TP
-------------------
TP+FN

PRECISION = TP
-------------------
TP+FP

F-MEASURE = 2*RECALL*PRECISION
--------------------------------
RECALL+PRECISION
EXAMPLE:

PREDICTED: PREDICTED:
NO YES
TN=50 FP=10 60

n=165 FN=5 TP=100 105

ACTUAL = NO 55 110

ACTUAL = YES

Accuracy: Overall, how often is the classifier correct?


(TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?


(FP+FN)/total = (10+5)/165 = 0.09
equivalent to 1 minus Accuracy also known as "Error Rate“.

True Positive Rate: When it's actually yes, how often does it predict yes?
TP/actual yes = 100/105 = 0.95
also known as "Sensitivity" or "Recall".

False Positive Rate: When it's actually no, how often does it predict yes?
FP/actual no = 10/60 = 0.17

True Negative Rate: When it's actually no, how often does it predict no?
TN/actual no = 50/60 = 0.83
equivalent to 1 minus False Positive Rate also known as "Specificity“.

Precision: When it predicts yes, how often is it correct?


TP/predicted yes = 100/110 = 0.91.

Prevalence: How often does the yes condition actually occur in our sample?
actual yes/total = 105/165 = 0.64

Source Code:

from sklearn.neighbors import KNeighborsClassifier


from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

import pandas as pd

dataset=pd.read_csv("iris.csv")
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=0,test_size=0.25)

classifier=KNeighborsClassifier(n_neighbors=8,p=3,metric='euclidean')

classifier.fit(X_train,y_train)

#predict the test resuts


y_pred=classifier.predict(X_test)

cm=confusion_matrix(y_test,y_pred)
print('Confusion matrix is as follows\n',cm)
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
print(" correct predicition",accuracy_score(y_test,y_pred))
print(" worng predicition",(1-accuracy_score(y_test,y_pred)))

Output :
Confusion matrix is as follows
[[13 0 0]
[ 0 15 1]
[ 0 0 9]]

Accuracy Metrics:

Precision Recall F1-score Support


Iris-setosa 1.00 1.00 1.00 13
Iris-versicolor 1.00 0.94 0.97 16
Iris-virginica 0.90 1.00 0.95 9
Avg / total 0.98 0.97 0.97 38

Correct Predicition: 0.9736842105263158


Wrong Predicition: 0.02631578947368418
PROGRAM 3:

IMPLEMENT A LINEAR REGRESSION WITH A REAL DATASET

AIM:

To Implement Linear Regression with a real datasets and Experiment with different features in

building a model. Tune the model's hyper parameters.

DESCRIPTION:

1. Define business object.


2. Make sense of the data from a high level.

 data types (number, text, object, etc.)


 continuous/discrete
 basic stats (min, max, std, median, etc.) using boxplot
 frequency via histogram
 scales and distributions of different features

3. Create the training and test sets using proper sampling methods, e.g., random vs. Stratified.
4. Correlation analysis (pair-wise and attribute combinations).
5. Data cleaning (missing data, outliers, data errors).
6. Data transformation via pipelines (categorical text to number using one hot encoding, feature
scaling via normalization/standardization, feature combinations).
7. Train and cross validate different models and select the most promising one (Linear Regression,
Decision Tree, and Random Forest were tried in this tutorial).
8. Fine tune the model using trying different combinations of hyper parameters.
9. Evaluate the model with best estimators in the test set.
10. Launch, monitor, and refresh the model and system.

SOURCE CODE:

# This Python 3 environment comes with many helpful analytic libraries installed.

# It is defined by the kaggle/python docker image: https://github.com/kaggle/ docker-python.

import numpy as np

import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

import seaborn as sns

# Input data files are available in the "../input/" directory.

# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input
Directory

import os

print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

['anscombe.csv', 'housing.csv']

# loading data

data_path = "../input/housing.csv"

housing = pd.read_csv(data_path)

# see the basic info

housing.info()

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 20640 entries, 0 to 20639

Data columns (total 10 columns):

longitude 20640 non-null float64

latitude 20640 non-null float64

housing_median_age 20640 non-null float64

total_rooms 20640 non-null float64

total_bedrooms 20433 non-null float64

population 20640 non-null float64

households 20640 non-null float64

median_income 20640 non-null float64


median_house_value 20640 non-null float64

ocean_proximity 20640 non-null object

dtypes: float64(9), object(1)

memory usage: 1.6+ MB

Input(3): housing.head(10)

Input(4): housing.describe()

Input(5): housing.boxplot(['median_house_value'], figsize=(10, 10))

Input(6): housing.hist(bins=50, figsize=(15, 15))

Output(6):
Input(7): housing['ocean_proximity'].value_counts()

Output(7):

<1H OCEAN 9136

INLAND 6551

NEAR OCEAN 2658

NEAR BAY 2290

ISLAND 5

Name: ocean_proximity, dtype: int64

Input(8):

op_count = housing['ocean_proximity'].value_counts()

plt.figure(figsize=(10,5))

sns.barplot(op_count.index, op_count.values, alpha=0.7)

plt.title('Ocean Proximity Summary')

plt.ylabel('Number of Occurrences', fontsize=12)

plt.xlabel('Ocean Proximity', fontsize=12)

plt.show()

# housing['ocean_proximity'].value_counts().hist()

Output(8):
Input(9): housing['median_income'].hist()

Output(9): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>

Input(10): housing['median_income'].hist()

Output(10): <matplotlib.axes._subplots.AxesSubplot at 0x7f264523cb00>


Input(11):housing.plot(kind='scatter', x='longitude', y='latitude', alpha=0.1)

Output(11): <matplotlib.axes._subplots.AxesSubplot at 0x7f2645224e10>

Input(12):

# Pearson's r, aka, standard correlation coefficient for every pair

corr_matrix = housing.corr()

# Check the how much each attribute correlates with the median house value

corr_matrix['median_house_value'].sort_values(ascending=False)

Output(12):

median_house_value 1.000000

median_income 0.687160

total_rooms 0.135097

housing_median_age 0.114110

households 0.064506

total_bedrooms 0.047689

population -0.026920

longitude -0.047432

latitude -0.142724

Name: median_house_value, dtype: float64.


PROGRAM 4:

THE SCIKIT-LEARNS’s KNN CLASSIFIER TO CLASSIFY REAL vs. FAKE NEWS


HEADLINES.

AIM:

The aim of this question is for you to read the scikit-learn API and get comfortable with training/validation

splits. Use California Housing Datasets.

DESCRIPTION:

Classification with Nearest Neighbors. In this question, you will use the scikit-learn’s KNN classifier to

classify real vs. fake news headlines. The aim of this question is for you to read the scikit-learn API and get

comfortable with training/validation splits. Use California Housing Datasets.

SOURCE CODE:

import csv

import random

import math

import operator

def loadDataset(filename, split, trainingSet=[] , testSet=[])


:
with open(filename, 'rb') as csvfile
:
lines = csv.reader(csvfile)

dataset = list(lines)

for x in range(len(dataset)-1):

for y in range(4):

dataset[x][y] = float(dataset[x][y])

if random.random() < split: trainingSet.append(dataset[x])

else:

testSet.append(dataset[x])
def euclideanDistance(instance1, instance2, length):

distance = 0

for x in range(length):

distance += pow((instance1[x] - instance2[x]), 2)

return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):

distances = []

length = len(testInstance)-1

for x in range(len(trainingSet)):

dist = euclideanDistance(testInstance, trainingSet[x], length) distances.append((trainingSet[x], dist))

distances.sort(key=operator.itemgetter(1))

neighbors = []

for x in range(k):

neighbors.append(distances[x][0])

return neighbors

def getResponse(neighbors):

classVotes = {}

for x in range(len(neighbors)):

response = neighbors[x][-1]

if response in classVotes:

classVotes[response] += 1

else:

classVotes[response] = 1
sortedVotes =

sorted(classVotes.iteritems(),

reverse=True)

return sortedVotes[0][0]

def getAccuracy(testSet,

predictions): correct = 0

for x in

range(len(testSet)):

key=operator.itemgetter(1);

if testSet[x][-1] == predictions[x]:

correct += 1

return (correct/float(len(testSet))) * 100.0

def main():

# prepare

Data

trainingSet=

[] testSet=[]

split = 0.67

loadDataset('knndat.data', split, trainingSet,

testSet) print('Train set: ' + repr(len(trainingSet)))

print('Test set: ' + repr(len(testSet)))

# generate

predictions

predictions=[]
k=3

for x in range(len(testSet)):

neighbors = getNeighbors(trainingSet, testSet[x],

k) result = getResponse(neighbors)

predictions.append(result)

print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-

) accuracy = getAccuracy(testSet, predictions)

print('Accuracy: ' + repr(accuracy) +

1]) '%') main()


OUTPUT:

Confusion matrix is as follows:

[[11 0 0]
[0 9 1]
[0 1 8]]

Accuracy metrics:

0 1.00 1.00 1.00 11

1 0.90 0.90 0.90 10

2 0.89 0.89 0.89 9

Avg/Total: 0.93 0.93 0.93 30.


PROGRAM 5:

ANALYZE DELTAS BETWEEN TRAINING SET AND VALIDATION SET RESULTS

AIM:

To implement the experiment with validation sets and test sets using the datasets.

DESCRIPTION:

To experiment with validation sets and test sets using the datasets. Split a training set into a smaller

training set and a validation set. Analyze deltas between training set and validation set results. Test the

trained model with a test set to determine whether your trained model is over fitting. Detect and fix a

common training problem.

SOURCE CODE:

import matplotlib.pyplot as plt

import numpy as np

from sklearn.model_selection import cross_validate, train_test_split

from sklearn.preprocessing import Polynomial Features, StandardScaler

from sklearn.pipeline import make_pipeline

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

np.random.seed(42)

# Generate data and plot

N = 300

x = np.linspace(0, 7*np.pi, N)

smooth = 1 + 0.5*np.sin(x)

y = smooth + 0.2*np.random.randn(N)

plt.plot(x, y)

plt.plot(x, smooth)
plt.xlabel("x")

plt.ylabel("y")

plt.ylim(0,2)

plt.show()

# Train-test split, intentionally use shuffle=False

X = x.reshape(-1,1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, shuffle=False)

# Create two models: Polynomial and linear regression

degree = 2

polyreg = make_pipeline(Polynomial Features(degree), LinearRegression(fit_intercept=False))

linreg = LinearRegression()

# Cross-validation

scoring = "neg_root_mean_squared_error"

polyscores = cross_validate(polyreg, X_train, y_train, scoring=scoring, return_estimator=True)

linscores = cross_validate(linreg, X_train, y_train, scoring=scoring, return_estimator=True)

# Which one is better? Linear and polynomial

print("Linear regression score:", linscores["test_score"].mean())

print("Polynomial regression score:", polyscores["test_score"].mean())

print("Difference:", linscores["test_score"].mean() - polyscores["test_score"].mean())

print("Coefficients of polynomial regression and linear regression:")

# Let's show the coefficient of the last fitted polynomial regression

# This starts from the constant term and in ascending order of powers

print(polyscores["estimator"][0].steps[1][1].coef_)

# And show the coefficient of the last-fitted linear regression


print(linscores["estimator"][0].intercept_, linscores["estimator"][-1].coef_)

# Plot and compare

plt.plot(x, y)

plt.plot(x, smooth)

plt.plot(x, polyscores["estimator"][0].predict(X))

plt.plot(x, linscores["estimator"][0].predict(X))

plt.ylim(0,2)

plt.xlabel("x")

plt.ylabel("y")

plt.show()

# Retrain the model and evaluate

import sklearn

linreg = sklearn.base.clone(linreg)

linreg.fit(X_train, y_train)

print("Test set RMSE:", mean_squared_error(y_test, linreg.predict(X_test), squared=False))

print("Mean validation RMSE:", -linscores["test_score"].mean()).


OUTPUT:
PROGRAM 6:

IMPLEMENT A BINARY CLASSIFICATION MODEL

AIM:

Implement a binary classification model. Binary question such as "Are houses in this neighborhood above a

certain price?

DESCRIPTION:

Implement a binary classification model. Binary question such as "Are houses in this neighborhood above a

certain price?"(use data from exercise 1). Modify the classification threshold and determine how that

modification influences the model. Experiment with different classification metrics to determine your

model's effectiveness.

SOURCE CODE:

train_df = pd.read_csv("https://download.mlcc.google.com/mledu datasets/california_housing_train.csv")

test_df = pd.read_csv("https://download.mlcc.google.com/mledu-datasets/california_housing_test.csv")

train_df = train_df.reindex(np.random.permutation(train_df.index))

# shuffle the training set

threshold = 265000 # This is the 75th percentile for median house values.

train_df_norm["median_house_value_is_high"] = ? Your code here

test_df_norm["median_house_value_is_high"] = ? Your code here

# Print out a few example cells from the beginning and

# middle of the training set, just to make sure that

# your code created only 0s and 1s in the newly created

# median_house_value_is_high column

train_df_norm["median_house_value_is_high"].head(8000)

inputs = {

# Features used to train the model on.


'median_income': tf.keras.Input(shape=(1,)),

'total_rooms': tf.keras.Input(shape=(1,))

# The following variables are the hyperparameters.

learning_rate = 0.001

epochs = 20

batch_size = 100

classification_threshold = 0.35

label_name = "median_house_value_is_high"

# Modify the following definition of METRICS to generate

# not only accuracy and precision, but also recall:

METRICS = [
tf.keras.metrics.BinaryAccuracy(name='accuracy',

threshold=classification_threshold),
tf.keras.metrics.Precision(thresholds=classification_threshold,

name='precision'
),
? # write code here
]

# Establish the model's topography.

my_model = create_model(inputs, learning_rate, METRICS)

# Train the model on the training set.

epochs, hist = train_model(my_model, train_df_norm, epochs,

label_name, batch_size)
# Plot metrics vs. Epochs
list_of_metrics_to_plot = ['accuracy', 'precision', 'recall']

plot_curve(epochs, hist, list_of_metrics_to_plot)


OUTPUT:
PROGRAM 7:

IMPLEMENT THE FINITE WORDS CLASSIFICATION SYSTEM USING


BACKPROPAGATIONALGORITHM

AIM:

To implement the finite words classification system using Back-propagation algorithm.

DESCRIPTION:

What is back propagation?

We can define the back propagation algorithm as an algorithm that trains some given feed forward Neural

Network for a given input pattern where the classifications are known to us. At the point when every

passage of the example set is exhibited to the network, the network looks at its yield reaction to the

example input pattern. After that, the comparison done between output response and expected output with

the error value is measured. Later, we adjust the connection weight based upon the error value measured.

It was first introduced in the 1960s and 30 years later it was popularized by David Rumelhart,

Geoffrey Hinton, and Ronald Williams in the famous 1986 paper. In this paper, they spoke about the

various neural networks. Today, back propagation is doing good. Neural network training happens through

back propagation. By this approach, we fine-tune the weights of a neural net based on the error rate

obtained in the previous run. The right manner of applying this technique reduces error rates and makes

the model more reliable. Back propagation is used to train the neural network of the chain rule method. In

simple terms, after each feed-forward passes through a network, this algorithm does the backward pass to

adjust the model’s parameters based on weights and biases. A typical supervised learning algorithm

attempts to find a function that maps input data to the right output. Back propagation works with a multi-

layered neural network and learns internal representations of input to output mapping.

The Back propagation algorithm is a supervised learning method for multi layer feed forward

networks from the field of Artificial Neural Networks.

Feed-forward neural networks are inspired by the information processing of one or more neural
cells, called a neuron. A neuron accepts input signals via its dendrites, which pass the electrical signal

down to the cell body. The axon carries the signal out to synapses, which are the connections of a cell’s

axon to other cell’s dendrites.

The principle of the back propagation approach is to model a given function by modifying internal

weightings of input signals to produce an expected output signal. The system is trained using a supervised

learning method, where the error between the system’s output and a known expected output is presented to

the system and used to modify its internal state.

Technically, the back propagation algorithm is a method for training the weights in a multilayer

feed-forward neural network. As such, it requires a network structure to be defined of one or more layers

where one layer is fully connected to the next layer. A standard network structure is one input layer, one

hidden layer, and one output layer.Back propagation can be used for both classification

and regression problems.

In classification problems, best results are achieved when the network has one neuron in the output

layer for each class value. For example, a 2-class or binary classification problem with the class values of

A and B. These expected outputs would have to be transformed into binary vectors with one column for

each class value. Such as [1, 0] and [0, 1] for A and B respectively. This is called a one hot encoding.

How does back propagation work?

Let us take a look at how back propagation works. It has four layers: input layer, hidden layer,

hidden layer II and final output layer. So, the main three layers are:

1. Input layer

2. Hidden layer

3. Output layer.

Each layer has its own way of working and its own way to take action such that we are able to get the

desired results and correlate these scenarios to our conditions. Let us discuss other details needed to help

summarizing this algorithm.


SOURCE CODE:

import pandas as pd

fromsklearn.model_selection import train_test_split

fromsklearn.feature_extraction.text import

CountVectorizer from sklearn.neural_network

import MLPClassifier

fromsklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('document.csv',

names=['message', 'label']) print("Total Instances of

Dataset: ", msg.shape[0]) msg['labelnum'] =

msg.label.map({'pos': 1, 'neg': 0})

X=

msg.me

ssage y

msg.lab

Elnum

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

count_v = CountVectorizer()

Xtrain_dm =

count_v.fit_transform(Xtrain)

Xtest_dm =

count_v.transform(Xtest)

df = pd.DataFrame(Xtrain_dm.toarray(),columns=count_v.get_feature_names())
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,hidden_layer_sizes=(5, 2),

random_state=1)

clf.fit(Xtrain_dm, ytrain) pred = clf.predict(Xtest_dm)

print('Accuracy Metrics:')

print('Accuracy: ', accuracy_score(ytest, pred)) print('Recall: ', recall_score(ytest, pred))

print('Precision: ', precision_score(ytest, pred))

print('Confusion Matrix: \n', confusion_matrix(ytest, pred)),

document.csv:

I love this sandwich

,pos This is

an amazingplace,

pos KG CO

I feel very good about these

beers,pos This is my best

work,pos

What an awesome view,pos

I do not like this

restaurant,neg I am

tired of this stuff,neg

I can't deal with this,neg

He is my sworn enemy,ne

My boss is horrible,neg

This is an awesome place,pos

I do not like the taste of this

juice,neg I love to dance,pos


I am sick and tired

of this place,neg

What a great holiday,pos

That is a bad locality to stay

,neg We will have good fun

tomorrow,pos I went to my

enemy's house today,neg

OUTPUT:

Total Instances of Dataset: 18

Accuracy Metrics:

Accuracy: 0.8

Recall: 1.0

Precisio n: 0.75

Confusi on Matrix:

[[1 1]

[0 3]}
VIVA QUESTIONS:

What is machine learning?

Define supervised learning?

Define unsupervised learning?

Define semi supervised learning?

Define reinforcement learning?

What do you mean by hypotheses is classification?

What is clustering?

Define precision, accuracy and recall 10.Define entropy?

Define regression?

How Knn is different from k-means clustering?

What is concept learning?

Define specific boundary and general boundary?

Define target function 16.Define decision tree?

What is ANN?

Explain gradient descent approximation?

State Bayes theorem?

Define Bayesian belief networks?

Differentiate hard and soft clustering?

Define variance 23. What is inductive machine learning?

Why K nearest neighbour algorithm is lazy learning algorithm?

Why naïve Bayes is naïve?

Mention classification algorithms?

Define pruning 28.Differentiate Clustering and classification?

Mention clustering algorithms and Define Bias?

You might also like