B22EE010 Report

CSL2050 - Pattern Recognition and Machine Learning

Assignment 3: Lab 5 & Lab 6

Colab link: report
Question 1
Perceptron Training and Testing on Synthetic Data:-

• Dataset Generation:-
1. A synthetic dataset of 2000 samples has been created, each sample having 4
features, with weights [w0 w1 w2 w3 w4].

2. 5 integers (the weights) are drawn randomly from the range (-100,100) using
np.random.

3. 4 more variables, namely [x1 x2 x3 x4], are drawn randomly from the range
(-100,100) for each sample; these four variables are linearly combined using
the formula f(x) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4.

4. As per the above formula the output y is calculated and stored as a list in
the variable “predictions”; np.dot(X,w[1:])+w[0] is used to compute all the
outputs at once.

5. The predictions list and the feature values are combined together to form
Syn_dataset, as sketched below.
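A minimal sketch of this generation step, assuming a threshold at 0 for the
binary label (the report does not state the exact rule) and hypothetical names
such as rng and scores:

    import numpy as np

    rng = np.random.default_rng(0)            # assumed seed, for reproducibility
    n_samples = 2000

    w = rng.integers(-100, 100, size=5)       # [w0 w1 w2 w3 w4]; w0 acts as bias
    X = rng.integers(-100, 100, size=(n_samples, 4))   # [x1 x2 x3 x4] per sample

    scores = np.dot(X, w[1:]) + w[0]          # f(x) = w0 + w1*x1 + ... + w4*x4
    predictions = (scores > 0).astype(int)    # assumed: positive score -> label 1

    Syn_dataset = np.column_stack((X, predictions))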

• Dataset Visualization:-
1. Syn_dataset is visualized using a scatter plot to observe the distribution
of the output (0 or 1) with respect to the values of the features [x1 x2 x3 x4].
2. The Python library matplotlib.pyplot has been used to plot the scatter plot,
where label 1 is colored green and label 0 is colored red.
3. The plot of the dataset is shown below. We can observe that most of the red
dots are populated in the region from -100 to 0, after which the count of green
dots increases. It should also be noted that only 200 samples were plotted to
avoid overcrowding the figure.
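A hypothetical version of this plot; the report does not say which two features
sit on the axes, so the first two are assumed here, with Syn_dataset taken from
the sketch above:

    import numpy as np
    import matplotlib.pyplot as plt

    subset = Syn_dataset[:200]                # 200 samples, as noted above
    colors = np.where(subset[:, -1] == 1, "green", "red")
    plt.scatter(subset[:, 0], subset[:, 1], c=colors, s=10)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()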

• Data Normalization:-
1. We normalize the data to bring it from the range (-100,100) to the range (0,1).

2. Minima and maxima are calculated using dataset.min() and dataset.max().
Normalization is done using the formula:-
normalized_data = (data - min) / (max - min)
after which each column of the normalized data is written back to the
corresponding column of the original data.

3. After normalization, we make box plots to check for outliers; none are
present, since the data is synthetically generated and normalized.
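A minimal sketch of this column-wise min-max normalization, assuming dataset is
a NumPy array of the feature columns:

    col_min = dataset.min(axis=0)
    col_max = dataset.max(axis=0)
    normalized_data = (dataset - col_min) / (col_max - col_min)   # maps to (0,1)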

• Perceptron Implementation:-
(1) Class Initialization:-
▪ The Perceptron class is initialized with two default parameters, namely
learning_rate and n_iters.
▪ A few more required attributes are also initialized, namely lr, n_iters,
weights, bias, and errors.

(2) Fitting and Training:-

1. The number of samples and the number of features are read at the
beginning, and the weights and bias are initialized to zero.
2. We then loop through each iteration; for each iteration we reset the
error count to zero and run an inner loop over all the samples of the
dataset.
3. In the inner loop, the linear output is calculated by multiplying the
weights with the input X values and adding the bias at the end.
4. If the linear output is greater than 1 we predict 1, otherwise we
predict 0.
5. An update value is computed by multiplying the learning rate with the
difference (y_input - y_predicted), as per the given formula
update = self.lr * (y[i] - y_predicted).
6. Each weight is then incremented by the update value scaled by the
corresponding input, and the bias by the update value itself.
7. Lastly, the errors list is appended to at the end of each iteration of
the outer loop; it is later used to plot the misclassifications.

(3) Prediction Method:

1. The linear output is calculated as per the formula f(x) = w0 + w1*x1 +
w2*x2 + w3*x3 + w4*x4.
2. We then loop over all the linear outputs; if a linear output is greater
than 1 the predicted value is set to 1, otherwise it is set to 0. A sketch
of the full class is given below.
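A minimal sketch of the class described above, keeping the report's threshold
of 1; the per-sample weight update is the standard perceptron rule, which
step 6 above paraphrases:

    import numpy as np

    class Perceptron:
        def __init__(self, learning_rate=0.01, n_iters=1000):
            self.lr = learning_rate
            self.n_iters = n_iters
            self.weights = None
            self.bias = 0
            self.errors = []

        def fit(self, X, y):
            n_samples, n_features = X.shape
            self.weights = np.zeros(n_features)
            self.bias = 0
            for _ in range(self.n_iters):
                error = 0
                for i in range(n_samples):
                    linear_output = np.dot(X[i], self.weights) + self.bias
                    y_predicted = 1 if linear_output > 1 else 0  # report's threshold
                    update = self.lr * (y[i] - y_predicted)
                    self.weights += update * X[i]   # scale update by the input
                    self.bias += update
                    error += int(update != 0.0)     # count misclassifications
                self.errors.append(error)

        def predict(self, X):
            linear_output = np.dot(X, self.weights) + self.bias
            return np.where(linear_output > 1, 1, 0)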

• Misclassification Plot:
(1) A plot of the misclassifications with respect to each iteration is
shown below.
(2) We also observe that as the model iterates, continuously adjusting the
parameters, the number of errors is substantially reduced.
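A hypothetical snippet for this plot, assuming model is a trained Perceptron
from the sketch above:

    import matplotlib.pyplot as plt

    plt.plot(range(1, len(model.errors) + 1), model.errors)
    plt.xlabel("Iteration")
    plt.ylabel("Misclassifications")
    plt.show()
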
• Python Implementation and Text File Generation

1. Text File Generation

a. Three text files are generated, namely test.txt, train.txt and Data.txt.
b. train.txt and test.txt are splits of Data.txt in the ratio 80:20.
c. train.txt has the X values along with the labels, while test.txt only has
the X values.
d. We have made use of np.savetxt() to save the data into the corresponding
file names, as sketched below.
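A minimal sketch of this step, assuming X and y are the feature matrix and
labels of the synthetic dataset:

    import numpy as np

    data = np.column_stack((X, y))
    np.savetxt("Data.txt", data)

    split = int(0.8 * len(data))                 # 80:20 split
    np.savetxt("train.txt", data[:split])        # features + labels
    np.savetxt("test.txt", data[split:, :-1])    # features only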

2. Train.py and Train.txt

(i) A part of the Perceptron class implemented earlier is reused.
(ii) train.txt is loaded, the data is split into X and y, and these are used
to train the model. As a result, the weights are saved into another file.
(iii) It is to be made sure that the text file for training is given in the
terminal command as: python train.py train.txt
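A minimal sketch of train.py under these assumptions; the weights-file layout
(bias first) is a guess, and Perceptron is the class sketched earlier:

    # train.py
    import sys
    import numpy as np

    data = np.loadtxt(sys.argv[1])        # invoked as: python train.py train.txt
    X, y = data[:, :-1], data[:, -1]

    model = Perceptron()
    model.fit(X, y)
    np.savetxt("weights.txt", np.append(model.bias, model.weights))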

3. Test.py and Test.txt

a. The weights are extracted from the “weights.txt” file.
b. The input values for testing are taken from test.txt.
c. The output values are printed into output.txt.
d. We make use of the predict method from the implemented Perceptron to
predict the outputs.
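A matching sketch of test.py, carrying over the assumed weights.txt layout:

    # test.py
    import sys
    import numpy as np

    w = np.loadtxt("weights.txt")
    X_test = np.loadtxt(sys.argv[1])      # invoked as: python test.py test.txt

    model = Perceptron()
    model.bias, model.weights = w[0], w[1:]
    np.savetxt("output.txt", model.predict(X_test), fmt="%d")
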
• Model Testing on Different Sizes

I. The model is tested using variable test sizes of {30, 50, 70}.

II. The output accuracies are as follows:-
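A hypothetical evaluation loop; whether {30, 50, 70} means sample counts or
percentages is not stated, so percentages are assumed here:

    import numpy as np
    from sklearn.model_selection import train_test_split

    for size in (30, 50, 70):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=size / 100)
        model = Perceptron()
        model.fit(X_tr, y_tr)
        accuracy = np.mean(model.predict(X_te) == y_te)
        print(f"test size {size}%: accuracy {accuracy:.3f}")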

Question 2
A dimensionality reduction technique based on PCA for face recognition:-

• Data Loading and Preprocessing:-

1) Loading the dataset (Labeled Faces in the Wild (LFW)) using the sklearn
library function ‘fetch_lfw_people’.

2) fetch_lfw_people(min_faces_per_person=70, resize=0.4) is used to extract
the faces, where only individuals with at least 70 samples are included in
the dataset and their images are scaled down to 40% of the original size.

3) “X” contains the 2D array data, where each row represents an image and
each column represents a pixel value of that image.
“y” is a 1D array where each element represents the target label (identity
of the person) for the corresponding image.
“target_names” is used to store the names of the target people.
4) The data is then normalized using the formula
(X_test - min_vals_train) / (max_vals_train - min_vals_train)
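A minimal sketch of this step; the train/test split parameters are assumptions,
since the report does not state them here:

    from sklearn.datasets import fetch_lfw_people
    from sklearn.model_selection import train_test_split

    lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
    X, y, target_names = lfw.data, lfw.target, lfw.target_names

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # normalize both splits with the training minima/maxima
    min_vals_train = X_train.min(axis=0)
    max_vals_train = X_train.max(axis=0)
    X_train = (X_train - min_vals_train) / (max_vals_train - min_vals_train)
    X_test = (X_test - min_vals_train) / (max_vals_train - min_vals_train)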

• Data Visualization:-
1) We visualize a small subset of the dataset using ‘matplotlib.pyplot’ by
plotting the images with a gray colormap, as sketched below.
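A hypothetical version of this visualization, reusing lfw from the loading
sketch:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 5, figsize=(12, 3))
    for ax, img, label in zip(axes, lfw.images, lfw.target):
        ax.imshow(img, cmap="gray")
        ax.set_title(target_names[label], fontsize=8)
        ax.axis("off")
    plt.show()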

• PCA Implementation:
1) Initialization:-
(i) The PCA class is initialized with the parameter n_components, the number
of dimensions to which the dataset will be reduced at the end.
(ii) Other attributes are also initialized in the constructor, namely
components and mean.

2) Fitting the PCA:-

(i) The fit function takes X_train as its parameter.
(ii) We first calculate the mean of the dataset using np.mean, then we
centre the data: X = X - X_mean.
(iii) We calculate the covariance matrix directly using the built-in NumPy
function np.cov(X.T), where X.T is the transpose of the matrix X.
(iv) The eigenvalues and eigenvectors are calculated using the built-in
NumPy function np.linalg.eig(cov).
(v) We then take the transpose of the eigenvector matrix so that each row
represents an eigenvector, and sort the eigenvalues in descending order
using np.argsort()[::-1]; the eigenvalues and eigenvectors are reordered
accordingly.
(vi) Finally we extract the first n_components eigenvectors using
eigenvectors[0:self.n_components].
3) Transformation:-

i. The transform function is used to reduce the dimensional space of the
input data to a smaller size.
ii. We centre the data by subtracting the mean from it.
iii. The centred data is then projected onto the principal components using
matrix multiplication, np.dot(X, self.components.T). A sketch of the full
class follows.
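A minimal sketch of the class described above; the explained_variance attribute
anticipates the variance analysis in the next section:

    import numpy as np

    class PCA:
        def __init__(self, n_components):
            self.n_components = n_components
            self.components = None
            self.mean = None

        def fit(self, X):
            self.mean = np.mean(X, axis=0)
            X = X - self.mean
            cov = np.cov(X.T)                          # covariance of the features
            eigenvalues, eigenvectors = np.linalg.eig(cov)
            eigenvectors = eigenvectors.T              # one eigenvector per row
            idx = np.argsort(eigenvalues)[::-1]        # descending eigenvalues
            eigenvalues, eigenvectors = eigenvalues[idx], eigenvectors[idx]
            # fraction of total variance per component; .real discards the
            # tiny imaginary parts np.linalg.eig can return numerically
            self.explained_variance = eigenvalues.real / eigenvalues.real.sum()
            self.components = eigenvectors[0:self.n_components]

        def transform(self, X):
            X = X - self.mean
            return np.dot(X, self.components.T)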

• Principal Component Analysis (PCA) Explained Variance Analysis

1) The cumulative_variance is calculated using np.cumsum(pca.explained_variance).

2) The explained variance is the proportion of the total variance captured
by each component.
3) By plotting the cumulative variance with respect to the number of
components, we get to know how many components are enough to reach the
highest accuracy.
4) The plot is shown below (we can clearly see that the variance saturates
at around 500 components):-
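A hypothetical snippet for this plot, assuming pca is a fitted instance of the
class sketched above:

    import numpy as np
    import matplotlib.pyplot as plt

    cumulative_variance = np.cumsum(pca.explained_variance)
    plt.plot(cumulative_variance)
    plt.xlabel("Number of components")
    plt.ylabel("Cumulative explained variance")
    plt.show()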

• Scatter Plot Visualization

(a) A scatter plot has been made of the transformed dataset after applying PCA.
(b) We see that the shape of the data has changed from (1030, 1850) to
(1288, 550).
(c) The plot displaying the distribution of the different classes is as
follows:-
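A hypothetical version of this scatter, projecting onto the first two
principal components and coloring by class:

    import matplotlib.pyplot as plt

    X_pca = pca.transform(X).real
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="tab10", s=8)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()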

• Eigenfaces Plot
1) Since we know that eigenfaces are the principal components obtained from
applying PCA to the images, they represent specific patterns found in the
dataset.
2) We get the eigenfaces by taking the transpose of the PCA components and
extracting their real values.
3) The respective eigenfaces are plotted as follows:-
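A hypothetical plotting snippet; the image shape (50, 37) is what
fetch_lfw_people produces at resize=0.4:

    import matplotlib.pyplot as plt

    eigenfaces = pca.components.real          # real part of the components
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    for ax, face in zip(axes.ravel(), eigenfaces):
        ax.imshow(face.reshape(50, 37), cmap="gray")
        ax.axis("off")
    plt.show()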

• Training and Testing Models With and Without PCA

Without PCA:-
1. KNN accuracy:- 0.550387596899224
2. Logistic Regression accuracy:- 0.8449612403100775
3. Random Forest accuracy:- 0.5852713178294574

With PCA:-
1. KNN accuracy:- 0.5465116279069767
2. Logistic Regression accuracy:- 0.8255813953488372
3. Random Forest accuracy:- 0.37209302325581395
We observe that although the accuracies decreased a bit, we were in the end
able to reduce the number of features needed to predict the labels. A sketch
of the comparison follows.
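A minimal sketch of this comparison under the assumptions above; the classifier
hyperparameters are defaults, and n_components=550 follows the shape reported
in the scatter plot section:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    pca = PCA(n_components=550)
    pca.fit(X_train)
    X_train_pca = pca.transform(X_train).real
    X_test_pca = pca.transform(X_test).real

    for name, clf in [("KNN", KNeighborsClassifier()),
                      ("Logistic Regression", LogisticRegression(max_iter=1000)),
                      ("Random Forest", RandomForestClassifier())]:
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        clf.fit(X_train_pca, y_train)
        acc_pca = accuracy_score(y_test, clf.predict(X_test_pca))
        print(f"{name}: without PCA {acc:.4f}, with PCA {acc_pca:.4f}")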

• Impact on Model Accuracy with n_components

The plot below shows the impact on accuracy as the n_components of PCA is
varied:-

KNN: Optimal n_components = 100, Maximum Accuracy = 0.5813953488372093

Logistic Regression: Optimal n_components = 300, Maximum Accuracy = 0.8410852713178295

Random Forest: Optimal n_components = 50, Maximum Accuracy = 0.5852713178294574

• Cases where the Model failed (KNN)

Colab link
