B22EE010 Report

CSL2050 - Pattern Recognition and Machine Learning

Assignment 3: Lab 5 & Lab 6

Colab link: report
Question 1
Perceptron Training and Testing on Synthetic Data:-

• Dataset Generation:-
1. A synthetic dataset of 2000 samples has been created, each sample having 4
features, with weights [w0 w1 w2 w3 w4].

2. 5 integers (the weights) are drawn randomly from the range (-100,100) using
np.random.

3. 4 more variables, namely [x1 x2 x3 x4], are drawn randomly from the range
(-100,100) for each sample; these four variables are linearly combined using
the formula f(x) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4.

4. As per the above formula the output y is calculated and stored as a list in
the variable “predictions”; np.dot(X,w[1:])+w[0] is used to compute all the
outputs at once.

5. The predictions list and the feature values are combined together to form
Syn_dataset, as sketched below.
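A minimal sketch of this generation step, assuming a threshold at 0 for the
binary label (the report does not state the exact rule) and hypothetical names
such as rng and scores:

    import numpy as np

    rng = np.random.default_rng(0)            # assumed seed, for reproducibility
    n_samples = 2000

    w = rng.integers(-100, 100, size=5)       # [w0 w1 w2 w3 w4]; w0 acts as bias
    X = rng.integers(-100, 100, size=(n_samples, 4))   # [x1 x2 x3 x4] per sample

    scores = np.dot(X, w[1:]) + w[0]          # f(x) = w0 + w1*x1 + ... + w4*x4
    predictions = (scores > 0).astype(int)    # assumed: positive score -> label 1

    Syn_dataset = np.column_stack((X, predictions))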

• Dataset Visualization:-
1. Syn_dataset is visualized using a scatter plot to observe the distribution
of the output (0 or 1) with respect to the values of the features [x1 x2 x3 x4].
2. The Python library matplotlib.pyplot has been used to plot the scatter plot,
where label 1 is colored green and label 0 is colored red.
3. The plot of the dataset is shown below. We can observe that most of the red
dots are populated in the region from -100 to 0, after which the count of green
dots increases. It should also be noted that only 200 samples were plotted to
avoid overcrowding the figure.
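A hypothetical version of this plot; the report does not say which two features
sit on the axes, so the first two are assumed here, with Syn_dataset taken from
the sketch above:

    import numpy as np
    import matplotlib.pyplot as plt

    subset = Syn_dataset[:200]                # 200 samples, as noted above
    colors = np.where(subset[:, -1] == 1, "green", "red")
    plt.scatter(subset[:, 0], subset[:, 1], c=colors, s=10)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()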

• Data Normalization:-
1. We normalize the data to bring it from the range (-100,100) to the range (0,1).

2. Minima and maxima are calculated using dataset.min() and dataset.max().
Normalization is done using the formula:-
normalized_data = (data - min) / (max - min)
after which each column of the normalized data is written back to the
corresponding column of the original data.

3. After normalization, we make box plots to check for outliers; none are
present, since the data is synthetically generated and normalized.
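A minimal sketch of this column-wise min-max normalization, assuming dataset is
a NumPy array of the feature columns:

    col_min = dataset.min(axis=0)
    col_max = dataset.max(axis=0)
    normalized_data = (dataset - col_min) / (col_max - col_min)   # maps to (0,1)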

• Perceptron Implementation:-
(1) Class Initialization:-
▪ The Perceptron class is initialized with two default parameters, namely
learning_rate and n_iters.
▪ A few more required attributes are also initialized, namely lr, n_iters,
weights, bias, and errors.

(2) Fitting and Training:-

1. The number of samples and the number of features are read at the
beginning, and the weights and bias are initialized to zero.
2. We then loop through each iteration; for each iteration we reset the
error count to zero and run an inner loop over all the samples of the
dataset.
3. In the inner loop, the linear output is calculated by multiplying the
weights with the input X values and adding the bias at the end.
4. If the linear output is greater than 1 we predict 1, otherwise we
predict 0.
5. An update value is computed by multiplying the learning rate with the
difference (y_input - y_predicted), as per the given formula
update = self.lr * (y[i] - y_predicted).
6. Each weight is then incremented by the update value scaled by the
corresponding input, and the bias by the update value itself.
7. Lastly, the errors list is appended to at the end of each iteration of
the outer loop; it is later used to plot the misclassifications.

(3) Prediction Method:

1. The linear output is calculated as per the formula f(x) = w0 + w1*x1 +
w2*x2 + w3*x3 + w4*x4.
2. We then loop over all the linear outputs; if a linear output is greater
than 1 the predicted value is set to 1, otherwise it is set to 0. A sketch
of the full class is given below.
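A minimal sketch of the class described above, keeping the report's threshold
of 1; the per-sample weight update is the standard perceptron rule, which
step 6 above paraphrases:

    import numpy as np

    class Perceptron:
        def __init__(self, learning_rate=0.01, n_iters=1000):
            self.lr = learning_rate
            self.n_iters = n_iters
            self.weights = None
            self.bias = 0
            self.errors = []

        def fit(self, X, y):
            n_samples, n_features = X.shape
            self.weights = np.zeros(n_features)
            self.bias = 0
            for _ in range(self.n_iters):
                error = 0
                for i in range(n_samples):
                    linear_output = np.dot(X[i], self.weights) + self.bias
                    y_predicted = 1 if linear_output > 1 else 0  # report's threshold
                    update = self.lr * (y[i] - y_predicted)
                    self.weights += update * X[i]   # scale update by the input
                    self.bias += update
                    error += int(update != 0.0)     # count misclassifications
                self.errors.append(error)

        def predict(self, X):
            linear_output = np.dot(X, self.weights) + self.bias
            return np.where(linear_output > 1, 1, 0)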

• Misclassification Plot:
(1) A plot of the misclassifications with respect to each iteration is
shown below.
(2) We also observe that as the model iterates, continuously adjusting the
parameters, the number of errors is substantially reduced.
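A hypothetical snippet for this plot, assuming model is a trained Perceptron
from the sketch above:

    import matplotlib.pyplot as plt

    plt.plot(range(1, len(model.errors) + 1), model.errors)
    plt.xlabel("Iteration")
    plt.ylabel("Misclassifications")
    plt.show()
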
• Python Implementation and Text File Generation

1. Text File Generation

a. Three text files are generated, namely test.txt, train.txt and Data.txt.
b. train.txt and test.txt are splits of Data.txt in the ratio 80:20.
c. train.txt has the X values along with the labels, while test.txt only has
the X values.
d. We have made use of np.savetxt() to save the data into the corresponding
file names, as sketched below.
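A minimal sketch of this step, assuming X and y are the feature matrix and
labels of the synthetic dataset:

    import numpy as np

    data = np.column_stack((X, y))
    np.savetxt("Data.txt", data)

    split = int(0.8 * len(data))                 # 80:20 split
    np.savetxt("train.txt", data[:split])        # features + labels
    np.savetxt("test.txt", data[split:, :-1])    # features only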

2. Train.py and Train.txt

(i) A part of the Perceptron class implemented earlier is reused.
(ii) train.txt is loaded, the data is split into X and y, and these are used
to train the model. As a result, the weights are saved into another file.
(iii) It is to be made sure that the text file for training is given in the
terminal command as: python train.py train.txt
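A minimal sketch of train.py under these assumptions; the weights-file layout
(bias first) is a guess, and Perceptron is the class sketched earlier:

    # train.py
    import sys
    import numpy as np

    data = np.loadtxt(sys.argv[1])        # invoked as: python train.py train.txt
    X, y = data[:, :-1], data[:, -1]

    model = Perceptron()
    model.fit(X, y)
    np.savetxt("weights.txt", np.append(model.bias, model.weights))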

3. Test.py and Test.txt

a. The weights are extracted from the “weights.txt” file.
b. The input values for testing are taken from test.txt.
c. The output values are printed into output.txt.
d. We make use of the predict method from the implemented Perceptron to
predict the outputs.
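A matching sketch of test.py, carrying over the assumed weights.txt layout:

    # test.py
    import sys
    import numpy as np

    w = np.loadtxt("weights.txt")
    X_test = np.loadtxt(sys.argv[1])      # invoked as: python test.py test.txt

    model = Perceptron()
    model.bias, model.weights = w[0], w[1:]
    np.savetxt("output.txt", model.predict(X_test), fmt="%d")
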
• Model Testing on Different Sizes

I. The model is tested using variable test sizes of {30, 50, 70}.

II. The output accuracies are as follows:-
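A hypothetical evaluation loop; whether {30, 50, 70} means sample counts or
percentages is not stated, so percentages are assumed here:

    import numpy as np
    from sklearn.model_selection import train_test_split

    for size in (30, 50, 70):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=size / 100)
        model = Perceptron()
        model.fit(X_tr, y_tr)
        accuracy = np.mean(model.predict(X_te) == y_te)
        print(f"test size {size}%: accuracy {accuracy:.3f}")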

Question 2
A dimensionality reduction technique based on PCA for face recognition:-

• Data Loading and Preprocessing:-

1) Loading the dataset (Labeled Faces in the Wild (LFW)) using the sklearn
library function ‘fetch_lfw_people’.

2) fetch_lfw_people(min_faces_per_person=70, resize=0.4) is used to extract
the faces, where only individuals with at least 70 samples are included in
the dataset and their images are scaled down to 40% of the original size.

3) “X” contains the 2D array data, where each row represents an image and
each column represents a pixel value of that image.
“y” is a 1D array where each element represents the target label (identity
of the person) for the corresponding image.
“target_names” is used to store the names of the target people.
4) The data is then normalized using the formula
(X_test - min_vals_train) / (max_vals_train - min_vals_train)
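A minimal sketch of this step; the train/test split parameters are assumptions,
since the report does not state them here:

    from sklearn.datasets import fetch_lfw_people
    from sklearn.model_selection import train_test_split

    lfw = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
    X, y, target_names = lfw.data, lfw.target, lfw.target_names

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # normalize both splits with the training minima/maxima
    min_vals_train = X_train.min(axis=0)
    max_vals_train = X_train.max(axis=0)
    X_train = (X_train - min_vals_train) / (max_vals_train - min_vals_train)
    X_test = (X_test - min_vals_train) / (max_vals_train - min_vals_train)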

• Data Visualization:-
1) We visualize a small subset of the dataset using ‘matplotlib.pyplot’ by
plotting the images with a gray colormap, as sketched below.
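A hypothetical version of this visualization, reusing lfw from the loading
sketch:

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 5, figsize=(12, 3))
    for ax, img, label in zip(axes, lfw.images, lfw.target):
        ax.imshow(img, cmap="gray")
        ax.set_title(target_names[label], fontsize=8)
        ax.axis("off")
    plt.show()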

• PCA Implementation:
1) Initialization:-
(i) The PCA class is initialized with the parameter n_components, the number
of dimensions to which the dataset will be reduced at the end.
(ii) Other attributes are also initialized in the constructor, namely
components and mean.

2) Fitting the PCA:-

(i) The fit function takes X_train as its parameter.
(ii) We first calculate the mean of the dataset using np.mean, then we
centre the data: X = X - X_mean.
(iii) We calculate the covariance matrix directly using the built-in NumPy
function np.cov(X.T), where X.T is the transpose of the matrix X.
(iv) The eigenvalues and eigenvectors are calculated using the built-in
NumPy function np.linalg.eig(cov).
(v) We then take the transpose of the eigenvector matrix so that each row
represents an eigenvector, and sort the eigenvalues in descending order
using np.argsort()[::-1]; the eigenvalues and eigenvectors are reordered
accordingly.
(vi) Finally we extract the first n_components eigenvectors using
eigenvectors[0:self.n_components].
3) Transformation:-

i. The transform function is used to reduce the dimensional space of the
input data to a smaller size.
ii. We centre the data by subtracting the mean from it.
iii. The centred data is then projected onto the principal components using
matrix multiplication, np.dot(X, self.components.T). A sketch of the full
class follows.
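A minimal sketch of the class described above; the explained_variance attribute
anticipates the variance analysis in the next section:

    import numpy as np

    class PCA:
        def __init__(self, n_components):
            self.n_components = n_components
            self.components = None
            self.mean = None

        def fit(self, X):
            self.mean = np.mean(X, axis=0)
            X = X - self.mean
            cov = np.cov(X.T)                          # covariance of the features
            eigenvalues, eigenvectors = np.linalg.eig(cov)
            eigenvectors = eigenvectors.T              # one eigenvector per row
            idx = np.argsort(eigenvalues)[::-1]        # descending eigenvalues
            eigenvalues, eigenvectors = eigenvalues[idx], eigenvectors[idx]
            # fraction of total variance per component; .real discards the
            # tiny imaginary parts np.linalg.eig can return numerically
            self.explained_variance = eigenvalues.real / eigenvalues.real.sum()
            self.components = eigenvectors[0:self.n_components]

        def transform(self, X):
            X = X - self.mean
            return np.dot(X, self.components.T)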

• Principal Component Analysis (PCA) Explained Variance Analysis

1) The cumulative_variance is calculated using np.cumsum(pca.explained_variance).

2) The explained variance is the proportion of the total variance captured
by each component.
3) By plotting the cumulative variance with respect to the number of
components, we get to know how many components are enough to reach the
highest accuracy.
4) The plot is shown below (we can clearly see that the variance saturates
at around 500 components):-
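A hypothetical snippet for this plot, assuming pca is a fitted instance of the
class sketched above:

    import numpy as np
    import matplotlib.pyplot as plt

    cumulative_variance = np.cumsum(pca.explained_variance)
    plt.plot(cumulative_variance)
    plt.xlabel("Number of components")
    plt.ylabel("Cumulative explained variance")
    plt.show()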

• Scatter Plot Visualization

(a) A scatter plot has been made of the transformed dataset after applying PCA.
(b) We see that the shape of the data has changed from (1030, 1850) to
(1288, 550).
(c) The plot displaying the distribution of the different classes is as
follows:-
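A hypothetical version of this scatter, projecting onto the first two
principal components and coloring by class:

    import matplotlib.pyplot as plt

    X_pca = pca.transform(X).real
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="tab10", s=8)
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.show()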

• Eigenfaces Plot
1) Since we know that eigenfaces are the principal components obtained from
applying PCA to the images, they represent specific patterns found in the
dataset.
2) We get the eigenfaces by taking the transpose of the PCA components and
extracting their real values.
3) The respective eigenfaces are plotted as follows:-
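A hypothetical plotting snippet; the image shape (50, 37) is what
fetch_lfw_people produces at resize=0.4:

    import matplotlib.pyplot as plt

    eigenfaces = pca.components.real          # real part of the components
    fig, axes = plt.subplots(2, 5, figsize=(12, 6))
    for ax, face in zip(axes.ravel(), eigenfaces):
        ax.imshow(face.reshape(50, 37), cmap="gray")
        ax.axis("off")
    plt.show()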

• Training and Testing Models With and Without PCA

Without PCA:-
1. KNN accuracy:- 0.550387596899224
2. Logistic Regression accuracy:- 0.8449612403100775
3. Random Forest accuracy:- 0.5852713178294574

With PCA:-
1. KNN accuracy:- 0.5465116279069767
2. Logistic Regression accuracy:- 0.8255813953488372
3. Random Forest accuracy:- 0.37209302325581395
We observe that although the accuracies decreased a bit, we were in the end
able to reduce the number of features needed to predict the labels. A sketch
of the comparison follows.
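A minimal sketch of this comparison under the assumptions above; the classifier
hyperparameters are defaults, and n_components=550 follows the shape reported
in the scatter plot section:

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    pca = PCA(n_components=550)
    pca.fit(X_train)
    X_train_pca = pca.transform(X_train).real
    X_test_pca = pca.transform(X_test).real

    for name, clf in [("KNN", KNeighborsClassifier()),
                      ("Logistic Regression", LogisticRegression(max_iter=1000)),
                      ("Random Forest", RandomForestClassifier())]:
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        clf.fit(X_train_pca, y_train)
        acc_pca = accuracy_score(y_test, clf.predict(X_test_pca))
        print(f"{name}: without PCA {acc:.4f}, with PCA {acc_pca:.4f}")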

• Impact on Model Accuracy with n_components

The plot below shows the impact on accuracy as the n_components of PCA is
varied:-

KNN: Optimal n_components = 100, Maximum Accuracy = 0.5813953488372093

Logistic Regression: Optimal n_components = 300, Maximum Accuracy = 0.8410852713178295

Random Forest: Optimal n_components = 50, Maximum Accuracy = 0.5852713178294574

• Cases where the Model failed (KNN)

Colab link
