B22EE010 Report
Colab link: report
Question 1
Perceptron Training and Testing on Synthetic Data:-
• Dataset Generation:-
1. A synthetic dataset of 2000 samples has been created, having 4
features with a weight vector [w0 w1 w2 w3].
2. Four variables are selected randomly from the range (-100, 100), namely
[x1 x2 x3 x4]; these variables are linearly combined as per the
formula f(x) = w0 + w1*x1 + w2*x2 + w3*x3.
3. As per the above-mentioned formula, the output y is calculated and stored
in the form of a list in the variable “predictions”; np.dot(X, w[1:]) + w[0]
is used to compute the output for all samples at once.
4. The predictions list and the w list are combined together to form the Syn_dataset.
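The generation steps above can be sketched as follows. This is a minimal sketch, not the report's exact code: the weight values are arbitrary placeholders, and it assumes the 0/1 labels are obtained by thresholding f(x) at 0 (the report does not state the threshold). Since the stated formula uses three features plus the bias term w0, the sketch draws three feature columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weight vector [w0, w1, w2, w3]; w0 acts as the bias term.
# These particular values are illustrative placeholders.
w = np.array([2.0, -1.0, 0.5, 3.0])

# 2000 samples, each with 3 features drawn uniformly from (-100, 100).
X = rng.uniform(-100, 100, size=(2000, 3))

# f(x) = w0 + w1*x1 + w2*x2 + w3*x3, computed for all samples at once.
f = np.dot(X, w[1:]) + w[0]

# Assumption: the 0/1 labels come from thresholding f(x) at 0.
predictions = (f > 0).astype(int)

# Combine the features and labels into one array (the "Syn_dataset").
Syn_dataset = np.column_stack([X, predictions])
```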
• Dataset Visualization:-
1. The Syn_dataset is visualized using a scatter plot to observe the distribution
of the output (0 or 1) with respect to the feature values.
2. The Python library matplotlib.pyplot has been used to plot the scatter plot,
where label 1 is colored green and label 0 is colored red.
3. The plot of the dataset is shown below. We can observe that most of the red
dots are populated in the region from -100 to 0, after which the count of green
dots increases. It should also be noted that only 200 samples were plotted to
avoid overcrowding the figure.
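A minimal sketch of the scatter plot described above, using a non-interactive backend so it runs headless. The weights and the choice of which two features to plot against each other are assumptions for illustration, not the report's exact figure.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; writes the figure to a file
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(200, 3))  # 200 samples to avoid overcrowding

# Placeholder weights; labels come from thresholding the linear combination.
y = (2.0 - 1.0 * X[:, 0] + 0.5 * X[:, 1] + 3.0 * X[:, 2] > 0).astype(int)

# Green for label 1, red for label 0, first feature against the second.
colors = np.where(y == 1, "green", "red")
plt.scatter(X[:, 0], X[:, 1], c=colors, s=10)
plt.xlabel("x1")
plt.ylabel("x2")
plt.title("Synthetic dataset (200 samples)")
plt.savefig("syn_dataset_scatter.png")
```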
• Data Normalization:-
1. We normalize the data to bring it from the range (-100, 100) to the range (0, 1).
2. After normalization, we make box plots to check for outliers; none are
present, since the data is synthetically generated and has been normalized.
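The normalization step can be sketched with a standard min-max scaling, which maps each feature column into [0, 1]; the report does not name its exact method, so this is an assumption.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column from its own (min, max) range into [0, 1]."""
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    return (X - X_min) / (X_max - X_min)

rng = np.random.default_rng(0)
X = rng.uniform(-100, 100, size=(2000, 3))
X_norm = min_max_normalize(X)
```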
• Perceptron Implementation:-
(1) Class Initialization:-
▪ The Perceptron class is initialized with two default parameters, namely
learning_rate and n_iters.
▪ A few more required variables are also initialized, namely lr, n_iters,
weights, bias, and errors.
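A minimal sketch of such a perceptron class, assuming the classic perceptron update rule and that `errors` stores the misclassification count per iteration; the exact hyperparameter defaults here are placeholders.

```python
import numpy as np

class Perceptron:
    def __init__(self, learning_rate=0.01, n_iters=50):
        self.lr = learning_rate
        self.n_iters = n_iters
        self.weights = None
        self.bias = 0.0
        self.errors = []          # misclassification count per iteration

    def fit(self, X, y):
        self.weights = np.zeros(X.shape[1])
        for _ in range(self.n_iters):
            n_errors = 0
            for xi, target in zip(X, y):
                pred = int(np.dot(xi, self.weights) + self.bias > 0)
                update = self.lr * (target - pred)
                if update != 0:
                    # Classic perceptron rule: nudge weights toward the target.
                    self.weights += update * xi
                    self.bias += update
                    n_errors += 1
            self.errors.append(n_errors)
        return self

    def predict(self, X):
        return (np.dot(X, self.weights) + self.bias > 0).astype(int)
```

On linearly separable data, `fit` drives the per-iteration error count in `errors` down to zero.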
• Misclassification Plot:-
(1) A plot of all the misclassifications with respect to each iteration is
shown below.
(2) We also observe that, as the model iterates and continuously adjusts its
parameters, the number of misclassifications is substantially reduced.
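A sketch of how such a plot can be produced from an `errors` list holding one misclassification count per iteration; the values below are illustrative placeholders, not the report's actual counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; writes the figure to a file
import matplotlib.pyplot as plt

# Illustrative placeholder: misclassification count per training iteration.
errors = [14, 9, 6, 4, 2, 1, 0, 0, 0, 0]

plt.plot(range(1, len(errors) + 1), errors, marker="o")
plt.xlabel("Iteration")
plt.ylabel("Misclassifications")
plt.title("Misclassifications per iteration")
plt.savefig("misclassification_plot.png")
```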
• Python Implementation and .txt File Generation
Question 2
A dimensionality reduction technique based on PCA for face recognition:-
• Data Visualization:-
1) We visualize a small subset of the dataset using ‘matplotlib.pyplot’ by
plotting the images with a grayscale colormap.
• PCA Implementation:
1) Initialization:-
(i) The PCA class is initialized with the parameter n_components, the number
of dimensions to which the dataset will be reduced.
(ii) Other variables are also initialized in the constructor, namely
“components” and “mean”.
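A minimal sketch of such a PCA class, assuming the standard approach of centering the data and taking the top eigenvectors of the covariance matrix; the report may use a different decomposition (e.g. SVD), so treat this as one possible implementation.

```python
import numpy as np

class PCA:
    def __init__(self, n_components):
        self.n_components = n_components
        self.components = None
        self.mean = None

    def fit(self, X):
        # Center the data around the per-feature mean.
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        # Eigendecomposition of the covariance matrix (symmetric, so eigh).
        cov = np.cov(Xc, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Sort eigenvectors by decreasing eigenvalue and keep the top ones.
        order = np.argsort(eigvals)[::-1]
        self.components = eigvecs[:, order[: self.n_components]].T
        return self

    def transform(self, X):
        # Project centered data onto the retained principal components.
        return np.dot(X - self.mean, self.components.T)
```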
• EigenFaces Plot
1) Eigenfaces are the principal components obtained by applying PCA to the
face images; each one represents a specific pattern found in the dataset.
2) We get the eigenfaces by taking the transpose of the PCA components and
extracting their real values.
3) The respective eigenfaces are plotted below.
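The transpose-and-real-part step can be sketched as follows. The image size (64×64) and the random stand-in `components` array are assumptions for illustration; in the actual report these would come from the fitted PCA.

```python
import numpy as np

h, w = 64, 64                  # assumed image size for the face dataset
n_components = 10
rng = np.random.default_rng(0)

# Stand-in for fitted PCA components, shape (n_components, h*w).
components = rng.normal(size=(n_components, h * w)).astype(complex)

# Transpose the components and keep the real part, as described above.
eigenfaces = np.real(components.T)            # shape (h*w, n_components)

# Reshape one column back into an image to display it as an eigenface.
first_face = eigenfaces[:, 0].reshape(h, w)
```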
With PCA:-
1. KNN accuracy:- 0.5465
2. Logistic Regression accuracy:- 0.8256
3. Random Forest accuracy:- 0.3721
We observe that although the accuracy decreased somewhat, we were able to
reduce the number of features needed to make the predictions.
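The comparison above can be reproduced in outline with scikit-learn. This sketch substitutes the bundled digits dataset for the face dataset (so the accuracies will differ from those reported) and assumes scikit-learn's own PCA; the `n_components=20` value is a placeholder.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Stand-in dataset: digits ships with scikit-learn (no download needed).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce dimensionality before classification.
pca = PCA(n_components=20).fit(X_train)
Z_train, Z_test = pca.transform(X_train), pca.transform(X_test)

# Train each classifier on the reduced features and score on the test split.
scores = {}
for name, clf in [("KNN", KNeighborsClassifier()),
                  ("Logistic Regression", LogisticRegression(max_iter=2000)),
                  ("Random Forest", RandomForestClassifier(random_state=0))]:
    clf.fit(Z_train, y_train)
    scores[name] = clf.score(Z_test, y_test)
```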
Colab link