ML Lab Record
Expt-1
1.a Program to access and observe the samples from iris dataset.
print('\n\n\n\n')
df['target_names']=df['target'].apply(lambda y:iris.target_names[y])
:Summary Statistics:
The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.
.. topic:: References
target target_names
0 0 setosa
1 0 setosa
2 0 setosa
3 0 setosa
4 0 setosa
To display randomply 5 samples
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
3 4.6 3.1 1.5 0.2
94 5.6 2.7 4.2 1.3
55 5.7 2.8 4.5 1.3
107 7.3 2.9 6.3 1.8
139 6.9 3.1 5.4 2.1
target target_names
3 0 setosa
94 1 versicolor
55 1 versicolor
107 2 virginica
139 2 virginica
The total number of samples in the dataset = 150
The number of samples in training set = 105
The number of samples in testing set = 45
The first five samples of training set
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
104 6.5 3.0 5.8 2.2
23 5.1 3.3 1.7 0.5
18 5.7 3.8 1.7 0.3
52 6.9 3.1 4.9 1.5
75 6.6 3.0 4.4 1.4
target target_names
104 2 virginica
23 0 setosa
18 0 setosa
52 1 versicolor
75 1 versicolor
target target_names
55 1 versicolor
124 2 virginica
111 2 virginica
74 1 versicolor
76 1 versicolor
Assignment
1) Show that two different executions do not place the same samples in the training and
testing sets.
2) Give a reason why the target names appear as setosa, versicolor and virginica.
Ans: The following command gives target names.
iris.target_names
3) Execute the following commands and give reason for the outputs obtained.
print(df.shape)
print(df_train.shape)
print(df_test.shape)
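The assignment questions above can be explored with a short sketch. It assumes the DataFrame is built from `datasets.load_iris()` as in the other programs of this record and that the split uses `test_size=0.3`. The helper name `split_once` and the seed values are hypothetical and only serve the illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = pd.Series(iris.target)
# target_names is an array of strings, so the lookup yields the species name
df['target_names'] = df['target'].apply(lambda y: iris.target_names[y])

def split_once(frame, seed=None):
    """Hypothetical helper: one 70/30 split; seed=None reshuffles on every call."""
    return train_test_split(frame, test_size=0.3, random_state=seed)

# Two unseeded splits: the row indices of the test sets will usually differ
_, test_a = split_once(df)
_, test_b = split_once(df)
print('Unseeded runs give the same test rows?', set(test_a.index) == set(test_b.index))

# Two splits with the same seed are identical
_, test_c = split_once(df, seed=0)
_, test_d = split_once(df, seed=0)
print('Seeded runs give the same test rows?', list(test_c.index) == list(test_d.index))

df_train, df_test = split_once(df)
print(df.shape)        # 150 rows; 4 features + target + target_names
print(df_train.shape)  # 70% of the rows
print(df_test.shape)   # 30% of the rows
```

Because `train_test_split` shuffles the rows before splitting, leaving `random_state` unset is what makes two executions differ. The shapes follow directly from 150 samples and the 0.3 test fraction.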
1.b Program to access and observe the samples from digits dataset.
print('\n\n\n\n')
df['target_names']=df['target'].apply(lambda y:digit.target_names[y])
OUTPUT
THIS IS THE PROGRAM TO ACCESS DIGITS DATASET
To print the description of Digits Dataset
.. _digits_dataset:
.. topic:: References
target_names
0 0
1 1
2 2
3 3
4 4
[5 rows x 66 columns]
To display randomply 5 samples
pixel_0_0 pixel_0_1 pixel_0_2 pixel_0_3 pixel_0_4 pixel_0_5 \
527 0.0 0.0 6.0 13.0 0.0 0.0
95 0.0 0.0 0.0 11.0 16.0 8.0
162 0.0 5.0 16.0 16.0 16.0 11.0
1120 0.0 0.0 1.0 11.0 14.0 5.0
1295 0.0 0.0 4.0 15.0 13.0 3.0
target target_names
527 1 1
95 6 6
162 5 5
1120 1 1
1295 8 8
[5 rows x 66 columns]
The total number of samples in the dataset = 1797
The number of samples in training set = 1257
The number of samples in testing set = 540
The first five samples of training set
pixel_0_0 pixel_0_1 pixel_0_2 pixel_0_3 pixel_0_4 pixel_0_5 \
4 0.0 0.0 0.0 1.0 11.0 0.0
53 0.0 0.0 4.0 8.0 16.0 5.0
1187 0.0 0.0 9.0 14.0 15.0 6.0
1686 0.0 0.0 8.0 14.0 12.0 3.0
1727 0.0 0.0 6.0 11.0 16.0 13.0
target target_names
4 4 4
53 8 8
1187 0 0
1686 9 9
1727 3 3
[5 rows x 66 columns]
target_names
415 9
740 7
67 6
768 8
463 2
[5 rows x 66 columns]
Assignment
1) Show that two different executions do not place the same samples in the training and
testing sets.
2) Give a reason why the target names do not appear as one, two, three.
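For question 2, a minimal sketch can inspect what the digits dataset stores as target names. The list `words` and the variable `named` are hypothetical additions that show how word labels could be attached if they were wanted.

```python
from sklearn import datasets

digits = datasets.load_digits()

# For digits, target_names holds the integer class labels themselves
print(digits.target_names)
print(digits.target_names.dtype)

# Hypothetical mapping from the integer label to an English word
words = ['zero', 'one', 'two', 'three', 'four',
         'five', 'six', 'seven', 'eight', 'nine']
named = [words[t] for t in digits.target[:5]]
print(named)
```

The lookup `digit.target_names[y]` in the program therefore returns the same number as `y`, which is why the `target` and `target_names` columns are identical in the output.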
print('\n\n\n\n')
OUTPUT
THIS IS THE PROGRAM TO ACCESS DIABETES DATASET
To print the description of Digits Dataset
.. _diabetes_dataset:
Diabetes dataset
----------------
Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.
:Attribute Information:
- age age in years
- sex
- bmi body mass index
- bp average blood pressure
- s1 tc, total serum cholesterol
- s2 ldl, low-density lipoproteins
- s3 hdl, high-density lipoproteins
- s4 tch, total cholesterol / HDL
- s5 ltg, possibly log of serum triglycerides level
- s6 glu, blood sugar level
Note: Each of these 10 feature variables have been mean centered and
scaled by the standard deviation times the square root of `n_samples`
(i.e. the sum of squares of each column totals 1).
Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html
s4 s5 s6 target
0 -0.002592 0.019907 -0.017646 151.0
1 -0.039493 -0.068332 -0.092204 75.0
2 -0.002592 0.002861 -0.025930 141.0
3 0.034309 0.022688 -0.009362 206.0
4 -0.002592 -0.031988 -0.046641 135.0
To display randomply 5 samples
age sex bmi bp s1 s2 s3 \
261 0.048974 -0.044642 -0.041774 0.104501 0.035582 -0.025739 0.177497
263 -0.074533 0.050680 -0.077342 -0.046985 -0.046975 -0.032629 0.004460
337 0.019913 0.050680 -0.012673 0.070072 -0.011201 0.007141 -0.039719
140 0.041708 0.050680 0.014272 0.042529 -0.030464 -0.001314 -0.043401
363 -0.049105 0.050680 -0.024529 0.000079 -0.046975 -0.028245 -0.065491
s4 s5 s6 target
261 -0.076395 -0.012909 0.015491 103.0
263 -0.039493 -0.072133 -0.017646 116.0
337 0.034309 0.005386 0.003064 91.0
140 -0.002592 -0.033246 0.015491 118.0
363 0.028405 0.019196 0.011349 58.0
The total number of samples in the dataset = 442
The number of samples in training set = 309
The number of samples in testing set = 133
The first five samples of training set
age sex bmi bp s1 s2 s3 \
168 0.001751 0.050680 0.059541 -0.002228 0.061725 0.063195 -0.058127
295 -0.052738 0.050680 0.039062 -0.040099 -0.005697 -0.012900 0.011824
100 0.016281 -0.044642 0.017506 -0.022885 0.060349 0.044406 0.030232
269 0.009016 -0.044642 -0.032073 -0.026328 0.042462 -0.010395 0.159089
316 0.016281 0.050680 0.014272 0.001215 0.001183 -0.021355 -0.032356
s4 s5 s6 target
168 0.108111 0.068986 0.127328 268.0
295 -0.039493 0.016307 0.003064 85.0
100 -0.002592 0.037236 -0.001078 128.0
269 -0.076395 -0.011897 -0.038357 87.0
316 0.034309 0.074966 0.040343 220.0
s4 s5 s6 target
435 -0.002592 -0.038460 -0.038357 64.0
271 -0.002592 -0.018114 0.007207 127.0
62 -0.039493 -0.082379 -0.025930 52.0
74 -0.039493 0.003709 0.073480 85.0
413 -0.039493 -0.035816 0.019633 113.0
Assignment
1) Show that two different executions do not place the same samples in the training and
testing sets.
2) Give a reason why the target name is not present.
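For question 2, the following sketch looks at what `load_diabetes()` returns. It relies only on the fact that the returned object can be queried like a dictionary. The interpretation in the comments is a suggested reading, not the record's official answer.

```python
from sklearn import datasets

diabetes = datasets.load_diabetes()

# List the fields that the dataset object provides
print(sorted(diabetes.keys()))
print('target_names' in diabetes)

# The target is a continuous measurement (disease progression), not a class index,
# so there is no finite list of class names to look up
print(diabetes.target[:5])
print(diabetes.target.dtype)
```

A regression target such as this one has no classes, which is a plausible reason why the program prints a `target` column but no `target_names` column.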
Expt:-3
Aim: To perform K-Fold cross-validation. Write a program to split the given dataset into K train and
test sets using K-Fold cross-validation.
X,y=make_classification(n_samples=10,n_features=4,n_classes=2)
import numpy as np
def kfold_indices(data, k):
    fold_size = len(data) // k
    indices = np.arange(len(data))
    folds = []
    for i in range(k):
        test_indices = indices[i * fold_size: (i + 1) * fold_size]
        train_indices = np.concatenate([indices[:i * fold_size],
                                        indices[(i + 1) * fold_size:]])
        folds.append((train_indices, test_indices))
    return folds
print('X_train samples')
print(X_train)
print('X_test samples')
print(X_test)
# Train the model on the training data
model.fit(X_train, y_train)
OUTPUT
X_train samples
[[-1.18941118 0.48413876 1.99104662 1.22821249]
[ 1.56378906 -0.02472443 -0.58163875 1.03924999]
[ 0.60791888 -0.46216114 -1.73221979 -1.5591981 ]
[-1.6902142 0.14973539 1.03805245 -0.58963015]
[-2.54374997 0.31212816 1.85105828 -0.51093257]
[-0.76359532 0.28232836 1.18343782 0.66493047]
[ 1.40146806 0.10232489 -0.10697891 1.471395 ]
[-0.13135007 -0.15316023 -0.46778227 -0.76072441]]
X_test samples
[[-0.53713281 -0.18806132 -0.45435904 -1.20963223]
[ 0.90410033 -0.44415591 -1.76687511 -1.26394141]]
X_train samples
[[-0.53713281 -0.18806132 -0.45435904 -1.20963223]
[ 0.90410033 -0.44415591 -1.76687511 -1.26394141]
[ 0.60791888 -0.46216114 -1.73221979 -1.5591981 ]
[-1.6902142 0.14973539 1.03805245 -0.58963015]
[-2.54374997 0.31212816 1.85105828 -0.51093257]
[-0.76359532 0.28232836 1.18343782 0.66493047]
[ 1.40146806 0.10232489 -0.10697891 1.471395 ]
[-0.13135007 -0.15316023 -0.46778227 -0.76072441]]
X_test samples
[[-1.18941118 0.48413876 1.99104662 1.22821249]
[ 1.56378906 -0.02472443 -0.58163875 1.03924999]]
X_train samples
[[-0.53713281 -0.18806132 -0.45435904 -1.20963223]
[ 0.90410033 -0.44415591 -1.76687511 -1.26394141]
[-1.18941118 0.48413876 1.99104662 1.22821249]
[ 1.56378906 -0.02472443 -0.58163875 1.03924999]
[-2.54374997 0.31212816 1.85105828 -0.51093257]
[-0.76359532 0.28232836 1.18343782 0.66493047]
[ 1.40146806 0.10232489 -0.10697891 1.471395 ]
[-0.13135007 -0.15316023 -0.46778227 -0.76072441]]
X_test samples
[[ 0.60791888 -0.46216114 -1.73221979 -1.5591981 ]
[-1.6902142 0.14973539 1.03805245 -0.58963015]]
X_train samples
[[-0.53713281 -0.18806132 -0.45435904 -1.20963223]
[ 0.90410033 -0.44415591 -1.76687511 -1.26394141]
[-1.18941118 0.48413876 1.99104662 1.22821249]
[ 1.56378906 -0.02472443 -0.58163875 1.03924999]
[ 0.60791888 -0.46216114 -1.73221979 -1.5591981 ]
[-1.6902142 0.14973539 1.03805245 -0.58963015]
[ 1.40146806 0.10232489 -0.10697891 1.471395 ]
[-0.13135007 -0.15316023 -0.46778227 -0.76072441]]
X_test samples
[[-2.54374997 0.31212816 1.85105828 -0.51093257]
[-0.76359532 0.28232836 1.18343782 0.66493047]]
X_train samples
[[-0.53713281 -0.18806132 -0.45435904 -1.20963223]
[ 0.90410033 -0.44415591 -1.76687511 -1.26394141]
[-1.18941118 0.48413876 1.99104662 1.22821249]
[ 1.56378906 -0.02472443 -0.58163875 1.03924999]
[ 0.60791888 -0.46216114 -1.73221979 -1.5591981 ]
[-1.6902142 0.14973539 1.03805245 -0.58963015]
[-2.54374997 0.31212816 1.85105828 -0.51093257]
[-0.76359532 0.28232836 1.18343782 0.66493047]]
X_test samples
[[ 1.40146806 0.10232489 -0.10697891 1.471395 ]
[-0.13135007 -0.15316023 -0.46778227 -0.76072441]]
K-Fold Cross-Validation Scores: [0.5, 1.0, 0.5, 1.0, 1.0]
Mean Accuracy: 0.8
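The fragments of the program above can be assembled into one runnable sketch. The record does not show which classifier was used, so `LogisticRegression` is an assumption here, as are `k = 5` (inferred from the five folds in the output) and the `random_state` value. The scores will therefore not necessarily match the ones printed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def kfold_indices(data, k):
    """Split row indices into k consecutive folds (leftover rows never enter a test fold)."""
    fold_size = len(data) // k
    indices = np.arange(len(data))
    folds = []
    for i in range(k):
        test_indices = indices[i * fold_size: (i + 1) * fold_size]
        train_indices = np.concatenate([indices[:i * fold_size],
                                        indices[(i + 1) * fold_size:]])
        folds.append((train_indices, test_indices))
    return folds

X, y = make_classification(n_samples=10, n_features=4, n_classes=2, random_state=0)
k = 5                           # assumed from the five folds shown in the output
model = LogisticRegression()    # assumed model; the record does not name one
scores = []

for train_indices, test_indices in kfold_indices(X, k):
    X_train, y_train = X[train_indices], y[train_indices]
    X_test, y_test = X[test_indices], y[test_indices]
    # Train on k-1 folds, evaluate on the held-out fold
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores.append(accuracy_score(y_test, y_pred))

print('K-Fold Cross-Validation Scores:', scores)
print('Mean Accuracy:', np.mean(scores))
```

Each sample appears in exactly one test fold, so every row is used for testing once and for training `k - 1` times.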
Assignment
Program 4
Aim: To generate a Confusion Matrix and compute true positive, true negative, false
positive, and false negative.
Program to generate confusion Matrix and classification report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn import metrics
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
# actual values
#A=1= Positive Class , B=0=Negative Class
actual = [1,0,0,1,0,1,1,1,0,1,0]
# predicted values
predicted = [1,0,0,1,0,0,0,1,0,0,1]
# confusion matrix
matrix = confusion_matrix(actual,predicted, labels=[1,0])
print('Confusion matrix : \n',matrix)
acc=accuracy_score(actual,predicted)
print('Accuracy = ',acc)
matrix = classification_report(actual,predicted,labels=[1,0])
print('Classification Report \n')
print(matrix)
fpr, tpr , _= metrics.roc_curve(actual, predicted) #create ROC curve
print('fpr = ',fpr)
print('tpr = ',tpr)
plt.plot(fpr,tpr)
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
OUTPUT
Confusion matrix :
[[3 3]
[1 4]]
Accuracy = 0.6363636363636364
Classification Report
accuracy 0.64 11
macro avg 0.66 0.65 0.63 11
weighted avg 0.67 0.64 0.63 11
Assignment:
1) Verify theoretically the entries of the classification report.
Note: $\text{F1-score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
2) Experiment with the following actual and predicted samples and verify the entries
of the classification report.
# actual values
#A=1=Positive Class , B=0=Negative Class
actual = [1,0,0,1,0,1,1,1,0,1,0,1,1,1,1,0,0,1]
# predicted values
predicted = [1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,1,0]
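The aim asks for the true positive, true negative, false positive and false negative counts, which the program does not print separately. A minimal sketch of how they could be read from the confusion matrix and used to verify the report by hand is given below. It uses the first pair of `actual` and `predicted` lists, and the variable names `tp`, `fn`, `fp`, `tn` are introduced here for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# A=1=Positive Class, B=0=Negative Class
actual = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]

# With labels=[1, 0] the rows are actual 1, actual 0 and the columns are
# predicted 1, predicted 0, so the matrix reads [[TP, FN], [FP, TN]]
tp, fn, fp, tn = confusion_matrix(actual, predicted, labels=[1, 0]).ravel()
print('TP =', tp, 'FN =', fn, 'FP =', fp, 'TN =', tn)

# Metrics for the positive class, computed by hand
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print('Precision =', precision, 'Recall =', recall, 'F1 =', f1, 'Accuracy =', accuracy)

# Cross-check against scikit-learn
print(precision_score(actual, predicted), recall_score(actual, predicted),
      f1_score(actual, predicted))
```

Replacing the two lists with the longer ones from question 2 repeats the same check for the second experiment.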
Program - 5
Program to learn Linear Regression using Gradient Descent method
1.4796491688889395 0.10148121494753734
Assignment:
Write the inference for the linear regression model by varying ‘m’
and ‘c’ values (with reference to the regression line obtained)
4. Repeat this process until the loss function is a very small value or
ideally 0 (which means 0 error or 100% accuracy). The values
of m and c that we are left with are the optimum values.
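The iteration described in step 4 can be sketched as follows. The program and its data are not reproduced in this record, so the data here is synthetic and hypothetical, as are the learning rate `L`, the number of `epochs`, and the names `D_m` and `D_c` for the partial derivatives of the mean squared error with respect to `m` and `c`.

```python
import numpy as np

# Hypothetical data: a noisy straight line (the record's own data is not shown)
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
Y = 1.5 * X + 0.1 + rng.normal(0, 0.5, size=X.shape)

m, c = 0.0, 0.0   # start from a flat line through the origin
L = 0.01          # learning rate (assumed value)
epochs = 5000     # number of gradient steps (assumed value)
n = len(X)

for _ in range(epochs):
    Y_pred = m * X + c
    # Partial derivatives of the mean squared error with respect to m and c
    D_m = (-2 / n) * np.sum(X * (Y - Y_pred))
    D_c = (-2 / n) * np.sum(Y - Y_pred)
    # Move against the gradient
    m = m - L * D_m
    c = c - L * D_c

loss = np.mean((Y - (m * X + c)) ** 2)
print(m, c)
print('Final MSE:', loss)
```

With noisy data the loss settles at a small positive value rather than 0, so in practice the loop stops after a fixed number of epochs or when the loss stops decreasing.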
Reference:
1. Linear Regression using Gradient Descent | by Adarsh Menon | Towards Data Science
UE22EC352B Machine Learning and Applications Lab
Program: 6
c. Exhibit low bias, high variance model and high bias, low variance model
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
# Generate a synthetic dataset
np.random.seed(0)
X = np.linspace(-3, 3, 100)
y = np.sin(X) + np.random.normal(0, 0.2, 100)  # Adding noise
X = X[:, np.newaxis]  # Reshape for sklearn
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# High Bias / Low Variance Model (Linear Regression)
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_train_linear = linear_model.predict(X_train)
y_pred_test_linear = linear_model.predict(X_test)
# Low Bias / High Variance Model (Polynomial Regression with high degree)
poly_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
poly_model.fit(X_train, y_train)
y_pred_train_poly = poly_model.predict(X_train)
y_pred_test_poly = poly_model.predict(X_test)
# Plotting
plt.figure(figsize=(14, 6))
# Linear Model Plot
plt.subplot(1, 2, 1)
plt.scatter(X_train, y_train, label='Training data', color='blue', alpha=0.6)
plt.scatter(X_test, y_test, label='Test data', color='green', alpha=0.6)
plt.plot(X, linear_model.predict(X), color='red', label='Linear Model')
plt.title('High Bias / Low Variance Model')
plt.legend()
plt.show()
# Print Mean Squared Error for both models
print(f"Linear Model Training MSE: {mean_squared_error(y_train, y_pred_train_linear)}")
print(f"Linear Model Testing MSE: {mean_squared_error(y_test, y_pred_test_linear)}")
print(f"Polynomial Model Training MSE: {mean_squared_error(y_train, y_pred_train_poly)}")
print(f"Polynomial Model Testing MSE: {mean_squared_error(y_test, y_pred_test_poly)}")
OUTPUT
# Calculate Mean Squared Error (MSE) for training and testing sets
mse_train_linear = mean_squared_error(y_train, y_pred_train_linear)
mse_test_linear = mean_squared_error(y_test, y_pred_test_linear)
mse_train_poly = mean_squared_error(y_train, y_pred_train_poly)
mse_test_poly = mean_squared_error(y_test, y_pred_test_poly)
# Display MSE for both models
print("Linear Regression Model (High Bias / Low Variance)")
print(f"Training MSE: {mse_train_linear}")
print(f"Testing MSE: {mse_test_linear}\n")
print("Polynomial Regression Model (Low Bias / High Variance)")
print(f"Training MSE: {mse_train_poly}")
print(f"Testing MSE: {mse_test_poly}")
OUTPUT:
Testing MSE: 0.24695224333039673
Testing MSE: 0.1425297085836628
from mlxtend.evaluate import bias_variance_decomp
mse, bias, var = bias_variance_decomp(linear_model, X_train, y_train, X_test, y_test, loss='mse',
                                      num_rounds=200, random_seed=1)
# summarize results
print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
OUTPUT:
MSE: 0.253
Bias: 0.247
Variance: 0.006
mse, bias, var = bias_variance_decomp(poly_model, X_train, y_train, X_test, y_test, loss='mse',
                                      num_rounds=200, random_seed=1)
# summarize results
print('MSE: %.3f' % mse)
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
OUTPUT:
MSE: 2.609
Bias: 0.140
Variance: 2.468
Assignment:
c. What is your inference on the train and test MSE values for each of the models?
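One way to approach this question is to tabulate the train and test MSE for several model complexities side by side. The sketch below reuses the synthetic sine data and the split settings from the program. The intermediate degree 3 and the dictionary name `results` are assumptions added for comparison.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic dataset as in the program
np.random.seed(0)
X = np.linspace(-3, 3, 100)
y = np.sin(X) + np.random.normal(0, 0.2, 100)
X = X[:, np.newaxis]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for degree in (1, 3, 15):  # degree 1 is the linear model; 3 is an added middle case
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

A large train MSE points to underfitting (high bias). A train MSE that is much smaller than the test MSE, or a test error that changes strongly when the training sample changes, points to high variance.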
Program 7
Program to implement a decision tree for Classification and Regression
#PROGRAM To Implement a Decision TREE AND to DISPLAY IT
print('PROGRAM To Implement a Decision Tree AND to DISPLAY IT')
from sklearn import datasets
import pandas as pd
iris=datasets.load_iris()
# df will hold the dataset as a table
df=pd.DataFrame(
iris.data,
columns=iris.feature_names
)
#labels are assigned to df[target] table or array
df['target']=pd.Series(
iris.target
)
from sklearn.model_selection import train_test_split
# Train Test Split Ratio
df_train,df_test=train_test_split(df,test_size=0.3)
df['target_names']=df['target'].apply(lambda y:iris.target_names[y])
print('Number of Training samples')
print(df_train.shape[0])
print('Number of Testing samples')
print(df_test.shape[0])
#Importing Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf=DecisionTreeClassifier()
x_train=df_train[iris.feature_names]
x_test=df_test[iris.feature_names]
y_train=df_train['target']
y_test=df_test['target']
#Training Decision Tree Classifier
clf.fit(x_train,y_train)
#Testing the data
y_test_pred=clf.predict(x_test)
print('Class of Testing Samples')
print(y_test_pred)
#To display the decision tree in command shell
from sklearn.tree import export_text
from sklearn import tree
from matplotlib import pyplot as plt
text_representation = tree.export_text(clf)
print(text_representation)
with open("decistion_tree.log", "w") as fout:
    fout.write(text_representation)
fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(clf,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True)
fig.savefig("decistion_tree.png")
OUTPUT:
PROGRAM To Implement a Decision Tree AND to DISPLAY IT
Number of Training samples
105
Number of Testing samples
45
Class of Testing Samples
[0 1 0 1 1 1 2 0 2 1 2 0 0 2 2 0 0 1 0 2 1 0 1 2 1 2 0 0 0 1 1 1 2 0 1 0 0
2 1 1 1 0 0 0 1]
|--- feature_2 <= 2.60
| |--- class: 0
|--- feature_2 > 2.60
| |--- feature_3 <= 1.75
| | |--- feature_2 <= 4.95
| | | |--- class: 1
| | |--- feature_2 > 4.95
| | | |--- feature_3 <= 1.55
| | | | |--- class: 2
| | | |--- feature_3 > 1.55
| | | | |--- class: 1
| |--- feature_3 > 1.75
| | |--- class: 2
b) Program to calculate accuracy of decision tree
#PROGRAM To Calculate Accuracy of Decision Tree
print('PROGRAM To Calculate Accuracy of Decision Tree')
from sklearn import datasets
import pandas as pd
iris=datasets.load_iris()
# df will hold the dataset as a table
df=pd.DataFrame( iris.data, columns=iris.feature_names )
#labels are assigned to df[target] table or array
df['target']=pd.Series( iris.target )
from sklearn.model_selection import train_test_split
# Train Test Split Ratio
df_train,df_test=train_test_split(df,test_size=0.3)
df['target_names']=df['target'].apply(lambda y:iris.target_names[y])
print('Number of Training samples')
print(df_train.shape[0])
print('Number of Testing samples')
print(df_test.shape[0])
#Importing Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf=DecisionTreeClassifier()
x_train=df_train[iris.feature_names]
x_test=df_test[iris.feature_names]
y_train=df_train['target']
y_test=df_test['target']
#Training Decision Tree Classifier
clf.fit(x_train,y_train)
#Testing the data
y_test_pred=clf.predict(x_test)
print('Class of Testing Samples')
print(y_test_pred)
from sklearn.metrics import accuracy_score
x=accuracy_score(y_test,y_test_pred)
print('Accuracy')
print(x)
OUTPUT:
PROGRAM To Calculate Accuracy of Decision Tree
Number of Training samples
105
Number of Testing samples
45
Class of Testing Samples
[1 0 0 0 1 2 1 0 2 2 0 1 2 2 1 1 0 1 2 2 1 2 2 0 0 2 0 1 1 2 0 2 2 1 0 1 1
2 1 0 1 1 2 1 0]
Accuracy
0.9111111111111111
OUTPUT:
Training Samples
Gender Height
62 1 138.355342
713 2 180.390215
676 2 182.020545
68 1 142.013297
719 2 188.114743
.. ... ...
644 2 179.164963
39 1 160.125862
759 2 189.485930
359 1 184.260191
484 2 170.556742
criterion=absolute_error:
1. Change the train test split to 0.4 and execute both the programs. Observe the
changes
2. Write the equations of the error functions used in both the cases
3. Explain the following built in functions:
1. clf=DecisionTreeClassifier()
2. criterion in['squared_error','absolute_error']:
3. rgrsr=DecisionTreeRegressor(criterion=criterion)
4. df_height.groupby('Gender')[['Height']].agg([np.mean,np.median]).round(1)
Objective:
Implement a Decision Tree Classifier in Python from scratch. The classifier should handle
categorical and numerical data and include functionality to prevent overfitting through tree
pruning.
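A compact starting point for this objective is sketched below. It is one possible design, not a reference solution: it uses Gini impurity, treats numeric values with a `<=` threshold and all other values as categories tested with `==`, and limits overfitting by pre-pruning (a maximum depth and a minimum node size) instead of pruning after growth. All function names and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def goes_left(row, col, value):
    """Numeric thresholds use <=; any other value type is treated as categorical (==)."""
    if isinstance(value, (int, float)):
        return row[col] <= value
    return row[col] == value

def best_split(rows, labels):
    """Return (gain, col, value) for the split with the largest Gini decrease."""
    parent, best = gini(labels), (0.0, None, None)
    for col in range(len(rows[0])):
        for value in {row[col] for row in rows}:
            left = [y for row, y in zip(rows, labels) if goes_left(row, col, value)]
            right = [y for row, y in zip(rows, labels) if not goes_left(row, col, value)]
            if not left or not right:
                continue
            weight = len(left) / len(labels)
            gain = parent - (weight * gini(left) + (1 - weight) * gini(right))
            if gain > best[0]:
                best = (gain, col, value)
    return best

def build_tree(rows, labels, max_depth=3, min_samples=2, depth=0):
    """Grow the tree recursively; leaves are plain class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning: stop on depth limit, small node, or pure node
    if depth >= max_depth or len(rows) < min_samples or len(set(labels)) == 1:
        return majority
    gain, col, value = best_split(rows, labels)
    if col is None:
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if goes_left(r, col, value)]
    right = [(r, y) for r, y in zip(rows, labels) if not goes_left(r, col, value)]
    return {
        'col': col, 'value': value,
        'left': build_tree([r for r, _ in left], [y for _, y in left],
                           max_depth, min_samples, depth + 1),
        'right': build_tree([r for r, _ in right], [y for _, y in right],
                            max_depth, min_samples, depth + 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree['left'] if goes_left(row, tree['col'], tree['value']) else tree['right']
    return tree

# Toy data: one numeric and one categorical feature; 'yes' only for large red items
rows = [[1.0, 'red'], [2.0, 'blue'], [3.0, 'red'], [8.0, 'red'],
        [9.0, 'blue'], [10.0, 'red'], [11.0, 'blue'], [12.0, 'red']]
labels = ['no', 'no', 'no', 'yes', 'no', 'yes', 'no', 'yes']

tree = build_tree(rows, labels, max_depth=2)
print(tree)
print([predict(tree, row) for row in rows])
print(predict(tree, [9.5, 'red']))
```

Lowering `max_depth` or raising `min_samples` yields a smaller tree. A post-pruning step, such as reduced-error pruning on a validation set, would be a natural extension and is not shown here.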
Expt:7
OUTPUT
PROGRAM To Implement a Decision Tree AND to DISPLAY IT
Number of Training samples
105
Number of Testing samples
45
Class of Testing Samples
[1 0 2 2 0 2 0 0 0 2 0 2 1 1 2 0 0 2 1 0 1 0 1 1 2 1 2 2 2 0 1 1 1 1 0 2 2
1 1 0 1 0 2 2 0]
|--- feature_2 <= 2.45
| |--- class: 0
|--- feature_2 > 2.45
| |--- feature_3 <= 1.70
| | |--- feature_2 <= 4.95
| | | |--- class: 1
| | |--- feature_2 > 4.95
| | | |--- feature_3 <= 1.55
| | | | |--- class: 2
| | | |--- feature_3 > 1.55
| | | | |--- class: 1
| |--- feature_3 > 1.70
| | |--- class: 2
OUTPUT
Training Samples
Gender Height
585 2 183.343569
590 2 171.482107
740 2 188.868396
633 2 182.644465
515 2 178.841086
.. ... ...
336 1 136.035596
694 2 180.998702
449 2 176.484152
332 1 164.418790
299 1 186.458330
Assignment
1) Write a program to calculate the confusion matrix and classification report for the decision
tree
2) Implement a decision tree as a classifier using digits dataset
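Both assignment questions can be combined in one sketch: a decision tree classifier on the digits dataset, followed by its confusion matrix and classification report. The `random_state` values are assumptions added so that the run is repeatable. The program in this record does not set them.

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = datasets.load_digits()

# Same 70/30 split ratio as in the iris program
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(x_train, y_train)
y_test_pred = clf.predict(x_test)

# Rows are the actual digits, columns are the predicted digits
matrix = confusion_matrix(y_test, y_test_pred)
print('Confusion matrix : \n', matrix)
print('Classification Report \n')
print(classification_report(y_test, y_test_pred))
```

The diagonal of the matrix counts the correctly classified test samples per digit. The off-diagonal entries show which digits the tree confuses with each other.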
Expt. 8a) Aim: Outlier detection with LOF (Local Outlier Factor)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor
np.random.seed(42)
n_outliers = len(X_outliers)
ground_truth = np.ones(len(X), dtype=int)
ground_truth[-n_outliers:] = -1
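The lines above refer to `X` and `X_outliers`, whose construction is not shown in this record. The sketch below fills that gap under stated assumptions: two Gaussian clusters of inliers plus uniformly scattered outliers, with `n_neighbors=20` and `contamination=0.1` chosen for illustration. Plotting is left out.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

np.random.seed(42)

# Assumed data: two tight clusters of inliers and 20 uniformly scattered outliers
X_inliers = 0.3 * np.random.randn(100, 2)
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X_inliers, X_outliers]

n_outliers = len(X_outliers)
ground_truth = np.ones(len(X), dtype=int)
ground_truth[-n_outliers:] = -1   # the outliers are the last rows of X

# fit_predict labels inliers as 1 and outliers as -1
clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = clf.fit_predict(X)
n_errors = (y_pred != ground_truth).sum()

# More negative scores mean a point is more isolated than its neighbours
X_scores = clf.negative_outlier_factor_

print('Points flagged as outliers:', (y_pred == -1).sum())
print('Prediction errors:', n_errors)
```

Some of the uniformly drawn points can fall inside a cluster, so a few disagreements with `ground_truth` are to be expected even when the detector works well.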
OUTPUT
Assignment:
#plt.style.use('seaborn')
plt.figure(figsize = (10,10))
plt.scatter(X[:,0], X[:,1], c=y, marker= '*',s=100,edgecolors='black')
plt.show()
knn5 = KNeighborsClassifier(n_neighbors = 5)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn5.fit(X_train, y_train)
knn1.fit(X_train, y_train)
y_pred_5 = knn5.predict(X_test)
y_pred_1 = knn1.predict(X_test)
plt.figure(figsize = (15,5))
plt.subplot(1,2,1)
plt.scatter(X_test[:,0], X_test[:,1], c=y_pred_5, marker= '*',
s=100,edgecolors='black')
plt.title("Predicted values with k=5", fontsize=20)
plt.subplot(1,2,2)
plt.scatter(X_test[:,0], X_test[:,1], c=y_pred_1, marker= '*',
s=100,edgecolors='black')
plt.title("Predicted values with k=1", fontsize=20)
plt.show()
OUTPUT
Accuracy with k=5 93.60000000000001
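The K-nearest-neighbours listing above uses `X`, `y`, the train/test split and an accuracy figure without showing how they are produced. A self-contained sketch is given below. The `make_blobs` parameters and the `random_state` values are assumptions, so the printed accuracy may differ from the figure in the output above. Plotting is omitted.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed data: four overlapping 2-D clusters
X, y = make_blobs(n_samples=500, n_features=2, centers=4,
                  cluster_std=1.5, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn5 = KNeighborsClassifier(n_neighbors=5)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn5.fit(X_train, y_train)
knn1.fit(X_train, y_train)

y_pred_5 = knn5.predict(X_test)
y_pred_1 = knn1.predict(X_test)

acc_5 = accuracy_score(y_test, y_pred_5) * 100
acc_1 = accuracy_score(y_test, y_pred_1) * 100
print("Accuracy with k=5", acc_5)
print("Accuracy with k=1", acc_1)
```

With k=1 every prediction copies the single nearest training point, which makes the classifier sensitive to noisy points near the cluster boundaries. A larger k averages over more neighbours.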
Program 9: SVM
Program 10:
Assignment: Modify no. of samples and k value and repeat the plots
10.b. Principal Component Analysis(PCA):
Assignment: Check for different Train Test split and different data set.
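The PCA program itself is not reproduced in this record, so the sketch below is only one way to carry out the assignment. It assumes standardised features, two principal components, and the iris and digits datasets. The helper name `pca_explained_variance` is hypothetical.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def pca_explained_variance(X, test_size, n_components=2, seed=0):
    """Hypothetical helper: fit scaler and PCA on the training rows, project the test rows."""
    X_train, X_test = train_test_split(X, test_size=test_size, random_state=seed)
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_components).fit(scaler.transform(X_train))
    X_test_2d = pca.transform(scaler.transform(X_test))
    return pca.explained_variance_ratio_, X_test_2d

summary = {}
for name, loader in [('iris', datasets.load_iris), ('digits', datasets.load_digits)]:
    X = loader().data
    for test_size in (0.2, 0.3):
        ratio, X_test_2d = pca_explained_variance(X, test_size)
        summary[(name, test_size)] = (ratio, X_test_2d.shape)
        print(name, test_size, ratio.round(3), X_test_2d.shape)
```

Fitting the scaler and the PCA on the training rows only keeps information from the test rows out of the projection. Comparing the explained variance ratios across splits and datasets shows how stable the leading components are.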
Ref:
1. https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
2. https://scikit-learn.org/stable/modules/clustering.html
3. https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
4. https://www.kdnuggets.com/2023/05/principal-component-analysis-pca-scikitlearn.html
5. https://builtin.com/machine-learning/pca-in-python