ML Cyber Lab
Laboratory Exercises
Kuldeep. J. Purohit
November 25, 2024
Contents
1 Data Manipulation and Statistical Analysis
1 Data Manipulation and Statistical Analysis
Objective
Apply Python programming skills to manipulate data using lists, dictionaries, and the
pandas library. Perform statistical operations such as mean, median, mode, standard
deviation, and variance.
Tasks
1. Create a dictionary representing a dataset of students with their names, ages, and
scores. Convert it into a pandas DataFrame and display the data.
Solution
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Data dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [23, 22, 24, 23, 22],
    'Score': [88, 92, 85, 91, 89]
}

# Convert to DataFrame
df = pd.DataFrame(data)
print(df)

# Compute statistics on the Score column
mean_score = df['Score'].mean()
median_score = df['Score'].median()
mode_score = df['Score'].mode()[0]   # every score occurs once, so mode() returns all values sorted; [0] is the smallest
std_score = df['Score'].std()        # sample standard deviation (pandas default ddof=1)
var_score = df['Score'].var()        # sample variance (pandas default ddof=1)

# Print statistics
print(f"Mean: {mean_score}")
print(f"Median: {median_score}")
print(f"Mode: {mode_score}")
print(f"Standard Deviation: {std_score:.2f}")
print(f"Variance: {var_score:.2f}")

# Plot histogram
plt.hist(df['Score'], bins=5, color='blue', alpha=0.7)
plt.title('Distribution of Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
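The variance and standard deviation depend on whether the sum of squared deviations is divided by n (population) or n − 1 (sample); pandas uses n − 1 by default, while NumPy's np.var and np.std use n. The sketch below, which uses only the standard library and the same five scores, computes both by hand and cross-checks them against the statistics module. For these scores the squared deviations sum to 30, so the population variance is 6.0 and the sample variance is 7.5.

```python
import statistics

scores = [88, 92, 85, 91, 89]
n = len(scores)

mean = sum(scores) / n
# Sum of squared deviations from the mean
ss = sum((s - mean) ** 2 for s in scores)

pop_var = ss / n            # population variance: divide by n (NumPy default, ddof=0)
sample_var = ss / (n - 1)   # sample variance: divide by n - 1 (pandas default, ddof=1)

print(f"Mean: {mean}")
print(f"Population variance: {pop_var}")
print(f"Sample variance: {sample_var}")
print(f"Sample standard deviation: {sample_var ** 0.5:.3f}")

# Cross-check against the standard library implementations
assert abs(pop_var - statistics.pvariance(scores)) < 1e-12
assert abs(sample_var - statistics.variance(scores)) < 1e-12
```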
Tasks
1. Represent the system of equations 2x + 3y = 5 and x − y = 1 as a matrix equation Ax = b.
3. Verify the solution by substituting x and y back into the original equations.
Solution
import numpy as np

# Coefficient matrix A
A = np.array([[2, 3], [1, -1]])

# Constants vector b
b = np.array([5, 1])
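A minimal sketch of the solve-and-verify steps, assuming NumPy's np.linalg.solve is used rather than an explicit inverse (solve factorizes A, which is cheaper and numerically safer). Substituting the result back into both equations, and checking A·x against b with np.allclose, confirms the solution x = 1.6, y = 0.6.

```python
import numpy as np

A = np.array([[2, 3], [1, -1]], dtype=float)
b = np.array([5, 1], dtype=float)

# Solve Ax = b without forming A^-1 explicitly
solution = np.linalg.solve(A, b)
x, y = solution
print(f"x = {x:.2f}, y = {y:.2f}")

# Verification: substitute x and y back into the original equations
eq1 = 2 * x + 3 * y   # should equal 5
eq2 = x - y           # should equal 1
print(f"2x + 3y = {eq1:.2f}, x - y = {eq2:.2f}")
print("Solution verified:", np.allclose(A @ solution, b))
```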
Tasks
1. Create vectors and perform:
• Dot product
• Element-wise addition
• Cross product
2. Create matrices and perform:
• Matrix multiplication
• Transpose
• Inverse (if invertible)
3. Compute eigenvalues and eigenvectors of a random 3 × 3 matrix.
Solution
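import numpy as np

# Vector operations
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

dot_product = np.dot(v1, v2)          # scalar: 1*4 + 2*5 + 3*6
elementwise_addition = v1 + v2        # adds matching components
cross_product = np.cross(v1, v2)      # vector perpendicular to both v1 and v2

print(f"Dot product: {dot_product}")
print(f"Element-wise addition: {elementwise_addition}")
print(f"Cross product: {cross_product}")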
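Tasks 2 and 3 (matrix operations and the eigen-decomposition) can be sketched as follows. The matrices M1 and M2 and the random seed are illustrative assumptions; the inverse is attempted only when the determinant is non-zero, and each eigenpair is checked against the defining relation R·v = λ·v.

```python
import numpy as np

# Illustrative 2x2 matrices (values chosen for the example)
M1 = np.array([[1, 2], [3, 4]])
M2 = np.array([[5, 6], [7, 8]])

product = M1 @ M2          # matrix multiplication, equivalent to np.matmul(M1, M2)
transpose = M1.T

# Invert only if the matrix is non-singular (determinant not ~0)
det = np.linalg.det(M1)
inverse = np.linalg.inv(M1) if not np.isclose(det, 0.0) else None
print("Product:\n", product)
print("Transpose:\n", transpose)
print("Inverse:\n", inverse)

# Eigenvalues and eigenvectors of a random 3x3 matrix
rng = np.random.default_rng(0)
R = rng.random((3, 3))
eigenvalues, eigenvectors = np.linalg.eig(R)
print("Eigenvalues:", eigenvalues)

# Each column of `eigenvectors` satisfies R @ v = lambda * v
for i in range(3):
    v = eigenvectors[:, i]
    assert np.allclose(R @ v, eigenvalues[i] * v)
```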
Tasks
1. Generate synthetic data for y = 2x + 1 with random noise.
2. Visualize the data using matplotlib.
3. Implement the linear regression formula:
$$\theta = (X^{T} X)^{-1} X^{T} y$$
Solution
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data: y = 2x + 1 plus Gaussian noise
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2 * X + 1 + np.random.randn(100, 1)

# Visualize the data
plt.scatter(X, y, color='blue')
plt.title('Generated Data')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

# Linear regression via the normal equation
X_bias = np.c_[np.ones((X.shape[0], 1)), X]   # prepend a column of ones for the intercept
theta = np.linalg.inv(X_bias.T.dot(X_bias)).dot(X_bias.T).dot(y)
print(f"Estimated parameters (intercept, slope): {theta.ravel()}")

# Predictions
y_pred = X_bias.dot(theta)

# MSE
mse = np.mean((y - y_pred) ** 2)
print(f"Mean Squared Error: {mse}")
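Explicitly inverting XᵀX works for this small, well-conditioned problem but can be numerically fragile when features are nearly collinear. As an alternative (not part of the original exercise), the same normal equation can be solved with np.linalg.solve, or the least-squares problem can be handed directly to np.linalg.lstsq. Both should agree and recover an intercept near 1 and a slope near 2 for this data.

```python
import numpy as np

# Same synthetic data as the exercise
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2 * X + 1 + np.random.randn(100, 1)
X_bias = np.c_[np.ones((X.shape[0], 1)), X]

# Normal equation solved as a linear system instead of via an explicit inverse
theta_solve = np.linalg.solve(X_bias.T @ X_bias, X_bias.T @ y)

# Direct least-squares solver, robust when X^T X is ill-conditioned
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_bias, y, rcond=None)

print("solve:", theta_solve.ravel())   # [intercept, slope]
print("lstsq:", theta_lstsq.ravel())
```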
Tasks
1. Research and Present AI Applications: List at least 5 applications of AI and
ML in different domains (e.g., healthcare, finance, transportation). Write a
brief explanation of how AI/ML is used in each application.
2. Classify Types of Learning: Describe and compare supervised learning, unsupervised
learning, and reinforcement learning. For each type of learning, provide a
real-world example. Create a table summarizing the types of learning.
3. Hands-on Task: Load a simple dataset (e.g., the Iris dataset) using scikit-learn
and visualize the features.
from sklearn import datasets
import matplotlib.pyplot as plt
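A minimal sketch of loading and visualizing the Iris features with scikit-learn's datasets.load_iris. The choice of plotting sepal length against sepal width and petal length against petal width, coloured by species, is an assumption about what "visualize the features" should cover.

```python
from sklearn import datasets
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data      # shape (150, 4): sepal length/width, petal length/width in cm
y = iris.target    # species labels 0, 1, 2

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Sepal length vs sepal width
axes[0].scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
axes[0].set_xlabel(iris.feature_names[0])
axes[0].set_ylabel(iris.feature_names[1])

# Petal length vs petal width
axes[1].scatter(X[:, 2], X[:, 3], c=y, cmap='viridis')
axes[1].set_xlabel(iris.feature_names[2])
axes[1].set_ylabel(iris.feature_names[3])

fig.suptitle('Iris features coloured by species')
plt.tight_layout()
plt.show()
```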
Tasks
1. Data Cleaning: Load a dataset (e.g., Titanic dataset from Kaggle). Check for
missing values and apply methods to handle them (e.g., fill with mean or drop
rows).
import pandas as pd
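A minimal sketch of the missing-value workflow. Because the Titanic CSV has to be downloaded from Kaggle, a tiny hand-made DataFrame with Titanic-style column names (Age, Fare, Embarked) and hypothetical values stands in for it here. It counts missing values, then shows the two strategies the task mentions: filling (mean for a numeric column, most frequent value for a categorical one) and dropping rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data
df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0, np.nan],
    'Fare': [7.25, 71.28, 7.93, 53.10, 8.05],
    'Embarked': ['S', 'C', np.nan, 'S', 'S'],
})

# Count missing values per column
print(df.isnull().sum())

# Strategy 1: fill numeric gaps with the column mean, categorical gaps with the mode
filled = df.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())
filled['Embarked'] = filled['Embarked'].fillna(filled['Embarked'].mode()[0])

# Strategy 2: drop every row that contains a missing value
dropped = df.dropna()

print(filled)
print(dropped)
```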
3. Feature Selection: Use correlation analysis or feature importance (e.g., decision
trees) to select relevant features.
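import seaborn as sns
import matplotlib.pyplot as plt

# Calculate correlation matrix (numeric columns only; mixed-type frames such as Titanic need this in pandas >= 2.0)
corr = df.corr(numeric_only=True)

# Plot the heatmap of correlations
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()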
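The task also mentions feature importance from decision trees. A minimal sketch, assuming the Iris data as a convenient built-in example: a fitted DecisionTreeClassifier exposes feature_importances_ (normalized total impurity decrease per feature, summing to 1), which can be ranked and thresholded. The 0.1 cut-off is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

iris = load_iris()
tree = DecisionTreeClassifier(random_state=42)
tree.fit(iris.data, iris.target)

# Rank features by their importance in the fitted tree
importances = pd.Series(tree.feature_importances_, index=iris.feature_names)
importances = importances.sort_values(ascending=False)
print(importances)

# Keep features above an illustrative threshold
selected = importances[importances > 0.1].index.tolist()
print("Selected features:", selected)
```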
Tasks
1. Regression with Linear Regression: Use the Boston Housing dataset from scikit-learn
to perform linear regression and predict house prices. Evaluate the model
using Mean Squared Error (MSE). Because load_boston was removed in scikit-learn 1.2,
the listing below loads the California Housing dataset (fetch_california_housing,
downloaded on first use) instead; the workflow is otherwise unchanged.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load dataset (substitute for load_boston, which no longer exists in scikit-learn >= 1.2)
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
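The MSE reported by mean_squared_error is the average squared residual, (1/n) Σ (yᵢ − ŷᵢ)². A quick check on four hand-picked values (hypothetical, for illustration) shows the manual formula and the scikit-learn function agree: the residuals 0.5, −0.5, 0 and −1 give squared errors summing to 1.5, so MSE = 1.5 / 4 = 0.375.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = mean of squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)
mse_sklearn = mean_squared_error(y_true, y_hat)

print(f"Manual MSE: {mse_manual}, scikit-learn MSE: {mse_sklearn}")
```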
2. Classification with Logistic Regression: Use the Iris dataset for classification
with Logistic Regression. Evaluate the model using accuracy and confusion matrix.
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model (max_iter raised so the solver converges)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")
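To read the confusion matrix: in scikit-learn, rows correspond to true classes and columns to predicted classes, so correct predictions sit on the diagonal and accuracy equals the trace divided by the total count. The toy labels below are hypothetical and chosen only to make the arithmetic easy to follow (4 of 6 correct).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_hat)
print(cm)

# Diagonal entries are correct predictions
acc_from_cm = np.trace(cm) / cm.sum()
print(f"Accuracy from matrix: {acc_from_cm:.3f}")
print(f"accuracy_score: {accuracy_score(y_true, y_hat):.3f}")
```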
Tasks
1. Clustering with K-Means: Apply K-Means clustering on the Iris dataset and
visualize the clusters.
from sklearn import datasets
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load the Iris features
X = datasets.load_iris().data

# Fit K-Means; three clusters assumed to match the three Iris species
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
y_kmeans = kmeans.fit_predict(X)

# Visualize the clusters
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, c='red', marker='x')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
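To show what KMeans does internally, here is a from-scratch sketch of Lloyd's algorithm, the standard K-Means iteration: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. kmeans_lloyd is a hypothetical helper, and initialising by sampling k data points at random is a simplification of scikit-learn's default k-means++ seeding.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's algorithm for K-Means."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct samples chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points (kept if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy blobs (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centers = kmeans_lloyd(X, k=2)
print(labels)
print(centers)
```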
X_pca = pca.fit_transform(X)
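A self-contained sketch of the PCA step, assuming pca is a scikit-learn PCA object reducing the four Iris features to two components for a 2-D plot. explained_variance_ratio_ reports the share of total variance each component captures; on Iris the first component accounts for most of it.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Assumption: two components, so the projection can be plotted
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of total variance captured by each component
print(pca.explained_variance_ratio_)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of the Iris dataset')
plt.show()
```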
Tasks
1. Model Evaluation with Cross-Validation: Apply cross-validation to evaluate
the performance of a classification model (e.g., SVM or Random Forest).
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Apply GridSearchCV
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Display best hyperparameters
print(f"Best Hyperparameters: {grid_search.best_params_}")
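A self-contained sketch tying the two fragments together on the Iris data (an assumed dataset choice): cross_val_score returns one accuracy per fold for a Random Forest, and GridSearchCV tunes an SVC over a small parameter grid. The grid values for C and kernel, and the names svm and param_grid as defined here, are illustrative assumptions rather than the exercise's original settings.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# 5-fold cross-validation of a Random Forest: one accuracy score per fold
rf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(rf, X, y, cv=5)
print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Grid search over an illustrative SVM parameter grid
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
svm = SVC()
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid_search = GridSearchCV(svm, param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f"Best Hyperparameters: {grid_search.best_params_}")
# score() uses the best estimator, refit on the full training split
print(f"Test accuracy: {grid_search.score(X_test, y_test):.3f}")
```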