Lab Manual
1. Write a program to load data from CSV and Excel files using Pandas.
Steps:
Install Pandas: If Pandas is not installed, use !pip install
pandas to install it. This command is typically run in a Jupyter
notebook or a similar environment.
Import Pandas: Import the Pandas library to use its functionalities.
Load CSV File: Use pd.read_csv() to read the CSV file into a
DataFrame. Replace 'path/to/your/csvfile.csv' with the
actual path to your CSV file.
Load Excel File: Use pd.read_excel() to read the Excel file
into a DataFrame. Replace 'path/to/your/excelfile.xlsx'
with the actual path to your Excel file. You can also specify the sheet
name if the Excel file contains multiple sheets.
Explore Data: Use methods like .head() to display the first few
rows, .info() to get a concise summary of the DataFrame, and
.describe() to generate descriptive statistics.
Program:
Loading Data from CSV
import pandas as pd

df = pd.read_csv("IRIS.csv")
print(df)
print(df.head())
df.info()
print(df.describe())
print("Type of data:
{}".format(type(iris_dataset['data'])))
print("Shape of data:
{}".format(iris_dataset['data'].shape))
print("First five columns of
data:\n{}".format(iris_dataset['data'][:5]))
Loading Data from Excel
import pandas as pd

exc = pd.ExcelFile(r"The_Excel_File_Path")
e = pd.read_excel(exc, sheet_name=0)
print(e)
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
Loading Data from the sklearn Library
<class 'sklearn.utils._bunch.Bunch'>
dict_items([('data', array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2] ]))
Target names: ['setosa' 'versicolor' 'virginica']
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Type of data: <class 'numpy.ndarray'>
Shape of data: (150, 4)
First five rows of data:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]
2. Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.
Steps:
Install Required Libraries: Install Pandas, Matplotlib, and Seaborn
if they are not already installed using !pip install pandas
matplotlib seaborn.
Import Libraries: Import Pandas for data manipulation, Matplotlib
for plotting, and Seaborn for enhanced visualization.
Load Data: Load your dataset using Pandas. Adjust the file path and
format (CSV or Excel) as needed.
Visualize Data:
Program:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(r"D:\amara\IRIS.csv")
mean_measurements = df.groupby('species').mean()
mean_measurements.plot(kind='bar')
plt.xlabel('Species')
plt.ylabel('Mean Measurements (cm)')
plt.title('Mean Measurements by Species')
plt.show()
import seaborn as sns

sns.FacetGrid(df, hue="species") \
   .map(plt.scatter, "sepal_length", "sepal_width") \
   .add_legend()
plt.show()
3. Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting a histogram and a box plot from an Excel file.
Steps:
Install Required Libraries: Install Pandas, Matplotlib, and Seaborn if they
are not already installed using !pip install pandas matplotlib
seaborn openpyxl.
Import Libraries: Import Pandas for data manipulation, Matplotlib for
plotting, and Seaborn for enhanced visualization.
Load Data: Load your Excel dataset using Pandas. Adjust the file path and
sheet name as needed.
Visualize Data:
Program:
import matplotlib.pyplot as plt
import pandas as pd

exc = pd.ExcelFile(r"D:\amara\ml lab.xlsx")
s1 = pd.read_excel(exc, sheet_name=0)
bin_ed = [0, 35, 50, 60, 85, 100]
plt.hist(s1.Avg, bins=bin_ed, color='g', label='Frequency',
         alpha=0.9, edgecolor='r', orientation='vertical', linewidth=2)
plt.legend(loc=9)
plt.xlabel('Percentage', color='b', size=30)
plt.ylabel('Frequency', color='b', size=30)
plt.xticks(rotation=30, fontsize=15)
plt.margins(x=0, y=1)
plt.title("Histogram", color='b', size=30)
s1.boxplot(column='Avg', by='Scourse')
plt.show()
Output:
4. Write a program to implement a confusion matrix.
Steps:
Install Required Libraries: Install Pandas, Scikit-Learn, Matplotlib, and
Seaborn if they are not already installed using !pip install pandas
scikit-learn matplotlib seaborn.
Import Libraries: Import Pandas for data manipulation, Scikit-Learn for
machine learning, and Matplotlib and Seaborn for plotting.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Split Data: Use train_test_split to split the data into training and
testing sets. test_size=0.2 means 20% of the data is used for testing.
Train a Classifier: Initialize and train a classifier (e.g., Decision Tree) using
the training data.
Predict and Compute Confusion Matrix: Use the trained model to predict
the target values for the test set and compute the confusion matrix using
confusion_matrix.
Visualize Confusion Matrix: Use Seaborn's heatmap to visualize the confusion matrix. Customize the plot with titles and labels.
Program:
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

actual = np.array(['Dog', 'Dog', 'Dog', 'Not Dog', 'Not Dog'])
predicted = np.array(['Dog', 'Dog', 'Not Dog', 'Not Dog', 'Not Dog'])

# Rows of the sklearn confusion matrix are the actual labels,
# columns are the predicted labels
cm = confusion_matrix(actual, predicted)
sns.heatmap(cm,
            annot=True,
            fmt='g',
            xticklabels=['Dog', 'Not Dog'],
            yticklabels=['Dog', 'Not Dog'])
plt.ylabel('Actual', fontsize=13)
plt.xlabel('Predicted', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()
print(classification_report(actual, predicted))
Output:
              precision    recall  f1-score   support

         Dog       1.00      0.67      0.80         3
     Not Dog       0.67      1.00      0.80         2

    accuracy                           0.80         5
   macro avg       0.83      0.83      0.80         5
weighted avg       0.87      0.80      0.80         5
5. Write a program to handle missing data, encode categorical variables, and perform feature scaling.
Steps:
Install Required Libraries: Install Pandas and Scikit-Learn if they are not
already installed using !pip install pandas scikit-learn.
Import Libraries: Import Pandas for data manipulation and Scikit-Learn for
preprocessing.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Handle Missing Data:
Feature Scaling:
Program:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# Car dataset with missing values, reconstructed from the output below
df = pd.DataFrame({
    'make': ['Toyota', 'Ford', 'Honda', 'Toyota', np.nan, 'Chevrolet', 'Ford', 'Nissan', 'Honda', 'Toyota'],
    'model': ['Corolla', 'F150', 'Civic', 'Camry', np.nan, 'Malibu', 'Focus', 'Altima', 'Accord', 'Highlander'],
    'year': [2010, 2015, 2018, 2012, 2016, 2014, np.nan, 2013, 2017, 2019],
    'price': [7000, np.nan, 15000, 9000, 10000, 12000, 8500, 9500, np.nan, 25000],
    'mileage': [75000, 50000, np.nan, 60000, 40000, 30000, 45000, 55000, 70000, 15000],
})

print("Original DataFrame:")
print(df)
print("\nMissing values in each column:")
print(df.isnull().sum())

# Impute numeric columns with the column mean
num_imputer = SimpleImputer(strategy='mean')
df[['year', 'price', 'mileage']] = num_imputer.fit_transform(df[['year', 'price', 'mileage']])

# Impute categorical columns with the most frequent value
cat_imputer = SimpleImputer(strategy='most_frequent')
df[['make', 'model']] = cat_imputer.fit_transform(df[['make', 'model']])
Output:
Original DataFrame:
make model year price mileage
0 Toyota Corolla 2010.0 7000.0 75000.0
1 Ford F150 2015.0 NaN 50000.0
2 Honda Civic 2018.0 15000.0 NaN
3 Toyota Camry 2012.0 9000.0 60000.0
4 NaN NaN 2016.0 10000.0 40000.0
5 Chevrolet Malibu 2014.0 12000.0 30000.0
6 Ford Focus NaN 8500.0 45000.0
7 Nissan Altima 2013.0 9500.0 55000.0
8 Honda Accord 2017.0 NaN 70000.0
9 Toyota Highlander 2019.0 25000.0 15000.0
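The program above handles only the missing-data task. A minimal sketch of the experiment's remaining two tasks, encoding the categorical columns and scaling the numeric ones (using the same column names as above), might look like this:
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Encode each categorical column as integer codes
for col in ['make', 'model']:
    df[col] = LabelEncoder().fit_transform(df[col])

# Standardize the numeric columns to zero mean and unit variance
scaler = StandardScaler()
df[['year', 'price', 'mileage']] = scaler.fit_transform(df[['year', 'price', 'mileage']])
print(df)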
6. Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.
Steps:
Install Required Libraries: Install Pandas, Scikit-Learn, and Matplotlib if
they are not already installed using !pip install pandas scikit-
learn matplotlib.
Import Libraries: Import Pandas for data manipulation, Scikit-Learn for
machine learning, and Matplotlib for plotting.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Split Data: Use train_test_split to split the data into training and
testing sets. test_size=0.2 means 20% of the data is used for testing.
Train Decision Tree Classifier: Initialize and train the decision tree classifier
using the training data.
Visualize the Decision Tree: Use plot_tree to visualize the decision tree.
Customize the visualization with parameters like filled, feature_names,
and class_names.
Evaluate Performance:
Program:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
play1=pd.read_csv(r"D:\amara\ML lab
DataSets\PlayTennis.csv")
print(play1)
from sklearn.preprocessing import LabelEncoder
12
Le = LabelEncoder()
play1['outlook'] = Le.fit_transform(play1['outlook'])
play1['temp'] = Le.fit_transform(play1['temp'])
play1['humidity'] = Le.fit_transform(play1['humidity'])
play1['windy'] = Le.fit_transform(play1['windy'])
play1['play'] = Le.fit_transform(play1['play'])
features_cols = ['outlook', 'temp', 'humidity', 'windy']
x = play1[features_cols]
y = play1.play
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(x_train)
print(x_test)
print(y_train)
print(y_test)
from sklearn.tree import plot_tree
from sklearn.tree import export_text
import matplotlib.pyplot as plt

clf = DecisionTreeClassifier(criterion="gini")
clf = clf.fit(x_train, y_train)
print(clf)
plot_tree(clf)
plt.show()
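export_text is imported above but never used; it prints the same fitted tree as indented text, which can be easier to read than the plot. For example, using the feature names from features_cols:
# Text rendering of the fitted decision tree
print(export_text(clf, feature_names=features_cols))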
Output:
outlook temp humidity windy play
0 sunny hot high False no
1 sunny hot high True no
2 overcast hot high False yes
3 rainy mild high False yes
4 rainy cool normal False yes
5 rainy cool normal True no
6 overcast cool normal True yes
7 sunny mild high False no
8 sunny cool normal False yes
9 rainy mild normal False no
10 sunny mild normal True yes
11 overcast mild high True no
12 overcast hot normal False yes
13 rainy mild high True no
outlook temp humidity windy
11 0 2 0 1
2 0 1 0 0
13 1 2 0 1
9 1 2 1 0
1 2 1 0 1
7 2 2 0 0
10 2 2 1 1
3 1 2 0 0
0 2 1 0 0
5 1 0 1 1
12 0 1 1 0
outlook temp humidity windy
8 2 0 1 0
6 0 0 1 1
4 1 0 1 0
11 0
2 1
13 0
9 0
1 0
7 0
10 1
3 1
0 0
5 0
12 1
Name: play, dtype: int32
8 1
6 1
4 1
Name: play, dtype: int32
DecisionTreeClassifier()
[Text(0.36363636363636365, 0.9166666666666666, 'x[0] <= 0.5\ngini = 0.463\nsamples = 11\nvalue = [7, 4]'),
 Text(0.18181818181818182, 0.75, 'x[3] <= 0.5\ngini = 0.444\nsamples = 3\nvalue = [1, 2]'),
 Text(0.09090909090909091, 0.5833333333333334, 'gini = 0.0\nsamples = 2\nvalue = [0, 2]'),
 Text(0.2727272727272727, 0.5833333333333334, 'gini = 0.0\nsamples = 1\nvalue = [1, 0]'),
 Text(0.5454545454545454, 0.75, 'x[1] <= 1.5\ngini = 0.375\nsamples = 8\nvalue = [6, 2]'),
 Text(0.45454545454545453, 0.5833333333333334, 'gini = 0.0\nsamples = 3\nvalue = [3, 0]'),
 Text(0.6363636363636364, 0.5833333333333334, 'x[2] <= 0.5\ngini = 0.48\nsamples = 5\nvalue = [3, 2]'),
 Text(0.45454545454545453, 0.4166666666666667, 'x[0] <= 1.5\ngini = 0.444\nsamples = 3\nvalue = [2, 1]'),
 Text(0.36363636363636365, 0.25, 'x[3] <= 0.5\ngini = 0.5\nsamples = 2\nvalue = [1, 1]'),
 Text(0.2727272727272727, 0.08333333333333333, 'gini = 0.0\nsamples = 1\nvalue = [0, 1]'),
 Text(0.45454545454545453, 0.08333333333333333, 'gini = 0.0\nsamples = 1\nvalue = [1, 0]'),
 Text(0.5454545454545454, 0.25, 'gini = 0.0\nsamples = 1\nvalue = [1, 0]'),
 Text(0.8181818181818182, 0.4166666666666667, 'x[3] <= 0.5\ngini = 0.5\nsamples = 2\nvalue = [1, 1]'),
 Text(0.7272727272727273, 0.25, 'gini = 0.0\nsamples = 1\nvalue = [1, 0]'),
 Text(0.9090909090909091, 0.25, 'gini = 0.0\nsamples = 1\nvalue = [0, 1]')]
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.
Steps:
Install Required Libraries: Install Pandas and Scikit-Learn if they are not
already installed using !pip install pandas scikit-learn.
Import Libraries: Import Pandas for data manipulation and Scikit-Learn for
machine learning.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Split Data: Use train_test_split to split the data into training and
testing sets. test_size=0.2 means 20% of the data is used for testing.
Train k-NN Classifier: Initialize and train the k-NN classifier using the
training data. n_neighbors=5 specifies the number of neighbors to use.
Evaluate Performance:
Program:
import pandas as pd
import numpy as np
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
print("Setup Complete")
df=pd.read_csv(R"D:\amara\ML lab DataSets\IRIS.csv")
16
print(df.head())
print("Column names:",df.columns)
feature_columns=df.columns[:-1]
target_column=df.columns[-1]
X=df[feature_columns]
y=df[target_column]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
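The extracted program stops after the split. A minimal completion consistent with the output below, using the n_neighbors=5 mentioned in the steps (the variable names knn and y_pred are ours):
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

# Train the k-NN classifier with 5 neighbours and evaluate it on the test set
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print(knn.score(X_test, y_test))
print(metrics.classification_report(y_test, y_pred))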
   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
(120, 4)
(30, 4)
(120,)
(30,)
Accuracy:1.0
Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
1.0
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
8. Write a program to implement a linear regression model for regression tasks and train the model on a dataset with continuous target variables.
Steps:
Install Required Libraries: Install Pandas and Scikit-Learn if they are not
already installed using !pip install pandas scikit-learn.
Import Libraries: Import Pandas for data manipulation and Scikit-Learn for
machine learning.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Split Data: Use train_test_split to split the data into training and
testing sets. test_size=0.2 means 20% of the data is used for testing.
Train Linear Regression Model: Initialize and train the linear regression
model using the training data.
Evaluate Performance:
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load the dataset (the file name below is a placeholder; adjust the path as needed)
dataset = pd.read_csv("Salary_Data.csv")
print(dataset)

# Split the dataset into features (X) and target (Y)
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, 1].values
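The extracted program stops after defining X and Y. A minimal completion consistent with the slope, intercept, and ten residuals shown in the output (the 1/3 test split and random_state=0 are assumptions) might look like this:
# Split into training and test sets (1/3 of the 30 rows gives 10 test samples)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/3, random_state=0)

# Fit the linear regression model on the training data
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
print("Slope:", regressor.coef_)
print("Y-intercept:", regressor.intercept_)

# Residuals (actual minus predicted) on the test set
Y_pred = regressor.predict(X_test)
print(Y_test - Y_pred)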
Output:
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0
Slope: [9345.94244312]
Y-intercept: 26816.19224403119
[-3104.10590871 -688.39940819 -8053.55626083 -47.36777221
1366.35454631 1305.1085008 -3902.23969801 -8405.96201652
6738.31280742 652.8624553 ]
9. Write a program to implement K-Means clustering and visualize clusters.
Steps:
Install Required Libraries: Install Pandas, Scikit-Learn, and Matplotlib if
they are not already installed using !pip install pandas scikit-
learn matplotlib.
Import Libraries: Import Pandas for data manipulation, Scikit-Learn for
machine learning, and Matplotlib for plotting.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Train K-Means Model: Initialize and train the K-Means model using the
preprocessed data. Set n_clusters to the number of clusters you want to form.
Visualize Clusters:
Program:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Load the Iris dataset (same file as in the earlier experiments)
df = pd.read_csv(r"D:\amara\ML lab DataSets\IRIS.csv")
data = df.iloc[:, :-1]
kmeans = KMeans(n_clusters=3, random_state=0)
clusters = kmeans.fit_predict(data)
df["cluster"] = clusters
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
# Columns 0 and 1 are sepal length and sepal width
plt.scatter(data.iloc[:, 0], data.iloc[:, 1],
            c=df['cluster'], cmap='viridis')
plt.xlabel('Sepal length (cm)')
plt.ylabel('Sepal width (cm)')
plt.title('K-means Clustering (Sepal)')
plt.subplot(1, 2, 2)
plt.scatter(data.iloc[:, 2], data.iloc[:, 3],
c=df['cluster'], cmap='viridis')
plt.xlabel('Petal length (cm)')
plt.ylabel('Petal width (cm)')
plt.title('K-means Clustering (Petal)')
plt.tight_layout()
plt.show()
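The steps say to set n_clusters to the number of clusters you want to form; a common way to choose that number is the elbow method. A minimal sketch, reusing the data frame from the program above:
# Elbow method: plot within-cluster sum of squares (inertia) for k = 1..9
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, random_state=0).fit(data)
    inertias.append(km.inertia_)
plt.plot(range(1, 10), inertias, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()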
Output:
10. Write a program to implement image segmentation using the K-Means clustering algorithm.
Steps:
Install Required Libraries: Install OpenCV, Scikit-Learn, Matplotlib, and
NumPy if they are not already installed using !pip install opencv-
python-headless scikit-learn matplotlib numpy.
Import Libraries: Import OpenCV for image processing, Scikit-Learn for
KMeans clustering, Matplotlib for plotting, and NumPy for numerical operations.
Load Image: Load the image using OpenCV and convert it to RGB format.
Preprocess Image: Reshape the image into a 2D array where each row is a
pixel and each column is a color channel. Convert pixel values to float32 for
clustering.
Apply KMeans Clustering: Initialize and fit the KMeans model to the pixel
data. Set the number of clusters k to the desired number of segments.
Post-process and Display Segmented Image:
Map each pixel in the image to the color of its corresponding cluster center.
Reshape the clustered pixel labels back to the original image shape.
Display the original and segmented images side by side using Matplotlib.
Program:
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.cluster import KMeans
# Load the image (the file name is a placeholder; adjust the path) and
# convert from OpenCV's BGR order to RGB
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Reshape to a 2D array: one row per pixel, one column per colour channel
pixels = image.reshape(-1, 3).astype(np.float32)

# Initialize the KMeans model (2 clusters for binary segmentation)
kmeans = KMeans(n_clusters=2, random_state=42)
labels = kmeans.fit_predict(pixels)

# Map each pixel to its cluster centre and reshape back to the image shape
centers = kmeans.cluster_centers_.astype(np.uint8)
segmented_image = centers[labels].reshape(image.shape)

# Grayscale version of the segmented image for the third panel
gray_segmented_image = cv2.cvtColor(segmented_image, cv2.COLOR_RGB2GRAY)
plt.subplot(1, 3, 1)
plt.imshow(image)
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 3, 2)
plt.imshow(segmented_image)
plt.title('Segmented Image (K=2)')
plt.axis('off')
plt.subplot(1, 3, 3)
plt.imshow(gray_segmented_image, cmap='gray')
plt.title('Segmented Image in Grayscale')
plt.axis('off')
plt.tight_layout()
plt.show()
Output:
11. Write a program to implement the Naïve Bayes classifier algorithm.
Steps:
Load the Dataset: Prepare the dataset for training and testing.
Calculate Prior Probabilities: Compute the prior probabilities for each
class.
Calculate Likelihood: Compute the likelihood (conditional probability)
for each feature given each class.
Calculate Posterior Probabilities: Use Bayes' theorem to compute the
posterior probabilities for each class given the input features.
Predict the Class: Choose the class with the highest posterior
probability.
Evaluate the Model: Assess the performance of the model using a test
dataset.
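Before the program, steps 2 through 5 can be illustrated with a small hand computation. The counts below are the classic 14-day play-tennis figures and are illustrative only, not taken from weather.csv:
# Toy counts: 9 'yes' and 5 'no' days; Outlook=sunny on 2 of the 'yes'
# days and 3 of the 'no' days
p_yes, p_no = 9/14, 5/14              # prior probabilities
p_sunny_yes, p_sunny_no = 2/9, 3/5    # likelihoods P(sunny | class)

# Unnormalized posteriors via Bayes' theorem
post_yes = p_sunny_yes * p_yes
post_no = p_sunny_no * p_no

# Normalize and predict the class with the higher posterior
total = post_yes + post_no
print("P(yes | sunny) =", post_yes / total)
print("P(no | sunny) =", post_no / total)
print("Prediction:", "yes" if post_yes > post_no else "no")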
Program:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB
df=pd.read_csv(R"D:\amara\ML lab DataSets\weather.csv")
df
Numerics=LabelEncoder()
inputs=df.drop("Play",axis="columns")
target=df["Play"]
target
inputs['outlook_n'] = Numerics.fit_transform(inputs['Outlook'])
inputs['Temp_n'] = Numerics.fit_transform(inputs['Temp'])
inputs['Humidity_n'] = Numerics.fit_transform(inputs['Humidity'])
inputs['Windy_n'] = Numerics.fit_transform(inputs['Windy'])
inputs
inputs_n = inputs.drop(["Outlook", "Temp", "Humidity", "Windy"], axis="columns")
inputs_n
Classifier=GaussianNB()
Classifier.fit(inputs_n,target)
Classifier.score(inputs_n,target)
Classifier.predict([[0,0,0,1]])
accuracy=Classifier.score(inputs_n,target)
print(f"Accuracy of Naive Bayes Classifier: {accuracy:.2f}" )
prediction=Classifier.predict([[0,0,0,1]])
print(f"Prediction: {prediction}")
Output:
0 no
1 no
2 yes
3 yes
4 yes
5 no
6 yes
7 no
8 yes
9 yes
10 yes
Name: Play, dtype: object
0.8571428571428571
array(['yes'], dtype='<U3')
Prediction: ['yes']
12. Write a program to implement an approach for agglomerative clustering.
Steps:
Install Required Libraries: Install Pandas, Scikit-Learn, Matplotlib, and
SciPy if they are not already installed using !pip install pandas
scikit-learn matplotlib scipy.
Import Libraries: Import Pandas for data manipulation, Scikit-Learn for
clustering, Matplotlib for plotting, and SciPy for hierarchical clustering
dendrograms.
Load Data: Load your dataset using Pandas. Adjust the file path and format
(CSV or Excel) as needed.
Preprocess Data:
Visualize Clusters:
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# Convert the DataFrame to a NumPy array
distance_matrix = df.values
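The extracted program breaks off here, and the dataset it loaded into df is not recoverable from this copy. A minimal sketch of the remaining steps on a small synthetic array (the variable X and its sample points are illustrative):
# Illustrative 2-D points
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Hierarchical linkage and dendrogram
Z = linkage(X, method='ward')
dendrogram(Z)
plt.title('Dendrogram')
plt.show()

# Agglomerative clustering into 2 clusters
agg = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = agg.fit_predict(X)

# Scatter plot of the clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('Agglomerative Clustering')
plt.show()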
Output: