ML Lab Manual
(Affiliated to Bangalore University and NAAC Accredited with ‘A’ Grade) Dr. Vishnuvardhan Road,
Channasandra, R R Nagara, Bengaluru – 560 098
LIST OF PROGRAMS
1. Install and set up Python and essential libraries like NumPy and pandas
Installation of Python
Pandas is a widely used Python library for working with tabular data; its stated goal is to become
the most powerful and flexible open source data analysis tool available. DataFrames are at the
center of pandas. A DataFrame is structured like a table or spreadsheet: both the rows and the
columns have indexes (labels), and operations can be performed on rows or on columns separately.
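A minimal sketch of the row index, the column index, and operations along each axis. The student IDs, subject names and marks are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data: row index = student IDs, column index = subjects
df = pd.DataFrame(
    {"maths": [78, 92, 65], "physics": [81, 88, 70]},
    index=["s01", "s02", "s03"],
)

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels

# Select by label: one row, then one column
print(df.loc["s02"])
print(df["physics"])

# axis=0 aggregates down each column (one value per column)
col_means = df.mean(axis=0)
# axis=1 aggregates across each row (one value per row)
row_totals = df.sum(axis=1)
print(col_means)
print(row_totals)
```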
Features
Rather than focusing on loading, manipulating and summarizing data, the Scikit-learn library
focuses on modeling the data. Some of the most popular groups of models provided by Sklearn
are as follows −
Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like
Linear Regression, Support Vector Machine (SVM), Decision Tree etc., are part of scikit-learn.
Unsupervised Learning algorithms − It also provides the popular unsupervised learning
algorithms, ranging from clustering, factor analysis and PCA (Principal Component Analysis) to
unsupervised neural networks.
Clustering − These models are used for grouping unlabeled data.
Cross Validation − It is used to check the accuracy of supervised models on unseen data.
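As a small illustration of cross validation, the sketch below uses `cross_val_score` with 5 folds on the built-in iris dataset. The choice of a decision tree and of `cv=5` is an assumption made for the example, not something the manual prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5-fold cross validation: the data is split into 5 parts, and each part is used
# once as the unseen test fold while the model is trained on the other 4 parts
model = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

The mean over the folds is usually a more reliable estimate of accuracy on unseen data than a single train-test split.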
Scikit-learn (Sklearn) is one of the most useful and robust libraries for machine learning in
Python. It provides a selection of efficient tools for machine learning and statistical modeling,
including classification, regression, clustering and dimensionality reduction, through a
consistent interface in Python.
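The "consistent interface" means every estimator is used through the same `fit`, `predict` and `score` methods. A minimal sketch, assuming the iris dataset and three arbitrarily chosen classifiers, shows that swapping algorithms does not change the surrounding code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The same fit/score calls work for every estimator in the dictionary
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)  # score() returns accuracy for classifiers
    print(f"{name}: test accuracy = {results[name]:.2f}")
```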
Install scikit-learn using pip: Open your terminal or command prompt and run the following
command:
pip install -U scikit-learn
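After installation (NumPy and pandas can be installed the same way, for example with `pip install -U numpy pandas`), a short script like the one below can confirm that the libraries import correctly and report their versions. The dictionary layout is just one convenient way to print them.

```python
import sys

import numpy as np
import pandas as pd
import sklearn

# Collect the interpreter and library versions to confirm the installation worked
versions = {
    "Python": sys.version.split()[0],
    "NumPy": np.__version__,
    "pandas": pd.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in versions.items():
    print(f"{name:<13}: {version}")
```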
import pandas as pd
def load_data(file):
    # Choose the pandas reader based on the file extension
    if file.endswith('.csv'):
        df = pd.read_csv(file)
    elif file.endswith('.xlsx'):
        df = pd.read_excel(file)  # reading .xlsx files requires the openpyxl package
    else:
        print("Unsupported file format. Please provide a CSV or Excel file.")
        return None
    print("Dataset information:")
    df.info()  # info() prints its summary itself and returns None
    print("\nTop rows of the dataset:")
    print(df.head())
    return df
file = 'train.csv'  # Change this to the path of your CSV or Excel file
df = load_data(file)
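The extension check above is case-sensitive, so a file named `TRAIN.CSV` would be rejected. One possible alternative, sketched below under the assumption that only CSV and `.xlsx` files are needed, normalises the suffix with `pathlib` and looks up the reader in a dictionary. The names `load_table` and `READERS` are hypothetical, and the demo writes a small temporary CSV so it runs without `train.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping from lower-case file suffix to pandas reader function
READERS = {".csv": pd.read_csv, ".xlsx": pd.read_excel}  # .xlsx also needs openpyxl


def load_table(path):
    """Hypothetical helper: pick a pandas reader from the (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()  # "SAMPLE.CSV" -> ".csv"
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported file format {suffix!r}; use a CSV or Excel file.")
    return reader(path)


# Usage on a throwaway CSV file created only for this demo
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "SAMPLE.CSV"
    sample.write_text("PassengerId,Survived\n1,0\n2,1\n")
    demo_df = load_table(sample)
    print(demo_df.head())
```

Raising an exception instead of printing a message makes the failure explicit to the calling code.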
5. Write a program to Visualize the dataset to gain insights using Matplotlib or
Seaborn by plotting scatter plots and bar charts.
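import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")  # set the theme before plotting so it applies to every chart
df = pd.read_csv('train.csv')
print(df.head(2))
# Plotting a pair plot (scatter plots for every pair of numeric columns)
sns.pairplot(df, hue='Survived')
plt.show()
# Bar chart: number of passengers who survived / did not survive, split by sex
sns.countplot(x='Survived', data=df, hue='Sex')
plt.show()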
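The same two kinds of chart can also be drawn with plain Matplotlib. The sketch below uses a small made-up DataFrame with Titanic-style column names (not real passenger data) so that it runs without `train.csv`; replacing `sample_df` with the `df` loaded above should work the same way for these columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows with Titanic-style columns (illustrative values only)
sample_df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2, 27, 14],
    "Fare": [7.25, 71.28, 7.93, 53.10, 51.86, 21.08, 11.13, 30.07],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "female"],
    "Survived": [0, 1, 1, 1, 0, 1, 1, 0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: Age vs Fare, one colour per survival class (0 = died, 1 = survived)
for label, group in sample_df.groupby("Survived"):
    ax1.scatter(group["Age"], group["Fare"], label=f"Survived={label}")
ax1.set_xlabel("Age")
ax1.set_ylabel("Fare")
ax1.legend()

# Bar chart: survival rate (mean of the 0/1 Survived column) for each sex
survival_rate = sample_df.groupby("Sex")["Survived"].mean()
ax2.bar(survival_rate.index, survival_rate.values)
ax2.set_ylabel("Survival rate")

fig.tight_layout()
plt.show()
```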
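# Replacing the missing values in the "Age" column with the mean value
df['Age'] = df['Age'].fillna(df['Age'].mean())
# Replacing the missing values in the "Embarked" column with the mode (most frequent) value
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
# Total number of missing values left in the DataFrame.
# Note: the standard Titanic train.csv also has many missing values in "Cabin",
# so the total is 0 only if that column has been dropped or filled beforehand.
print(df.isnull().sum().sum())
Output:
0
df.info()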
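The same mean and mode replacement can also be done with scikit-learn's `SimpleImputer`, which learns the fill values with `fit` and can then apply them to new data. The tiny DataFrame below (named `toy` so it does not overwrite `df`) is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Small made-up frame with gaps in a numeric and a categorical column
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Embarked": pd.Series(["S", "C", np.nan, "S", "Q"], dtype=object),
})

# Mean for the numeric column, most frequent value (mode) for the categorical one
age_imputer = SimpleImputer(strategy="mean")
emb_imputer = SimpleImputer(strategy="most_frequent")
toy["Age"] = age_imputer.fit_transform(toy[["Age"]]).ravel()
toy["Embarked"] = emb_imputer.fit_transform(toy[["Embarked"]]).ravel()

print(toy)
print("Missing values left:", toy.isnull().sum().sum())
```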
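# FEATURE SCALING
# Splitting input (features) and output (target)
from sklearn.model_selection import train_test_split
X = df.drop(columns=['Survived'])
y = df['Survived']
print(X.head())
# Train-test split: 80% training data, 20% testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)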
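The listing above is headed FEATURE SCALING but stops after the split. A minimal sketch of the scaling step itself with `StandardScaler` follows. It assumes purely numeric features, so it uses iris; if the Titanic `X` still contains text columns such as `Name` or `Sex`, those would need to be encoded or dropped before scaling.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)

# Standardisation per feature: z = (x - mean) / std
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from the training set only
X_te_scaled = scaler.transform(X_te)      # reuse them, so no test information leaks in

print("Train means after scaling:", X_tr_scaled.mean(axis=0).round(3))
print("Train stds after scaling :", X_tr_scaled.std(axis=0).round(3))
```

Fitting the scaler on the training split only is the usual design choice: the test set then plays the role of genuinely unseen data.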
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
# Importing the iris dataset from sklearn and splitting input and output
iris = load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Implementing the KNN classifier model (k = 3 nearest neighbours)
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))
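To show what `KNeighborsClassifier` does internally, here is a from-scratch sketch in NumPy: compute the Euclidean distance to every training point, take the k closest, and predict the majority class. The helper `knn_predict` is hypothetical; on points with tied distances or tied votes it may occasionally disagree with scikit-learn, so the two predictions are compared rather than assumed identical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: plain-NumPy k-nearest-neighbours with Euclidean distance."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
        nearest = np.argsort(dists)[:k]              # indices of the k closest points
        votes = np.bincount(y_train[nearest])        # count class labels among them
        preds.append(np.argmax(votes))               # majority vote (ties -> smallest label)
    return np.array(preds)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

manual_pred = knn_predict(X_train, y_train, X_test, k=3)
sk_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

manual_acc = np.mean(manual_pred == y_test)
agreement = np.mean(manual_pred == sk_pred)
print("Manual accuracy:", manual_acc)
print("Agreement with scikit-learn:", agreement)
```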
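import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Reading the dataset
df = pd.read_csv('Boston.csv')
print(df.head(3))
# Splitting input (features) and output (target: medv, median value of owner-occupied homes)
X = df.drop(columns=['medv'])
y = df['medv']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fitting the model (assumed to be linear regression; the listing does not show how `model` is created)
model = LinearRegression()
model.fit(X_train, y_train)
# Prediction
y_pred = model.predict(X_test)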
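The regression program stops at prediction. A sketch of how the predictions might be evaluated with mean squared error, RMSE and R² follows. Because `Boston.csv` is not bundled here, it uses a synthetic stand-in from `make_regression` with 13 features, and it assumes a linear regression model; the variable names are chosen so they do not overwrite the ones above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Boston.csv: 500 rows, 13 features, Gaussian noise
Xr, yr = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=42)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.2, random_state=42)

lin_reg = LinearRegression()
lin_reg.fit(Xr_train, yr_train)
yr_pred = lin_reg.predict(Xr_test)

mse = mean_squared_error(yr_test, yr_pred)
rmse = np.sqrt(mse)               # same units as the target
r2 = r2_score(yr_test, yr_pred)   # 1.0 would be a perfect fit
print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, R^2 = {r2:.3f}")
```

With the real dataset, the same three lines of metrics apply directly to `y_test` and `y_pred` from the program above.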
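from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Importing the iris dataset from sklearn and splitting input and output
iris = load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fitting a decision tree classifier (dtc is also used by the tree plot below)
dtc = DecisionTreeClassifier()
dtc.fit(X_train, y_train)
y_pred = dtc.predict(X_test)
# Checking accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy of model=", acc)
Output:
Accuracy of model= 1.0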
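import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# Visualizing the fitted decision tree (dtc and iris come from the program above)
plt.figure(figsize=(12, 8))
plot_tree(dtc, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()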
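When no plotting window is available, `export_text` prints the same if/else rules that `plot_tree` draws. The sketch below trains its own shallow tree (`max_depth=3`, an assumption made only to keep the output short) under the name `small_tree`, so it does not replace `dtc`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris_data = load_iris()
small_tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow tree -> short text
small_tree.fit(iris_data.data, iris_data.target)

# Text version of the decision rules: one line per split, "class:" at each leaf
rules = export_text(small_tree, feature_names=list(iris_data.feature_names))
print(rules)
```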
10. Write a program to Implement K-Means clustering and Visualize clusters
make_blobs (from sklearn.datasets) is a synthetic data generator, especially useful for testing
clustering and classification algorithms. It generates isotropic Gaussian blobs. An isotropic
Gaussian blob means that the points of each blob have the same spread in every direction, so
they are distributed in a circular (spherical, for multi-dimensional data) shape around the centroid.
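A minimal sketch of the program, assuming 300 points in 3 blobs with `cluster_std=0.8` (values chosen only for illustration): generate the blobs, fit `KMeans` with k = 3, and plot each point coloured by its cluster with the learned centroids marked.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# 300 points drawn from 3 isotropic Gaussian blobs (cluster_std = spread in every direction)
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# K-Means with k = 3; n_init=10 runs the algorithm from 10 different sets of starting
# centroids and keeps the result with the lowest inertia (within-cluster sum of squares)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

# Colour each point by its assigned cluster and mark the learned centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=20)
plt.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200, label="Centroids")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("K-Means clusters on make_blobs data")
plt.legend()
plt.show()
```

Because the blobs are isotropic and well separated, K-Means, which assigns each point to the nearest centroid, is a good match for this kind of data.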