
ML Foram

The document outlines various practical tasks in machine learning, including calculations of mean, median, and mode, data visualization techniques, and implementations of different algorithms such as Linear Regression, kNN, Decision Trees, SVM, ANN, and K-means clustering. It also covers ensemble methods like Bagging and Boosting on the Wisconsin Breast Cancer Dataset and dimension reduction using PCA. Each task includes code snippets and concludes with the evaluation of model performance.

Uploaded by

Foram Modi

Machine Learning 211310132057

Practical-1
TASK 1: Find and analyse mean, median and mode of given data.

Mean:
import numpy as np
data = [10, 20, 25, 23, 25, 40]
mean = np.mean(data)
print("Answer: ", mean)

Output:

Median:
import numpy as np
data = [10, 20, 25, 23, 25, 40]
median = np.median(data)
print("Answer: ", median)

Output:

Mode:
from scipy import stats
data = [10, 20, 25, 23, 25, 40]
mode = stats.mode(data)
print("Answer: ", mode)

Output:
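The output screenshots are not reproduced in this copy. As a cross-check, the same three statistics can be computed with Python's standard-library `statistics` module; using it instead of NumPy and SciPy is a choice made for this sketch, not part of the original listings.

```python
import statistics

data = [10, 20, 25, 23, 25, 40]

mean_value = statistics.mean(data)      # sum of 143 divided by 6 values
median_value = statistics.median(data)  # average of the two middle values, 23 and 25
mode_value = statistics.mode(data)      # 25 is the only value that occurs twice

print("Mean:", round(mean_value, 2))
print("Median:", median_value)
print("Mode:", mode_value)
```

The mean (about 23.83) sits slightly below the median (24) because the low value 10 pulls it down, while the mode (25) marks the most frequent reading.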


TASK 2: Find mean, median and mode of the displacement column in the Auto MPG dataset.

Mean:
import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
mean_displacement = df['displacement'].mean()
print("Answer", mean_displacement)

Output:

Median:
import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
median_displacement = df['displacement'].median()
print("Answer", median_displacement)

Output:

Mode:
import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']

df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
mode_displacement = df['displacement'].mode()
print("Answer", mode_displacement)

Output:


Practical-2
Understand structure of data using various visualization methods like
scatter plot, histogram and box plot.

TASK 1: Draw the scatter plot for displacement vs mpg.


import pandas as pd
import matplotlib.pyplot as plt
url = "C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
plt.scatter(df['displacement'], df['mpg'])
plt.xlabel('displacement')
plt.ylabel('MPG')
plt.title('Scatter Plot of MPG vs Displacement')
plt.show()

GRAPH:


TASK 2: Draw the histogram for mpg.


import pandas as pd
import matplotlib.pyplot as plt
url = "C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
plt.hist(df['mpg'], bins=20, color='blue', edgecolor='black', alpha=0.7)
plt.title('Histogram of MPG')
plt.xlabel('MPG')
plt.ylabel('Frequency')
plt.show()

GRAPH:

TASK 3: Draw the box plot for displacement.


import pandas as pd
import matplotlib.pyplot as plt
url = "C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)  # delim_whitespace is deprecated in recent pandas
plt.boxplot(df['displacement'])
plt.title('Boxplot of Displacement')
plt.ylabel('Displacement')  # the y-axis of a box plot shows the values, not frequencies
plt.show()

GRAPH:


Practical-3

Aim: Implement Linear regression on auto mpg dataset and evaluate its
performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'car_name']
# Missing horsepower values are written as '?' in the file, so read them as NaN
data = pd.read_csv(url, names=column_names, sep=r'\s+', na_values='?')
data = data.dropna()
X = data[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year',
'origin']]
y = data['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()
model.fit(X_train, y_train)
coefficients = pd.Series(model.coef_, index=X.columns)
coefficients_sorted = coefficients.abs().sort_values(ascending=False)
print("Enroll: 221313132004 Feature coefficients sorted by their influence on MPG:")
print(coefficients_sorted)

Output:

Conclusion:
In this practical, we implemented linear regression on the Auto MPG dataset and ranked the
features by the absolute value of their fitted coefficients.
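The listing above prints coefficient magnitudes but does not score the model on the held-out split. A minimal sketch of two common regression metrics, mean squared error and R², is given here in plain Python; the helper names `mse` and `r_squared` and the four sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions, not taken from the Auto MPG data
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

mse_value = mse(y_true, y_pred)
r2_value = r_squared(y_true, y_pred)
print(f"MSE: {mse_value:.4f}, R^2: {r2_value:.4f}")
```

In the practical itself, the same quantities could be obtained by passing `y_test` and `model.predict(X_test)` to `mean_squared_error` and `r2_score` from `sklearn.metrics`.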


Practical-4

Aim: Implement kNN on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris_data = pd.read_csv(url, header=None, names=column_names)
X = iris_data.iloc[:, :-1].values
y = pd.factorize(iris_data['species'])[0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Enroll: 221313132004 Accuracy: {accuracy:.2f}")

Output:

Conclusion:
In this practical, we have successfully implemented kNN on the Iris dataset and evaluated its
performance.
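To make the voting rule behind `KNeighborsClassifier` concrete, here is a minimal from-scratch sketch: it measures the Euclidean distance from a query point to every training point and returns the majority label among the k closest. The function name `knn_predict` and the six toy points are hypothetical and are not part of the Iris experiment.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training label with its distance to the query, nearest first
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy groups (hypothetical data)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
pred_near_b = knn_predict(train_X, train_y, (5.1, 5.0))
print(pred_near_a, pred_near_b)
```

With k=3, as in the practical, each prediction is decided by at most three neighbours, so a single mislabeled point cannot flip the vote on its own.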


Practical-5

Aim: Implement Decision Tree on the given dataset and evaluate its
performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris = pd.read_csv(url, header=None, names=column_names)

X = iris.iloc[:, :-1]
y = pd.factorize(iris['species'])[0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Decision Tree Classifier


myclassifier = DecisionTreeClassifier()
myclassifier.fit(X_train, y_train)

# Predict on the test set


y_pred = myclassifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Output accuracy
print(f"Enroll: 221313132004 Accuracy: {accuracy * 100:.2f}%")


Output:

Conclusion:
In this practical, we have successfully implemented a decision tree on the Iris dataset using
DecisionTreeClassifier and evaluated its performance.


Practical-6

Aim: Implement SVM on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

df = pd.read_csv(url, header=None, names=column_names)


df['species'] = df['species'].astype('category').cat.codes

X = df.drop(columns=['species'])
y = df['species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)

y_pred = svm_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Enroll: 221313132004 Accuracy: {accuracy:.2f}")

Output:

Conclusion:
In this practical, we have successfully implemented SVM using SVC and we have also
evaluated its performance.

Practical-7

Aim: Implement ANN on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'

column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'model_year', 'origin', 'car_name']

# Read '?' as NaN so that horsepower stays numeric instead of becoming a string column
data = pd.read_csv(url, sep=r'\s+', names=column_names, na_values='?')

# Drop 'car_name' because it's a string and not useful for prediction
data = data.drop('car_name', axis=1)

# Handle missing values (horsepower has missing values represented by '?')


data = data.replace('?', pd.NA)
data = data.dropna()

# 2. Prepare features (X) and target (y)


X = data.drop('mpg', axis=1)
y = data['mpg']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1)
test_loss = model.evaluate(X_test, y_test)
print(f"Test Loss (Mean Squared Error): {test_loss}")


Output:

Conclusion:
In this practical, we have successfully implemented an ANN using the Sequential model and
Dense layers from tensorflow.keras and evaluated its performance.


Practical-8

Aim: Perform K-means clustering on the given dataset and evaluate its
performance.
Code:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Load the Iris dataset


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
data = pd.read_csv(url, header=None)

# Extract features and true labels


X = data.iloc[:, :-1].values
y = pd.factorize(data.iloc[:, -1])[0] # Convert class labels to numeric

# Apply K-means clustering


kmeans = KMeans(n_clusters=3, random_state=42)
y_pred = kmeans.fit_predict(X)

# Evaluate accuracy (note: cluster numbers might differ from class labels)
accuracy = accuracy_score(y, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Output:

Conclusion:
K-means clustering on the Iris dataset effectively groups the data into three clusters,
corresponding closely to the three species. Although not a supervised method, it aligns well
with actual species labels, demonstrating that the dataset's features naturally separate into
meaningful clusters.
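As the comment in the listing notes, K-means numbers its clusters arbitrarily, so comparing cluster ids directly with class labels can understate (or, by chance, overstate) agreement. One way to handle this, sketched here under the assumption that the number of clusters is small, is to try every relabeling of the cluster ids and keep the best match. The helper names and the six toy labels are hypothetical.

```python
from itertools import permutations


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def aligned_accuracy(y_true, cluster_ids, n_clusters):
    """Best accuracy over all one-to-one relabelings of the cluster ids."""
    best = 0.0
    for perm in permutations(range(n_clusters)):
        remapped = [perm[c] for c in cluster_ids]  # cluster c is renamed perm[c]
        best = max(best, accuracy(y_true, remapped))
    return best


# Toy case: the grouping is perfect, but every cluster id is shifted
y_true = [0, 0, 1, 1, 2, 2]
cluster_ids = [1, 1, 2, 2, 0, 0]

naive = accuracy(y_true, cluster_ids)
aligned = aligned_accuracy(y_true, cluster_ids, 3)
print(f"Naive: {naive:.2f}, aligned: {aligned:.2f}")
```

For three clusters only six relabelings need to be checked. A permutation-invariant score such as `adjusted_rand_score` from `sklearn.metrics` avoids the relabeling step altogether.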


Practical-9

Aim: Read Wisconsin Breast Cancer Dataset (WBCD) and implement
various ensemble models on it.

• Bagging (Decision Tree)


Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"
columns = [
'ID', 'Diagnosis', 'Radius Mean', 'Texture Mean', 'Perimeter Mean',
'Area Mean', 'Smoothness Mean', 'Compactness Mean', 'Concavity Mean',
'Concave Points Mean', 'Symmetry Mean', 'Fractal Dimension Mean',
'Radius SE', 'Texture SE', 'Perimeter SE', 'Area SE',
'Smoothness SE', 'Compactness SE', 'Concavity SE', 'Concave Points SE',
'Symmetry SE', 'Fractal Dimension SE',
'Radius Worst', 'Texture Worst', 'Perimeter Worst',
'Area Worst', 'Smoothness Worst', 'Compactness Worst',
'Concavity Worst', 'Concave Points Worst', 'Symmetry Worst',
'Fractal Dimension Worst'
]

data = pd.read_csv(url, header=None, names=columns)


X = data.iloc[:, 2:].values # Features
y = (data['Diagnosis'] == 'M').astype(int) # Labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Bagging classifier


bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
bagging.fit(X_train, y_train)

# Predict and calculate accuracy


y_pred = bagging.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Bagging Accuracy: {accuracy:.2f}")


Output:

• Boosting
Code:
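The boosting listing is missing from this copy, and the source does not say which boosting algorithm was used. What follows is a minimal sketch only, assuming AdaBoost with its default base learner and assuming scikit-learn's bundled copy of the dataset (`load_breast_cancer`) in place of the UCI file. Note that the bundled copy encodes malignant as 0 and benign as 1, the reverse of the Bagging listing; accuracy is unaffected by this.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: bundled copy of WBCD instead of the UCI URL used for Bagging
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: AdaBoost with 50 rounds; the original algorithm is unknown
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)

# Predict and calculate accuracy
boosting_accuracy = accuracy_score(y_test, boosting.predict(X_test))
print(f"Boosting Accuracy: {boosting_accuracy:.2f}")
```

Because the algorithm and settings here are assumptions, the accuracy this sketch prints need not match the figure from the original run.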

Conclusion:
In task(i), we successfully implemented a Bagging ensemble model using a Decision Tree
Classifier on the Wisconsin Breast Cancer Dataset. The results showed that the Bagging model
achieved an accuracy of 95.61%. This high accuracy indicates that the Bagging ensemble
method effectively reduced variance and improved the stability of the Decision Tree Classifier.
In task(ii), we successfully implemented a Boosting ensemble model. The result shows that
boosting achieves an accuracy of 93.86%.


Practical-10

Aim: Apply various dimension reduction methods on WBCD and evaluate
their performance.
Code:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pca = PCA (n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train_pca, y_train)

y_pred = clf.predict(X_test_pca)

accuracy = accuracy_score (y_test, y_pred)


print(f"Accuracy: {accuracy * 100:.2f}%")

Output:

Conclusion:
Applying Principal Component Analysis (PCA) on the Wisconsin Breast Cancer Dataset
reduces the feature dimensions while retaining essential variance, achieving an accuracy of
95.61%. This shows that dimensionality reduction helps in improving model efficiency
without significant loss of performance.
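The claim that the components retain essential variance can be checked directly from the fitted PCA object. The sketch below is one way to do that; the standardization step is an assumption added here (the listing above applies PCA to unscaled features), included because PCA is sensitive to feature scale and the WBCD features span very different ranges.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: standardize first so that large-valued features do not dominate
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=5).fit(scaler.transform(X_train))

# Share of the total variance captured by each of the five components
ratios = pca.explained_variance_ratio_
retained = ratios.sum()
print("Per-component share:", ratios.round(3))
print(f"Variance retained by 5 components: {retained:.2%}")
```

Reporting the retained share alongside the classifier accuracy shows how much information the 30-to-5 reduction discards, which the accuracy figure alone does not reveal.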
