ML Foram
Practical-1
TASK 1: Find and analyse mean, median and mode of given data.
Mean:
import numpy as np
data = [10, 20, 25, 23, 25, 40]
mean = np.mean(data)
print("Answer: ", mean)
Output:
Median:
import numpy as np
data = [10, 20, 25, 23, 25, 40]
median = np.median(data)
print("Answer: ", median)
Output:
Mode:
from scipy import stats
data = [10, 20, 25, 23, 25, 40]
mode = stats.mode(data)
print("Answer: ", mode)
Output:
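As a cross-check on the three results above, the same values can be computed with Python's standard-library statistics module. This is only a minimal sketch and not part of the original practical; the variable names are illustrative.

```python
import statistics

data = [10, 20, 25, 23, 25, 40]

mean_value = statistics.mean(data)      # sum of 143 divided by 6 values
median_value = statistics.median(data)  # average of the two middle values, 23 and 25
mode_value = statistics.mode(data)      # 25 is the only value that occurs twice

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```

For this data the mean (about 23.83), the median (24) and the mode (25) lie close together.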
Machine Learning 211310132057
TASK 2: Find the mean, median and mode of displacement in the Auto MPG dataset.
Mean:
import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)
mean_displacement = df['displacement'].mean()
print("Answer", mean_displacement)
Output:
Median:
import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)
median_displacement = df['displacement'].median()
print("Answer", median_displacement)
Output:
Mode:
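import pandas as pd
url="C:/Users/5FLAB2_7/Downloads/auto+mpg/auto-mpg.data"
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
'model_year', 'origin', 'name']
df = pd.read_csv(url, sep=r'\s+', header=None, names=columns)
mode_displacement = df['displacement'].mode()
print("Answer", mode_displacement)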
Output:
Practical-2
Understand structure of data using various visualization methods like
scatter plot, histogram and box plot.
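As a minimal sketch of the three plot types named above, the following draws a scatter plot, a histogram and a box plot side by side. It uses synthetic data as a stand-in for the displacement and mpg columns so that it runs without the dataset file; the generated values, the figure layout and the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for two Auto MPG columns (assumption: not the real data).
rng = np.random.default_rng(0)
displacement = rng.uniform(70, 450, size=200)
mpg = 40 - 0.06 * displacement + rng.normal(0, 2, size=200)
df = pd.DataFrame({"displacement": displacement, "mpg": mpg})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Scatter plot: relationship between two numeric variables.
axes[0].scatter(df["displacement"], df["mpg"], s=10)
axes[0].set_title("Scatter Plot of Displacement vs MPG")
axes[0].set_xlabel("Displacement")
axes[0].set_ylabel("MPG")

# Histogram: distribution of one variable; the y-axis is a count.
axes[1].hist(df["displacement"], bins=20)
axes[1].set_title("Histogram of Displacement")
axes[1].set_xlabel("Displacement")
axes[1].set_ylabel("Frequency")

# Box plot: median, quartiles and outliers; the y-axis is the variable itself.
axes[2].boxplot(df["displacement"])
axes[2].set_title("Boxplot of Displacement")
axes[2].set_ylabel("Displacement")

fig.tight_layout()
# In an interactive session, drop the Agg line and call plt.show() here.
```

The scatter plot shows how two variables move together, the histogram shows how one variable is distributed, and the box plot summarises its median, quartiles and outliers.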
GRAPH:
GRAPH:
import matplotlib.pyplot as plt
# df is the Auto MPG DataFrame loaded as in Practical-1, Task 2
plt.boxplot(df['displacement'])
plt.title('Boxplot of Displacement')
plt.ylabel('Displacement')
plt.show()
GRAPH:
Practical-3
Aim: Implement Linear regression on auto mpg dataset and evaluate its
performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                'model_year', 'origin', 'car_name']
data = pd.read_csv(url, names=column_names, sep=r'\s+', na_values='?')
data = data.dropna()
X = data[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year',
'origin']]
y = data['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()
model.fit(X_train, y_train)
coefficients = pd.Series(model.coef_, index=X.columns)
coefficients_sorted = coefficients.abs().sort_values(ascending=False)
print("Enroll: 221313132004 Feature coefficients sorted by their influence on MPG:")
print(coefficients_sorted)
Output:
Conclusion:
In this practical, we implemented linear regression on the Auto MPG dataset and ranked the
features by the absolute value of their coefficients to see which ones influence MPG most.
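The aim also asks for a performance evaluation. A minimal sketch of how a fitted linear regression could be scored on the held-out test set is given here, using mean squared error, its square root (RMSE) and R^2. Synthetic data stands in for the Auto MPG features so that the block runs offline; the generated coefficients, the noise level and the fixed random_state are assumptions, not values from the practical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the Auto MPG features (assumption).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 25 - 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # average squared error on unseen rows
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_test, y_pred)             # share of target variance explained

print(f"MSE: {mse:.3f}  RMSE: {rmse:.3f}  R^2: {r2:.3f}")
```

With the real dataset, the same three metric calls can be applied to y_test and model.predict(X_test) from the code above.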
Practical-4
Aim: Implement kNN on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris_data = pd.read_csv(url, header=None, names=column_names)
X = iris_data.iloc[:, :-1].values
y = pd.factorize(iris_data['species'])[0]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Enroll: 221313132004 Accuracy: {accuracy:.2f}")
Output:
Conclusion:
In this practical, we have successfully implemented kNN on the Iris dataset and evaluated its
performance using accuracy on a held-out test set.
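Because the train/test split above is random, the accuracy from a single split can change from run to run. One possible way to get a steadier estimate, and to compare several values of k, is k-fold cross-validation. The sketch below assumes scikit-learn's bundled copy of Iris (load_iris) in place of the UCI file, 5 folds, and an illustrative list of k values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Bundled Iris data used so the sketch runs offline (assumption).
X, y = load_iris(return_X_y=True)

scores_by_k = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: each sample is used for testing exactly once.
    scores = cross_val_score(knn, X, y, cv=5)
    scores_by_k[k] = scores.mean()
    print(f"k={k}: mean accuracy {scores.mean():.3f}")

# Pick the k with the highest mean cross-validated accuracy.
best_k = max(scores_by_k, key=scores_by_k.get)
print("Best k:", best_k)
```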
Practical-5
Aim: Implement Decision Tree on the given dataset and evaluate its
performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris = pd.read_csv(url, header=None, names=column_names)
X = iris.iloc[:, :-1]
y = pd.factorize(iris['species'])[0]
# Split the data, train the tree and predict on the held-out set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
# Output accuracy
print(f"Enroll: 221313132004 Accuracy: {accuracy * 100:.2f}%")
Output:
Conclusion:
In this practical, we have successfully implemented a decision tree on the Iris dataset using
DecisionTreeClassifier and evaluated its performance.
Practical-6
Aim: Implement SVM on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
df = pd.read_csv(url, header=None, names=column_names)
X = df.drop(columns=['species'])
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Enroll: 221313132004 Accuracy: {accuracy:.2f}")
Output:
Conclusion:
In this practical, we have successfully implemented SVM using SVC and we have also
evaluated its performance.
Practical-7
Aim: Implement ANN on the given dataset and evaluate its performance.
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
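url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration',
                'model_year', 'origin', 'car_name']
data = pd.read_csv(url, names=column_names, sep=r'\s+', na_values='?')
data = data.dropna()
# Drop 'car_name' because it's a string and not useful for prediction
data = data.drop('car_name', axis=1)
X = data.drop('mpg', axis=1)
y = data['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)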
model = Sequential()
model.add(Dense(128, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=1)
test_loss = model.evaluate(X_test, y_test)
print(f"Test Loss (Mean Squared Error): {test_loss}")
Output:
Conclusion:
In this practical, we have successfully implemented an ANN using Sequential and Dense from
tensorflow.keras and evaluated its performance using mean squared error on the test set.
Practical-8
Aim: Perform K-means clustering on the given dataset and evaluate its
performance.
Code:
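import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
iris = pd.read_csv(url, header=None, names=column_names)
X = iris.iloc[:, :-1]
y = pd.factorize(iris['species'])[0]
y_pred = KMeans(n_clusters=3, n_init=10).fit_predict(X)
# Evaluate accuracy (note: cluster numbers are arbitrary and might differ from class labels)
accuracy = accuracy_score(y, y_pred)
print(f"Accuracy: {accuracy:.2f}")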
Output:
Conclusion:
K-means clustering on the Iris dataset effectively groups the data into three clusters,
corresponding closely to the three species. Although not a supervised method, it aligns well
with actual species labels, demonstrating that the dataset's features naturally separate into
meaningful clusters.
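Since K-means assigns arbitrary cluster numbers, comparing them directly with the class labels can understate the agreement. One way to handle this, sketched below, is to map each cluster to the most common true label inside it before computing accuracy, and to also report the adjusted Rand index, which does not depend on how the clusters are numbered. The use of load_iris, the majority-label mapping and the fixed random_state are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, adjusted_rand_score

# Bundled Iris data used so the sketch runs offline (assumption).
X, y = load_iris(return_X_y=True)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X)

# Map each cluster id to the most common true label among its members.
mapped = np.empty_like(clusters)
for cluster_id in np.unique(clusters):
    members = clusters == cluster_id
    mapped[members] = np.bincount(y[members]).argmax()

acc = accuracy_score(y, mapped)
ari = adjusted_rand_score(y, clusters)  # unchanged if cluster ids are permuted

print(f"Accuracy after mapping: {acc:.2f}")
print(f"Adjusted Rand index: {ari:.2f}")
```

The true labels are used here only for evaluation; the clustering itself never sees them.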
Practical-9
Output:
• Boosting
Code:
Conclusion:
In task (i), we implemented a Bagging ensemble model using a Decision Tree Classifier on the
Wisconsin Breast Cancer Dataset. The Bagging model achieved an accuracy of 95.61%. This
high accuracy is consistent with bagging reducing the variance and improving the stability of
a single Decision Tree Classifier.
In task (ii), we implemented a Boosting ensemble model, which achieved an accuracy of
93.86%.
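A minimal sketch of how the two ensembles described above could be set up with scikit-learn follows. The boosting algorithm used in the practical is not stated, so AdaBoostClassifier is an assumption, as are the number of estimators, the 80/20 split and the random_state values. The accuracies it prints are therefore not expected to match the figures reported above exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Bagging: many full decision trees, each trained on a bootstrap sample,
# with predictions combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)
bagging_acc = accuracy_score(y_test, bagging.predict(X_test))

# Boosting (assumed here to be AdaBoost): weak learners trained in sequence,
# each one giving more weight to the samples the previous ones got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)
boosting_acc = accuracy_score(y_test, boosting.predict(X_test))

print(f"Bagging accuracy: {bagging_acc * 100:.2f}%")
print(f"Boosting accuracy: {boosting_acc * 100:.2f}%")
```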
Practical-10
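import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")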
Output:
Conclusion:
Applying Principal Component Analysis (PCA) to the Wisconsin Breast Cancer Dataset
reduces the 30 input features to 5 principal components while retaining most of the variance,
and the decision tree trained on them achieves an accuracy of 95.61%. This suggests that
dimensionality reduction can improve model efficiency without a significant loss of performance.
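To check the claim about performance, the PCA model can be compared with a baseline trained on all the original features, and the share of variance kept by the components can be read from explained_variance_ratio_. The sketch below is one possible way to do this; the baseline model and the variable names are assumptions added for illustration, and the printed accuracies may differ from the figure reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: decision tree on all 30 original features.
baseline = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# PCA is fitted on the training split only, then applied to both splits.
pca = PCA(n_components=5)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

reduced = DecisionTreeClassifier(random_state=42).fit(X_train_pca, y_train)
reduced_acc = accuracy_score(y_test, reduced.predict(X_test_pca))

# Fraction of the training-set variance kept by the 5 components.
retained = pca.explained_variance_ratio_.sum()

print(f"Baseline accuracy (30 features): {baseline_acc * 100:.2f}%")
print(f"PCA accuracy (5 components): {reduced_acc * 100:.2f}%")
print(f"Variance retained: {retained * 100:.2f}%")
```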