Lab 6

Uploaded by usernamenew2710
Copyright © All Rights Reserved

Hello World of Machine Learning

A good small project to start with on a new tool is the classification of iris flowers (i.e. the iris
dataset). It works well as a first project for several reasons:

• The attributes are numeric, so you have to figure out how to load and handle the data.
• It is a classification problem, letting you practice with a comparatively easy type of supervised
learning algorithm.
• It is a multi-class (multinomial) classification problem that may require some specialized handling.
• It has only 4 attributes and 150 rows, so it is small and easily fits into memory (and on a screen or
an A4 page).
• All four numeric attributes are measured in the same unit (centimetres) and on a similar scale, so no
special scaling or transforms are needed to get started.
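The properties listed above can be checked directly once the data is loaded. The sketch below assumes scikit-learn's bundled copy of the dataset (`load_iris`), the same source the lab script uses; it prints the row and attribute counts, the unit in each feature name, the number of rows per species, and each attribute's range.

```python
# Check the dataset properties listed above using scikit-learn's bundled copy.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Small dataset: rows x attributes
n_rows, n_attributes = iris.data.shape
print("Rows:", n_rows, "Attributes:", n_attributes)

# Each feature name carries its unit, e.g. 'sepal length (cm)'
print("Feature names:", iris.feature_names)
all_cm = all(name.endswith("(cm)") for name in iris.feature_names)
print("All attributes in centimetres:", all_cm)

# Three species -> a multi-class problem; count the rows in each class
classes, counts = np.unique(iris.target, return_counts=True)
for cls, count in zip(classes, counts):
    print(f"{iris.target_names[cls]}: {count} rows")

# Range of each attribute, to see that the scales are similar
for name, low, high in zip(iris.feature_names, iris.data.min(axis=0), iris.data.max(axis=0)):
    print(f"{name}: {low:.1f} to {high:.1f}")
```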

To do

1. Installing the Python and SciPy platform.

   pip install numpy pandas matplotlib seaborn scikit-learn

2. Loading the dataset.
3. Summarizing the dataset.
   • Dimensions of the dataset.
   • Peek at the data itself.
   • Statistical summary of all attributes.
   • Breakdown of the data by the class variable.
4. Visualizing the dataset.
   • Univariate plots to better understand each attribute.
   • Multivariate plots to better understand the relationships between attributes.
5. Evaluating some algorithms.
   • Separate out a validation dataset.
   • Set up the test harness to use 10-fold cross-validation.
   • Build multiple different models to predict species from flower measurements.
   • Select the best model.
   Test 6 different algorithms:
   o Logistic Regression (LR)
   o Linear Discriminant Analysis (LDA)
   o K-Nearest Neighbors (KNN)
   o Classification and Regression Trees (CART)
   o Gaussian Naive Bayes (NB)
   o Support Vector Machines (SVM)

6. Making some predictions.
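Step 5 holds out a validation set and then runs 10-fold cross-validation on the remaining rows. The sketch below only illustrates what that test harness does with the data, assuming the same 80/20 split and `KFold` settings as the lab script: the 150 rows become 120 training and 30 validation rows, each of the 10 folds fits on 108 rows and scores on 12, and every training row is scored exactly once.

```python
# Illustrate the Step 5 test harness: a held-out validation set plus 10-fold CV.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as a validation set (same settings as the lab script)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print("Training rows:", len(X_train), "Validation rows:", len(X_val))

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = []
scored = np.zeros(len(X_train), dtype=int)  # how often each training row is scored
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_train), start=1):
    fold_sizes.append((len(train_idx), len(test_idx)))
    scored[test_idx] += 1
    print(f"Fold {fold}: fit on {len(train_idx)} rows, score on {len(test_idx)} rows")

# Each training row lands in the scoring part of exactly one fold
print("Every training row scored once:", bool(np.all(scored == 1)))
```

`cross_val_score` performs this loop internally and returns one accuracy per fold; the lab script reports their mean.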

# Step 1: Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# Step 2: Load the Iris dataset
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target  # 0 = setosa, 1 = versicolor, 2 = virginica

# Step 3: Summarize the dataset

# Dimensions (rows, columns)
print("Dataset Dimensions:", data.shape)

# Peek at the data
print("\nFirst 5 Rows of the Dataset:")
print(data.head())

# Statistical summary of all attributes
print("\nStatistical Summary:")
print(data.describe())

# Breakdown of the data by class variable
print("\nClass Distribution:")
print(data['target'].value_counts())

# Step 4: Visualize the dataset

# Univariate plots: one histogram per measurement (the class label is left out)
data.drop(columns='target').hist(figsize=(10, 8))
plt.suptitle('Histogram of Each Feature')
plt.show()

# Multivariate plots: pairwise scatter plots coloured by class
sns.pairplot(data, hue='target', markers=["o", "s", "D"])
plt.suptitle('Pairplot of Features', y=1.02)
plt.show()

# Step 5: Evaluate algorithms

# Separate the measurements (X) from the class label (y)
X = data.drop('target', axis=1)
y = data['target']

# Hold out 20% of the rows as a validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 10-fold cross-validation on the training rows
kfold = KFold(n_splits=10, random_state=42, shuffle=True)
models = {
    'Logistic Regression (LR)': LogisticRegression(max_iter=200),
    'Linear Discriminant Analysis (LDA)': LinearDiscriminantAnalysis(),
    'K-Nearest Neighbors (KNN)': KNeighborsClassifier(),
    'Classification and Regression Trees (CART)': DecisionTreeClassifier(),
    'Gaussian Naive Bayes (NB)': GaussianNB(),
    'Support Vector Machines (SVM)': SVC()
}
results = {}
print("\nCross-Validation Results:")
for name, model in models.items():
    cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    results[name] = cv_results.mean()
    print(f"{name}: {cv_results.mean():.4f}")

# Step 6: Make predictions

# Pick the model with the highest mean cross-validation accuracy
best_model_name = max(results, key=results.get)
best_model = models[best_model_name]
print(f"\nBest Model: {best_model_name}")

# Refit it on the full training set and evaluate it on the held-out validation set
best_model.fit(X_train, y_train)
predictions = best_model.predict(X_test)
print("\nAccuracy Score on Test Set:")
print(accuracy_score(y_test, predictions))

print("\nClassification Report:")
print(classification_report(y_test, predictions))
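Once a model has been fitted, it can label flowers it has never seen. The sketch below is illustrative only: it uses `KNeighborsClassifier` as a stand-in for whichever model Step 6 selects (the winner may differ on your run), reuses the lab's 80/20 split, and classifies two hypothetical flowers whose measurements are made up for the example.

```python
# Predict the species of new, unlabeled flowers with a fitted classifier.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Stand-in for the best model from Step 6 (the lab selects it by CV accuracy)
model = KNeighborsClassifier()
model.fit(X_train, y_train)

# Hypothetical measurements in cm: sepal length, sepal width, petal length, petal width
new_flowers = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.5, 3.0, 5.5, 2.0],
])
predicted = model.predict(new_flowers)
species = iris.target_names[predicted]  # map class numbers back to species names
for measurements, name in zip(new_flowers, species):
    print(measurements, "->", name)
```

The input rows must list the four measurements in the same order as `iris.feature_names`, because the model only sees column positions.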
