Steps of the classification algorithm in supervised learning


In supervised learning, classification algorithms aim to predict the categorical label of an input based
on its features. The classification process typically involves the following steps:
1. Understanding the problem
2. Data Collection
3. Data Preprocessing
4. Split Data
5. Choose a Model
6. Model Training
7. Model Evaluation
8. Hyperparameter Tuning
9. Model Validation
10. Deployment
11. Monitoring and Maintenance
1. Understanding the problem

• Before getting started with classification, it is important to understand the problem you are
trying to solve.

• What class labels are you trying to predict?


• What is the relationship between the input data and the class labels?

• Suppose we have to predict whether a patient has a certain disease on the basis of
independent variables called features. There are only two possible outcomes:

• The patient has the disease, which means “True”

• The patient does not have the disease, which means “False”

• This is a binary classification problem.


2. Data Collection

• Gather a dataset containing samples with features and their corresponding labels.
Ensure that the dataset represents the problem you're trying to solve.

• During this step, pre-collected datasets are often obtained from sources such as
Kaggle, UCI, and other repositories. These platforms provide a wide range of
datasets for various machine-learning tasks, facilitating research,
experimentation, and model development.
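As a rough illustration of this step, the sketch below loads a pre-collected dataset and inspects its size and labels. It assumes scikit-learn is installed and uses its bundled breast cancer dataset as a stand-in for the disease example; that dataset choice is an assumption made here, not one named in the slides.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load a bundled, pre-collected binary classification dataset
# (assumed here as a stand-in for the "patient has a disease" example).
data = load_breast_cancer()
X, y = data.data, data.target

# Basic checks: how many samples and features, and which labels exist.
print("Samples, features:", X.shape)
print("Class names:", list(data.target_names))
print("Samples per class:", np.bincount(y))
```

Checking the number of samples per class at this stage shows early whether the classes are imbalanced, which affects how the later evaluation metrics should be read.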
3. Data Preprocessing:

Handle missing values: Either remove the samples with missing values or impute them
using techniques like mean, median, or interpolation.

Encode categorical variables: Convert categorical variables into numerical form using
techniques like one-hot encoding or label encoding.

Feature scaling: Normalize or standardize numerical features so that they lie on
comparable scales and features with large numeric ranges do not dominate the others.
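The three preprocessing operations above can be sketched with scikit-learn as follows. The toy arrays and their column meanings (age, blood pressure, smoker status) are hypothetical and only serve the illustration; mean imputation is one of several strategies mentioned above.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns (age, blood pressure),
# with one missing blood-pressure value.
numeric = np.array([[25.0, 120.0],
                    [40.0, np.nan],
                    [55.0, 140.0]])
# Hypothetical categorical column (smoker status).
categorical = np.array([["yes"], ["no"], ["yes"]])

# 1. Handle missing values: replace NaN with the column mean.
imputer = SimpleImputer(strategy="mean")
numeric_filled = imputer.fit_transform(numeric)

# 2. Encode categorical variables: one column per category (one-hot).
encoder = OneHotEncoder()
categorical_encoded = encoder.fit_transform(categorical).toarray()

# 3. Feature scaling: standardize to zero mean and unit variance.
scaler = StandardScaler()
numeric_scaled = scaler.fit_transform(numeric_filled)

# Combine into a single feature matrix for the model.
features = np.hstack([numeric_scaled, categorical_encoded])
print(features.shape)
```

In a real project the imputer, encoder, and scaler would normally be fitted on the training set only and then applied to the testing set, so that no information from the test data leaks into training.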
4. Split Data:

Divide the dataset into training and testing sets. The training set is used to train the
model, while the testing set evaluates its performance.
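A minimal sketch of the split, assuming scikit-learn and its bundled iris dataset. The `stratify=y` argument is an optional addition made here so that both sets keep the same class proportions; the 80/20 ratio is an assumed, commonly used choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing; stratify keeps the
# class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

print("Training set:", X_train.shape, "Testing set:", X_test.shape)
print("Class counts (train):", np.bincount(y_train))
print("Class counts (test):", np.bincount(y_test))
```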
5. Choose a Model:

• Select an appropriate classification algorithm based on the nature of the
problem, dataset size, and other factors. Common algorithms include:
1. Logistic Regression
2. Decision Trees
3. Random Forest
4. Support Vector Machines (SVM)
5. k-Nearest Neighbors (kNN)
6. Neural Networks
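One possible way to compare candidates is to fit several of the algorithms listed above on the same data and score each on a held-out validation set. The sketch below assumes scikit-learn, the iris dataset, and mostly default settings; it is an illustration of the idea, not a recommended selection procedure.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out a validation set used only for comparing candidate models.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
}

# Fit each candidate and record its accuracy on the validation set.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_val, y_val)
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy is only one criterion; training time, interpretability, and dataset size also matter when choosing among these models.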
6. Model Training:

Use the training data to train the selected model. The model learns the patterns and
relationships between features and labels during training.
7. Model Evaluation:

• Evaluate the trained model's performance on the testing set to assess its
generalization ability.

• Common evaluation metrics for classification include:

1. Accuracy
2. Precision
3. Recall
4. F1-score
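For a binary problem, these four metrics can be computed directly from the counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The function name and the counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts.

    Assumes every denominator is non-zero (at least one predicted
    positive and at least one actual positive).
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of correct predictions
    precision = tp / (tp + fp)                   # correct among predicted positives
    recall = tp / (tp + fn)                      # found among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 patients.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85/100 = 0.85, precision is 40/50 = 0.80, recall is 40/45 ≈ 0.8889, and F1 is 80/95 ≈ 0.8421.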
8. Hyperparameter Tuning:

Fine-tune the model's hyperparameters to improve its performance. This can be done
using techniques like grid search, random search, or Bayesian optimization.
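Grid search, the first technique mentioned above, can be sketched with scikit-learn's `GridSearchCV`. The parameter grid below is an assumed example for a decision tree, not a recommended set of values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Assumed example grid: every combination (4 x 3 = 12) is tried.
param_grid = {
    "max_depth": [2, 3, 4, 5],
    "min_samples_leaf": [1, 2, 5],
}

# Each combination is scored with 5-fold cross-validation on the training set.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
# The best model is refitted on the full training set and can be scored directly.
print("Accuracy on the testing set:", search.score(X_test, y_test))
```

The search uses only the training data, so the testing set remains an unbiased check of the tuned model.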
9. Model Validation:

• Validate the final model using techniques like cross-validation to ensure its
robustness and generalization ability.
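A minimal sketch of k-fold cross-validation with scikit-learn's `cross_val_score`, assuming the iris dataset and 5 folds. The model is trained and scored five times, each time holding out a different fifth of the data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 5-fold cross-validation: one accuracy score per held-out fold.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())
```

A small standard deviation across folds suggests the model's performance does not depend heavily on which samples it was trained on.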
10. Deployment:

Once satisfied with the model's performance, deploy it into production to make
predictions on new, unseen data.
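Deployment details vary widely between systems. As one simple illustration, the sketch below serializes a trained model with Python's standard `pickle` module and restores it to predict on a new sample. A real deployment would typically write the bytes to a file or model store; the sample values are an assumption made for the example.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Train a model (stands in for the final, validated model).
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Serialize the trained model to bytes; in production these bytes
# would usually be saved to disk and loaded by the serving process.
serialized = pickle.dumps(model)

# Restore the model and use it on a new sample (assumed measurements).
restored = pickle.loads(serialized)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = restored.predict(new_sample)
print("Predicted class:", prediction[0])
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.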
11. Monitoring and Maintenance:

Continuously monitor the model's performance in the production environment and
update it as necessary to maintain its effectiveness over time.
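One simple form of monitoring is to track accuracy over the most recent predictions whose true labels have become known, and to flag the model when that accuracy falls below a threshold. The class name, window size, and threshold below are hypothetical; this is a sketch of the idea rather than a complete monitoring system.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` labelled predictions."""

    def __init__(self, window=100, threshold=0.9):
        # deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        # Store True for a correct prediction, False otherwise.
        self.results.append(predicted == actual)

    def accuracy(self):
        if not self.results:
            return None  # nothing recorded yet
        return sum(self.results) / len(self.results)

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


# Hypothetical stream of (predicted, actual) label pairs from production.
monitor = AccuracyMonitor(window=5, threshold=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 1), (0, 1)]:
    monitor.record(predicted, actual)

print("Recent accuracy:", monitor.accuracy())
print("Retraining may be needed:", monitor.needs_attention())
```

Here three of the five recent predictions are correct, so the recent accuracy is 0.6, which is below the 0.8 threshold and flags the model for review.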
Example: Steps of the classification algorithm in supervised learning

# Importing the required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

# import the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# DECISION TREE CLASSIFIER
dt = DecisionTreeClassifier(random_state=0)

# train the model
dt.fit(X_train, y_train)

# make predictions
dt_pred = dt.predict(X_test)

# print the accuracy
print("Accuracy of Decision Tree Classifier: ",
      accuracy_score(y_test, dt_pred))

# print other performance metrics
print("Precision of Decision Tree Classifier: ",
      precision_score(y_test, dt_pred, average='weighted'))
print("Recall of Decision Tree Classifier: ",
      recall_score(y_test, dt_pred, average='weighted'))
print("F1-Score of Decision Tree Classifier: ",
      f1_score(y_test, dt_pred, average='weighted'))
Output:

Accuracy of Decision Tree Classifier: 0.9666666666666667

Precision of Decision Tree Classifier: 0.9714285714285714

Recall of Decision Tree Classifier: 0.9666666666666667

F1-Score of Decision Tree Classifier: 0.9672820512820512
