
MACHINE LEARNING UNIT 3

Classification Algorithm in Machine Learning

As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and Classification
algorithms. Regression algorithms predict the output for continuous values, but to predict categorical values, we need
Classification algorithms.

What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations
on the basis of training data. In Classification, a program learns from the given dataset or observations and then
classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or
dog, etc. Classes can also be called targets, labels, or categories.

Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue", "fruit or
animal", etc. Since the Classification algorithm is a Supervised Learning technique, it takes labeled input data,
which means each input comes with a corresponding output.

In a classification algorithm, a discrete output function (y) is mapped to the input variable (x):

y = f(x), where y = categorical output

The best example of an ML classification algorithm is an Email Spam Detector. The main goal of the Classification
algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for
categorical data. Classification can be pictured as two classes, Class A and Class B, where the points within each class
have features similar to one another and dissimilar to the points of the other class.

The algorithm which implements classification on a dataset is known as a classifier. There are two types of
classification:
o Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class Classifier.
Examples: Classification of types of crops, classification of types of music.

Learners in Classification Problems:

In classification problems, there are two types of learners:


1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In
the lazy learner's case, classification is done on the basis of the most related data stored in the training dataset. It
takes less time in training but more time for predictions.
Examples: K-NN algorithm, Case-based reasoning.
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a
test dataset. Opposite to lazy learners, an eager learner takes more time in learning and less time in
prediction. Examples: Decision Trees, Naïve Bayes, ANN.

Types of ML Classification Algorithms:

Classification algorithms can mainly be divided into two categories (a brief code sketch follows this list):
o Linear Models
   o Logistic Regression
   o Support Vector Machines
o Non-linear Models
   o K-Nearest Neighbours
   o Kernel SVM
   o Naïve Bayes
   o Decision Tree Classification
   o Random Forest Classification
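
To make the list above concrete, here is a minimal sketch (not part of the original notes) of training one linear and one non-linear classifier with scikit-learn; the synthetic dataset and parameter values are illustrative assumptions only.

# Hypothetical sketch: one linear and one non-linear classifier on a synthetic toy dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Two-class toy data, standing in for "Class A" and "Class B"
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

linear_clf = LogisticRegression().fit(X_train, y_train)                         # linear model
nonlinear_clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)  # non-linear model

print("Logistic Regression accuracy:", linear_clf.score(X_test, y_test))
print("Random Forest accuracy:", nonlinear_clf.score(X_test, y_test))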

Evaluating a Classification model:

Once our model is complete, it is necessary to evaluate its performance, whether it is a Classification or a Regression
model. For evaluating a Classification model, we have the following ways:

1. Log Loss or Cross-Entropy Loss:


o It is used for evaluating the performance of a classifier whose output is a probability value between 0 and 1.
o For a good binary Classification model, the value of log loss should be near 0.
o The value of log loss increases if the predicted value deviates from the actual value.
o A lower log loss represents a higher accuracy of the model.
o For binary classification, cross-entropy can be calculated as (see the sketch below):

   -(y log(p) + (1 - y) log(1 - p))

Where y = actual output and p = predicted probability.
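
As a brief illustration (assuming scikit-learn; the probabilities below are made up), log loss can be computed either directly from the formula above or with sklearn.metrics.log_loss:

# Hypothetical sketch: binary cross-entropy computed by hand and with scikit-learn.
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])            # actual outputs y
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.1])  # predicted probabilities p

# -(y log(p) + (1 - y) log(1 - p)), averaged over all samples
manual = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(manual)                    # close to 0 for a good model
print(log_loss(y_true, y_prob))  # same value from scikit-learn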

2. Confusion Matrix:
o The confusion matrix provides a matrix/table as output and describes the performance of the model.
o It is also known as the error matrix.
o The matrix presents the prediction results in summarized form, giving the total numbers of correct and incorrect
predictions. The matrix looks like the table below (a code sketch follows):

                        Actual Positive    Actual Negative
   Predicted Positive   True Positive      False Positive
   Predicted Negative   False Negative     True Negative
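
A small illustrative sketch, assuming scikit-learn (note that scikit-learn's convention puts actual classes on the rows and predicted classes on the columns, the transpose of the table above):

# Hypothetical sketch: computing a confusion matrix for made-up binary predictions.
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_actual, y_predicted)  # rows = actual, columns = predicted
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TP:", tp, "FP:", fp, "FN:", fn, "TN:", tn)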

3. AUC-ROC curve:
o ROC stands for Receiver Operating Characteristic curve and AUC stands for Area Under the Curve.
o It is a graph that shows the performance of the classification model at different thresholds.
o To visualize the overall performance of a classification model, we use the AUC-ROC curve.
o The ROC curve is plotted with TPR against FPR, where TPR (True Positive Rate) is on the Y-axis and FPR (False Positive
Rate) is on the X-axis (see the sketch below).
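
As a brief illustration (assuming scikit-learn; the scores below are fabricated for demonstration), the TPR/FPR points and the AUC can be obtained as follows:

# Hypothetical sketch: ROC points (FPR, TPR) at different thresholds and the AUC score.
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # FPR on X-axis, TPR on Y-axis
print("AUC:", roc_auc_score(y_true, y_score))      # 1.0 = perfect, 0.5 = random guessing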

Use cases of Classification Algorithms:

Classification algorithms can be used in different places. Below are some popular use cases of Classification Algorithms:
o Email Spam Detection
o Speech Recognition
o Identification of cancer tumor cells
o Drugs Classification
o Biometric Identification, etc.

Decision Tree Regression and Classification

Decision Tree:

A decision tree is a simple diagram that shows different choices and their possible results, helping you make decisions
easily. This section covers what decision trees are, how they work, their advantages and disadvantages, and their
applications.

Understanding Decision Tree:

A decision tree is a graphical representation of different options for solving a problem and shows how different factors
are related. It has a hierarchical tree structure that starts with one main question at the top, called the root node, which
further branches out into different possible outcomes, where:

• Root Node: the starting point that represents the entire dataset.
• Branches: the lines that connect nodes; they show the flow from one decision to another.
• Internal Nodes: points where decisions are made based on the input features.
• Leaf Nodes: the terminal nodes at the end of branches that represent final outcomes or predictions.

(Figure: Decision Tree Structure)

They also support decision-making by visualizing outcomes. You can quickly evaluate and compare the “branches” to
determine which course of action is best for you.

Now, let's take an example to understand the decision tree. Imagine you want to decide whether to drink coffee based
on the time of day and how tired you feel. First the tree checks the time of day: if it's morning, it asks whether you are
tired. If you're tired, the tree suggests drinking coffee; if not, it says there's no need. Similarly, in the afternoon the tree
again asks if you are tired: if you are, it recommends drinking coffee; if not, it concludes no coffee is needed.
Classification of Decision Tree:

We have mainly two types of decision trees, based on the nature of the target variable: classification trees and
regression trees.
• Classification trees: They are designed to predict categorical outcomes, meaning they classify data into different
classes. For example, they can determine whether an email is "spam" or "not spam" based on various features of the email.
• Regression trees: These are used when the target variable is continuous. They predict numerical values rather
than categories. For example, a regression tree can estimate the price of a house based on its size, location,
and other features.

How Decision Trees Work?

A decision tree's working starts with a main question known as the root node. This question is derived from the features
of the dataset and serves as the starting point for decision-making.

From the root node, the tree asks a series of yes/no questions. Each question is designed to split the data into subsets
based on specific attributes. For example, if the first question is "Is it raining?", the answer determines which branch
of the tree to follow. Depending on the response to each question, you follow different branches: if your answer is
"Yes," you might proceed down one path; if "No," you take another.

This branching continues through a sequence of decisions. As you follow each branch, you get more questions that
break the data into smaller groups. This step-by-step process continues until there are no more helpful questions to ask.
You reach the end of a branch, where you find the final outcome or decision. It could be a classification (like "spam"
or "not spam") or a prediction (such as an estimated price).
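
A minimal sketch of this idea, assuming scikit-learn and a made-up coffee/tiredness dataset similar to the example above:

# Hypothetical sketch: a small classification tree on invented yes/no data.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [is_morning, is_tired] (1 = yes, 0 = no); label: 1 = drink coffee, 0 = no coffee
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Show the learned questions (root and internal nodes) and outcomes (leaf nodes)
print(export_text(tree, feature_names=["is_morning", "is_tired"]))
print(tree.predict([[1, 1]]))  # follow the branches for a new observation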

Advantages of Decision Trees:

• Simplicity and Interpretability: Decision trees are straightforward and easy to understand. You can visualize
them like a flowchart which makes it simple to see how decisions are made.
• Versatility: They can be used for different types of tasks and work well for both classification and regression.
• No Need for Feature Scaling: They don't require you to normalize or scale your data.
• Handles Non-linear Relationships: They are capable of capturing non-linear relationships between features and
target variables.
Disadvantages of Decision Trees:

• Overfitting: Overfitting occurs when a decision tree captures noise and details in the training data and therefore
performs poorly on new data.
• Instability: Instability means that the model can be unreliable; slight variations in the input can lead to significant
differences in predictions.
• Bias towards Features with More Levels: Decision trees can become biased towards features with many
categories, focusing too much on them during decision-making. This can cause the model to miss other
important features, leading to less accurate predictions.

Applications of Decision Trees:

• Loan Approval in Banking: A bank needs to decide whether to approve a loan application based on customer
profiles.
o Input features include income, credit score, employment status, and loan history.
o The decision tree predicts loan approval or rejection, helping the bank make quick and reliable
decisions.
• Medical Diagnosis: A healthcare provider wants to predict whether a patient has diabetes based on clinical
test results.
o Features like glucose levels, BMI, and blood pressure are used to build a decision tree.
o The tree classifies patients as diabetic or non-diabetic, assisting doctors in diagnosis.
• Predicting Exam Results in Education: A school wants to predict whether a student will pass or fail based on
study habits.
o Data includes attendance, time spent studying, and previous grades.
o The decision tree identifies at-risk students, allowing teachers to provide additional support.

Decision Tree Regression:

When it comes to predicting continuous values, Decision Tree Regression is a powerful and intuitive machine learning
technique. Unlike traditional linear regression, which assumes a straight-line relationship between input features and
the target variable, Decision Tree Regression is a non-linear regression method that can handle complex datasets
with intricate patterns. It uses a tree-like model to make predictions, making it both flexible and easy to interpret.

In this article, we’ll explore what Decision Tree Regression is, how it works, and why it’s a popular choice for regression
tasks. We’ll also walk through its implementation using Python’s scikit-learn library, making it easy for you to get
started with this versatile algorithm.

Working of Decision Tree Regression:

Decision Tree Regression predicts continuous values. It does this by splitting the data into smaller subsets based on
decision rules derived from the input features. Each split is made to minimize the error in predicting the target variable.
At each leaf node of the tree, the model predicts a continuous value, which is typically the average of the target values in
that node.
The branches or edges represent the result of a node, and the nodes are either:
1. Conditions (Decision Nodes): The diamond-shaped nodes represent conditions or decision points. They ask a
question about the input data (e.g., "Is feature X greater than 5?").
2. Results (Leaf Nodes): The rectangular boxes represent the final outcomes or results. In regression, the leaf node
contains the predicted continuous value.
(Figure: Workflow of Decision Tree Regression)

Branches connect nodes and represent the outcome of a decision. For example, if the answer to a condition is “Yes,”
you follow one branch; if “No,” you follow another.

For example, suppose we want to predict house prices based on factors like size, location, and age. A Decision Tree Regressor
can split the data on these features, such as checking the location first, then the size, and finally the age. This way
it can accurately predict the price by considering the most impactful factors first, making it useful and easy to interpret.
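
A minimal sketch, assuming scikit-learn and invented house data, of the kind of Decision Tree Regression described above:

# Hypothetical sketch: a regression tree predicting price from size, location score and age.
from sklearn.tree import DecisionTreeRegressor

# Illustrative training data: [size_sqft, location_score, age_years] -> price
X = [[1500, 8, 10], [2000, 9, 5], [1200, 5, 30], [1800, 7, 15], [2500, 9, 2], [1000, 4, 40]]
y = [300000, 450000, 180000, 320000, 520000, 150000]

reg = DecisionTreeRegressor(max_depth=3).fit(X, y)

# The leaf reached by a new house holds the average price of the training houses in that leaf
print(reg.predict([[1600, 8, 12]]))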

Random Forest Algorithm in Machine Learning

Random Forest classification:

A Random Forest is a collection of decision trees that work together to make predictions. In this article, we'll explain
how the Random Forest algorithm works and how to use it.

Understanding Intuition for Random Forest Algorithm:

Random Forest is a powerful tree-based learning technique in Machine Learning: many decision trees each make a
prediction, and we then take a vote over all the trees to produce the final prediction. Random Forests are widely used
for classification and regression tasks.
• It is a type of classifier that uses many decision trees to make predictions.
• It takes different random parts of the dataset to train each tree and then combines the results by averaging
them. This approach helps improve the accuracy of predictions. Random Forest is based on ensemble learning.

Imagine asking a group of friends for advice on where to go for vacation. Each friend gives their recommendation based
on their unique perspective and preferences (decision trees trained on different subsets of data). You then make your
final decision by considering the majority opinion or averaging their suggestions (ensemble prediction).
The process starts with a dataset of rows (observations) and their corresponding class labels.
• Multiple decision trees are created from the training data. Each tree is trained on a random subset of
the data (sampled with replacement) and a random subset of features. This process is known as bagging or bootstrap
aggregating.
• Each Decision Tree in the ensemble learns to make predictions independently.
• When presented with a new, unseen instance, each Decision Tree in the ensemble makes a prediction.

The final prediction is made by combining the predictions of all the Decision Trees. This is typically done through a
majority vote (for classification) or averaging (for regression).

Key Features of Random Forest:

• Handles Missing Data: Automatically handles missing values during training, eliminating the need for manual
imputation.
• Feature Importance: The algorithm ranks features based on their importance in making predictions, offering valuable
insights for feature selection and interpretability.
• Scales Well with Large and Complex Data: It handles large, high-dimensional datasets without significant performance
degradation.
• Versatile: The algorithm can be applied to both classification tasks (e.g., predicting categories) and regression
tasks (e.g., predicting continuous values).

How Random Forest Algorithm Works?

The Random Forest algorithm works in several steps (a code sketch follows the list):


• Random Forest builds multiple decision trees using random samples of the data. Each tree is trained on a
different subset of the data, which makes each tree unique.
• When creating each tree, the algorithm randomly selects a subset of features or variables to split the data,
rather than using all available features at once. This adds diversity to the trees.
• Each decision tree in the forest makes a prediction based on the data it was trained on. When making the final
prediction, the random forest combines the results from all the trees.
o For classification tasks, the final prediction is decided by a majority vote: the category predicted by
most trees is the final prediction.
o For regression tasks, the final prediction is the average of the predictions from all the trees.
• The randomness in data samples and feature selection helps prevent the model from overfitting, making
the predictions more accurate and reliable.
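
The steps above can be sketched with scikit-learn's RandomForestClassifier (a hedged example; the synthetic dataset and parameter values are assumptions):

# Hypothetical sketch: bootstrap-sampled trees, random feature subsets, majority-vote prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees, each built on a bootstrap sample
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
).fit(X_train, y_train)

print("Accuracy:", forest.score(X_test, y_test))            # majority vote over all trees
print("Feature importances:", forest.feature_importances_)  # ranking mentioned under Key Features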

Assumptions of Random Forest:

• Each tree makes its own decisions: Every tree in the forest makes its own predictions without relying on others.
• Random parts of the data are used: Each tree is built using random samples and features to reduce mistakes.
• Enough data is needed: Sufficient data ensures the trees are different and learn unique patterns.
• Different predictions improve accuracy: Combining the predictions from different trees leads to a more
accurate final result.

Advantages of Random Forest:

• Random Forest provides very accurate predictions even with large datasets.
• Random Forest can handle missing data well without compromising accuracy.
• It doesn't require normalization or standardization of the dataset.
• Combining multiple decision trees reduces the risk of overfitting the model.

Limitations of Random Forest:

• It can be computationally expensive especially with a large number of trees.


• It’s harder to interpret the model compared to simpler models like decision trees.

Random Forest Regression:

A random forest is an ensemble learning method that combines the predictions from multiple decision trees to
produce a more accurate and stable prediction. It is a type of supervised learning algorithm that can be used for both
classification and regression tasks.

In regression tasks, we can use the Random Forest Regression technique to predict numerical values. It predicts
continuous values by averaging the results of multiple decision trees.

Working of Random Forest Regression:

Random Forest Regression works by creating multiple decision trees, each trained on a random subset of the data.
The process begins with bootstrap sampling, where random rows of data are selected with replacement to form
different training datasets for each tree. After this, we do feature sampling, where only a random subset of features is
used to build each tree, ensuring diversity in the models.

After the trees are trained, each tree makes a prediction, and the final prediction for regression tasks is the average of all
the individual tree predictions; this process is called aggregation.
(Figure: Random Forest Regression Model Working)

This approach is beneficial because individual decision trees may have high variance and are prone to overfitting,
especially with complex data. However, by averaging the predictions from multiple decision trees, Random Forest
minimizes this variance, leading to more accurate and stable predictions and hence improving the generalization of the model.
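
A minimal sketch of Random Forest Regression with scikit-learn (the synthetic data and settings are assumptions for illustration):

# Hypothetical sketch: averaging (aggregating) tree predictions for a regression target.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

reg = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)

# The prediction for each test point is the average of the individual trees' predictions
print("R^2 on test data:", reg.score(X_test, y_test))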

Applications of Random Forest Regression:

Random Forest Regression is applied to a wide range of real-world problems, including:
• Predicting continuous numerical values: Predicting house prices, stock prices or customer lifetime value.
• Identifying risk factors: Detecting risk factors for diseases, financial crises or other negative events.
• Handling high-dimensional data: Analyzing datasets with a large number of input features.
• Capturing complex relationships: Modeling complex relationships between input features and the target
variable.

Advantages of Random Forest Regression:

• Handles Non-Linearity: It can capture complex, non-linear relationships in the data that other models might
miss.
• Reduces Overfitting: By combining multiple decision trees and averaging predictions it reduces the risk of
overfitting compared to a single decision tree.
• Robust to Outliers: Random Forest is less sensitive to outliers as it aggregates the predictions from multiple
trees.
• Works Well with Large Datasets: It can efficiently handle large datasets and high-dimensional data without a
significant loss in performance.
• Handles Missing Data: Random Forest can handle missing values by using surrogate splits and maintaining
high accuracy even with incomplete data.
• No Need for Feature Scaling: Unlike many other algorithms Random Forest does not require normalization or
scaling of the data.
Disadvantages of Random Forest Regression:

• Complexity: It can be computationally expensive and slow to train especially with a large number of trees and
high-dimensional data.
• Less Interpretability: Since it uses many trees it can be harder to interpret compared to simpler models like
linear regression or decision trees.
• Memory Intensive: Storing multiple decision trees for large datasets requires significant memory resources.
• Overfitting on Noisy Data: While Random Forest reduces overfitting, it can still overfit if the data is highly noisy,
especially with a large number of trees.
• Sensitive to Imbalanced Data: It may perform poorly if the dataset is highly imbalanced, i.e., one class is
significantly more frequent than another.
• Difficulty in Real-Time Predictions: Due to its complexity it may not be suitable for real-time predictions
especially with a large number of trees.

Support Vector Machine Regression and Classification

(svm classification – check notes)

Support Vector Machine Regression

What is a Support Vector Machine (SVM)?

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression
tasks. SVM works by finding a hyperplane in a high-dimensional space that best separates data into different classes.
It aims to maximize the margin (the distance between the hyperplane and the nearest data points of each class) while
minimizing classification errors. SVM can handle both linear and non-linear classification problems by using various
kernel functions. It’s widely used in tasks such as image classification, text categorization, and more.

So what exactly is Support Vector Machine (SVM)? We’ll start by understanding SVM in simple terms. Let’s say we have
a plot of two label classes as shown in the figure below:

Can you decide what the separating line will be? You might have come up with this:
The line fairly separates the classes. This is what SVM essentially does: simple class separation. Now, what if the data
was like this:

Here, we don’t have a simple line separating these two classes. So we’ll extend our dimension and introduce a new
dimension along the z-axis. We can now separate these two classes:

When we transform this line back to the original plane, it maps to the circular boundary as I’ve shown here:

This is exactly what a Support Vector Machine does! It tries to find a line/hyperplane (in multidimensional
space) that separates the two classes, and then classifies a new point depending on whether it lies on the positive
or negative side of that hyperplane.
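
To illustrate the idea of moving to a higher dimension (the kernel trick), here is a hedged sketch with scikit-learn on synthetic concentric-circle data; the dataset and kernel choices are assumptions, not from the notes:

# Hypothetical sketch: a linear kernel struggles on circular data, an RBF kernel separates it.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes arranged as concentric circles -- no single straight line separates them
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)  # implicitly works in a higher-dimensional space

print("Linear kernel accuracy:", linear_svm.score(X, y))
print("RBF kernel accuracy:", rbf_svm.score(X, y))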

Hyperparameters of the Support Vector Machine (SVM) Algorithm:

There are a few important parameters of SVM that you should be aware of before proceeding further:
• Kernel: A kernel helps us find a hyperplane in the higher dimensional space without increasing the
computational cost. Usually, the computational cost will increase if the dimension of the data increases. This
increase in dimension is required when we are unable to find a separating hyperplane in a given dimension
and are required to move in a higher dimension:

• Hyperplane: This is basically a separating line between two data classes in SVM. But in Support Vector
Regression, this is the line that will be used to predict the continuous output
• Decision Boundary: A decision boundary can be thought of as a demarcation line (for simplification) on one
side of which lie positive examples and on the other side lie the negative examples. On this very line, the
examples may be classified as either positive or negative. This same concept of SVM will be applied in Support
Vector Regression as well.

Introduction to Support Vector Regression (SVR):

Support Vector Regression (SVR) is a machine learning algorithm used for regression analysis. SVR Model in Machine
Learning aims to find a function that approximates the relationship between the input variables and a continuous
target variable while minimizing the prediction error.

Unlike Support Vector Machines (SVMs) used for classification tasks, SVR Model seeks a hyperplane that best fits the
data points in a continuous space. This is achieved by mapping the input variables to a high-dimensional feature space
and finding the hyperplane that maximizes the margin (distance) between the hyperplane and the closest data points,
while also minimizing the prediction error.

SVR Model can handle non-linear relationships between the input and target variables by using a kernel function to
map the data to a higher-dimensional space. This makes it a powerful tool for regression tasks where complex
relationships may exist.

Support Vector Regression (SVR) uses the same principle as SVM but for regression problems. Let’s spend a few minutes
understanding the idea behind SVR in Machine Learning.

The Idea Behind Support Vector Regression:

The problem of regression is to find a function that approximates mapping from an input domain to real numbers
based on a training sample. So, let’s dive deep and understand how SVR actually works.
Consider these two red lines as the decision boundary and the green line as the hyperplane. When we move on with
Support Vector Regression (SVR) in Machine Learning, our objective is to consider the points within the decision
boundary line. Our best fit line is the hyperplane with the maximum number of points.

The first thing that we'll understand is what the decision boundary is (the red lines mentioned above). Consider these lines
as being at some distance, say 'a', from the hyperplane. So, these are the lines that we draw at distance '+a' and '-a' from
the hyperplane. This 'a' is what is usually referred to as epsilon.

Assuming that the equation of the hyperplane is as follows:

Y = wx + b (equation of the hyperplane)

Then the equations of the decision boundaries become:

wx + b = +a

wx + b = -a

Thus, any hyperplane that satisfies our SVM regression model should satisfy:

-a < Y - (wx + b) < +a

Our main aim here is to decide on a decision boundary at a distance 'a' from the original hyperplane such that the data points
closest to the hyperplane, the support vectors, lie within that boundary line.

Hence, we will take only those points within the decision boundary that have the least error rate, or lie within the
margin of tolerance. This gives us a better-fitting model.
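
A minimal sketch of SVR, assuming scikit-learn; the epsilon parameter plays the role of the distance 'a' (the margin of tolerance) described above, and the toy data is invented for illustration:

# Hypothetical sketch: Support Vector Regression with an epsilon tube around the hyperplane.
import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data: y roughly equals 2x + 3 plus noise
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 2.0 * X.ravel() + 3.0 + np.random.default_rng(0).normal(0.0, 1.0, 100)

# epsilon sets the width of the tube: points inside it contribute no error to the fit
svr = SVR(kernel="linear", epsilon=0.5, C=1.0).fit(X, y)

print(svr.predict([[4.0], [7.5]]))  # predictions close to 2x + 3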

Support Vector Regression (SVR) extends the principles of Support Vector Machines (SVM) to regression problems,
offering a powerful tool for predicting continuous outputs. By leveraging various kernels such as quadratic, radial basis
function, and sigmoid, SVR Model can handle complex and non-linear relationships in the data. Through this tutorial,
we’ve explored the essential hyperparameters, implemented SVR in Python, and applied it to real-world datasets,
demonstrating its versatility in artificial intelligence applications. Whether dealing with training samples in finance,
engineering, or healthcare, SVR Model provides a robust approach to model continuous data effectively, enhancing
the accuracy and reliability of predictive analytics.
