ml_unit1
Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computers to perform
specific tasks without explicit instructions.
Instead of being programmed to perform a task, machine learning systems learn from data,
identify patterns, and make decisions based on that data.
1. Approach to Problem-Solving
Traditional Programming:
A developer writes explicit rules and logic, and the program applies those rules to the input to produce an output.
Machine Learning:
The system is given example data, learns the underlying patterns, and builds a model that produces the output itself.
2. Data Dependency
Traditional Programming:
The program's behaviour does not depend on data; it is determined entirely by the rules the developer writes.
Machine Learning:
Machine learning relies heavily on data. The quality, quantity, and diversity
of the training data significantly affect the model's performance. A well-
trained model can generalize to new data, but it may also fail if the training
data is biased or insufficient.
3. Adaptability
Traditional Programming:
Adapting to new requirements or new patterns in the data requires a developer to modify the code manually.
Machine Learning:
Machine learning models can adapt to new data without needing explicit
reprogramming. If new patterns emerge in the data, the model can be
retrained to accommodate these changes, making it more flexible in dynamic
environments.
4. Complexity of Problems
Traditional Programming:
Best suited for well-defined problems with clear rules and logic. It works
effectively for tasks that can be easily expressed through algorithms.
Machine Learning:
Better suited for complex problems, such as image recognition or natural
language understanding, where the rules are too intricate to specify by hand
and must instead be learned from examples.
5. Nature of Output
Traditional Programming:
The output is deterministic and predictable based on the input and the
defined rules. If the input is the same, the output will always be the same.
Machine Learning:
The output can be probabilistic. For instance, a model might predict that an
email is 80% likely to be spam. The model's predictions can vary based on
the data it has seen and the inherent uncertainty in the learning process.
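To make the contrast concrete, here is a minimal sketch of such a probabilistic prediction (assuming scikit-learn; the email features and labels here are made up for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Toy data: [number of links, number of ALL-CAPS words] per email
X = [[1, 0], [8, 5], [0, 1], [10, 7], [2, 0], [9, 6]]
y = [0, 1, 0, 1, 0, 1]          # 0 = not spam, 1 = spam (made-up labels)

model = LogisticRegression().fit(X, y)

# Probabilistic output: e.g. "this email is ~80% likely to be spam"
print(model.predict_proba([[7, 4]]))
```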
6. Development Time
Traditional Programming:
Development is often faster for well-understood problems, since no data
collection or training is needed, but every change in requirements means
reprogramming the rules.
Machine Learning:
Developing a machine learning model often requires more time for data
collection, preprocessing, and model training. However, once a model is
trained, it can be reused and adapted with new data.
Types of Machine Learning
1. Supervised Learning
Supervised learning is a type of machine learning in which the machine is trained on a
labelled dataset, where each input example is paired with the correct output, and the
model learns to map inputs to outputs.
Supervised learning can be grouped into two categories of problems:
o Classification
o Regression
a) Classification:
Classification algorithms are used to solve classification problems in which the output
variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc.
b) Regression:
Regression algorithms are used to solve regression problems in which the output
variable is continuous and there is a relationship between the input and output variables.
These are used to predict continuous output variables, such as market trends, weather
prediction, etc.
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact
idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior
experience.
Disadvantages:
o It may predict the wrong output if the test data is different from the training data.
Applications of Supervised Learning:
o Image Segmentation
o Medical Diagnosis
o Fraud Detection
o Spam Detection
o Speech Recognition
2. Unsupervised Learning
Unsupervised learning is different from the supervised learning technique; there is no
need for supervision. It means, in unsupervised machine learning, the machine is trained
using the unlabeled dataset, and the machine predicts the output without any
supervision.
In unsupervised learning, the models are trained with the data that is neither
classified nor labelled.
The main aim of the unsupervised learning algorithm is to group the unsorted
dataset according to the similarities, patterns, and differences. Machines are
instructed to find the hidden patterns from the input dataset.
Unsupervised learning can be further classified into two types:
o Clustering
o Association
1) Clustering
It is a way of grouping objects into clusters such that the objects with the most
similarities remain in one group and have few or no similarities with the objects of other
groups.
Some popular clustering algorithms are:
o K-Means Clustering Algorithm
o DBSCAN Algorithm
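As an illustrative sketch of clustering (assuming scikit-learn; the points are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points forming two loose groups
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [8, 9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the two learned group centers
```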
2) Association
The main aim of this learning algorithm is to find the dependency of one data item on
another data item and map those variables accordingly so that it can generate maximum
profit.
This algorithm is mainly applied in Market Basket analysis, Web usage mining, Customer
Segmentation, etc.
o Apriori Algorithm
o FP Growth Algorithm
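As a hedged sketch of Market Basket analysis with the Apriori algorithm (assuming the third-party mlxtend library; the transactions are made up):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["milk", "bread", "butter"],
                ["bread", "butter"],
                ["milk", "bread"],
                ["milk", "butter"]]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)

itemsets = apriori(df, min_support=0.5, use_colnames=True)   # frequent itemsets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])
```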
Advantages:
o These algorithms can be used for complicated tasks compared to the supervised
ones because these algorithms work on the unlabeled dataset.
o Unsupervised algorithms are preferable for various tasks as getting the unlabeled
dataset is easier as compared to the labelled dataset.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not
labelled and the algorithms are not trained with the exact output beforehand.
o Working with unsupervised learning is more difficult, as it works with unlabelled
data that does not map to an output.
Applications of Unsupervised Learning:
o Recommendation Systems
o Anomaly Detection
3. Semi-Supervised Learning
It represents the intermediate ground between Supervised (With Labelled training data)
and Unsupervised learning (with no labelled training data) algorithms and uses the
combination of labelled and unlabeled datasets during the training period.
It is different from supervised and unsupervised learning, which are based on the
presence and absence of labels, respectively.
The main aim of semi-supervised learning is to effectively use all the available data,
rather than only the labelled data as in supervised learning.
Initially, similar data is clustered using an unsupervised learning algorithm, and the
clusters then help to label the unlabeled data. This is done because labelled data
is comparatively more expensive to acquire than unlabeled data.
Advantages:
o It is highly efficient.
Disadvantages:
o Iteration results may not be stable.
o Accuracy is low.
4. Reinforcement Learning:
In reinforcement learning, an agent learns by interacting with an environment. The agent
gets rewarded for each good action and punished for each bad action; hence the goal of
the reinforcement learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data like supervised learning, and agents
learn from their experiences only.
Applications of Reinforcement Learning:
o Video Games:
Some popular games that use RL algorithms are AlphaGO and AlphaGO Zero.
o Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed that
how to use RL in computer to automatically learn and schedule resources to wait for
di erent jobs in order to minimize average job slowdown.
o Robotics:
Robots are used in the industrial and manufacturing area, and these robots are
made more powerful with reinforcement learning.
o Text Mining
Text mining, one of the great applications of NLP, is now being implemented with
the help of reinforcement learning by Salesforce.
Advantages
o The learning model of RL is similar to how human beings learn; hence, highly
accurate results can be obtained.
Disadvantage
o Too much reinforcement learning can lead to an overload of states, which can
weaken the results.
o The curse of dimensionality limits reinforcement learning for real physical systems.
Applications:
1. Healthcare
Disease Diagnosis: ML algorithms can analyze medical images (like X-rays, MRIs,
and CT scans) to assist in diagnosing diseases such as cancer, pneumonia, and
other conditions.
2. Finance
Fraud Detection: ML algorithms analyze transaction patterns to identify potentially
fraudulent activities in real-time.
3. E-commerce
Product Recommendations: ML models analyze browsing and purchase history to
suggest products a customer is likely to buy.
4. Transportation
Self-Driving Vehicles and Traffic Prediction: ML models process sensor data to
navigate roads and predict traffic conditions.
6. Security
Face Recognition and Intrusion Detection: ML systems identify people in images
and flag unusual activity on networks.
Linear Regression
Linear regression is a type of supervised machine-learning algorithm that learns from
labelled datasets and maps the data points to an optimized linear function, which can
then be used for prediction on new datasets.
It computes the linear relationship between the dependent variable and one or more
independent features by fitting a linear equation with observed data.
It predicts the continuous output variables based on the independent input variable.
Assumptions are:
o Linearity: The relationship between the independent and dependent variables is linear.
o Independence: The observations are independent of each other.
o Homoscedasticity: The errors have constant variance across all levels of the independent variables.
o Normality: The errors are normally distributed.
Our primary objective while using linear regression is to locate the best-fit line, which
implies that the error between the predicted and actual values should be kept to a
minimum. There will be the least error in the best-fit line.
The slope of the line indicates how much the dependent variable changes for a unit change
in the independent variable(s).
A linear line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is termed a Positive linear
relationship.
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable
increases on the X-axis, then such a relationship is called a Negative linear
relationship.
The best-fit line can be written as y = a0 + a1x, where a0 is the intercept and a1 is the
slope of the line. We need to calculate the best values for a0 and a1 to find the best-fit
line, and to calculate these we use a cost function.
Cost function:
o We can use the cost function to find the accuracy of the mapping function, which
maps the input variable to the output variable.
o For Linear Regression, we use the Mean Squared Error (MSE) cost function, which
is the average of squared error occurred between the predicted values and actual
values.
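For N observations, with $y_i$ the actual value and $\hat{y}_i = a_0 + a_1 x_i$ the predicted value, this can be written as:
$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$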
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the
cost function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o This is done by randomly selecting initial values of the coefficients and then
iteratively updating the values to reach the minimum of the cost function.
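Below is a minimal sketch of this procedure for simple linear regression, assuming NumPy; the learning rate, epoch count, and toy data are illustrative choices, not prescribed values:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, epochs=5000):
    """Fit y = a0 + a1*x by minimizing MSE with batch gradient descent."""
    a0, a1 = 0.0, 0.0                      # initial coefficient values
    n = len(x)
    for _ in range(epochs):
        y_pred = a0 + a1 * x               # current predictions
        error = y_pred - y
        # Gradients of the MSE with respect to a0 and a1
        grad_a0 = (2 / n) * error.sum()
        grad_a1 = (2 / n) * (error * x).sum()
        a0 -= lr * grad_a0                 # step opposite the gradient
        a1 -= lr * grad_a1
    return a0, a1

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 7, 9, 11], dtype=float)   # generated from y = 1 + 2x
print(gradient_descent(x, y))                  # approaches (1.0, 2.0)
```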
Naive Bayes Classifier:
Naive Bayes classifiers are supervised machine learning algorithms used for classification
tasks, based on Bayes' Theorem to find probabilities.
The Naive Bayes classifier is a simple probabilistic classifier, used to build ML models
that can predict at a faster speed than other classification algorithms.
The "Bayes" part of the name refers to its basis in Bayes' Theorem.
Bayes' Theorem:
o is used to determine the probability of a hypothesis with prior knowledge. It
depends on the conditional probability.
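Bayes' Theorem is stated as:
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$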
Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the probability of a
hypothesis is true.
P(A) is Prior probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: Probability of the evidence.
Advantages:
o It is a fast and easy algorithm to predict the class of a dataset.
o It performs well in multi-class predictions and is a popular choice for text
classification problems such as spam filtering.
Disadvantages:
o Naive Bayes assumes that all features are independent or unrelated, so it cannot
learn the relationship between features.
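As an illustrative sketch (assuming scikit-learn's GaussianNB and its bundled Iris dataset):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()          # assumes features are conditionally independent
model.fit(X_train, y_train)
print(model.score(X_test, y_test))        # accuracy on held-out data
print(model.predict_proba(X_test[:1]))    # per-class probabilities
```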
Decision Tree Algorithm:
o Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but mostly it is preferred for solving
classification problems.
o It is a tree-structured classifier, where internal nodes represent the features of a
dataset, branches represent the decision rules and each leaf node represents
the outcome.
o In a Decision tree, there are two nodes, which are the Decision Node and Leaf
Node. Decision nodes are used to make any decision and have multiple branches,
whereas Leaf nodes are the output of those decisions and do not contain any
further branches.
o It is called a decision tree because, similar to a tree, it starts with the root node,
which expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it
further splits the tree into subtrees.
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.
Why use Decision Trees?
o Decision Trees usually mimic human thinking ability while making a decision, so
it is easy to understand.
o The logic behind the decision tree can be easily understood because it shows a
tree-like structure.
Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other
nodes are called the child nodes.
In a decision tree, for predicting the class of the given dataset, the algorithm starts from
the root node of the tree. This algorithm compares the values of root attribute with the
record (real dataset) attribute and, based on the comparison, follows the branch and
jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-
nodes and moves further. It continues this process until it reaches the leaf node of the tree.
The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
o Step-3: Divide S into subsets that contain possible values for the best
attributes.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset
created in Step-3. Continue this process until a stage is reached where you cannot
classify the nodes further; the final node is then called a leaf node.
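These steps are implemented by libraries such as scikit-learn; a minimal sketch (the dataset and parameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" selects splits by information gain; "gini" is the default
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X_train, y_train)

print(tree.score(X_test, y_test))
print(export_text(tree))          # the learned root/decision/leaf nodes as rules
```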
While implementing a decision tree, the main issue that arises is how to select the best
attribute for the root node and for the sub-nodes. To solve such problems, there is a
technique called the Attribute Selection Measure (ASM). Two popular ASM techniques are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of the change in entropy after the segmentation
of a dataset based on an attribute.
o According to the value of information gain, we split the node and build the decision
tree.
o A decision tree algorithm always tries to maximize the value of information gain, and
a node/attribute having the highest information gain is split first. It can be calculated
using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy is a metric for measuring the impurity of a given attribute; for a binary
classification it is:
$$Entropy(S) = -P(yes)\log_2 P(yes) - P(no)\log_2 P(no)$$
Where,
o S = total number of samples
o P(yes) = probability of yes
o P(no) = probability of no
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in
the CART(Classification and Regression Tree) algorithm.
o It only creates binary splits, and the CART algorithm uses the Gini index to create
binary splits.
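It can be calculated using the formula below, where $P_j$ is the probability of class $j$:
$$Gini\;Index = 1 - \sum_{j} P_j^{2}$$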
A too-large tree increases the risk of overfitting, and a small tree may not capture all the
important features of the dataset. Therefore, a technique that decreases the size of the
learning tree without reducing accuracy is known as Pruning.
Advantages of the Decision Tree:
o It is simple to understand.
Disadvantages of the Decision Tree:
o It may have an overfitting issue, which can be resolved using the Random Forest
algorithm.
o For more class labels, the computational complexity of the decision tree may
increase.
K-Nearest Neighbors (KNN) Algorithm:
Imagine a streaming service wants to predict whether a new user is likely to cancel their
subscription (churn) based on their age. It checks the ages of its existing users and
whether they churned or stayed. If most of the "K" users closest in age to the new user
canceled their subscription, KNN will predict that the new user might churn too. The key
idea is that users with similar ages tend to have similar behaviors, and KNN uses this
closeness to make decisions.
K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn
from the training set immediately; instead, it stores the dataset, and at the time of
classification it performs an action on the dataset.
For example, a new point would be classified as Category 2 if most of its closest
neighbors were blue squares, since KNN assigns the category based on the majority
of nearby points.
What is ‘K’ in K Nearest Neighbour ?
In the k-Nearest Neighbours (k-NN) algorithm k is just a number that tells the algorithm
how many nearby points (neighbours) to look at when it makes a decision.
Example:
Imagine you're deciding which fruit a new fruit is based on its shape and size. You compare
it to fruits you already know. If k = 3, the algorithm looks at the 3 fruits most similar to the
new one. If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit
is an apple because most of its neighbours are apples.
Selecting the optimal value of k depends on the characteristics of the input data. If the
dataset has significant outliers or noise, a higher k can help smooth out the predictions
and reduce the influence of noisy data. However, choosing a very high value can lead to
underfitting, where the model becomes too simplistic.
Elbow Method: In the elbow method, we plot the model's error rate for different
values of k. As we increase k, the error usually decreases initially. However, after a
certain point, the error rate starts to change more slowly. The point where the
curve forms an "elbow" is considered the best k.
Odd Values for k: It’s also recommended to choose an odd value for k especially in
classification tasks to avoid ties when deciding the majority class.
KNN uses distance metrics to identify the nearest neighbours; these neighbours are then
used for classification and regression tasks. To identify the nearest neighbours, we use the
below distance metrics:
1. Euclidean Distance
Euclidean distance is defined as the straight-line distance between two points in a plane or
space. You can think of it like the shortest path you would walk if you were to go directly
from one point to another.
$$distance(x, X_i) = \sqrt{\sum_{j=1}^{d} (x_j - X_{ij})^2}$$
2. Manhattan Distance
This is the total distance you would travel if you could only move along horizontal and
vertical lines (like a grid or city streets). It’s also called “taxicab distance” because a taxi
can only drive along the grid-like streets of a city.
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity, where it
predicts the label or value of a new data point by considering the labels of its nearest
neighbors:
Distance is calculated between the data points in the dataset and the target point.
The k data points with the smallest distances to the target point are the nearest
neighbors.
When you want to classify a data point into a category (like spam or not spam), the
K-NN algorithm looks at the K closest points in the dataset. These closest points
are called neighbors. The algorithm then looks at which category the neighbors
belong to and picks the one that appears the most. This is called majority voting.
In regression, the algorithm still looks for the K closest points. But instead of voting
for a class in classification, it takes the average of the values of those K neighbors.
This average is the predicted value for the new point for the algorithm.
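A minimal classification sketch, assuming scikit-learn and its bundled Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k=5 neighbors, Euclidean distance (the default metric)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)                 # "lazy" learner: just stores the data
print(knn.score(X_test, y_test))          # majority vote of the 5 nearest points
```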
Logistic Regression:
Logistic regression is a supervised learning algorithm used for classification, which
predicts the probability that an input belongs to a class. For example, if we have two
classes, Class 0 and Class 1, and the value of the logistic function for an input is greater
than 0.5 (the threshold value), then it belongs to Class 1; otherwise it belongs to Class 0.
It is referred to as regression because it is an extension of linear regression, but it is
mainly used for classification problems.
Key Points:
The output can be either Yes or No, 0 or 1, True or False, etc., but instead of giving
the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
So far, we’ve covered the basics of logistic regression, but now let’s focus on the most
important function that forms the core of logistic regression.
The sigmoid function is a mathematical function used to map the predicted values
to probabilities.
It maps any real value into another value within the range of 0 and 1. The output of
logistic regression must be between 0 and 1 and cannot go beyond this limit, so it
forms a curve like the "S" form.
The S-form curve is called the Sigmoid function or the logistic function.
In logistic regression, we use the concept of the threshold value, which defines the
probability of either 0 or 1. Values above the threshold tend to 1, and values below
the threshold tend to 0.
The logistic regression model transforms the continuous output of the linear regression
function into a categorical output using the sigmoid function, which maps any real-
valued set of independent variables to a value between 0 and 1. This function is
known as the logistic function.
Sigmoid Function
Now we use the sigmoid function, where the input z is the linear combination of the
inputs (z = w·X + b), and we find the probability between 0 and 1, i.e. the predicted y.
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
As shown above, the sigmoid function converts continuous variable data into a
probability between 0 and 1.
$$P(y=1) = \sigma(z), \qquad P(y=0) = 1 - \sigma(z)$$
$$p(X; b, w) = \frac{e^{w \cdot X + b}}{1 + e^{w \cdot X + b}} = \frac{1}{1 + e^{-(w \cdot X + b)}}$$
Logistic function: The formula used to represent how the independent and
dependent variables relate to one another. The logistic function transforms the input
variables into a probability value between 0 and 1, which represents the likelihood
of the dependent variable being 1 or 0.
Log-odds: The log-odds, also known as the logit function, is the natural logarithm of
the odds. In logistic regression, the log odds of the dependent variable are modeled
as a linear combination of the independent variables and the intercept.
Coefficient: The logistic regression model's estimated parameters, which show how
the independent and dependent variables relate to one another.
Intercept: A constant term in the logistic regression model, which represents the
log odds when all independent variables are equal to zero.
Maximum likelihood estimation: The method used to estimate the coefficients of
the logistic regression model, which maximizes the likelihood of observing the data
given the model.
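A minimal sketch, assuming scikit-learn; the hours-studied data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied -> passed (1) or failed (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)                       # coefficients found by maximum likelihood

print(model.predict_proba([[2.2]]))   # sigmoid output: P(y=0), P(y=1)
print(model.predict([[2.2]]))         # class chosen via the 0.5 threshold
```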
Support Vector Machine (SVM) Algorithm:
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put new data points
in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed Support
Vector Machine.
Example: Suppose we see a strange cat that also has some features of dogs, and we want
a model that can accurately identify whether it is a cat or a dog. Such a model can be
created by using the SVM algorithm. We first train our model with lots of images of cats
and dogs so that it can learn about the different features of cats and dogs, and then we
test it with this strange creature. The SVM creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors); on the basis of
the support vectors, it will classify the creature as a cat.
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which means if a
dataset can be classified into two classes by using a single straight line, then such
data is termed linearly separable data, and the classifier used is called a Linear SVM
classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separated data, which
means if a dataset cannot be classified by using a straight line, then such data is
termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
The dimensions of the hyperplane depend on the features present in the dataset, which
means if there are 2 features, then the hyperplane will be a straight line. And if there are
3 features, then the hyperplane will be a 2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means the maximum
distance between the data points.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect the
position of the hyperplane are termed Support Vectors. Since these vectors support the
hyperplane, they are called support vectors.
Linear SVM:
Suppose we have a dataset that has two tags (green and blue), and the dataset has two
features x1 and x2. We want a classifier that can classify the pair(x1, x2) of coordinates in
either green or blue.
As it is 2-D space, by just using a straight line we can easily separate these two
classes. But there can be multiple lines that can separate these classes.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called as a hyperplane.
SVM algorithm finds the closest point of the lines from both the classes. These points are
called support vectors. The distance between the vectors and the hyperplane is called
as margin.
And the goal of SVM is to maximize this margin. The hyperplane with maximum margin
is called the optimal hyperplane.
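A minimal linear SVM sketch, assuming scikit-learn; the blob data is synthetic:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two linearly separable clusters of points
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

clf = SVC(kernel="linear", C=1.0)   # linear SVM: maximum-margin hyperplane
clf.fit(X, y)

print(clf.support_vectors_)         # the extreme points defining the margin
print(clf.predict([[0.0, 0.0]]))    # classify a new point
```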
Random Forest Algorithm:
It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to
solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset."
Instead of relying on one decision tree, the random forest takes the prediction from each
tree, and based on the majority vote of predictions, it predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the
problem of overfitting.
Below are two assumptions for a better random forest classifier:
o There should be some actual values in the feature variable of the dataset so that
the classifier can predict accurate results rather than a guessed result.
o The predictions from each tree must have very low correlations.
o It predicts output with high accuracy, and even for a large dataset it runs efficiently.
Random Forest works in two phases: the first is to create the random forest by combining
N decision trees, and the second is to make predictions for each tree created in the first
phase.
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step 1 and Step 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new
data points to the category that wins the majority votes.
Example: Suppose there is a dataset that contains multiple fruit images. So, this dataset is
given to the Random forest classifier. The dataset is divided into subsets and given to each
decision tree. During the training phase, each decision tree produces a prediction result,
and when a new data point occurs, then based on the majority of results, the Random
Forest classifier predicts the final decision.
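A minimal sketch of this majority-vote behaviour, assuming scikit-learn and its bundled Iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# N = 100 trees, each trained on a random subset of rows and features
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))   # final output by majority vote of the trees
print(forest.predict(X_test[:3]))
```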
1. Banking: Banking sector mostly uses this algorithm for the identification of loan
risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease
can be identified.
o It enhances the accuracy of the model and prevents the overfitting issue.
o Although random forest can be used for both classification and regression tasks, it
is less suitable for regression tasks.