Lecture 1

Machine learning

Uploaded by

dpmanish
Copyright
© © All Rights Reserved
Chapter 1 Introduction to Machine Learning

Contents:

1.1 Machine Learning, Types of Machine Learning, Issues in Machine Learning, Application of
Machine Learning, Steps in developing a Machine Learning Application.

1.2 Training Error, Generalization error, Overfitting, Underfitting, Bias Variance trade-off.

1.3 Learning with Regression and Trees

1.1 Machine Learning

1.1.1 Introduction to Machine Learning

Machine learning is a widely used technique for predicting future outcomes or classifying
information to help people make decisions.

Artificial intelligence (AI) has an area called "machine learning" that focuses on creating models
and algorithms that let computers make decisions and predictions based on data without having
to be explicitly programmed. Creating computer systems that can automatically learn from
experience or examples is the main objective of machine learning.

Definition: Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans learn, gradually
improving its accuracy.

In short, machine learning is –

i. The ability of a machine to learn automatically from data,
ii. Enhance performance based on prior experiences, and
iii. Make predictions.

Machine learning uses a collection of algorithms that operate on large amounts of data. The
algorithms are fed data to train them; training produces a model, which then carries out a
specific task.

1.1.2. Types of Machine Learning

Machine Learning Algorithms

Machine Learning algorithms are trained over instances or examples through which they learn
from past experiences and also analyze the historical data.

Machine learning algorithms are categorized as –

1. Supervised Learning – “Teach me what to learn”
2. Unsupervised Learning – “I will find what to learn”
3. Reinforcement Learning – “I’ll learn from my mistakes at every step (Hit & Trial)”

Fig 1. Machine Learning Algorithms

Supervised learning

Supervised learning is the machine learning task of learning a function that maps an
input to an output based on example input-output pairs. It infers a function from
labeled training data consisting of a set of training examples.

Some popular examples of supervised machine learning algorithms are:

• Linear regression for regression problems.
• Random forest for classification and regression problems.
• Support vector machines for classification problems.

Unsupervised learning
Unsupervised learning is a type of machine learning used to draw inferences
from datasets consisting of input data without labeled responses. The most
common unsupervised learning method is cluster analysis, which is used in exploratory data
analysis to find hidden patterns or groupings in data.
Because no labels are required, unsupervised learning algorithms can be applied to tasks where
labeled data is unavailable, although their results are harder to validate than those of
supervised learning.

Unsupervised learning can also be less predictable than other learning methods, since there is
no known correct output to check against. Typical unsupervised tasks include clustering and
anomaly detection; some neural networks (for example, autoencoders) are also trained without labels.
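
As a rough illustration of cluster analysis, the sketch below groups one-dimensional points with a tiny k-means loop. The data, the starting centers and the function name `kmeans_1d` are made up for this example; it is a minimal sketch, not a production clustering routine.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: assign points to the nearest center, then recompute centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each point joins the cluster of its nearest center.
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical, unlabeled measurements with two visible groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points.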

Reinforcement Learning

Reinforcement Learning is defined as a Machine Learning method that is concerned with how
software agents should take actions in an environment. The agent learns to choose actions that
maximize the cumulative reward it receives over time. Reinforcement learning is a branch of
machine learning in its own right; it can be combined with deep learning but is not a part of it.

How does Reinforcement Learning work?

Let's look at a simple example.

Consider the scenario of teaching new tricks to your cat.

• As the cat doesn't understand English or any other human language, we can't tell her directly
what to do. Instead, we follow a different strategy. We set up a situation, and the cat
tries to respond in many different ways. If the cat's response is the desired one, we
give her fish.
• Now whenever the cat is exposed to the same situation, she performs a similar action
even more enthusiastically, in expectation of getting more reward (food).
• In this way the cat learns "what to do" from positive experiences.
• At the same time, the cat also learns what not to do when faced with negative experiences.
• Your cat is an agent that is exposed to the environment, which in this case is your house. An
example of a state could be your cat sitting while you use a specific word to make her
walk.
• Our agent reacts by performing an action that takes it from one "state" to another "state."
• For example, your cat goes from sitting to walking.
• The reaction of an agent is an action, and the policy is a method of selecting an action
given a state in expectation of better outcomes.
• After the transition, the agent may get a reward or a penalty in return.
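
The cat example can be sketched as a tiny tabular Q-learning loop. Everything here is an assumption made for illustration: the two states, the two actions, the reward of 1 for going from sitting to walking, and the learning rate and discount values. To keep the sketch deterministic it sweeps over every state-action pair instead of letting the agent explore at random.

```python
STATES = ["sitting", "walking"]
ACTIONS = ["sit", "walk"]


def step(state, action):
    """Hypothetical environment: the cat gets fish only for sitting -> walking."""
    next_state = "walking" if action == "walk" else "sitting"
    reward = 1.0 if (state == "sitting" and action == "walk") else 0.0
    return next_state, reward


def train(sweeps=50, alpha=0.5, gamma=0.5):
    """Q-learning update applied to every (state, action) pair in turn."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s in STATES:
            for a in ACTIONS:
                next_state, reward = step(s, a)
                best_next = max(q[(next_state, b)] for b in ACTIONS)
                # Move the estimate towards reward + discounted future value.
                q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q


q = train()
# The policy picks, for each state, the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

The learned policy tells the cat to walk when she is sitting (that is where the fish comes from) and to sit down again when she is walking, so that the rewarded transition can be repeated.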

Characteristics of Reinforcement Learning

Here are important characteristics of reinforcement learning:

• There is no supervisor, only a real-valued reward signal
• Sequential decision making
• Time plays a crucial role in reinforcement problems
• Feedback is often delayed, not instantaneous
• The agent's actions determine the subsequent data it receives

Applications of Reinforcement Learning

Here are applications of Reinforcement Learning:

• Robotics for industrial automation
• Business strategy planning
• Machine learning and data processing
• Training systems that provide custom instruction and materials according to the
requirements of students
• Aircraft control and robot motion control

1.1.3 How does machine learning work?

Machine learning algorithms are trained on instances or examples, through which they learn
from past experience and analyze historical data.

As an algorithm trains over the examples again and again, it becomes able to identify patterns
that it can use to make predictions about the future.

Steps in developing a Machine Learning Application

[Figure: Data Gathering → Preprocessing/Cleaning Data → Model Building/Selecting suitable ML
Algorithm → Training Model → Testing Model → Evaluating and Deployment of Model]

Fig. 2 Steps in developing a Machine Learning Application

In order to develop the model we –

i. Determine the kind of problem and prepare the data,
ii. Choose machine learning techniques such as classification, regression, cluster analysis,
association, etc.,
iii. Train and test the model so that it can learn the various patterns, rules, and features, and
iv. Evaluate and deploy the model.

The steps in developing a Machine Learning Application shown in Fig 2 are explained in
detail below.

1. Data Gathering

The first stage of the machine learning life cycle is data gathering. This step's objective is to
identify and collect all the data relevant to the problem.

As data can be gathered from a variety of sources, including files, databases, the internet, and
mobile devices, we must first identify the various data sources in this stage. It is one of the most
crucial phases of the life cycle: the effectiveness of the output depends on the quantity and
quality of the data gathered. In general, the more good-quality data there is, the more accurate
the prediction will be.

The following tasks are part of this step:

i. Identify different sources of data
ii. Collect data
iii. Integrate the data obtained from different sources

2. Data Preprocessing

After gathering the data, we have to prepare it for further use. Data preparation is the process of
organizing our data for use in machine learning training.
In this stage, we first combine all the data and then shuffle it into random order.
Data preparation can be separated into two steps:

Data exploration:
Data exploration is done to determine the type of data we are working with. We must recognize
the characteristics, formats, and quality of the data.
The better we understand the data, the more accurate the results we can get. In this step we
discover correlations, broad trends, and outliers.

Pre-processing of data:
The next step is to prepare the data for analysis. It is also called data cleaning or data
wrangling: raw data is cleaned and transformed repeatedly until it is in a usable, structured
format. This involves formatting the data properly, choosing the variables to use, and cleaning
the data so that it is ready for analysis in the following phase. It is among the most crucial
steps in the entire procedure, because data cleaning is what addresses the quality issues.

Not all of the information we have gathered is necessarily useful to us. The problems that
acquired data may have in real-world applications include:

i. Missing Values
ii. Duplicate data
iii. Invalid data
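
A minimal sketch of how these three problems might be handled is shown below. The record fields (`name`, `age`, `income`) and the validity limits are hypothetical and chosen only for illustration; real projects decide per column whether to drop, correct or fill in a value.

```python
def clean(records):
    """Drop records with missing values, invalid values, or exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        age = record.get("age")
        income = record.get("income")
        # i. Missing values: skip records where a required field is absent.
        if age is None or income is None:
            continue
        # iii. Invalid data: skip values outside an assumed plausible range.
        if not 0 <= age <= 120 or income < 0:
            continue
        # ii. Duplicate data: keep only the first copy of an identical record.
        key = (record.get("name"), age, income)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw = [
    {"name": "A", "age": 34, "income": 52000},
    {"name": "B", "age": None, "income": 41000},  # missing value
    {"name": "A", "age": 34, "income": 52000},    # duplicate
    {"name": "C", "age": -5, "income": 30000},    # invalid age
    {"name": "D", "age": 45, "income": 61000},
]
cleaned = clean(raw)
print([r["name"] for r in cleaned])
```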

3. Data Modeling

The data has now been cleaned and prepared for the analysis step. This step entails:

i. Choosing the analytical methods
ii. Developing models
iii. Reviewing the results

The goal of this step is to create a machine learning model that analyzes the data with a
variety of analytical methods, and then to evaluate the results.

4. Training a Model

We train our machine learning model to improve its performance so that it gives more accurate
results for the problem. Various machine learning algorithms are used to train the model on
datasets.

5. Testing a model

Once the machine learning model has been trained on a specific dataset, we test it. In this phase,
we give the model a test dataset that it has not seen during training and check how accurate its
predictions are. Testing establishes the model's accuracy, as a percentage, in relation to the
requirements.
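
The train-then-test idea can be sketched with a deliberately simple model. The data (hours studied versus pass/fail) and the threshold "model" are made up for this example; the point is only that accuracy is measured on a test set kept separate from the training set.

```python
# Hypothetical data: (hours studied, result) where 1 = pass and 0 = fail.
train_set = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
test_set = [(2, 0), (4, 0), (5, 1), (4, 1)]


def train_threshold(samples):
    """'Train' by placing a cut-off halfway between the two class means."""
    fails = [x for x, y in samples if y == 0]
    passes = [x for x, y in samples if y == 1]
    return (sum(fails) / len(fails) + sum(passes) / len(passes)) / 2


def predict(threshold, x):
    """Predict pass (1) when the input is above the learned cut-off."""
    return 1 if x > threshold else 0


def accuracy_percent(threshold, samples):
    """Share of samples whose prediction matches the actual label, in percent."""
    correct = sum(1 for x, y in samples if predict(threshold, x) == y)
    return 100.0 * correct / len(samples)


threshold = train_threshold(train_set)
print(threshold)
print(accuracy_percent(threshold, train_set))
print(accuracy_percent(threshold, test_set))
```

On this toy data the model is perfect on the training set but gets one of the four test examples wrong, which is why the test set, not the training set, is used to report accuracy.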

6. Deployment and Evaluating the model

Deployment, the final stage of the machine learning life cycle, involves implementing the model
in a practical system. The model is evaluated both before and after deployment, based on
accuracy and other measures.

We implement the model in the actual system if it delivers accurate predictions that meet our
requirements quickly and as planned. Before deploying, however, we first check whether the
model is actually using the given data to improve its performance. The project's final
report is made during the deployment phase.

1.1.4 Issues in Machine Learning

1. Poor Quality of Data

Poor data quality is a major issue for ML applications. Noisy, incomplete, erroneous, or
unclean data produces low-quality outcomes and less accurate classification.

We must make sure that the data preprocessing procedure, which involves handling
outliers, dealing with missing values, and eliminating undesired features, is carried out
as carefully as possible.

2. The Process of Machine Learning is Complex

The machine learning field is relatively new and evolving rapidly, and much of the work is
still done by trial and error. Because the process keeps changing, there are many
opportunities for error, which makes learning complicated. It entails a variety of tasks, such as
data analysis, bias removal, model training, and sophisticated mathematical computation.
As a result, it is a complex process and a significant challenge for machine learning experts.

3. Lack of Training Data

To achieve accurate output, it is important to train the model on a sufficient amount of data.
Too little training data will produce inaccurate or biased predictions.

4. Slow Implementation
Slow programs, data overload, and demanding requirements often mean that machine learning
takes a long time to provide accurate results. It also requires constant monitoring and
maintenance to deliver the best outcomes.

5. Lack of skilled resources: As machine learning and artificial intelligence algorithms are
comparatively new and still under development, they are continuously changing as they
mature. Machine learning methods need mathematical, technical and analytical support and
are complex to implement, so skilled manpower is essential to work with them. As this is
still a growing area, there is a lack of such skilled resources.

1.1.5 Applications of unsupervised machine learning

Some applications of unsupervised machine learning techniques are:

• Clustering automatically splits the dataset into groups based on their similarities
• Anomaly detection can discover unusual data points in your dataset. It is useful for
finding fraudulent transactions
• Association mining identifies sets of items which often occur together in your dataset
• Latent variable models are widely used for data preprocessing, for example reducing the number
of features in a dataset or decomposing the dataset into multiple components

Application of Machine Learning

As machine learning is becoming popular nowadays, many fields are applying machine
learning methods. The following are some prominent areas of application of machine learning.

1. Image Processing: Image recognition and image matching are common applications
of machine learning in image processing. They are used to identify, distinguish and match images
of persons, objects or places. Face recognition and object detection and tracking are
other important areas. E.g. automatic friend tagging suggestions.
2. Speech recognition: This is one of the popular applications of machine learning. We can
operate a device or computer with speech commands. E.g. "Search by voice" in
Google.
3. Prediction: Traffic prediction, population growth prediction and stock market
prediction are well-known applications of machine learning.
E.g. Google Maps can show the shortest path and predict traffic conditions, and trading
platforms such as Sharekhan can offer forecasts of future share prices.
4. Product recommendations: E-commerce sites such as Amazon and entertainment
services such as Netflix use ML to recommend products to their users based on
previous views or search results.
5. Self-Driving vehicles: Self-driving automobiles are one of the most promising uses of
machine learning, and they depend significantly on it. E.g. Tesla, a well-known electric
vehicle maker, is developing self-driving vehicles. Machine learning is used to train the
car's models to recognize people and objects while driving.
6. Virtual personal assistant: There are devices for personal assistance based on voice and
speech recognition. E.g. Alexa, Cortana, Siri and Google Assistant are a few common
applications.
7. Email Spam and Malware Filtering: Every new email that we get is immediately
classified as important, normal, or spam. Machine learning is the technology that
enables us to consistently get important emails in our inbox with the important symbol and
spam emails in our spam box.
8. Fraud Detection: Online banking transactions are exposed to various kinds of
fraud. Machine learning makes online transactions safer by detecting
fraudulent transactions.
9. Medical Diagnosis: Machine learning plays a vital role in medical applications like
automatic disease detection, patient monitoring, stroke detection, cancer detection etc.
E.g. 3-D machine learning models are built to help predict the position, growth
and stage of diseases such as cancer.
10. Natural Language Processing: Machine learning algorithms combined with natural
language processing (NLP) can process language-based inputs from people, such as
text messages sent through a company's website. These algorithms can identify the topic
and tone of a communication to learn more about what customers want. E.g.
many businesses use chatbots to generate automatic responses and
answer customer questions submitted through their websites. Another example is Google's
GNMT (Google Neural Machine Translation), a machine learning system that helps users
translate text into languages they know.

1.2.1 Training Error and generalization error

In supervised learning applications in machine learning and statistical learning theory,
generalization error (also known as the out-of-sample error) is a measure of how accurately an
algorithm is able to predict outcome values for previously unseen data. [Wikipedia]

Be aware that the gap between predictions and observed data is caused by model inaccuracy,
sampling error, and noise. Some of these can be reduced, others cannot. Choosing the right
approach and fine-tuning model parameters can increase the model's accuracy, but
we will never be able to make predictions that are 100 percent accurate.

Training error is defined as the average loss over the training set. It is given
by:

$$E_{train} = \frac{1}{m_t} \sum_{i=1}^{m_t} \mathrm{loss}\left(y_i, \hat{y}_i\right)$$

Here, $m_t$ is the size of the training set, $y_i$ is the actual output, $\hat{y}_i$ is the
predicted output, and the loss function is the square of the difference between
the actual output and the predicted output. The above equation can therefore be written as:

$$E_{train} = \frac{1}{m_t} \sum_{i=1}^{m_t} \left(y_i - \hat{y}_i\right)^2$$

We can take the square root of the above equation to calculate the Root Mean Square Error (RMSE). It
should be noted that the training error is typically lower than the test error.
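
As a small worked example of the squared-loss training error and its root, the sketch below uses made-up actual and predicted values; the function names are chosen for this illustration only.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted outputs."""
    m = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / m


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error (RMSE)."""
    return sqrt(mean_squared_error(actual, predicted))


# Hypothetical training outputs and the model's predictions for them.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

mse = mean_squared_error(actual, predicted)    # (1 + 0 + 4) / 3
rmse = root_mean_squared_error(actual, predicted)
print(mse, rmse)
```

Computing the same two quantities on a held-out test set instead of the training set gives an estimate of the generalization error.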

As the complexity of the model is increased, the training error keeps decreasing, as
shown in Fig 3. Here, complexity refers to the number of input features used in the model or
to their degree (the power of x).

[Figure: training error falls as model complexity grows. Simple model: low complexity, high
error. Complex model: high complexity, low error.]

Fig. 3. Training Error vs. Model complexity

The generalization error is measured on an independent test set. Fig. 4 plots the
generalization error against the complexity of the model.

Fig. 4. Generalization Error vs. Model complexity

1.2.2 Bias and Variance in Machine Learning

Bias: Bias is a prediction error that is introduced in the model by oversimplifying the
machine learning algorithm. It reflects the difference between the predicted values and the actual
values, and arises from the assumptions a model makes so that the target function is easier to
learn. In practice it is gauged by the error rate on the training data. When this error rate is
high, we call it high bias, and when it is low, we call it low bias.

Variance: Variance occurs when the machine learning model performs well on the training dataset
but does not perform well on the test dataset. In practice it is gauged by the difference between
the error rates on the training data and the testing data. If the difference is high, it is called
high variance, and when the difference is low, it is called low variance. Usually, we
want low variance so that our model generalizes well.
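
Using the working definitions above (bias judged from the training error rate, variance from the gap between test and training error rates), a rough diagnostic could look like the sketch below. The two thresholds are arbitrary values assumed for illustration; what counts as "high" depends on the problem.

```python
def diagnose(train_error, test_error, bias_threshold=0.15, gap_threshold=0.10):
    """Flag high bias and high variance from error rates (thresholds are assumptions)."""
    gap = test_error - train_error
    return {
        "high_bias": train_error > bias_threshold,  # poor even on training data
        "high_variance": gap > gap_threshold,       # much worse on unseen data
    }


# Hypothetical error rates for two models.
print(diagnose(train_error=0.30, test_error=0.32))  # struggles everywhere
print(diagnose(train_error=0.02, test_error=0.25))  # memorizes the training set
```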

Fig. 5 Example of bias and variance

Bias vs. variance: A trade-off

Fig. 6 Bias-Variance Tradeoff

Low bias results when a data engineer modifies the ML algorithm to fit a particular data
set more closely, but the variance then increases. The model suits that data set well, yet the
likelihood of incorrect predictions on new data rises.

The reverse holds when building a high-bias, low-variance model. Its predictions vary less
from one data set to another, but the model does not match the data accurately.

As shown in Fig. 6, variance and bias tend to move in opposite directions: lowering one usually
raises the other, so in practice a model cannot be pushed to the minimum of both at once and a
balance has to be found.

The trade-off can be handled in various ways:

Increasing the complexity of the model- Making the model more complex reduces the overall
bias while the variance rises; the complexity is increased only as far as the variance stays at
an acceptable level. In this way the model is matched to the training dataset without
introducing large variance errors.

Increasing the training data set- The trade-off can also be balanced to some extent by expanding
the training data set. This approach is preferred when dealing with overfitting models.
A larger data set also lets users add model complexity without the variance errors that the
same complexity would cause on a small data set.

Indication: High Bias            Indication: High Variance

Fails to capture data trends     Noise in dataset

Underfitting                     Overfitting

Very simple                      Complex

High error rate                  Forcing data points together

1.2.3 Underfitting and Overfitting

1. Underfitting

When a statistical model or machine learning algorithm is unable to recognize the underlying
pattern in the data, so that it performs poorly even on the training data, this
is referred to as underfitting.

Underfitting occurs when the model is unable to map the input data to the target data.
The model performs poorly on the training dataset because it is not complex
enough to fit the available data.

Reasons for Underfitting

i. High bias and low variance.
ii. The training dataset is not large enough.
iii. The model is too simple.
iv. The training data has not been cleaned and contains noise.

Techniques to Reduce Underfitting

i. Increase model complexity.
ii. Increase the number of features by performing feature engineering.
iii. Remove noise from the data.
iv. Increase the number of epochs or the duration of training to get better results.

2. Overfitting

A model is said to be overfit when it fits the noise and errors in the training data as well as
the underlying pattern. This can happen with extremely complicated models, since such a model
will match virtually all of the provided data points and perform well on the training dataset.
It is then unable to generalize to the data points in the test data set and predict their
outcomes accurately.

A statistical model is said to be overfitted when it fits the training data closely but does not
make accurate predictions on testing data. When a model is trained too closely on the data, it
starts learning from the noise and inaccurate entries in the data set. When evaluated on test
data, this shows up as high variance: the model does not categorize the data correctly because
it has learned too many details and too much noise. Non-parametric and non-linear methods are
more prone to overfitting because these types of machine learning algorithms have more freedom
in building the model from the dataset and can therefore build unrealistic models. Ways to avoid
overfitting include using a linear algorithm if the data is linear, or limiting parameters such
as the maximal depth if we are using decision trees.

Reasons for Overfitting:

i. High variance and low bias.
ii. The model is too complex.
iii. The training data set is too small.

Methods to reduce overfitting:

i. Increase the amount of training data.
ii. Reduce model complexity by selecting a simpler model with fewer
parameters.
iii. Ridge Regularization and Lasso Regularization.
iv. Early stopping during the training phase.
v. Reduce the noise.
vi. Reduce the number of attributes in training data.
vii. Constraining the model.
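
As a hedged illustration of item iii, the sketch below applies ridge regularization to the simplest possible case: one feature and no intercept, where the penalized least-squares slope has the closed form w = Σxy / (Σx² + λ). The data and the λ values are made up for the example; library implementations handle many features and an intercept.

```python
def ridge_slope(xs, ys, lam):
    """Slope of y ≈ w * x minimizing squared error plus lam * w**2 (no intercept)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_slope(xs, ys, lam=0.0))   # ordinary least squares
print(ridge_slope(xs, ys, lam=14.0))  # strong penalty shrinks the slope
```

A larger λ constrains the model more strongly: the weight shrinks towards zero, which lowers variance at the price of some extra bias.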

1.3 Learning with Regression and Trees


Regression and decision trees are two fundamental concepts in machine learning that are
frequently applied to various tasks and modelling techniques. Each is discussed in turn below.
Regression: Regression is a supervised learning technique for predicting continuous numerical
values. In this type of problem, the algorithm makes predictions about
new, unseen data points by learning from prior data. The aim of regression analysis is to find
a mathematical relationship between the target output (dependent variable) and the input features
(independent variables). The output is a continuous variable
since it can take any real value within a range.
For example, regression can be used to predict share values from
different features, such as quarterly profit gain, duration and current market trends. Another
example is predicting the rainfall for the next day, week or month based on historical
weather data.

Decision Trees:
The decision tree is one of the most widely used supervised learning methods for both
classification and regression. Decision trees suit many machine learning
problems because they are simple to understand and interpret. A decision tree breaks
complex decision-making into a sequence of simple tests on features, which
ultimately lead to a class or a numerical value.
It creates a tree structure resembling a flowchart in which each internal node represents a test
on an attribute, each branch represents an outcome of that test, and each leaf node (terminal
node) holds a class label. The tree is built by repeatedly splitting the training data into
subsets based on the values of the attributes until a stopping criterion is met, such as the
maximum depth of the tree or the minimum number of samples needed to split a node.

Building a decision tree therefore involves recursively splitting the data on different
features so as to maximize the information gain, or minimize the impurity, at each step. In a
decision tree for regression, the leaf nodes contain the predicted continuous values instead of
class labels.
During training, the level of impurity or unpredictability in the subsets is measured using
metrics such as entropy or Gini impurity, and the decision tree algorithm chooses the best
attribute to split on according to these metrics. The objective is to find the attribute that
gives the largest information gain (impurity reduction) after the split.
Tree Terminology
Root node: The root node of a tree represents the entire dataset and is located at the top of the
tree. It is where the decision-making process begins.
Internal/Decision Node: A node that represents a decision regarding an input feature. Internal
nodes branch out to leaf nodes or to other internal nodes.
Leaf/Terminal node: A node that has no children and represents a class label or a numerical
value.
Splitting: The process of dividing a node into two or more sub-nodes based on a split criterion
and a chosen attribute.
Branch/Sub-Tree: A subsection of the decision tree, which begins at an
internal node and terminates at leaf nodes.
Parent node: The node that divides into one or more child nodes is known as the parent node.
Child node: The nodes that emerge when a parent node is split.
Impurity: A measurement of how mixed (non-homogeneous) the target variable is in a sample of
data. It expresses the level of uncertainty or unpredictability in a collection of examples. In
decision trees used for classification tasks, the Gini index and entropy are two impurity
metrics that are frequently used.
Variance: In decision tree regression, variance measures how far the predicted values diverge
from the target values across the samples in a node. It plays the role that impurity plays in
classification. Split quality for regression trees is measured using criteria such as mean
squared error, mean absolute error, friedman_mse, or half Poisson deviance.
Information Gain: The amount of impurity removed by splitting a dataset on a specific
feature in a decision tree is known as information gain. The feature that provides the maximum
information gain is chosen as the feature to split on at each node of the tree, in order to
produce subsets that are as pure as possible.
Pruning: The procedure of removing from a tree any branches that don't provide new
information or that cause overfitting.
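
The impurity measures and information gain defined above can be sketched in a few lines. The labels and the split are made up for illustration, and entropy (base 2) is used as the impurity inside the information gain; the Gini index could be substituted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits: 0 for a pure node, 1 for an even two-class mix."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with two "yes" and two "no" examples.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(parent), gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely removes all impurity, while a split that leaves each child as mixed as the parent gains nothing; the tree-building algorithm prefers the former.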

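[Figure: a root node splits into decision nodes; each decision node splits further into decision
nodes or terminal nodes. A decision node together with its descendants forms a subtree.
TN: Terminal Node]

Fig 7. Decision Tree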

How do Decision Trees work?


A decision tree is a tree structure that resembles a flowchart, where each internal node represents
a feature, branches represent rules, and leaf nodes provide the algorithm's output. It is a flexible
supervised machine-learning approach that can be applied to both classification and regression
problems, and it is one of the most widely used algorithms. Random Forest, another popular
machine learning algorithm, is built from many decision trees, each trained on a different
subset of the training data.
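
To make the node, branch and leaf roles concrete, the sketch below walks a small hand-written tree. The tree, its features (`outlook`, `humidity`, `windy`) and its labels are hypothetical and written by hand; a real decision tree would be learned from training data.

```python
# Internal nodes are dicts naming a feature; branches map feature values to
# the next node; leaves are plain class labels.
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rainy": {
            "feature": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}


def predict(node, sample):
    """Follow the branch matching the sample's feature value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][sample[node["feature"]]]
    return node


print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))
print(predict(tree, {"outlook": "rainy", "windy": "true"}))
print(predict(tree, {"outlook": "overcast"}))
```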

Some terms used in regression analysis:

o Dependent Variable: The main factor in regression analysis which we want to predict or
understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors which affect the dependent variable, or which are
used to predict its values, are called independent variables, also
called predictors.
o Outliers: An outlier is an observation with a very low or very high
value in comparison to the other observed values. An outlier may distort the result, so it
should be detected and handled carefully.
o Multicollinearity: If the independent variables are highly correlated with each other,
the condition is called multicollinearity. It should not be present in
the dataset, because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but
not with the test dataset, the problem is called overfitting. If our algorithm
does not perform well even with the training dataset, the problem is
called underfitting.
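
One simple way to look for multicollinearity is to compute the correlation between pairs of independent variables. The sketch below uses the Pearson correlation coefficient on made-up feature columns; the cut-off of 0.8 for "highly correlated" is an assumption for illustration, not a fixed rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical independent variables: x2 is just x1 doubled, x3 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]
x3 = [5.0, 1.0, 4.0, 2.0]

r12 = pearson(x1, x2)
r13 = pearson(x1, x3)
print(r12, abs(r12) > 0.8)  # strongly correlated pair
print(r13, abs(r13) > 0.8)  # weakly correlated pair
```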

Regression Analysis:
Regression analysis is a predictive modelling method that analyses the relationship between
the target (dependent) variable and the independent variables in a dataset. It is applied when
the target and independent variables have a linear or non-linear relationship and the target
variable takes continuous values; several types of regression analysis techniques exist for
this. Regression analysis is frequently used to identify cause-and-effect relationships,
forecast trends and time series, and measure predictor strength.
Two of the best-known regression analysis techniques in machine learning are:
1. Linear regression and 2. Logistic regression.
Regression analysis approaches in machine learning come in a variety of forms, and their use
depends on the type of data being used.
1. Linear regression: One of the most fundamental kinds of regression in machine learning
is linear regression. The linear regression model consists of a predictor variable and a
dependent variable that are linearly related to one another. If the data involves more than
one independent variable, the model is called a multiple linear regression
model.

Fig.8 Linear Regression

The linear regression model is denoted by the following equation:

$$y = mx + c + e \qquad (1)$$

Where
m is the slope of the line,
c is the intercept, and
e represents the error in the model.

Linear regression has two major types: simple linear regression and multiple linear
regression. The general formula, with n independent variables, is

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + \varepsilon \qquad (2)$$

Simple linear regression is the case n = 1, that is, $y = \beta_0 + \beta_1 x + \varepsilon$, where
y is the predicted value of the dependent variable (y) for any value of the independent
variable (x)
β0 is the intercept, i.e. the value of y when x is zero
β1 is the regression coefficient, meaning the expected change in y when x increases by one unit
x is the independent variable
ε is the estimated error in the regression.

You can use simple linear regression:

• To determine how strongly two variables are correlated, such as the rate of global
warming and carbon emissions.
• To determine the dependent variable's value for a given value of the independent
variable, for instance, finding the rise in atmospheric temperature associated
with a specific level of carbon dioxide emissions.
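
The slope and intercept of a simple linear regression line can be estimated from data. The sketch below assumes ordinary least squares as the fitting method, which the lecture does not spell out; the function name and the toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate slope m and intercept c of y = m*x + c by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = (sum of cross-deviations) / (sum of squared x-deviations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c


# Toy data lying exactly on y = 2x + 1
m, c = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(m, c)  # 2.0 1.0
```

On real, noisy data the points will not lie exactly on the line, and the leftover differences are the error term e of equation (1).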

2. Logistic regression

When the dependent variable is discrete (for instance true or false, 0 or 1), logistic regression
is the regression analysis technique to use. The target variable can take on only two values, and
the relationship between the target variable and the independent variable is represented by a
sigmoid curve, as shown in Fig 9.
In logistic regression, the logit function is used to quantify the connection between the
dependent and independent variables. The logistic regression equation is derived later in this
section.
When choosing logistic regression as the regression analysis approach, the dataset should be
large and the two values of the target variable should occur in roughly equal proportions.
Additionally, there shouldn't be any multicollinearity, which means that the dataset's
independent variables shouldn't be strongly correlated with one another.

Fig 9 Sigmoid Curve

Logistic Regression in Machine Learning

Logistic regression estimates the probability of a binary event occurring and is therefore
used to solve classification problems. For instance, detecting if an incoming email is spam or
not, or predicting whether a credit card transaction is fraudulent or not.

o Logistic regression is one of the most popular Supervised Learning techniques in Machine
Learning. It is used for predicting a categorical dependent variable from a given set of
independent variables.

o Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value such as Yes or No, 0 or 1, True or
False. However, instead of giving the exact value 0 or 1, it gives a probability
that lies between 0 and 1.
o The main difference between linear regression and logistic regression is how they are
used: linear regression is used to solve regression problems, while logistic regression
is used to solve classification problems.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function whose output is bounded by the two limiting values 0 and 1.
o The curve from the logistic function indicates the likelihood of something, such as
whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
o Logistic regression is a key machine learning algorithm because it can classify new data
using both continuous and discrete features.
o Logistic regression can be used to classify observations using a wide range of data
types and can help identify the variables that are most effective for the classification.
The logistic function is depicted in Fig 10.

Fig 10. Logistic function (S-curve for logistic regression, with the threshold marked)
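
The S-shaped curve and the threshold can be sketched in a few lines of Python. The threshold of 0.5 is a common default and an assumption here, since the lecture does not state a value; the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    # The two branches avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z, threshold=0.5):
    """Return class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), classify(z))
```

Here z stands for the linear combination of the independent variables; large negative values give probabilities near 0, large positive values give probabilities near 1, and z = 0 gives exactly 0.5.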

Assumptions:

Logistic regression presupposes that the dependent variable is categorical in character.

There shouldn't be any multicollinearity among the independent variables.

Equation for Logistic Regression:

The equation for logistic regression can be obtained from the linear regression
equation. The mathematical steps to get the logistic regression equation are given below:

We know the equation of the straight line can be written as:

y = β0 + β1x1 + β2x2 + ... + βnxn

o In logistic regression y can be between 0 and 1 only, so we replace y by the odds,
obtained by dividing y by (1 - y); this is 0 when y = 0 and tends to infinity as y approaches 1:

y / (1 - y)

o But we need a range from -infinity to +infinity, so we take the logarithm of the odds and
the equation becomes:

log[ y / (1 - y) ] = β0 + β1x1 + β2x2 + ... + βnxn

The above equation is the final equation for logistic regression.
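
Solving the final equation for y gives back the S-shaped logistic function, so the log-odds and the logistic function are inverses of each other. The sketch below checks this numerically; the function names and the coefficient values 0.3 and 1.2 are hypothetical.

```python
import math


def logit(y):
    """Log-odds log(y / (1 - y)) for a probability y strictly between 0 and 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    return math.log(y / (1.0 - y))


def inverse_logit(z):
    """Solve log(y / (1 - y)) = z for y, which gives y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


z = 0.3 + 1.2 * 2.0      # hypothetical b0 + b1 * x1 with x1 = 2.0
y = inverse_logit(z)     # probability between 0 and 1
print(y, logit(y))       # logit(y) recovers z up to rounding error
```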

Type of Logistic Regression

There are three types of logistic regression, based on the categories of the dependent variable:

o Binomial: In binomial logistic regression, the dependent variable has only two possible
values, e.g. 0 or 1, Pass or Fail.
o Multinomial: In multinomial logistic regression, the dependent variable has 3 or more
possible unordered values, such as "apple", "sweet lemon", or "orange".
o Ordinal: In ordinal logistic regression, the dependent variable has 3 or more possible
ordered values, such as "High", "Medium", or "Low".

University Questions

Q. 1 What is Machine learning? How is it different from Data mining? (5M)
Q.2 Define Machine learning. Explain with example importance of Machine learning. (5M)
Q.3 What are the key tasks of Machine learning? (5M)
Q.4 Explain how supervised learning is different from unsupervised learning. (5M)
Q.5 Explain steps in developing Machine learning application. (5M)

Q.6 Write short notes on: (5M each)
• Machine learning applications
• Training error and generalization error
• Overfitting and underfitting

Chapter Summary:

• Machine learning is a branch of artificial intelligence (AI) and computer science which
focuses on the use of data and algorithms to imitate the way that humans learn, gradually
improving its accuracy. In short machine learning is –
i. The ability for a machine to automatically learn from data,
ii. Enhance performance based on prior experiences, and
iii. Make predictions
• Types of Machine Learning Algorithm
o Supervised Learning – “Teach me what to learn”
o Unsupervised Learning – “I will find what to learn”
o Reinforcement Learning – “I’ll learn from my mistakes at every step (Hit & Trial)”
• Steps in Machine Learning
o Data Gathering
o Preprocessing/ Cleaning Data
o Model Building/ Selecting suitable ML Algorithm
o Training Model
o Testing Model
o Deployment and Evaluating Model
• Bias: Bias is the prediction error introduced into a model by the simplifying
assumptions it makes so that the target function is easier to learn. It shows up as the
difference between the predicted values and the actual values, and in practice it is
measured by the error rate on the training data. When this error rate is high, we call
it high bias, and when it is low, we call it low bias.
• Variance: Variance occurs when the machine learning model performs well on the training
dataset but does not perform well on the test dataset. It is measured by the difference
between the error rate on the training data and the error rate on the testing data. If the
difference is high, it is called high variance, and if the difference is low, it is
called low variance. Usually, we want low variance so that the model generalizes well.
• In supervised learning applications in machine learning and statistical learning theory,
generalization error (also known as the out-of-sample error) is a measure of how
accurately an algorithm is able to predict outcome values for previously unseen data.
[Wikipedia]

Training error is defined as the average loss that occurred during the training process. It
is given by:
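
One common way to write this average is shown below. The notation is an assumption, since the formula itself is missing from the text: L is a per-example loss function, f is the trained model, and (x_i, y_i) for i = 1, ..., n are the training examples.

```latex
E_{\text{train}} = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr)
```

With the squared-error loss L(y, ŷ) = (y − ŷ)², this is the mean squared error on the training set.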

• When a statistical model or machine learning algorithm is unable to capture the
underlying pattern in the data, so that it performs badly even on the training data,
this is referred to as underfitting.
• When a model fits the noise or errors in the training data rather than the underlying
pattern, it is said to be overfit. This can happen with extremely complicated models,
since the model will match almost all of the provided data points and perform well on
the training dataset, but badly on testing data.
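
The contrast between the two failure modes can be seen by comparing training and test error on a small example. This is a sketch under stated assumptions: the toy data, the constant model standing in for an underfit model, and the memorizing nearest-neighbour model standing in for an overfit model are all hypothetical choices, not taken from the lecture.

```python
def mse(model, xs, ys):
    """Average squared error of model(x) against the true y values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: roughly y = 2x, with noise of +/-0.5 in the training labels
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.5, 3.5, 6.5, 7.5]
test_x, test_y = [1.2, 2.8, 3.9], [2.4, 5.6, 7.8]

mean_y = sum(train_y) / len(train_y)


def underfit(x):
    """Too simple: ignores x and always predicts the mean training label."""
    return mean_y


def overfit(x):
    """Too flexible: memorizes the training set, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in (("underfit", underfit), ("overfit", overfit)):
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The constant model has a large error on both sets, while the memorizing model has zero training error but a non-zero test error, which is the gap between training and testing error described under variance.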
• The aim of regression analysis is to find a mathematical relationship between the target
output (dependent variable) and the input features (independent variables). The
output is a continuous variable, since it can take any real value within a range.
• Decision Trees:
The decision tree is one of the most effective supervised learning methods for both
classification and regression applications. Decision trees suit many machine learning
problems because they are simple to understand and interpret. A decision tree breaks
complex decision-making down into a sequence of simple tests on features, which
ultimately result in a class label or a numerical value.
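
A decision tree can be read as nested if-statements, one per test on a feature. The sketch below is hand-written for illustration only: the features `num_links` and `contains_offer_word` and the thresholds are hypothetical, whereas a real decision tree algorithm would learn the tests from training data.

```python
def classify_email(num_links, contains_offer_word):
    """Hand-written decision tree: each `if` is one test on one feature."""
    if contains_offer_word:        # root node: test the first feature
        if num_links > 3:          # internal node: test the second feature
            return "spam"          # leaf: final decision
        return "not spam"
    if num_links > 10:             # other branch of the root
        return "spam"
    return "not spam"


print(classify_email(5, True))
print(classify_email(1, False))
```

For a regression tree the leaves would return numerical values instead of class labels.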
