
SACET-CHIRALA

Deep Learning(R2032423)

Unit-1

Fundamentals of Deep Learning: Artificial Intelligence, History of Machine learning:


Probabilistic Modeling, Early Neural Networks, Kernel Methods, Decision Trees, Random
forests and Gradient Boosting Machines, Fundamentals of Machine Learning: Four
Branches of Machine Learning, Evaluating Machine learning Models, Overfitting and
Underfitting. [Text Book 2]

----------------------------------------------------------------------------------------------------------------

Introduction to Deep Learning (Part-1)

Deep learning is a branch of machine learning that is based on artificial neural networks. It is capable of learning complex patterns and relationships within data, and we do not need to explicitly program everything. It has become increasingly popular in recent years due to advances in processing power and the availability of large datasets. It is based on artificial neural networks (ANNs), also known as deep neural networks (DNNs). These neural networks are inspired by the structure and function of the biological neurons of the human brain, and they are designed to learn from large amounts of data.

Deep learning is a specific subfield of machine learning: a new way of learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.

The "deep" here stands for the idea of successive layers of representations. The number of layers that contribute to a model of the data is called the depth of the model. Other appropriate names for the field could have been layered representations learning or hierarchical representations learning. Modern deep learning often involves tens or even hundreds of successive layers of representations. In deep learning, these layered representations are (almost always) learned via models called neural networks.

Artificial Intelligence, Machine Learning, and Deep Learning


Figure 1: Artificial intelligence, machine learning, and deep learning


Artificial intelligence:
 Artificial intelligence was born in the 1950s, when some pioneers from computer science started asking whether computers could be made to "think".
 The field can be defined as "the effort to automate intellectual tasks normally performed by humans".
 AI is a general field that encompasses machine learning and deep learning.
 Early chess programs used only hardcoded rules, which do not qualify as machine learning; human intelligence was integrated in the form of explicit rules for making decisions the way humans do. This type of approach is called Symbolic AI.
 Symbolic AI was the dominant paradigm in AI from the 1950s to the 1980s, reaching its peak with the expert systems of that era.
Machine learning:
 Lady Ada Lovelace was a friend and collaborator of Charles Babbage, the inventor of the Analytical Engine.
 In those days, the Analytical Engine was intended to automate mechanical operations in order to perform mathematical computations.
 The limitation of the Analytical Engine was that it merely assisted humans; it could not take decisions on its own.
 In the year 1950, Alan Turing introduced the Turing Test along with other key concepts that shaped AI.
 Machine learning arises from this question: could a computer go beyond "what we know how to order it to perform" and learn the data-processing rules from the data itself?
 In classical programming, that is, Symbolic AI, the programmer inputs the rules and the data to be processed using these rules, and the system produces output in the form of answers.
 A machine-learning system is trained rather than explicitly programmed.
 Machine learning started to flourish in the 1990s and has become the most successful subfield of AI.


Symbolic AI (classical programming): rules + data → answers. Machine learning: data + answers → rules.

Learning representations from data:

 Before understanding the difference between deep learning and other learning approaches, it helps to know what machine learning algorithms actually do.
 Every machine learning model expects THREE things:
o Input data points
o Examples of the expected output
o A way to measure whether the algorithm is doing a good job
 A machine-learning model transforms its input data into meaningful outputs. The central problem in machine learning and deep learning is to meaningfully transform data.
 Let us take an example to understand these THREE things. Consider an x-axis, a y-axis, and some points represented by their coordinates in the (x, y) system, as shown in the figure.

Figure 2: Different Transformations

 As you can see, we have a few white points and a few black points. Let's develop a model that can use the coordinates of a point to determine whether that point is "BLACK" or "WHITE".
 In this case:
o The inputs are the coordinates of the points.
o The outputs are the "BLACK" and "WHITE" colours.
o The measure is the percentage of points that are correctly classified.
 What we need here is a new representation of the data that clearly separates the white points from the black points.
 If we search over different possible coordinate changes and come up with one under which a good percentage of points are classified correctly, then we are doing machine learning.
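The following is a hedged sketch of this idea (not an example from the text): it generates hypothetical black and white points, applies one candidate coordinate change, and reports the percentage of points classified correctly.

```python
import numpy as np

# Hypothetical synthetic data: "black" points cluster around (1, 1),
# "white" points around (-1, -1); labels: 1 = black, 0 = white.
rng = np.random.default_rng(0)
black = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
white = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))
points = np.vstack([black, white])
labels = np.array([1] * 50 + [0] * 50)

# Candidate coordinate change: project each point onto the diagonal
# direction (equivalent to rotating the axes by 45 degrees).
x_new = (points[:, 0] + points[:, 1]) / np.sqrt(2)

# In the new representation a trivial rule works: "x_new > 0 means black".
predictions = (x_new > 0).astype(int)

# The measure: percentage of points classified correctly.
accuracy = (predictions == labels).mean() * 100
print(f"Correctly classified: {accuracy:.1f}%")
```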

Deep learning:


 Deep learning is a mathematical framework for learning representations from data and is a subfield of machine learning.
 Modern deep learning often involves tens or even hundreds of successive layers of representations, and they are all learned automatically from exposure to training data.
 In deep learning, these layered representations are (almost always) learned via models called neural networks.
 The term neural network is a reference to neurobiology; some of the concepts are inspired by the structure of the human brain.
 Let us look at one example of how deep learning works: recognizing the digit in a handwritten image.
 The network transforms the digit image into representations that are increasingly different from the original image and increasingly informative about the final result.

Figure 3: A deep neural network for digit classification

 It can be seen as a multistage information-distillation operation, where information goes through successive filters and comes out increasingly purified.
How Deep Learning Works
 Machine learning maps inputs to targets by observing many examples of inputs and targets.
 Deep learning does this input-to-target mapping via a deep sequence of simple data transformations (layers), and these transformations are learned from examples.
 The specification of what each layer does to its input is stored in the layer's weights, which are a bunch of numbers.


Figure 4: The loss score is used as a feedback signal to adjust the weights

 In technical terms, the transformation implemented by a layer is parameterized by the layer's weights. These weights are sometimes also called the parameters of the layer. Initially, they are set to random values.
 In this context, learning means finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated targets.
 Finding the correct value for all of them may be a daunting task, because a change in one parameter will affect the other layers.
 To control a neural network, we first have to observe the predicted value and measure how far this output is from what we expected. This is the job of the loss function of the network, also called the objective function.
 The loss function takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score.
 Since the weights are initialized randomly, the loss score is obviously high at first.
 But with every example the network processes, the weights are adjusted a little in the correct direction, and the loss score decreases.
 This is the training loop, which is repeated a sufficient number of times to reduce the loss score. The outputs will then be close to the targets.
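The following is a minimal sketch of such a training loop for a single linear layer, written in plain NumPy on invented data; the data, learning rate, and number of steps are illustrative assumptions, not the text's own example.

```python
import numpy as np

# Hypothetical data: inputs x and targets y related by y ≈ 3x + 2.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 3 * x + 2 + rng.normal(scale=0.1, size=(100, 1))

# A single layer with randomly initialized weight (w) and bias (b).
w = rng.normal(size=(1, 1))
b = np.zeros((1,))
learning_rate = 0.1

for step in range(200):
    # Forward pass: compute predictions and the loss score (mean squared error).
    predictions = x @ w + b
    loss = np.mean((predictions - y) ** 2)

    # Backward pass: gradients of the loss with respect to the parameters.
    grad_pred = 2 * (predictions - y) / len(x)
    grad_w = x.T @ grad_pred
    grad_b = grad_pred.sum(axis=0)

    # Adjust the weights a little in the direction that reduces the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"final loss: {loss:.4f}, learned w: {w.ravel()[0]:.2f}, b: {b[0]:.2f}")
```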
Applications of Deep Learning

In particular, deep learning has achieved the following breakthroughs, all in historically difficult areas of machine learning:

 Near-human-level image classification


 Near-human-level speech recognition
 Near-human-level handwriting transcription
 Improved machine translation
 Improved text-to-speech conversion
 Digital assistants such as Google Now and Amazon Alexa


 Near-human-level autonomous driving


 Improved ad targeting, as used by Google, Baidu, and Bing
 Improved search results on the web
 Ability to answer natural-language questions
 Superhuman Go playing

A brief history of machine learning

The history of machine learning dates back several decades and has undergone significant
developments over time. Here's a brief overview of the key milestones in the history of
machine learning:
1. Early Foundations (1950s-1960s):
- The field of machine learning emerged from the intersection of computer
science and statistics, with early pioneers including Alan Turing and Arthur
Samuel.
- In 1950, Alan Turing proposed the "Turing Test" as a way to measure a machine's ability to exhibit intelligent behavior.
- In the 1950s, Arthur Samuel developed the concept of machine learning by creating programs that could improve their performance over time through experience, specifically in the domain of game-playing, such as checkers.
2. Symbolic AI and Expert Systems (1960s-1980s):
- During this period, researchers focused on symbolic AI and expert systems, which relied on rules and logical reasoning.
- Machine learning took a backseat as rule-based systems dominated the field, with projects like DENDRAL (a system for inferring molecular structures) and MYCIN (a system for diagnosing bacterial infections) gaining attention.
3. Connectionism and Neural Networks (1980s-1990s):
- Interest in neural networks and connectionism resurged during this period.
- Backpropagation, a widely used algorithm for training neural networks, was developed in the 1980s.
- The field saw advancements in areas such as pattern recognition and speech
recognition, fueled by neural network models like the Multi-Layer Perceptron (MLP).
4. Statistical Learning and Data-Driven Approaches (1990s-2000s):
- Researchers started emphasizing statistical learning and data-driven approaches.
- Support Vector Machines (SVMs) gained popularity for classification tasks, offering strong theoretical foundations.
- The field saw the emergence of ensemble methods, such as Random Forests and Boosting, which combined multiple models to improve performance.


5. Big Data and Deep Learning (2010s-present):


- The rise of big data, increased computational power, and advancements in deep
learning models revolutionized the field.
- Deep learning, specifically Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), achieved remarkable success in computer vision, speech recognition, and natural language processing.
- Deep learning frameworks like TensorFlow and PyTorch gained widespread adoption, making it easier for researchers and practitioners to build and train deep neural networks.
Today, machine learning is a rapidly evolving field that continues to push boundaries in areas such as reinforcement learning, generative models, and explainability. It has become an integral part of numerous applications, including recommendation systems, fraud detection, autonomous vehicles, and personalized medicine, among many others.

Deep learning has received more public attention and industry investment in recent times than any earlier approach in the history of AI. However, deep learning may not solve every problem: it needs sufficient data, and sometimes other machine learning methods can solve a problem more efficiently than deep learning.

Probabilistic Modeling:

Probabilistic modeling is the process of applying the principles of statistics to perform data
analysis.

Here are some key aspects and applications of probabilistic modeling:


1. Probability Distributions: In probabilistic modeling, we assign probability distributions to uncertain variables. These distributions describe the likelihood of different values the variables can take. Commonly used probability distributions include the Gaussian (normal) distribution, Bernoulli distribution, Poisson distribution, and more.
2. Bayesian Inference: Bayesian inference is a fundamental approach in probabilistic modeling that allows us to update our beliefs about uncertain variables based on observed data. It combines prior knowledge or beliefs (expressed as prior distributions) with observed data to obtain posterior distributions, which represent our updated beliefs.
3. Generative Models: Probabilistic modeling enables the construction of generative models, which can generate new samples that resemble the observed data. Generative models learn the underlying probabilistic structure of the data and can be used for tasks such as data generation, anomaly detection, and missing data imputation.
4. Bayesian Networks: Bayesian networks, also known as probabilistic graphical models,


are graphical representations of probabilistic dependencies among variables. They use directed acyclic graphs to model the conditional dependencies and allow efficient inference and reasoning about the joint distribution of variables.
5. Uncertainty Quantification: Probabilistic modeling provides a natural framework for quantifying and expressing uncertainty. By representing uncertain variables as probability distributions, we can estimate confidence intervals, calculate probabilities of different outcomes, and assess the uncertainty associated with predictions or decisions.
6. Applications: Probabilistic modeling finds applications in various fields, including finance, healthcare, natural language processing, computer vision, and more. It is used for tasks such as risk assessment, fraud detection, recommendation systems, sentiment analysis, image recognition, and predictive modeling.
Notable probabilistic modeling techniques include Bayesian regression, Hidden Markov Models (HMMs), Gaussian Processes (GPs), and Variational Autoencoders (VAEs). These techniques provide powerful tools for modeling complex systems and making principled inferences in the presence of uncertainty.
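As a small hedged illustration of the Bayesian inference idea above, the sketch below performs a conjugate Beta-Binomial update for a coin-flip example; the prior parameters and observed counts are hypothetical choices made only for illustration.

```python
from scipy import stats

# Hypothetical observed data: 7 heads out of 10 coin flips.
heads, flips = 7, 10

# Prior belief about the probability of heads: Beta(2, 2), mildly centered on 0.5.
alpha_prior, beta_prior = 2, 2

# Bayesian update with a conjugate Beta prior and Binomial likelihood:
# the posterior is Beta(alpha_prior + heads, beta_prior + tails).
alpha_post = alpha_prior + heads
beta_post = beta_prior + (flips - heads)
posterior = stats.beta(alpha_post, beta_post)

# The posterior mean and a 95% credible interval quantify the remaining uncertainty.
print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```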

Early Neural Networks:

Figure 5: Structure of Neural Networks

 Early neural networks have been replaced by modern neural networks, but they laid the path to deep learning.
 The core ideas of neural networks were proposed as early as the 1950s, yet the approach was ignored for decades.
 Interest in neural networks was revived when several researchers independently rediscovered the backpropagation algorithm.


 Backpropagation is used to optimize the parameters, or weights, of a neural network via gradient-descent optimization, with the size of each update controlled by the learning rate.
 The first successful practical application of neural nets came in 1989 from Bell Labs, when Yann LeCun combined the earlier ideas of convolutional neural networks and backpropagation and applied them to the problem of classifying handwritten digits.
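The sketch below is a hedged, minimal illustration of backpropagation and a learning-rate-controlled gradient-descent update for a tiny two-layer network; it is not the handwritten-digit network mentioned above, and the sizes, sample, and learning rate are assumptions.

```python
import numpy as np

# A tiny two-layer network trained on one hypothetical sample, to show how
# backpropagation pushes the error gradient backwards through the layers.
rng = np.random.default_rng(1)
x = rng.normal(size=(1, 3))          # one input sample with 3 features
target = np.array([[1.0]])           # desired output

W1 = rng.normal(size=(3, 4)); b1 = np.zeros((1, 4))   # layer 1 parameters
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))   # layer 2 parameters
learning_rate = 0.1

for step in range(100):
    # Forward pass.
    h_pre = x @ W1 + b1
    h = np.maximum(0, h_pre)         # ReLU activation
    y_pred = h @ W2 + b2
    loss = ((y_pred - target) ** 2).mean()

    # Backward pass: apply the chain rule layer by layer.
    grad_y = 2 * (y_pred - target)
    grad_W2 = h.T @ grad_y
    grad_b2 = grad_y.sum(axis=0, keepdims=True)
    grad_h = grad_y @ W2.T
    grad_h_pre = grad_h * (h_pre > 0)      # gradient through ReLU
    grad_W1 = x.T @ grad_h_pre
    grad_b1 = grad_h_pre.sum(axis=0, keepdims=True)

    # Gradient-descent update, scaled by the learning rate.
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1

print(f"loss after training: {loss:.6f}")
```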

Kernel Methods:

Kernel methods are a family of machine learning techniques that operate in a high-
dimensional feature space implicitly through a kernel function. They are particularly useful
for solving complex nonlinear problems while preserving the computational efficiency of
linear methods. Kernel methods have applications in various fields, including
classification, regression, dimensionality reduction, and anomaly detection.
Kernel methods are a group of classification algorithms, of which the support vector machine (SVM) is the best known. SVMs were developed by Vladimir Vapnik and Corinna Cortes in the 1990s at Bell Labs. SVMs aim at solving classification problems by finding good decision boundaries between two sets of points belonging to two different categories. This decision boundary can be linear or non-linear and separates the two regions belonging to the two categories. SVMs proceed to find these boundaries in two steps:

 The data is mapped to a new high-dimensional representation where the decision boundary can be expressed as a hyperplane.
 A good decision boundary is computed by maximizing the distance between the hyperplane and the closest data points from each class; this distance is called the margin.

The process of mapping the data to a high-dimensional space can be carried out using kernel methods. An example of a kernel method is given below.

 Kernel methods can be used to transform non-linear data into linearly separable data (e.g., by adding the feature y = power(x, 2)).

 Let us consider a small dataset as shown below:

 If we use only the first feature, x, the data appears to be non-linear.

x       y = power(x, 2)
1.2     1.44
1.4     1.96
1.3     1.69
1.5     2.25
1.3     1.69
1.2     1.44


(Plot: the dataset plotted using only the feature x.)

But if we add a second feature using the polynomial expression y = power(x, 2), then the dataset becomes linearly separable, as shown below.

(Plot: with both features, x and y = power(x, 2), the dataset becomes linearly separable.)

Figure 6: Polynomial Kernel
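To make the kernel idea concrete, here is a hedged sketch (with hypothetical one-dimensional data) that compares an explicit polynomial feature map against an SVM using a polynomial kernel in scikit-learn; the dataset and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D data that is not linearly separable in x alone:
# class 1 lies in the middle, class 0 on both sides.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = (np.abs(x.ravel()) < 1.5).astype(int)

# Option 1: explicit feature map — add x**2 as a second feature,
# after which a linear SVM can separate the classes.
x_mapped = np.hstack([x, x ** 2])
linear_svm = SVC(kernel="linear").fit(x_mapped, y)
print("linear SVM on mapped features:", linear_svm.score(x_mapped, y))

# Option 2: kernel trick — let a polynomial kernel do the mapping implicitly.
poly_svm = SVC(kernel="poly", degree=2).fit(x, y)
print("SVM with polynomial kernel:", poly_svm.score(x, y))
```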

Here are some key aspects of kernel methods:


1. Kernel Functions: A kernel function measures the similarity or distance between pairs
of data points in the input space. It takes two inputs and returns a similarity measure or
inner product in a high-dimensional feature space. Popular kernel functions include the
linear kernel, polynomial kernel, Gaussian (RBF) kernel, and sigmoid kernel.
2. Kernel Trick: The kernel trick is a central concept in kernel methods. It allows us to implicitly map the original input space into a higher-dimensional feature space without explicitly computing the transformed features. This is computationally efficient as it avoids the need to compute and store the high-dimensional feature representations explicitly.
3. Support Vector Machines (SVM): SVM is a widely used kernel-based algorithm for classification and regression tasks. It aims to find a hyperplane that separates data points of different classes while maximizing the margin between the classes. SVMs use kernel functions to implicitly operate in a high-dimensional feature space and find the optimal decision boundary.


4. Kernel PCA: Kernel Principal Component Analysis (PCA) is an extension of traditional PCA that uses kernel functions to perform nonlinear dimensionality reduction. It captures nonlinear relationships in the data by mapping it to a high-dimensional feature space and computing principal components in that space.
5. Gaussian Processes (GPs): Gaussian processes are probabilistic models that use kernel functions to define the covariance structure between data points. GPs are flexible and can model complex nonlinear relationships while providing uncertainty estimates. They are used for regression, classification, and Bayesian optimization tasks.
6. Kernel-based Clustering: Kernel methods can also be applied to clustering algorithms, such as Kernel K-means and Spectral Clustering. These methods use kernel functions to measure similarity or dissimilarity between data points and group them into clusters.

Decision Trees:

Decision trees are tree-like structures that let you classify input data points or predict output values given inputs, as shown in Figure 7. A decision tree is a supervised learning technique that can be used for both classification and regression problems, but mostly it is preferred for solving classification problems. Decision trees are easy to visualize and interpret. A decision tree contains three main elements: decision nodes, branches, and leaf nodes. Decision nodes can have multiple branches, whereas leaf nodes cannot contain any further branches.

Figure 7: Decision Tree


Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the


entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after reaching a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called the child nodes.

Algorithm

• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain possible values for the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further, and call the final nodes leaf nodes.

Example:

Figure 8: Example Decision Tree to Accept Offer
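A hedged sketch of training such a tree with scikit-learn is shown below; the tiny "accept offer" style dataset and feature names are invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical "accept offer" data: [salary_in_lakhs, commute_km].
X = [[12, 5], [4, 30], [9, 10], [3, 8], [15, 25], [6, 40]]
y = [1, 0, 1, 0, 1, 0]   # 1 = accept the offer, 0 = decline

# Fit a shallow decision tree; max_depth limits how far splitting goes.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Inspect the learned decision nodes and leaf nodes as text.
print(export_text(tree, feature_names=["salary", "commute_km"]))
print("prediction for [10, 12]:", tree.predict([[10, 12]]))
```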

Random Forest:


Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is a collection of a large number of specialized decision trees. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model. A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of overfitting.

Figure 9: Random Forest

For the same data, different decision trees are created. Instead of depending on one decision tree, the random forest takes the prediction from each tree, and the final output is predicted based on the majority vote of those predictions.


Advantages of Random Forests:


1. It takes less time to train the model as compared to other algorithms.
2. It gives high accuracy.
3. It can maintain accuracy even when a large portion of the data is missing.
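Below is a hedged scikit-learn sketch of the majority-voting idea; the synthetic dataset and the choice of 100 trees are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (purely illustrative).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; each tree sees a bootstrap sample
# of the data, and the forest predicts by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
```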

Gradient Boosting Machine:

Gradient Boosting Machines (GBMs) are a powerful ensemble learning method that
combines multiple weak prediction models, typically decision trees, to create a strong
predictive model. GBMs iteratively build an ensemble of models by optimizing a loss
function in a gradient descent manner, focusing on reducing the errors made by the
previous models in the ensemble. They are known for their effectiveness in a wide range
of machine learning tasks, including regression and classification.

Here are the key characteristics and concepts of Gradient Boosting Machines:

1. Boosting: GBMs belong to the boosting family of algorithms, where weak models are
sequentially trained to correct the mistakes of the previous models. Each subsequent model
in the ensemble focuses on reducing the errors made by the previous models, leading to an
ensemble with improved overall predictive performance.
2. Gradient Descent: GBMs optimize the ensemble by minimizing a differentiable loss
function using gradient descent. The loss function measures the discrepancy between the
predicted values and the true values of the target variable. Gradient descent updates the
model parameters in the direction of steepest descent to iteratively improve the model's
predictions.
3. Weak Learners: GBMs use weak learners as building blocks, typically decision trees
with a small depth (often referred to as "shallow trees" or "decision stumps"). These weak
learners are simple models that make predictions slightly better than random guessing.
They are usually shallow to prevent overfitting and to focus on capturing the specific
patterns missed by previous models.
4. Residuals: In GBMs, the subsequent weak learners are trained to predict the residuals
(the differences between the true values and the predictions of the ensemble so far). By
focusing on the residuals, the subsequent models are designed to correct the errors made
by the previous models and improve the overall prediction accuracy.
5. Learning Rate: GBMs introduce a learning rate parameter that controls the contribution
of each weak learner to the ensemble. A smaller learning rate makes the learning process
more conservative, slowing down the convergence but potentially improving the
generalization ability.
6. Regularization: To prevent overfitting, GBMs often include regularization techniques.
Common regularization methods include limiting the depth or complexity of the weak
learners, applying shrinkage (reducing the impact of each weak learner), and using
subsampling techniques to train each weak learner on a random subset of the data.


7. Feature Importance: GBMs can provide estimates of feature importance based on how
frequently and effectively they are used in the ensemble. This information helps identify
the most informative features for the task.
Gradient Boosting Machines, particularly popular implementations such as XGBoost, LightGBM, and CatBoost, have achieved state-of-the-art performance in
various machine learning competitions and real-world applications. They excel at handling
complex, high-dimensional data and have become an essential tool in the machine learning
practitioner's toolkit.
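The following hedged sketch shows the boosting idea with scikit-learn's GradientBoostingRegressor on synthetic data; every parameter value here (number of estimators, depth, learning rate, subsample fraction) is an illustrative assumption.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow trees (weak learners) are added sequentially; each new tree is fit
# to the residuals of the current ensemble, and learning_rate shrinks its
# contribution (regularization via shrinkage and subsampling).
gbm = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1,
    subsample=0.8, random_state=0)
gbm.fit(X_train, y_train)

print("test R^2 score:", gbm.score(X_test, y_test))
```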

Fundamentals of Machine Learning (Continued)

Four branches of machine learning


Basically, there are three common types of machine learning problems: binary classification, multiclass classification, and scalar regression. All of these are instances of supervised learning, where the goal is to learn the relationship between input variables and targets. Machine learning algorithms are broadly categorized into four branches, as follows:
1. Supervised learning
2. Unsupervised learning
3. Self-supervised learning
4. Reinforcement learning

1. Supervised Machine Learning


As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machines using a "labelled" dataset, and based on that training, the machine predicts the output. Here, the labelled data specifies that some of the inputs are already mapped to outputs. More precisely, we first train the machine with the input and corresponding output, and then we ask the machine to predict the output on the test dataset.


Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we provide training to the machine to understand the images, using features such as the shape and size of the tail of a cat and a dog, the shape of the eyes, colour, height (dogs are taller, cats are smaller), etc. After completion of training, we input a picture of a cat and ask the machine to identify the object and predict the output. The machine is now well trained, so it will check all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., and find that it is a cat, so it will put it in the cat category. This is how the machine identifies objects in supervised learning.
The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.

Categories of Supervised Machine Learning


Supervised machine learning can be classified into two types of problems, which are given
below:

• Classification
• Regression

a) Classification
Classification algorithms are used to solve the classification problems in which the output
variable is categorical, such as "Yes" or No, Male or Female, Red or Blue, etc. The
classification algorithms predict the categories present in the dataset. Some real-world
examples of classification algorithms are Spam Detection, Email filtering, etc.

Some popular classification algorithms are given below:

• Random Forest Algorithm


• Decision Tree Algorithm
• Logistic Regression Algorithm
• Support Vector Machine Algorithm
b) Regression
Regression algorithms are used to solve regression problems in which the output variable is a continuous numeric value. They are used to predict continuous output variables, such as market trends, weather forecasts, etc.
Some popular Regression algorithms are given below:

• Simple Linear Regression Algorithm


• Multivariate Regression Algorithm
• Decision Tree Algorithm
• Lasso Regression
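As a brief hedged illustration of the two problem types above, the sketch below fits one classifier and one regressor from the lists on synthetic data; the datasets are generated only for illustration.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: the output is categorical (here, one of two classes).
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
print("predicted class:", clf.predict(Xc[:1]))

# Regression: the output is a continuous value.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("predicted value:", reg.predict(Xr[:1]))
```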
Advantages and Disadvantages of Supervised Learning
Advantages:


• Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
• These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:

• These algorithms are not able to solve complex tasks.


• It may predict the wrong output if the test data is different from the training data.
• It requires lots of computational time to train the algorithm.
Applications of Supervised Learning
Some common applications of Supervised Learning are given below:

• Image Segmentation:
Supervised learning algorithms are used in image segmentation. In this process, image classification is performed on different image data with pre-defined labels.

• Medical Diagnosis:
Supervised algorithms are also used in the medical field for diagnosis purposes. It is done by using medical images and past labelled data with labels for disease conditions. With such a process, the machine can identify a disease for new patients.

• Fraud Detection - Supervised Learning classification algorithms are used for


identifying fraud transactions, fraud customers, etc. It is done by using historic data
to identify the patterns that can lead to possible fraud.
• Spam detection - In spam detection & filtering, classification algorithms are used.
These algorithms classify an email as spam or not spam. The spam emails are sent
to the spam folder.
• Speech Recognition - Supervised learning algorithms are also used in speech
recognition. The algorithm is trained with voice data, and various identifications
can be done using the same, such as voice-activated passwords, voice commands,
etc.
2. Unsupervised Machine Learning
Unsupervised learning is different from the Supervised learning technique; as its name
suggests, there is no need for supervision. It means, in unsupervised machine learning, the
machine is trained using the unlabeled dataset, and the machine predicts the output without
any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor


labelled, and the model acts on that data without any supervision.
The main aim of the unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand it more precisely: suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.
So, the machine will discover its own patterns and differences, such as colour difference and shape difference, and predict the output when it is tested with the test dataset.
Categories of Unsupervised Machine Learning
Unsupervised Learning can be further classified into two types, which are given below:

• Clustering
• Association
1) Clustering
The clustering technique is used when we want to find the inherent groups from the data.
It is a way to group the objects into a cluster such that the objects with the most
similarities remain in
one group and have fewer or no similarities with the objects of other groups. An example
of the clustering algorithm is grouping the customers by their purchasing behaviour.
Some of the popular clustering algorithms are given below:

• K-Means Clustering algorithm


• Mean-shift algorithm
• DBSCAN Algorithm
• Principal Component Analysis
• Independent Component Analysis
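As a hedged sketch of the customer-grouping example above, the snippet below applies the K-Means algorithm from the list to a handful of invented customers; the features and cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual_spend, visits_per_month].
customers = np.array([
    [200,  2], [220,  3], [250,  2],    # low spenders
    [900, 10], [950, 12], [880,  9],    # frequent high spenders
    [500,  5], [520,  6],               # mid-range customers
])

# Group the customers into 3 clusters based on similarity.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("cluster labels:", kmeans.labels_)
print("cluster centres:", kmeans.cluster_centers_)
```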
2) Association
Association rule learning is an unsupervised learning technique, which finds interesting
relations among variables within a large dataset. The main aim of this learning algorithm is
to find the dependency of one data item on another data item and map those variables
accordingly so that it can generate maximum profit. This algorithm is mainly applied in
Market Basket analysis, Web usage mining, continuous production, etc.
Some popular algorithms of association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.

Advantages and Disadvantages of Unsupervised Learning Algorithm


Advantages:


• These algorithms can be used for more complicated tasks compared to the supervised ones, because these algorithms work on the unlabeled dataset.
• Unsupervised algorithms are preferable for various tasks, as getting an unlabeled dataset is easier compared to a labelled dataset.
Disadvantages:

• The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithms are not trained with the exact output beforehand.
• Working with unsupervised learning is more difficult, as it works with unlabelled datasets that do not map to known outputs.
Applications of Unsupervised Learning
• Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright issues in document network analysis of text data for scholarly articles.
• Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.

• Anomaly Detection: Anomaly detection is a popular application of unsupervised


learning, which can identify unusual data points within the dataset. It is used to
discover fraudulent transactions.
• Singular Value Decomposition: Singular Value Decomposition or SVD is used to
extract particular information from the database. For example, extracting
information of each user located at a particular location.

3. Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning algorithm that lies between
Supervised and Unsupervised machine learning. It represents the intermediate ground
between Supervised (With Labelled training data) and Unsupervised learning (with no
labelled training data) algorithms and uses the combination of labelled and unlabeled
datasets during the training period.

Although semi-supervised learning is the middle ground between supervised and unsupervised learning and operates on data that contains a few labels, the data mostly consists of unlabeled examples. Labels are costly to obtain, so for corporate purposes there may be only a few of them. Semi-supervised learning is distinguished from supervised and unsupervised learning by this mix of the presence and absence of labels.

To overcome the drawbacks of supervised learning and unsupervised learning


algorithms, the concept of semi-supervised learning was introduced. The main aim of semi-supervised learning is to effectively use all the available data, rather than only the labelled data as in supervised learning. Initially, similar data is clustered using an

unsupervised learning algorithm, and this further helps to turn the unlabeled data into labelled data. This is because labelled data is comparatively more expensive to acquire than unlabeled data.

We can imagine these algorithms with an example. Supervised learning is like a student who studies under the supervision of an instructor at home and college. If that student analyses the same concept on their own without any help from the instructor, that comes under unsupervised learning. Under semi-supervised learning, the student revises the concept on their own after first studying it under the guidance of an instructor at college.
Advantages and disadvantages of Semi-supervised Learning
Advantages:

• The algorithm is simple and easy to understand.


• It is highly efficient.
• It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:

• Iteration results may not be stable.


• We cannot apply these algorithms to network-level data.
• Accuracy is low.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its surroundings by hit and trial, taking actions, learning from experience, and improving its performance. The agent gets rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning, there is no labelled data as in supervised learning, and agents learn from their experience only.
The reinforcement learning process is similar to how a human being learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the moves of the agent at each step define the states, and the goal of the agent is to get a high score. The agent receives feedback in terms of punishments and rewards.
Due to its way of working, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.

A reinforcement learning problem can be formalized using a Markov Decision Process (MDP). In an MDP, the agent constantly interacts with the environment and performs actions; with each action, the environment responds and generates a new state.
Categories of Reinforcement Learning


Reinforcement learning is categorized mainly into two types of methods/algorithms:

• Positive Reinforcement Learning: Positive reinforcement learning specifies


increasing the tendency that the required behavior would occur again by adding
something. It enhances the strength of the behavior of the agent and positively
impacts it.
• Negative Reinforcement Learning: Negative reinforcement learning works
exactly opposite to the positive RL. It increases the tendency that the specific
behaviour would occur again by avoiding the negative condition.
Real-world Use cases of Reinforcement Learning
• Video Games:
RL algorithms are very popular in gaming applications and are used to achieve super-human performance. Popular game-playing systems that use RL algorithms include AlphaGo and AlphaGo Zero.
• Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how to use RL in computers to automatically learn to allocate and schedule resources for waiting jobs in order to minimize average job slowdown.
• Robotics:
RL is widely used in robotics applications. Robots are used in industrial and manufacturing areas, and these robots are made more powerful with reinforcement learning. Different industries have a vision of building intelligent robots using AI and machine learning technology.
• Text Mining:
Text mining, one of the great applications of NLP, is being implemented with the help of reinforcement learning by Salesforce.
Advantages and Disadvantages of Reinforcement Learning
Advantages

• It helps in solving complex real-world problems that are difficult to solve with general techniques.
• The learning model of RL is similar to the learning of human beings; hence very accurate results can be found.
• It helps in achieving long-term results.
Disadvantage

• RL algorithms are not preferred for simple problems.


• RL algorithms require huge data and computations.
• Too much reinforcement learning can lead to an overload of states which can


weaken the results.
• The curse of dimensionality limits reinforcement learning for real physical systems.

Evaluating Machine learning Models

Machine learning is a field of study and application that focuses on developing algorithms
and models that enable computers to learn and make predictions or decisions without being
explicitly programmed. It involves the development of mathematical and statistical
techniques that allow systems to automatically learn patterns and relationships from data
and improve their performance through experience.
Once the model is trained, it is evaluated to measure its performance. The model is evaluated on data that it has never seen before. If the evaluation is done on the same data used for training, it leads to overfitting. Hence the data will be split into three sets.

Training, Validation and Test sets:


The data will be split into three sets: a training set, a validation set, and a test set. The model is trained using the training set and is evaluated using the validation set. This helps to fine-tune the model. One final test is conducted on the test set. We may wonder whether it could be done with just two sets, training and test, which would be much simpler. Yes, we can do that too, and many machine learning workflows follow that simpler scheme. However, the main goal of any machine learning model is to generalize well to never-before-seen data, and tuning is itself a form of learning: a search for a good configuration in some parameter space.
Tuning the model parameters based on its performance on the validation set can quickly lead to overfitting to the validation set. The central problem here is information leakage: every time the model is tuned based on its validation performance, some information about the validation set leaks into the model. If the validation is done many times, a significant amount of information leaks into the model.
In the end, the model will obviously work well on the validation set, but when new, never-before-seen data is given, its performance will decrease. We have to take enough care to evaluate our model on completely different data, not only on the validation set. This is the role of the test set. There are different techniques to divide the data into three sets. They are as follows:
1. Simple Hold-out validation,


2. K-fold validation,
3. Iterated K-fold validation with shuffling
SIMPLE HOLD-OUT VALIDATION:
Here the dataset is divided into two parts: a training set and a held-out validation set. The model is trained on the training set and tested on the validation set. To prevent information leaks, it is preferable to additionally keep a separate test set, i.e., to divide the data into three parts: training, validation, and test sets. Before starting the process, random shuffling can be done to mix the data well.

Figure 10: Hold-out validation split
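A hedged sketch of a simple hold-out split, roughly following the scheme in Figure 10; the validation fraction and shuffling seed are illustrative assumptions.

```python
import numpy as np

def holdout_split(data, validation_fraction=0.2, seed=0):
    """Shuffle the data, then hold out a fraction of it for validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(data))
    n_val = int(len(data) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return data[train_idx], data[val_idx]

data = np.arange(1000)
train_data, val_data = holdout_split(data)
print(len(train_data), "training samples,", len(val_data), "validation samples")
```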


Drawback:
Though this is a simple protocol, it suffers from one drawback: if the dataset is small, then we have only a few samples or records in the validation set, and hence the model may not be tuned well. This can be addressed with the help of K-fold validation and iterated K-fold validation.

K-FOLD VALIDATION
Here we split the data into K partitions of equal size. For each partition i, we train a model on the remaining K − 1 partitions and evaluate it on partition i. The same process is repeated K times. The final score of the model is the average of the K scores obtained. This is preferred when the model shows significant variance on the test split, so that we do not rely on a single fold as the validation set.


Figure 11: K-Fold Validation
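A hedged sketch of K-fold validation using scikit-learn's KFold splitter; the model choice and K = 3 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

scores = []
# Split the data into K partitions; train on K-1 of them, validate on the rest.
for train_idx, val_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

# The final score is the average of the K validation scores.
print("fold scores:", scores)
print("average validation score:", np.mean(scores))
```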


ITERATED K-FOLD VALIDATION WITH SHUFFLING
This one is for situations in which you have relatively little data available and you need to evaluate your model as precisely as possible. It consists of applying K-fold validation multiple times, shuffling the data every time before splitting it K ways. The final score is the average of the scores obtained at each run of K-fold validation.

Overfitting and Underfitting


Overfitting and underfitting are two common problems in machine learning that arise
when a model's performance on the training data does not generalize well to unseen
data. These issues affect the model's ability to make accurate predictions on new
examples. Understanding overfitting and underfitting is crucial for building reliable
and effective machine learning models
The central challenge in machine learning is that we must perform well on new, previously unseen inputs, not just those on which our model was trained. The ability to perform well on previously unseen inputs is called generalization.

We can compute some error measure on the training set, called the training error, and we reduce this training error. What separates machine learning from optimization is that we want the generalization error, also called the test error, to be low as well. The generalization error is defined as the expected value of the error on a new input.

We typically estimate the generalization error of a machine learning model by measuring its performance on a test set of examples that were collected separately from the training set. The test error can be computed using the MSE (Mean Squared Error) as follows:

1. Measure the distance of each observed y-value from the predicted value ŷ at each value of x: (y − ŷ)
2. Square each of these distances: (y − ŷ)²
3. Calculate the mean of the squared distances: MSE = (1/n) Σ (y − ŷ)²
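A minimal sketch of this computation on hypothetical observed and predicted values:

```python
import numpy as np

# Hypothetical observed targets and model predictions.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

# MSE = (1/n) * sum((y - y_hat)^2)
mse = np.mean((y_true - y_pred) ** 2)
print("mean squared error:", mse)
```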


The factors determining how well a machine learning algorithm will perform are its ability to:

1. Make the training error small.


2. Make the gap between training and test error small.

These two factors correspond to the two central challenges in machine learning: underfitting and overfitting. Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set; that means the model has not learned enough from the training data. Overfitting occurs when the gap between the training error and the test error is too large; in this case the model has learned the training set too closely, resulting in a low training error, but when new samples are given, the gap between the training error and the test error becomes large.

Capacity plays the major role in controlling underfitting and overfitting. Capacity is, informally, a model's ability to fit a wide variety of functions to the dataset. Models with low capacity may struggle to fit the training set. Models with high capacity can overfit.

To prevent a model from learning misleading or irrelevant patterns found in the training data, the best solution is to get more training data. A model trained on more data will naturally generalize better. When that isn't possible, the next-best solution is to put constraints on how much information the model can store; the process of fighting overfitting this way is called regularization. Let's review some of the most common regularization techniques:

1. Reducing the network’s size

The simplest way to prevent overfitting is to reduce the size of the model: the number of learnable parameters in the model, which is often referred to as its capacity.

2. Adding weight regularization -

A simple model in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters, as you saw in the previous section). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more regular. This is called weight regularization. It is done by adding to the network's loss function a cost associated with having large weights. This cost comes in two flavors:

 L1 regularization—The cost added is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).
 L2 regularization—The cost added is proportional to the square of the value of the weight coefficients (the L2 norm of the weights). L2 regularization is also called weight decay in the context of neural networks.
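A hedged Keras sketch of adding L2 weight regularization to the layers of a small network; the layer sizes, input shape, and the 0.001 regularization factor are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Each Dense layer adds 0.001 * sum(weight ** 2) to the total loss,
# penalizing large weights (L2 weight regularization / weight decay).
model = keras.Sequential([
    keras.Input(shape=(10000,)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```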

3. Adding dropout

Dropout is one of the most effective and most commonly used regularization
techniques for neural networks, developed by Geoff Hinton and his students at the
University of Toronto. Dropout, applied to a layer, consists of randomly dropping out
(setting to zero) a number of output features of the layer during training.


Let’s say a given layer would normally return a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a
given input sample during training. After applying dropout, this vector will have a
few zero entries distributed at random: for example, [0, 0.5, 1.3, 0, 1.1].
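A hedged Keras sketch of adding dropout layers to a small network; the 0.5 dropout rate and layer sizes are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Dropout(0.5) randomly zeroes out 50% of the previous layer's output
# features during training; at test time nothing is dropped.
model = keras.Sequential([
    keras.Input(shape=(10000,)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```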
