Machine Learning Report
Project Report
on
MACHINE LEARNING
taken at
“INTERNSHALA”
Bachelor of Technology
2021-22
Deepest thanks to our trainer, Mr. Sarvesh Agarwal, for his guidance,
good content, quality education, and for making such a good course for
understanding Machine Learning. He included quizzes and questions
after the topics throughout the course so that I could also check my
own learning.
Preface
Supervised Learning
In supervised learning, the computer is provided with example inputs
that are labeled with their desired outputs. The purpose of this method
is for the algorithm to be able to “learn” by comparing its actual
output with the “taught” outputs to find errors, and modify the model
accordingly. Supervised learning therefore uses patterns to predict
label values on additional unlabeled data.
A common use case of supervised learning is to use historical data to
predict statistically likely future events.
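To make this concrete, here is a minimal sketch of supervised learning in Python with scikit-learn (assumed available); the inputs and labels are made up for illustration.

from sklearn.linear_model import LogisticRegression

# Toy labeled data: each input (hours studied, hours slept) comes with
# its desired output label.
X = [[2, 9], [1, 5], [6, 8], [8, 6], [7, 7]]
y = [0, 0, 1, 1, 1]  # 1 = pass, 0 = fail

model = LogisticRegression()
model.fit(X, y)                 # "learn" by comparing predictions with labels
print(model.predict([[5, 7]]))  # predict a label for new, unseen input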
Unsupervised Learning
In unsupervised learning, data is unlabeled, so the learning algorithm
is left to find commonalities among its input data. As unlabeled data
are more abundant than labeled data, machine learning methods that
facilitate unsupervised learning are particularly valuable.
The goal of unsupervised learning may be as straightforward as
discovering hidden patterns within a dataset, but it may also have a
goal of feature learning, which allows the computational machine to
automatically discover the representations that are needed to classify
raw data.
Unsupervised learning is commonly used for transactional data. You
may have a large dataset of customers and their purchases, but as a
human you will likely not be able to make sense of what similar
attributes can be drawn from customer profiles and their types of
purchases. With this data fed into an unsupervised learning algorithm,
it may be determined that women of a certain age range who buy
unscented soaps are likely to be pregnant, and therefore a marketing
campaign related to pregnancy and baby products can be targeted to
this audience in order to increase their number of purchases.
Without being told a “correct” answer, unsupervised learning
methods can look at complex data that is more expansive and
seemingly unrelated in order to organize it in potentially meaningful
ways. Unsupervised learning is often used for anomaly detection,
such as spotting fraudulent credit card purchases, and for recommender
systems that suggest which products to buy next.
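As a hedged illustration of the customer-segmentation idea above, here is a minimal Python sketch using KMeans clustering from scikit-learn (assumed available); the customer data is invented.

from sklearn.cluster import KMeans

# Unlabeled data: (age, purchases per month); no "correct" answers given.
X = [[22, 3], [25, 4], [47, 1], [52, 2], [46, 1], [56, 2]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # the algorithm finds commonalities on its own
print(labels)                   # two customer segments, e.g. [1 1 0 0 0 0]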
Introduction to Bayesian
Decision Theory
Whether we are building Machine Learning models or making
decisions in everyday life, we always choose the path with the least
amount of risk. As humans, we are hardwired to take any action that
helps our survival; however, machine learning models are not initially
built with that understanding. These algorithms need to be trained and
optimized to choose the best option with the least amount of risk.
Additionally, it is important to know that some risky decisions can
lead to severe consequences if they are not correct.
Bayes’ Theorem
One of the most well-known equations in the world of statistics and
probability is Bayes’ Theorem (see formula below). The basic
intuition is that the probability of some class or event occurring,
given some feature (i.e., attribute), is calculated based on the
likelihood of the feature’s value and any prior information about the
class or event of interest. This seems like a lot to digest, so I will
break it down for you. First off, the case of cancer detection is a
two-class problem. The first class, ω₁, represents the event that a
tumor is present, and ω₂ represents the event that a tumor is not present.
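The formula itself is not reproduced in the text; written out in the notation of this section (for class ωᵢ given an observed feature x), Bayes’ Theorem reads:

P(ωᵢ | x) = p(x | ωᵢ) · P(ωᵢ) / p(x),  where  p(x) = p(x | ω₁)P(ω₁) + p(x | ω₂)P(ω₂)

Here P(ωᵢ) is the prior probability of the class, p(x | ωᵢ) is the likelihood of the feature’s value under that class, and P(ωᵢ | x) is the posterior probability we are after.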
Parametric vs Nonparametric
Methods in Machine Learning
Parametric Methods
In parametric methods, we typically make an assumption with regards
to the form of the function f. For example, you could make an
assumption that the unknown function f is linear. In other words, we
assume that the function is of the form f(X) = β₀ + β₁X₁ + … + βₚXₚ,
where f(X) is the unknown function to be estimated, β are the
coefficients to be learned, p is the number of independent variables
and X are the corresponding inputs.
Now that we have made an assumption about the form of the function
to be estimated and selected a model that aligns with this assumption,
we need a learning process that will eventually help us to train the
model and estimate the coefficients.
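As a minimal sketch of such a learning process (ordinary least squares, one common choice), assuming NumPy is available, with synthetic data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # p = 2 independent variables
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]  # true coefficients: β = (1, 2, -3)
y = y + rng.normal(scale=0.1, size=100)  # add a little noise

Xb = np.column_stack([np.ones(100), X])  # prepend a column of ones for β₀
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # estimate the coefficients
print(beta)                              # approximately [1.  2. -3.]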
Linear Regression
Linear Regression is a machine learning algorithm based on
supervised learning. It performs a regression task. Regression models
a target prediction value based on independent variables. It is mostly
used for finding out the relationship between variables and
forecasting. Different regression models differ based on the kind of
relationship between the dependent and independent variables they
consider, and the number of independent variables being used.
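A brief sketch of this with scikit-learn’s LinearRegression (assumed available), on made-up data:

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]          # independent variable
y = [2.1, 4.0, 6.2, 7.9]          # target values, roughly y = 2x

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # the learned relationship between variables
print(reg.predict([[5]]))         # forecasting for a new input, close to 10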
Hypothesis
Consider teaching a child to recognize animals. When we see a pig, we
shout “pig!” When it’s not a pig, we shout “no, not pig!” After doing
this several times with the child, we show them a picture and ask
“pig?” and they will correctly (most of the time) say “pig!” or “no,
not pig!” depending on what the picture is. That is supervised machine
learning.
[Image: similar data points typically exist close to each other]
Notice in the image above that most of the time, similar data points
are close to each other. The KNN algorithm hinges on this
assumption being true enough for the algorithm to be useful. KNN
captures the idea of similarity (sometimes called distance, proximity,
or closeness) with some mathematics we might have learned in our
childhood— calculating the distance between points on a graph.
Note: An understanding of how we calculate the distance between
points on a graph is necessary before moving on. If you are
unfamiliar with or need a refresher on how this
calculation is done, thoroughly read “Distance Between 2 Points” in
its entirety, and come right back.
There are other ways of calculating distance, and one way might be
preferable depending on the problem we are solving.
However, the straight-line distance (also called the Euclidean
distance) is a popular and familiar choice.
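A minimal sketch of KNN built on the Euclidean distance, in plain Python; the points and labels are invented for illustration.

import math
from collections import Counter

def euclidean(p, q):
    # Straight-line distance between two points on a graph.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k=3):
    # Vote among the k training points closest to the query point.
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (2, 1)))  # "A": similar points lie close together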
Random Forest
For a Random Forest to work well, the predictions from each tree must
have very low correlations with one another.
Below are some points that explain why we should use the Random
Forest algorithm:
<="" li="">
It takes less training time as compared to other algorithms.
It predicts output with high accuracy, and it runs efficiently even
on large datasets.
It can also maintain accuracy when a large proportion of data is
missing.
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data
points (Subsets).
Step-3: Choose the number N of decision trees that you want to
build.
Step-4: Repeat Step-1 and Step-2 until N trees are built.
Step-5: For new data points, find the predictions of each decision
tree, and assign the new data points to the category that wins the
majority vote. A minimal code sketch of these steps follows.
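This sketch uses scikit-learn (assumed available) and its bundled iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # N = 100 trees
forest.fit(X, y)              # builds each tree on a random (bootstrapped) subset
print(forest.predict(X[:3]))  # each prediction is decided by majority vote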
When making Decision Trees, there are several factors we must take
into consideration: On what features do we make our decisions?
What is the threshold for classifying each question into a yes or no
answer? In the first Decision Tree, what if we wanted to ask
ourselves whether we had friends to play with or not? If we have
friends, we will play every time. If not, we might continue to ask
ourselves questions about the weather. By adding an additional
question, we hope to better define the Yes and No classes.
An Introduction to Gradient
Boosting Decision Trees
Gradient boosting works by building simpler (weak) prediction
models sequentially where each model tries to predict the error left
over by the previous model. Because of this, the algorithm tends to
overfit rather quickly.
But what is a weak learning model? A model that does only slightly
better than random predictions.
This tutorial will take you through the concepts behind gradient
boosting and also through two practical implementations of the
algorithm:
Decision trees
A decision tree is a machine learning model that builds upon
iteratively asking questions to partition data and reach a solution. It is
the most intuitive way to zero in on a classification or label for an
object. Visually too, it resembles an upside-down tree with
protruding branches, hence the name.
For example, suppose you went hiking and saw an animal that you
couldn’t immediately recognize. You could later come home and ask
yourself a set of questions about its features, which could help you
decide exactly what species of animal you saw. A decision tree for
this problem would ask a similar sequence of questions; a minimal
sketch follows.
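Since the original image is not reproduced here, this is a hedged Python sketch with scikit-learn (assumed available); the animal features and species are invented.

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features (has fur, number of legs) and species labels.
X = [[1, 4], [1, 4], [0, 2], [0, 2], [1, 2]]
y = ["deer", "deer", "bird", "bird", "monkey"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["has_fur", "legs"]))  # the question tree
print(tree.predict([[1, 4]]))  # answering the questions leads to "deer"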
WHY BOOSTING?
In boosting, weak learners are used, which perform only slightly
better than random chance.
Boosting focuses on sequentially adding up these weak learners and
filtering out the observations that a learner gets correct at every step.
Basically, the stress is on developing new weak learners to handle the
remaining difficult observations at each step.
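A minimal sketch of sequential boosting with scikit-learn’s GradientBoostingRegressor (assumed available), on synthetic data:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
gbm = GradientBoostingRegressor(
    n_estimators=100,   # number of weak learners added sequentially
    max_depth=2,        # shallow trees: each only slightly better than chance
    learning_rate=0.1,  # small steps reduce the tendency to overfit quickly
)
gbm.fit(X, y)           # each new tree fits the error left by the previous ones
print(gbm.score(X, y))  # R^2 on the training data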
AdaBoost
In AdaBoost, the aim is to put stress on the instances that are
difficult to classify for every new weak learner. Further, the final
result is a weighted average of the outputs from all individual
learners, where the weights associated with the outputs are
proportional to each learner’s accuracy.
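A minimal AdaBoost sketch with scikit-learn (assumed available) on synthetic data:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)
ada = AdaBoostClassifier(n_estimators=50, random_state=0)  # 50 weak learners
ada.fit(X, y)           # each learner re-weights the hard-to-classify instances
print(ada.score(X, y))  # accuracy of the weighted combination of learners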
Overfitting
Overfitting in the model can only be detected once you test the model
on unseen data. To detect the issue, we can perform a train/test split.
In the train-test split of the dataset, we can divide our dataset into
random test and training datasets. We train the model with a training
dataset which is about 80% of the total dataset. After training the
model, we test it with the test dataset, which is 20% of the total
dataset.
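A minimal sketch of this 80/20 split with scikit-learn (assumed available):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% for training, 20% for testing

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_train, y_train))  # accuracy on the training dataset
print(model.score(X_test, y_test))    # a much lower score here signals overfitting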
Underfitting
Underfitting occurs when our machine learning model is not able to
capture the underlying trend of the data. It can happen when, to avoid
overfitting, the feeding of training data is stopped at too early a
stage, so the model does not learn enough from the training data. As a
result, it may fail to find the best fit for the dominant trend in the
data.
In the case of underfitting, the model is not able to learn enough from
the training data, and hence it reduces the accuracy and produces
unreliable predictions.
An underfitted model has high bias and low variance.
Introduction to Regularization
in Machine Learning
Regularization is the method of adding information in order to solve
an ill-posed problem or to prevent overfitting. It applies to objective
functions in ill-posed optimization problems. Often, a regression model
overfits the data it is training on. During the process of
regularization, we try to reduce the complexity of the regression
function without actually reducing the degree of the underlying
polynomial function. Regularization can be seen as a method to improve
the generalizability of a learned model. In this topic, we are going to
learn about Regularization in Machine Learning.
L2 Regularization, or Ridge Regularization, also adds a penalty to the
weights; however, the penalty here is the sum of the squared values of
the weights.
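As a hedged sketch of the effect of this penalty, comparing plain linear regression with Ridge (L2) regression in scikit-learn (assumed available), on synthetic data:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=30)  # only one feature truly matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)  # alpha scales the sum-of-squared-weights penalty

print(np.abs(plain.coef_).sum())  # total weight magnitude without regularization
print(np.abs(ridge.coef_).sum())  # shrunken weights: reduced model complexity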
Machine Learning finds applications in areas such as cybersecurity,
education, job opportunities, and search engines.
If the current state of ML is exciting, the near future of machine
learning opens up significantly more, and far more complex,
opportunities for technologists. Let us look at these one by one.
“Machine learning is the process of automatically getting insights
from data that can drive business value.”
– Lavanya Tekumalla
Gathering and preparing large volumes of data that the machine will
use to teach itself.
Feeding the data into ML models and training them to make the right
decisions through supervision and correction.
Deploying the model to make analytical predictions, or feeding it new
kinds of data to expand its capabilities.
CONCLUSION
This tutorial has introduced you to Machine Learning. Now, you
know that Machine Learning is a technique of training machines to
perform the activities a human brain can do, albeit a bit faster and
better than an average human being. Today we have seen that
machines can beat human champions in games such as chess and Go
(as AlphaGo did), which are considered very complex. You have seen
that machines can be trained to perform human activities in several
areas and can aid humans in living better lives.
Machine Learning can be Supervised or Unsupervised. If you have a
smaller amount of data that is clearly labelled for training, opt for
Supervised Learning. Unsupervised Learning would generally give
better performance and results for large data sets. If you have a huge
data set easily available, go for deep learning techniques. You have
also learned about Reinforcement Learning and Deep Reinforcement
Learning. You now know what Neural Networks are, along with their
applications and limitations.