
Gyan Ganga Institute of Technology and Sciences

Department of CSE-AIML
Subject Notes
AL 405 - Machine Learning
UNIT-I
Introduction to machine learning:
Machine learning is a tool for turning information into knowledge. Machine learning techniques are
used to automatically find the valuable underlying patterns within complex data that we would
otherwise struggle to discover. The hidden patterns and knowledge about a problem can be used to
predict future events and perform all kinds of complex decision making.
Tom Mitchell gave a “well-posed” mathematical and relational definition: “A computer program is
said to learn from experience E with respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with experience E.”
For Example:
A checkers learning problem:
Task(T): Playing checkers.
Performance measure (P): Percent of games won.
Training Experience (E): Playing practice games against itself.
Need For Machine Learning
• Ever since the technical revolution, we’ve been generating an immeasurable amount of data.
• With the availability of so much data, it is finally possible to build predictive models that can
study and analyse complex data to find useful insights and deliver more accurate results.
• Top Tier companies such as Netflix and Amazon build such Machine Learning models by using
tons of data in order to identify profitable opportunities and avoid unwanted risks.
ML vs AI vs DL

• Model: A model is the main component of Machine Learning. A model is trained by using a
Machine Learning Algorithm. An algorithm maps all the decisions that a model is supposed to
take based on the given input, in order to get the correct output.
• Predictor Variable: It is a feature (or set of features) of the data that can be used to predict the output.
• Response Variable: It is the feature or the output variable that needs to be predicted by using the
predictor variable(s).
• Training Data: The Machine Learning model is built using the training data. The training data
helps the model to identify key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can
predict an outcome. This is done with the testing data set; a minimal train/test split is sketched below.
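A minimal sketch of splitting a dataset into training and testing data, assuming scikit-learn is available; the Iris dataset and the 80/20 split are illustrative choices only:

```python
# A minimal train/test split sketch (assumes scikit-learn; Iris is illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # X: predictor variables, y: response variable

# Hold out 20% of the samples as testing data; train on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```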

Note: A Machine Learning process begins by feeding the machine lots of data. Using this data, the
machine is trained to detect hidden insights and trends, which are then used to build a Machine
Learning model by applying an algorithm, in order to solve a problem, as shown in the figure above.

• Increase in Data Generation: Due to the excessive production of data, we need a method that can be
used to structure, analyze and draw useful insights from it. This is where Machine Learning
comes in. It uses data to solve problems and find solutions to the most complex tasks faced
by organizations.
• Improve Decision Making: By making use of various algorithms, Machine Learning can be
used to make better business decisions.
For example, Machine Learning is used to forecast sales, predict downfalls in the stock
market, identify risks and anomalies, etc.
• Uncover patterns & trends in data: Finding hidden patterns and extracting key insights from
data is the most essential part of Machine Learning. By building predictive models and using
statistical techniques, Machine Learning allows you to dig beneath the surface and explore the
data at a minute scale. Understanding data and extracting patterns manually will take days,
whereas Machine Learning algorithms can perform such computations in less than a second.
• Solve complex problems: Machine Learning can be used to solve the most complex problems,
such as building self-driving cars.
Limitations
1. What algorithms exist for learning general target functions from specific training examples?
2. In what settings will a particular algorithm converge to the desired function, given sufficient
training data?
3. Which algorithm performs best for which types of problems and representations?
4. How much training data is sufficient?


5. When and how can prior knowledge held by the learner guide the process of generalizing from
examples?
6. What is the best way to reduce the learning task to one or more function approximation problems?
7. Machine Learning Algorithms Require Massive Stores of Training Data.
8. Labeling Training Data Is a Tedious Process.
9. Machines Cannot Explain Themselves.
Machine Learning Types:
A Machine can learn to solve a problem by any one of the following three approaches.
These are the ways in which a machine can learn:

Types of Machine Learning


Machine learning is sub-categorized into three types:
1. Supervised Learning – Train Me!
2. Unsupervised Learning – I am self-sufficient in learning
3. Reinforcement Learning – My life My rules! (Hit & Trial)
Supervised Learning
Supervised Learning is the one where you can consider that the learning is guided by a teacher. We have
a dataset which acts as a teacher, and its role is to train the model or the machine. Once the model gets
trained, it can start making a prediction or decision when new data is given to it.
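A minimal sketch of this teacher-guided process, assuming scikit-learn; the dataset and the choice of a k-nearest-neighbours classifier are illustrative:

```python
# A minimal supervised-learning sketch: the labelled dataset acts as the "teacher".
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)          # features with their known labels
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)                            # training guided by the labels

# Once trained, the model can predict the label of new, unseen data.
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))
```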

Figure 1.3 Supervised Learning


Unsupervised Learning
The model learns through observation and finds structures in the data. Once the model is given a
dataset, it automatically finds patterns and relationships in the dataset by creating clusters in it. What it
cannot do is add labels to the cluster; like it cannot say this a group of apples or mangoes, but it will
separate all the apples from mangoes.
Suppose we presented images of apples, bananas and mangoes to the model, so what it does, based on
some patterns and relationships it creates clusters and divides the dataset into those clusters. Now if a
new data is fed to the model, it adds it to one of the created clusters.
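A minimal clustering sketch of this idea, assuming scikit-learn; the 2-D points and the choice of k-means are illustrative:

```python
# A minimal unsupervised-learning sketch: clustering without any labels.
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled 2-D points; no class names such as "apple" or "mango" are given.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                # cluster indices only, not semantic labels
print(kmeans.predict([[5.0, 5.0]]))  # a new point is assigned to an existing cluster
```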

Figure 1.4 Unsupervised Learning


Reinforcement Learning
It is the ability of an agent to interact with the environment and find out what is the best outcome. It
follows the concept of hit and trial method. The agent is rewarded or penalized with a point for a correct or
a wrong answer, and on the basis of the positive reward points gained the model trains itself. And again

once trained it gets ready to predict the new data presented to it.
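A minimal reward-driven, hit-and-trial sketch in plain Python. The two-action environment and its reward probabilities are invented for illustration; this is a simple epsilon-greedy bandit agent, not a full reinforcement learning algorithm:

```python
# A minimal epsilon-greedy agent: explore occasionally, exploit what earned reward.
import random

true_win_prob = [0.3, 0.8]   # hidden reward probability of two actions (invented)
values = [0.0, 0.0]          # the agent's running reward estimate per action
counts = [0, 0]
epsilon = 0.1                # exploration rate

for step in range(1000):
    if random.random() < epsilon:
        action = random.randrange(2)                      # explore (hit & trial)
    else:
        action = max(range(2), key=lambda a: values[a])   # exploit best so far

    reward = 1 if random.random() < true_win_prob[action] else 0  # reward/penalty
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]  # update estimate

print(values)  # estimates tend toward [0.3, 0.8] as experience accumulates
```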

k-Fold Cross-Validation
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data
sample; it is a statistical method for estimating the skill of a machine learning model.

It is commonly used in applied machine learning to compare and select a model for a given predictive
modeling problem because it is easy to understand, easy to implement, and results in skill estimates that
generally have a lower bias than other methods.

The procedure has a single parameter called k that refers to the number of groups that a given data sample is
to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is
chosen, it may be used in place of k in the reference to the model, such as k=10 becoming 10-fold cross-
validation.

Cross-validation is primarily used in applied machine learning to estimate the skill of a machine learning
model on unseen data. That is, to use a limited sample in order to estimate how the model is expected to
perform in general when used to make predictions on data not used during the training of the model.

It is a popular method because it is simple to understand and because it generally results in a less biased or
less optimistic estimate of the model skill than other methods, such as a simple train/test split.

The general procedure is as follows:

1. Shuffle the dataset randomly.


2. Split the dataset into k groups
3. For each unique group:
1. Take the group as a hold out or test data set
2. Take the remaining groups as a training data set
3. Fit a model on the training set and evaluate it on the test set
4. Retain the evaluation score and discard the model
4. Summarize the skill of the model using the sample of model evaluation scores
Importantly, each observation in the data sample is assigned to an individual group and stays in that group
for the duration of the procedure. This means that each sample is given the opportunity to be used in the
hold-out set 1 time and used to train the model k-1 times.
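The procedure above can be sketched in a few lines, assuming scikit-learn; the dataset, the logistic-regression classifier, and k=10 are illustrative choices:

```python
# A minimal 10-fold cross-validation sketch following the four steps above.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=10, shuffle=True, random_state=1)  # steps 1 and 2

scores = []
for train_idx, test_idx in kf.split(X):                # step 3: each unique group
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])              # fit on the remaining groups
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Step 4: summarize the skill using the sample of evaluation scores.
print(sum(scores) / len(scores))
```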

What is Dimensionality Reduction?


In machine learning classification problems, there are often too many factors on the basis of
which the final classification is done. These factors are basically variables called features. The
higher the number of features, the harder it gets to visualize the training set and then work on it.
Sometimes, most of these features are correlated, and hence redundant. This is where
dimensionality reduction algorithms come into play. Dimensionality reduction is the process of
reducing the number of random variables under consideration, by obtaining a set of principal
variables. It can be divided into feature selection and feature extraction.
Why is Dimensionality Reduction important in Machine Learning and Predictive Modeling?
An intuitive example of dimensionality reduction can be discussed through a simple e-mail
classification problem, where we need to classify whether the e-mail is spam or not. This can
involve a large number of features, such as whether or not the e-mail has a generic title, the
content of the e-mail, whether the e-mail uses a template, etc. However, some of these features
may overlap. Similarly, a classification problem that relies on both humidity and
rainfall can be collapsed into just one underlying feature, since the two are
correlated to a high degree. Hence, we can reduce the number of features in such problems. A 3-D
classification problem can be hard to visualize, whereas a 2-D one can be mapped to a simple
2-dimensional space, and a 1-D problem to a simple line. The figure below illustrates this
concept, where a 3-D feature space is split into two 2-D feature spaces, and later, if found to be
correlated, the number of features can be reduced even further.

There are two components of dimensionality reduction:

 Feature selection: In this, we try to find a subset of the original set of
variables, or features, to get a smaller subset which can be used to model the
problem (a small sketch follows this list). It usually involves three approaches:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This reduces the data in a high-dimensional space to a
lower-dimensional space, i.e. a space with a smaller number of dimensions.
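A small sketch of the filter approach named above, assuming scikit-learn. SelectKBest with a univariate F-score is one common filter method; the dataset and the choice of k=2 are illustrative:

```python
# A minimal filter-style feature-selection sketch: keep the highest-scoring features.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 4 original features
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)   # keep the 2 features with the best scores

print(X.shape, X_reduced.shape)            # (150, 4) -> (150, 2)
```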

Methods of Dimensionality Reduction

The various methods used for dimensionality reduction include:

 Principal Component Analysis (PCA)


 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)

Dimensionality reduction may be either linear or non-linear, depending upon the
method used. The prime linear method, called Principal Component Analysis, or
PCA, is discussed below.

 Principal Component Analysis


This method was introduced by Karl Pearson. It works on the condition that while
data in a higher-dimensional space is mapped to data in a lower-dimensional space,
the variance of the data in the lower-dimensional space should be maximum.

It involves the following steps:


 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a
large fraction of variance of the original data.
Hence, we are left with a smaller number of eigenvectors, and there might have been
some data loss in the process. However, the most important variances should be
retained by the remaining eigenvectors.
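The three steps above can be sketched directly in NumPy; the random 3-D data and the choice to keep two components are illustrative:

```python
# A minimal NumPy sketch of the PCA steps listed above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 samples, 3 features
X = X - X.mean(axis=0)                     # mean-centre before the covariance step

cov = np.cov(X, rowvar=False)              # 1. covariance matrix of the data
eigvals, eigvecs = np.linalg.eigh(cov)     # 2. eigenvectors of this (symmetric) matrix

order = np.argsort(eigvals)[::-1]          # 3. rank by largest eigenvalue
components = eigvecs[:, order[:2]]         # keep the top-2 principal directions

X_reduced = X @ components                 # project 3-D data down to 2-D
print(X_reduced.shape)                     # (100, 2)
```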
Advantages of Dimensionality Reduction
 It helps in data compression, and hence reduced storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
Disadvantages of Dimensionality Reduction
 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes
undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some
rules of thumb are applied.
Features and Feature Vector

A vector is a series of numbers, like a matrix with one column but multiple rows, that can often be represented
spatially. A feature is a numerical or symbolic property of an aspect of an object. A feature vector is a vector
containing multiple elements about an object. Putting feature vectors for objects together can make up a feature space.

The features may represent, as a whole, one mere pixel or an entire image. The granularity depends on what someone is
trying to learn or represent about the object. You could describe a 3-dimensional shape with a feature vector indicating its
height, width, depth, etc.
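For instance, a minimal NumPy sketch of such feature vectors; the objects and their measurements are invented for illustration:

```python
# A minimal feature-vector sketch: 3-D shapes described by height, width, depth.
import numpy as np

box = np.array([12.0, 4.5, 3.0])    # one feature vector: [height, width, depth]
crate = np.array([10.0, 10.0, 10.0])

# Stacking feature vectors for several objects makes up a feature space (a matrix).
feature_space = np.stack([box, crate])
print(feature_space.shape)           # (2, 3): 2 objects, 3 features each
```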

Understanding Hypothesis
In most supervised machine learning algorithms, our main goal is to find a possible
hypothesis from the hypothesis space that could map the inputs to the proper
outputs.
The following figure shows the common method to find out the possible hypothesis from the
Hypothesis space:

Hypothesis Space (H):


Hypothesis space is the set of all possible legal hypotheses. This is the set from which the
machine learning algorithm determines the single best hypothesis that would best
describe the target function or the outputs.

Hypothesis (h):
A hypothesis is a function that best describes the target in supervised machine learning. The
hypothesis that an algorithm comes up with depends upon the data and also upon the
restrictions and bias that we have imposed on the data. To better understand the Hypothesis
Space and Hypothesis, consider the following coordinate plot that shows the distribution of some data:

Now suppose we have test data for which we have to determine the outputs or
results. The test data is as shown below:

We can predict the outcomes by dividing the coordinate plane as shown below:



So the test data would yield the following result:

But note here that we could have divided the coordinate plane as:

The way in which the coordinate plane is divided depends on the data, the algorithm and the constraints.

 All these legal possible ways in which we can divide the coordinate plane to predict the
outcome of the test data compose the Hypothesis Space.
 Each individual possible way is known as a hypothesis.
Hence, in this example, the hypothesis space consists of all such possible dividing boundaries.
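A minimal sketch of this idea, assuming scikit-learn: two differently restricted learners fit the same data, and each fitted model is one hypothesis from its own hypothesis space. The dataset and model choices are illustrative:

```python
# Two hypotheses for the same data, drawn from different hypothesis spaces.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

h1 = LogisticRegression(max_iter=1000).fit(X, y)  # hypothesis: a straight dividing line
h2 = DecisionTreeClassifier().fit(X, y)           # hypothesis: axis-aligned splits

# Both hypotheses label the same test point, but divide the plane differently.
print(h1.predict([[0.5, -1.0]]), h2.predict([[0.5, -1.0]]))
```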
