Machine Learning- UNIT I (1)

The document outlines a course on machine learning, detailing its characteristics, types of algorithms, and applications. It covers supervised, unsupervised, and reinforcement learning, as well as specific methods like K-Nearest Neighbors and clustering techniques. The course aims to equip learners with the ability to recognize, apply, and evaluate machine learning algorithms for real-world problems.

Course Outcome
1. Recognize the characteristics of machine learning that make it useful to real-world problems.
2. Use regularized regression and classification algorithms.
3. Evaluate machine learning algorithms and perform model selection.
4. Understand scalable machine learning and machine learning for IoT.
5. Understand deep learning and expert systems.
UNIT-I
Outline
• Introduction to Machine Learning
• Types of Machine Learning Algorithms
• Supervised Learning
• Unsupervised learning
• Reinforcement Learning
• Classification of Machine Learning Concept
• Distance Based Machine Learning Methods
• K-Nearest Neighbor (kNN)
Outline cont….
• Introduction to Clustering Techniques
• Possible Applications
• Requirements of clustering algorithm
• Problems associated with using Clustering Technique
• Types of Clustering Methods
• Clustering Strategies.
Introduction to Machine Learning
• The idea behind Machine Learning is that data is fed to machines to train them, letting them learn on their own without human intervention.

• The term machine learning was introduced in 1959 by Arthur Samuel, who described it as "the field of study that gives computers the ability to learn without being explicitly programmed". That means imbuing machines with knowledge without hard-coding it.

• “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” -Tom M. Mitchell.
Introduction to Machine Learning
• Traditional Programming: We feed in DATA (input) + PROGRAM (logic), run it on the machine, and get the output.

• Machine Learning: We feed in DATA (input) + OUTPUT, run it on the machine during training, and the machine creates its own program (logic), which can be evaluated during testing.
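The contrast above can be sketched in a few lines of Python. This is only an illustration under assumed data: the temperature task, the function names `is_hot_rule` and `learn_threshold`, and the sample values are hypothetical and not part of the course material.

```python
# Traditional programming: the programmer writes the rule (logic) by hand.
def is_hot_rule(temp_c):
    return temp_c >= 30  # threshold picked by the programmer


# Machine learning: inputs + known outputs go in, the rule comes out.
def learn_threshold(temps, labels):
    """Pick the cut-off that misclassifies the fewest training examples."""
    values = sorted(temps)
    best_t, best_errors = None, len(temps) + 1
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2  # candidate threshold between two neighbours
        errors = sum((x >= t) != y for x, y in zip(temps, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


temps = [10, 15, 22, 28, 31, 35]                  # DATA (input), made up
labels = [False, False, False, True, True, True]  # OUTPUT (known answers)
threshold = learn_threshold(temps, labels)        # the learned "program"
print(threshold)
```

The learned threshold plays the role of the "program" that the machine creates during training; it can then be checked on examples that were held back for testing.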
Introduction to Machine Learning
• In traditional programming, a programmer codes all the rules in consultation with an expert in the domain for which the software is being developed. Each rule is based on a logical foundation; the machine produces an output by following the logical statements. When the system grows complex, more rules need to be written, and it can quickly become unsustainable to maintain.
• In machine learning, the machine learns how the input and output data are correlated and writes the rule itself. The programmers do not need to write new rules each time there is new data. The algorithms adapt in response to new data and experiences to improve efficacy over time.
Machine learning is a diverse and exciting field, and there are multiple ways of defining it:

The Artificial Intelligence View: Learning is central to human knowledge and intelligence, and, likewise, it is also essential for building intelligent machines. Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial. For example, we humans are not born with the ability to understand language (we learn it), and it makes sense to try to have computers learn language instead of trying to program it all in.

The Software Engineering View: Machine learning allows us to program computers by example, which can be easier than writing code the traditional way.

The Stats (statistics) View: Machine learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems. Machine learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems. Machine learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).
History of Machine Learning
• In the 1940s, ENIAC (Electronic Numerical Integrator and Computer), an early manually operated electronic computer system, was built.
• At that time the word “computer” was used for a human who performed intensive numerical computations, so ENIAC was called a numerical computing machine!
• You may say it has nothing to do with learning. Wrong: from the beginning the idea was to build a machine able to emulate human thinking and learning.
History of Machine Learning
• In the 1950s, we see the first computer game program claiming to be able to beat the
checkers world champion. This program helped checkers players a lot in improving their skills!

• Around the same time, Frank Rosenblatt invented the Perceptron which was a very, very
simple classifier but when it was combined in large numbers, in a network, it became a powerful
monster.

• Thanks to statistics, machine learning became very famous in the 1990s. The intersection
of computer science and statistics gave birth to probabilistic approaches in AI. This shifted the
field further toward data-driven approaches. Having large-scale data available, scientists started
to build intelligent systems that were able to analyze and learn from large amounts of data.
Need of Machine Learning
• Machine learning is a field that grew out of artificial intelligence (AI). Through the application of AI, humans built better and more intelligent machines, but they were unable to program them for more complex and constantly evolving challenges.

• Then came the realization that the only way to achieve more advanced tasks was to let the machine learn from its own input.

• So machine learning was developed as a new capability for computers, and it is now present in so many segments of technology that we may not even realize it's there.
Need of Machine Learning
• Some data sets are so massive that the human brain needs help finding patterns, and this is where machine learning swings into action.

• If big data and cloud computing are gaining importance for their contributions,
machine learning also deserves recognition for helping data scientists analyze large
chunks of data via an automated process that saves time and effort.

• The techniques used for data mining have been around for years, but they're not
effective without the power to run algorithms. When you run deep learning with
access to better data, the output leads to dramatic advances, which is why there's
a need for machine learning.
Why Machine Learning is so important?
Working Process of Machine Learning
Machine Learning Applications
• There are many uses of Machine Learning in various fields. These fields have different applications of Supervised, Unsupervised and Reinforcement learning.
Types of Machine Learning
Machine learning can be classified into 3 types of algorithms:
▪ Supervised Learning
▪ Unsupervised Learning
▪ Reinforcement Learning
Supervised Learning
• In supervised learning, algorithms are trained using labeled data, where the input and the output are known.

• The data is fed into the learning algorithm as a set of inputs along with the corresponding outputs, and the algorithm learns by comparing its actual output with the correct outputs to find errors. It then modifies the model accordingly.

• The raw data is divided into two parts, i.e. training data and testing data.
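As a minimal sketch of the loop described above (split the labeled data, compare predictions with known outputs, adjust the model), the following fits a single slope to made-up data. The data, the learning rate, the split fraction and the function names `train_test_split` and `fit_slope` are assumptions chosen for illustration.

```python
import random


def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Shuffle the labeled examples and hold back a fraction for testing."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_slope(train, lr=0.01, epochs=200):
    """Learn w in the model y = w * x by repeatedly correcting errors."""
    w = 0.0
    for _ in range(epochs):
        for x, y in train:
            error = w * x - y    # compare the prediction with the known output
            w -= lr * error * x  # modify the model to reduce that error
    return w


data = [(x, 2.0 * x) for x in range(1, 9)]  # labeled data: input, output
train, test = train_test_split(data)
w = fit_slope(train)
test_error = max(abs(w * x - y) for x, y in test)  # evaluate on unseen data
print(w, test_error)
```

The test examples are never used during fitting, so the test error gives an estimate of how the model behaves on new data.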
Supervised Learning
• Supervised learning uses patterns in the labeled data to predict the label values of additional, unlabeled data.

• This method is commonly used in applications where historical data predicts upcoming events.

• Ex:- It can anticipate when transactions are likely to be fraudulent or which insurance customer is expected to file a claim.
Supervised Learning Algorithm
Types of Supervised learning
• Classification: A classification problem is when the output variable is a category, such as “red” or “blue” or “disease” and “no disease”.

• Regression: A regression problem is when the output variable is a real value, such as “dollars” or “weight”.
Unsupervised Learning
• Unsupervised Learning is the second type of machine learning, in which unlabeled data are used to train the algorithm, which means it is used on data that has no historical labels.

• The algorithm must figure out what it is being shown. The purpose is to explore the data and find some structure within.
Unsupervised Learning
• In unsupervised learning the data is unlabeled, and the raw information is fed directly to the algorithm without pre-processing of the data and without knowing the output of the data.
• The algorithm figures out the structure of the data and, according to the data segments, it makes clusters of data with new labels.
• This learning technique works well on transactional data.
• These algorithms are also used to segment text topics, recommend items and identify data outliers.
Unsupervised Learning
Types of Unsupervised learning
• Clustering: A clustering technique finds similarities between data points, groups similar data points together, and figures out which cluster new data should belong to.

• Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
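A rule such as "people that buy X also tend to buy Y" is usually scored with two numbers, support and confidence. The sketch below computes both on a tiny, made-up list of shopping baskets; the basket contents and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Of the transactions containing x, the fraction that also contain y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # how common the pair is
print(confidence(baskets, {"bread"}, {"butter"}))  # strength of bread -> butter
```

Here bread and butter appear together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain butter.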
Reinforcement Learning
• A reinforcement learning algorithm, or agent, learns by interacting with its environment.

• The agent receives rewards for performing correctly and penalties for performing incorrectly. The agent learns without intervention from a human by maximizing its reward and minimizing its penalty.

• The objective is for the agent to take actions that maximise the expected reward over a given measure of time. The agent will reach the goal much quicker by following a good policy. So the purpose of reinforcement learning is to learn the best policy (plan).
Reinforcement Learning
• It trains algorithms using a system of reward and punishment, and is closely related to dynamic programming.
• With reinforcement learning, the algorithm discovers through trial and error which actions yield the most significant rewards.
• Reinforcement learning is frequently used for robotics, gaming, and navigation.
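The trial-and-error idea can be illustrated with a very small agent that chooses among a few actions and keeps a running estimate of the reward of each. This is a simplified epsilon-greedy sketch, not a full reinforcement learning algorithm; the fixed reward table, `epsilon`, and the function name `run_agent` are assumptions made for the example.

```python
import random


def run_agent(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn by trial and error which action yields the highest reward."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions  # the agent's current belief per action
    counts = [0] * n_actions
    total = 0.0
    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:                                  # exploit the best known action
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rewards[action]               # feedback from the environment
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, total


# Hypothetical environment: a negative reward acts as a penalty.
estimates, total = run_agent([-1.0, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, best_action)
```

Because the agent mostly exploits the action with the highest estimate and only occasionally explores, the total reward grows faster than it would with purely random choices.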
Summary of all Machine Learning Types
• Supervised Learning – Train Me!

• Unsupervised Learning – I am self sufficient in learning

• Reinforcement Learning – My life My rules! (Hit & Trial)
Classification of Machine Learning Concept
Distance Based Machine Learning Methods
• Distance-based algorithms are machine learning algorithms that classify data points by computing distances between the new data points and a number of internally stored data points.
• The stored data points that are closest to the new data point have the largest influence on the classification assigned to it.
• Among distance-based machine learning algorithms, the K-Nearest Neighbor algorithm is the simplest algorithm used for classification.
Distance Based Machine Learning Methods
• Similarity measure
• Euclidean space
• Euclidean distance
• Manhattan distance
• Minkowski distance
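The three distance measures listed above can be written directly from their standard definitions. The function names are chosen for this sketch; the Minkowski distance with p = 1 gives the Manhattan distance and with p = 2 gives the Euclidean distance.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalization of both: p = 1 is Manhattan, p = 2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


a, b = (0, 0), (3, 4)
print(euclidean(a, b), manhattan(a, b), minkowski(a, b, 3))
```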
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbor is a type of Supervised Machine Learning algorithm.

• The K-NN algorithm can be used for both Regression and Classification, however it is mainly used for Classification problems.

• The K-NN algorithm uses ‘feature similarity’ between the new data point and the available data points, and puts the new data point into the category whose members are most similar to it.

• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
K-Nearest Neighbor (KNN) Algorithm
• The following two properties define KNN well −
• Non-parametric learning algorithm − KNN is a non-parametric learning algorithm because it does not assume anything about the underlying data.
• Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase and uses all the training data at classification time.
• At the training phase the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category that is most similar to the new data.
K-Nearest Neighbor (KNN) Algorithm
With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
K-Nearest Neighbor (KNN) Algorithm
• The working of the K-NN algorithm:

• Step-1: Select the number K of neighbors.

• Step-2: Calculate the Euclidean distance from the new data point to each stored data point.

• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

• Step-4: Among these K neighbors, count the number of data points in each category.

• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.

• Step-6: Our model is ready.


• Suppose we have a new data point and we need to put it in the required category. Consider the below image:
• Firstly, we will choose the number of neighbors, so we will choose k=5.
• Next, we will calculate the Euclidean distance between the new data point and the existing data points.
• The Euclidean distance is the straight-line distance between two points. For two points (x1, y1) and (x2, y2) it can be calculated as: d = √((x2 − x1)² + (y2 − y1)²)
• By calculating the Euclidean distance we get the five nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B.

• Consider the below image:

• As we can see, 3 of the 5 nearest neighbors are from category A, hence this new data point must belong to category A.
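The K-NN steps can be sketched as a short Python function. The training points below are made up so that, as in the example above, a new point with k=5 has three neighbors in category A and two in category B; the function name `knn_classify` and the coordinates are assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # Step 2: Euclidean distance from the new point to every stored point.
    distances = [(math.dist(features, new_point), label)
                 for features, label in training]
    # Step 3: keep the k nearest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count how many neighbours fall in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: assign the category with the most neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical stored data: (features, category).
training = [
    ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((1.0, 2.5), "A"),
    ((5.0, 5.0), "B"), ((5.5, 4.5), "B"), ((4.5, 5.5), "B"),
]

print(knn_classify(training, (3.0, 3.0), k=5))
```

Note that there is no separate training step: the function only stores and searches the data, which is what makes K-NN a lazy learner.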
How to select the value of K in the K-NN Algorithm?
• Below are some points to remember while selecting the value of K in the K-NN algorithm:

• There is no particular way to determine the best value for "K", so we need to try some values to find the best out of them. A commonly used default value for K is 5.

• A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.

• Larger values of K reduce the effect of noise, but they can blur the boundaries between categories and increase the computation.
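One simple way to "try some values" of K is leave-one-out evaluation: classify each stored point using all the others and count how often the prediction matches the label. The sketch below is an assumption about how this could be done, not a procedure from the course; the data set includes one deliberately mislabeled point to show why K=1 is sensitive to noise.

```python
import math
from collections import Counter


def predict(training, point, k):
    """Majority vote among the k stored points nearest to `point`."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Classify each point from all the others and report the hit rate."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += predict(rest, point, k) == label
    return correct / len(data)


# Hypothetical data: two groups plus one noisy point labeled "A" inside group B.
data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"), ((1.1, 1.3), "A"),
    ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B"), ((4.1, 4.3), "B"),
    ((4.05, 4.05), "A"),
]

scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With K=1 the single noisy point misleads several of its neighbors, while K=3 and K=5 outvote it.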
Advantages and Disadvantages of KNN Algorithm
• Advantages:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.

• Disadvantages:
• The value of K always needs to be determined, which may be complex sometimes.
• The computation cost is high because the distance between the new data point and all the training samples must be calculated.
Introduction to Clustering Techniques
• Clustering is basically a type of unsupervised learning method.

• Unsupervised learning is a method where inferences are drawn from datasets consisting of input data without labelled responses.

• Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.
Introduction to Clustering Techniques
• Clustering is the task of dividing the objects or data points into a number of groups such that data points in the same group are more similar to each other and more dissimilar to the data points in the other groups.

• It does this by finding similar patterns in the unlabeled dataset, such as shape, size, color, behavior, etc., and divides the data points as per the presence and absence of those similar patterns.

• It is basically a collection of objects on the basis of similarity and dissimilarity between them.
Introduction to Clustering Techniques
Example 1: Here the clusters are distinguishable, and we can identify that there are 3 clusters in the below picture.
Example 2: Imagine there are a number of objects in a basket. Each item has a distinct set of features (size, shape, color, etc.). The task is to select features of each object in the basket and divide the objects into groups accordingly.
Possible Applications
• Clustering Algorithms can be applied in many fields, for instance:

1. Marketing: Finding groups of customers with similar behavior given a large database of customer data containing their properties and past buying records.

2. Biology: Classification of plants and animals given their features.

3. City-Planning: Helps in the identification of groups of houses in a city according to house type, value, and geographic location.

4. Libraries: It is used in clustering different books on the basis of topics and
Possible Applications
5. WWW: Clustering also helps in classifying documents on the web for information discovery.

6. Insurance: It is used to understand customers and their policies, and to identify fraud.

7. Data mining: Cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.

8. Earthquake studies: By learning the earthquake-affected areas we
Requirements of Clustering Algorithm
• Scalability − Need highly scalable clustering algorithms to deal with large databases.
• Minimal requirements for domain knowledge to determine input parameters − Many clustering algorithms require users to input certain parameters in cluster analysis. Parameters are often difficult to determine, especially for data sets containing high-dimensional objects. This not only burdens users, but it also makes the quality of clustering difficult to control.
Requirements of Clustering Algorithm
• Ability to deal with different kinds of attributes − Algorithms should be capable of being applied to any kind of data, such as interval-based (numerical), categorical, and binary data.

• Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be bound only to distance measures that tend to find spherical clusters of small size.

• Insensitivity to the order of input records − Some clustering algorithms cannot incorporate newly inserted data. Some clustering algorithms are sensitive to the order of input data. It is important to develop algorithms that are insensitive to the order of input.
Requirements of Clustering Algorithm
• High dimensionality − The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional spaces.

• Ability to deal with noisy data − Databases contain noisy, missing or incorrect data. Some algorithms are sensitive to such data and may lead to poor quality clusters.

• Interpretability − The clustering results should be interpretable, comprehensible, and usable.
Problems Associated with using Clustering Technique
1. Current clustering techniques do not address all the requirements adequately (and concurrently);

2. Dealing with large number of dimensions and large number of data items can be problematic because of time complexity;

3. The effectiveness of the method depends on the definition of “distance” (for distance-based clustering);

4. If an obvious distance measure doesn’t exist we must “define” it, which is not always easy, especially in multi-dimensional spaces;

5. The result of the clustering algorithm (that in many cases can be arbitrary itself) can be interpreted in different ways.
Types of Clustering Methods
• Clustering methods are used to identify groups of similar objects in multivariate data sets collected from fields such as marketing, bio-medical and geo-spatial.
• There are different types of clustering methods, including:
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Grid-based methods
• Model-based methods
• Subspace methods
• Graph based methods
Partitioning Method
• From a data set of n objects, a partitioning method constructs k partitions of the data, where each partition represents a cluster and k ≤ n.
• That is, it divides the data into k groups such that each group must contain at least one object.
• In other words, partitioning methods conduct one-level partitioning on data sets.
• To find clusters with complex shapes and for very large data sets,
partitioning-
Partitioning Method
• Algorithms under Partitioning Method:
• K-means: The main objective of the K-Means algorithm is to minimize the sum of squared distances between the points and their respective cluster centroid.
• K-medoids: A medoid can be defined as the point in the cluster whose total dissimilarity to all the other points in the cluster is minimum.
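A minimal sketch of both ideas, assuming Euclidean distance: the K-means loop alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points, and the medoid is found by a direct search. The starting centroids, the fixed iteration count and the sample points are assumptions for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def medoid(cluster):
    """The cluster member with the smallest total distance to the others."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 8)])
print(centroids)
print(medoid(clusters[0]))
```

Unlike a centroid, a medoid is always one of the actual data points, which makes K-medoids less sensitive to extreme values.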
Hierarchical methods
• A hierarchical clustering method works by grouping data objects into a tree of clusters.

• A hierarchical method can be classified into Agglomerative method and Divisive method, based on how the hierarchical decomposition is formed.

• The agglomerative approach, also called the bottom-up approach, starts with each object forming a separate group. It successively merges the objects or groups close to one another, until all the groups are merged into one (the topmost level of the hierarchy), or a termination condition holds.

• The divisive approach, also called the top-down approach, starts with all the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually each object is in its own cluster, or a termination condition holds.
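The agglomerative (bottom-up) approach can be sketched as follows. This version assumes single linkage (the distance between two groups is the distance between their closest members) and uses a target number of clusters as the termination condition; both choices, and the sample points, are assumptions for the example.

```python
import math


def agglomerative(points, target_clusters):
    """Bottom-up clustering: repeatedly merge the two closest groups."""
    clusters = [[p] for p in points]        # each object starts as its own group
    while len(clusters) > target_clusters:  # termination condition
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge group j into group i
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, target_clusters=3))
```

Setting `target_clusters=1` merges everything into the topmost level of the hierarchy.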
Hierarchical methods
• Algorithms under Hierarchical method:
• BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): BIRCH is designed for clustering a large amount of numeric data by integrating hierarchical clustering and other clustering methods such as iterative partitioning.

• Chameleon: Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to determine the similarity between pairs of clusters.
Density-based methods

• To discover clusters with arbitrary shape, density-based clustering methods have been developed.
• The clusters are modeled as dense regions in the data space, separated by sparse regions.
Density-based methods

• This is the main strategy behind density-based clustering methods, which can discover clusters of non-spherical shape.

• These methods consider the clusters as dense regions having some similarity and differing from the lower-density regions of the space.

• These methods have good accuracy and the ability to merge two clusters.
Density-based methods
• Algorithms under Density-based method:
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN grows clusters according to a density-based connectivity analysis.
• OPTICS (Ordering Points To Identify Clustering Structure): OPTICS extends DBSCAN to produce a cluster ordering obtained from a wide range of parameter settings.
• DENCLUE (DENsity-based CLUstEring): It clusters objects based on a set of density distribution functions.
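To make "growing clusters by density-based connectivity" concrete, here is a compact sketch in the spirit of DBSCAN. It is a simplified illustration, not a tuned implementation: the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point), the brute-force neighbor search and the sample points are assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE        # not dense enough to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                 # grow the cluster through dense points
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:
                queue.extend(more)   # j is dense too, so keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The isolated point is reported as noise rather than being forced into a cluster, and the number of clusters does not have to be chosen in advance.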
Grid-based methods
• The grid-based clustering approach uses a multi-resolution grid data structure.

• It quantizes the object space into a finite number of cells that form a grid structure on which all of the operations for clustering are performed.

• The main advantage of the approach is its fast processing time, which is typically independent of the number of data objects, yet dependent on only the number of cells in each dimension in the
Grid-based methods
• Algorithms under Grid-based method:
• STING (Statistical Information Grid): The algorithm can be used on spatial queries. The spatial area is divided into rectangular cells, which are represented by a hierarchical structure. Statistical information regarding the attributes in each grid cell is pre-computed and stored.
• CLIQUE (CLustering In QUEst): It is a simple grid-based method for finding density-based clusters in subspaces.
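The basic grid idea, quantizing the object space into cells and then working on per-cell statistics instead of individual points, can be sketched as below. This is not STING or CLIQUE themselves; the cell size, the density threshold `min_count` and the sample points are assumptions for illustration.

```python
from collections import Counter


def grid_cells(points, cell_size):
    """Quantize 2-D points into grid cells and count the points per cell."""
    return Counter((int(x // cell_size), int(y // cell_size))
                   for x, y in points)


def dense_cells(points, cell_size, min_count):
    """Cells holding at least min_count points; later steps use only these."""
    counts = grid_cells(points, cell_size)
    return {cell for cell, n in counts.items() if n >= min_count}


points = [(0.2, 0.3), (0.7, 0.9), (0.5, 0.1),
          (3.1, 3.2), (3.8, 3.9),
          (7.5, 0.5)]
print(grid_cells(points, cell_size=1.0))
print(dense_cells(points, cell_size=1.0, min_count=2))
```

Once the counts are computed, further processing touches only the cells, which is why the cost depends on the number of cells rather than on the number of data objects.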
Model-based methods
• Model-based method is based on probability models, such as the finite mixture model for probability densities.

• In other words, in model-based clustering, it is assumed that the data are generated by a mixture of probability distributions in which each component represents a different cluster.

• Thus a particular clustering method can be expected to work well when the data conform to the model.
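The finite mixture model mentioned above is commonly written as follows. The symbols are chosen for this note and do not come from the slides: K is the number of components (clusters), \pi_k is the mixing weight of component k, and f_k is its probability density with parameters \theta_k.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, f_k(x \mid \theta_k),
\qquad \pi_k \ge 0,
\qquad \sum_{k=1}^{K} \pi_k = 1
```

Under this model, an object x is typically assigned to the component k for which \pi_k f_k(x \mid \theta_k) is largest.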
Subspace methods
• A subspace method searches various subspaces for clusters. Here, a cluster is a subset of objects that are similar to each other in a subspace. The similarity is often captured by conventional measures such as distance or density. For example, the CLIQUE algorithm is a subspace method algorithm.

• Generally there are two kinds of strategies:

• Bottom-up approaches start from low-dimensional subspaces and search higher dimensional subspaces.

• Top-down approaches start from the full space and search smaller and smaller subspaces recursively. Top-down approaches are effective only if the subspace of a cluster can be determined by the local neighborhood.
Graph based methods
• To find clusters in a graph, visualize cutting the graph into pieces, each piece being a cluster, such that the vertices within a cluster are well connected and the vertices in different clusters are connected in a much weaker way.

• The size of the cut is the number of edges in the cut set. For weighted graphs, the size of a cut is the sum of the weights of the edges in the cut set.

• In graph theory and some network applications, a minimum cut is of importance. A cut is minimum if the cut’s size is not greater than any other cut’s size. There are polynomial time algorithms to compute minimum cuts of graphs.

• There are two kinds of methods for clustering graph data, which address some challenges such as high computational cost, high dimensionality, sparsity, etc. One uses clustering methods for high-dimensional data, while the other is designed specifically for clustering graphs.
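The size of a cut in a weighted graph can be computed directly from the definition above. The edge list and vertex names below are made up for illustration; the function only evaluates a given cut and does not search for the minimum one.

```python
def cut_size(edges, group):
    """Sum of the weights of edges with exactly one endpoint in `group`."""
    group = set(group)
    return sum(w for u, v, w in edges if (u in group) != (v in group))


# Hypothetical weighted graph: (vertex, vertex, weight).
edges = [
    ("a", "b", 3), ("b", "c", 2), ("a", "c", 4),  # tightly connected piece
    ("c", "d", 1),                                # weak link between pieces
    ("d", "e", 5), ("e", "f", 2), ("d", "f", 3),  # second piece
]

print(cut_size(edges, {"a", "b", "c"}))  # cut through the weak link
print(cut_size(edges, {"a"}))            # cut that isolates a single vertex
```

The cut that separates the two well-connected pieces is much smaller than the cut that isolates one vertex, which matches the intuition that good clusters are separated by weak connections.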


END OF UNIT I
