Machine Learning Unit 4
STUDY MATERIALS
FOR
VI SEMESTER BCA
Learning
Introduction:
Arthur Samuel, an early American leader in the field of computer gaming and
artificial intelligence, coined the term “Machine Learning” in 1959 while at
IBM. He defined machine learning as “the field of study that gives computers
the ability to learn without being explicitly programmed.” However, there is no
universally accepted definition for machine learning. Different authors define
the term differently.
Definition of learning
Definition: A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks
in T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given
classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands
recorded while observing a human driver
1. Data storage
Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize
data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved
using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory
and similar devices to store data and use cables and other technology to
retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This
involves creating general concepts about the data as a whole. The creation
of knowledge involves application of known models and creation of new
models.
The process of fitting a model to a dataset is known as training. When
the model has been trained, the data is transformed into an abstract form
that summarizes the original information.
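For instance, a minimal sketch of training (fitting) a model and then applying the learned abstraction, assuming scikit-learn is available; the hours-studied/exam-score data is made up purely for illustration:

# A minimal sketch of "training": fitting a model so that the raw data is
# summarized by an abstract form (here, a slope and an intercept).
from sklearn.linear_model import LinearRegression

# Illustrative data: hours studied -> exam score
X = [[1], [2], [3], [4], [5]]
y = [52, 57, 61, 68, 74]

model = LinearRegression()
model.fit(X, y)                       # training: fit the model to the dataset

print(model.coef_, model.intercept_)  # the learned abstraction of the data
print(model.predict([[6]]))           # apply the abstraction to an unseen input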
3. Generalization
The third component of the learning process is known as generalisation.
The term generalization describes the process of turning the knowledge
about stored data into a form that can be utilized for future action. These
actions are to be carried out on tasks that are similar, but not identical, to
those that have been seen before. In generalization, the goal is to
discover those properties of the data that will be most relevant to future
tasks.
4. Evaluation
Evaluation is the last component of the learning process. It is the process
of giving feedback to the user to measure the utility of the learned
knowledge. This feedback is then utilized to effect improvements in the
whole learning process.
Forms of learning: At a high level, machine learning tasks can be categorized
into three groups based on the desired output and the kind of input required to
produce it.
Supervised Learning
A training set of examples with the correct responses (targets) is provided and,
based on this training set, the algorithm generalizes to respond correctly to all
possible inputs. This is also called learning from exemplars. Supervised learning
is the machine learning task of learning a function that maps an input to an
output based on example input-output pairs.
Remarks
Supervised learning is so called because the process of an algorithm
learning from the training dataset can be thought of as a teacher supervising the
learning process. We know the correct answers (that is, the correct outputs); the
algorithm iteratively makes predictions on the training data and is corrected by
the teacher. Learning stops when the algorithm achieves an acceptable level of
performance.
Example
Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients and each patient is labeled as
“healthy” or “sick”.
Broadly, there are two types of supervised learning problems.
1) Regression: The output to be predicted is a continuous number for a
given input dataset. Example use cases are prediction of retail sales,
prediction of the number of staff required for each shift, the number of car
parking spaces required for a retail store, the credit score of a customer, etc.
2) Classification: The output to be predicted is the actual or the probability
of an event/class and the number of classes to be predicted can be two or
more. The algorithm should learn the patterns in the relevant input of
each class from historical data and be able to predict the unseen class or
event in the future considering their input. An example use case is spam
email filtering where the output expected is to classify an email into
either a “spam” or “not spam.”
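A minimal sketch contrasting the two types, assuming scikit-learn; the toy data and the model choices (linear regression, logistic regression) are illustrative only:

from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous number (e.g., sales from advertising spend)
X_reg = [[10], [20], [30], [40]]
y_reg = [110, 205, 310, 398]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[25]]))           # a continuous value

# Classification: predict a class label (e.g., spam = 1 / not spam = 0)
X_clf = [[0.1], [0.4], [0.35], [0.8], [0.9]]   # e.g., fraction of "spammy" words
y_clf = [0, 0, 0, 1, 1]
clf = LogisticRegression().fit(X_clf, y_clf)
print(clf.predict([[0.7]]))          # a discrete class label
print(clf.predict_proba([[0.7]]))    # or the probability of each class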
Unsupervised Learning
There are situations where the desired output class/event is unknown for
historical data. The objective in such cases would be to study the patterns in the
input dataset to get better understanding and identify similar patterns that can be
grouped into specific classes or events. As these types of algorithms do not
require any intervention from the subject matter experts beforehand, they are
called unsupervised learning.
Unsupervised learning is a type of machine learning algorithm used to draw
inferences from
datasets consisting of input data without labeled responses. In unsupervised
learning algorithms, a classification or categorization is not included in the
observations. There are no output values and so there is no estimation of
functions. Since the examples given to the learner are unlabeled, the accuracy of
the structure that is output by the algorithm cannot be evaluated. The most
common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
Example
Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients.
Based on this data, can we infer anything regarding the patients entering the
clinic?
• Dimension Reduction: When a dataset has many variables, to simplify the
analysis you may want to find the key variables that hold a significant
percentage (say 95%) of the information and only use them for analysis.
• Anomaly Detection: Anomaly detection, also commonly known as
outlier detection, is the identification of items, events or observations
which do not conform to an expected pattern or behavior in comparison
with other items in a given dataset. It has applicability in a variety of
domains, such as machine or system health monitoring, event detection,
fraud/intrusion detection, etc. (a minimal sketch follows below).
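A minimal anomaly detection sketch, using scikit-learn's IsolationForest as one possible outlier detector; the algorithm choice and the toy data are assumptions for illustration, not prescribed by the material:

from sklearn.ensemble import IsolationForest

# Mostly "normal" points clustered near (10, 10), plus one obvious outlier
X = [[10, 10], [10.2, 9.9], [9.8, 10.1], [10.1, 10.2], [50, 50]]

detector = IsolationForest(contamination=0.2, random_state=0)
detector.fit(X)

# predict() returns +1 for inliers and -1 for anomalies
print(detector.predict(X))           # the last point should stand out as -1
print(detector.predict([[48, 52]]))  # a new unseen point, also far from the pattern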
Reinforcement Learning
The basic objective of reinforcement learning algorithms is to map situations to
actions that yield the maximum final reward. While mapping the action, the
algorithm should not just consider the immediate reward but also the next and all
subsequent rewards. For example, a program to play a game or drive a car will
have to constantly interact with a dynamic environment in which it is expected
to achieve a certain goal.
Example
Consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did
that made it get the reward/punishment. We can use a similar method to train
computers to do many tasks, such as playing backgammon or chess, scheduling
jobs, and controlling robot limbs. Reinforcement learning is different from
supervised learning. Supervised learning is learning from examples provided by
a knowledgeable expert.
Decision Tree Learning
Here the target attribute PlayTennis, which can have values yes or no for
different Saturday mornings, is to be predicted based on other attributes of the
morning in question.
The figure above shows a decision tree for the concept PlayTennis. An
example is classified by sorting it through the tree to the appropriate leaf
node, then returning the classification associated with this leaf (in this case,
Yes or No). This tree classifies Saturday mornings according to whether or
not they are suitable for playing tennis.
For example, the instance (Outlook = Sunny, Temperature = Hot, Humidity =
High, Wind = Strong) would be sorted down the leftmost branch of this decision
tree and would therefore be classified as a negative instance (i.e., the tree
predicts that PlayTennis = No).
For example, the decision tree shown in the figure above corresponds to the
expression
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
Decision tree learning is generally best suited to problems with the following
characteristics:
• Instances are represented by attribute-value pairs: Instances are
described by a fixed set of attributes (e.g., Temperature) and their values
(e.g., Hot). The easiest situation for decision tree learning is when each
attribute takes on a small number of disjoint possible values (e.g., Hot,
Mild, Cold).
• The target function has discrete output values: The decision tree in
the figure above assigns a boolean classification (e.g., yes or no) to each
example. Decision tree methods easily extend to learning functions with
more than two possible output values. A more substantial extension
allows learning target functions with real-valued outputs, though the
application of decision trees in this setting is less common.
• Disjunctive descriptions may be required: As noted above, decision
trees naturally represent disjunctive expressions.
• The training data may contain errors: Decision tree learning methods
are robust to errors, both errors in classifications of the training examples
and errors in the attribute values that describe these examples.
• The training data may contain missing attribute values: Decision tree
methods can be used even when some training examples have unknown
values (e.g., if the Humidity of the day is known for only some of the
training examples).
An example of a decision tree can be explained using the above binary tree. Let's
say you want to predict whether a person is fit given information such as age,
eating habits, and physical activity. The decision nodes here are questions
like 'What is the age?', 'Does he exercise?', and 'Does he eat a lot of pizza?',
and the leaves are outcomes such as 'fit' or 'unfit'. In this case this
was a binary classification problem (a yes/no type problem). There are two main
types of Decision Trees: classification trees, where the predicted outcome is a
discrete class, and regression trees, where the predicted outcome is a continuous
value.
Entropy
Intuitively, entropy tells us about the predictability of a certain event. For example,
consider a coin toss whose probability of heads is 0.5 and probability of tails is 0.5.
Here the entropy is the highest possible, since there is no way of determining what
the outcome might be. Alternatively, consider a coin which has heads on both
sides; the outcome of such an event can be predicted perfectly, since we know
beforehand that it will always be heads. In other words, this event has no
randomness, hence its entropy is zero. In particular, lower values imply less
uncertainty while higher values imply high uncertainty.
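For reference, the quantities used in the calculations below are entropy and information gain; the original equations did not survive extraction, so these are the usual ID3 definitions:
H(S) = − Σ_c p(c) · log2 p(c)
IG(S, x) = H(S) − Σ_v (|S_v| / |S|) · H(S_v)
where p(c) is the proportion of examples in S belonging to class c, the sum over v runs over the possible values of attribute x, and S_v is the subset of S for which attribute x takes the value v.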
ID3
The ID3 algorithm performs the following tasks recursively:
1. Create a root node for the tree
2. If all examples are positive, return the leaf node 'positive'
3. Else if all examples are negative, return the leaf node 'negative'
4. Calculate the entropy of the current state, H(S)
5. For each attribute x, calculate the entropy with respect to that attribute,
denoted by H(S, x)
6. Select the attribute which has the maximum value of IG(S, x)
7. Remove the attribute that offers the highest IG from the set of attributes
8. Repeat until we run out of attributes, or the decision tree has all leaf
nodes.
Now we'll go ahead and grow the decision tree. The initial step is to calculate
H(S), the entropy of the current state.
In the above example, we can see that in total there are 5 No's and 9 Yes's.
Here, 'x' denotes the possible values of an attribute. The attribute 'Wind' takes
two possible values in the sample data, hence x = {Weak, Strong}, and we'll have to
calculate:
Amongst all the 14 examples we have 8 places where the wind is Weak and 6
where the wind is Strong.
Now, out of the 8 Weak examples, 6 of them were 'Yes' for Play Golf and 2 of
them were 'No' for Play Golf. So, we have,
Remember, here half the items belong to one class while the other half belong to the
other; hence we have perfect randomness (the entropy of the Strong subset is 1).
Now we have all the pieces required to calculate the Information Gain,
which tells us that considering 'Wind' as the feature
gives an information gain of 0.048.
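As a check, a small Python sketch that reproduces these numbers from the counts quoted above (14 examples: 9 Yes / 5 No; Wind = Weak: 6 Yes / 2 No; Wind = Strong: 3 Yes / 3 No):

from math import log2

def entropy(counts):
    # counts: number of examples in each class, e.g. (9, 5)
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

H_S = entropy((9, 5))            # entropy of the full set, about 0.940
H_weak = entropy((6, 2))         # Wind = Weak subset, about 0.811
H_strong = entropy((3, 3))       # Wind = Strong subset, exactly 1.0

# Weighted average entropy after splitting on Wind, then the information gain
H_S_wind = (8 / 14) * H_weak + (6 / 14) * H_strong
IG_wind = H_S - H_S_wind
print(round(IG_wind, 3))         # about 0.048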
Now we must similarly calculate the Information Gain for all the features.
We can clearly see that IG(S, Outlook) has the highest information gain of
0.246, hence we choose the Outlook attribute as the root node. At this point, the
decision tree looks like this.
Here we observe that whenever the outlook is Overcast, Play Golf is always
'Yes'. This is no coincidence: the simple tree results because the
highest information gain is given by the attribute Outlook. Now how do we
proceed from this point? We can simply apply recursion; you might want
to look at the algorithm steps described earlier. Now that we've used Outlook,
we've got three attributes remaining: Humidity, Temperature, and Wind. And we
had three possible values of Outlook: Sunny, Overcast, and Rain. The
Overcast node already ended up being the leaf node 'Yes', so we're left with two
subtrees to compute: Sunny and Rain.
Carrying out the same calculation for the remaining subtrees
will give us Wind as the attribute with the highest information gain. The final
Decision Tree looks something like this.
Regression:
Regression Analysis in Machine learning
Regression analysis is a statistical method to model the relationship between a
dependent (target) variable and one or more independent (predictor)
variables. More specifically, regression analysis helps us to
understand how the value of the dependent variable is changing corresponding
to an independent variable when other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:
suppose a company records its advertising spend and the corresponding sales for
several past years. Now, the company wants to spend $200 on advertisement in the
year 2019 and wants to know the prediction about the sales for this year. To solve
such prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict the continuous output
variable based on one or more predictor variables. It is mainly used for
prediction, forecasting, time series modeling, and determining the causal-
effect relationship between variables.
In regression, we plot a graph between the variables which best fits the given
data points; using this plot, the machine learning model can make predictions
about the data. In simple words, "Regression shows a line or curve that passes
through all the data points on the target-predictor graph in such a way that the
vertical distance between the data points and the regression line is minimum."
The distance between the data points and the line tells whether a model has captured a
strong relationship or not.
Some examples of regression can be as:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.
• Multicollinearity: Multicollinearity (a high correlation among the independent
variables) should not be present in the dataset, because it
creates problems while ranking the most affecting variable.
• Underfitting and Overfitting: If our algorithm works well with the
training dataset but not well with the test dataset, then such a problem is called
overfitting. And if our algorithm does not perform well even with the
training dataset, then such a problem is called underfitting.
Types of Regression:
There are various types of regressions which are used in data science and
machine learning. Each type has its own importance in different scenarios, but
at the core, all the regression methods analyze the effect of the independent
variables on the dependent variable.
Here we are discussing some important types of regression which are given
below:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
Linear Regression:
• Linear regression is a statistical regression method which is used for
predictive analysis.
• It is one of the simplest and easiest algorithms; it works on
regression and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
• If there is only one input variable (x), then such linear regression is called
simple linear regression. And if there is more than one input variable,
then such linear regression is called multiple linear regression.
• The relationship between variables in the linear regression model can be
explained using the below image. Here we are predicting the salary of an
employee on the basis of years of experience.
Mathematically, the relationship can be written as Y = a + bX, where
Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
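A minimal sketch of fitting such a line, assuming scikit-learn; the years-of-experience and salary numbers are made up for illustration:

from sklearn.linear_model import LinearRegression

# Illustrative data: years of experience -> salary (in thousands)
X = [[1], [2], [3], [5], [7]]
y = [25, 30, 36, 47, 60]

model = LinearRegression().fit(X, y)
a, b = model.intercept_, model.coef_[0]   # the learned line Y = a + bX
print(a, b)
print(model.predict([[4]]))               # predicted salary for 4 years of experience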
Logistic Regression:
• Logistic regression is another supervised learning algorithm which is
used to solve the classification problems. In classification problems, we
have dependent variables in a binary or discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as
0 or 1, Yes or No, True or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of
probability.
• Logistic regression is a type of regression, but it differs from the
linear regression algorithm in terms of how it is used.
• Logistic regression uses the sigmoid function (also called the logistic
function) to map predicted values to probabilities. This sigmoid function is used
to model the data in logistic regression. The function can be represented as:
f(x) = 1 / (1 + e^(-x)), where
f(x) = output between the 0 and 1 value,
x = input to the function,
e = base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as
follows:
It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0.
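A minimal sketch of the sigmoid and the threshold idea in plain Python; the 0.5 threshold is the common default, used here only for illustration:

from math import exp

def sigmoid(x):
    # logistic function: squashes any real number into the range (0, 1)
    return 1 / (1 + exp(-x))

def classify(x, threshold=0.5):
    # values above the threshold are mapped to class 1, otherwise class 0
    return 1 if sigmoid(x) > threshold else 0

print(sigmoid(0))        # 0.5, the midpoint of the S-curve
print(classify(2.3))     # 1
print(classify(-1.7))    # 0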
Assumptions of Linear Regression:
• Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent
and independent variables.
• Small or no multicollinearity between the features:
Multicollinearity means high-correlation between the independent
variables. Due to multicollinearity, it may be difficult to find the true
relationship between the predictors and target variables. Or we can say, it
is difficult to determine which predictor variable is affecting the target
variable and which is not. So, the model assumes either little or no
multicollinearity between the features or independent variables.
• Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the
values of independent variables. With homoscedasticity, there should be
no clear pattern distribution of data in the scatter plot.
• Normal distribution of error terms:
Linear regression assumes that the error term should follow the normal
distribution pattern. If error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which
may cause difficulties in finding coefficients. This can be checked using the
q-q plot. If the plot shows a straight line without any deviation, it means
the error is normally distributed.
• No autocorrelations:
The linear regression model assumes no autocorrelation in the error terms. If
there is any correlation in the error terms, then it will drastically
reduce the accuracy of the model. Autocorrelation usually occurs if there
is a dependency between residual errors.
Artificial Neural Networks
For certain types of problems, such as learning to interpret complex
real-world sensor data, artificial neural networks are among the most effective
learning methods currently known.
Figure 2 describes three neurons that perform the "AND" logical operation. In this
case, the output neuron will fire only if both input neurons fire. The output
neuron uses a threshold value (T), T = 3/2 in this case. If none or only one input
neuron fires, then the total input to the output neuron is less than 1.5 and
the output cannot fire. Take another scenario where both input neurons
are firing, and the total input becomes 1+1=2, which is greater than the
threshold value of 1.5, then output neurons will fire. Similarly, we can perform
the "OR” logical operation with the help of the same architecture but set the
new threshold to 0.5. In this case, the output neurons will be fired if at least one
input is fired.
Figure: two input neurons, each with weight 1, feeding an output neuron with threshold T = 3/2.
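A minimal sketch of this threshold neuron in Python, using the weights of 1 and the thresholds of 3/2 and 1/2 described above:

def threshold_neuron(inputs, weights, T):
    # fires (outputs 1) when the weighted sum of inputs exceeds the threshold T
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > T else 0

def AND(x1, x2):
    return threshold_neuron([x1, x2], [1, 1], T=1.5)

def OR(x1, x2):
    return threshold_neuron([x1, x2], [1, 1], T=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))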
This process is known as the training of neural networks. These trained neural
networks solve specific problems as defined in the problem statement.
Types of tasks that can be solved using an artificial neural network include
classification problems, pattern matching, data clustering, etc.
The weighted total is passed as an input to an activation function to
produce the output. Activation functions decide whether a node should fire or
not. Only those nodes that fire pass their signal on to the output layer. There are
distinctive activation functions available that can be applied depending upon the
sort of task we are performing.
• Assurance of proper network structure: There is no particular rule for
determining the structure of an artificial neural network; the
appropriate network structure is accomplished through experience, trial,
and error.
• Unrecognized behavior of the network: This is the most significant issue
of ANNs. When an ANN produces a solution, it does not provide
insight concerning why and how, which decreases trust in the network.
• Hardware dependence: Artificial neural networks need processors with
parallel processing power, as per their structure. Therefore, realization of the
network depends on suitable equipment.
• Difficulty of showing the issue to the network: ANNs can work only with
numerical data. Problems must be converted into numerical values before
being introduced to the ANN. The presentation mechanism chosen
here will directly impact the performance of the network. It relies on the
user's abilities.
• The duration of the network is unknown: The network is reduced to a
specific value of the error, and this value does not give us optimum
results.
“Artificial neural networks, which entered the world in the
mid-20th century, are developing exponentially. In the present time, we
have examined the advantages of artificial neural networks and the issues
encountered in the course of their utilization. It should not be overlooked
that the drawbacks of ANNs, which are a flourishing science branch,
are being eliminated one by one, while their advantages are increasing day by
day. This means that artificial neural networks will progressively become an
irreplaceable part of our lives.”
Types of Artificial Neural Networks: The majority of artificial neural networks have
some similarities with a more complex biological counterpart and are very effective
at their expected tasks, for example, segmentation or classification.
• Feedback ANN: In this type of ANN, the output returns into the network
to accomplish the best-evolved results internally. As per the University of
Massachusetts Lowell Centre for Atmospheric Research, feedback
networks feed information back into themselves and are well suited to solving
optimization issues. Internal system error corrections utilize feedback
ANNs.
• Feed-Forward ANN: A feed-forward network is a basic neural network
comprising an input layer, an output layer, and at least one hidden layer of
neurons. By assessing its output in relation to its input, the
strength of the network can be observed based on the group behavior of the
associated neurons, and the output is decided. The primary advantage of
this network is that it figures out how to evaluate and recognize input
patterns.
Multilayer Perceptron (MLP)
In a multilayer perceptron with a single hidden layer, each unit applies an
activation function to a weighted sum of its inputs to produce the layer's output.
A multilayered neural network can have many hidden layers, where the network
holds its internal abstract representation of the training sample. The upper layers
will be building new abstractions on top of the previous layers. So having more
hidden layers for a complex dataset will help the neural network to learn better.
As shown in the figure above, the MLP architecture has a minimum of three layers:
input, hidden, and output layers. The input layer's neuron count will be equal to
the total number of features and in some libraries an additional neuron for
intercept/bias. These neurons are represented as nodes. The output layers will
have a single neuron for regression models and binary classifier; otherwise it
will be equal to the total number of class labels for multiclass classification
models.
Using too few neurons for a complex dataset can result in an under-fitted model,
because it might fail to learn the patterns in complex data. However,
using too many neurons can result in an over-fitted model, as it has the capacity to
capture patterns that might be noise or specific to the given training dataset.
Moreover, such an activation function does not have a helpful derivative, as its
derivative is 0 everywhere. Therefore, it does not work with backpropagation, a
fundamental and valuable concept in multilayer perceptrons.
The most popular neural network activation functions are described below.
Binary Step Function: Binary step function depends on a threshold value that
decides whether a neuron should be activated or not. The input fed to the
activation function is compared to a certain threshold; if the input is greater than
it, then the neuron is activated, else it is deactivated, meaning that its output is
not passed on to the next hidden layer.
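A minimal sketch of the binary step activation in Python; the threshold value used here (0) is chosen purely for illustration:

def binary_step(x, threshold=0.0):
    # activate (output 1) only when the input exceeds the threshold
    return 1 if x > threshold else 0

print(binary_step(2.5))    # 1: the neuron is activated
print(binary_step(-0.3))   # 0: the neuron is deactivated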
In an autoencoder, one of the key components is:
• A loss function, which is a distance function to measure the
information loss between the compressed representation of the data
and the decompressed representation. The reconstruction error can be
measured using the traditional squared error ||x − z||².
CNN consists of four main types of layers: input layer, convolution layer,
pooling layer, fully connected layer.
The input layer will hold the raw pixel, so an image of CIFAR-10 will have
32x32x3 dimensions of input layer. The convolution layer will compute a dot
product between the weights of small local regions from the input layer, so if
we decide to have 5 filters the resulted reduced dimension will be 32x32x5. The
RELU layer will apply an element-wise activation function that will not affect
the dimension. The Pool layer will down sample the spatial dimension along
width and height, resulting in dimension 16x16x5. Finally, the fully connected
layer will compute the class score, and the resulted dimension will be a single
vector 1x1x10 (10 class scores). Each neuron in this layer is connected to all
numbers in the previous volume.
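A minimal sketch of this layer stack for a 32x32x3 CIFAR-10 image, written with the Keras API as one possible implementation; the 3x3 kernel and 2x2 pooling sizes are assumptions for illustration (the text only specifies 5 filters and the resulting dimensions):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(5, (3, 3), padding="same", activation="relu",
                  input_shape=(32, 32, 3)),   # input 32x32x3, convolution + ReLU -> 32x32x5
    layers.MaxPooling2D((2, 2)),              # pooling: downsample to 16x16x5
    layers.Flatten(),
    layers.Dense(10),                         # fully connected layer: 10 class scores
])
model.summary()                               # prints the layer-by-layer output shapes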
Recurrent Neural Network (RNN)
The previous step’s hidden layer and final outputs are fed back into the network
and will be used as input to the next steps’ hidden layer, which means the
network will remember the past and it will repeatedly predict what will happen
next. The drawback of the general RNN architecture is that it can be memory
heavy and hard to train for long-term temporal dependency (i.e., when the context
of a long text should be known at any given stage).
An LSTM cell controls its memory through the following components:
• Input gate layer: This decides which values to store in the cell state.
• Forget gate layer: As the name suggests, this decides what information
to throw away from the cell state.
• Output gate layer: This creates a vector of values that can be added to the cell
state.
• Memory cell state vector.
Support Vector Machine (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put the
new data point in the correct category in the future. This best decision boundary
is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed a Support Vector Machine. Consider the below diagram in which there
are two different categories that are classified using a decision boundary or
hyperplane:
Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs. If we want a model that can accurately identify whether it is a cat or a
dog, such a model can be created by using the SVM algorithm. We will first
train our model with lots of images of cats and dogs so that it can learn about
the different features of cats and dogs, and then we test it with this strange
creature. The support vector machine creates a decision boundary between these
two classes (cat and dog) and chooses the extreme cases (support vectors); it will
see the extreme cases of cats and dogs. On the basis of the support vectors, it
will classify the creature as a cat. Consider the below diagram:
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data, which
means that if a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable
data, which means that if a dataset cannot be classified by using a straight
line, then such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries to segregate the
classes in n-dimensional space, but we need to find out the best decision
boundary that helps to classify the data points. This best boundary is known as
the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset,
which means that if there are 2 features (as shown in the image), then the
hyperplane will be a straight line. And if there are 3 features, then the hyperplane
will be a 2-dimensional plane.
We always create a hyperplane that has a maximum margin, which means the
maximum distance between the hyperplane and the data points.
Support Vectors: The data points or vectors that are closest to the
hyperplane and which affect the position of the hyperplane are termed
support vectors. Since these vectors support the hyperplane, they are called
support vectors.
Since it is a 2-d space, by just using a straight line we can easily separate these
two classes. But there can be multiple lines that can separate these classes.
Consider the below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this
best boundary or region is called a hyperplane. The SVM algorithm finds the
closest points of the lines from both classes. These points are called support
vectors. The distance between the vectors and the hyperplane is called
the margin, and the goal of SVM is to maximize this margin.
The hyperplane with maximum margin is called the optimal hyperplane.
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:
So to separate these data points, we need to add one more dimension. For linear
data, we have used the two dimensions x and y, so for non-linear data, we will add
a third dimension z. It can be calculated as: z = x² + y²
By adding the third dimension, the sample space will become as shown in the
below image. So now, SVM will divide the datasets into classes in the following
way. If we convert it back into 2-d space with z = 1, the boundary becomes a
circle of radius 1 around the origin.
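A minimal sketch of this idea, assuming scikit-learn: it builds a small ring-shaped toy dataset and shows that an SVM with a non-linear (RBF) kernel can separate it, while the explicit feature z = x² + y² makes the same data linearly separable:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Inner cluster (class 0) and a surrounding ring (class 1): not separable by a straight line
angles = rng.uniform(0, 2 * np.pi, 50)
inner = rng.normal(0, 0.3, (50, 2))
outer = np.c_[2 * np.cos(angles), 2 * np.sin(angles)] + rng.normal(0, 0.1, (50, 2))
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

# A non-linear SVM (RBF kernel) handles this directly
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                     # close to 1.0 on this toy data

# Equivalently, the hand-made feature z = x^2 + y^2 makes the data linearly separable
z = (X ** 2).sum(axis=1).reshape(-1, 1)
print(SVC(kernel="linear").fit(z, y).score(z, y))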
Unsupervised Machine Learning:
Introduction to clustering
Why use Unsupervised Learning?
Below are some main reasons which describe the importance of unsupervised
learning:
• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is much closer to how a human learns to think from their
own experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which
makes unsupervised learning more important.
• In the real world, we do not always have input data with the corresponding
output, so to solve such cases we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized
and corresponding outputs are also not given. Now, this unlabeled input data is
fed to the machine learning model in order to train it. Firstly, it will interpret the
raw data to find the hidden patterns in the data and then will apply suitable
algorithms such as k-means clustering, decision trees, etc. Once it applies the
suitable algorithm, the algorithm divides the data objects into groups according
to the similarities and differences between the objects.
The unsupervised learning algorithm can be further categorized into two types
of problems: clustering and association.
K-Means Clustering
One of the most used clustering algorithms is k-means. It allows us to group the
data according to the existing similarities among them into k clusters, where k is
given as input to the algorithm. Let's start with a simple example.
Let’s imagine we have 5 objects (say 5 people) and for each of them we know
two features (height and weight). We want to group them into k=2 clusters.
First of all, we have to initialize the value of the centroids for our clusters. For
instance, let’s choose Person 2 and Person 3 as the two centroids c1 and c2, so
that c1=(120,32) and c2=(113,33).
Now we compute the Euclidean distance between each of the two centroids and
each point in the data. If you did all the calculations, you should have come up
with the following numbers:
At this point, we will assign each object to the cluster it is closer to (that is
taking the minimum between the two computed distances for each object).
We can then arrange the points as follows:
Person 1 → cluster 1
Person 2 → cluster 1
Person 3 → cluster 2
Person 4 → cluster 1
Person 5→ cluster 2
Let’s iterate, which means to redefine the centroids by calculating the mean of
the members of each of the two clusters.
So c’1 = ((167+120+175)/3, (55+32+76)/3) = (154, 54.3) and c’2 =
((113+108)/2, (33+25)/2) = (110.5, 29)
Then, we calculate the distances again and re-assign the points to the new
centroids. We repeat this process until the centroids don’t move anymore (or the
difference between them is
under a certain small threshold).
In our case, the result we get is given in the figure below. You can see the two
different clusters labeled with two different colours and the position of the
centroids, given by the crosses.
As you probably already know, I’m using Python libraries to analyze my data.
The k-means algorithm is implemented in the scikit-learn package. To use it,
you will just need the following line in your script:
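The line referred to is the KMeans import, and, as a hedged sketch, here is how it could be applied to the five people in this example (the height/weight values are the ones used in the walkthrough above):

import numpy as np
from sklearn.cluster import KMeans

# (height, weight) of the five people used in the example above
X = np.array([[167, 55], [120, 32], [113, 33], [175, 76], [108, 25]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment of each person
print(kmeans.cluster_centers_)  # final centroid positions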
APPLICATIONS OF AI
Components of NLP
Phases of NLP
There are the following five phases of NLP:
1. Lexical and Morphological Analysis: The first phase of NLP is lexical
analysis. This phase scans the input text as a stream of characters and
converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
5. Pragmatic Analysis: Pragmatic analysis is the fifth and last phase of NLP. It helps
you to discover the intended effect by applying a set of rules that characterize
cooperative dialogues.
For example: "Open the door" is interpreted as a request instead of an order.
Applications of NLP:
Sample of NLP Preprocessing Techniques
• Tokenization: Tokenization splits raw text (for example, a sentence or a
document) into a sequence of tokens, such as words or subword pieces.
Tokenization is often the first step in an NLP processing pipeline. Tokens
are commonly recurring sequences of text that are treated as atomic units
in later processing. They may be words, subword units called morphemes
(for example, prefixes such as “un-“ or suffixes such as “-ing” in
English), or even individual characters.
• Bag-of-words models: Bag-of-words models treat documents as
unordered collections of tokens or words (a bag is like a set, except that it
tracks the number of times each element appears). Because they
completely ignore word order, bag-of-words models will confuse a
sentence such as “dog bites man” with “man bites dog.” However, bag-
of-words models are often used for efficiency reasons on large
information retrieval tasks such as search engines. They can produce
close to state-of-the-art results with longer documents (a minimal sketch
of these preprocessing steps follows after this list).
• Stop word removal: A “stop word” is a token that is ignored in later
processing. They are typically short, frequent words such as “a,” “the,” or
“an.” Bag-of-words models and search engines often ignore stop words in
order to reduce processing time and storage within the database. Deep
neural networks typically do take word-order into account (that is, they
are not bag-of-words models) and do not do stop word removal because
stop words can convey subtle distinctions in meaning (for example, “the
package was lost” and “a package is lost” don’t mean the same thing,
even though they are the same after stop word removal).
• Stemming and lemmatization: Morphemes are the smallest meaning-
bearing elements of language. Typically morphemes are smaller than
words. For example, “revisited” consists of the prefix “re-“, the stem
“visit,” and the past-tense suffix “-ed.” Stemming and lemmatization map
words to their stem forms (for example, “revisit” + PAST). Stemming
and lemmatization are crucial steps in pre-deep learning models, but deep
learning models generally learn these regularities from their training data,
and so do not require explicit stemming or lemmatization steps.
• Part-of-speech tagging and syntactic parsing: Part-of-speech (PoS)
tagging is the process of labeling each word with its part of speech (for
example, noun, verb, adjective, etc.). A Syntactic parser identifies how
words combine to form phrases, clauses, and entire sentences. PoS
tagging is a sequence labeling task, syntactic parsing is an extended kind
of sequence labeling task, and deep neural networks are the state-of-the-
art technology for both PoS tagging and syntactic parsing. Before deep
learning, PoS tagging and syntactic parsing were essential steps in
sentence understanding. However, modern deep learning NLP models
generally only benefit marginally (if at all) from PoS or syntax
information, so neither PoS tagging nor syntactic parsing are widely used
in deep learning NLP.
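A minimal sketch of tokenization, stop word removal, and a bag-of-words representation, assuming scikit-learn's CountVectorizer; the tiny corpus and stop-word list are illustrative:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["dog bites man", "man bites dog", "the dog sleeps"]

# Tokenization (very naive): split on whitespace
tokens = [d.split() for d in docs]
print(tokens[0])                      # ['dog', 'bites', 'man']

# Stop word removal: drop frequent, low-information words
stop_words = {"the", "a", "an"}
filtered = [[t for t in doc if t not in stop_words] for doc in tokens]
print(filtered[2])                    # ['dog', 'sleeps']

# Bag of words: unordered token counts, so the first two documents look identical
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
print(bow.toarray())                  # rows 0 and 1 have the same counts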
Text Classification:
Text Classification, also known as text categorization or text tagging, is a
technique used in machine learning and artificial intelligence to automatically
categorize text into predefined classes or categories. It involves training a model
on a labeled dataset, where each text example is associated with a specific class
or category. The trained model can then be used to classify new, unseen texts
into the appropriate categories.
The text classification process involves several steps, from data collection to
model deployment. Here is a quick overview of how it works:
Step 3: Tokenization
Break the text apart into tokens, which are small units like words. Tokens help
find matches and connections by creating individually searchable parts. This
step is especially useful for vector search and semantic search, which give
results based on user intent.
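A minimal end-to-end text classification sketch, assuming scikit-learn; the tiny labeled dataset and the TF-IDF plus logistic regression pipeline are illustrative choices, not steps mandated by the material:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Data collection and labeling: a tiny labeled dataset
texts = ["win a free prize now", "cheap meds, click here",
         "meeting moved to 3 pm", "lunch tomorrow?"]
labels = ["spam", "spam", "not spam", "not spam"]

# Tokenization and feature extraction are handled by the vectorizer,
# model training by the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Prediction on new, unseen text
print(model.predict(["free prize, click here now"]))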
Workflow for solving machine learning problems
Information Retrieval:
Information retrieval is defined as a completely automated procedure that
responds to a user query by reviewing a group of documents and producing a
sorted document list that ought to be relevant to the user's query criteria. As a
result, it is a collection of algorithms that improves the relevancy of presented
materials to searched queries. In other words, it sorts and ranks content
according to a user's query. The query and the content of the documents are
represented in a consistent way so that documents can be matched and retrieved.
A retrieval model (IR) chooses and ranks relevant pages based on a user's query.
Document selection and ranking can be formalized using matching functions
that return retrieval status values (RSVs) for each document in a collection since
documents and queries are written in the same way. The majority of IR systems
portray document contents using a collection of descriptors known as words
from a vocabulary V.
• The estimation of the likelihood of user relevance for each page and
query in relation to a collection of q training documents.
• In a vector space, the similarity function between queries and documents
is computed.
2. Non-Classic IR Model
It is diametrically opposed to the traditional IR model. Rather than on
probability, similarity, and Boolean operations, such IR models are based
on other ideas. Non-classical IR models include situation theory models,
information logic models, and interaction models.
3. Alternative IR Model
It is an improvement to the traditional IR model that makes use of some
unique approaches from other domains. Alternative IR models include
fuzzy models, cluster models, and latent semantic indexing (LSI) models.
As computing power grows and storage costs fall, the quantity of data we deal
with on a daily basis grows tremendously. However, without a mechanism to
obtain and query the data, the information we collect is useless. Information
retrieval system is critical for making sense of data. Consider how difficult it
65
would be to discover information on the Internet without Google or other search
engines. Without information retrieval methods, information is not knowledge.
Text indexing and retrieval systems may index data in these data repositories
and allow users to search against it. Thus, retrieval systems provide users with
online access to information that they may not be aware of, and they are not
required to know or care about where the information is housed. Users can query
all information that the administrator has decided to index with a single search.
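A minimal sketch of vector-space retrieval, assuming scikit-learn: documents and the query are embedded as TF-IDF vectors and ranked by cosine similarity (the tiny document collection is illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "machine learning gives computers the ability to learn",
    "k-means clustering groups unlabeled data",
    "support vector machines find a maximum-margin hyperplane",
]
query = "how do computers learn"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)   # the collection in vector space
query_vector = vectorizer.transform([query])        # the query in the same space

# Retrieval status values: similarity of the query to each document
scores = cosine_similarity(query_vector, doc_vectors)[0]
for idx in scores.argsort()[::-1]:                  # rank documents by score
    print(round(scores[idx], 3), documents[idx])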
Speech Recognition
Due to how context-specific and extremely varied human speech is, voice
recognition algorithms must adjust. Different speech patterns, speaking styles,
languages, dialects, accents, and phrasings are used to train the software
algorithms that process and organize audio into text. The software also
distinguishes speech sounds from the frequently present background noise.
Speech recognition systems utilize one of two types of models to satisfy these
requirements:
Although voice recognition has many uses and advantages, there are also many
difficulties because of the intricacy of the software.
• Hidden Markov models (HMM). An HMM provides a
simple and effective framework for modeling the temporal structure of
audio and voice signals and the sequence of phonemes that make up a
word. For this reason, most of today’s speech recognition systems are
based on an HMM.
• Dynamic time warping (DTW). DTW is used to compare two separate
sequences of speech that are different in speed. For example, you have
two audio recordings of someone saying “good morning” – one slow, one
fast. In this case, the DTW algorithm can sync the two recordings, even
though they differ in speed and length (a small sketch follows after this list).
• Artificial neural networks (ANN). ANN is a computational model used
in speech recognition applications that helps computers understand
spoken human language. It uses deep learning techniques and basically
imitates the patterns of how neural networks work in the human brain,
which allows the computer to make decisions in a human-like manner.
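A small dynamic-programming sketch of DTW for two 1-D sequences; this is the generic textbook formulation, not a specific speech library's implementation:

def dtw_distance(a, b):
    # Classic dynamic time warping: cost of the best alignment between sequences a and b
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

slow = [0, 0, 1, 1, 2, 2, 3, 3]   # a pattern spoken slowly (toy signal)
fast = [0, 1, 2, 3]               # the same pattern spoken quickly
print(dtw_distance(slow, fast))   # small distance: the sequences align well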
Image Processing
The elements of image processing
AI image processing works through a combination of advanced algorithms,
neural networks, and data processing to analyze, interpret, and manipulate
digital images. Here's a simplified overview of how AI image processing works:
Acquisition of image: The initial level begins with image pre-processing which
uses a sensor to capture the image and transform it into a usable format.
• Data Privacy and Security: The reliance on vast amounts of data raises
concerns about privacy and security. Handling sensitive visual
information, such as medical images or surveillance footage, demands
robust safeguards against unauthorized access and misuse.
• Bias: AI image processing models can inherit biases present in training
data, leading to skewed or unfair outcomes. Striving for fairness and
minimizing bias is crucial, especially when making decisions that impact
individuals or communities.
• Robustness and Generalization: Ensuring that AI models perform
reliably across different scenarios and environments is challenging.
Models need to be robust enough to handle variations in lighting,
weather, and other real-world conditions.
• Interpretable Results: While AI image processing can deliver
impressive results, understanding why a model makes a certain prediction
remains challenging. Explaining complex decisions made by deep neural
networks is an ongoing area of research.
Applications:
1. Medical Imaging: AI-powered image processing is used in medical
imaging to detect and diagnose diseases such as cancer, Alzheimer’s, and
heart disease. It can also be used to monitor the progression of a disease
and to evaluate the effectiveness of treatment.
2. Surveillance and Security: AI algorithms can be used to analyze images
from surveillance cameras to detect suspicious behavior, identify
individuals, and track their movements. This can help prevent crime and
improve public safety.
3. Industrial Inspection: AI-powered image processing can be used to
inspect products on assembly lines to detect defects and ensure quality
control. It can also be used to monitor equipment and detect potential
problems before they cause downtime.
4. Scientific Image Analysis: AI algorithms can be used to analyze images
from scientific experiments to extract meaningful data. For example, AI
can be used to analyze images of cells to detect changes in their shape,
size, and structure.
5. Photo Editing: AI-powered image processing can be used to enhance
photos by removing noise, adjusting brightness and contrast, and
improving color accuracy. It can also be used to add special effects and
filters to photos.
Computer Vision
1. Autonomous Vehicles: Computer vision is used in autonomous vehicles
to detect and recognize objects such as pedestrians, other vehicles, and
traffic signals. This helps the vehicle navigate safely and avoid accidents.
2. Retail: Computer vision is used in retail to track inventory, monitor
customer behavior, and analyze shopping patterns. It can also be used to
personalize the shopping experience for customers.
3. Healthcare: Computer vision is used in healthcare to diagnose diseases,
monitor patients, and assist with surgical procedures. It can also be used
to analyze medical images such as X-rays and MRIs.
4. Security: Computer vision is used in security to detect and recognize
faces, track individuals, and monitor public spaces. It can also be used to
detect suspicious behavior and prevent crime.
5. Entertainment: Computer vision is used in entertainment to create
special effects, enhance video games, and develop virtual reality
experiences. It can also be used to analyze audience reactions and
preferences.
Difference between Computer vision and Image processing
Robotics
Motivation behind Robotics: To cope with the increasing demands of a
dynamic and competitive market, modern manufacturing methods should satisfy
the following requirements:
Asimov's laws of robotics: The Three Laws of Robotics, or Asimov's Laws, are
a set of rules devised by the science fiction author Isaac Asimov.
1. First Law - A robot may not injure a human being or, through inaction,
allow a human being to come to harm.
2. Second Law - A robot must obey the orders given it by human beings
except where such orders would conflict with the First Law.
3. Third Law - A robot must protect its own existence as long as such
protection does not conflict with the First or Second Laws.
Automation and robotics: Automation and robotics are two closely related
technologies. In an industrial context, we can define automation as a technology
that is concerned with the use of mechanical, electronic, and computer-based
systems in the operation and control of production. Examples of this technology
include transfer lines, mechanized assembly machines, feedback control
systems (applied to industrial processes), numerically controlled machine tools,
and robots. Accordingly, robotics is a form of industrial automation. Examples:
robotics, CAD/CAM, FMS, CIMS.
Applications of Robotics :
1. Androids
Androids are robots that resemble humans. They are often mobile, moving
around on wheels or a track drive. According to the American Society of
Mechanical Engineers, these humanoid robots are used in areas such as
caregiving and personal assistance, search and rescue, space exploration and
research, entertainment and education, public relations and healthcare, and
manufacturing.
2. Telechir
A telechir is a complex robot that is remotely controlled by a human operator
for a telepresence system. It gives that individual the sense of being on location
in a remote, dangerous or alien environment, and enables them to interact with it
since the telechir continuously provides sensory feedback.
3. Telepresence robot
A telepresence robot simulates the experience -- and some capabilities -- of
being physically present at a location. It combines remote monitoring and
control via telemetry sent over radio, wires or optical fibers, and enables remote
business consultations, healthcare, home monitoring, childcare and more.
4. Industrial robot
The IFR (International Federation of Robotics) defines an industrial robot as an
"automatically controlled, reprogrammable multipurpose manipulator
programmable in three or more axes." Users can adapt these robots to different
applications as well. Combining these robots with AI has helped businesses
move them beyond simple automation to higher-level and more complex tasks.
5. Swarm robot
Swarm robots (aka insect robots) work in fleets ranging from a few to
thousands, all under the supervision of a single controller. These robots are
analogous to insect colonies, in that they exhibit simple behaviors individually,
but demonstrate more sophisticated behaviors with an ability to carry
out complex tasks collectively.
6. Smart robot
This is the most advanced kind of robot. The smart robot has a built-in AI
system that learns from its environment and experiences to build knowledge and
enhance capabilities to continuously improve. A smart robot can collaborate
with humans and help solve problems in areas like the following:
• agricultural labor shortages;
• food waste;
• study of marine ecosystems;
• product organization in warehouses; and
• clearing of debris from disaster zones.