
MANGALORE UNIVERSITY

National Education Policy – 2020


[NEP-2020]

STUDY MATERIALS

FOR

Artificial Intelligence and Applications-UNIT-IV

OF

VI SEMESTER BCA

Learning
Introduction:

Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or both.

Arthur Samuel, an early American leader in the field of computer gaming and
artificial intelligence, coined the term “Machine Learning” in 1959 while at
IBM. He defined machine learning as “the field of study that gives computers
the ability to learn without being explicitly programmed.” However, there is no
universally accepted definition for machine learning. Different authors define
the term differently.

Definition of learning
Definition: A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its performance at tasks
T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognizing and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given
classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error

• Training experience E: A sequence of images and steering commands
recorded while observing a human driver

iii) A chess learning problem


• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself

Definition: A computer program which learns from experience is called a machine learning program or simply a learning program. Such a program is sometimes also referred to as a learner.

1.2 Components of Learning


Basic components of learning process
The learning process, whether by a human or a machine, can be divided into
four components, namely, data storage, abstraction, generalization and
evaluation. Figure 1.1 illustrates the various components and the steps involved
in the learning process.

1. Data storage
Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize
data storage as a foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved
using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory
and similar devices to store data and use cables and other technology to
retrieve data.

2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This
involves creating general concepts about the data as a whole. The creation
of knowledge involves application of known models and creation of new
models.
The process of fitting a model to a dataset is known as training. When
the model has been trained, the data is transformed into an abstract form
that summarizes the original information.

3. Generalization
The third component of the learning process is known as generalisation.
The term generalization describes the process of turning the knowledge
about stored data into a form that can be utilized for future action. These
actions are to be carried out on tasks that are similar, but not identical, to
those that have been seen before. In generalization, the goal is to
discover those properties of the data that will be most relevant to future
tasks.
4. Evaluation
Evaluation is the last component of the learning process. It is the process
of giving feedback to the user to measure the utility of the learned
knowledge. This feedback is then utilized to effect improvements in the
whole learning process.

Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in Artificial Intelligence (AI).
• Another definition of machine learning is a "field of study that gives computers the ability to learn without being explicitly programmed."
• Machine learning is a field of computer science that involves using
statistical methods to create programs that either improve performance
over time, or detect patterns in massive amounts of data that humans
would be unlikely to find.
• Machine Learning explores the study and construction of algorithms that
can learn from and make predictions on data. Such algorithms operate by
building a model from example inputs in order to make data driven
predictions or decisions, rather than following strictly static program
instructions.

In short, "Machine Learning is a collection of algorithms and techniques used to create computational systems that learn from data in order to make predictions and inferences."

Machine learning application areas abound.

• Recommendation System: YouTube suggests videos to each of its users based on a recommendation system that predicts what the individual user will be interested in. Similarly, Amazon and other e-retailers suggest products that the customer will be interested in and is likely to purchase, by looking at the purchase history of the customer and a large inventory of products.
• Spam detection: Email service providers use a machine learning model
that can automatically detect and move the unsolicited messages to the
spam folder.
• Prospect customer identification: Banks, insurance companies, and
financial organizations have machine learning models that trigger alerts
so that organizations intervene at the right time to start engaging with the
right offers for the customer and persuade them to convert early. These
models observe the pattern of behavior by a user during the initial period
and map it to the past behaviors of all users to identify those that will buy
the product and those that will not.

Application of machine learning methods to large databases is called data mining. In data mining, a large volume of data is processed to construct a simple model with valuable use, for example, having high predictive accuracy.

The following is a list of some of the typical applications of machine learning.


1. In retail business, machine learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit
applications, fraud detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and
troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network
optimization and maximizing the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can
only be analyzed fast enough by computers. The World Wide Web is huge;
it is constantly growing and searching for relevant information cannot be
done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to
changes so that the system designer need not foresee and provide solutions
for all possible situations.
8. It is used to find solutions to many problems in vision, speech
recognition, and robotics.
9. Machine learning methods are applied in the design of computer-
controlled vehicles to steer correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for
playing games such as chess, backgammon and Go.

Forms of learning: At a high level, machine learning tasks can be categorized
into three groups based on the desired output and the kind of input required to
produce it.

Supervised Learning

A training set of examples with the correct responses (targets) is provided and,
based on this training set, the algorithm generalizes to respond correctly to all
possible inputs. This is also called learning from exemplars. Supervised learning
is the machine learning task of learning a function that maps an input to an
output based on example input-output pairs.

In supervised learning, each example in the training set is a pair consisting of an input object (typically a vector) and an output value. A supervised learning algorithm analyzes the training data and produces a function, which can be used for mapping new examples. In the optimal case, the function will correctly determine the class labels for unseen instances. Both classification and regression problems are supervised learning problems. A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.

Remarks
A "supervised learning" algorithm is so called because the process of the algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.

Example
Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients and each patient is labeled as
“healthy” or “sick”.

Broadly, there are two types of supervised learning problems.
1) Regression: The output to be predicted is a continuous number for a given input dataset. Example use cases are prediction of retail sales, prediction of the number of staff required for each shift, the number of car parking spaces required for a retail store, the credit score for a customer, etc.
2) Classification: The output to be predicted is the actual class or the probability of an event/class, and the number of classes to be predicted can be two or more. The algorithm should learn the patterns in the relevant input of each class from historical data and be able to predict the unseen class or event in the future given its input. An example use case is spam email filtering, where the expected output is to classify an email as either "spam" or "not spam."

Building supervised machine learning models has three stages (a short end-to-end example in code follows the list):


1. Training: The algorithm will be provided with historical input data with
the mapped output. The algorithm will learn the patterns within the input
data for each output and represent that as a statistical equation, which is
also commonly known as a model.
2. Testing or validation: In this phase the performance of the trained
model is evaluated, usually by applying it on a dataset (that was not used
as part of the training) to predict the class or event.
3. Prediction: Here we apply the trained model to a data set that was not
part of either the training or testing. The prediction will be used to drive
business decisions.
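As an illustration of these three stages, here is a minimal scikit-learn sketch; the dataset (Iris) and the choice of logistic regression are assumptions made purely for demonstration.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Historical inputs with their mapped outputs (labelled data).
X, y = load_iris(return_X_y=True)

# 1. Training: learn the patterns and represent them as a model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2. Testing/validation: evaluate on data that was not used for training.
print("Test accuracy:", model.score(X_test, y_test))

# 3. Prediction: apply the trained model to new, unseen measurements.
print("Predicted class:", model.predict([[5.0, 3.4, 1.5, 0.2]])[0])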

Unsupervised Learning
There are situations where the desired output class/event is unknown for
historical data. The objective in such cases would be to study the patterns in the
input dataset to get better understanding and identify similar patterns that can be
grouped into specific classes or events. As these types of algorithms do not
require any intervention from the subject matter experts beforehand, they are
called unsupervised learning.
Unsupervised learning is a type of machine learning algorithm used to draw
inferences from
datasets consisting of input data without labeled responses. In unsupervised
learning algorithms, a classification or categorization is not included in the
observations. There are no output values and so there is no estimation of
functions. Since the examples given to the learner are unlabeled, the accuracy of
the structure that is output by the algorithm cannot be evaluated. The most
common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.

Example
Consider the following data regarding patients entering a clinic. The data
consists of the gender and age of the patients.

Based on this data, can we infer anything regarding the patients entering the
clinic?

Some examples of unsupervised learning:


• Clustering: Assume that the classes are not known beforehand for a
given dataset. The goal here is to divide the input dataset into logical
groups of related items. Some examples are grouping similar news
articles, grouping similar customers based on their profile, etc.
• Dimension Reduction: Here the goal is to simplify a large input dataset by mapping it to a lower-dimensional space. For example, carrying out analysis on a high-dimensional dataset is very computationally intensive, so to simplify it you may want to find the key variables that hold a significant percentage (say 95%) of the information and use only them for analysis.
• Anomaly Detection: Anomaly detection, also commonly known as outlier detection, is the identification of items, events or observations which do not conform to an expected pattern or behavior in comparison with other items in a given dataset. It has applicability in a variety of domains, such as machine or system health monitoring, event detection, fraud/intrusion detection, etc.
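As an illustration of clustering, the most common unsupervised method mentioned above, here is a minimal scikit-learn sketch on made-up customer data (the profiles and the number of clusters are assumptions):

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer profiles: [age, annual spend in $1000s]. No labels are given.
X = np.array([[25, 20], [27, 22], [30, 25],
              [45, 60], [48, 65], [50, 62]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)        # groups similar customers together
print("Cluster centres:\n", kmeans.cluster_centers_)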

Semi-supervised machine learning


These algorithms fall somewhere in between supervised and unsupervised learning, since they use both labeled and unlabeled data for training – typically a small amount of labeled data and a large amount of unlabeled data. The systems that use this method are able to considerably improve learning accuracy. Usually, semi-supervised learning is chosen when acquiring labeled data requires skilled and relevant resources in order to train on it / learn from it, whereas acquiring unlabeled data generally doesn't require additional resources.

Reinforcement Learning
The basic objective of reinforcement learning algorithms is to map situations to
actions that yield the maximum final reward. While mapping the action, the
algorithm should not just consider the immediate reward but also the next and all
subsequent rewards. For example, a program to play a game or drive a car will
have to constantly interact with a dynamic environment in which it is expected
to perform a certain goal.

This is somewhere between supervised and unsupervised learning. The algorithm gets told when the answer is wrong, but does not get told how to correct it. It has to explore and try out different possibilities until it works out how to get the answer right. Reinforcement learning is sometimes called learning with a critic because of this monitor that scores the answer but does not suggest improvements.

Reinforcement learning is the problem of getting an agent to act in the world so as to maximize its rewards. A learner (the program) is not told what actions to take as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situations and, through that, all subsequent rewards.

Example
Consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did
that made it get the reward/punishment. We can use a similar method to train
computers to do many tasks, such as playing backgammon or chess, scheduling
jobs, and controlling robot limbs. Reinforcement learning is different from
supervised learning. Supervised learning is learning from examples provided by
a knowledgeable expert.

Examples of reinforcement learning techniques are the following:


• Markov decision process
• Q-learning
• Temporal Difference methods
• Monte-Carlo methods

Issues in Machine Learning


The field of machine learning is concerned with answering questions such as the following:
• What algorithms exist for learning general target functions from
specific training examples? In what settings will particular
algorithms converge to the desired function, given sufficient training
data? Which algorithms perform best for which types of problems
and representations?
• How much training data is sufficient? What general bounds can be
found to relate the confidence in learned hypotheses to the amount of
training experience and the character of the learner's hypothesis
space?
• When and how can prior knowledge held by the learner guide the
process of generalizing from examples? Can prior knowledge be
helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training
experience, and how does the choice of this strategy alter the
complexity of the learning problem?
• What is the best way to reduce the learning task to one or more
function approximation problems? Put another way, what specific
functions should the system attempt to learn? Can this process itself
be automated?
• How can the learner automatically alter its representation to improve
its ability to represent and learn the target function?

Machine Learning-Decision Trees

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. Learned trees can also be re-represented as sets of if-then rules to improve human readability. These learning methods are among the most popular of inductive inference algorithms and have been successfully applied to a broad range of tasks, from learning to diagnose medical cases to learning to assess the credit risk of loan applicants.

Decision Tree Representation: Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the
classification of the instance. Each node in the tree specifies a test of some
attribute of the instance, and each branch descending from that node
corresponds to one of the possible values for this attribute. An instance is
classified by starting at the root node of the tree, testing the attribute specified
by this node, then moving down the tree branch corresponding to the value of
the attribute in the given example. This process is then repeated for the subtree
rooted at the new node.

Here the target attribute PlayTennis, which can have values yes or no for
different Saturday mornings, is to be predicted based on other attributes of the
morning in question.

The figure above shows a decision tree for the concept PlayTennis. An
example is classified by sorting it through the tree to the appropriate leaf
node, then returning the classification associated with this leaf (in this case,
Yes or No). This tree classifies Saturday mornings according to whether or
not they are suitable for playing tennis.

Figure above illustrates a typical learned decision tree. This decision tree
classifies Saturday mornings according to whether they are suitable for playing
tennis.
For example, the instance (Outlook = Sunny, Temperature = Hot, Humidity =
High, Wind = Strong) would be sorted down the leftmost branch of this decision
tree and would therefore be classified as a negative instance (i.e., the tree
predicts that Play Tennis = no).

In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.

For example, the decision tree shown in the figure above corresponds to the expression

(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

Decision tree learning is generally best suited to problems with the following
characteristics:
• Instances are represented by attribute-value pairs: Instances are
described by a fixed set of attributes (e.g., Temperature) and their values
(e.g., Hot). The easiest situation for decision tree learning is when each
attribute takes on a small number of disjoint possible values (e.g., Hot,
Mild, Cold).
• The target function has discrete output values: The decision tree in
Figure above assigns a boolean classification (e.g., yes or no) to each
example. Decision tree methods easily extend to learning functions with
more than two possible output values. A more substantial extension
allows learning target functions with real-valued outputs, though the
application of decision trees in this setting is less common.
• Disjunctive descriptions may be required: As noted above, decision
trees naturally represent disjunctive expressions.
• The training data may contain errors: Decision tree learning methods
are robust to errors, both errors in classifications of the training examples
and errors in the attribute values that describe these examples.
• The training data may contain missing attribute values: Decision tree
methods can be used even when some training examples have unknown
values (e.g., if the Humidity of the day is known for only some of the
training examples).

An example of a decision tree can be explained using the above binary tree. Let's say you want to predict whether a person is fit given information like their age, eating habits, physical activity, etc. The decision nodes here are questions like 'What's the age?', 'Does he exercise?', and 'Does he eat a lot of pizzas?', and the leaves are outcomes like either 'fit' or 'unfit'. In this case this was a binary classification problem (a yes/no type problem). There are two main types of Decision Trees:

1. Classification trees (Yes/No types)


What we have seen above is an example of a classification tree, where the outcome was a variable like 'fit' or 'unfit'. Here the decision variable is categorical.

2. Regression trees (Continuous data types)


Here the decision or outcome variable is continuous, e.g. a number like 123.

Working: Now that we know what a Decision Tree is, we'll see how it works internally. There are many algorithms out there which construct Decision Trees, but one of the best is the ID3 Algorithm. ID3 stands for Iterative Dichotomiser 3.

Before discussing the ID3 algorithm, we'll go through a few definitions.

Entropy: Entropy, also called Shannon entropy, denoted by H(S) for a finite set S, is the measure of the amount of uncertainty or randomness in data.
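In formula form: H(S) = − Σ p(x) · log2 p(x), where the sum runs over the classes x in S and p(x) is the proportion of examples in S belonging to class x.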

Intuitively, it tells us about the predictability of a certain event. For example, consider a coin toss whose probability of heads is 0.5 and probability of tails is 0.5. Here the entropy is the highest possible, since there's no way of determining what the outcome might be. Alternatively, consider a coin which has heads on both sides; the outcome of such an event can be predicted perfectly, since we know beforehand that it'll always be heads. In other words, this event has no randomness, hence its entropy is zero. In particular, lower values imply less uncertainty while higher values imply high uncertainty.

Information Gain: Information gain, also called Kullback-Leibler divergence, denoted by IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables:
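IG(S, A) = H(S) − H(S | A), i.e. the entropy of S minus the entropy of S after it has been partitioned on attribute A.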

Alternatively,
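IG(S, A) = H(S) − Σ P(x) · H(x)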

where IG(S, A) is the information gain from applying feature A, H(S) is the entropy of the entire set, while the second term calculates the entropy after applying the feature A, and P(x) is the probability of event x.
Let's understand this with the help of an example. Consider a piece of data collected over the course of 14 days, where the features are Outlook, Temperature, Humidity and Wind, and the outcome variable is whether Golf was played on the day. Now, our job is to build a predictive model which takes in the above 4 parameters and predicts whether Golf will be played on the day. We'll build a decision tree to do that using the ID3 algorithm.

ID3
The ID3 algorithm will perform the following tasks recursively:
1. Create a root node for the tree
2. If all examples are positive, return leaf node 'positive'
3. Else if all examples are negative, return leaf node 'negative'
4. Calculate the entropy of the current state H(S)
5. For each attribute, calculate the entropy with respect to the attribute 'x', denoted by H(S, x)
6. Select the attribute which has the maximum value of IG(S, x)
7. Remove the attribute that offers the highest IG from the set of attributes
8. Repeat until we run out of attributes, or the decision tree has all leaf nodes

Now we'll go ahead and grow the decision tree. The initial step is to calculate H(S), the entropy of the current state. In the above example, we can see that in total there are 5 No's and 9 Yes's.
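Using the entropy formula with these counts: H(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.94.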

Next, we calculate the entropy with respect to each attribute, where 'x' denotes the possible values of that attribute. Here, attribute 'Wind' takes two possible values in the sample data, hence x = {Weak, Strong}, and we'll have to calculate:
Amongst all the 14 examples we have 8 where the wind is Weak and 6 where the wind is Strong.

Now, out of the 8 Weak examples, 6 of them were 'Yes' for Play Golf and 2 of them were 'No' for Play Golf, so we can compute the entropy of the Weak subset from these counts.

Similarly, out of the 6 Strong examples, we have 3 examples where the outcome was 'Yes' for Play Golf and 3 where we had 'No' for Play Golf.

Remember, here half the items belong to one class while the other half belong to the other, hence we have perfect randomness (entropy = 1). Now we have all the pieces required to calculate the Information Gain,

which tells us the Information Gain obtained by considering 'Wind' as the feature, and gives us an information gain of 0.048.
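As a quick check, here is a minimal Python sketch that reproduces these numbers from the counts given above:

from math import log2

def entropy(counts):
    """Shannon entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Counts from the worked example: 9 Yes / 5 No overall,
# 6 Yes / 2 No when Wind = Weak, 3 Yes / 3 No when Wind = Strong.
H_S = entropy([9, 5])        # ~0.940
H_weak = entropy([6, 2])     # ~0.811
H_strong = entropy([3, 3])   # 1.0

# Information gain of splitting on Wind: H(S) minus the weighted child entropies.
IG_wind = H_S - (8 / 14) * H_weak - (6 / 14) * H_strong
print(round(H_S, 3), round(IG_wind, 3))   # 0.94 0.048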
Now we must similarly calculate the Information Gain for all the features.

We can clearly see that IG(S, Outlook) has the highest information gain of 0.246, hence we choose the Outlook attribute as the root node. At this point, the decision tree looks like this:

Here we observe that whenever the outlook is Overcast, Play Golf is always 'Yes'. This is no coincidence; the simple tree results because the highest information gain is given by the attribute Outlook. Now how do we proceed from this point? We simply apply recursion; you might want to look at the algorithm steps described earlier. Now that we've used Outlook, we've got three attributes remaining: Humidity, Temperature, and Wind. And we had three possible values of Outlook: Sunny, Overcast, Rain. Since the Overcast node already ended up as the leaf node 'Yes', we're left with two subtrees to compute: Sunny and Rain.

Considering next the sub-table of examples where the value of Outlook is Sunny, we can see that the highest Information Gain there is given by Humidity.

Proceeding in the same way with the remaining subset (Outlook = Rain) will give us Wind as the attribute with the highest information gain. The final Decision Tree looks something like this:

Regression:
Regression Analysis in Machine learning
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes corresponding to one independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.
We can understand the concept of regression analysis using the below example:

Example: Suppose there is a marketing company A, which runs various advertisements every year and gets sales from them. The list below shows the advertisement spend by the company in the last 5 years and the corresponding sales:

Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the prediction for the sales in that year. To solve such prediction problems in machine learning, we need regression analysis. Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.

In regression, we plot a graph between the variables which best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether a model has captured a strong relationship or not.
Some examples of regression can be as:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.

Terminologies Related to the Regression Analysis:


• Dependent Variable: The main factor in Regression analysis which we
want to predict or understand is called the dependent variable. It is also
called target variable.
• Independent Variable: The factors which affect the dependent variables
or which are used to predict the values of the dependent variables are
called independent variable, also called as a predictor.
• Outliers: Outlier is an observation which contains either very low value
or very high value in comparison to other observed values. An outlier
may hamper the result, so it should be avoided.
• Multicollinearity: If the independent variables are highly correlated with each other, then this condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most influential variables.
• Underfitting and Overfitting: If our algorithm works well with the
training dataset but not well with test dataset, then such problem is called
Overfitting. And if our algorithm does not perform well even with
training dataset, then such problem is called underfitting.

Types of Regression:

There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable.
Here we are discussing some important types of regression which are given
below:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression

Linear Regression:
• Linear regression is a statistical regression method which is used for
predictive analysis.
• It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
• It is used for solving the regression problem in machine learning.

• Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), hence called linear
regression.
• If there is only one input variable (x), then such linear regression is called
simple linear regression. And if there is more than one input variable,
then such linear regression is called multiple linear regression.
• The relationship between variables in the linear regression model can be explained using the image below. Here we are predicting the salary of an employee on the basis of years of experience.

Below is the mathematical equation for Linear regression:


Y= aX+b

Here,
Y = dependent variables (target variables),
X= Independent variables (predictor variables),
a and b are the linear coefficients
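A minimal illustrative sketch (the experience/salary figures are made up) fitting Y = aX + b with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (X) vs. salary (Y).
X = np.array([[1], [2], [3], [4], [5]])
Y = np.array([30000, 35000, 41000, 46000, 52000])

model = LinearRegression().fit(X, Y)
print("a (slope):", model.coef_[0], "b (intercept):", model.intercept_)
print("Predicted salary for 6 years of experience:", model.predict([[6]])[0])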

Some popular applications of linear regression are:


• Analyzing trends and sales estimates
• Salary forecasting
• Real estate prediction
• Arriving at ETAs in traffic.
In non-linear regression, the data can't be modelled to fit a straight line, i.e. it is modelled to fit a curve, as shown in the figure below. For example, the model can take the form y = f(X; Theta) for some non-linear function f of the inputs X with parameters Theta.


Here are some examples of real-world regression problems.
• Predict tomorrow's stock market price given current market conditions and
other possible side information.
• Predict the age of a viewer watching a given video on YouTube.
• Predict the location in 3d space of a robot arm end effector, given control
signals (torques) sent to its various motors.
• Predict the amount of prostate specific antigen (PSA) in the body as a
function of a number of different clinical measurements.
• Predict the temperature at any location inside a building using weather data,
time, door sensors, etc.

Classifications of linear models:


• Simple linear regression models the relationship between a dependent variable and one explanatory variable using a linear function, which can be formulated as below:
y = b0 + b1*x1
Eg. Prediction of height based on age.
Feature(X): age
Class(y): height
Regression equation will be: height = w*age + b

• Multiple linear regression: If two or more explanatory variables have a linear relationship with the dependent variable, the regression is called a multiple linear regression, which can be formulated as below:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
where y is the response variable,
b0, b1, b2, ..., bn are the coefficients, and
x1, x2, ..., xn are the predictor variables.
Eg. Prediction of height based on age and gender.
Features(X):age, gender
class(y): height
Regression equation will be: height = w1*age + w2*gender + b
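A small illustrative sketch of the multiple regression height = w1*age + w2*gender + b using scikit-learn; the data is made up and gender is encoded as 0/1:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: columns are [age, gender (0 = female, 1 = male)].
X = np.array([[10, 0], [12, 1], [14, 0], [16, 1], [18, 1]])
y = np.array([138, 150, 158, 172, 178])   # height in cm (illustrative values)

model = LinearRegression().fit(X, y)
w1, w2 = model.coef_       # weights learned for age and gender
b = model.intercept_
print(f"height = {w1:.2f}*age + {w2:.2f}*gender + {b:.2f}")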

Logistic Regression:
• Logistic regression is another supervised learning algorithm which is
used to solve the classification problems. In classification problems, we
have dependent variables in a binary or discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as
0 or 1, Yes or No, True or False, Spam or not spam, etc.
• It is a predictive analysis algorithm which works on the concept of
probability.
• Logistic regression is a type of regression, but it is different from the linear regression algorithm in terms of how it is used.
• Logistic regression uses the sigmoid function or logistic function, which is a complex cost function. This sigmoid function is used to model the data in logistic regression. The function can be represented as:
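f(x) = 1 / (1 + e^(−x))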

Here,
f(x) = the output, a value between 0 and 1,
x = the input to the function,
e = the base of the natural logarithm.
When we provide the input values (data) to the function, it gives the S-curve as
follows:

It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
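A tiny illustrative sketch of the sigmoid and a 0.5 threshold (the input values and the threshold are assumptions for demonstration):

import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probs = sigmoid(z)                      # S-curve values between 0 and 1
labels = (probs >= 0.5).astype(int)     # apply the threshold
print(probs.round(3))                   # [0.047 0.378 0.5 0.622 0.953]
print(labels)                           # [0 0 1 1 1]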

There are three types of logistic regression:


• Binary (0/1, pass/fail)
• Multinomial (cats, dogs, lions)
• Ordinal (low, medium, high)

Assumptions of Linear Regression


Below are some important assumptions of Linear Regression. These are formal checks to perform while building a Linear Regression model, which ensure we get the best possible result from the given dataset.
• Linear relationship between the features and target:

Linear regression assumes the linear relationship between the dependent
and independent variables.
• Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable. Or we can say, it is difficult to determine which predictor variable is affecting the target variable and which is not. So, the model assumes either little or no multicollinearity between the features or independent variables.
• Homoscedasticity Assumption:
Homoscedasticity is a situation when the error term is the same for all the
values of independent variables. With homoscedasticity, there should be
no clear pattern distribution of data in the scatter plot.
• Normal distribution of error terms:
Linear regression assumes that the error term should follow the normal
distribution pattern. If error terms are not normally distributed, then
confidence intervals will become either too wide or too narrow, which
may cause difficulties in finding coefficients. This can be checked using a q-q plot: if the plot shows a straight line without any deviation, it means the errors are normally distributed.
• No autocorrelations:
The linear regression model assumes no autocorrelation in error terms. If
there will be any correlation in the error term, then it will drastically
reduce the accuracy of the model. Autocorrelation usually occurs if there
is a dependency between residual errors.

Artificial Neural Networks


Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. For certain types of problems, such as learning to interpret complex real-world sensor data, artificial neural networks are among the most effective learning methods currently known.

An artificial neural network (ANN) is a computing system that simulates how the human brain analyzes and processes information. It is a branch of artificial intelligence (AI) and can solve problems that may be difficult or impossible for humans to solve. In addition, ANNs have the potential for self-learning, which provides better results as more data becomes available.

An artificial neural network consists of the following things:

Interconnected model of neurons. The neuron is the elementary component, connected with other neurons to form the network.
Learning algorithm to train the network. Various learning algorithms are available in the literature to train the model. Each layer consists of neurons, and these neurons are connected to other layers. Weights are also assigned to each layer, and these weights are changed at each iteration for training purposes.

An artificial neural network (ANN) consists of an interconnected group of artificial neurons that process information through input, hidden and output layers and use a connectionist approach to computation. Neural networks use nonlinear statistical data modeling tools to solve complex problems by finding the complex relationships between inputs and outputs. After getting this relation, we can predict the outcome or classify our problems.

Figure 2 describes three neurons that perform "AND" logical operations. In this
case, the output neuron will fire if both input neurons are fired. The output
neurons use a threshold value (T), T=3/2 in this case. If none or only one input
neuron is fired, then the total input to the output becomes less than 1.5 and
firing for output is not possible. Take another scenario where both input neurons
are firing, and the total input becomes 1+1=2, which is greater than the
threshold value of 1.5, then output neurons will fire. Similarly, we can perform
the "OR” logical operation with the help of the same architecture but set the
new threshold to 0.5. In this case, the output neurons will be fired if at least one
input is fired.

Figure 2: Three Neurons Diagram (two input neurons, each contributing an input of 1, feeding an output neuron with threshold T = 3/2)
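A minimal sketch of the threshold units described above, assuming a weight of 1 on each input, with thresholds 3/2 for AND and 1/2 for OR:

def threshold_neuron(inputs, weights, T):
    """Fires (returns 1) only if the weighted sum of the inputs exceeds the threshold T."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > T else 0

for a in (0, 1):
    for b in (0, 1):
        and_out = threshold_neuron([a, b], [1, 1], T=1.5)   # fires only when both inputs fire
        or_out = threshold_neuron([a, b], [1, 1], T=0.5)    # fires when at least one input fires
        print(a, b, "AND:", and_out, "OR:", or_out)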

An Artificial Neural Network has a huge number of interconnected processing elements, also known as nodes. These nodes are connected with other nodes using a connection link. The connection link contains weights, and these weights contain the information about the input signal. Each iteration and input in turn leads to updating of these weights. After inputting all the data instances from the training dataset, the final weights of the neural network, along with its architecture, are known as the trained neural network. This process is called training of neural networks. These trained neural networks solve specific problems as defined in the problem statement.

Types of tasks that can be solved using an artificial neural network include
Classification problems, Pattern Matching, Data Clustering, etc.

The architecture of an artificial neural network:

Input Layer: As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer: The hidden layer is present in between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer: The input goes through a series of transformations using the hidden layer, which finally results in output that is conveyed using this layer.
The artificial neural network takes the input, computes the weighted sum of the inputs and includes a bias. This computation is represented in the form of a transfer function.

The determined weighted total is passed as an input to an activation function to produce the output. Activation functions choose whether a node should fire or not. Only those nodes that fire make it to the output layer. There are distinct activation functions available that can be applied depending upon the sort of task we are performing.

Advantages of Artificial Neural Network (ANN)


• Parallel processing capability: Artificial neural networks can perform more than one task simultaneously.
• Storing data on the entire network: Data that would be stored in a database in traditional programming is stored on the whole network. The disappearance of a couple of pieces of data in one place doesn't prevent the network from working.
• Capability to work with incomplete knowledge: After ANN training, the network may produce output even with inadequate data. The loss of performance here depends upon the significance of the missing data.
• Having a memory distribution: For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to the network. The success of the network is directly proportional to the chosen instances, and if the event cannot be presented to the network in all its aspects, it can produce false output.
• Having fault tolerance: Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:


• Assurance of proper network structure: There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is arrived at through experience and trial and error.
• Unrecognized behavior of the network: This is the most significant issue with ANNs. When an ANN produces a solution, it does not provide insight concerning why and how, which decreases trust in the network.
• Hardware dependence: Artificial neural networks need processors with parallel processing power, in accordance with their structure. Therefore, the realization of the network depends on suitable equipment.
• Difficulty of showing the problem to the network: ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The presentation mechanism chosen here will directly impact the performance of the network; it relies on the user's abilities.
• The duration of training is unknown: The network is trained until the error is reduced to a specific value, and this value does not guarantee optimum results.
"Since artificial neural networks stepped into the world in the mid-20th century, they have been developing exponentially. In the present time, we have investigated the pros of artificial neural networks and the issues encountered in the course of their utilization. It should not be overlooked that the cons of ANNs, which are a flourishing science branch, are being eliminated one by one, while their pros are increasing day by day. This means that artificial neural networks will progressively turn into an irreplaceable part of our lives."

Types of Artificial Neural Network:


There are various types of Artificial Neural Networks (ANN). Depending upon the neurons and network functions of the human brain, an artificial neural network performs tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their intended tasks, for example segmentation or classification.
• Feedback ANN: In this type of ANN, the output returns into the network to accomplish the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
• Feed-Forward ANN: A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. Through assessment of its output by reviewing its input, the strength of the network can be observed based on the group behavior of the associated neurons, and the output is decided. The primary advantage of this network is that it figures out how to evaluate and recognize input patterns.

Perceptron – Single Artificial Neuron: An artificial neuron has multiple input channels to accept training samples represented as a vector, and a processing stage where the weights (w) are adjusted such that the output error (actual vs. predicted) is minimized. Then the result is fed into an activation function to produce an output, for example a classification label. The activation function for a classification problem is a threshold cutoff (the standard is 0.5) above which the class is 1, else 0.

A drawback of the single perceptron approach is that it can only learn linearly separable functions.
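An illustrative numpy sketch of this weight-adjustment idea, training a single perceptron on the (linearly separable) OR function; the learning rate and number of passes are arbitrary choices:

import numpy as np

# Tiny linearly separable dataset: the OR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for _ in range(20):                                  # a few passes over the training data
    for xi, target in zip(X, y):
        pred = 1 if xi.dot(w) + b > 0.5 else 0       # threshold activation (cutoff 0.5)
        error = target - pred                        # actual vs. predicted
        w += lr * error * xi                         # adjust weights to reduce the error
        b += lr * error

print(w, b)
print([1 if xi.dot(w) + b > 0.5 else 0 for xi in X])   # reproduces y: [0, 1, 1, 1]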

Multilayer Perceptrons (Feedforward Neural Network)


To address the drawback of single perceptrons, multilayer perceptrons were
proposed; also commonly known as a feedforward neural network, it is a
composition of multiple perceptrons connected in different ways and operating
on distinctive activation functions to enable improved learning mechanisms.
The training sample propagates forward through the network and the output
error is back propagated and the error is minimized using the gradient descent
method, which will calculate a loss function for all the weights in the network.

The activation function for a simple one-level hidden layer of a multilayer
perceptron can be given by:
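f(x) = g(W2 · g(W1 · x + b1) + b2), where g is the chosen activation function (for example the sigmoid); W1, b1 are the hidden-layer weights and bias, and W2, b2 the output-layer weights and bias. (This is one common way of writing it.)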

A multilayered neural network can have many hidden layers, where the network
holds its internal abstract representation of the training sample. The upper layers
will be building new abstractions on top of the previous layers. So having more
hidden layers for a complex dataset will help the neural network to learn better.
As shown in the figure above, the MLP architecture has a minimum of three layers, that is, input, hidden, and output layers. The input layer's neuron count will be equal to the total number of features, plus, in some libraries, an additional neuron for the intercept/bias. These neurons are represented as nodes. The output layer will have a single neuron for regression models and binary classifiers; otherwise it will be equal to the total number of class labels for multiclass classification models.

Using too few neurons for a complex dataset can result in an under-fitted model, due to the fact that it might fail to learn the patterns in complex data. However, using too many neurons can result in an over-fitted model, as it has the capacity to capture patterns that might be noise or specific to the given training dataset.
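A small illustrative sketch using scikit-learn's MLPClassifier; the synthetic dataset and the hidden-layer size are assumptions for demonstration:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 20 input features.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One hidden layer with 16 neurons; the output layer has a single neuron (binary classifier).
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation='relu', max_iter=500, random_state=42)
mlp.fit(X_train, y_train)
print("Test accuracy:", mlp.score(X_test, y_test))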

SIGMOID NEURONS: The activation function is a mathematical function that decides the threshold behaviour of a neuron; it may be linear or nonlinear. The purpose of an activation function is to add non-linearity to the neural network. If you have a linear activation function, then the number of hidden layers does not matter, since the final output remains a linear combination of the input data. However, this linearity cannot help in solving complex problems, like patterns separated by curves, where a nonlinear activation is required.

Moreover, the step activation function does not have a helpful derivative, as its derivative is 0 everywhere (except at the threshold). Therefore, it doesn't work for backpropagation, a fundamental and valuable concept in multilayer perceptrons.
The most popular neural network activation functions are described below.
Binary Step Function: Binary step function depends on a threshold value that
decides whether a neuron should be activated or not. The input fed to the
activation function is compared to a certain threshold; if the input is greater than
it, then the neuron is activated, else it is deactivated, meaning that its output is
not passed on to the next hidden layer.

Mathematically it can be represented as:
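f(x) = 1 if x ≥ threshold, and f(x) = 0 if x < threshold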

Here are some of the limitations of binary step function:


• It cannot provide multi-value outputs—for example, it cannot be used for
multi-class classification problems.
• The gradient of the step function is zero, which causes a hindrance in the
backpropagation process.
The idea of the step activation function will become clearer from this analogy. Suppose we have a perceptron with an activation function that isn't very "stable". Say some person has bipolar issues: one day (z < 0) s/he is quiet and gives no responses, and on the next day (z ≥ 0) s/he changes mood, becomes very talkative, and speaks non-stop in front of you. There is no gradual transition in the spirit, and you don't know beforehand whether s/he will be quiet or talking. This abrupt flipping is exactly how a nonlinear step function behaves.
So, minor changes in the weights of the input layer of our model may activate the neuron by flipping it from 0 to 1, which impacts the working of the hidden layers, and the outcome may then be affected. Therefore, we want a model that enhances our existing neural network gradually by adjusting the weights. However, this is not possible with such an activation function; without a smoother activation function, this task cannot be accomplished by simply changing the weights.

Autoencoders: As the name suggests, an autoencoder aims to learn an encoding as a representation of the training sample data automatically, without human intervention. The autoencoder is widely used for dimensionality reduction and data de-noising.

Building an autoencoder will typically involve three elements:

• An encoding function to map the input to a hidden representation through a nonlinear function, z = sigmoid(Wx + b).
• A decoding function such as x' = sigmoid(W'z + b'), which maps back into a reconstruction x' with the same shape as x.
• A loss function, which is a distance function measuring the information loss between the compressed representation of the data and the decompressed representation. The reconstruction error can be measured using the traditional squared error ||x − x'||².
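A compact illustrative sketch of these three elements using Keras; the input and hidden dimensions are arbitrary assumptions:

from tensorflow.keras import layers, Model

input_dim, hidden_dim = 784, 32   # e.g. flattened 28x28 images compressed to a 32-d code

x_in = layers.Input(shape=(input_dim,))
z = layers.Dense(hidden_dim, activation='sigmoid')(x_in)   # encoding: z = sigmoid(Wx + b)
x_out = layers.Dense(input_dim, activation='sigmoid')(z)   # decoding: x' = sigmoid(W'z + b')

autoencoder = Model(x_in, x_out)
autoencoder.compile(optimizer='adam', loss='mse')          # squared reconstruction error
autoencoder.summary()
# Training reconstructs the inputs themselves: autoencoder.fit(X, X, epochs=10)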

Convolution Neural Network (CNN)


CNNs are similar to ordinary neural networks, except that they explicitly assume that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and reduce the number of parameters in the network. The neurons are arranged in three dimensions: width, height, and depth.

CNN on CIFAR10 Dataset


Let's consider CIFAR-10 (Canadian Institute For Advanced Research), which is a standard computer vision and deep learning image dataset. It consists of 60,000 color photos of 32 by 32 pixels, with RGB values for each pixel, divided into 10 classes, which include common objects such as airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Essentially each image is of size 32x32x3 (width x height x RGB color channels).

CNN consists of four main types of layers: input layer, convolution layer,
pooling layer, fully connected layer.

The input layer will hold the raw pixels, so an image of CIFAR-10 gives an input layer of dimensions 32x32x3. The convolution layer will compute a dot product between its weights and small local regions of the input layer, so if we decide to have 5 filters the resulting dimension will be 32x32x5. The RELU layer will apply an element-wise activation function that does not affect the dimensions. The pooling layer will downsample the spatial dimensions along width and height, resulting in dimensions 16x16x5. Finally, the fully connected layer will compute the class scores, and the resulting dimension will be a single vector 1x1x10 (10 class scores). Each neuron in this layer is connected to all the numbers in the previous volume.
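An illustrative Keras sketch mirroring the layer sequence just described; the filter count (5) matches the example above, while the kernel size and training settings are assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(5, (3, 3), padding='same', activation='relu',
                  input_shape=(32, 32, 3)),            # convolution + ReLU -> 32x32x5
    layers.MaxPooling2D(pool_size=(2, 2)),             # downsample -> 16x16x5
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),            # fully connected: 10 class scores
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()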

Recurrent Neural Network (RNN)


The MLP (feedforward network) is not known to do well on sequential events
models such as the probabilistic language model of predicting the next word
based on the previous word at every given point. RNN architecture addresses
this issue. It is similar to MLP except that they have a feedback loop, which
means they feed previous time steps into the current step. This type of
architecture generates sequences to simulate situations and create synthetic data, making it the ideal modeling choice for working on sequence data such as speech, text mining, image captioning, time series prediction, robot control, language modeling, etc.

The previous step's hidden layer and final outputs are fed back into the network and used as input to the next step's hidden layer, which means the network remembers the past and repeatedly predicts what will happen next. The drawback of the general RNN architecture is that it can be memory heavy, and hard to train for long-term temporal dependencies (i.e., the context of a long text should be known at any given stage).

Long Short-Term Memory (LSTM)


LSTM is an improved RNN architecture that addresses the issues of the general RNN and enables long-range dependencies. It is designed to have better memory through linear memory cells surrounded by a set of gate units used to control the flow of information: when information should enter the memory, when to forget, and when to output. It uses no activation function within its recurrent components, thus the gradient term does not vanish with backpropagation.

LSTM components are:

• Input gate layer: This decides which values to store in the cell state
• Forget gate layer: As the name suggests, this decides what information
to throw away from the cell state
• Output gate layer: Create a vector of values that can be added to the cell
state.
• Memory cell state vector
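A brief illustrative Keras sketch of an LSTM-based next-word model; the vocabulary size and layer sizes are assumptions:

from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size

model = models.Sequential([
    layers.Embedding(vocab_size, 64),                # map word indices to dense vectors
    layers.LSTM(128),                                # memory cells with input, forget and output gates
    layers.Dense(vocab_size, activation='softmax'),  # predict the next word
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# Training would look like: model.fit(sequences, next_word_ids, epochs=5)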

Support Vector Machine or SVM is one of the most popular supervised learning algorithms, which is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put the
new data point in the correct category in the future. This best decision boundary
is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine. Consider the diagram below, in which there
are two different categories that are classified using a decision boundary or
hyperplane:

Example: SVM can be understood with the example that we used in the KNN classifier. Suppose we see a strange cat that also has some features of dogs; if we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm. We will first train our model with lots of images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this strange creature. The support vector machine creates a decision boundary between these two classes of data (cat and dog) and chooses the extreme cases (support vectors); it will see the extreme cases of cat and dog. On the basis of the support vectors, it will classify the creature as a cat. Consider the diagram below:

SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if
a dataset can be classified into two classes by using a single straight line, then
such data is termed linearly separable data, and the classifier used is called a
Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which
means that if a dataset cannot be classified by using a straight line, then such
data is termed non-linear data, and the classifier used is called a Non-linear SVM
classifier.

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes
in n-dimensional space, but we need to find the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.

The dimension of the hyperplane depends on the number of features present in the
dataset: if there are 2 features (as shown in the image), then the hyperplane is a
straight line, and if there are 3 features, then the hyperplane is a 2-dimensional
plane.

We always create a hyperplane with the maximum margin, which means the maximum
distance between the hyperplane and the nearest data points of each class.

Support Vectors: The data points or vectors that are closest to the hyperplane and
which affect the position of the hyperplane are termed support vectors. Since these
vectors support the hyperplane, they are called support vectors.

Linear SVM: The working of the SVM algorithm can be understood by using an example.
Suppose we have a dataset that has two tags (green and blue), and the dataset has two
features, x1 and x2. We want a classifier that can classify the pair (x1, x2) of
coordinates as either green or blue. Consider the below image:

Since it is a 2-d space, we can easily separate these two classes with a straight
line. But there can be multiple lines that can separate these classes. Consider the
below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the points of both
classes that are closest to the line; these points are called support vectors. The
distance between these vectors and the hyperplane is called the margin, and the goal
of SVM is to maximize this margin.
The hyperplane with the maximum margin is called the optimal hyperplane.
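
As a hedged sketch (not part of the original text), a linear SVM can be trained with
scikit-learn as below; the toy data values are made up for illustration.

# A minimal sketch: a linear SVM on made-up 2-D data with scikit-learn.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])  # features x1, x2
y = np.array([0, 0, 0, 1, 1, 1])                                # two classes (e.g. blue/green)

clf = SVC(kernel="linear", C=1.0)   # linear kernel -> a straight-line hyperplane
clf.fit(X, y)

print(clf.support_vectors_)         # the extreme points that define the margin
print(clf.predict([[4, 4]]))        # classify a new data point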

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:

So to separate these data points, we need to add one more dimension. For linear data
we have used the two dimensions x and y, so for non-linear data we will add a third
dimension z. It can be calculated as: z = x² + y²

By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way. Consider the
below image. If we convert it back to 2-d space with z = 1, it becomes as shown below:
hence we get a circle of radius 1 in the case of non-linear data.
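
As a hedged sketch (not from the original text), the kernel trick described above can
be tried on synthetic circular data with scikit-learn; the dataset parameters are
arbitrary assumptions.

# A minimal sketch: a non-linear (RBF kernel) SVM on concentric-circle data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner circle and an outer ring, not separable by a straight line.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # kernel trick: implicit higher-dimensional mapping
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))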

Unsupervised Machine Learning:

Introduction to clustering

As the name suggests, unsupervised learning is a machine learning technique in which
models are not supervised using a training dataset. Instead, the models themselves
find the hidden patterns and insights from the given data. It can be compared to the
learning that takes place in the human brain while learning new things. It can be
defined as:

“Unsupervised learning is a type of machine learning in which models are trained using
an unlabeled dataset and are allowed to act on that data without any supervision.”

Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the underlying
structure of the dataset, group the data according to similarities, and represent the
dataset in a compressed format.

Example: Suppose the unsupervised learning algorithm is given an input dataset
containing images of different types of cats and dogs. The algorithm is never trained
on the given dataset, which means it has no prior idea about the features of the
dataset. The task of the unsupervised learning algorithm is to identify the image
features on its own. The algorithm performs this task by clustering the image dataset
into groups according to the similarities between images.

Why use Unsupervised Learning?
Below are some main reasons which describe the importance of Unsupervised
Learning:
• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is much more similar to how a human learns to think from their
own experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes it even
more important.
• In the real world, we do not always have input data with corresponding output, so to
solve such cases we need unsupervised learning.

Working of Unsupervised Learning

Working of unsupervised learning can be understood by the below diagram:

Here, we have taken unlabeled input data, which means it is not categorized and the
corresponding outputs are not given. This unlabeled input data is fed to the machine
learning model in order to train it. First, the model interprets the raw data to find
the hidden patterns in the data and then applies a suitable algorithm such as k-means
clustering, hierarchical clustering, etc. Once the suitable algorithm is applied, it
divides the data objects into groups according to the similarities and differences
between the objects.

Types of Unsupervised Learning Algorithm:

The unsupervised learning algorithm can be further categorized into two types of
problems:

• Clustering: Clustering is a method of grouping objects into clusters such that
objects with the most similarities remain in one group and have few or no
similarities with the objects of another group. Cluster analysis finds the
commonalities between the data objects and categorizes them according to the
presence and absence of those commonalities.
• Association: An association rule is an unsupervised learning method used for
finding relationships between variables in a large database. It determines the set
of items that occur together in the dataset. Association rules make marketing
strategies more effective; for example, people who buy item X (say, bread) also
tend to purchase item Y (butter/jam). A typical example of an association rule is
Market Basket Analysis.

Unsupervised Learning algorithms:


Below is the list of some popular unsupervised learning algorithms:
• K-means clustering
• KNN (k-nearest neighbors)
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition

Advantages of Unsupervised Learning


• Unsupervised learning is used for more complex tasks as compared to
supervised learning because, in unsupervised learning, we don't have
labeled input data.
• Unsupervised learning is preferable as it is easy to get unlabeled data in
comparison to labeled data.

Disadvantages of Unsupervised Learning


• Unsupervised learning is intrinsically more difficult than supervised
learning as it does not have corresponding output.
• The result of the unsupervised learning algorithm might be less accurate
as input data is not labeled, and algorithms do not know the exact output
in advance.

K-Means Clustering
The k-means clustering algorithm
One of the most widely used clustering algorithms is k-means. It groups the data into
k clusters, where k is given as input to the algorithm, according to the existing
similarities among the data points. I’ll start with a simple example.

Let’s imagine we have 5 objects (say 5 people) and for each of them we know
two features (height and weight). We want to group them into k=2 clusters.

Our dataset will look like this:

First of all, we have to initialize the value of the centroids for our clusters. For
instance, let’s choose Person 2 and Person 3 as the two centroids c1 and c2, so
that c1=(120,32) and c2=(113,33).

Now we compute the Euclidean distance between each of the two centroids and
each point in the data. If you did all the calculations, you should have come up
with the following numbers:

At this point, we will assign each object to the cluster it is closer to (that is
taking the minimum between the two computed distances for each object).

We can then arrange the points as follows:
Person 1 → cluster 1
Person 2 → cluster 1
Person 3 → cluster 2
Person 4 → cluster 1
Person 5→ cluster 2

Let’s iterate, which means to redefine the centroids by calculating the mean of
the members of each of the two clusters.
So c’1 = ((167+120+175)/3, (55+32+76)/3) = (154, 54.3) and c’2 =
((113+108)/2, (33+25)/2) = (110.5, 29)

Then, we calculate the distances again and re-assign the points to the new
centroids. We repeat this process until the centroids don’t move anymore (or the
difference between them is
under a certain small threshold).

In our case, the result we get is given in the figure below. You can see the two
different clusters labeled with two different colours and the position of the
centroids, given by the crosses.

How to apply k-means?

As you probably already know, I’m using Python libraries to analyze my data.
The k-means algorithm is implemented in the scikit-learn package. To use it,
you will just need the following line in your script:
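
The code itself has not survived in this copy of the notes, so the sketch below shows
what that usage typically looks like, reusing the five-person (height, weight) data
from the worked example above; the cluster numbering in the output may differ from the
manual calculation.

# A minimal sketch of k-means with scikit-learn on the five-person dataset above.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[167, 55], [120, 32], [113, 33], [175, 76], [108, 25]])  # (height, weight)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # cluster index assigned to each person
print(labels)                      # cluster membership (label numbering may differ)
print(kmeans.cluster_centers_)     # final centroid coordinates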

What if our data is non-numerical?


At this point, you may have noticed something. The basic concept of k-means rests on
mathematical calculations (means, Euclidean distances). But what if our data is
non-numerical or, in other words, categorical? Imagine, for instance, having the ID
code and date of birth of the five people of the previous example, instead of their
heights and weights.
We could think of transforming our categorical values into numerical values and then
applying k-means. But beware: k-means uses numerical distances, so it could consider
two really distant objects to be close merely because they have been assigned two
nearby numbers.

k-modes is an extension of k-means. Instead of distances it uses dissimilarities (that
is, a quantification of the total mismatches between two objects: the smaller this
number, the more similar the two objects). And instead of means, it uses modes. A mode
is a vector of elements that minimizes the dissimilarities between the vector itself
and each object of the data. We will have as many modes as the number of clusters we
require, since they act as centroids.
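
As a hedged sketch (not from the original text, and assuming the third-party Python
package kmodes is installed), clustering categorical data with k-modes might look like
this; the data values and attribute names are made up.

# A minimal sketch of k-modes on made-up categorical data.
# Assumes the third-party "kmodes" package (pip install kmodes) is available.
import numpy as np
from kmodes.kmodes import KModes

# Each row: (favourite colour, pet, city) -- purely illustrative values.
X = np.array([
    ["red",   "dog", "Mangalore"],
    ["red",   "dog", "Udupi"],
    ["blue",  "cat", "Mysore"],
    ["blue",  "cat", "Mangalore"],
    ["green", "dog", "Udupi"],
])

km = KModes(n_clusters=2, init="Huang", n_init=5)
clusters = km.fit_predict(X)       # cluster assignment based on mismatch counts
print(clusters)
print(km.cluster_centroids_)       # the modes that act as centroids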

APPLICATIONS OF AI

Natural Language Processing:

NLP stands for Natural Language Processing, which lies at the intersection of Computer
Science, human language, and Artificial Intelligence. It is the technology used by
machines to understand, analyse, manipulate, and interpret human languages. It helps
developers organize knowledge for performing tasks such as translation, automatic
summarization, Named Entity Recognition (NER), speech recognition, relationship
extraction, and topic segmentation.

Components of NLP

There are the following two components of NLP –

1. Natural Language Understanding (NLU): Natural Language Understanding (NLU) helps
the machine to understand and analyse human language by extracting metadata from
content such as concepts, entities, keywords, emotion, relations, and semantic roles.

NLU is mainly used in business applications to understand the customer's problem in
both spoken and written language.

NLU involves the following tasks:

o It is used to map the given input into a useful representation.
o It is used to analyze different aspects of the language.

2. Natural Language Generation (NLG): Natural Language Generation (NLG) acts as a
translator that converts computerized data into a natural language representation. It
mainly involves text planning, sentence planning, and text realization.

Phases of NLP
There are the following five phases of NLP:

1. Lexical and Morphological Analysis: The first phase of NLP is lexical analysis.
This phase scans the source text as a stream of characters and converts it into
meaningful lexemes. It divides the whole text into paragraphs, sentences, and words.

2. Syntactic Analysis (Parsing): Syntactic analysis is used to check grammar and word
arrangement, and shows the relationships among the words.
Example: "Agra goes to the Poonam" does not make any sense in the real world, so this
sentence is rejected by the syntactic analyzer.

3. Semantic Analysis: Semantic analysis is concerned with the meaning representation.
It mainly focuses on the literal meaning of words, phrases, and sentences.

4. Discourse Integration: Discourse integration means that the meaning of a sentence
depends upon the sentences that precede it, and it may also influence the meaning of
the sentences that follow it.

5. Pragmatic Analysis: Pragmatic analysis is the fifth and last phase of NLP. It helps
you to discover the intended effect by applying a set of rules that characterize
cooperative dialogues.
For example: "Open the door" is interpreted as a request instead of an order.

Applications of NLP:

1. Question Answering: Question answering focuses on building systems that
automatically answer questions asked by humans in a natural language.

2. Spam Detection: Spam detection is used to detect unwanted e-mails arriving in a
user's inbox.

3. Sentiment Analysis: Sentiment analysis is also known as opinion mining. It is used
on the web to analyse the attitude, behaviour, and emotional state of the sender. This
application is implemented through a combination of NLP (Natural Language Processing)
and statistics by assigning values to the text (positive, negative, or neutral) and
identifying the mood of the context (happy, sad, angry, etc.).

4. Machine Translation: Machine translation is used to translate text or speech from
one natural language to another natural language. Example: Google Translator.

5. Spelling Correction: Microsoft Corporation provides word-processor software such as
MS Word and PowerPoint with built-in spelling correction.

6. Speech Recognition: Speech recognition is used for converting spoken words into
text. It is used in applications such as mobile phones, home automation, video
retrieval, dictating to Microsoft Word, voice biometrics, voice user interfaces, and
so on.

7. Chatbot: Implementing a chatbot is one of the important applications of NLP. It is
used by many companies to provide chat-based customer services.

8. Information Extraction: Information extraction is one of the most important
applications of NLP. It is used for extracting structured information from
unstructured or semi-structured machine-readable documents.

9. Natural Language Understanding (NLU): It converts large bodies of text into more
formal representations, such as first-order logic structures, that are easier for
computer programs to manipulate.

Sample of NLP Preprocessing Techniques
• Tokenization: Tokenization splits raw text (for example, a sentence or a
document) into a sequence of tokens, such as words or subword pieces.
Tokenization is often the first step in an NLP processing pipeline. Tokens
are commonly recurring sequences of text that are treated as atomic units
in later processing. They may be words, subword units called morphemes
(for example, prefixes such as “un-“ or suffixes such as “-ing” in
English), or even individual characters.
• Bag-of-words models: Bag-of-words models treat documents as
unordered collections of tokens or words (a bag is like a set, except that it
tracks the number of times each element appears). Because they
completely ignore word order, bag-of-words models will confuse a
sentence such as “dog bites man” with “man bites dog.” However, bag-
of-words models are often used for efficiency reasons on large
information retrieval tasks such as search engines. They can produce
close to state-of-the-art results with longer documents.
• Stop word removal: A “stop word” is a token that is ignored in later
processing. They are typically short, frequent words such as “a,” “the,” or
“an.” Bag-of-words models and search engines often ignore stop words in
order to reduce processing time and storage within the database. Deep
neural networks typically do take word-order into account (that is, they
are not bag-of-words models) and do not do stop word removal because
stop words can convey subtle distinctions in meaning (for example, “the
package was lost” and “a package is lost” don’t mean the same thing,
even though they are the same after stop word removal).
• Stemming and lemmatization: Morphemes are the smallest meaning-
bearing elements of language. Typically morphemes are smaller than
words. For example, “revisited” consists of the prefix “re-“, the stem
“visit,” and the past-tense suffix “-ed.” Stemming and lemmatization map
words to their stem forms (for example, “revisit” + PAST). Stemming
and lemmatization are crucial steps in pre-deep learning models, but deep
learning models generally learn these regularities from their training data,
and so do not require explicit stemming or lemmatization steps.
• Part-of-speech tagging and syntactic parsing: Part-of-speech (PoS)
tagging is the process of labeling each word with its part of speech (for
example, noun, verb, adjective, etc.). A Syntactic parser identifies how
words combine to form phrases, clauses, and entire sentences. PoS
tagging is a sequence labeling task, syntactic parsing is an extended kind
of sequence labeling task, and deep neural networks are the state-of-the-
art technology for both PoS tagging and syntactic parsing. Before deep
learning, PoS tagging and syntactic parsing were essential steps in
sentence understanding. However, modern deep learning NLP models
generally only benefit marginally (if at all) from PoS or syntax
information, so neither PoS tagging nor syntactic parsing are widely used
in deep learning NLP.
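
As a hedged illustration (not part of the original text, and assuming NLTK is
installed along with its 'punkt', 'stopwords', 'wordnet', and
'averaged_perceptron_tagger' data packages), several of the preprocessing steps above
can be applied to a single sentence as follows.

# A minimal sketch of common NLP preprocessing steps with NLTK.
# Assumes: pip install nltk, plus the required nltk.download(...) data packages.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The packages were revisited and quickly shipped."

tokens = nltk.word_tokenize(text)                  # tokenization
stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]  # stop word removal

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content])          # stemming (e.g. "revisited" -> "revisit")
print([lemmatizer.lemmatize(t) for t in content])  # lemmatization
print(nltk.pos_tag(content))                       # part-of-speech tagging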

Text classification and Information Retrieval :

Text Classification:
Text Classification, also known as text categorization or text tagging, is a
technique used in machine learning and artificial intelligence to automatically
categorize text into predefined classes or categories. It involves training a model
on a labeled dataset, where each text example is associated with a specific class
or category. The trained model can then be used to classify new, unseen texts
into the appropriate categories.

Text Classification Workflow

The text classification process involves several steps, from data collection to
model deployment. Here is a quick overview of how it works:

Step 1: Data collection

Collect a set of text documents with their corresponding categories for the text
labeling process. Gathering data is the most important step in solving any supervised
machine learning problem. Your text classifier can only be as good as the dataset it
is built from.
Here are some important things to remember when collecting data:
• The more training examples (referred to as samples in the rest of this guide) you
have, the better. This will help your model generalize better.
• Make sure the number of samples for every class or topic is not overly imbalanced.
That is, you should have a comparable number of samples in each class.
• Make sure that your samples adequately cover the space of possible inputs, not only
the common cases.

Step 2: Data preprocessing

Clean and prepare the text data by removing unnecessary symbols, converting to
lowercase, and handling special characters such as punctuation. Understanding the
characteristics of your data beforehand will enable you to build a better model. This
could simply mean obtaining a higher accuracy. It is good practice to run some checks
on the data: pick a few samples and manually check whether they are consistent with
your expectations.

Step 3: Tokenization

Break the text apart into tokens, which are small units like words. Tokens help find
matches and connections by creating individually searchable parts. This step is
especially useful for vector search and semantic search, which give results based on
user intent.

Step 4: Feature extraction

Convert the text into numerical representations that machine learning models can
understand. Some common methods include counting the occurrences of words (also known
as Bag-of-Words) or using word embeddings to capture word meanings. When we convert
all of the texts in a dataset into tokens, we may end up with tens of thousands of
tokens. Not all of these tokens/features contribute to label prediction, so we can
drop certain tokens, for instance those that occur extremely rarely across the
dataset. We can also measure feature importance (how much each token contributes to
label predictions) and only include the most informative tokens.
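
As a hedged sketch (not part of the original text), bag-of-words feature extraction
can be done with scikit-learn as below; the toy corpus is made up, and the min_df
setting illustrates how very rare tokens can be dropped.

# A minimal sketch of bag-of-words / TF-IDF feature extraction with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [                         # made-up toy documents
    "dog bites man",
    "man bites dog",
    "the package was lost",
]

bow = CountVectorizer()            # raw term counts (Bag-of-Words)
X_counts = bow.fit_transform(corpus)
print(bow.get_feature_names_out()) # the learned vocabulary
print(X_counts.toarray())          # one row of counts per document

tfidf = TfidfVectorizer(min_df=1)  # weighted variant; raise min_df to drop rare tokens
X_tfidf = tfidf.fit_transform(corpus)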

Step 5: Build, train, and evaluate the model

Now that the data is clean and preprocessed, you can use it to train a machine
learning model. The model will learn patterns and associations between the text's
features and their categories. This helps it understand the text labeling conventions
using the pre-labeled examples. Training involves making a prediction based on the
current state of the model, calculating how incorrect the prediction is, and updating
the weights or parameters of the network to minimize this error so the model predicts
better. We repeat this process until the model has converged and can no longer learn.
There are three key parameters to be chosen for this process (a brief code sketch
follows this list):

• Metric: How to measure the performance of our model. We used accuracy as the metric
in our experiments.
• Loss function: A function used to calculate a loss value that the training process
then attempts to minimize by tuning the network weights. For classification problems,
cross-entropy loss works well.
• Optimizer: A function that decides how the network weights will be updated based on
the output of the loss function. We used the popular Adam optimizer in our
experiments.
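
As a hedged sketch (not part of the original text), these three choices typically come
together in a single compile call in Keras; the layer sizes, input dimension, and
number of classes below are arbitrary assumptions.

# A minimal sketch: choosing the metric, loss function, and optimizer in Keras.
# The input size (1000 bag-of-words features) and 4 output classes are assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1000,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])

model.compile(
    optimizer="adam",                        # optimizer
    loss="sparse_categorical_crossentropy",  # cross-entropy loss for integer labels
    metrics=["accuracy"],                    # evaluation metric
)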

Step 6: Text labeling

Create a new, separate dataset to start text labeling and classifying new text. In the
text labeling process, the model separates the text into the predetermined categories
from the data collection step.

Step 8: Hyperparameter tuning

Depending on how the model evaluation goes, you may want to adjust the model's
settings to optimize its performance. We had to choose a number of hyperparameters for
defining and training the model. We relied on intuition, examples, and best-practice
recommendations. Our first choice of hyperparameter values, however, may not yield the
best results; it only gives us a good starting point for training. Every problem is
different, and tuning these hyperparameters will help refine our model to better
represent the particularities of the problem at hand.
Some of the hyperparameters:
o Number of layers in the model: The number of layers in a neural network is an
indicator of its complexity.
o Number of units per layer: The units in a layer must hold the information for the
transformation that the layer performs.
o Dropout rate: Dropout layers are used in the model for regularization.

Step 9: Model deployment

Use the trained and tuned model to classify new text data into the appropriate
categories.

Workflow for solving machine learning problems

The Most Important Text Classification Use Cases

Text classification finds applications in various domains and industries. Some of the
most important use cases include:

• Sentiment Analysis: Determining whether a customer review or social media post is
positive, negative, or neutral.
• Document Classification: Classifying news articles, research papers, or
legal documents into relevant topics or categories.
• Spam Detection: Identifying and filtering out spam emails, messages, or
comments.
• Intent Classification: Classifying customer queries or requests to
provide relevant responses or route them to the appropriate department.
• News Categorization: Organizing news articles into categories such as
sports, politics, entertainment, etc.

Information Retrieval:
Information retrieval is defined as a completely automated procedure that responds to
a user query by reviewing a group of documents and producing a sorted list of
documents that ought to be relevant to the user's query criteria. As a result, it is a
collection of algorithms that improves the relevancy of the presented materials to the
searched queries. In other words, it sorts and ranks content according to a user's
query; a document is retrievable when its content is consistent with the query.

Information retrieval model:

A retrieval model (IR model) chooses and ranks relevant pages based on a user's query.
Document selection and ranking can be formalized using matching functions that return
retrieval status values (RSVs) for each document in a collection, since documents and
queries are represented in the same way. The majority of IR systems portray document
contents using a collection of descriptors, known as terms, from a vocabulary V.

The query-document matching function in an IR model is defined in the following ways:

• The estimation of the likelihood of user relevance for each page and query in
relation to a collection of training documents.
• In a vector space, a similarity function between queries and documents is computed
(a small code sketch follows this list).
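
As a hedged sketch (not part of the original text), the vector-space matching in the
second bullet can be illustrated with TF-IDF vectors and cosine similarity; the
documents and query are made up.

# A minimal sketch of vector-space retrieval: rank documents by the cosine
# similarity between their TF-IDF vectors and the query vector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [                                   # made-up document collection
    "information retrieval ranks documents for a query",
    "k-means clusters data points into groups",
    "search engines retrieve relevant documents",
]
query = ["retrieve relevant documents for a search query"]

vec = TfidfVectorizer()
D = vec.fit_transform(docs)                # document-term matrix
Q = vec.transform(query)                   # query mapped into the same vector space

scores = cosine_similarity(Q, D).ravel()   # one retrieval-status-like score per document
for i in scores.argsort()[::-1]:           # best-matching documents first
    print(round(float(scores[i]), 3), docs[i])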

Types Of Information Retrieval Models


1. Classic IR Model
It is the most basic and straightforward IR model. This paradigm is founded on
mathematical knowledge that is easily recognized and comprehended. The three classic
IR models are the Boolean, Vector, and Probabilistic models.

2. Non-Classic IR Model
It is diametrically opposed to the classic IR model. Rather than probability,
similarity, and Boolean operations, such IR models are based on other ideas.
Non-classical IR models include situation theory models, information logic models,
and interaction models.

3. Alternative IR Model
It is an enhancement of the classic IR model that makes use of some unique approaches
from other domains. Alternative IR models include fuzzy models, cluster models, and
latent semantic indexing (LSI) models.

Importance of Information Retrieval System

As computing power grows and storage costs fall, the quantity of data we deal with on
a daily basis grows tremendously. However, without a mechanism to retrieve and query
the data, the information we collect is useless. An information retrieval system is
critical for making sense of data. Consider how difficult it would be to discover
information on the Internet without Google or other search engines. Without
information retrieval methods, information is not knowledge.

Text indexing and retrieval systems can index data in these data repositories and
allow users to search against it. Thus, retrieval systems provide users with online
access to information that they may not be aware of, and users are not required to
know or care about where the information is housed. Users can query all information
that the administrator has decided to index with a single search.

Speech Recognition

Speech recognition, also called automatic speech recognition (ASR), computer speech
recognition, or speech-to-text, is a form of artificial intelligence and refers to the
ability of a computer or machine to interpret spoken words and translate them into
text.

Working of Speech recognition technology:

Speech recognition is a software technology fueled by cutting-edge solutions like
Natural Language Processing (NLP) and Machine Learning (ML). NLP, an AI technique that
analyses natural human speech, is sometimes referred to as human language processing.
The vocal data is first transformed into a digital format that can be processed by
computer software. Then, the digitized data is subjected to additional processing
using NLP, ML, and deep learning techniques. Consumer products like smartphones, smart
homes, and other voice-activated solutions can employ this digitized speech.

A piece of software performs four steps to convert the audio that a microphone records
into text that both computers and people can understand:

1. Analyze the audio;
2. Separate it into sections;
3. Digitize it into a computer-readable format; and
4. Use an algorithm to find the most appropriate text representation.

Due to how context-specific and extremely varied human speech is, voice
recognition algorithms must adjust. Different speech patterns, speaking styles,
languages, dialects, accents, and phrasings are used to train the software
algorithms that process and organize audio into text. The software also
distinguishes speech sounds from the frequently present background noise.

Speech recognition systems utilize two types of models to satisfy these requirements:

• Acoustic models. These represent the connection between the linguistic units of
speech and the audio signals.
• Language models. Here, word sequences are matched with sounds to distinguish between
words that sound similar.

Examples of Speech Recognition AI

• Digital assistants with voice recognition: These include functions on smartphones
and computers like Siri, Alexa, and Cortana. These voice-activated devices consult
many databases and digital sources to respond to commands or provide answers. As a
result, these digital assistants have changed how users engage with their gadgets.
• Speech Recognition Solutions In Banking: Voice recognition assists
banking clients with inquiries and provides information on account
balances, transactions, and payments. It can raise consumer loyalty and
satisfaction with care.
• Voice Recognition in Healthcare: Healthcare frequently necessitates swift
judgment calls and actions. Healthcare is delivered more quickly and
effectively when it can be directed verbally, freeing up the hands of
medical staff. Fewer documents are required. Access to health records is
simple. Reminders for appointments might be given to nursing personnel.
It can facilitate the management of hospital bedding. It can enhance
patient data entry and alter how healthcare services are delivered.
• Call centers: Customer support call centers often use speech recognition
systems to automate customer interactions. The systems analyze speech
input and respond to customer requests, providing more time for human
agents to deal with complex issues.
• Navigation systems: Speech recognition software is often used in navigation systems
and allows drivers to give voice commands to vehicle devices, like car radios, while
keeping their eyes on the road and hands on the wheel.

Difficulties faced by speech recognition AI

Although voice recognition has many uses and advantages, there are also many
difficulties because of the intricacy of the software.

• Lack of speech standards:
Due to the lack of speech standards, speech recognition is made more
difficult because each person speaks differently depending on their
region, age, gender, and native tongue. For African Americans and
French speakers who might need to become more accustomed to the
conventional form of English, this can result in recognition problems. To
ensure an equal development process, voice recognition technology
creators should consider this and openly publish their progress.
• The different contexts in which speech is used
The accuracy of voice recognition can be affected by the environment in
which it is employed. For example, speech recognition AI accuracy is
frequently lower than reading aloud in spontaneous speech. The machine
searches for more straightforward, probable rules while recognizing
sounds. Therefore, we must take neural networks into account if we are to
improve voice recognition accuracy.
• Different accents and pronunciations of words
Speech recognition AI technology can be affected by different accents
and word pronunciations, including making it harder to understand what
is being said, changing sound patterns, and reducing accuracy rates for
specific users.

Speech recognition algorithms:

Multiple speech recognition algorithms and computation techniques work in a hybrid
approach to help convert spoken language into text and ensure output accuracy. Here
are the three main algorithms that ensure the precision of the transcript:
• Hidden Markov model (HMM). HMM is an algorithm that handles speech diversity, such
as pronunciation, speed, and accent. It provides a simple and effective framework for
modeling the temporal structure of audio and voice signals and the sequence of
phonemes that make up a word. For this reason, most of today’s speech recognition
systems are based on an HMM.
• Dynamic time warping (DTW). DTW is used to compare two separate
sequences of speech that are different in speed. For example, you have
two audio recordings of someone saying “good morning” – one slow, one
fast. In this case, the DTW algorithm can sync the two recordings, even
though they differ in speed and length.
• Artificial neural networks (ANN). ANN is a computational model used
in speech recognition applications that helps computers understand
spoken human language. It uses deep learning techniques and basically
imitates the patterns of how neural networks work in the human brain,
which allows the computer to make decisions in a human-like manner.
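
As a hedged illustration (not part of the original text), the DTW comparison described
in the second bullet above can be computed with a bare-bones dynamic-programming
routine; the two example sequences stand in for a slow and a fast recording of the
same word.

# A minimal sketch of dynamic time warping (DTW) between two 1-D sequences.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with an absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: diagonal match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

slow = [0.0, 0.1, 0.4, 0.9, 1.0, 1.0, 0.6, 0.2]   # the same shape, stretched in time
fast = [0.0, 0.5, 1.0, 0.5, 0.1]
print(dtw_distance(slow, fast))                    # small value despite the length mismatch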

Differences between speech recognition and voice recognition

o Speech recognition refers to the process of a computer recognizing, understanding,
and transcribing speech into readable written text. This
technology is used in different professional fields and our daily lives and
facilitates the process of dictation, transcription, or natural language
processing.
o Voice recognition, on the other hand, converts voice into digital data
based on the user’s unique voice characteristics. This technology is a
biometric system used to verify a person’s identity by analyzing the
unique features of their voice, such as pitch, tone, and rhythm. Voice
recognition is often used for security and personal authentication, such as
unlocking a mobile device or accessing systems.

Image Processing And Computer Vision:

Image Processing

Image processing is a methodology for converting an image to a digital format and then
performing various operations on it to gather useful information. It is one of the
fastest-growing sets of technologies.

The elements of image processing
AI image processing works through a combination of advanced algorithms,
neural networks, and data processing to analyze, interpret, and manipulate
digital images. Here's a simplified overview of how AI image processing works:

Acquisition of image: The initial stage begins with image pre-processing, which uses a
sensor to capture the image and transform it into a usable format.

Enhancement of image: Image enhancement is the technique of bringing out and
emphasising specific interesting characteristics which are hidden in an image.

Restoration of image: Image restoration is the process of improving an image's
appearance. Unlike image enhancement, image restoration is carried out using specific
mathematical or probabilistic models.

Colour image processing: A variety of digital colour modelling approaches, such as HSI
(Hue-Saturation-Intensity), CMY (Cyan-Magenta-Yellow) and RGB (Red-Green-Blue), are
used in colour image processing.

Compression and decompression of image: This enables adjustments to image resolution
and size, whether for image reduction or restoration depending on the situation,
without lowering image quality below a desirable level. Lossy and lossless compression
techniques are the two main types of image file compression employed in this stage.

Morphological processing: Digital images are processed depending on their shapes using
an image processing technique known as morphological operations. The operations depend
on the relative ordering of pixel values rather than their numerical values, and are
well suited to the processing of binary images. Morphological processing helps remove
imperfections in the structure of the image.

Segmentation, representation and description: The segmentation process divides a
picture into segments, and each segment is represented and described in such a way
that it can be processed further by a computer. Representation covers the image's
quality and regional characteristics. The description's job is to extract quantitative
data that helps distinguish one class of objects from another.

Recognition of image: A label is assigned to an object through recognition based on
its description. Some of the often-employed algorithms in the process of recognising
images include the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust
Features (SURF), and Principal Component Analysis (PCA). (A brief code sketch of two
of these stages follows.)
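
As a hedged sketch (not part of the original text, and assuming OpenCV (cv2) and NumPy
are installed), two of these stages, segmentation by thresholding followed by
morphological opening, might look like this on a synthetic image.

# A minimal sketch of two image processing stages with OpenCV:
# thresholding (segmentation) followed by a morphological opening.
import numpy as np
import cv2

# A synthetic 100x100 grayscale image: a bright square plus speckle noise.
img = np.zeros((100, 100), dtype=np.uint8)
img[30:70, 30:70] = 200
img[np.random.rand(100, 100) > 0.98] = 255

_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)    # segment by intensity
kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)      # remove small imperfections
print(int(binary.sum()), int(opened.sum()))                    # fewer white pixels after opening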

Challenges in AI Image Processing

• Data Privacy and Security: The reliance on vast amounts of data raises
concerns about privacy and security. Handling sensitive visual
information, such as medical images or surveillance footage, demands
robust safeguards against unauthorized access and misuse.
• Bias: AI image processing models can inherit biases present in training
data, leading to skewed or unfair outcomes. Striving for fairness and
minimizing bias is crucial, especially when making decisions that impact
individuals or communities.
• Robustness and Generalization: Ensuring that AI models perform
reliably across different scenarios and environments is challenging.
Models need to be robust enough to handle variations in lighting,
weather, and other real-world conditions.
• Interpretable Results: While AI image processing can deliver
impressive results, understanding why a model makes a certain prediction
remains challenging. Explaining complex decisions made by deep neural
networks is an ongoing area of research.

Applications:

1. Medical Imaging: AI-powered image processing is used in medical imaging to detect
and diagnose diseases such as cancer, Alzheimer's, and heart disease. It can also be
used to monitor the progression of a disease and to evaluate the effectiveness of
treatment.
2. Surveillance and Security: AI algorithms can be used to analyze images from
surveillance cameras to detect suspicious behavior, identify individuals, and track
their movements. This can help prevent crime and improve public safety.
3. Industrial Inspection: AI-powered image processing can be used to inspect products
on assembly lines to detect defects and ensure quality control. It can also be used to
monitor equipment and detect potential problems before they cause downtime.
4. Scientific Image Analysis: AI algorithms can be used to analyze images from
scientific experiments to extract meaningful data. For example, AI can be used to
analyze images of cells to detect changes in their shape, size, and structure.
5. Photo Editing: AI-powered image processing can be used to enhance photos by
removing noise, adjusting brightness and contrast, and improving color accuracy. It
can also be used to add special effects and filters to photos.

Computer Vision

Computer vision is a field of artificial intelligence that enables computers to derive
meaningful information from digital images, videos, and other visual inputs. It uses
machine learning algorithms to analyze and interpret images, and it has a wide range
of applications in various fields.
Computer vision works by using deep learning algorithms to analyze and interpret
images. These algorithms are trained on large datasets of images and use convolutional
neural networks (CNNs) to identify patterns and features in the images. Once the
patterns and features are identified, the computer can use them to recognize objects,
classify images, and perform other image-based tasks.
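
As a hedged sketch (not part of the original text), a tiny convolutional neural
network for image classification can be defined in Keras as below; the 32x32 RGB input
size and 10 classes are arbitrary assumptions.

# A minimal sketch of a small CNN image classifier in Keras.
# The 32x32 RGB input and 10 output classes are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 3)),  # learn local features
    layers.MaxPooling2D((2, 2)),                                            # downsample
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),                                 # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()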
Some of the applications of computer vision include:

1. Autonomous Vehicles: Computer vision is used in autonomous vehicles to detect and
recognize objects such as pedestrians, other vehicles, and traffic signals. This helps
the vehicle navigate safely and avoid accidents.
2. Retail: Computer vision is used in retail to track inventory, monitor customer
behavior, and analyze shopping patterns. It can also be used to personalize the
shopping experience for customers.
3. Healthcare: Computer vision is used in healthcare to diagnose diseases, monitor
patients, and assist with surgical procedures. It can also be used to analyze medical
images such as X-rays and MRIs.
4. Security: Computer vision is used in security to detect and recognize faces, track
individuals, and monitor public spaces. It can also be used to detect suspicious
behavior and prevent crime.
5. Entertainment: Computer vision is used in entertainment to create special effects,
enhance video games, and develop virtual reality experiences. It can also be used to
analyze audience reactions and preferences.
Difference between Computer vision and Image processing

Robotics

Robotics is a field of engineering and science that involves the design, construction,
operation, and use of robots. Robots are autonomous or semi-autonomous machines that
can perform tasks in the physical world.

AI allows robots to interpret and respond to their environment. Advanced sensors
combined with AI algorithms enable robots to perceive and understand the world around
them using vision, touch, sound, and other sensory inputs. Robots can learn from
experience through machine learning algorithms. This capability enables them to adapt
to changing conditions, improve their performance over time, and even learn from human
demonstrations. They can analyze data, assess situations, and make decisions based on
predefined rules or learned patterns.

Motivation behind Robotics: To cope with the increasing demands of a dynamic and
competitive market, modern manufacturing methods should satisfy the following
requirements:

• Reduced production cost
• Increased productivity
• Improved product quality

Asimov's laws of robotics: The Three Laws of Robotics or Asimov's Laws are
a set of rules devised by the science fiction author Isaac Asimov

1. First Law - A robot may not injure a human being or, through inaction,
allow a human being to come to harm.
2. Second Law - A robot must obey the orders given it by human beings
except where such orders would conflict with the First Law.
3. Third Law - A robot must protect its own existence as long as such
protection does not conflict with the First or Second Laws.

Automation and robotics: Automation and robotics are two closely related technologies.
In an industrial context, we can define automation as a technology that is concerned
with the use of mechanical, electronic, and computer-based systems in the operation
and control of production. Examples of this technology include transfer lines,
mechanized assembly machines, feedback control systems (applied to industrial
processes), numerically controlled machine tools, and robots. Accordingly, robotics is
a form of industrial automation. Examples: Robotics, CAD/CAM, FMS, CIMS.

Types of Automation:- Automation is categorized into three types. They are,

1) Fixed Automation: It is the automation in which the sequence of processing or
assembly operations to be carried out is fixed by the equipment configuration. In
fixed automation, the (simple) sequence of operations is integrated into a piece of
equipment; therefore, it is difficult to automate changes in the design of the
product. It is used where a high volume of production is required, and the production
rate of fixed automation is high. In this automation, no new products are processed
for a given sequence of assembly operations.
2) Programmable Automation: It is the automation in which the equipment is designed to
accommodate various product configurations in order to change the sequence of
operations or assembly operations by means of a control program. Different types of
programs can be loaded into the equipment to produce products with new configurations
(i.e., new products). It is employed for batch production of low and medium volumes.
For each new batch of a differently configured product, a new control program
corresponding to the new product is loaded into the equipment. This automation is
relatively economical for small batches of the product.
3) Flexible Automation: A computer-integrated manufacturing system, which is an
extension of programmable automation, is referred to as flexible automation. It is
developed to minimize the time lost in changing over batch production from one product
to another, i.e., reloading the program to produce the new product and changing the
physical setup, so that different products are produced with no loss of time. This
automation is more flexible in interconnecting workstations with material handling and
storage systems.

Applications of Robotics :

• Manufacturing: Robots powered by AI are extensively used in manufacturing for tasks
like assembly, welding, and quality control. They can operate collaboratively with
humans, increasing efficiency and safety.
• Healthcare: AI-driven robots assist in surgeries, rehabilitation, and
patient care. They can navigate environments, perform precise
movements, and even provide companionship for patients.
• Autonomous Vehicles: Robotics and AI are crucial in the development
of autonomous vehicles, where robots (self-driving cars, drones) use AI
algorithms for navigation, obstacle avoidance, and decision-making.
• Service and Assistance: AI-powered robots are employed in various
service industries, such as customer service, hospitality, and retail. They
can interact with customers, answer queries, and perform tasks.
Challenges:

• Safety Concerns: As robots become more autonomous, ensuring their safety in dynamic
environments is a significant challenge. AI must enable
robots to make safe decisions to avoid accidents.
• Ethical Considerations: The integration of AI in robotics raises ethical
questions, such as the impact on employment, privacy concerns, and the
potential misuse of autonomous systems.
• Interdisciplinary Nature: Successful implementation of AI in robotics
requires collaboration between experts in computer science, engineering,
and other fields, making it a complex and interdisciplinary challenge.

Different Types of Robots

There are as many different types of robots as there are tasks.

1. Androids
Androids are robots that resemble humans. They are often mobile, moving
around on wheels or a track drive. According to the American Society of
Mechanical Engineers, these humanoid robots are used in areas such as
caregiving and personal assistance, search and rescue, space exploration and
research, entertainment and education, public relations and healthcare, and
manufacturing.

2. Telechir
A telechir is a complex robot that is remotely controlled by a human operator
for a telepresence system. It gives that individual the sense of being on location
in a remote, dangerous or alien environment, and enables them to interact with it
since the telechir continuously provides sensory feedback.

3. Telepresence robot
A telepresence robot simulates the experience -- and some capabilities -- of
being physically present at a location. It combines remote monitoring and
control via telemetry sent over radio, wires or optical fibers, and enables remote
business consultations, healthcare, home monitoring, childcare and more.

4. Industrial robot

The IFR (International Federation of Robotics) defines an industrial robot as an
"automatically controlled, reprogrammable multipurpose manipulator
programmable in three or more axes." Users can adapt these robots to different
applications as well. Combining these robots with AI has helped businesses
move them beyond simple automation to higher-level and more complex tasks.

5. Swarm robot
Swarm robots (aka insect robots) work in fleets ranging from a few to
thousands, all under the supervision of a single controller. These robots are
analogous to insect colonies, in that they exhibit simple behaviors individually,
but demonstrate behaviors that are more sophisticated with an ability to carry
out complex tasks in total.

6. Smart robot
This is the most advanced kind of robot. The smart robot has a built-in AI
system that learns from its environment and experiences to build knowledge and
enhance capabilities to continuously improve. A smart robot can collaborate
with humans and help solve problems in areas like the following:
• agricultural labor shortages;
• food waste;
• study of marine ecosystems;
• product organization in warehouses; and
• clearing of debris from disaster zones

