UNIT - III: INTRODUCTION TO ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
BAYES THEOREM
Machine Learning is one of the most rapidly emerging technologies of Artificial
Intelligence. We are living in the 21st century, which is completely driven by new
technologies and gadgets, some of which are yet to be used while a few are being
used to their full potential. Similarly, Machine Learning is a technology that is still
in its developing phase. There are many concepts that make machine learning a
better technology, such as supervised learning, unsupervised learning,
reinforcement learning, perceptron models, neural networks, etc. In this section,
"Bayes Theorem in Machine Learning", we will discuss another important concept
of Machine Learning, i.e., Bayes theorem. Before starting this topic, you should gain
an essential understanding of the theorem: what exactly Bayes theorem is, why it is
used in Machine Learning, examples of Bayes theorem in Machine Learning, and
much more. So, let's start with a brief introduction to Bayes theorem.
Bayes theorem is also known by other names such as Bayes rule or Bayes law.
Bayes theorem helps to determine the probability of an event based on prior
knowledge of conditions related to that event. It is used to calculate the probability
of one event occurring given that another event has already occurred. It is the
best-known method for relating conditional probability and marginal probability.
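In its standard form, Bayes theorem is stated as:
P(A|B) = P(B|A) * P(A) / P(B)
where P(A|B) is the posterior probability of event A given that event B has occurred, P(B|A) is the likelihood of B given A, P(A) is the prior probability of A, and P(B) is the marginal probability of B. Before going further, some terms used in connection with this theorem are revised below.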
1. Experiment: A trial or operation that can produce some well-defined outcomes, such as rolling a die or tossing a coin.
2. Sample Space: The set of all possible outcomes of an experiment. For the experiments of rolling a die and tossing a coin, the sample spaces are:
S1 = {1, 2, 3, 4, 5, 6}
S2 = {Head, Tail}
3. Event: A subset of the sample space, i.e., a particular set of outcomes of an experiment.
Assume that in our experiment of rolling a die, there are two events A and B defined
on this sample space.
4. Random Variable: A variable whose possible values are numerical outcomes of a random experiment.
5. Exhaustive Event: As the name suggests, a set of events is called exhaustive for an experiment if at least one of the events must occur every time the experiment is performed.
6. Independent Event: Two events are independent if the occurrence of one does not affect the probability of occurrence of the other, i.e., P(A ∩ B) = P(A) * P(B).
7. Conditional Probability: The probability of an event A given that another event B has already occurred, written P(A|B) = P(A ∩ B) / P(B).
8. Marginal Probability: The probability of an event occurring irrespective of the outcomes of other variables; it is the unconditional probability of the event.
These are two conditions given to us, and our classifier, which works using Machine
Learning, has to predict A; the first thing the classifier has to choose is the best
possible class. So, with the help of Bayes theorem, we can write this as:
P(Ci|A) = P(A|Ci) * P(Ci) / P(A)
Here, P(A) will remain constant throughout the classes, meaning it does not change
its value with respect to a change of class. Therefore, to maximize P(Ci|A), we only
have to maximize the term P(A|Ci) * P(Ci).
With n classes on the probability list, let's assume that the possibility of any class
being the right answer is equally likely. Considering this factor, we can say that:
P(C1) = P(C2) = P(C3) = P(C4) = ... = P(Cn)
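Under this equal-prior assumption, the factor P(Ci) is the same for every class, so the decision rule simplifies: the classifier chooses the class for which the likelihood P(A|Ci) alone is maximum, i.e.,
Ci* = argmax over i of P(A|Ci) * P(Ci) = argmax over i of P(A|Ci)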
MAXIMUM LIKELIHOOD
In this section we explain what the maximum likelihood method for parameter
estimation is and go through a simple example to demonstrate the method. Some of
the content requires knowledge of fundamental probability concepts, such as the
definitions of joint probability and independence of events, so review these
prerequisites if you need a refresher.
The above definition may still sound a little cryptic so let’s go through
an example to help understand this.
Let’s suppose we have observed 10 data points from some process. For
example, each data point could represent the length of time in seconds
that it takes a student to answer a specific exam question. These 10
data points are shown in the figure below
For these data we’ll assume that the data generation process can be
adequately described by a Gaussian (normal) distribution. Visual
inspection of the figure above suggests that a Gaussian distribution is
plausible because most of the 10 points are clustered in the middle
with few points scattered to the left and the right. (Making this sort of
decision on the fly with only 10 data points is ill-advised but given that
I generated these data points we’ll go with it).
The 10 data points and possible Gaussian distributions from which the data were
drawn: f1 is normally distributed with mean 10 and variance 2.25 (the variance is
equal to the square of the standard deviation); this is also denoted f1 ∼ N(10, 2.25).
Similarly, f2 ∼ N(10, 9), f3 ∼ N(10, 0.25) and f4 ∼ N(8, 2.25). The goal of maximum
likelihood is to find the parameter values that give the distribution that maximises
the probability of observing the data.
The true distribution from which the data were generated was f1 ∼ N(10, 2.25),
the blue curve in the figure above.
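The expression referred to below is the joint probability (the likelihood) of the 10 observed points under a single Gaussian with mean μ and standard deviation σ; because the points are assumed independent, it is the product of the individual densities:
P(x1, x2, ..., x10; μ, σ) = Π (from i = 1 to 10) of (1 / (σ √(2π))) · exp( −(xi − μ)² / (2σ²) )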
We just have to figure out the values of μ and σ that give the maximum value of the
above expression.
To find this maximum we take the derivative of the function, set the derivative to
zero and then rearrange the equation to make the parameter of interest the subject
of the equation. And voilà, we'll have our MLE values for our parameters. We'll go
through these steps now, assuming that the reader knows how to perform
differentiation of common functions.
The above expression for the total probability is actually quite a pain
to differentiate, so it is almost always simplified by taking the natural
logarithm of the expression. This is absolutely fine because the natural
logarithm is a monotonically increasing function. This means that if the
value on the x-axis increases, the value on the y-axis also increases (see
figure below). This is important because it ensures that the maximum
value of the log of the probability occurs at the same point as the
original probability function. Therefore we can work with the simpler
log-likelihood instead of the original likelihood.
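For the Gaussian model above, taking the natural logarithm turns the product into a sum, giving the log-likelihood:
ln P(x1, ..., x10; μ, σ) = −10 ln(σ) − 10 ln(√(2π)) − (1 / (2σ²)) Σ (from i = 1 to 10) (xi − μ)²
Differentiating this with respect to μ gives (1/σ²) Σ (from i = 1 to 10) (xi − μ).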
Finally, setting this derivative to zero and rearranging for μ gives:
μ = (1/10) Σ (from i = 1 to 10) xi
In other words, the maximum likelihood estimate of μ is simply the sample mean of
the observed data points. An analogous calculation for σ shows that its maximum
likelihood estimate is the sample standard deviation computed with a divisor of 10
rather than 9.
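A short Python check of this result. The 10 data points used in the original figure are not listed in these notes, so the sample below is hypothetical, drawn from the same N(10, 2.25) distribution mentioned above:

import numpy as nm

# hypothetical sample: 10 draws from N(10, 2.25), i.e. mean 10 and standard deviation 1.5
rng = nm.random.default_rng(0)
data = rng.normal(loc=10.0, scale=1.5, size=10)

# maximum likelihood estimates for a Gaussian: the sample mean, and the
# standard deviation computed with divisor n (ddof=0), as derived above
mu_mle = data.mean()
sigma_mle = data.std(ddof=0)

print("MLE of mu:", mu_mle)
print("MLE of sigma:", sigma_mle)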
GIBBS ALGORITHM
In statistical mechanics, the Gibbs algorithm, introduced by J. Willard Gibbs in 1902,
is a criterion for choosing a probability distribution for the statistical ensemble of
microstates of a thermodynamic system by minimizing the average log probability
⟨ln pi⟩ = Σi pi ln pi
subject to the probability distribution pi satisfying a set of constraints (usually
expectation values) corresponding to the known macroscopic quantities. In 1948,
Claude Shannon interpreted the negative of this quantity, which he called
information entropy, as a measure of the uncertainty in a probability distribution.[1]
In 1957, E. T. Jaynes realized that this quantity could be interpreted as missing
information about anything, and generalized the Gibbs algorithm to non-equilibrium
systems with the principle of maximum entropy and maximum entropy
thermodynamics.
Physicists call the result of applying the Gibbs algorithm the Gibbs
distribution for the given constraints, most notably Gibbs's grand
canonical ensemble for open systems when the average energy and
the average number of particles are given. (See also partition
function).
This general result of the Gibbs algorithm is then a maximum entropy
probability distribution. Statisticians identify such distributions as
belonging to exponential families.
NAÏVE BAYES CLASSIFIER
o Naïve Bayes algorithm is a supervised learning algorithm, which
is based on Bayes theorem and used for solving classification
problems.
o It is mainly used in text classification that includes a high-
dimensional training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms, and it helps in building fast machine learning models that can make
quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
Bayes' Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood probability: the probability of the evidence B given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Problem: If the weather is sunny, should the Player play or not?
Solution: First convert the given dataset of weather conditions and the corresponding target variable "Play" into frequency and likelihood tables, and then use Bayes theorem to calculate the posterior probabilities. The dataset is:
    Outlook    Play
0   Rainy      Yes
1   Sunny      Yes
2   Overcast   Yes
3   Overcast   Yes
4   Sunny      No
5   Rainy      Yes
6   Sunny      Yes
7   Overcast   Yes
8   Rainy      No
9   Sunny      No
10  Sunny      Yes
11  Rainy      No
12  Overcast   Yes
13  Overcast   Yes
Frequency table of the weather conditions:

Weather    Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total      10     4

Likelihood table of the weather conditions:

Weather    No             Yes
Overcast   0              5              5/14 = 0.35
Rainy      2              2              4/14 = 0.29
Sunny      2              3              5/14 = 0.35
All        4/14 = 0.29    10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 10/14 = 0.71
So, P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 4/14 = 0.29
P(Sunny) = 0.35
So, P(No|Sunny) = 0.50 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play the game.
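The same calculation can be reproduced with a few lines of Python, computed directly from the 14 observations in the dataset above (the variable names here are illustrative):

weather = ['Rainy', 'Sunny', 'Overcast', 'Overcast', 'Sunny', 'Rainy', 'Sunny',
           'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Rainy', 'Overcast', 'Overcast']
play = ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes',
        'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes']

n = len(play)                                      # 14 observations
p_sunny = weather.count('Sunny') / n               # P(Sunny) = 5/14 ≈ 0.35
p_yes = play.count('Yes') / n                      # P(Yes)   = 10/14 ≈ 0.71
p_no = play.count('No') / n                        # P(No)    = 4/14 ≈ 0.29

# likelihoods taken from the frequency table
p_sunny_given_yes = sum(w == 'Sunny' and p == 'Yes' for w, p in zip(weather, play)) / play.count('Yes')  # 3/10
p_sunny_given_no = sum(w == 'Sunny' and p == 'No' for w, p in zip(weather, play)) / play.count('No')     # 2/4

print('P(Yes|Sunny) =', p_sunny_given_yes * p_yes / p_sunny)   # ≈ 0.60
print('P(No|Sunny)  =', p_sunny_given_no * p_no / p_sunny)     # ≈ 0.41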
There are three types of Naïve Bayes model, which are given below:
o Gaussian: assumes that the continuous features follow a normal distribution; this is the variant (GaussianNB) used in the code in this unit.
o Multinomial: used when the data is multinomially distributed, for example word counts in document classification.
o Bernoulli: similar to the multinomial model, but the predictor variables are independent Boolean (yes/no) features.
Steps to implement:
o Data pre-processing step
o Fitting Naïve Bayes to the training set
o Predicting the test result
o Testing the accuracy of the result (creation of a confusion matrix)
o Visualizing the test set result
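The original pre-processing listing is not reproduced in these notes; a minimal sketch of it, assuming the same user_data.csv dataset and column layout used in the K-NN example later in this unit, might look like this:

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# load the dataset (file name taken from the text below)
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values      # independent variables: Age, Estimated Salary
y = dataset.iloc[:, 4].values           # dependent variable: Purchased

# split into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)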
In the above code, we have loaded the dataset into our program using
dataset = pd.read_csv('user_data.csv'). The loaded dataset is then divided into a
training set and a test set, and the feature variables are scaled.
After the pre-processing step, now we will fit the Naive Bayes model
to the Training set. Below is the code for it:
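A minimal sketch of this step (the original listing is not reproduced in these notes), using the GaussianNB class from scikit-learn that is referenced later in this unit:

from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()               # Gaussian Naive Bayes: suitable for continuous features
classifier.fit(x_train, y_train)        # learn class priors and per-feature Gaussians from the training set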
Output:
Now we will predict the test set result. For this, we will create a new prediction
vector y_pred and use the predict function to make the predictions.
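A minimal sketch of this step (the original listing is not reproduced in these notes):

y_pred = classifier.predict(x_test)     # predicted class (purchased or not) for each test observation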
Output:
The above output shows the result for the prediction vector y_pred and the real
vector y_test. We can see that some predictions are different from the real values;
these are the incorrect predictions.
Now we will check the accuracy of the Naive Bayes classifier using the
Confusion matrix. Below is the code for it:
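A minimal sketch of this step, using the scikit-learn confusion_matrix helper:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)   # rows are actual classes, columns are predicted classes
print(cm)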
Output:
As we can see in the above confusion matrix output, there are 7+3=
10 incorrect predictions, and 65+25=90 correct predictions.
Next, we will visualize the training set result using the Naïve Bayes classifier. Only
the tail of the listing is shown here; a complete sketch is given after the output
discussion below:

mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)
Output:
In the above output we can see that the Naïve Bayes classifier has segregated the
data points with a fine boundary. The boundary is a Gaussian-shaped curve because
we have used the GaussianNB classifier in our code.
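For reference, a complete sketch of the decision-region plot that the fragment above belongs to might look like the following; it assumes the x_train, y_train, classifier, nm and mtp names introduced in the earlier steps:

from matplotlib.colors import ListedColormap

x_set, y_set = x_train, y_train
# build a dense grid over the two (scaled) feature axes
X1, X2 = nm.meshgrid(nm.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
                     nm.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01))
# colour every grid point by the class the classifier predicts for it
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
# overlay the actual training points, coloured by their true class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()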
To visualize the test set result, the same code is used with x_train and y_train
replaced by x_test and y_test (and the title changed accordingly); again only the tail
of the listing is shown:

mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)
Output:
The above output is the final output for the test set data. As we can see, the
classifier has created a Gaussian-shaped boundary to divide the "purchased" and
"not purchased" observations. There are some wrong predictions, which we have
counted in the confusion matrix, but it is still a pretty good classifier.
K-NEAREST NEIGHBOR (K-NN) ALGORITHM
Suppose there are two categories, i.e., Category A and Category B, and we have a
new data point x1. In which of these categories will this data point lie? To solve this
type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily
identify the category or class of a particular data point. Consider the below diagram:
Advantages of the K-NN Algorithm:
o It is simple to implement.
o It is robust to noisy training data.
o It can be more effective if the training data is large.
The Data Pre-processing step will remain exactly the same as Logistic
Regression. Below is the code for it:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting the independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
From the above output image, we can see that our data is successfully
scaled.
Now we will fit the K-NN classifier to the training data. To do this, we will import
the KNeighborsClassifier class from the sklearn.neighbors library. After importing
the class, we will create the classifier object. The main parameters of this class are:
o n_neighbors: the number of neighbours to use; here we take 5.
o metric='minkowski': the distance metric (the default).
o p=2: the power parameter of the Minkowski metric; with p=2 it is equivalent to the standard Euclidean distance.
Then we will fit the classifier to the training data. Below is the code for it:
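A minimal sketch of this step, using the parameter values that appear in the output below:

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(x_train, y_train)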
Output: By executing the above code, we will get the output as:
Out[10]:
KNeighborsClassifier(algorithm='auto', leaf_size=30,
metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
o Predicting the Test Result: To predict the test set result, we will
create a y_pred vector as we did in Logistic Regression. Below is
the code for it:
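A sketch of this step (one line, mirroring the Naïve Bayes example earlier in this unit):

y_pred = classifier.predict(x_test)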
Output:
Now we will create the Confusion Matrix for our K-NN model to
see the accuracy of the classifier. Below is the code for it:
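A sketch of this step, again using the scikit-learn confusion_matrix helper:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)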
Output: By executing the above code, we will get the matrix as below:
Now, we will visualize the training set result for the K-NN model. The code will
remain the same as in Logistic Regression, except for the name of the graph. Below
is the code for it (the set-up lines at the start of the listing are reconstructed; only
the plotting lines appeared in the original notes):
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# build a dense grid over the two scaled features and colour it by the predicted class
x1, x2 = nm.meshgrid(nm.arange(x_set[:, 0].min() - 1, x_set[:, 0].max() + 1, 0.01),
                     nm.arange(x_set[:, 1].min() - 1, x_set[:, 1].max() + 1, 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# overlay the actual training points, coloured by their true class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('K-NN Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
The output graph is different from the graph we obtained in Logistic Regression. It
can be understood from the points below:
o As we can see, the graph shows red points and green points. The green
points are for the Purchased (1) variable and the red points for the Not
Purchased (0) variable.
o The graph shows an irregular boundary instead of a straight line or a
smooth curve, because the K-NN algorithm classifies each point according
to its nearest neighbours.
After training the model, we will now test the result by putting in a new dataset,
i.e., the test dataset. The code remains the same except for some minor changes:
x_train and y_train are replaced by x_test and y_test, and the title of the graph is
changed accordingly. Only the last two lines of the listing are shown here:

mtp.legend()
mtp.show()
Output:
The above graph shows the output for the test dataset. As we can see in the graph,
the predicted output is quite good, as most of the red points are in the red region
and most of the green points are in the green region.
However, there are a few green points in the red region and a few red points in the
green region. These are the incorrect observations that we observed in the
confusion matrix (7 incorrect predictions).
NEED FOR MACHINE LEARNING
The need for machine learning is increasing day by day. The reason behind this
need is that machine learning is capable of doing tasks that are too complex for a
person to implement directly. As humans, we have some limitations: we cannot
manually access and process huge amounts of data, so we need computer systems,
and here machine learning comes in to make things easy for us.
Many companies have built machine learning models that use a vast amount of
data to analyze user interests and recommend products accordingly.
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
The goal of supervised learning is to map input data to output data. Supervised
learning is based on supervision: it is the same as when a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be grouped further into two categories of algorithms:
o Classification
o Regression
2) Unsupervised Learning
In unsupervised learning, the training is provided to the machine with a set of data
that has not been labeled, classified, or categorized, and the algorithm needs to act
on that data without any supervision. The goal of unsupervised learning is to
restructure the input data into new features or groups of objects with similar
patterns.
It can be further classified into two categories of algorithms:
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning
agent gets a reward for each correct action and a penalty for each wrong action; the
agent learns automatically from this feedback and improves its performance.

HISTORY OF MACHINE LEARNING
A few decades ago (about 40-50 years back), machine learning was science fiction,
but today it is part of our daily life. Machine learning is making our day-to-day life
easier, from self-driving cars to Amazon's virtual assistant "Alexa". However, the
idea behind machine learning is quite old and has a long history. Below are some
milestones that have occurred in the history of machine learning:
o The period from 1974 to 1980 was a tough time for AI and ML researchers;
this period is known as the AI winter.
o In this period, machine translation failed, and people lost interest in AI,
which led to reduced government funding for research.
The trained model can then be tested by giving it a new set of fruit; the model will
identify the fruit and predict the output using a suitable algorithm.