AI and ML Module 3
Machine Learning
3.1 INTRODUCTION TO LEARNING AND ITS TYPES
The process of acquiring knowledge and expertise through study, experience, or being taught is called learning. Generally, humans learn in different ways. To make machines learn, we need to simulate the strategies of human learning in machines. But, will the computers learn? This question has been raised over many centuries by philosophers, mathematicians and logicians. First let us address the question - What sorts of tasks can the computers learn? This depends upon the nature of problems that the computers can solve. There are two kinds of problems - well-posed and ill-posed. Computers can solve only well-posed problems, as these have well-defined specifications and have the following components inherent to them:
1. Class of learning tasks (T)
2. A measure of performance (P)
3. A source of experience (E)
The standard definition of learning proposed by Tom Mitchell is that a program can learn from E for the task T, and P improves with experience E. Let us formalize the concept of learning as follows:
Let x be the input and X be the input space, which is the set of all inputs, and Y is the output space, which is the set of all possible outputs, that is, yes/no.
Let D be the input dataset with examples (x1, y1), (x2, y2), ..., (xn, yn) for n inputs.
Let the unknown target function be f: X -> Y, that maps the input space to the output space. The objective of the learning program is to pick a function, g: X -> Y, to approximate the target hypothesis f. All the possible formulae form a hypothesis space. In short, let H be the set of all formulae from which the learning algorithm chooses. The choice is good when the hypothesis g replicates f for all samples. This is shown in Figure 3.1.
[Figure 3.1: From the training samples, the learning algorithm searches the hypothesis space H of candidate formulae and produces a generated hypothesis g that approximates the ideal target hypothesis f; the error is the difference between f and g.]
A simple linear model computes h(x) = sign(w1x1 + w2x2 + ... + wnxn + b), where x1, x2, ..., xn are the components of the input vector, w1, w2, ..., wn are the weights, and +1 and -1 represent the class. This simple model is called the perceptron model. One can simplify this by making w0 = b and fixing x0 as 1; then the model can further be simplified as:
h(x) = sign(w^T x)
This is called the perceptron learning algorithm. The formal learning models are discussed later in Section 3.7 of this chapter.
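To make the perceptron model concrete, here is a minimal sketch in Python. It is an illustration only, not code from the text: the toy dataset, epoch count, and function names are assumptions, and the bias is handled by fixing x0 = 1 so that w0 = b, as described above.

```python
# A minimal perceptron learning sketch: h(x) = sign(w^T x), with the bias
# folded in as x0 = 1. Data and epoch count are illustrative, not from the text.
import numpy as np

def sign(v):
    return 1 if v >= 0 else -1

def perceptron_learn(X, y, epochs=10):
    """Nudge the weights toward each misclassified sample."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 so w0 plays the role of b
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if sign(w @ xi) != yi:             # misclassified: update the weights
                w += yi * xi
    return w

# Toy linearly separable data: class +1 roughly where x1 > x2, class -1 otherwise.
X = np.array([[2.0, 1.0], [3.0, 1.5], [1.0, 2.0], [0.5, 3.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_learn(X, y)
print("learned weights:", w)   # h(x) = sign(w^T [1, x1, x2])
```

For linearly separable data such as these toy points, the perceptron convergence theorem guarantees the update reaches a separating hypothesis g, which is exactly the sense in which g "replicates f for all samples".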
Learning Types
There are different types of learning. Some of the different learning methods are as follows:
1. Learn by memorization or learn by repetition, also called rote learning, is done by memorizing without understanding the logic or concept. Although rote learning is basically learning by repetition, from a machine learning perspective, the learning occurs by simply comparing with the existing knowledge for the same input data and producing the output if present.
2. Learn by examples, also called learn by experience or previous knowledge acquired at some time, is like finding an analogy, which means performing inductive learning from observations that formulate a general concept. Here, the learner learns by inferring a general rule from the set of observations or examples. Therefore, inductive learning is also called discovery learning.
3. Learn by being taught by an expert or a teacher, generally called passive learning. However, there is a special kind of learning called active learning where the learner can interactively query a teacher/expert to label unlabelled data instances with the desired outputs.
Training Experience
Let us consider designing of a chess game. In direct experience, individual board states and correct moves of the chess game are given directly. In indirect experience, the move sequences and results are only given. The training experience also depends on the presence of a supervisor who can label all valid moves for a board state. In the absence of a supervisor, the game agent plays against itself and learns the good moves, if the training samples cover all scenarios, or in other words, are distributed enough for performance computation. If the training samples and testing samples have the same distribution, the results would be good.
Compute the error as the difference between the trained and expected hypothesis; let it be error(b). Then, for every board feature xi, the weights are updated as:
wi = wi + μ × error(b) × xi
Here, μ is the constant that moderates the size of the weight update.
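As a rough illustration of this update rule (the board features, target value, and the value of μ below are invented placeholders; the text specifies only the update formula itself):

```python
# Sketch of the LMS-style update w_i = w_i + mu * error(b) * x_i for the chess
# example. Feature values and the training target are illustrative placeholders.
mu = 0.1                        # small constant moderating the update size

def v_hat(weights, features):
    """Current approximation of the board value: a weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features))

def update(weights, features, v_train):
    error = v_train - v_hat(weights, features)   # error(b): expected minus trained value
    return [w + mu * error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0, 0.0]
board_features = [3, 1, 0]      # e.g., counts of certain pieces (illustrative)
weights = update(weights, board_features, v_train=100)
print(weights)                  # [30.0, 10.0, 0.0]
```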
Thus, the learning system has the following components:
1. A Performance system to allow the game to play against itself.
2. A Critic system to generate the samples.
3. A Generalizer system to generate a hypothesis based on samples.
4. An Experimenter system to generate a new system based on the currently learnt function. This is sent as input to the performance system.
3.4 INTRODUCTION TO CONCEPT LEARNING
Concept learning is a learning strategy of acquiring abstract knowledge, or inferring a general concept, or deriving a category from the given training samples. It is a process of abstraction and generalization from the data.
Concept learning helps to classify an object that has a set of common, relevant features. Thus, it helps a learner compare and contrast categories based on the similarity and association of positive and negative instances in the training data to classify an object. The learner tries to simplify by observing the common features from the training samples and then apply this simplified model to the future samples. This task is also known as learning from experience.
Each concept or category obtained by learning is a Boolean valued function which takes a true or false value. For example, humans can identify different kinds of animals based on common relevant features and categorize all animals based on specific sets of features. The special features that distinguish one animal from another can be called as a concept. This way of learning categories for an object and to recognize new instances of those categories is called as concept learning. It is formally defined as inferring a Boolean valued function by processing training instances.
Concept learning requires three things:
1. Input - Training dataset which is a set of training instances, each labeled with the name of a concept or category to which it belongs. Use this past experience to train and build the model.
2. Output - Target concept or Target function f. It is a mapping function f(x) from input x to output y. It is to determine the specific features or common features to identify an object. In other words, it is to find the hypothesis to determine the target concept. For e.g., the specific set of features to identify an elephant from all animals.
3. Test - New instances to test the learned model.
Formally, concept learning is defined as - "Given a set of hypotheses, the learner searches through the hypothesis space to identify the best hypothesis that matches the target concept."
Consider the following set of training instances shown in Table 3.1.
Table 3.1: Sample Training Instances

S.No.  Horns  Tail   Tusks  Paws  Fur  Color  Hooves  Size    Elephant
1      No     Short  Yes    No    No   Black  No      Big     Yes
2      Yes    Short  No     No    No   Brown  Yes     Medium  No
3      No     Short  Yes    No    No   Black  No      Medium  Yes
4      No     Long   No     Yes   Yes  White  No      Medium  No
5      No     Short  Yes    Yes   Yes  Black  No      Big     Yes
Thus, concept learning can also be called as Inductive Learning that tries to induce a general function from specific training instances. This way of learning a hypothesis that can produce an approximate target function with a sufficiently large set of training instances can also approximately classify other unobserved instances, and is called the inductive learning hypothesis. We can only determine an approximate target function because it is very difficult to find an exact target function with the observed training instances. That is why a hypothesis is an approximate target function that best maps the inputs to outputs.
3.4.2 Hypothesis Space
Hypothesis space is the set of all possible hypotheses that approximate the target function f. In other words, the set of all possible approximations of the target function can be defined as the hypothesis space. From this set of hypotheses in the hypothesis space, a machine learning algorithm would determine the best possible hypothesis that would best describe the target function or best fit the outputs. Generally, a hypothesis representation language represents a larger hypothesis space. Every machine learning algorithm would represent the hypothesis space in a different manner about the function that maps the input variables to output variables. For example, a regression algorithm represents the hypothesis space as a linear function whereas a decision tree algorithm represents the hypothesis space as a tree.
The set of hypotheses that can be generated by a learning algorithm can be further reduced by specifying a language bias.
The subset of the hypothesis space that is consistent with all observed training instances is called the Version Space. Version space represents the only hypotheses that are used for the classification.
For example, each of the attributes given in Table 3.1 has the following possible set of values:
Horns - Yes, No
Tail - Long, Short
Tusks - Yes, No
Paws - Yes, No
Fur - Yes, No
Color - Brown, Black, White
Hooves - Yes, No
Size - Medium, Big
Considering these values for each of the attributes, there are (2 × 2 × 2 × 2 × 2 × 3 × 2 × 2) = 384 distinct instances covering all the 5 instances in the training dataset.
So, we can generate (4 × 4 × 4 × 4 × 4 × 5 × 4 × 4) = 81,920 distinct hypotheses when including two more values [?, ∅] for each of the attributes. However, any hypothesis containing one or more ∅ symbols represents the empty set of instances; that is, it classifies every instance as a negative instance. Therefore, there will be (3 × 3 × 3 × 3 × 3 × 4 × 3 × 3 + 1) = 8,749 distinct hypotheses by including only '?' for each of the attributes and one hypothesis representing the empty set of instances. Thus, the hypothesis space is much larger, and hence we need efficient learning algorithms to search for the best hypothesis from the set of hypotheses.
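These counts can be verified with a few lines of Python, assuming the attribute value counts listed above (three for Color, two for every other attribute):

```python
# Reproducing the instance and hypothesis counts for Table 3.1.
from math import prod

values = [2, 2, 2, 2, 2, 3, 2, 2]             # Horns, Tail, Tusks, Paws, Fur, Color, Hooves, Size

distinct_instances = prod(values)              # each attribute takes one of its values
syntactic = prod(v + 2 for v in values)        # add '?' and the empty symbol to each attribute
semantic = prod(v + 1 for v in values) + 1     # add only '?', plus one all-empty hypothesis

print(distinct_instances, syntactic, semantic)  # 384 81920 8749
```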
Hypothesis ordering is also important wherein the hypotheses are ordered from the most specific one to the most general one in order to restrict searching the hypothesis space exhaustively.
3.4.3 Heuristic Space Search
Heuristic search is a search strategy that finds an optimized hypothesis/solution to a problem by iteratively improving the hypothesis/solution based on a given heuristic function or a cost measure. Heuristic search methods will generate a possible hypothesis that can be a solution in the hypothesis space or a path from the initial state. This hypothesis will be tested with the target function or the goal state to see if it is a real solution. If the tested hypothesis is a real solution, it will be selected. This method generally increases the efficiency because it is guaranteed to find a better hypothesis, but may not find the best hypothesis. It is useful for solving tough problems which could not be solved by any other method. The typical example problem solved by heuristic search is the travelling salesman problem.
Several commonly used heuristic search methods are hill climbing methods, constraint satisfaction problems, best-first search, simulated annealing, the A* algorithm, and genetic algorithms.
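As a bare-bones illustration of the hill climbing idea mentioned above (not code from the text; the score and neighbour functions below are toy placeholders), the search repeatedly moves to the best-scoring neighbour of the current hypothesis until no neighbour improves the heuristic score:

```python
# Minimal hill climbing: follow improving neighbours until a local optimum.
def hill_climb(start, neighbours, score):
    current = start
    while True:
        best = max(neighbours(current), key=score, default=current)
        if score(best) <= score(current):     # no improving neighbour: stop
            return current
        current = best

# Toy usage: maximize -(x - 3)^2 over the integers by stepping +/- 1.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
print(result)   # 3
```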
3.4.4 Generalization and Specialization
In order to understand how we construct this concept hierarchy, let us apply the general principle of the generalization/specialization relation. By generalization of the most specific hypothesis and by specialization of the most general hypothesis, the hypothesis space can be searched for an approximate hypothesis that matches all positive instances but does not match any negative instance.
Generalization - Specific to General Learning: This learning methodology will search through the hypothesis space for an approximate hypothesis by generalizing the most specific hypothesis.
Example 3.2: Consider the training instances shown in Table 3.1 and illustrate Specific to General Learning.
Solution: We will start from all false or the most specific hypothesis to determine the most restrictive specialization. Consider only the positive instances and generalize the most specific hypothesis. Ignore the negative instances.
This learning is illustrated as follows:
The most specific hypothesis is taken now, which will not classify any instance to true.
h = <∅ ∅ ∅ ∅ ∅ ∅ ∅ ∅>
Read the first instance I1, to generalize the hypothesis h so that this positive instance can be classified by the hypothesis h1.
I1: No Short Yes No No Black No Big - Yes (Positive instance)
h1 = <No Short Yes No No Black No Big>
When reading the second instance I2, it is a negative instance, so ignore it.
I2: Yes Short No No No Brown Yes Medium - No (Negative instance)
h2 = <No Short Yes No No Black No Big>
Similarly, when reading the third instance I3, it is a positive instance, so generalize h2 to h3 to accommodate it. The resulting h3 is generalized.
I3: No Short Yes No No Black No Medium - Yes (Positive instance)
h3 = <No Short Yes No No Black No ?>
Ignore I4 since it is a negative instance.
I4: No Long No Yes Yes White No Medium - No (Negative instance)
h4 = <No Short Yes No No Black No ?>
When reading the fifth instance I5, h4 is further generalized to h5.
I5: No Short Yes Yes Yes Black No Big - Yes (Positive instance)
h5 = <No Short Yes ? ? Black No ?>
Now, after observing all the positive instances, an approximate hypothesis h5 is generated which can now classify any subsequent positive instance to true.
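The trace above can be reproduced with a short Python sketch. This is an illustration rather than the book's code; the EMPTY marker and function names are assumptions:

```python
# Specific-to-general learning on Table 3.1: start from the all-empty
# hypothesis, generalize on positive instances, ignore negative ones.
# Attribute order: Horns, Tail, Tusks, Paws, Fur, Color, Hooves, Size.
EMPTY = "0"   # stands in for the empty symbol; '?' matches any value

def generalize(h, instance):
    """Minimally generalize h so that it covers the positive instance."""
    return [x if hx == EMPTY else (hx if hx == x else "?")
            for hx, x in zip(h, instance)]

data = [  # (instance, is_positive) rows of Table 3.1
    (["No", "Short", "Yes", "No",  "No",  "Black", "No", "Big"],    True),
    (["Yes","Short", "No",  "No",  "No",  "Brown", "Yes","Medium"], False),
    (["No", "Short", "Yes", "No",  "No",  "Black", "No", "Medium"], True),
    (["No", "Long",  "No",  "Yes", "Yes", "White", "No", "Medium"], False),
    (["No", "Short", "Yes", "Yes", "Yes", "Black", "No", "Big"],    True),
]

h = [EMPTY] * 8
for instance, positive in data:
    if positive:                 # negative instances are ignored
        h = generalize(h, instance)
print(h)   # ['No', 'Short', 'Yes', '?', '?', 'Black', 'No', '?']  -- matches h5
```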
Example 3.3: Illustrate learning by Specialization - General to Specific Learning for the data instances shown in Table 3.1.
Solution: Start from the most general hypothesis, which will make all positive and negative instances true.
Initially,
h = <? ? ? ? ? ? ? ?>
These steps follow the Find-S algorithm, which is applied in Example 3.4 below:
1. Initialize 'h' to the most specific hypothesis.
2. Generalize the initial hypothesis for the first positive instance [since 'h' is more specific].
3. For each subsequent instance:
   If it is a positive instance,
      Check each attribute value in the instance against the hypothesis 'h'.
      If the attribute value is the same as the hypothesis value, then do nothing.
      Else if the attribute value is different from the hypothesis value, change it to '?' in 'h'.
   Else if it is a negative instance,
      Ignore it.
Example 3.4: Consider the training dataset of 4 instances shown in Table 3.2. It contains the details of the performance of students and their likelihood of getting a job offer or not in their final semester. Apply the Find-S algorithm.
Table 3.2: Training Dataset

CGPA  Interactiveness  Practical Knowledge  Communication Skills  Logical Thinking  Interest  Job Offer
≥9    Yes              Excellent            Good                  Fast              Yes       Yes
≥9    Yes              Good                 Good                  Fast              Yes       Yes
≥8    No               Good                 Good                  Fast              No        No
≥9    Yes              Good                 Good                  Slow              No        Yes
Solution:
Step 1: Initialize 'h' to the most specific hypothesis. There are 6 attributes, so for each attribute, we initially fill '∅' in the initial hypothesis 'h'.
h = <∅ ∅ ∅ ∅ ∅ ∅>
Step 2: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance, so generalize the most specific hypothesis 'h' to include this positive instance. Hence,
I1: ≥9 Yes Excellent Good Fast Yes - Positive instance
h = <≥9 Yes Excellent Good Fast Yes>
Step 3: Scan the next instance I2. Since I2 is a positive instance, generalize 'h' to include positive instance I2. For each of the non-matching attribute values in 'h', put a '?' to include this positive instance. The third attribute value is mismatching in 'h' with I2, so put a '?'.
I2: ≥9 Yes Good Good Fast Yes - Positive instance
h = <≥9 Yes ? Good Fast Yes>
Now, scan I3. Since it is a negative instance, ignore it. Hence, the hypothesis remains the same without any change after scanning I3.
I3: ≥8 No Good Good Fast No - Negative instance
h = <≥9 Yes ? Good Fast Yes>
Now scan I4. Since it is a positive instance, check for mismatch in the hypothesis 'h' with I4. The 5th and 6th attribute values are mismatching, so add '?' to those attributes in 'h'.
I4: ≥9 Yes Good Good Slow No - Positive instance
h = <≥9 Yes ? Good ? ?>
Thus, the final hypothesis h = <≥9 Yes ? Good ? ?> includes all positive instances and obviously ignores any negative instance.
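The same generalization step reproduces this result on the Table 3.2 rows. Again this is an illustrative sketch, not the book's code; the strings '>=9' and '>=8' stand in for the CGPA values ≥9 and ≥8:

```python
# Find-S on Table 3.2. Attribute order: CGPA, Interactiveness,
# Practical Knowledge, Communication Skills, Logical Thinking, Interest.
EMPTY = "0"   # stands in for the empty symbol

def generalize(h, x):
    """Minimally generalize h so that it covers the positive instance x."""
    return [xi if hi == EMPTY else (hi if hi == xi else "?") for hi, xi in zip(h, x)]

rows = [  # (instance, job_offer) rows of Table 3.2
    ([">=9", "Yes", "Excellent", "Good", "Fast", "Yes"], True),
    ([">=9", "Yes", "Good",      "Good", "Fast", "Yes"], True),
    ([">=8", "No",  "Good",      "Good", "Fast", "No"],  False),
    ([">=9", "Yes", "Good",      "Good", "Slow", "No"],  True),
]

h = [EMPTY] * 6
for x, positive in rows:
    if positive:                 # negative instances are ignored
        h = generalize(h, x)
print(h)   # ['>=9', 'Yes', '?', 'Good', '?', '?']  -- matches the final hypothesis
```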