Unit 1-Concept Learning

The document discusses concept learning, which involves learning general concepts or categories from specific training examples. Concept learning can be viewed as inferring a boolean-valued function (concept) from labeled examples of instances and their target values. The goal is to learn the target concept that can predict the target value of new instances. Concept learning involves searching through a space of hypotheses to find the one that best fits the training examples. The Find-S algorithm is described for finding the most specific hypothesis consistent with the positive examples. However, Find-S has limitations like not detecting inconsistent data and not considering negative examples.

Concept Learning
• Learning involves acquiring general concepts from specific
training examples. Example: People continually learn
general concepts or categories such as "bird," "car,"
"situations in which I should study more in order to pass the
exam," etc.
• Each such concept can be viewed as describing some subset
of objects or events defined over a larger set
• Alternatively, each concept can be thought of as a Boolean-
valued function defined over this larger set. (Example: A
function defined over all animals, whose value is true for
birds and false for other animals).

Concept learning - Inferring a Boolean-valued function from training examples of its input and output.

A Concept Learning Task
Consider the example task of learning the target concept
"Days on which my friend Aldo enjoys his favorite water sport."

Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The table describes the training examples for the EnjoySport concept along with their attributes.
The attribute EnjoySport indicates whether or not a person enjoys his favorite water sport on this day.

• The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its other attributes.
What hypothesis representation is provided to the learner?

Let’s consider a simple representation in which each hypothesis consists of a conjunction of constraints on the instance attributes.

Let each hypothesis be a vector of six constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.

For each attribute, the hypothesis will either
• indicate by a "?" that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a "Φ" that no value is acceptable.
If some instance x satisfies all the constraints of
hypothesis h, then h classifies x as a positive example
(h(x) = 1).
The hypothesis that PERSON enjoys his favorite sport only on
cold days with high humidity (independent of the values of
the other attributes) is represented by the expression
(?, Cold, High, ?, ?, ?)
The most general hypothesis-that every day is a positive
example-is represented by
(?, ?, ?, ?, ?, ?)

The most specific hypothesis - that no day is a positive example - is represented by
(Φ, Φ, Φ, Φ, Φ, Φ)
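This representation maps naturally onto a few lines of code. The sketch below is not from the original slides; it encodes a hypothesis as a Python tuple in which "?" accepts any value and None stands for the empty constraint Φ, and satisfies() computes h(x).

# Minimal sketch: hypotheses as tuples, "?" = any value, None = Φ (no value acceptable).

def satisfies(instance, hypothesis):
    """Return True (h(x) = 1) iff the instance meets every attribute constraint of h."""
    return all(c == "?" or c == v for v, c in zip(instance, hypothesis))

# "Enjoys the sport only on cold days with high humidity":
h = ("?", "Cold", "High", "?", "?", "?")
day = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(satisfies(day, h))  # True -> h classifies this day as a positive example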
Notation
The set of items over which the concept is defined is called the set of
instances, which we denote by X.
Example: X is the set of all possible days, each represented by the
attributes: Sky, AirTemp, Humidity, Wind, Water, and Forecast

The concept or function to be learned is called the target concept, which we denote by c.
c can be any Boolean-valued function defined over the instances X:
c : X → {0, 1}

Example: The target concept corresponds to the value of the attribute EnjoySport
(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
• Instances for which c(x) = 1 are called positive examples, or
members of the target concept.
• Instances for which c(x) = 0 are called negative examples, or
non-members of the target concept.
• The ordered pair (x, c(x)) - used to describe
the training example consisting of the instance x and
its target concept value c(x).
• D to denote the set of available training examples
• The symbol H to denote the set of all possible hypotheses
that the learner may consider regarding the identity of the
target concept. Each hypothesis h in H represents a
Boolean-valued function defined over X
h : X{O, 1}

• The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all x in X.
Example | Sky   | AirTemp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm    | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm    | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold    | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm    | High     | Strong | Cool  | Change   | Yes

The table describes the example days along with the attributes of the EnjoySport concept.
The Inductive Learning Hypothesis

Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Concept learning as Search
• Concept learning can be viewed as the task of searching through a
large space of hypotheses implicitly defined by the hypothesis
representation.
• The goal of this search is to find the
hypothesis that best fits the training
examples.
Example, the instances X and hypotheses H in the EnjoySport
learning task.
The attribute Sky has three possible values, and AirTemp, Humidity, Wind, Water, and Forecast each have two possible values, so the instance space X contains exactly
• 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct possible instances
• 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses within H.
Concept Learning as Search

Every hypothesis containing one or more "Φ" symbols represents the empty set of instances; that is, it classifies every instance as negative. Counting all such hypotheses as a single empty concept gives
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973 semantically distinct hypotheses.
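These counts follow directly from the numbers of attribute values; a quick arithmetic check in Python:

# 3 values for Sky, 2 for each remaining attribute; "?" and Φ add two more
# constraint choices per attribute for the syntactic count.
instances = 3 * 2 * 2 * 2 * 2 * 2          # 96 distinct instances
syntactic = 5 * 4 * 4 * 4 * 4 * 4          # 5120 syntactically distinct hypotheses
semantic = 1 + 4 * 3 * 3 * 3 * 3 * 3       # 973 semantically distinct hypotheses
print(instances, syntactic, semantic)      # 96 5120 973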
General-to-Specific Ordering of
Hypotheses
• Consider the two hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)

• Consider the sets of instances that are classified positive by h1 and by h2.
• Because h2 imposes fewer constraints on the instance, it classifies more instances as positive. So, any instance classified positive by h1 will also be classified positive by h2. Therefore, h2 is more general than h1.
General-to-Specific Ordering of Hypotheses
• Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj.

• Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
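For the conjunctive representation used here, this relation can be tested attribute by attribute. A small sketch, reusing the tuple encoding from the earlier example (with None standing for Φ):

def more_general_or_equal(hj, hk):
    """True iff every instance that satisfies hk also satisfies hj (hj >=g hk)."""
    if any(c is None for c in hk):
        # hk contains Φ, so no instance satisfies it; the condition holds vacuously.
        return True
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False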
• In the figure, the box on the
left represents the set X of
all instances, the box on
the right the set H of all
hypotheses.
• Each hypothesis
corresponds to some
subset of X-the subset of
instances that it classifies
positive.
• The arrows connecting hypotheses represent the more-general-than relation, with the arrow pointing toward the less general hypothesis.
• Note that the subset of instances characterized by h2 subsumes the subset characterized by h1; hence h2 is more-general-than h1.
The figure shows the instances, the hypotheses, and the more-general-than relation.
FIND-S: Finding a Maximally Specific
Hypothesis
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is
satisfied by x
3. Output hypothesis h
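A direct Python rendering of these three steps, under the tuple encoding used earlier ("?" = any value, None = Φ). As in the pseudocode, negative examples are simply skipped; this is a sketch, not the only possible implementation.

def find_s(examples, n_attributes):
    """examples: iterable of (instance_tuple, label) pairs, label True for positive."""
    h = [None] * n_attributes             # step 1: most specific hypothesis <Φ, ..., Φ>
    for x, positive in examples:
        if not positive:
            continue                      # FIND-S ignores negative examples
        for i in range(n_attributes):
            if h[i] is None:              # first positive example: adopt its value
                h[i] = x[i]
            elif h[i] != x[i]:            # constraint violated: generalize to "?"
                h[i] = "?"
    return tuple(h)                       # step 3: output hypothesis h

# The EnjoySport training set from the table above:
enjoysport = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(enjoysport, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')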
Questions Unanswered by FIND-S

1. Has the learner converged to the correct target concept?


2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent
hypotheses?
Limitations of the FIND-S algorithm

• There is no way to determine whether the hypothesis is consistent throughout the data.
• Inconsistent training sets can actually mislead the Find-S algorithm, since it ignores the negative examples.
• The Find-S algorithm does not provide a backtracking technique to determine the best possible changes that could be made to improve the resulting hypothesis.
Example:
Consider the following data set containing data about which particular seeds are poisonous. Apply the Find-S algorithm to this data set and obtain the final hypothesis. (The target concept to be learnt: given the set of attribute values of a seed, determine whether or not it is poisonous.)
First we take the hypothesis to be the most specific hypothesis. Hence, our hypothesis would be:
h0 = {ϕ, ϕ, ϕ, ϕ}

Consider example 1 :
The data in example 1 is { GREEN, HARD, NO, WRINKLED }. We see that our initial
hypothesis is more specific and we have to generalize it for this example. Hence, the
hypothesis becomes :
h1 = { GREEN, HARD, NO, WRINKLED }
Consider example 2 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h2 = { GREEN, HARD, NO, WRINKLED }
Consider example 3 :
Here we see that this example has a negative outcome. Hence we neglect this
example and our hypothesis remains the same.
h3 = { GREEN, HARD, NO, WRINKLED }
Consider example 4:
The data present in example 4 is { ORANGE, HARD, NO, WRINKLED }. We compare every single attribute with the current hypothesis, and wherever a mismatch is found we replace that particular attribute with the general case ("?"). After doing this, the hypothesis
h3 = { GREEN, HARD, NO, WRINKLED } becomes:
h4 = { ?, HARD, NO, WRINKLED }

Consider example 5:
The data present in example 5 is { GREEN, SOFT, YES, SMOOTH }. We compare every single attribute with the current hypothesis, and wherever a mismatch is found we replace that particular attribute with the general case ("?"). After doing this, the hypothesis
h4 = { ?, HARD, NO, WRINKLED } becomes:
h5 = { ?, ?, ?, ? }
Since we have reached a point where all the attributes in our hypothesis have the general condition, examples 6 and 7 would result in the same hypothesis with all general attributes.

Therefore h5 = { ?, ?, ?, ? }, and h6 and h7 will be:
h6 = { ?, ?, ?, ? }
h7 = { ?, ?, ?, ? }

Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = { ?, ?, ?, ? }
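The walkthrough above can be checked with the find_s() sketch given earlier. Only the positive rows quoted in the text are needed, since Find-S ignores the negative examples (2 and 3); the rows below are taken from the walkthrough, not from the full (unreproduced) table.

seeds_positives = [
    (("GREEN",  "HARD", "NO",  "WRINKLED"), True),   # example 1
    (("ORANGE", "HARD", "NO",  "WRINKLED"), True),   # example 4
    (("GREEN",  "SOFT", "YES", "SMOOTH"),   True),   # example 5
]
print(find_s(seeds_positives, 4))  # ('?', '?', '?', '?') -- the final hypothesis above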
Find-S: Shortcomings
The algorithm finds one hypothesis, but cannot tell whether it has found the only hypothesis consistent with the data or whether there are more such hypotheses.

Why prefer the most specific hypothesis?
• Multiple hypotheses may be consistent with the training examples; Find-S will find the most specific one.

Are the training examples consistent?
• The training examples may contain errors or noise.
• Such inconsistent sets of training examples can mislead Find-S.

What if there are several maximally specific consistent hypotheses?
• There may be several maximally specific hypotheses consistent with the data, or no maximally specific consistent hypothesis at all; Find-S cannot handle either case.
Definitions
• Consistent
A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D.
Consistent(h, D) ≡ (∀ <x, c(x)> ∈ D) h(x) = c(x)
• Related definitions:
1. x satisfies the constraints of hypothesis h when h(x) = 1
2. h covers a positive training example x if it correctly classifies x as positive
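The Consistent(h, D) predicate is easy to state in code; a sketch reusing satisfies() and the enjoysport list from the earlier examples:

def consistent(h, examples):
    """Consistent(h, D): h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(satisfies(x, h) == label for x, label in examples)

print(consistent(("Sunny", "Warm", "?", "Strong", "?", "?"), enjoysport))  # True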
Version Space

• The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H that are consistent with the training examples in D.

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}


List-Then-Eliminate Algorithm

• Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>:
   remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
List-Then-Eliminate Algorithm: Properties
• Guaranteed to output all hypotheses consistent with the training data
• Can be applied whenever the hypothesis space H is finite
• Requires exhaustively enumerating all hypotheses in H (usually not realistic)
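A sketch of List-Then-Eliminate for the EnjoySport space, reusing satisfies() and enjoysport from earlier. The second value of Wind is not named in the slides, so "Weak" below is only a placeholder, and hypotheses containing Φ (which all denote the empty concept) are left out of the enumeration.

from itertools import product

def list_then_eliminate(hypothesis_space, examples):
    version_space = list(hypothesis_space)
    for x, label in examples:
        version_space = [h for h in version_space if satisfies(x, h) == label]
    return version_space

# Enumerate H: each attribute constraint is either "?" or one concrete value.
values = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
          ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
H = list(product(*[["?"] + v for v in values]))          # 972 Φ-free hypotheses
vs = list_then_eliminate(H, enjoysport)
print(len(vs))  # 6 -- the version space shown in the following sections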
Candidate-Elimination

• The Candidate-Elimination algorithm outputs the set of all hypotheses consistent with the training examples
• without enumerating all hypotheses.
Version Space
This version space, containing all 6 hypotheses, can be compactly represented with its most specific (S) and most general (G) sets.
How to generate all h in VS, given G and S?
x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x4 = <Sunny, Warm, High, Strong, Cool, Change>, +

S: <Sunny, Warm, ?, Strong, ?, ?>
Between S and G: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
G: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>

Example Version Space


Version Space and the Candidate-Elimination Algorithm (contd.)

• The specific boundary S
With respect to hypothesis space H and training data D, S is the set of minimally general (i.e., maximally specific) members of H consistent with D:
S ≡ {s ∈ H | Consistent(s, D) ∧ (¬∃ s' ∈ H)[(s >g s') ∧ Consistent(s', D)]}
The most specific consistent hypotheses, i.e., the maximally specific elements of VS_{H,D} (a "set of sufficient conditions").

• The general boundary G
With respect to hypothesis space H and training data D, G is the set of maximally general members of H consistent with D:
G ≡ {g ∈ H | Consistent(g, D) ∧ (¬∃ g' ∈ H)[(g' >g g) ∧ Consistent(g', D)]}
The most general consistent hypotheses, i.e., the maximally general elements of VS_{H,D} (a "set of necessary conditions").
Version Space and the Candidate-Elimination(contd)
• Version space is the set of hypotheses contained
• in G,
• plus those contained in S,
• plus those that lie between G and S in the partially ordered
hypothesis space.

• Version space representation theorem:
Let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = {h ∈ H | (∃ s ∈ S)(∃ g ∈ G)(g ≥g h ≥g s)}
Version Space and the Candidate-Elimination(contd)

• The Candidate Elimination algorithm works on


the same principle as List-then-Eliminate, but
using a more compact representation of the
Version Space
• Version Space is represented by its most general and least general(specific)
members.

• Candidate-Elimination Learning Algorithm:
- Initialize G to the set of maximally general hypotheses in H:   G0 ← {<?, ?, ?, ?, ?, ?>}
- Initialize S to the set of maximally specific hypotheses in H:  S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø>}
Candidate-Elimination (cont.)
• Candidate-Elimination Learning Algorithm (cont.)
- For each training example d, do
• If d is a negative example
– // Specialize G (make general hypotheses more specific; only G is altered by specialization)
– Remove from S any hypothesis inconsistent with d
– For each hypothesis g in G that is not consistent with d
» Remove g from G
» Add to G all minimal specializations h of g such that h is consistent with d and some member of S is more specific than h
» Remove from G any hypothesis that is less general than another hypothesis in G

• If d is a positive example
– // Generalize S (make specific hypotheses more general; only S is altered by generalization)
– Remove from G any hypothesis inconsistent with d
– For each hypothesis s in S that is not consistent with d
» Remove s from S
» Add to S all minimal generalizations h of s such that h is consistent with d and some member of G is more general than h
» Remove from S any hypothesis that is more general than another hypothesis in S
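The pseudocode above can be turned into a compact sketch for the conjunctive representation. It reuses satisfies() and more_general_or_equal() from the earlier examples; the initial all-Φ hypothesis is kept as the placeholder string "phi" so that Φ symbols never have to be manipulated directly. This is one possible rendering under those assumptions, not the only one.

def min_generalizations(s, x):
    """Minimal generalizations of s that cover the positive instance x."""
    if s == "phi":                                   # the initial <Φ, ..., Φ> hypothesis
        return [tuple(x)]
    return [tuple(a if a == v else "?" for a, v in zip(s, x))]

def min_specializations(g, values, x):
    """Minimal specializations of g that exclude the negative instance x."""
    out = []
    for i, a in enumerate(g):
        if a == "?":
            for v in values[i]:
                if v != x[i]:                        # pin attribute i to a value x lacks
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, values):
    n = len(values)
    S = ["phi"]                                      # maximally specific boundary
    G = [tuple("?" for _ in range(n))]               # maximally general boundary
    for x, positive in examples:
        if positive:                                 # generalize S, prune G
            G = [g for g in G if satisfies(x, g)]
            new_S = []
            for s in S:
                if s != "phi" and satisfies(x, s):
                    new_S.append(s)
                    continue
                for h in min_generalizations(s, x):
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            new_S = list(dict.fromkeys(new_S))
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:                                        # specialize G, prune S
            S = [s for s in S if s == "phi" or not satisfies(x, s)]
            new_G = []
            for g in G:
                if not satisfies(x, g):
                    new_G.append(g)
                    continue
                for h in min_specializations(g, values, x):
                    if any(s == "phi" or more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            new_G = list(dict.fromkeys(new_G))
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
    return S, G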
Candidate-Elimination: Example Trace

Training examples:
d1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
d2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
d3: <Rainy, Cold, High, Strong, Warm, Change>, No
d4: <Sunny, Warm, High, Strong, Cool, Change>, Yes

Specific boundary:
S0: <Ø, Ø, Ø, Ø, Ø, Ø>
S1: <Sunny, Warm, Normal, Strong, Warm, Same>
S2 = S3: <Sunny, Warm, ?, Strong, Warm, Same>
S4: <Sunny, Warm, ?, Strong, ?, ?>

General boundary:
G0 = G1 = G2: <?, ?, ?, ?, ?, ?>
G3: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>
G4: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>

Hypotheses between S4 and G4:
<Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>

Notes:
G3: Why not add <?, ?, Normal, ?, ?, ?> or <Cloudy, ?, ?, ?, ?, ?>? They are inconsistent with the previous positive examples that S2 summarizes.
G4: The last element of G3, <?, ?, ?, ?, ?, Same>, is inconsistent with d4 and must be removed.
Summarised Explanation of CE Algorithm
• // S summarizes all past positive examples
Any hypothesis h more general than S is guaranteed to be
consistent with all the previous positive examples
• Let h be a generalization of s in S
– h covers more points than s since it is more general
– In particular, h covers all points covered by s
– Since s is consistent with all + examples, so is h

• //G summarizes all past negative examples


Any hypothesis h more specific than G is guaranteed to be
consistent with all the previous negative examples
• Let h be a specialization of some g in G
– h covers fewer points than g
– In particular, h rejects all negative examples rejected by g
– Since g is consistent with all negative examples, so is h

• The learned version space is independent of the order in which the training
examples are presented
• After all, the VS shows all the consistent hypotheses
• S and G boundary will move closer together with more examples, up to
convergence
Candidate-Elimination(contd..)

• Candidate Elimination works by finding minimal


generalizations and specializations of hypotheses
• It can be applied to any concept learning task for which
these operations are well-defined.

• Given only these two sets (S and G), it is possible to


enumerate all members of the version space as needed.

• The algorithm can be applied to any concept learning task and


hypothesis space for which these operations are well-defined.
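For the conjunctive representation, this "as needed" enumeration can be done by relaxing, one subset of constraints at a time, each member of S and keeping only the results that some member of G still covers. A sketch reusing more_general_or_equal() from earlier:

from itertools import combinations

def version_space_between(S, G):
    """All conjunctive hypotheses h with s <=g h <=g g for some s in S and g in G."""
    vs = set()
    for s in S:
        fixed = [i for i, a in enumerate(s) if a != "?"]   # constraints that may be relaxed
        for r in range(len(fixed) + 1):
            for idxs in combinations(fixed, r):
                h = tuple("?" if i in idxs else a for i, a in enumerate(s))
                if any(more_general_or_equal(g, h) for g in G):
                    vs.add(h)
    return vs

S4 = [("Sunny", "Warm", "?", "Strong", "?", "?")]
G4 = [("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")]
print(len(version_space_between(S4, G4)))  # 6, as in the Example Version Space above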
An Illustrative Example
The boundary sets are first initialized to G0 and S0, the most general and most specific hypotheses in H.

S0: <Ø, Ø, Ø, Ø, Ø, Ø>
G0: <?, ?, ?, ?, ?, ?>

For training example d1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
S1: <Sunny, Warm, Normal, Strong, Warm, Same>
G1 = G0: <?, ?, ?, ?, ?, ?>

For training example d2 = <Sunny, Warm, High, Strong, Warm, Same>, +
S2: <Sunny, Warm, ?, Strong, Warm, Same>
G2 = G1: <?, ?, ?, ?, ?, ?>

For training example d3 = <Rainy, Cold, High, Strong, Warm, Change>, -
S3 = S2: <Sunny, Warm, ?, Strong, Warm, Same>
G3: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>

For training example d4 = <Sunny, Warm, High, Strong, Cool, Change>, +
S4: <Sunny, Warm, ?, Strong, ?, ?>
G4: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>
The final version space for the EnjoySport concept learning
problem and training examples described earlier.
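Running the candidate_elimination() sketch from earlier on the four EnjoySport examples (with enjoysport and values as defined in the previous code snippets) reproduces these boundary sets:

S_final, G_final = candidate_elimination(enjoysport, values)
print(S_final)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G_final)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]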
Candidate-Elimination (C-E) Algorithm

1. Initialise the G boundary to the most general hypothesis: h ← <a1, …, an>, (∀i) ai = ?.
   Initialise the S boundary to the most specific hypothesis: h ← <a1, …, an>, (∀i) ai = Ø.
2. FOR each training instance d ∈ D, do:
   IF d is a positive example
      Remove from G all h that are not consistent with d.
      FOR each hypothesis s ∈ S that is not consistent with d, do:
         - replace s with all minimal generalizations h that are consistent with d, with h >g s and g ≥g h for some g ∈ G,
         - remove from S all s that are more general than another s in S.
   IF d is a negative example
      Remove from S all h that are not consistent with d.
      FOR each hypothesis g ∈ G that is not consistent with d, do:
         - replace g with all minimal specializations h that are consistent with d, with g >g h and h ≥g s for some s ∈ S,
         - remove from G all g that are less general than another g in G.
3. Output the boundary sets G and S.


Inductive Bias I: A Biased Hypothesis Space
Database:
Day | Sky    | AirTemp | Humidity | Wind   | Water | Forecast | WaterSport (class)
1   | Sunny  | Warm    | Normal   | Strong | Cool  | Change   | Yes
2   | Cloudy | Warm    | Normal   | Strong | Cool  | Change   | Yes
3   | Rainy  | Warm    | Normal   | Strong | Cool  | Change   | No

• Given our previous choice of the hypothesis space representation, no hypothesis is consistent with the above database: we have BIASED the learner to consider only conjunctive hypotheses.
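A quick check of this claim with the earlier sketches: Find-S on the two positive days yields the most specific conjunctive hypothesis covering them, and that hypothesis already covers the negative day; since any consistent hypothesis would have to be at least as general, no conjunctive hypothesis can separate them.

biased = [
    (("Sunny",  "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change"), True),
    (("Rainy",  "Warm", "Normal", "Strong", "Cool", "Change"), False),
]
h = find_s(biased, 6)
print(h)                           # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(satisfies(biased[2][0], h))  # True: the negative day is also covered -> inconsistent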
Inductive Bias II: An Unbiased Learner

• In order to solve the problem caused by the bias of the hypothesis space, we can remove this bias and allow the hypotheses to represent every possible subset of instances. The target concept for the previous database could then be expressed as: <Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
• However, such an unbiased learner is not able to generalize beyond the observed examples! Every non-observed example will be correctly classified by half of the hypotheses in the version space and misclassified by the other half.
Inductive Bias III: The Futility of Bias-Free
Learning
• Fundamental Property of Inductive Learning A
learner that makes no a priori assumptions
regarding the identity of the target concept has
no rational basis for classifying any unseen
instances.
• We constantly have recourse to inductive biases
Example: we all know that the sun will rise
tomorrow. Although we cannot deduce that it will
do so based on the fact that it rose today,
yesterday, the day before, etc., we do take this
leap of faith or use this inductive bias, naturally!
Inductive Bias IV: A Definition

• Consider a concept-learning algorithm L for the


set of instances X. Let c be an arbitrary concept
defined over X, and let Dc = {<x,c(x)>} be an
arbitrary set of training examples of c. Let L(xi,Dc)
denote the classification assigned to the instance
xi by L after training on the data Dc. The inductive
bias of L is any minimal set of assertions B such
that for any target concept c and corresponding
training examples Dc
(∀ xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]
Ranking Inductive Learners According to Their Biases
(from weakest to strongest bias)
– Rote-Learner: simply memorizes the training data and their classifications; no generalization is involved.
– Candidate-Elimination: new instances are classified only if all the hypotheses in the version space agree on the classification.
– Find-S: new instances are classified using the most specific hypothesis consistent with the training data.
