ML Unit-I Part II


Issues in Machine Learning:

Our checkers example raises a number of generic questions about machine
learning. The field of machine learning is concerned with answering
questions such as the following:

• What algorithms exist for learning general target functions from specific
training examples? In what settings will particular algorithms converge to the
desired function, given sufficient training data? Which algorithms perform
best for which types of problems and representations?
• How much training data is sufficient? What general bounds can be found to
relate the confidence in learned hypotheses to the amount of training
experience and the character of the learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process of
generalizing from examples? Can prior knowledge be helpful even when it is
only approximately correct?
• What is the best strategy for choosing a useful next training experience, and
how does the choice of this strategy alter the complexity of the learning
problem?
• What is the best way to reduce the learning task to one or more function
approximation problems? Put another way, what specific functions should
the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?

CONCEPT LEARNING:

• Inducing general functions from specific training examples is a main issue of machine learning.
• Concept Learning: Acquiring the definition of a general category from
given sample positive and negative training examples of the category.
• Concept Learning can be seen as a problem of searching through a predefined
space of potential hypotheses for the hypothesis that best fits the training
examples.
• The hypothesis space has a general-to-specific ordering of hypotheses, and
the search can be efficiently organized by taking advantage of a naturally
occurring structure over the hypothesis space.
A Formal Definition for Concept Learning:

Inferring a Boolean-valued function from training examples of its input and output.
• An example of concept learning is learning the concept "bird" from the
given examples of birds (positive examples) and non-birds (negative
examples).

• We are trying to learn the definition of a concept from given examples.


A Concept Learning Task: Enjoy Sport Training Examples

A set of example days is given, each described by six attributes. The task is to
learn to predict the value of EnjoySport for an arbitrary day, based on the
values of its other attributes.
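
The training-example table is not reproduced here; as a stand-in, the sketch below lists the four EnjoySport examples commonly used for this task (attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast). The specific values are taken from the standard textbook presentation of this example and should be treated as an assumption.

# Standard EnjoySport training data (an assumption; the original table is
# not shown above). Each example is (attribute values, EnjoySport label).
training_examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]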

Concept Learning as Search:


• Concept learning can be viewed as the task of searching through a large
space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training
examples.
• By selecting a hypothesis representation, the designer of the learning
algorithm implicitly defines the space of all hypotheses that the program can
ever represent and therefore can ever learn.
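
To make this concrete, the short calculation below counts the hypotheses implied by the conjunctive EnjoySport representation. The attribute value counts (Sky has 3 possible values, the remaining five attributes have 2 each) are an assumption based on the usual presentation of this example.

# Counting the hypothesis space implied by the representation (a sketch,
# assuming Sky has 3 values and the other five attributes have 2 each).
values_per_attribute = [3, 2, 2, 2, 2, 2]

instances = 1
syntactic = 1
semantic = 1
for v in values_per_attribute:
    instances *= v        # number of distinct instances
    syntactic *= v + 2    # each attribute: a specific value, "?", or the empty symbol
    semantic *= v + 1     # each attribute: a specific value or "?"
semantic += 1             # plus the single hypothesis that classifies everything negative

print(instances, syntactic, semantic)   # 96 5120 973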


FIND-S:
• The FIND-S algorithm starts from the most specific hypothesis and generalizes it
by considering only positive examples.
• The FIND-S algorithm ignores negative examples: as long as the hypothesis space
contains a hypothesis that describes the true target concept, and the training
data contain no errors, ignoring negative examples does not cause any problem.
• FIND-S algorithm finds the most specific hypothesis within H that is
consistent with the positive training examples. – The final hypothesis will
also be consistent with negative examples if the correct target concept is in
H, and the training examples are correct.
FIND-S Algorithm:
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
       For each attribute constraint a_i in h
           If the constraint a_i is satisfied by x
           Then do nothing
           Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
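
A minimal Python sketch of this procedure is shown below. The encoding is an assumption made for illustration: a hypothesis is a list of attribute constraints, with None standing for the most specific constraint (the ϕ described below) and "?" accepting any value; the function name find_s is likewise illustrative.

def find_s(training_examples, n_attributes):
    # A sketch of FIND-S. A hypothesis is a list of constraints; None plays
    # the role of "no value acceptable" (ϕ) and "?" accepts any value.
    # 1. Initialize h to the most specific hypothesis in H.
    h = [None] * n_attributes
    for x, label in training_examples:
        # 2. Consider only the positive training instances.
        if label != "Yes":
            continue
        for i, constraint in enumerate(h):
            # If the constraint is already satisfied by x, do nothing.
            if constraint == "?" or constraint == x[i]:
                continue
            # Otherwise replace it by the next more general constraint
            # that is satisfied by x.
            h[i] = x[i] if constraint is None else "?"
    # 3. Output hypothesis h.
    return h

On the EnjoySport data assumed earlier, find_s(training_examples, 6) yields ['Sunny', 'Warm', '?', 'Strong', '?', '?'].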
FIND-S Algorithm – Example:
Important Representation:

1. ? indicates that any value is acceptable for the attribute.


2. A single specific value (e.g., Cold) indicates that only that value is acceptable for the attribute.
3. Φ indicates that no value is acceptable.
4. The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}
5. The most specific hypothesis is represented by: {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

Steps Involved in Find-S:


1. Start with the most specific hypothesis. h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}


2. Take the next example and if it is negative, then no changes occur to the
hypothesis.
3. If the example is positive and we find that our current hypothesis is too
specific, then we update our current hypothesis to a more general condition.
4. Keep repeating the above steps till all the training examples are complete.
5. After we have completed all the training examples, we will have the final
hypothesis, which can be used to classify the new examples.

Example: Consider the following data set, which describes which particular
seeds are poisonous.

First, we take the hypothesis to be the most specific hypothesis. Hence,
our hypothesis would be: h = {ϕ, ϕ, ϕ, ϕ}

Consider example 1:
The data in example 1 is {GREEN, HARD, NO, WRINKLED}. We see that
our initial hypothesis is too specific, so we have to generalize it for this
example.
Hence, the hypothesis becomes:
h = {GREEN, HARD, NO, WRINKLED}
Consider example 2:

Here we see that this example has a negative outcome. Hence we neglect
this example and our hypothesis remains the same. h = {GREEN,
HARD, NO, WRINKLED}
Consider example 3:
Here we see that this example has a negative outcome. Hence, we neglect
this example and our hypothesis remains the same. h = {GREEN,
HARD, NO, WRINKLED}
Consider example 4:
The data present in example 4 is {ORANGE, HARD, NO, WRINKLED}. We
compare every single attribute with the current hypothesis, and if any
mismatch is found we replace that particular attribute with the general
case ("?"). After doing this, the hypothesis becomes: h = {?, HARD, NO,
WRINKLED}
Consider example 5:
The data present in example 5 is {GREEN, SOFT, YES, SMOOTH}. We
compare every single attribute with the current hypothesis, and if any
mismatch is found we replace that particular attribute with the general
case ("?"). After doing this, the hypothesis becomes:
h = {?, ?, ?, ?}
Since we have reached a point where all the attributes in our hypothesis
have the general condition, example 6 and example 7 would result in the
same hypothesis with all general attributes. h = {?, ?, ?, ?}
Hence, for the given data the final hypothesis would be:
Final Hypothesis: h = { ?, ?, ?, ? }.
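
The trace above can be reproduced with the find_s sketch given earlier, using only the positive seed examples whose attribute values are stated in the text (the remaining examples are omitted because their attribute values are not listed here, and FIND-S ignores negative examples in any case):

# Positive seed examples whose attribute values appear in the trace above.
seed_examples = [
    (("GREEN",  "HARD", "NO",  "WRINKLED"), "Yes"),   # example 1
    (("ORANGE", "HARD", "NO",  "WRINKLED"), "Yes"),   # example 4
    (("GREEN",  "SOFT", "YES", "SMOOTH"),   "Yes"),   # example 5
]

print(find_s(seed_examples, 4))   # ['?', '?', '?', '?'], the final hypothesis above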

Version Spaces
Definition (Version space). A concept is complete if it covers all positive
examples.
A concept is consistent if it covers none of the negative examples. The
version space is the set of all complete and consistent concepts. This set is
convex and is fully defined by its least and most general elements.
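
These two properties can be written directly as small predicates over the conjunctive hypotheses used above; the helper names below (covers, is_complete, is_consistent) are illustrative choices, not part of the original notes.

def covers(h, x):
    # A hypothesis h covers instance x if every constraint is "?" or
    # matches the corresponding attribute value (None/ϕ covers nothing).
    return all(c == "?" or c == xi for c, xi in zip(h, x))

def is_complete(h, positive_examples):
    # Complete: h covers all positive examples.
    return all(covers(h, x) for x in positive_examples)

def is_consistent(h, negative_examples):
    # Consistent: h covers none of the negative examples.
    return not any(covers(h, x) for x in negative_examples)

# A hypothesis belongs to the version space iff it is both complete and consistent.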

Candidate-Elimination Learning Algorithm


The CANDIDATE-ELIMINATION algorithm computes the version space
containing all hypotheses from H that are consistent with an observed
sequence of training examples.

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do

• If d is a positive example
    • Remove from G any hypothesis inconsistent with d
    • For each hypothesis s in S that is not consistent with d
        • Remove s from S
        • Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
        • Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
    • Remove from S any hypothesis inconsistent with d
    • For each hypothesis g in G that is not consistent with d
        • Remove g from G
        • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
        • Remove from G any hypothesis that is less general than another hypothesis in G
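
A compact Python sketch of this procedure, written specifically for the attribute-constraint hypothesis representation used above, is given below. The helper routines for minimal generalization and specialization, the use of None for ϕ, and all function names are illustrative assumptions; attribute_values must list the possible values of each attribute.

def covers(h, x):
    # h covers x if every constraint is "?" or matches x's value.
    return all(c == "?" or c == xi for c, xi in zip(h, x))

def more_general_or_equal(h1, h2):
    # h1 is at least as general as h2: each constraint of h1 is "?",
    # equals h2's constraint, or h2's constraint is ϕ (covers nothing).
    return all(c1 == "?" or c1 == c2 or c2 is None for c1, c2 in zip(h1, h2))

def minimal_generalization(s, x):
    # Minimal generalization of s that covers the positive example x.
    return tuple(xi if c is None else (c if c == xi else "?")
                 for c, xi in zip(s, x))

def minimal_specializations(g, x, attribute_values):
    # Minimal specializations of g that exclude the negative example x.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in attribute_values[i] if v != x[i]]

def candidate_elimination(examples, attribute_values):
    n = len(attribute_values)
    G = {tuple(["?"] * n)}     # maximally general boundary
    S = {tuple([None] * n)}    # maximally specific boundary (None plays the role of ϕ)
    for x, label in examples:
        if label == "Yes":     # positive example
            G = {g for g in G if covers(g, x)}
            for s in list(S):
                if not covers(s, x):
                    S.discard(s)
                    h = minimal_generalization(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        S.add(h)
            S = {s for s in S if not any(s != t and more_general_or_equal(s, t) for t in S)}
        else:                  # negative example
            S = {s for s in S if not covers(s, x)}
            for g in list(G):
                if covers(g, x):
                    G.discard(g)
                    for h in minimal_specializations(g, x, attribute_values):
                        if any(more_general_or_equal(h, s) for s in S):
                            G.add(h)
            G = {g for g in G if not any(g != t and more_general_or_equal(t, g) for t in G)}
    return S, G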
CANDIDATE-ELIMINATION Algorithm Using Version Spaces
An Illustrative Example:


The CANDIDATE-ELIMINATION algorithm begins by initializing the version
space to the set of all hypotheses in H; that is, the G boundary set is
initialized to contain the most general hypothesis in H, G0 = {?, ?, ?, ?, ?, ?},
and the S boundary set is initialized to contain the most specific hypothesis,
S0 = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}.

When the first training example is presented, the CANDIDATE-ELIMINATION
algorithm checks the S boundary and finds that it is overly specific: it fails
to cover the positive example.

• The S boundary is therefore revised by moving it to the least more general
hypothesis that covers this new example.
• No update of the G boundary is needed in response to this training example
because G0 correctly covers this example.

• When the second training example is observed, it has a similar effect of
generalizing S further to S2, leaving G again unchanged, i.e., G2 = G1 = G0.


• Consider the third training example. This negative example reveals that the
G boundary of the version space is overly general; that is, the hypothesis in G
incorrectly predicts that this new example is a positive example.
• The hypothesis in the G boundary must therefore be specialized until it
correctly classifies this new negative example.

Given that there are six attributes that could be specified to specialize G2,
why are there only three new hypotheses in G3?

For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal
specialization of G2 that correctly labels the new example as a negative
example, but it is not included in G3. The reason this hypothesis is excluded
is that it is inconsistent with the previously encountered positive examples.
Consider the fourth training example.


• This positive example further generalizes the S boundary of the version
space. It also results in removing one member of the G boundary, because
this member fails to cover the new positive example.
• After processing these four examples, the boundary sets S4 and G4 delimit
the version space of all hypotheses consistent with the set of incrementally
observed training examples.
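
Running the candidate_elimination sketch from earlier on the four EnjoySport examples assumed above reproduces the boundary sets described in this trace. The lists of possible attribute values below (e.g., Cloudy, Weak) are an assumption taken from the standard presentation of this example.

attribute_values = [
    ("Sunny", "Cloudy", "Rainy"),   # Sky
    ("Warm", "Cold"),               # AirTemp
    ("Normal", "High"),             # Humidity
    ("Strong", "Weak"),             # Wind
    ("Warm", "Cool"),               # Water
    ("Same", "Change"),             # Forecast
]

S4, G4 = candidate_elimination(training_examples, attribute_values)
print(S4)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G4)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}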

Inductive bias:
