ML UNIT 1-2-57

The document discusses the fundamentals of machine learning, including well-posed learning problems, the role of statistics and computer science, and the concept of learning as a search process. It covers various machine learning tasks such as classification and regression, and provides examples like checkers and handwriting recognition. Additionally, it delves into designing learning systems, choosing training experiences, and the concept learning process with emphasis on hypothesis representation and search strategies.


UNIT 1

Concept Learning and General-to-Specific Ordering
• Well posed learning problems
• Designing a learning system
• Perspectives and issues in machine learning
• Concept learning task
• Concept learning as search
• Find-S
• Version spaces and candidate elimination algorithm
• Inductive bias

B.Ashreetha, Asst Prof, Dept of ECE, SVEC.


What is Machine Learning?
• Optimize a performance criterion using example data or past experience.
• Role of Statistics: inference from a sample
• Role of Computer Science: efficient algorithms to
– solve the optimization problem
– represent and evaluate the model for inference



Machine Learning?
• ML is a branch of artificial intelligence:
– Uses computing-based systems to make sense out of data
• Extracting patterns, fitting data to functions, classifying data, etc.
– ML systems can learn and improve
• With historical data, time and experience
– Bridges theoretical computer science and real, noisy data.



WELL-POSED LEARNING PROBLEMS

• Definition: A computer program is said to learn from


experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.

• To have a well-defined learning problem, three features need to be identified:
– The class of tasks
– The measure of performance to be improved
– The source of experience



CLASSIFICATION:
• Only two customer attributes, income and savings, are taken as input.
• The two classes are low-risk (‘+’) and high-risk (‘−’).
• An example discriminant that separates the two classes is also shown.



REGRESSION :
• Let us say we want a system that can predict the price of a used bike.
• Inputs are the bike’s attributes—brand, year, engine capacity, mileage, and other information—that we believe affect its worth.
• The output is the price of the bike.
• Such problems, where the output is a number, are regression problems.



Examples
• Checkers game: A computer program that learns to play checkers might improve
its performance as measured by its ability to win at the class of tasks involving
playing checkers games, through experience obtained by playing games against
itself.

Fig: Checker game board

• A checkers learning problem:


– Task T: playing checkers
– Performance measure P: percent of games won against opponents
– Training experience E: playing practice games against itself
2.A handwriting recognition learning problem:
– Task T: recognizing and classifying handwritten words
within images
– Performance measure P: percent of words correctly
classified
– Training experience E: A database of handwritten words
with given classifications
3.A robot driving learning problem:
– Task T: driving on public four-lane highways using vision
sensors
– Performance measure P: average distance travelled before
an error (as judged by human overseer)
– Training experience E: a sequence of images and steering
commands recorded while observing a human driver



DESIGNING A LEARNING SYSTEM
Choosing the TRAINING EXPERIENCE
• A key attribute is the training experience: it provides feedback regarding the choices made by the performance system.
• Two types:
1. Direct training examples consist of individual checkers board states and the correct move for each.
2. Indirect information consists of move sequences and final outcomes. The learner then faces the additional problem of credit assignment: determining the degree to which each move in the sequence contributed to the outcome.
• Therefore, learning from direct feedback is typically easier.
The second attribute is the degree to which the learner controls the sequence of training examples.
• Three types:
1. The learner might rely on the teacher to select board states and choose moves.
2. The learner might itself propose board states and ask the teacher for the correct move.
3. The learner may have complete control over both the board states and the training classifications.
The third attribute is how well the training experience represents the distribution of examples over which the final system performance will be measured.
• The performance metric is the percent of games the system wins in the world tournament.
• The training experience consists only of games played against itself.
• Crucial board states that arise against human opponents might therefore never be encountered in self-play.



• The next design choice is to determine exactly
what type of knowledge will be learned and how
this will be used by the performance program.
• E.g., a checkers-playing program that can generate the legal moves from any board state.
• The program needs only to learn how to choose the best move from among these legal moves.
• This learning task is representative of a large class of tasks for which the legal moves that define some large search space are known a priori.
Choosing the TARGET FUNCTION
• Many optimization problems fall into this class,
such as the problems of scheduling and controlling
manufacturing processes where the available
manufacturing steps are well understood, but the
best strategy for sequencing them is not.
• ChooseMove: B M
• the set of legal board states –>B
• the set of legal moves –>M
• target function V and again use the notation
• 𝑉:𝐵→ℝ to denote that 𝑉maps any legal board state
from the set B to some real value.
Choosing the REPRESENTATION for the TARGET FUNCTION
• Let us choose a simple representation: for any given board state, the learned value function V̂ will be calculated as a linear combination of the following board features:

• x1: the number of black pieces on the board
• x2: the number of red pieces on the board
• x3: the number of black kings on the board
• x4: the number of red kings on the board
• x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
• x6: the number of red pieces threatened by black



Choosing the REPRESENTATION for the TARGET FUNCTION

• V̂(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6

• where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm.
• The learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, whereas the weight w0 provides an additive constant to the board value.
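This linear representation can be sketched in code. The feature values below are illustrative, and the weight-update step follows the standard LMS rule (an assumption: the slides only say the weights are "chosen by the learning algorithm" without showing the rule).

```python
# Sketch of the checkers evaluation function V̂(b) = w0 + w1*x1 + ... + w6*x6,
# with a single LMS-style weight update as one plausible learning step.

def v_hat(weights, features):
    """Linear combination: w0 + sum(wi * xi) for the six board features."""
    w0, *ws = weights
    return w0 + sum(w * x for w, x in zip(ws, features))

def lms_update(weights, features, v_train, eta=0.1):
    """One LMS step: wi <- wi + eta * (Vtrain(b) - V̂(b)) * xi, with x0 = 1."""
    error = v_train - v_hat(weights, features)
    xs = [1.0] + list(features)
    return [w + eta * error * x for w, x in zip(weights, xs)]

weights = [0.0] * 7                 # w0 .. w6, initially zero
features = [12, 12, 0, 0, 1, 2]     # hypothetical x1 .. x6 for some board state
weights = lms_update(weights, features, v_train=100.0)
print(weights[0])   # w0 moved toward the training value: 10.0
```

The update nudges every weight in proportion to its feature value and the prediction error, which is why frequently active features come to dominate the board's estimated value.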
B.Ashreetha, Asst Prof, Dept of ECE, SVEC.
• The FINAL DESIGN

Issues in Machine Learning
• What algorithms exist for learning general target functions from specific
training examples?
• In what settings will particular algorithms converge to the desired function,
given sufficient training data?
• Which algorithms perform best for which types of problems and
representations?
• How much training data is sufficient?
• When and how can prior knowledge held by the learner guide the process of
generalizing from examples?
• Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience, and
how does the choice of this strategy alter the complexity of the learning
problem?
• What is the best way to reduce the learning task to one or more function
approximation problems?
• Put another way, what specific functions should the system attempt to
learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its
ability to represent and learn the target function?



What is a concept?
• A concept is a subset of objects or events defined over a
larger set.
• Ex: consider the set of everything (i.e., all objects) as the set of things.
• Animals are a subset of things, and birds are a subset of animals.

Fig: nested subsets — Things contains Animals and Cars; Animals contains Birds


• In more technical terms, a concept is a Boolean-valued function defined over this larger set.
• Given a set of examples labeled as members or non-members of a concept, concept learning consists of automatically inferring the general definition of this concept.



Example of concept learning task
• Concept: Good days for watersports (Values: Yes, No)
• Attributes/Features:
Sky (Values: Sunny, Cloudy, Rainy)
AirTemp (Values: Warm, Cold)
Humidity (Values: Normal, High)
Wind (Values: Strong, Weak)
Water (Values: Warm, Cool)
Forecast (Values: Same, Change)
• Example of a training point:
< Sunny, Warm, High, Strong, Warm, Same, Yes >



CONCEPT LEARNING

• Consider the example task of learning the target concept "Days on which Aldo enjoys his favorite water sport".



• Chosen Hypothesis Representation:
– Let’s consider a simple representation in which each
hypothesis consists of a conjunction of constraints on the
instance attributes.
– Let each hypothesis be a vector of six constraints,
specifying the values of the six attributes Sky, Air Temp,
Humidity, Wind, Water, and Forecast.
• Each hypothesis is a conjunction of constraints on the attributes, where each constraint can:
 Indicate by a "?" that any value is acceptable for this attribute,
 Specify a single required value (e.g., Warm) for the attribute, or
 Indicate by a "Φ" that no value is acceptable.
• Example of a hypothesis:
( If the air temp is cold and Humidity high then it is a good
day for water sport)
(?, Cold, High, ?, ?, ?)

• Goal: To infer the “best” concept-description from the


set of all possible hypotheses.
• Most general hypothesis-that every day is a good day
for water sport
(?, ?, ?, ?, ?, ?)

• Most specific hypothesis-that no day is a good day for


water sport
(Φ, Φ, Φ, Φ, Φ, Φ)
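The "?"/"Φ"/specific-value semantics above can be sketched as a small classifier. This is an illustrative sketch; the string `"phi"` stands in for Φ, and the attribute order follows the EnjoySport example.

```python
# How a conjunctive hypothesis classifies an instance:
# "?" accepts any value, "phi" (Φ) accepts no value,
# and a specific value must match exactly.

def matches(hypothesis, instance):
    """Return True iff every constraint in the hypothesis accepts the instance."""
    for h, x in zip(hypothesis, instance):
        if h == "phi":              # no value acceptable: reject everything
            return False
        if h != "?" and h != x:     # specific value must match exactly
            return False
    return True

h = ("?", "Cold", "High", "?", "?", "?")   # "cold air temp and high humidity"
print(matches(h, ("Sunny", "Cold", "High", "Strong", "Warm", "Same")))   # True
print(matches(h, ("Sunny", "Warm", "High", "Strong", "Warm", "Same")))   # False
```

Note that the most general hypothesis (?, ?, ?, ?, ?, ?) accepts every instance, while any hypothesis containing a Φ accepts none.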
• The set of items over which the concept is defined
is called the set of instances, which is denoted by
X.

• Example: X is the set of all possible days, each


represented by the attributes: Sky, AirTemp,
Humidity, Wind, Water, and Forecast
• The concept or function to be learned is called the
target concept, which is denoted by c.
• c can be any Boolean valued function defined over
the instances X.
• c: X → {0, 1}



Example: The target concept corresponds to the
value of the attribute EnjoySport
• (i.e., c(x) = 1 if EnjoySport = Yes,
c(x) = 0 if EnjoySport = No).
• Instances for which c(x) = 1 are called positive
examples, or members of the target concept.
• Instances for which c(x) = 0 are called negative
examples, or non-members of the target
concept.
• The ordered pair (x, c(x)) describes the training example consisting of the instance x and its target concept value c(x).
• D denotes the set of available training examples.
• The symbol H to denote the set of all possible
hypotheses that the learner may consider
regarding the identity of the target concept.
• Each hypothesis h in H represents a Boolean-valued function defined over X:
h: X → {0, 1}
• The goal of the learner is to find a hypothesis h
such that h(x) = c(x) for all x in X.



• Given:
– Instances X: Possible days, each described by the
attributes
• Sky (with possible values Sunny, Cloudy, and Rainy),
• AirTemp (with values Warm and Cold),
• Humidity (with values Normal and High),
• Wind (with values Strong and Weak),
• Water (with values Warm and Cool),
• Forecast (with values Same and Change).



Hypotheses H: Each hypothesis is described by a
conjunction of constraints on the attributes Sky, AirTemp,
Humidity, Wind, Water, and Forecast.
• The constraints may be "?" (any value is acceptable),
“Φ” (no value is acceptable), or a specific value.

– Target concept c: EnjoySport : X → {0, 1}


– Training examples D: Positive and negative examples of
the target function

• Determine:
– A hypothesis h in H such that h(x) = c(x) for all x in X.



Concept learning as a search

• Concept learning can be viewed as the task of


searching through a large space of hypotheses
implicitly defined by the hypothesis
representation.
• Selecting a hypothesis representation is an important step, since it restricts the space that can be searched.
• Ex: a disjunctive concept such as "Sky = Sunny or Sky = Cloudy" cannot be expressed in our chosen conjunctive representation.



Concept learning as a search
Search
• Find a hypothesis that best fits training examples
• Efficient search in hypothesis space (finite/infinite)
Search space in EnjoySport
• 3*2*2*2*2*2 = 96 distinct instances (e.g., Sky = {Sunny, Cloudy, Rainy})
• 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses within H (considering Φ and ? in addition to the attribute values)
• 1 + 4*3*3*3*3*3 = 973 semantically distinct hypotheses (count just one Φ hypothesis, since every hypothesis containing one or more Φ symbols represents the same empty concept)
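The three counts above follow directly from the domain sizes, which a few lines of arithmetic can verify:

```python
# Verifying the slide's counts for the EnjoySport search space.
values = [3, 2, 2, 2, 2, 2]       # domain sizes: Sky has 3 values, the rest 2

distinct_instances = 1
for v in values:
    distinct_instances *= v       # 3*2*2*2*2*2

syntactic = 1
for v in values:
    syntactic *= v + 2            # each attribute also allows "?" and "Φ"

semantic = 1
for v in values:
    semantic *= v + 1             # "?" plus the values; all-Φ counted once below
semantic += 1

print(distinct_instances, syntactic, semantic)   # 96 5120 973
```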
Find S Algorithm: Finding a Maximally
Specific Hypothesis
• This algorithm considers only positive examples .

General-to-Specific Ordering of Hypotheses

Consider the two hypotheses:

h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
• Consider the sets of instances that are classified positive by h1 and by h2.
• Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.



• So, any instance classified positive by h1 will also be classified positive by h2.
• Therefore, h2 is more general than h1.

Given hypotheses hj and hk, hj is more-general-than-or-equal-to hk if and only if any instance that satisfies hk also satisfies hj.

Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-than-or-equal-to hk (written hj ≥g hk) if and only if

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
 This algorithm may not output the sole hypothesis that fits the complete data; several maximally specific hypotheses may be consistent with it.
 In order to overcome this, we have the candidate elimination algorithm.

The key property of the FIND-S algorithm
• FIND-S is guaranteed to output the most specific hypothesis
within H that is consistent with the positive training
examples
• FIND-S algorithm’s final hypothesis will also be consistent
with the negative examples provided the correct target
concept is contained in H, and provided the training
examples are correct.
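The FIND-S strategy described above can be sketched directly: start from the most specific hypothesis and minimally generalize it on each positive example, ignoring negatives. The training data below follows the EnjoySport example (an assumption about the exact rows, in Mitchell's standard form).

```python
# FIND-S: outputs the most specific hypothesis consistent with the positives.

def find_s(examples):
    h = ["phi"] * len(examples[0][0])     # most specific hypothesis
    for x, label in examples:
        if label != "Yes":                # FIND-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] == "phi":
                h[i] = value              # first positive: copy its values
            elif h[i] != value:
                h[i] = "?"                # conflicting value: generalize
    return h

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(data))   # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Because the negatives never touch h, the result is consistent with them only under the stated proviso that the target concept is in H and the examples are correct.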

Unanswered by FIND-S

1. Has the learner converged to the correct target concept?


2. Why prefer the most specific hypothesis?
3. Are the training examples consistent?
4. What if there are several maximally specific consistent
hypotheses?
Version Space (VS):
The set of all valid hypotheses produced by an algorithm is called the version space (VS), with respect to the hypothesis space H and the given example set D.
Representation

Definition:
 Consistent — A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example (x, c(x)) in D.

Note the difference between the definitions of consistent and satisfies:
 An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether x is a positive or negative example of the target concept.
 An example x is said to be consistent with hypothesis h iff h(x) = c(x).
Definition:
• Version space — The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:

VS_{H,D} = {h ∈ H | Consistent(h, D)}
The LIST-THEN-ELIMINATE algorithm

• The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.



LIST-THEN-ELIMINATE algorithm, in order to obtain the version space:
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example (x, c(x)), remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

Step 2: in this step, we keep removing inconsistent hypotheses from the version space.

• In principle, the LIST-THEN-ELIMINATE algorithm can be applied whenever the hypothesis space H is finite.
• It is guaranteed that all of the output hypotheses will be consistent with the training data.
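The three steps above can be sketched by brute-force enumeration, which is feasible only because H is finite (972 non-empty semantically distinct hypotheses here). The training rows are the same assumed EnjoySport examples used earlier.

```python
from itertools import product

# LIST-THEN-ELIMINATE: enumerate every hypothesis in H, then discard any
# hypothesis inconsistent with some training example.

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # Enumerate the 972 non-empty semantically distinct hypotheses; the single
    # all-"Φ" hypothesis is omitted since any positive example eliminates it.
    hypotheses = product(*(("?",) + d for d in DOMAINS))
    return [h for h in hypotheses
            if all(matches(h, x) == (label == "Yes") for x, label in examples)]

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
vs = list_then_eliminate(data)
print(len(vs))   # 6: the version space for these four examples
```

The exhaustive scan makes the guarantee above concrete: every surviving hypothesis agrees with all four examples, but the enumeration cost is what motivates the compact boundary representation that follows.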
A More Compact Representation for Version Spaces
• The version space is represented by its most general and least
general members.
• These members form general and specific boundary sets that
delimit the version space within the partially ordered
hypothesis space.

Definition: The general boundary G, with respect to hypothesis


space H and training data D, is the set of maximally general
members of H consistent with D.

Definition: The specific boundary S, with respect to hypothesis


space H and training data D, is the set of minimally general (i.e.,
maximally specific) members of H consistent with D.
CANDIDATE ELIMINATION ALGORITHM
• The key idea in the CANDIDATE-ELIMINATION algorithm is to output a
description of the set of all hypotheses consistent with the training
examples
• Uses the concept of the version space.
• Considers both +ve and −ve examples (Yes/No).
• Maintains both specific and general hypotheses.
• For +ve samples, it moves from specific to general hypotheses.
• For −ve samples, it moves from general to specific hypotheses.
• The Candidate Elimination algorithm works on the same principle as the LIST-THEN-ELIMINATE algorithm above.
• It is a compact representation of the version space.
• The version space is represented by its most general and least general (most specific) members.
• These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space.
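The boundary-set updates can be sketched as follows. This is a simplified sketch, not the full algorithm: S is kept as a single hypothesis and the mutual maximality/minimality pruning between boundary members is omitted, which suffices for this conjunctive space and the assumed EnjoySport data.

```python
# Candidate-Elimination sketch: positives minimally generalize S and prune G;
# negatives minimally specialize members of G (keeping only those above S).

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general(hj, hk):
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))

def candidate_elimination(examples):
    S = ["phi"] * len(DOMAINS)            # most specific boundary (single h)
    G = [("?",) * len(DOMAINS)]           # most general boundary
    for x, label in examples:
        if label == "Yes":
            # Positive: minimally generalize S; drop g in G that miss x.
            S = [v if s in ("phi", v) else "?" for s, v in zip(S, x)]
            G = [g for g in G if matches(g, x)]
        else:
            # Negative: minimally specialize each g so that it excludes x.
            new_G = []
            for g in G:
                for i, v in enumerate(x):
                    if g[i] != "?":
                        continue
                    for alt in DOMAINS[i]:
                        if alt != v:
                            h = g[:i] + (alt,) + g[i + 1:]
                            if more_general(h, S):   # keep only h covering S
                                new_G.append(h)
            G = new_G
    return tuple(S), G

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(data)
print(S)   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

The `more_general(h, S)` check is what excludes specializations like (?, ?, Normal, ?, ?, ?) discussed below: they would exclude the negative example but are inconsistent with the earlier positives.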
Initialize the G boundary set to contain the most general hypothesis in H.
Initialize the S boundary set to contain the most specific (least general) hypothesis.

 When the first training example is presented, the CANDIDATE-ELIMINATION algorithm checks the S boundary and finds that it is overly specific: it fails to cover the positive example.
• The boundary is therefore revised by moving it to its minimal generalization that covers this new example.
• No update of the G boundary is needed in response to this training example, because G0 already correctly covers it.
• Consider the third training example. This negative
example reveals that the G boundary of the version
space is overly general, that is, the hypothesis in G
incorrectly predicts that this new example is a positive
example.
• The hypothesis in the G boundary must therefore be
specialized until it correctly classifies this new negative
example
Given that there are six attributes that could be specified to
specialize G2, why are there only three new hypotheses in G3?
• For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a
minimal specialization of G2 that correctly labels the new
example as a negative example, but it is not included in G3.
• The reason this hypothesis is excluded is that it is
inconsistent with the previously encountered positive
examples
Consider the fourth training example.
• This positive example further generalizes the S boundary of
the version space.
• It also results in removing one member of the G boundary,
because this member fails to cover the new positive example.

After processing these four examples, the boundary


sets S4 and G4 delimit the version space of all hypotheses
consistent with the set of incrementally observed training
examples.



INDUCTIVE BIAS
Fundamental questions raised here
(remarks on CEA and VS )

1. Will the CE algorithm give the correct hypothesis (specific and general boundaries)?
2. What training example should the learner request next? (depends on the type of task we have)

• Inductive learning: rules are derived from examples.
• Deductive learning: already existing rules are applied to our examples.
• Biased hypothesis space:
 Does not contain all possible target concepts.
 Solution: include every possible hypothesis.
INDUCTIVE BIAS
The fundamental questions for inductive bias

• What if the target concept is not contained in the hypothesis


space?
• Can we avoid this difficulty by using a hypothesis space that
includes every possible hypothesis?
• How does the size of this hypothesis space influence the ability of
the algorithm to generalize to unobserved instances?
• How does the size of the hypothesis space influence the number of
training examples that must be observed?

These fundamental questions are examined in the context of the CANDIDATE-ELIMINATION algorithm.



A Biased Hypothesis Space
• Suppose the target concept is not contained in the hypothesis space H, then obvious
solution is to enrich the hypothesis space to include every possible hypothesis.
• Consider the EnjoySport example in which the hypothesis space is restricted to include only
conjunctions of attribute values.
<Sunny ^ Warm ^ Normal ^ Strong ^ Cool ^ Change> Y

• Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as
"Sky = Sunny or Sky = Cloudy."
• Given the following three training examples of such a disjunctive target concept, the algorithm would find that there are zero hypotheses in the version space:

<Sunny Warm Normal Strong Cool Change> Y
<Cloudy Warm Normal Strong Cool Change> Y
<Rainy Warm Normal Strong Cool Change> N

If the Candidate Elimination algorithm is applied, then after the first two training examples:
S = <? Warm Normal Strong Cool Change>

This new hypothesis is overly general: it incorrectly covers the third, negative training example. So H does not include the appropriate c.
In this case, a more expressive hypothesis space is required.
An Unbiased Learner
• The solution to the problem of assuring that the target concept is in the
hypothesis space H is to provide a hypothesis space capable of representing
every teachable concept that is representing every possible subset of the
instances X.
• Simply put: provide a hypothesis space capable of representing all the examples.

• The set of all subsets of a set X is called the power set of X


• In the EnjoySport learning task the size of the instance space X of days
described by the six attributes is 96 instances.

Search space in EnjoySport
 3*2*2*2*2*2 = 96 distinct instances (e.g., Sky = {Sunny, Cloudy, Rainy})
 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses within H (considering Φ and ? in addition)
 1 + (4*3*3*3*3*3) = 973 semantically distinct hypotheses (the all-Φ null hypothesis taken as one)
• Possible instances = 96
• Possible target concepts = 2^96 ≈ 8 × 10^28 (huge, and practically impossible to search exhaustively)
Idea of inductive bias:
The learner generalizes beyond the observed training examples to make inferences about new examples.
• The notation X ≻ Y means "Y is inductively inferred from X": from the given X, Y is learned.



The below figure explains
• Modelling inductive systems by equivalent deductive
systems.
• The input-output behavior of the CANDIDATE-ELIMINATION
algorithm using a hypothesis space H is identical to that of a
deductive theorem prover utilizing the assertion "H contains
the target concept."
• This assertion is therefore called the inductive bias of the
CANDIDATE-ELIMINATION algorithm.
• Characterizing inductive systems by their inductive bias allows
modelling them by their equivalent deductive systems.
• This provides a way to compare inductive systems according
to their policies for generalizing beyond the observed training
data.


