Unit-III - Chapter 7 - Learning Rule Sets
1. Relative frequency:
nc / n (n: number of examples matched by the rule; nc: number of those the rule classifies correctly)
2. m-estimate of accuracy:
(nc + m·p) / (n + m)
• p: the prior probability that a randomly drawn example will have the classification assigned by the rule (e.g., if 12 out of 100 examples have the value predicted by the rule, then p = 0.12)
• m: the weight, i.e., the equivalent number of examples used for weighting this prior p
3. Entropy: the entropy of the class distribution among the examples matched by the rule (lower entropy means purer coverage)
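To make these three measures concrete, here is a minimal Python sketch computing each of them for a single rule; the counts in the example calls are made up for illustration, not taken from a dataset in these notes.

```python
import math

def relative_frequency(n_c, n):
    """Fraction of the n examples matched by the rule that it classifies correctly."""
    return n_c / n

def m_estimate(n_c, n, p, m):
    """Accuracy smoothed toward the prior p, weighted as if by m extra examples."""
    return (n_c + m * p) / (n + m)

def entropy(class_counts):
    """Entropy of the class distribution among the examples matched by the rule."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# Illustrative counts: a rule matches n = 20 examples, n_c = 18 of them correctly.
print(relative_frequency(18, 20))        # 0.9
print(m_estimate(18, 20, p=0.12, m=10))  # 0.64 - the prior pulls the estimate down
print(entropy([18, 2]))                  # ~0.47 - low entropy means purer coverage
```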
Working of the Algorithm:
The algorithm builds a set of 'ordered rules' (a 'decision list').
Step 1 – create an empty learned rule list, 'R'.
Step 2 – run the 'Learn-One-Rule' algorithm: it extracts the best rule for a particular class 'y', where a rule has the general form
IF (conditions) THEN class = y
In the beginning,
Step 2.a – every training example that belongs to class 'y' is treated as a positive example.
Step 2.b – every training example that does not belong to class 'y' is treated as a negative example.
Step 3 – the rule is 'desirable' when it covers a majority of the positive examples.
Step 4 – once this rule is obtained, delete all the training examples covered by it (i.e., when the rule is applied to the dataset, the examples it covers have been accounted for and must be removed).
Step 5 – the new rule is appended to the bottom of the learned rule list, 'R'.
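The five steps above can be condensed into a short Python sketch. Here learn_one_rule is a hypothetical stand-in for Step 2: it is assumed to return the best rule for class y together with a function that tests whether the rule covers a given example.

```python
# A minimal sketch of the ordered-rule loop in Steps 1-5 above.
def sequential_covering(examples, y, learn_one_rule):
    R = []                                                   # Step 1: empty rule list
    positives = [e for e in examples if e["label"] == y]     # Step 2.a: class-y examples
    while positives:
        rule, covers = learn_one_rule(positives, examples, y)    # Step 2: best rule for y
        covered = [e for e in positives if covers(e)]
        if not covered:                                      # Step 3: must cover positives
            break
        positives = [e for e in positives if not covers(e)]  # Step 4: delete covered data
        R.append(rule)                                       # Step 5: append to rule list
    return R
```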
Training examples – Method I:
Form a rule based on the positive samples it covers and remove those samples from the training examples.
Step 1: (a) form rule R1; (b) remove the positive samples covered by R1.
Step 2: form rule R2 and remove the positive samples it covers.
Step 3: form rule R3 and remove the positive samples it covers.
Step 4: repeat until no positive samples remain.
where the subscript on each attribute name indicates which of the two persons is
being described. Now if we were to collect a number of such training examples for
the target concept Daughter1,2 and provide them to a propositional rule learner
such as CN2 or C4.5, the result would be a collection of very specific rules such as
IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True
Although it is correct, this rule is so specific that it will rarely, if ever, be useful in classifying
future pairs of people.
The problem is that propositional representations offer no general way to describe
the essential relations among the values of the attributes. In contrast, a program
using first-order representations could learn the following general rule, where x and
y are variables that can be bound to any person:
IF Father(y, x) ∧ Female(y) THEN Daughter(x, y)
First-order Horn clauses may also refer to variables in the preconditions that do not
occur in the postconditions. For example, one rule for GrandDaughter might be
GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
Note that the variable z in this rule, which refers to the father of y, is not present in the
rule postconditions. Whenever such a variable occurs only in the preconditions, it is
assumed to be existentially quantified; that is, the rule preconditions are satisfied as
long as there exists at least one binding of the variable that satisfies the
corresponding literal.
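As a minimal sketch of this existential semantics, the following Python fragment tests the GrandDaughter rule body by searching for at least one binding of z; the facts are the same illustrative assertions used later in this chapter.

```python
# z occurs only in the body of
#   GrandDaughter(x, y) <- Father(y, z) ^ Father(z, x) ^ Female(y)
# so the body holds if at least one binding of z satisfies both Father literals.
father = {("Sharon", "Bob"), ("Tom", "Bob"), ("Bob", "Victor")}  # "Father of Sharon is Bob"
female = {"Sharon"}
people = {p for pair in father for p in pair}

def granddaughter(x, y):
    # exists z: Father(y, z) and Father(z, x) and Female(y)
    return y in female and any(
        (y, z) in father and (z, x) in father for z in people
    )

print(granddaughter("Victor", "Sharon"))  # True: the binding z = Bob works
print(granddaughter("Victor", "Tom"))     # False: Tom is not Female
```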
6.4 LEARNING SETS OF FIRST-ORDER RULES using the FOIL Algorithm (First-Order Inductive Learner)
Consider a program called FOIL that employs an approach very similar to the
SEQUENTIAL-COVERING and LEARN-ONE-RULE algorithms. In fact, the FOIL
program is the natural extension of these earlier algorithms to first-order
representations. Formally, the hypotheses learned by FOIL are sets of first-order
rules, where each rule is similar to a Horn clause with two exceptions.
• First, the rules learned by FOIL are more restricted than general Horn clauses,
because the literals are not permitted to contain function symbols (this reduces
the complexity of the hypothesis space search).
• Second, FOIL rules are more expressive than Horn clauses, because the literals
appearing in the body of the rule may be negated.
Sequential Covering vs. FOIL
Similarity: both learn one rule at a time and remove the covered positive
examples from the training examples. Difference: FOIL learns first-order rules,
and it seeks only rules that predict when the target literal is True.
Suppose the current rule being considered is
P(x1, x2, ..., xk) ← L1 ∧ L2 ∧ ... ∧ Ln
where L1 ... Ln are literals forming the current rule preconditions and
where P(x1, x2, ..., xk) is the literal that forms the rule head, or
postconditions. FOIL generates candidate specializations of this rule by
considering new literals Ln+1 that fit one of the following forms (see the
sketch after this list):
• Q(v1, ..., vr), where Q is any predicate name occurring in Predicates and
where the vi are either new variables or variables already present in the
rule. At least one of the vi in the created literal must already exist as a
variable in the rule.
• Equal(xj, xk), where xj and xk are variables already present in the rule.
• The negation of either of the above forms of literals.
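A minimal Python sketch of this candidate-generation step follows; representing literals as (name, args) tuples and limiting the pool of new variables to two fresh names are assumptions made for illustration.

```python
from itertools import product

def candidate_literals(predicates, rule_vars, fresh=("v1", "v2")):
    cands = []
    # Form 1: Q(v1, ..., vr) where at least one argument already appears in the rule
    for q, arity in predicates.items():
        for args in product(sorted(rule_vars) + list(fresh), repeat=arity):
            if any(a in rule_vars for a in args):
                cands.append((q, args))
    # Form 2: Equal(xj, xk) over variables already present in the rule
    rv = sorted(rule_vars)
    for i in range(len(rv)):
        for j in range(i + 1, len(rv)):
            cands.append(("Equal", (rv[i], rv[j])))
    # Form 3: the negation of either of the above forms of literals
    cands += [("not", lit) for lit in list(cands)]
    return cands

# Example: predicates from the GrandDaughter task, rule variables {x, y}
cands = candidate_literals({"Father": 2, "Female": 1}, {"x", "y"})
print(len(cands))  # 30 candidate literals, including negations
```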
6.4.2 Guiding the Search in FOIL
To select the most promising literal from the candidates generated at
each step, FOIL considers the performance of the rule over the training
data. In doing this, it considers all possible bindings of each variable in
the current rule. To illustrate this process, consider again the example in
which we seek to learn a set of rules for the target literal
GrandDaughter(x, y). For illustration, assume the training data includes
the following simple set of assertions, where we use the convention
that P(x, y) can be read as "the P of x is y":
GrandDaughter(Victor, Sharon), Father(Sharon, Bob), Father(Tom, Bob), Female(Sharon), Father(Bob, Victor)
All other literals are assumed to be false (a closed-world assumption).
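To score the candidates, FOIL uses Mitchell's FOIL_Gain measure: for a candidate literal L added to rule R, Gain(L, R) = t * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the numbers of positive and negative bindings of R, p1 and n1 are the corresponding counts for the specialized rule, and t is the number of positive bindings of R still covered after adding L. A minimal sketch with illustrative counts:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL_Gain for adding literal L to rule R (all counts are over bindings)."""
    if p1 == 0:
        return float("-inf")  # the specialized rule covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Illustrative counts: adding L keeps t = 3 of the 4 positive bindings
# and cuts the negative bindings from 12 down to 1.
print(foil_gain(p0=4, n0=12, p1=3, n1=1, t=3))  # ~4.75 bits
```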
6.4.3 Learning Recursive Rule Sets
Earlier, we ignored the possibility that new literals added to the rule body could
refer to the target predicate itself (i.e., the predicate occurring in the rule head).
However, if we include the target predicate in the input list of Predicates, then
FOIL will consider it as well when generating candidate literals. This will allow it to
form recursive rules - rules that use the same predicate in the body and the head
of the rule.
For instance, recall the following rule set that provides a recursive
definition of the Ancestor relation:
Ancestor(x, y) ← Parent(x, y)
Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
Given an appropriate set of training examples, these two rules can be learned
following a trace similar to the one above for GrandDaughter. Note that the second
rule is among the rules potentially within reach of FOIL's search, provided
Ancestor is included in the list Predicates that determines which predicates may
be considered when generating new literals. Of course, whether this particular rule
would be learned depends on whether these literals outscore competing
candidates during FOIL's greedy search for increasingly specific rules.
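As a small illustration of how such a recursive rule set behaves once learned, the following sketch evaluates the two Ancestor clauses by forward chaining to a fixed point; the Parent facts are made up and follow the "the Parent of x is y" reading used earlier.

```python
parent = {("Sharon", "Bob"), ("Bob", "Victor")}  # illustrative Parent facts

def ancestor_closure(parent_facts):
    anc = set(parent_facts)              # Ancestor(x, y) <- Parent(x, y)
    changed = True
    while changed:                       # iterate until no new facts are derived
        changed = False
        for (x, z) in parent_facts:      # Ancestor(x, y) <- Parent(x, z) ^ Ancestor(z, y)
            for (z2, y) in list(anc):
                if z == z2 and (x, y) not in anc:
                    anc.add((x, y))
                    changed = True
    return anc

print(ancestor_closure(parent))  # adds ("Sharon", "Victor") via the recursive rule
```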
FOIL Example:
• Say we are trying to predict the target predicate GrandDaughter(x, y).
• FOIL begins with
NewRule = GrandDaughter(x, y) ←
• To specialize it, FOIL generates these candidate additions to the preconditions:
Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z),
Father(z, x), Father(y, z), Father(z, y), and their negations.
• FOIL might greedily select Father(y, z) as the most promising, giving
NewRule = GrandDaughter(x, y) ← Father(y, z)
• FOIL now considers all the literals from the previous step as well as:
Female(z), Equal(z, x), Equal(z, y), Father(z, w), Father(w, z), and their
negations.
• FOIL might select Father(z, x), and on the next step Female(y), leading to
NewRule = GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
• If this rule covers only positive examples, FOIL terminates the search for
further specializations.
• FOIL now removes all positive examples covered by this new rule. If any
remain, the outer loop continues.