Rule_engine-decision_trees-JP

Rule-based systems utilize a set of assertions and rules to create expert systems that mimic expert decision-making. These systems can be implemented using forward-chaining or backward-chaining methods, depending on the problem's requirements, and can be optimized using techniques like the Rete algorithm. Despite claims of obsolescence, rule-based systems remain prevalent in various fields, particularly in medicine.

Introduction to Rule-Based Systems

Using a set of assertions, which collectively form the ‘working memory’, and a set of
rules that specify how to act on the assertion set, a rule-based system can be created.
Rule-based systems are fairly simplistic, consisting of little more than a set of if-then
statements, but provide the basis for so-called “expert systems” which are widely
used in many fields. The concept of an expert system is this: the knowledge of an
expert is encoded into the rule set. When exposed to the same data, the expert
system AI will perform in a similar manner to the expert.
Rule-based systems are a relatively simple model that can be adapted to any number
of problems. As with any AI, a rule-based system has its strengths as well as
limitations that must be considered before deciding if it’s the right technique to use for
a given problem. Overall, rule-based systems are feasible only for problems in which
all of the relevant knowledge can be written in the form of if-then rules, and in which
the problem area is not large. If there are too many rules, the system can become
difficult to maintain and can suffer a performance hit.
To create a rule-based system for a given problem, you must have (or create) the
following:

1. A set of facts to represent the initial working memory. This should be anything
relevant to the beginning state of the system.
2. A set of rules. This should encompass any and all actions that should be taken
within the scope of a problem, but nothing irrelevant. The number of rules in
the system can affect its performance, so you don’t want any that aren’t
needed.
3. A condition that determines that a solution has been found or that none exists.
This is necessary to terminate some rule-based systems that find themselves
in infinite loops otherwise.
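As a minimal sketch, these three components might be represented as follows (Python; the Rule structure and all names here are illustrative, not a standard API):

from dataclasses import dataclass

# 2. A rule pairs a tuple of IF conditions with a THEN action.
@dataclass(frozen=True)
class Rule:
    conditions: tuple  # facts that must all be present in working memory
    action: str        # fact to assert when the rule fires

# 1. The initial working memory: facts describing the beginning state.
working_memory = {"runny nose", "headache", "cough"}

# A small rule set in the spirit of the diagnosis example in Appendix A.
rules = (
    Rule(("runny nose",), "nasal congestion"),
    Rule(("headache",), "achiness"),
)

# 3. A termination condition: here, a goal fact appearing in memory.
def solved(memory):
    return "influenza" in memory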

Theory of Rule-Based Systems

The rule-based system itself uses a simple technique: It starts with a rule-base,
which contains all of the appropriate knowledge encoded into If-Then rules, and a
working memory, which may or may not initially contain data or known
assertions. The system examines all the rule conditions (IF) and
determines a subset, the conflict set, of the rules whose conditions are satisfied based
on the working memory. Of this conflict set, one of those rules is triggered (fired).
Which one is chosen is based on a conflict resolution strategy. When the rule is fired,
any actions specified in its THEN clause are carried out. These actions can modify the
working memory, the rule-base itself, or do just about anything else the system
programmer decides to include. This loop of firing rules and performing actions
continues until one of two conditions is met: there are no more rules whose
conditions are satisfied or a rule is fired whose action specifies the program should
terminate.
Which rule is chosen to fire is a function of the conflict resolution strategy. Which
strategy is chosen can be determined by the problem or it may be a matter of
preference. In any case, it is vital as it controls which of the applicable rules are fired
and thus how the entire system behaves. There are several different strategies, but
here are a few of the most common:

• First Applicable: If the rules are in a specified order, firing the first applicable
one allows control over the order in which rules fire. This is the simplest
strategy and has a potential for a large problem: that of an infinite loop on the
same rule. If the working memory remains the same, as does the rule-base,
then the conditions of the first rule have not changed and it will fire again and
again. To solve this, it is common practice to suspend a fired rule and prevent
it from re-firing until the data in working memory that satisfied the rule's
conditions has changed.
• Random: Though it doesn’t provide the predictability or control of the
first-applicable strategy, it does have its advantages. For one thing, its
unpredictability is an advantage in some circumstances (such as games). A
random strategy simply chooses a single random rule to fire from
the conflict set. Another possibility for a random strategy is a fuzzy rule-based
system in which each of the rules has a probability such that some rules are
more likely to fire than others.
• Most Specific: This strategy is based on the number of conditions of the rules.
From the conflict set, the rule with the most conditions is chosen. This is based
on the assumption that if it has the most conditions then it has the most
relevance to the existing data.
• Least Recently Used: Each of the rules is accompanied by a time or step
stamp, which marks the last time it was used. This maximizes the number of
individual rules that are fired at least once. If all rules are needed for the
solution of a given problem, this is a perfect strategy.
• "Best" rule: For this to work, each rule is given a ‘weight,’ which specifies how
much it should be considered over the alternatives. The rule with the most
preferable outcomes is chosen based on this weight.
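Putting the pieces together, here is a hedged sketch of the recognize-act cycle with the conflict resolution strategy passed in as a function. It reuses the illustrative Rule structure from the earlier sketch and shows only two of the strategies; excluding rules whose action is already asserted is one simple way to avoid the infinite-loop problem noted under First Applicable:

def conflict_set(rules, memory):
    # Rules whose IF conditions are all satisfied by working memory
    # and whose action would add something new.
    return [r for r in rules
            if all(c in memory for c in r.conditions)
            and r.action not in memory]

def first_applicable(candidates):
    return candidates[0]  # relies on the rules being in a fixed order

def most_specific(candidates):
    return max(candidates, key=lambda r: len(r.conditions))

def run(rules, memory, strategy=first_applicable):
    # Fire rules until no rule's conditions are satisfied.
    while True:
        candidates = conflict_set(rules, memory)
        if not candidates:
            return memory
        fired = strategy(candidates)  # conflict resolution
        memory.add(fired.action)      # carry out the THEN clause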

Methods of Rule-Based Systems

Forward-Chaining

Rule-based systems, as defined above, are adaptable to a variety of problems. In
some problems, information is provided with the rules and the AI follows them to see
where they lead. An example of this is a medical diagnosis in which the problem is to
diagnose the underlying disease based on a set of symptoms (the working memory).
A problem of this nature is solved using a forward-chaining, data-driven, system that
compares data in the working memory against the conditions (IF parts) of the rules
and determines which rules to fire.

For an example of forward-chaining, see Appendix A.
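As a sketch only, a data-driven run in the spirit of Appendix A, using the run loop above (the fact strings, including the opaque "temp>100", are simplifications):

diagnosis_rules = (
    Rule(("runny nose",), "nasal congestion"),
    Rule(("temp>100",), "fever"),
    Rule(("headache",), "achiness"),
    Rule(("fever", "achiness", "cough"), "viremia"),
    Rule(("nasal congestion", "viremia"), "influenza"),
)
memory = {"runny nose", "temp>100", "headache", "cough"}
result = run(diagnosis_rules, memory)
# Fires in order: nasal congestion, fever, achiness, viremia, influenza.
print("influenza" in result)  # True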

Backward-Chaining

In other problems, a goal is specified and the AI must find a way to achieve that
specified goal. For example, if there is an epidemic of a certain disease, this AI could
presume a given individual had the disease and attempt to determine if its diagnosis
is correct based on available information. A backward-chaining, goal-driven, system
accomplishes this. To do this, the system looks for the action in the THEN clause of the
rules that matches the specified goal. In other words, it looks for the rules that can
produce this goal. If a rule is found and fired, it takes each of that rule’s conditions as
goals and continues until either the available data satisfies all of the goals or there are
no more rules that match.

For an example of backward-chaining, see Appendix B.
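A goal-driven counterpart can be sketched as a recursive search over the same illustrative rule structures. Unlike a real system, this toy version has no cycle detection and does not record which rules fired:

def prove(goal, rules, memory):
    # True if `goal` is already in working memory, or can be derived
    # by proving every condition of some rule that concludes `goal`.
    if goal in memory:
        return True
    for rule in rules:
        if rule.action == goal and all(
                prove(cond, rules, memory) for cond in rule.conditions):
            return True
    return False

print(prove("influenza", diagnosis_rules,
            {"runny nose", "temp>100", "headache", "cough"}))  # True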

Which method to use?

Of the two methods available, forward- or backward-chaining, the one to use is
determined by the problem itself. A comparison of conditions to actions in the rule
base can help determine which chaining method is preferred. If the 'average' rule has
more conditions than conclusions, meaning that a typical hypothesis or goal (the
conclusion) can lead to many more questions (the conditions), forward-chaining is
favored. If the opposite holds true, and the average rule has more conclusions than
conditions such that each fact may fan out into a large number of new facts or actions,
backward-chaining is ideal.
If neither is dominant, the number of facts in the working memory may help the
decision. If all (relevant) facts are already known, and the purpose of the system is to
find where that information leads, forward-chaining should be selected. If, on the
other hand, few or no facts are known and the goal is to find if one of many possible
conclusions is true, use backward-chaining.

Improving Efficiency of Forward Chaining

Forward-chaining systems, as powerful as they can be if well designed, can become
cumbersome if the problem is too large. As the rule-base and working memory grow,
the brute-force method of checking every rule condition against every assertion in the
working memory can become quite computationally expensive. Specifically, the
computational complexity is on the order of O(RA^C), where R is the number of rules, C is
the approximate number of conditions per rule, and A is the number of assertions in
working memory. With this exponential complexity, for a rule-base of any realistic
size, the system will perform quite slowly.
There are ways to reduce this complexity, thus making a system of this nature far
more feasible for use with real problems. The most effective such solution is the
Rete algorithm. The Rete algorithm reduces the complexity by reducing the number of
comparisons between rule conditions and assertions in the working memory. To
accomplish this, the algorithm stores a list of rules matched or partially matched by
the current working memory. Thus, it avoids unnecessary computations in
re-checking the already matched rules (they are already activated) or un-matched
rules (their conditions cannot be satisfied under the existing assertions in the working
memory). Only when the working memory changes does it re-check the rules, and
then only against the assertions added or removed from working memory. All told,
this method drops the complexity to O(RAC), linear rather than exponential.
The Rete algorithm, however, requires additional memory to store the state of the
system from cycle to cycle. The additional memory can be considerable, but may be
justified for the increased speed efficiency. For large problems in which speed is a
factor, the Rete method is justified. For small problems, or those in which speed is not
an issue but memory is, the Rete method may not be the best option. Another
unfortunate shortcoming of the Rete method is that it only works with
forward-chaining.
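The full Rete algorithm is well beyond a short sketch, but its central trick, caching match state between cycles and touching only the rules whose conditions mention a changed fact, can be illustrated roughly as follows (a toy, not the real algorithm; it assumes each fact is asserted at most once):

from collections import defaultdict

class IncrementalMatcher:
    # Caches, per rule, how many conditions are currently satisfied,
    # and indexes rules by the facts they test.
    def __init__(self, rules):
        self.rules = rules
        self.needed = {r: len(r.conditions) for r in rules}
        self.satisfied = defaultdict(int)
        self.by_fact = defaultdict(list)  # fact -> rules testing it
        for r in rules:
            for c in r.conditions:
                self.by_fact[c].append(r)

    def add_fact(self, fact):
        # Only rules mentioning `fact` are re-examined; every other
        # rule keeps its cached match state untouched.
        for r in self.by_fact[fact]:
            self.satisfied[r] += 1

    def conflict_set(self):
        return [r for r in self.rules
                if self.satisfied[r] == self.needed[r]]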

Building Rule-Based Systems with Identification Trees

Semantic Network

A semantic network is the most basic structure upon which an identification tree
(hereafter referred to as an ID tree) is based. Simply put, a semantic network consists
of nodes representing objects and links representing any relations between these
objects.

In this sample semantic network, Zoom is a feline; a feline is a mammal; a mammal
is an animal. Zoom chases a mouse; a mouse is a mammal; a mammal is an animal.
Zoom eats fish; fish is an animal. The relations are written on the lines: is a, is an, eats,
chases. The nodes (circles) are ‘objects’.

Semantic Tree

At the next level of complexity exists a semantic tree, which is simply a semantic
network with a few additional conditions and terms. Each node has a parent to which
it is linked (with the exception of the root node, which is its own parent and which
needs no link). Each link connects the parent node with any and all children nodes. A
single parent node may have multiple children, but no children may have multiple
parents. Nodes with no children are the leaf nodes. The difference between a tree and
a network is this: a network can have loops, a tree cannot.
The root node is marked as such. It is parent to itself, A, and B. A is child to the root
and parent to C. B is child to the root and parent to D and E. C is a child to A and has
no children of its own, making it a leaf node. D is parent to F, which is parent to leaf
nodes I and J. E is parent to leaf nodes G and H.

Decision Tree

Above semantic trees comes the decision tree. Each node of a decision tree is linked to
a set of possible solutions. Each parent node (that is, each node that is not a leaf and
thus has children) is associated with a test, which splits the set of possible answers
into subsets representing every possible outcome of the test.
Each non-leaf node serves as a test to lead to one of the leaf outcomes.

Identification Trees

Last, but not least, an ID tree is a decision tree in which the divisions are
created by training the tree against a list of known data. The purpose of an ID tree is
to take a set of sample data, classify the data, and construct a series of tests to classify
an unknown object based on like properties.

Training ID Trees

First, the tree must be created and trained. It must be provided with sufficient labeled
samples that are used to create the tree itself. It does this by dividing the samples into
subsets based on features. The sets of samples at the leaves of the tree define a
classification. The tree is created based on Occam's Razor, which (modified for ID
trees) states that the simplest (smallest) tree that is consistent with the training
samples is the best predictor. To find the smallest tree, one could find every possible
tree given the data set then examine each one and choose the smallest. However, this
is expensive and wasteful. The solution to this, therefore, is to greedily create one
small tree:

At each node:
    Pick a test such that the branches are as close as possible to the
    same classification (split into subsets with the least disorder)
    Find which of these tests minimizes the disorder

Then:

Until each leaf node contains a set that is homogeneous or near-homogeneous:
    Select a leaf node that is non-homogeneous
    Split this set into two or more homogeneous subsets to minimize disorder

Since the goal of an ID tree is to generate homogeneous subsets, we want to calculate
how non-homogeneous the subsets each test creates are. The test that minimizes the
disorder is the one that divides the samples into the cleanest categories. Disorder is
calculated as follows:

Average disorder = Σb (nb/nt) * (Σc -(nbc/nb) log2(nbc/nb))

where:
nb is the number of samples in branch b
nt is the total number of samples in all branches
nbc is the number of samples in branch b belonging to class c

For an example of training an ID tree, see Appendix C.
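A sketch of the disorder calculation above; the (attributes, label) sample representation is an assumption for illustration:

from collections import Counter, defaultdict
from math import log2

def average_disorder(samples, test):
    # samples: list of (attributes_dict, class_label) pairs.
    # test: the attribute whose values define the branches.
    branches = defaultdict(list)
    for attributes, label in samples:
        branches[attributes[test]].append(label)
    n_t = len(samples)
    total = 0.0
    for labels in branches.values():
        n_b = len(labels)
        disorder = -sum((n_bc / n_b) * log2(n_bc / n_b)
                        for n_bc in Counter(labels).values())
        total += (n_b / n_t) * disorder
    return total

The greedy training step then simply picks the test with the least disorder, e.g. min(tests, key=lambda t: average_disorder(samples, t)).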

ID Trees to Rules

Once an ID tree is constructed successfully, it can be used to generate a rule-set,
which will serve to perform the necessary classifications of the ID tree. This is done by
creating a single rule for each path from the root to a leaf in the ID tree.
For an example of this, see Appendix D.
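One way to sketch the conversion, assuming a simple nested-dictionary tree representation (the format and names are illustrative):

def tree_to_rules(node, conditions=()):
    # node is either a leaf outcome (a string) or a decision node:
    # {"test": attribute, "branches": {value: subtree, ...}}
    if isinstance(node, str):
        return [(conditions, node)]  # one rule per root-to-leaf path
    rules = []
    for value, subtree in node["branches"].items():
        rules.extend(tree_to_rules(
            subtree, conditions + ((node["test"], value),)))
    return rules

# The Appendix C/D ball tree as nested dictionaries:
ball_tree = {"test": "size", "branches": {
    "medium": "does not bounce",
    "large": "does bounce",
    "small": {"test": "rubber", "branches": {
        "no": "does not bounce",
        "yes": "does bounce"}}}}

for conds, outcome in tree_to_rules(ball_tree):
    print("if", " and ".join(f"({a} = {v})" for a, v in conds),
          "then (ball", outcome + ")")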

Pruning Unnecessary Conditions

If a rule has conditions that are inconsequential to its outcome, discard them,
thus simplifying the rule (and improving efficiency). This is accomplished
by proving that the outcome is independent of the given condition. Events A and B are
independent if the probability of event B does not change given that event A occurs.
In terms of conditional probability:
P(B | A) = P(B)
This states that the probability of event B given that event A occurs is equal to the
probability that event B occurs by itself. If this holds true, then event A does not affect
whether or not event B occurs. If A is a condition and B is a result, then A can be
discarded without affecting the rule. When a rule has several conditions, the check is
applied among the samples that satisfy the rule's remaining conditions, as in Appendix E.
For an example of this, see Appendix E.
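A sketch of this independence check over the training samples; as in Appendix E, the comparison is made among the samples that satisfy the rule's remaining conditions (sample representation as in the disorder sketch above):

def matches(attributes, conditions):
    return all(attributes.get(a) == v for a, v in conditions)

def condition_is_redundant(samples, conditions, outcome, candidate):
    # True if dropping `candidate` leaves the outcome frequency
    # unchanged among samples satisfying the remaining conditions,
    # i.e. P(B | A) = P(B).
    rest = [c for c in conditions if c != candidate]
    full = [lab for attrs, lab in samples if matches(attrs, conditions)]
    reduced = [lab for attrs, lab in samples if matches(attrs, rest)]
    if not full or not reduced:
        return False
    return (full.count(outcome) / len(full)
            == reduced.count(outcome) / len(reduced))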

Pruning Unnecessary Rules

If two or more rules share the same end result, you may be able to replace them with
a rule that fires in the event that no other rule is fired:

if (no other rule fires)
then (execute these common actions)

If there is more than one such group of rules, replace only one group. Which one is
determined by some heuristic tiebreaker. Two such tiebreakers follow:
• Replace the larger of the two groups. If group A has six rules which share a
common result and group B only has five, replacing the larger group A will
eliminate more rules and simplify the rule base the most.
• Replace the group with the highest average number of rule conditions. While
more rules may remain, the rules that remain will be simpler as they have
fewer conditions. For example, given the rules:

if (x) and (y) and (z) then (A)
if (m) and (n) and (o) then (A)

vs.

if (p) then (Z)
if (q) then (Z)

You would want to replace the first set with:

if (no other rule fires)
then (A)

For an example of this, see Appendix F.
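A sketch of the replacement using the second tiebreaker, with rules represented as (conditions, outcome) pairs as produced by the tree_to_rules sketch above:

from collections import defaultdict

def prune_rule_group(rules):
    # Replace the group of same-outcome rules with the highest
    # average number of conditions by a single default rule.
    groups = defaultdict(list)
    for conditions, outcome in rules:
        groups[outcome].append(conditions)
    shared = {o: cs for o, cs in groups.items() if len(cs) > 1}
    if not shared:
        return rules, None
    target = max(shared, key=lambda o: sum(len(c) for c in shared[o])
                                       / len(shared[o]))
    kept = [(c, o) for c, o in rules if o != target]
    return kept, ("no other rule fires", target)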


With enough training data, an ID tree can be created which, in turn, can be used to
create a rule-base for classification. From then on, using forward-chaining, a new
entity can be introduced as an assertion in the knowledge base and classified as if by
the ID tree. Using backward-chaining, one can find evidence to support that a given
classification is valid.

Conclusion

I have heard a few people, including some of my classmates, say that rule-based and
expert systems are obsolete and that ID trees are a thing of the past. Granted, this
is not the direction that most research is moving, but that doesn’t negate the existing
accomplishments of these architectures. As it stands, expert rule-based systems are
the most widely used and accepted AI in the world outside of games. The fields of
medicine, finance, and many others have benefited greatly from the intelligent use of
such systems. With the combination of rule-based systems and ID trees, there is great
potential for most fields.
Appendices

Appendix A -- Forward-Chaining Example: Medical Diagnosis

Assertions (Working Memory):

A1: runny nose
A2: temperature = 101.7
A3: headache
A4: cough

Rules (Rule-Base):

R1: if (nasal congestion)
(viremia)
then diagnose (influenza)
exit

R2: if (runny nose)
then assert (nasal congestion)

R3: if (body-aches)
then assert (achiness)

R4: if (temp > 100)
then assert (fever)

R5: if (headache)
then assert (achiness)

R6: if (fever)
(achiness)
(cough)
then assert (viremia)

Execution:

1. R2 fires, adding (nasal congestion) to working memory.
2. R4 fires, adding (fever) to working memory.
3. R5 fires, adding (achiness) to working memory.
4. R6 fires, adding (viremia) to working memory.
5. R1 fires, diagnosing the disease as (influenza) and exits, returning the
diagnosis.

Appendix B -- Backward-Chaining Example: Medical Diagnosis

Use the same rules and assertions from Appendix A.

Hypothesis/Goal: Diagnosis (influenza)
Execution:

1. R1 fires since the goal, diagnosis (influenza), matches the conclusion of that
rule. New goals are created: (nasal congestion) and (viremia), and
back-chaining is recursively called with these new goals.
2. R2 fires, matching goal nasal congestion. A new goal is created: (runny nose).
Back-chaining is recursively called. Since (runny nose) is in working memory, it
returns true.
3. R6 fires, matching goal viremia. Back-chaining recurses with new goals:
(fever), (achiness), and (cough).
4. R4 fires, adding goal (temperature > 100). Since (temperature = 101.7) is in
working memory, it returns true.
5. R3 fires, adding goal (body-aches). On recursion, there is no information in
working memory nor rules that match this goal. Therefore it returns false and
the next matching rule is chosen. That rule is R5 which fires, adding goal
(headache). Since (headache) is in working memory, it returns true.
6. Goal (cough) is in working memory, so that returns true.
7. Now that all recursive calls have returned true, the system exits, returning
true: the hypothesis was correct and the subject has influenza.

Appendix C -- Identification Tree Training

The identification tree will be trained on the following data:

(The original table, not reproduced here, lists eight balls, each with an identifier,
size, color, weight, and rubber attribute, and whether it bounces.)

We greedily create a small ID tree from this. For each column (save the first and last,
since the first is simply an identifier and the last is the result we're trying to identify)
we create a tree based solely on the divisions within that category. The resulting trees
are as follows:

From these, we calculate the disorder of each:

Size_Disorder = Σb (nb/nt) * (Σc -(nbc/nb) log2(nbc/nb))
= (4/8) * ((-(2/4) log2(2/4)) + (-(2/4) log2(2/4))) + ((1/8) * 0) + ((3/8) * 0)
= 0.5

The disorder for the ‘size’ test is 0.5. The other disorders are as follows:

Size: 0.5
Color: 0.69
Weight: 0.94
Rubber: 0.61
Since Size has the lowest disorder, we take that test and further break down any
non-homogeneous sets. There is only one: that of the 'small' branch. The following
trees result from the further division of the size=small subset.

The Rubber test splits the remaining samples into perfect subsets with 0 disorder.
Therefore, our final, simplest ID tree which represents the data is:

Appendix D -- Creating Rules from ID trees

Given the final ID tree from Appendix C, follow from the root test down to each
outcome, with each node visited becoming a condition of our rule. This gives us the
following rules:
First, we follow the size=medium branch from the root node. Of the three balls in
this branch, none bounce.

R1: if (size = medium)
then (ball does not bounce)

Next, we examine the size=large branch. There is only one ball in this branch, and it
bounces. Based on this data (this may change under a larger training set), all large
balls bounce.

R2: if (size = large)
then (ball does bounce)

Third, we follow the size=small branch. This leads us to another decision node.
Taking the rubber=no branch gives us this rule:

R3: if (size = small)
(rubber = no)
then (ball does not bounce)

And finally, we follow the size=small branch and, at the next test, follow
rubber=yes. The following rule is produced:

R4: if (size = small)
(rubber = yes)
then (ball does bounce)

Appendix E -- Eliminating unnecessary rule conditions

Given the rules provided in appendix D, we see if there’s any way to simplify those
rules by eliminating unnecessary rule conditions.
The last two rules have two conditions each. Consider, for example, the first of these,
R3:

R3: if (size = small)
(rubber = no)
then (ball does not bounce)

Looking at the probabilities with event A = (size=small) and event B = (ball does not
bounce), among the balls that satisfy the rule's other condition (rubber=no):

P(B | A) = (2 small non-rubber balls, of which neither bounces) = 2/2 = 1
P(B) = (3 non-rubber balls, of which none bounce) = 3/3 = 1
P(B | A) = P(B), therefore B is independent of A
If we were to eliminate the first condition: size=small, then this rule would trigger for
every ball not made of rubber. There are 3 balls not made of rubber. They are 2, 3 and
8 – none of these bounce. Because none bounce, the size does not affect this and we
can eliminate that condition.

if (rubber = no)
then (ball does not bounce)

Examining the next condition, with A = (rubber=no) and B the same, among the balls
that satisfy the remaining condition (size=small):
P(B | A) = (2 small non-rubber balls, of which neither bounces) = 2/2 = 1
P(B) = (4 small balls, of which 2 do not bounce) = 2/4 = 0.5
P(B | A) does not equal P(B), therefore A and B are not
independent

If we eliminate this condition, (rubber = no), the rule triggers for every small ball.
Of the small balls, two bounce and two do not. Therefore, the rubber does affect
whether or not they bounce, and the condition cannot be eliminated. The small balls
bounce only if they are rubber.
Now, the next rule with two conditions:
R4: if (size = small)
(rubber = yes)
then (ball does bounce)

Examining the probabilities, with A = (size=small) and B = (ball does bounce), among
the balls that satisfy the other condition (rubber=yes):

P(B | A) = (2 small rubber balls, of which both bounce) = 2/2 = 1
P(B) = (5 rubber balls, of which 3 bounce) = 3/5 = 0.6
P(B | A) does not equal P(B), therefore A and B are not
independent.

If we eliminate the first condition, the rule fires for all rubber balls. Of the five rubber balls, two
are small and both bounce. Of the other three, one bounces and two do not. For this
rule, (size=small) is important.
On to the next condition. Examining the probabilities, with A = (rubber=yes) and
B = (ball does bounce), among the balls that satisfy the other condition (size=small):

P(B | A) = (2 small rubber balls, of which both bounce) = 2/2 = 1
P(B) = (4 small balls, of which 2 bounce) = 2/4 = 0.5
P(B | A) does not equal P(B), therefore A and B are not
independent

Eliminating the second condition makes the rule fire for all small balls. Of the four
small balls, two bounce and two do not. Again, the condition is significant and cannot
be dropped. Therefore this rule must stay as it is.

Appendix F -- Eliminating unnecessary rules

We have the following simplified rules from Appendix E:

R1: if (size = medium)
then (ball does not bounce)

R2: if (size = large)
then (ball does bounce)

R3: if (rubber = no)
then (ball does not bounce)

R4: if (size = small)
(rubber = yes)
then (ball does bounce)

Of these, we have two sets of rules, each set shares a common result. The first group
consists of rules R1 and R3. The second consists of rules R2 and R4.
We can eliminate one of these sets and replace it with the rule:
if (no other rule fires)
then (perform these common actions)

Both sets have the same number of rules, 2, but the second set has more conditions
than the first. So, we'll eliminate the second set and replace it with:

if (no other rule fires)
then (ball does bounce)

Our final rule-base is:

R1: if (size = medium)
then (ball does not bounce)

R2: if (rubber = no)
then (ball does not bounce)

R3: if (no other rule fires)
then (ball does bounce)

Appendix G -- Additional Online Resources

• A much more in-depth examination of the Rete Method
• CLIPS: A tool for building Expert Systems
• FuzzyCLIPS is an extension of the CLIPS expert system shell.
• A list of papers from CiteSeer
• Companion for a book, there's a section for Rule-Based Systems
• Some class notes (not mine) on Rule-Based Systems
• Some class notes (not mine) on Expert Systems
