Rule_engine-decision_trees-JP
A rule-based system is built from a set of assertions, which collectively form the
'working memory', and a set of rules that specify how to act on that assertion set.
Rule-based systems are fairly simple, consisting of little more than a set of if-then
statements, but they provide the basis for the so-called "expert systems" which are widely
used in many fields. The concept of an expert system is this: the knowledge of an
expert is encoded into the rule set. When exposed to the same data, the expert
system will perform in a manner similar to the expert.
Rule-based systems are a relatively simple model that can be adapted to any number
of problems. As with any AI, a rule-based system has its strengths as well as
limitations that must be considered before deciding if it’s the right technique to use for
a given problem. Overall, rule-based systems are only practical for problems in which
all of the relevant knowledge can be written in the form of if-then rules and the
problem area is not too large. If there are too many rules, the system can become
difficult to maintain and can suffer a performance hit.
To create a rule-based system for a given problem, you must have (or create) the
following:
1. A set of facts to represent the initial working memory. This should be anything
relevant to the beginning state of the system.
2. A set of rules. This should encompass any and all actions that should be taken
within the scope of a problem, but nothing irrelevant. The number of rules in
the system can affect its performance, so you don’t want any that aren’t
needed.
3. A condition that determines that a solution has been found or that none exists.
This is necessary to terminate rule-based systems that would otherwise run in an
infinite loop.
The rule-based system itself uses a simple technique: It starts with a rule-base,
which contains all of the appropriate knowledge encoded into if-then rules, and a
working memory, which may or may not initially contain data, assertions or other
known information. The system examines all the rule conditions (IF) and
determines a subset, the conflict set, of the rules whose conditions are satisfied based
on the working memory. Of this conflict set, one rule is triggered (fired).
Which one is chosen is based on a conflict resolution strategy. When the rule is fired,
any actions specified in its THEN clause are carried out. These actions can modify the
working memory, the rule-base itself, or do just about anything else the system
programmer decides to include. This loop of firing rules and performing actions
continues until one of two conditions is met: there are no more rules whose
conditions are satisfied, or a rule is fired whose action specifies that the program
should terminate.
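This match-resolve-act loop can be made concrete in a few lines. Below is a minimal
sketch in Python; the Rule class, the run_engine function, and the restriction of
actions to simple assertions are all illustrative assumptions, not a reference
implementation of any particular engine.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    conditions: list[str]    # the IF clause: facts that must all be present
    actions: list[str]       # the THEN clause: facts to assert when fired

def run_engine(rules: list[Rule],
               working_memory: set[str],
               resolve: Callable[[list[Rule]], Rule],
               max_cycles: int = 100) -> set[str]:
    for _ in range(max_cycles):
        # Match: collect the conflict set of rules whose conditions are
        # satisfied and whose actions would still change working memory
        # (a simple guard against re-firing the same rule forever).
        conflict_set = [r for r in rules
                        if all(c in working_memory for c in r.conditions)
                        and not all(a in working_memory for a in r.actions)]
        if not conflict_set:
            break                            # no applicable rules: terminate
        rule = resolve(conflict_set)         # conflict resolution strategy
        working_memory.update(rule.actions)  # carry out the THEN clause
    return working_memory

A real engine would also support actions that retract facts or modify the rule-base
itself, as described above; this sketch keeps only the assert case.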
Which rule is chosen to fire is a function of the conflict resolution strategy. Which
strategy is chosen can be determined by the problem or it may be a matter of
preference. In any case, it is vital as it controls which of the applicable rules are fired
and thus how the entire system behaves. There are several different strategies, but
here are a few of the most common (a code sketch of several of them follows the list):
• First Applicable: If the rules are in a specified order, firing the first applicable
one allows control over the order in which rules fire. This is the simplest
strategy, but it has the potential for one large problem: an infinite loop on the
same rule. If the working memory and the rule-base remain unchanged, then the
conditions of the first rule are still satisfied and it will fire again and
again. To solve this, it is common practice to suspend a fired rule and prevent
it from re-firing until the data in working memory that satisfied the rule's
conditions has changed.
• Random: Though it doesn't provide the predictability or control of the
first-applicable strategy, this strategy does have its advantages. For one thing,
its unpredictability is an asset in some circumstances (such as games). A random
strategy simply chooses a single random rule to fire from the conflict set. A
related possibility is a fuzzy rule-based system in which each rule has a
probability, so that some rules are more likely to fire than others.
• Most Specific: This strategy is based on the number of conditions of the rules.
From the conflict set, the rule with the most conditions is chosen. This is based
on the assumption that if it has the most conditions then it has the most
relevance to the existing data.
• Least Recently Used: Each of the rules is accompanied by a time or step
stamp, which marks the last time it was used. This maximizes the number of
individual rules that are fired at least once. If all rules are needed for the
solution of a given problem, this is a perfect strategy.
• "Best" rule: For this to work, each rule is given a ‘weight,’ which specifies how
much it should be considered over the alternatives. The rule with the most
preferable outcomes is chosen based on this weight.
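To make the strategies concrete, here are hedged sketches of four of them, written
against the illustrative Rule class from the previous listing; any of these can be
passed as the resolve argument of run_engine.

import random

def first_applicable(conflict_set):
    # Fire the first satisfied rule in rule-base order.
    return conflict_set[0]

def random_rule(conflict_set):
    # Fire one satisfied rule chosen uniformly at random.
    return random.choice(conflict_set)

def most_specific(conflict_set):
    # Fire the satisfied rule with the most conditions.
    return max(conflict_set, key=lambda r: len(r.conditions))

def least_recently_used():
    # Each rule carries a step stamp recording when it last fired;
    # fire the satisfied rule that was used longest ago.
    last_fired, step = {}, 0
    def resolve(conflict_set):
        nonlocal step
        step += 1
        rule = min(conflict_set, key=lambda r: last_fired.get(r.name, -1))
        last_fired[rule.name] = step
        return rule
    return resolve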
Forward-Chaining
In some problems, information is supplied at the outset and the AI must work out
where it leads. For example, given the symptoms an individual presents, this AI could
work forward from those symptoms to the conditions they indicate. A forward-chaining,
data-driven, system accomplishes this: it fires any rule whose IF conditions are
satisfied by the working memory, adds that rule's conclusions to the working memory,
and repeats until a goal is reached or no further rules apply.
Backward-Chaining
In other problems, a goal is specified and the AI must find a way to achieve that
specified goal. For example, if there is an epidemic of a certain disease, this AI could
presume that a given individual had the disease and attempt to determine whether its
diagnosis is correct based on available information. A backward-chaining, goal-driven,
system accomplishes this. To do this, the system looks for the action in the THEN
clause of the rules that matches the specified goal; in other words, it looks for the
rules that can produce this goal. If such a rule is found and fired, each of that
rule's conditions is taken as a new goal, and the process continues until either the
available data satisfies all of the goals or there are no more rules that match.
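A minimal sketch of this goal-driven search, reusing the illustrative Rule shape from
the engine listing above (prove is a hypothetical name, not a library call):

def prove(goal, rules, facts, _pending=frozenset()):
    if goal in facts:
        return True                 # goal satisfied by available data
    if goal in _pending:
        return False                # guard against circular rule chains
    for rule in rules:
        if goal in rule.actions:    # a rule whose THEN clause produces the goal
            # Take each of that rule's conditions as a new subgoal.
            if all(prove(c, rules, facts, _pending | {goal})
                   for c in rule.conditions):
                return True
    return False

# e.g. prove("viremia", rules, {"headache", "fever", "cough"}) would chain
# backward through rules like R5 and R6 in the appendices.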
Semantic Network
A semantic network is the most basic structure upon which an identification tree
(hereafter referred to as an ID tree) is based. Simply put, a semantic network consists
of nodes representing objects and links representing any relations between these
objects.
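As a concrete (and entirely illustrative) encoding, a semantic network can be stored
as a list of (object, relation, object) triples; the canary/bird nodes below are the
classic textbook example, not data from this paper.

links = [
    ("canary", "is-a", "bird"),        # node, relation, node
    ("bird", "has-part", "wings"),
    ("canary", "has-color", "yellow"),
]

def relations_of(node, links):
    # All (relation, target) pairs leaving the given node.
    return [(rel, dst) for src, rel, dst in links if src == node]

print(relations_of("canary", links))   # [('is-a', 'bird'), ('has-color', 'yellow')]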
Semantic Tree
At the next level of complexity exists a semantic tree, which is simply a semantic
network with a few additional conditions and terms. Each node has a parent to which
it is linked (with the exception of the root node, which is its own parent and
needs no link). Each link connects a parent node with its children. A
single parent node may have multiple children, but no child may have multiple
parents. Nodes with no children are the leaf nodes. The difference between a tree and
a network is this: a network can have loops, a tree cannot.
Consider an example tree (reconstructed in the sketch below in place of the original
figure): the root node is marked as such. It is parent to itself, A and B. A is child
to the root and parent to C. B is child to the root and parent to D and E. C is a
child of A and has no children of its own, making it a leaf node. D is parent to F,
which is parent to leaf nodes I and J. E is parent to leaf nodes G and H.
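Since the figure itself is not shown, here is the same tree encoded as a
parent-to-children mapping:

tree = {
    "root": ["A", "B"],
    "A": ["C"],
    "B": ["D", "E"],
    "C": [],                  # leaf
    "D": ["F"],
    "E": ["G", "H"],          # G and H are leaves
    "F": ["I", "J"],          # I and J are leaves
    "G": [], "H": [], "I": [], "J": [],
}

leaves = [node for node, children in tree.items() if not children]
print(leaves)                 # ['C', 'G', 'H', 'I', 'J']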
Decision Tree
Above semantic trees comes the decision tree. Each node of a decision tree is linked to
a set of possible solutions. Each parent node (that is, each node that is not a leaf
and thus has children) is associated with a test, which splits the set of possible
answers into subsets representing every possible outcome of the test.
Each non-leaf node thus serves as a test that leads toward one of the leaf outcomes.
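A sketch of how such a tree is used: starting at the root, apply each node's test and
follow the branch matching the outcome until a leaf's answer is reached. The ball
attributes below anticipate the example worked in the appendices; the dictionary
layout is an illustrative assumption.

def classify(node, sample):
    while "test" in node:                    # internal node: apply its test
        outcome = sample[node["test"]]
        node = node["branches"][outcome]     # follow the matching branch
    return node["answer"]                    # leaf node: the classification

tree = {"test": "size",
        "branches": {
            "large":  {"answer": "does not bounce"},
            "medium": {"answer": "bounces"},
            "small":  {"test": "rubber",
                       "branches": {"yes": {"answer": "bounces"},
                                    "no":  {"answer": "does not bounce"}}}}}

print(classify(tree, {"size": "small", "rubber": "yes"}))   # bounces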
Identification Trees
Last, but not least, an ID tree is a decision tree in which the divisions are
created by training the tree against a list of known data. The purpose of an ID tree is
to take a set of sample data, classify that data, and construct a series of tests that
can classify an unknown object based on like properties.
Training ID Trees
First, the tree must be created and trained. It must be provided with sufficient labeled
samples, which are used to create the tree itself. It does this by dividing the samples
into subsets based on features; the sets of samples at the leaves of the tree define a
classification. The tree is created based on Occam's Razor, which (modified for ID
trees) states that the simplest (smallest) tree that is consistent with the training
samples is the best predictor. To find the smallest tree, one could generate every
possible tree for the data set, examine each one, and choose the smallest. However, this
is expensive and wasteful. The solution, therefore, is to greedily create one small
tree: until each leaf node holds a sample set that is as homogeneous as possible, pick
a leaf with an inhomogeneous sample set and replace it with the test that divides that
set into the subsets with the least total disorder.
Then, to choose that test, compute the average disorder each candidate test would
produce:
disorder = SUM over branches b of (nb / nt) * SUM over classes c of -(nbc / nb) * log2(nbc / nb)
where nb is the number of samples in branch b, nt is the total number of samples at the
node, and nbc is the number of samples in branch b belonging to class c; select the
test with the lowest value.
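A sketch of this greedy selection step, assuming the average-disorder measure above
(the same measure the values in appendix C are consistent with); disorder and
best_test are illustrative names, and the sample format is hypothetical.

from collections import Counter
from math import log2

def disorder(branches):
    # branches: one list of class labels per test outcome.
    total = sum(len(b) for b in branches)
    score = 0.0
    for b in branches:
        if not b:
            continue
        counts = Counter(b)
        entropy = -sum((n / len(b)) * log2(n / len(b))
                       for n in counts.values())
        score += (len(b) / total) * entropy    # weight branch by its share
    return score

def best_test(samples, tests, label):
    # Pick the test whose branches have the least average disorder.
    def split(test):
        groups = {}
        for s in samples:
            groups.setdefault(s[test], []).append(s[label])
        return list(groups.values())
    return min(tests, key=lambda t: disorder(split(t)))

# Hypothetical usage, with samples as dicts of attributes plus a label:
# samples = [{"size": "large", "rubber": "yes", "bounces": "no"}, ...]
# best_test(samples, ["size", "color", "weight", "rubber"], "bounces")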
ID Trees to Rules
To convert an ID tree to rules, follow each path from the root test down to a leaf,
with each node visited becoming a condition of a rule whose conclusion is that leaf's
classification. Then, if there are conditions of a rule that are inconsequential to the
outcome, discard them, thus simplifying the rule (and improving efficiency). This is
accomplished by proving that the outcome is independent of the given condition. Events
A and B are independent if the probability of event B does not change given that event
A occurs. In terms of Bayes' rule, independence means:
P(B | A) = P(B)
This states that the probability of event B given that event A occurs is equal to the
probability that event B occurs by itself. If this holds true, then event A does not
affect whether or not event B occurs. If A is a condition and B is a result, then A can
be discarded without affecting the rule.
For an example of this, see Appendix E.
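This check can be mechanized over the training samples. The sketch below (with
hypothetical names) simply compares the two estimated probabilities; it assumes at
least one sample satisfies condition A, and in practice a small tolerance might
replace the exact comparison.

def independent(samples, cond_a, cond_b):
    # cond_a, cond_b: predicates over a single sample.
    total = len(samples)
    a = [s for s in samples if cond_a(s)]
    p_b = sum(1 for s in samples if cond_b(s)) / total     # P(B)
    p_b_given_a = sum(1 for s in a if cond_b(s)) / len(a)  # P(B | A)
    return p_b_given_a == p_b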
If two or more rules share the same end result, you may be able to replace them with
a rule that fires in the event that no other rule fires:
if (no other rule fires)
then (perform these common actions)
If there is more than one such group of rules, replace only one group. Which one is
replaced is determined by some heuristic tiebreaker. Two such tiebreakers follow:
• Replace the larger of the two groups. If group A has six rules that share a
common result and group B has only five, replacing the larger group A will
eliminate more rules and simplify the rule-base the most.
• Replace the group with the highest average number of rule conditions. While
more rules may remain, the rules that remain will be simpler, as they have
fewer conditions.
Conclusion
I have heard a few people, including some of my classmates, say that rule-based and
expert systems are obsolete and that ID trees are a thing of the past. Granted, this
is not the direction in which most research is moving, but that doesn't negate the
existing accomplishments of these architectures. As it stands, expert rule-based
systems are the most widely used and accepted AI in the world outside of games. The
fields of medicine, finance and many others have benefited greatly from intelligent
use of such systems. With the combination of rule-based systems and ID trees, there is
great potential for most fields.
Appendices
Rules (Rule-Base):
R5: if (headache)
then assert (achiness)
R6: if (fever)
(achiness)
(cough)
then assert (viremia)
Execution:
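The original execution trace is not reproduced here, but a minimal sketch of how these
two rules chain, assuming initial facts of headache, fever and cough (an assumption,
since the original working memory is not shown), would be:

rules = [
    (["headache"], "achiness"),                   # R5
    (["fever", "achiness", "cough"], "viremia"),  # R6
]

memory = {"headache", "fever", "cough"}           # assumed initial facts
changed = True
while changed:                                    # forward-chain to a fixpoint
    changed = False
    for conditions, conclusion in rules:
        if all(c in memory for c in conditions) and conclusion not in memory:
            memory.add(conclusion)                # fire the rule
            changed = True

print(sorted(memory))   # gains 'achiness' (R5), then 'viremia' (R6)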
Appendix C
The disorder for the 'size' test is 0.5. The disorders for the other tests are as
follows:
Size: 0.5
Color: 0.69
Weight: 0.94
Rubber: 0.61
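As a check, using the per-branch counts recoverable from appendix D: the large branch
(3 balls, none bounce) and the medium branch (1 ball, it bounces) are each homogeneous
and contribute 0 disorder, while the small branch (4 balls; 2 bounce, 2 do not) is
maximally disordered at 1. Weighting each branch by the fraction of samples it receives
gives disorder(size) = (3/8)(0) + (1/8)(0) + (4/8)(1) = 0.5, matching the figure above.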
Since Size has the lowest disorder, we take that test and further break down any
inhomogeneous sets. There is only one, that of the 'small' branch. The following trees
result from the further division of the size=small test.
The Rubber test splits the remaining samples into perfect subsets with 0 disorder.
Therefore, our final, simplest ID tree representing the data is (reconstructed in
outline form in place of the original figure):
size?
large -> ball does not bounce
medium -> ball does bounce
small -> rubber?
rubber = yes -> ball does bounce
rubber = no -> ball does not bounce
Appendix D
Given the final ID tree from appendix C, follow from the root test down to each
outcome, with each node visited becoming a condition of our rule. This gives us the
following rules:
First, we follow the rightmost path from the root node: size=large. Of the three balls
in this branch, none bounce. This produces the rule:
R1: if (size = large)
then (ball does not bounce)
Next, we examine the next branch: size=medium. There is only one ball in this
branch, and it bounces. Based on this data (this may change under a larger training
set), all medium balls bounce:
R2: if (size = medium)
then (ball does bounce)
Third, we follow the leftmost branch: size=small. This leads us to another decision
node. Taking the rightmost branch of that node, rubber=no, gives us this rule:
R3: if (size = small)
(rubber = no)
then (ball does not bounce)
And finally, we follow the leftmost path: size=small, and at the next test follow
rubber=yes. The following rule is produced:
R4: if (size = small)
(rubber = yes)
then (ball does bounce)
Appendix E
Given the rules provided in appendix D, we see if there's any way to simplify those
rules by eliminating unnecessary rule conditions.
The last two rules have two conditions each. Consider, for example, the first of these,
R3:
R3: if (size = small)
(rubber = no)
then (ball does not bounce)
Look at the probabilities with event A = (size=small) and event B = (ball does not
bounce). If A could be discarded, the rule would reduce to:
if (rubber = no)
then (ball does not bounce)
Examining this first condition:
P(A and B) = (2 small balls do not bounce / 8 total) = 0.25
P(A) = (4 small balls / 8 total) = 0.5
P(B | A) = P(A and B) / P(A) = 0.25 / 0.5 = 0.5
P(B) = (5 balls do not bounce / 8 total) = 0.625
P(B | A) does not equal P(B), therefore A and B are not independent and the condition
(size=small) cannot be discarded.
The same is clear from the data directly for the other condition: if you eliminate
(rubber = no) from the same rule, it triggers for every small ball. Of the small balls,
two bounce and two do not. Therefore, rubber does affect whether or not they bounce and
cannot be eliminated; the small balls bounce only if they are rubber.
Now, the next rule with two conditions:
R4: if (size = small)
(rubber = yes)
then (ball does bounce)
If we eliminate the first condition, the rule fires for all rubber balls. Of the five
rubber balls, two are small and both bounce. Of the other three, one bounces and two do
not. For this rule, (size=small) is important.
On to the next condition. Examining the probabilities, with A = (rubber=yes) and
B = (ball does bounce):
P(B | A) = (3 rubber balls bounce / 5 rubber balls) = 0.6
P(B) = (3 balls bounce / 8 total) = 0.375
P(B | A) does not equal P(B), so the events are not independent. Likewise, eliminating
the second condition makes the rule fire for all small balls; of the four small balls,
two bounce and two do not. Again, the condition is significant and cannot be dropped.
Therefore this rule must stay as it is.
Of these, we have two sets of rules, where each set shares a common result. The first
group consists of rules R1 and R3 (the ball does not bounce). The second consists of
rules R2 and R4 (the ball does bounce).
We can eliminate one of these sets and replace it with the rule:
if (no other rule fires)
then (perform these common actions)
Both sets have the same number of rules, two, but the second set has more conditions
than the first. So, we'll eliminate the second set and replace it with:
if (no other rule fires)
then (ball does bounce)