Håkansson 2006
Reengineering for Knowledge in Knowledge Based Systems
1 Introduction
B. Gabrys, R.J. Howlett, and L.C. Jain (Eds.): KES 2006, Part I, LNAI 4251, pp. 342 – 351, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Reengineering for Knowledge in Knowledge Based Systems 343
2 Related Work
Several reengineering models have been developed over the years. Some operate on
the source code level, and others recover the design and specifications [4]. These
frequently use a knowledge-based approach to transform the architecture of a system
from one form to another. An example of such a transformation is a model developed
for the reengineering of expert systems to obtain an object-oriented architecture [1].
This model reengineers non-object-oriented systems into object-oriented architectures
by using a knowledge-based approach. Other models have been used to change
conventional architectures to object-oriented architectures [4] and to transform
procedural programs into object-oriented programs, such as CORET (object-oriented
reverse engineering) [10].
In our approach, we are not transforming architectures, but instead, using
reengineering in KBSs to collect and represent the knowledge contained in the
systems. This approach allows us to modify the knowledge and to extend the
344 A. Håkansson and R.L. Hartung
Reengineering involves collecting all kinds of knowledge in a system and taking into
account any special functionality, e.g., procedural code, that the system uses to
apply the knowledge, for example, in an inference mechanism. The knowledge
is often represented as production rules, heuristic rules and facts. The reengineering
must collect the different kinds of rules together with their relationships to all other
rules in order to cover all of the knowledge contained in the knowledge base. Moreover,
as pre-stored facts are utilized within some of the rules, these must not be omitted
since the execution of the rules depends on these facts. Because the reengineering must
collect all the information that is involved in the reasoning in the system, it must
also track down the user-given facts, in the form of the answers given or the
alternative responses associated with the questions posed, and the conclusions that
will be presented for the different responses.
The reengineering uses different search methods to locate the knowledge in the
system. Different search criteria are used to search for knowledge with each set of
criteria depending on the system’s representation and syntax. Consequently, we use
different approaches to find rules and facts.
The first and simplest approach is to search for rules as predicates in the
representation. The reengineering can easily pick up the rules using a keyword, e.g.,
“rule”, if that is the word used to denote rules. See the example, rule number 105:
The reengineering procedure continues by collecting the rules by using a search for
all predicates referenced in the rule. Many of the associated rules can be found by
using the content of the rules and their relationships to other rules. One rule will
usually reference several others, which can, therefore, be found by investigating the
content of the predicate, e.g., using “check”. In general, the relationships between
the rules
result in a complex graph of dependencies.
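The keyword search and the collection of referenced rules can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the clause
layout, the rule numbered 106, the "fact" and "reply" goals and the helper names
split_clauses and dependency_graph are hypothetical and are not taken from the
system described here.

```python
import re

# Hypothetical Prolog-like source text. The clause layout, rule 106 and the
# fact/reply goals are assumptions made for this sketch.
SOURCE = """
rule(105, 'sign of allergic purpura', 'Find a doctor immediately text', 1000) :-
    check('general disease rule', 'general disease text', 1000),
    fact('fever object', high).
rule(106, 'general disease rule', 'general disease text', 1000) :-
    reply('rash object', yes).
"""

def split_clauses(source):
    """Split source text into clauses at each full stop that ends a line."""
    return [c.strip() for c in re.split(r"\.\s*\n", source) if c.strip()]

def dependency_graph(source, keyword="rule", reference="check"):
    """Map each rule name to the rule names referenced in its body."""
    head_re = re.compile(rf"^{keyword}\(\s*\d+\s*,\s*'([^']*)'")
    ref_re = re.compile(rf"\b{reference}\(\s*'([^']*)'")
    graph = {}
    for clause in split_clauses(source):
        head, _, body = clause.partition(":-")
        match = head_re.match(head)
        if match:  # only clauses introduced by the keyword count as rules
            graph[match.group(1)] = ref_re.findall(body)
    return graph

graph = dependency_graph(SOURCE)
```

Here graph maps each rule name to the names it references through “check”, which is
one simple way to hold the dependency graph for later analysis.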
By searching the internal contents of these rules, many of the pre-stored facts can
be found, i.e., those labeled “fact”, see Figure 1. These pre-stored facts are often
stored in a database and are present even though the system does not execute
commands involving them.
User-given facts are knowledge that is required by the system but which,
unfortunately, can only be found in the database during a session, i.e., “reply”, as
shown in Figure 1. Since the user-given facts are not collected from the rules, the
reengineering must reconstruct the user-given facts hypothetically. This is
accomplished by inspecting the rules and determining which values may be given. This
step is required because the facts inserted into the system depend on the consultation,
and the values obtained will vary from one session to another.
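A minimal sketch of this hypothetical reconstruction is given below. It assumes that
rule bodies contain goals of the form reply('question', 'answer'); this form, the
sample bodies and the function name possible_replies are assumptions made for the
illustration.

```python
import re
from collections import defaultdict

# Hypothetical rule bodies; the reply('question', 'answer') form is an assumption.
RULE_BODIES = [
    "reply('one rash with strong red spot object', 'Yes')",
    "reply('several symptoms object', 'No'), reply('fever object', '> normal')",
    "reply('one rash with strong red spot object', 'No')",
]

REPLY_RE = re.compile(r"reply\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")

def possible_replies(bodies):
    """Collect, per question, every answer that some rule tests for."""
    values = defaultdict(set)
    for body in bodies:
        for question, answer in REPLY_RE.findall(body):
            values[question].add(answer)
    # Sort the answers so that the result is stable between runs.
    return {question: sorted(answers) for question, answers in values.items()}

replies = possible_replies(RULE_BODIES)
```

The result lists, for every question, the answers that the rules can react to, which
stands in for the facts that would otherwise exist only during a session.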
Despite conducting a comprehensive search of all of the rules, there is no
guarantee that all kinds of rules will be identified and picked up because, for
example, some rules take on different structures depending on the purpose of the
rule. These rules differ in that they accept different numbers of arguments depending
upon the call, e.g.,
“rule(105, 'sign of allergic purpura', 'Find a doctor immediately text', 1000)”,
which has four arguments, and
“check('general disease rule', 'general disease text', 1000)”,
which has three arguments and omits the rule number.
The second, and more complicated, approach is to search for the syntactic form of the
rules. It is used when the rules have a different appearance from that mentioned
above and cannot be found through the presumed word “rule”. Rules frequently have a
predetermined pattern that is used by the interpretation engine and that the
reengineering can use to grasp the rules in a knowledge base. This is referred to as
the pattern of antecedent-consequent rules. Thus, determining the structure of the
rules is the starting point for a search of this type, for which it is necessary to
identify the first rule and then use the pattern revealed to find the rest of the rules.
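As an illustration of a pattern-based search, the sketch below takes the pattern of a
rule to be the name of the predicate and its number of arguments. This is a
simplifying assumption, and the function names signature and rules_like, as well as
the last two sample heads, are hypothetical.

```python
def signature(term):
    """Return (functor, arity) for a term such as "check('a', 'b', 1000)"."""
    functor, paren, rest = term.strip().partition("(")
    if not paren:
        return functor, 0              # an atom has no arguments
    depth, in_quote, arity = 0, False, 1
    for ch in rest:
        if ch == "'":
            in_quote = not in_quote    # commas inside quoted atoms do not count
        elif in_quote:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break                  # closing parenthesis of the term itself
            depth -= 1
        elif ch == "," and depth == 0:
            arity += 1
    return functor, arity

def rules_like(first_rule, heads):
    """Return the heads that share the pattern revealed by the first rule."""
    pattern = signature(first_rule)
    return [head for head in heads if signature(head) == pattern]

HEADS = [
    "rule(105, 'sign of allergic purpura', 'Find a doctor immediately text', 1000)",
    "check('general disease rule', 'general disease text', 1000)",
    "rule(106, 'another rule', 'another text', 500)",   # hypothetical
    "average(Values, Mean)",                            # hypothetical
]
matches = rules_like(HEADS[0], HEADS)
```

With these heads, the first rule reveals the pattern of a four-argument “rule”
predicate, so the three-argument “check” call is not matched and would need a pattern
of its own.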
The interpretation engine can use the built-in predicate for interpreting
antecedent-consequent rules. This predicate, which is available in Prolog and called
clause, separates the head of a predicate from its body, so that the two parts, the
head and the body, can be checked separately. The reengineering utilizes the clause
predicate by tracking the interpreter while forcing the system to execute the engine
as though someone were consulting the system. The clause predicate uses the
predicate’s internal structure to interpret the rules, pre-stored facts and
user-given facts to reach a conclusion or conclusions. This search works for a
couple of KBSs, namely, those with predetermined patterns. However, some systems
use other kinds of self-developed interpreters.
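The idea of tracking the interpreter can be imitated outside Prolog. In the sketch
below, a dictionary stands in for the head and body pairs that clause would return,
and a simulated consultation records every rule and fact that is visited. The
knowledge base content and the function name trace are assumptions made for the
illustration, not the interpreter of any particular system.

```python
# Each head maps to the goals in its body; facts have an empty body.
# The names are hypothetical and only loosely follow the medical example.
KB = {
    "allergic purpura rule": ["general disease rule", "rash fact"],
    "general disease rule": ["fever fact"],
    "fever fact": [],
    "rash fact": [],
}

def trace(goal, kb, visited=None):
    """Walk the bodies depth-first and record the order in which heads are met."""
    if visited is None:
        visited = []
    if goal in visited or goal not in kb:
        return visited                 # already collected, or unknown to the KB
    visited.append(goal)
    for subgoal in kb[goal]:
        trace(subgoal, kb, visited)
    return visited

order = trace("allergic purpura rule", KB)
```

The recorded order lists the rules and facts in the sequence in which a consultation
starting from the given rule would reach them.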
In the third approach the reengineering tool looks for the rules without using any
presumed word or pattern. If the relevant word or the pattern it should seek is
unknown, the rules can be found by counting the number of occurrences of a
predicate with the same name. The search will find predicates that occur frequently in
the code. In a system, rules are usually the most common predicates, and the number
of occurrences of that predicate will generally exceed the number of times the other
predicates occur. Again, the reengineering uses the same method as described in the
first approach to determine the pattern intrinsic to the rule and then uses the
pattern or patterns to find all the other rules in the
knowledge base. However, it is not certain that the reengineering procedure will find
the rules of the system. Instead, the reengineering tool may find a predicate that is the
code for functionality of the system rather than rules or facts. The knowledge
engineer must analyze whether the predicate belongs to the knowledge or the
functionality.
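Counting occurrences is straightforward to sketch. The example below assumes that
every clause starts at the beginning of a line and counts only the predicates that
appear as clause heads. The sample source and the function name head_counts are
hypothetical.

```python
import re
from collections import Counter

# Hypothetical source text in which each clause starts at the beginning of a line.
SOURCE = """
diagnosis(1, 'rule a') :- symptom(x).
diagnosis(2, 'rule b') :- symptom(y).
diagnosis(3, 'rule c').
average(Values, Mean) :- sum(Values, Sum).
"""

def head_counts(source):
    """Count how often each predicate name starts a clause, most frequent first."""
    heads = re.findall(r"^([a-z]\w*)\(", source, flags=re.MULTILINE)
    return Counter(heads).most_common()

counts = head_counts(SOURCE)
```

In this sample the most frequent head is “diagnosis”, which would be the candidate
for the rule predicate, while “average” illustrates a predicate that belongs to the
functionality and has to be ruled out by the knowledge engineer.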
In addition to obtaining the rules, the other predicates, in the form of facts,
questions used to obtain the user-given facts and conclusions all need to be found by
the reengineering using the same kind of searching procedures as mentioned above.
To find the questions and the conclusions, the rules are investigated. However, the
connections between the rules, questions and the conclusions must also be collected.
Some rules are not really related to the domain problem [11] at all, e.g., meta-rules.
Meta-rules are used to direct the problem solving and to determine how best to solve
problems. These meta-rules are used when deciding which rules to apply to avoid a
conflict within the reasoning strategy [8]. A meta-rule devises a strategy for the use of
task-specific rules in the system [13] and is, therefore, dependent on other rules.
Some rules are specially constructed to enable them to be utilized by many other
rules. The most commonly constructed rules of this kind are inferred rules. Inferred
rules are derived from the initial rules and represent the process of deriving new
information from old [11]. An inferred rule links several different rules (e.g.,
A->B, B->C) by concluding that rule A implies rule C. In the inferred rules, the
antecedent of the initial rule appears in the body of the rule. If the relationships
between inferred rules and initial rules are found, it is important that they are
preserved since they are affected by changes to the rule base.
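The link between an inferred rule and the initial rules it depends on can be kept
explicitly. The sketch below performs a single chaining step over rules written as
(antecedent, consequent) pairs and records which initial rules produced each inferred
rule. The representation and the names infer and affected_by are assumptions made
for the illustration.

```python
def infer(rules):
    """Chain A->B and B->C into A->C and remember the two rules that were used."""
    inferred = {}
    for a, b in rules:
        for b2, c in rules:
            if b == b2 and a != c and (a, c) not in rules:
                inferred[(a, c)] = [(a, b), (b2, c)]
    return inferred

def affected_by(changed_rule, inferred):
    """List the inferred rules that must be revisited when a rule is changed."""
    return sorted(rule for rule, sources in inferred.items()
                  if changed_rule in sources)

INITIAL = {("A", "B"), ("B", "C"), ("C", "D")}
inferred = infer(INITIAL)
to_revisit = affected_by(("B", "C"), inferred)
```

Because each inferred rule keeps a reference to its sources, a change to B->C can be
traced to both A->C and B->D.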
Derived rules, as their name suggests, are rules that are derived from other rules.
They are similar to inferred rules since they involve the process of linking new
information. However, instead of being inferred, the rules are derived from other
rules, making a derived rule a short form for several lines of primitive rules. The
derived rules may have direct relationships to the rules that produced them, for which
they are short cuts. These relationships, too, are important and must be retained.
As mentioned above, the rules in the rule set also have a relationship to the facts,
both pre-stored and user-given. Moreover, the majority of the rules are connected to
the conclusions presented to the end users. The connections to the facts and
conclusions must be taken into consideration.
Reengineering must operate on the source code, which implements the functionality,
e.g. the functions and the predicates, used to handle the knowledge of the system. The
functionality is essential for working with the knowledge, e.g., for using the
interpretation engine, performing calculations and for consideration of factors relating
to uncertainty.
Reengineering collects the functionality by searching in the source code. It follows
the code and searches for the first place where the engine invokes rules. Other
functionality is evident when the rules themselves need to calculate a result, e.g., to
find averages. In these cases, the reengineering searches for the functionality from the
rules, i.e., it searches for the rules that invoke this functionality. An additional
functionality is the certainty factor. The interpreter usually calculates these certainty
factors at the end of a session. To collect this functionality, the reengineering must
again search in the source code. Hence, the reengineering operates on the code for the
interpretation engine.
Consistency means having a knowledge base with no rules that are conflicting, redundant,
subsumed or circular. If the knowledge base has circular rules, the reengineering tool
will run into an infinite loop, which must be handled during the reengineering. The
structure of circular rules must be recognized in order to avoid the potential loops.
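One common way to recognize circular rules is a depth-first search that remembers
which rules are on the current path. The sketch below assumes that the dependencies
between rules have already been collected into a dictionary. The graph content and
the function name find_cycle are hypothetical, and the technique is a standard one
rather than the one necessarily used by the reengineering tool.

```python
def find_cycle(graph):
    """Return one circular chain of rules, or None if the rules are acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GREY:
                # nxt is already on the path, so the path closes a loop here.
                return path[path.index(nxt):] + [nxt]
            if nxt in graph and color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical dependencies: rule1 needs rule2, which needs rule3, which needs rule1.
CIRCULAR = {"rule1": ["rule2"], "rule2": ["rule3"], "rule3": ["rule1"], "rule4": []}
cycle = find_cycle(CIRCULAR)
```

A search of this kind terminates even when the rules are circular, because a rule
that is already on the current path is reported instead of being followed again.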
Completeness means having a rule base without dead-end rules, i.e., avoiding the
problems caused by having a premise that can never be reached or which does not
exist. The reengineering would try to find the rule and run into a search problem.
The tool works through the code until it finds the rule corresponding to a premise, but
if the premise is lacking, the reengineering tool searches for a non-existent rule.
The reengineering tool keeps track of all the rules it has encountered and, when a
certain time limit is exceeded, concludes that the rule is missing.
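When the rules and their premises have already been collected, a missing rule can
also be detected without waiting for a time limit, by comparing the premises against
the set of rules encountered. The sketch below shows this alternative. It is not the
time-limit mechanism described above, and the graph content and the function name
dead_ends are hypothetical.

```python
def dead_ends(graph):
    """Map each rule to the premises for which no rule or fact has been found."""
    defined = set(graph)
    missing = {}
    for rule, premises in graph.items():
        absent = sorted(p for p in premises if p not in defined)
        if absent:
            missing[rule] = absent
    return missing

# Hypothetical rule base: "lab result rule" is referenced but never defined.
RULES = {
    "allergic purpura rule": ["general disease rule", "lab result rule"],
    "general disease rule": ["fever fact"],
    "fever fact": [],
}
missing = dead_ends(RULES)
```

The outcome names the rule that contains the dead end together with the premise that
could not be found, which is the information a knowledge engineer needs to repair the
rule base.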
Correctness is defined as the match between the conclusions of the system
and the opinions of domain experts. Since correctness is human based, reengineering
cannot make this kind of check, because such a check would require the system to reflect
a domain expert’s knowledge and reasoning to reach the same conclusions.
Graphic modeling is used to make the knowledge more accessible and the experts
can use the models to check the rules and facts that are involved in a conclusion. If
the models are easy to follow and understand, then it will be more straightforward for
the users to judge consistency, completeness and correctness.
[Figure residue; surviving labels: “fever object”, “reply fact”, “> normal”]
The example presented in Figure 2 illustrates the reengineering operation on the rule
“allergic purpura rule”, previously presented in Figure 1. Within the rule, another rule
is implemented, the “general disease object”. This incorporates a fact “fever object”,
which corresponds to a high body temperature, the fact “one rash with strong red spot
object” with the user-given answer “Yes” and the fact “Several symptoms object:
Vomit / Headache / Oversensitive to light / Pain when bending head forward” with
the user-given answer “No”. The conclusion “Find doctor immediately text” is
connected to the rule. When a rule has been investigated, it can be folded into
packages of UML, as demonstrated in the “general disease object” rule in Figure 2.
The internal content of a rule is specified inside a package.
If a user is to be able to grasp the outcome of the reengineering, the tool must
generate diagrams to answer specific questions posed by the user, for example, how a
specified conclusion can be obtained from the rules. The diagram provides a graphic
view of the answers.
The relationships between the rules, together with the order in which the system
invokes the rules, are presented along a lifeline. This lifeline illustrates the order
with arrows representing the rules, the pre-stored facts and the user-given facts.
The conclusion is presented at the end of the lifeline, e.g., “Contact the doctor
immediately text” in Figure 2. In this example, two conclusions are attainable:
one in the package “general disease rule”, which cannot be seen in this example, and
one at the end of the lifeline.
In addition, all kinds of relationships between the sets of rules or inputs that
produce two conclusions will be presented in a sequence diagram. However, to
illustrate this more clearly, we use a UML object diagram.
Figure 3 presents the connections between questions, rules and conclusions using
object diagrams. This example illustrates the same rule as was presented in Figure 2,
revealing how the object diagram presents the connections between different parts
more clearly than the sequence diagram.
References
[1] Babiker, E., Simmons, D., Shannon, R., & Ellis, N.: A model for reengineering legacy
expert systems to object-oriented architecture, Expert systems with applications, vol. 12,
no. 3 (1997) 363-371.
[2] Carriere, J., O'Brian, L., and Verhoef, C.: Reconstruction Software Architecture.
Appearing in Bass, L., Clements, P., and Kazman, R.: Software Architecture in Practice
(2nd edition), ISBN 0-321-15495-9, Addison-Wesley (2003).
[3] Durkin, J.: Expert System Design and Development. Prentice Hall International Editions.
Macmillan Publishing Company, New Jersey (1994).
[4] Gall, H., Klösch, R. & Mittermeir, R.: Using Domain Knowledge to Improve Reverse
Engineering. International Journal on Software Engineering and Knowledge
Engineering, World-Scientific Publishing, Vol. 6, No. 3, (1996) 477-505.
[5] Håkansson, A.: UML as an approach to Modelling Knowledge in Rule-based Systems.
(ES2001) The Twenty-first SGES International Conference on Knowledge Based Systems
and Applied Artificial Intelligence. Peterhouse College, Cambridge, UK; December 10th-
12th (2001).
[6] Håkansson, A.: Supporting Illustration and Modification of the Reasoning Strategy by
Visualisation. (SCAI'03) The Eighth Scandinavian Conference on Artificial Intelligence,
Bergen, Norway, November 2nd-4th (2003).