Håkansson 2007
To cite this article: Anne Håkansson & Ronald Hartung (2007) USING REENGINEERING
FOR KNOWLEDGE-BASED SYSTEMS, Cybernetics and Systems: An International Journal,
38:8, 799-824, DOI: 10.1080/01969720701601122
Cybernetics and Systems: An International Journal, 38: 799–824
Copyright © 2007 Taylor & Francis Group, LLC
ISSN: 0196-9722 print/1087-6553 online
DOI: 10.1080/01969720701601122
ANNE HÅKANSSON
Department of Information Science, Computer Science,
Uppsala University, Uppsala, Sweden
RONALD HARTUNG
Department of Computer Science, Franklin University,
Columbus, Ohio
INTRODUCTION
Over the past couple of decades, a wide variety of systems operating on
knowledge have been developed. These systems support decision-making
across a vast number of areas, with the oldest and most established
field being medical services. Unfortunately, many of these systems are no
longer in use as a consequence of outdated theories, knowledge,
advice or findings. Developing a new system, however, takes a considerable
number of man-months and, during this period, changes in the
working environment may have occurred that will have an impact
on the requirements. Moreover, while the requirements for the system
may have changed, so too may the understanding of the problem, which
may have advanced, or new discoveries might have been made that need
to be incorporated in the system. These changes may have an impact on
the system, and even a system that has just been implemented in an
organization can be instantaneously out of date (Durkin 1994).
Modifying a recently developed system may not be a problem
because the alteration may only require a minor modification of the
source code, the software engineers may still be available, and there
may be an adequate amount of comprehensive well-written system doc-
umentation. On the other hand, it can be a considerable problem if a
profound alteration is required in a large and complex system (Wiratunga
and Craw 2000). If done manually, this is time-consuming
and requires that extensive documentation be available for the system.
Moreover, highly interrelated, complex source code may be involved in
the alteration, and changes of such code are difficult and can generate
errors. Any changes to the system can result in verification and validation
problems caused by inconsistency, incompleteness and incorrectness.
An automated assistant performing reverse engineering can support
the modification of the contents of the knowledge base. Reverse engineering,
also called reengineering, is used to collect and understand
source code that is to be maintained and reused (Gall et al. 1996). Besides
source code, we argue that reengineering can be used to extract knowledge
in the form of rules, facts and conclusions from a knowledge-based
system (Simmons et al. 1998). However, since reengineering operates
on a large amount of code, it is a cumbersome process and difficult to
handle manually (Gall et al. 1996). As a result, automated assistance
is needed for reengineering the domain knowledge and the
functionality of a system.
RELATED WORK
Reengineering has been popular since the mid 1980s. Since then, several
reengineering models have been developed, some operating on the
source code level, and others recovering the design and specifications
(Antonini et al. 1987; Benedusi et al. 1992; Cimitile and De Carlini
1991; Gall et al. 1996). Many of these use a knowledge-based
approach. The most popular approach has been to transform
the architecture of a system from one form to another. An example
of such a transformation is a model developed by applying reengineering
on expert systems to obtain an object-oriented architecture (Babiker et al.
1997). This model reengineers non-object-oriented systems into object-
REVERSE ENGINEERING
Reverse engineering is the process used to analyze a software system to
identify the components and their relationships with the intention of cre-
ating a representation of the system in another form or at a higher level
of abstraction (Chikofsky and Cross 1990). The purpose of reverse
engineering is to understand software systems to facilitate improvement,
correction, documentation, redesign and recoding (Rugaber 2000; Rugaber
et al. 1992). This might be carried out to maintain and reuse source code,
but it can also be conducted to recover designs and specifications.
The existence of large quantities of undocumented code makes reen-
gineering a common and painful problem in the software industry. This
code needs to be captured, analyzed and understood when it is to be
fixed, modified or incorporated into a new system. Sets of techniques
have been developed to provide support when attempting to understand
code (Carriere et al. 2003). The most common is the ad hoc approach
• to identify the part of the rule set that drives a particular conclusion
• to determine which of the conclusions can be drawn from a set of rules
  and inputs
• to find the possible difference set between the sets of rules and/or
  inputs from the known conclusions.
Usually, there are many rules that are involved in arriving at a single
conclusion. The reengineering tool needs to identify the complete set of
rules, facts and questions that drives a particular conclusion to allow
modifications to be made and code to be renewed at the same time as
assuring correctness and completeness of the knowledge base. Making
a change to an incomplete rule set can have unwanted side-effects since
all the rules in the rule set are involved when a modification is made.
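Identifying the complete rule set behind a conclusion can be sketched as a backward traversal of the dependencies between rules. The Python below is only an illustration under stated assumptions: the `Rule` structure, the function `rules_driving` and the small knowledge base are hypothetical and are not taken from the tool described here. Each rule is assumed to have named premises and a single conclusion.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    premises: Tuple[str, ...]  # facts, questions, or conclusions of other rules
    conclusion: str


def rules_driving(conclusion: str, rules: List[Rule]) -> Tuple[Set[str], Set[str]]:
    """Return (rule names, external inputs) that can contribute to `conclusion`."""
    by_conclusion: Dict[str, List[Rule]] = {}
    for rule in rules:
        by_conclusion.setdefault(rule.conclusion, []).append(rule)

    needed_rules: Set[str] = set()
    external_inputs: Set[str] = set()
    visited: Set[str] = set()
    pending = [conclusion]
    while pending:
        goal = pending.pop()
        if goal in visited:  # guards against cyclic rule references
            continue
        visited.add(goal)
        producers = by_conclusion.get(goal, [])
        if not producers and goal != conclusion:
            # No rule concludes this item, so it is a fact or a question.
            external_inputs.add(goal)
        for rule in producers:
            needed_rules.add(rule.name)
            pending.extend(rule.premises)
    return needed_rules, external_inputs


# Hypothetical knowledge base, for illustration only.
kb = [
    Rule("r1", ("environment", "map tools"), "large object"),
    Rule("r2", ("large object", "age"), "recent large object"),
    Rule("r3", ("season",), "flood risk"),
]
rule_names, inputs = rules_driving("recent large object", kb)
```

In this sketch, a modification that touches "recent large object" would have to consider both `r1` and `r2`, together with the three inputs they rely on, while `r3` is left out because it does not contribute to the conclusion.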
Many conclusions can be involved within a set of rules. When a part
of the set of rules is changed, these conclusions may be affected. The tool
needs to identify all of the possible conclusions that can be obtained
from the set of rules to maintain correctness and consistency of the
knowledge base when changing the rules in the set.
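One way to enumerate the conclusions obtainable from a set of rules and inputs is simple forward chaining. The sketch below assumes that a rule is a triple of name, premises and conclusion and that a rule fires when all of its premises are known. The function `derivable` and the example rules are hypothetical, and a real interpreter may resolve rules differently.

```python
def derivable(rules, inputs):
    """Return (derived conclusions, names of fired rules) for the given inputs."""
    known = set(inputs)
    fired = []
    changed = True
    while changed:  # repeat until no further rule can fire
        changed = False
        for name, premises, conclusion in rules:
            if name not in fired and all(p in known for p in premises):
                fired.append(name)
                known.add(conclusion)
                changed = True
    return known - set(inputs), fired


# Hypothetical rule set: (name, premises, conclusion).
example_rules = [
    ("r1", ("a", "b"), "c"),
    ("r2", ("c",), "d"),
    ("r3", ("e",), "f"),
]
conclusions, fired_rules = derivable(example_rules, {"a", "b"})
```

Running the same function before and after a rule has been changed, and comparing the two sets of conclusions, gives a rough indication of which conclusions the change affects.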
Different subsets of rules may exist and there may be many inputs
that are obtained from known conclusions. These different subsets need
to be identified by the tool since changing the conclusions can affect all
of the different subsets. Again, modifying the conclusions involves ensur-
ing correctness, consistency and completeness.
because such facts are only available when the system is being executed.
Commonly, some facts that are used by the rules will not be given at all
by the users because the facts that need to be inserted into the system
will depend on what the consultation is about. If the user-given facts
are not collected from the rules, the reengineering must reconstruct
the user-given answers hypothetically. This is accomplished by inspecting
the rules and determining what possible values might be given. This step
is required because the values obtained will vary from one session with
on the other hand, usually use only one statement in the predicate body,
i.e., the text to be presented to the user. If any of these are hard to find,
the third search method will be used. Besides rules, the other predicates
that exist in large numbers are facts, questions and conclusions. Moreover,
the connections between the rules, questions and conclusions
must also be collected.
The functionality is essential for working with the knowledge, e.g., for
using the interpretation engine, performing calculations and for con-
sideration of factors relating to uncertainty.
The interpretation engine may either be an in-built function or be
developed by a knowledge engineer or programmer. The reengineering
procedure collects the functionality for the interpretation engine by
searching the source code. It follows the code and searches for the first
occasion on which the engine invokes any rules. Other functionality is
evident when the rules need to perform calculations as they do, for
example, when they need to find an average. In these cases, the reengi-
neering searches for the functionality by searching for and identifying
the rules that invoke the required functionality. An additional type of
functionality is the certainty factor. If certainty factors are used by the
system, the interpreter usually calculates them at the end of a session.
When collecting this functionality, the reengineering procedure must
again search in the source code. Hence, the reengineering operates on
the code for the interpretation engine.
If a small number of rules with little internal content are used, the pro-
cessing of the rules is effortless, but generally the processing is time-
consuming and labor intensive.
In a knowledge base, concepts can be applied to rules with the inten-
tion of grasping the interpretation of the rules by adding some semantic
information, i.e., a meaningful concept. The semantics may change the
comprehension of the physical structure of the rules to obtain a concep-
tual knowledge management structure (Håkansson 2004). The users’
mental model may change from being centered around the physical
structures usually forced on the user, to harnessing the users’ own knowl-
edge space. The users can define their own concepts and then utilize and
apply these self-defined concepts to the rules. This would change the way
in which the users conceive of the rules and also relieve their workload
since they would no longer need to cope with the contents of the rules
and learn the syntax. Instead, they would define concepts and apply
them in a conceptual content, without any knowledge of the physical
structure being required. This would imply a change: from displaying
rules in antecedent-consequent syntax (i.e., a rule with an if . . . then part)
into semantics by utilizing appropriately chosen concepts. Thus, the
concept would correspond to the conclusion of the rule by applying a
semantic meaning to the rule, which corresponds to its interpretation
(i.e., the achievement of the rule).
It is preferable for the concept to correspond to a rule used by the
interpreter to solve a task; however, a concept can be applicable to several
other rules in the knowledge base. This relationship can be utilized
to give the user supplementary support in revealing the relevant infor-
mation concerning that concept. This means that it is possible to apply
a single concept to an overall task concept, and thereby collect the
knowledge related to that concept.
One benefit of using meaningful and informative concepts and of
hiding information that is not of current relevance is that this gives a
tered rules is associated with the similarity of the task performed by the
rules. The person who develops the knowledge base decides the relevant
similarity between the rules, and then clusters the rules (Murphy and
Pazzani 1994; Wiratunga and Craw 2000). This clustering is done
according to how the rules handle a certain task. In this way, concepts
can simplify and speed up the search for rules dealing with similar tasks
or topics and, in so doing, decrease redundancy. Moreover, conceptuali-
zation by using clustering can support the definition of concepts on a
higher level of abstraction. Recognizing similar rules at this level may
allow them to be generalized.
In a structured conceptual knowledge base, clustering rules with the
same concept in the knowledge base may assist in finding and reusing
existing and meaningful rules. Concepts can assist searching facilities
in the process of finding relevant and meaningful rules by identifying a
rule utilizing a specified concept. This searching mechanism can also
support the search for related rules, by using a semantic concept corre-
sponding to the subtasks or conclusions of these rules. Searching for
related rules is also useful when the task is to reuse rules.
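The concept-based search described above can be sketched as an index from concepts to the rules that carry them. This is a minimal illustration, assuming that each rule has been annotated with a set of concept names. The rule names, concept names and both functions are hypothetical.

```python
from collections import defaultdict


def build_concept_index(rule_concepts):
    """Map each concept to the set of rules annotated with it."""
    index = defaultdict(set)
    for rule, concepts in rule_concepts.items():
        for concept in concepts:
            index[concept].add(rule)
    return index


def related_rules(rule, rule_concepts, index):
    """Return the other rules that share at least one concept with `rule`."""
    related = set()
    for concept in rule_concepts[rule]:
        related |= index.get(concept, set())
    related.discard(rule)
    return related


# Hypothetical annotations: rule name -> concepts applied to it.
annotations = {
    "rule_a": {"climate", "mapping"},
    "rule_b": {"mapping"},
    "rule_c": {"hydrology"},
}
concept_index = build_concept_index(annotations)
mapping_rules = concept_index["mapping"]
neighbours = related_rules("rule_a", annotations, concept_index)
```

Looking up a concept returns the cluster of rules to which it is applied, and `related_rules` returns candidates for reuse when a new rule dealing with a similar task is to be written.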
It is desirable to apply the concepts for classifying the rules into
several individual groups whose members are semantically related. The
possibility of using general concepts for a composed group of rules
allows new rules to be developed and modeled by using existing con-
cepts. When the contents of the knowledge base are unknown, users
who are unfamiliar with expert systems define every rule at the lowest
level without taking advantage of the reusing facility, which was revealed
in a study of shells for knowledge-based systems (Håkansson 2002).
Allowing the visualization of the structures of the rules at different
levels of abstraction can be beneficial for the domain expert when
engaged in reengineering. The concepts or topics should be actively used
as a support for expanding the knowledge base. These should be con-
stantly presented to the developer, who may continuously interact with
the concepts to avoid problems arising, e.g., with verification. The con-
cepts and their relationships need to be continuously updated as the
knowledge base changes.
This is useful for testing the knowledge base to validate it and for
verification.
The user-given answers are collected from the reengineering of facts.
The answers used in a specific rule are distinguished and set as inputs or
calls to the diagram. In this example, there is a call to the rule “recent
large object” and the data are “tropical climate,” “map tools” and
“not modern topographical maps.” These are used as values for the facts
(see check answers in the figure): “environment” is “tropical climate,”
“map tools” is “yes” and “map tools” is not “modern topographical
maps.” When the rule “maps large object” is checked together with the
facts, the output is produced, i.e., the text “recent large.” These inputs
and the output can be used when testing the knowledge base.
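The example can be written down as data to show how inputs and an output form a test. The encoding below is an assumption made for illustration: facts are (name, value) pairs, and each condition states whether a pair must be present or absent. The functions `check_rule` and `fire` are hypothetical and do not reproduce the interpreter of the system.

```python
def check_rule(conditions, facts):
    """True when every condition agrees with the asserted facts."""
    return all(((name, value) in facts) == expected
               for name, value, expected in conditions)


def fire(conditions, output, facts):
    """Return the rule's output text if the rule holds, otherwise None."""
    return output if check_rule(conditions, facts) else None


# Conditions taken from the example: (fact, value, must_hold).
rule_conditions = [
    ("environment", "tropical climate", True),
    ("map tools", "yes", True),
    ("map tools", "modern topographical maps", False),
]
rule_output = "recent large"

# User-given answers for one hypothetical session.
session_facts = {("environment", "tropical climate"), ("map tools", "yes")}
result = fire(rule_conditions, rule_output, session_facts)
```

With these answers the sketch yields the text "recent large"; adding the fact that modern topographical maps are available makes the negated condition fail, so no output is produced.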
In addition, all kinds of relationships between the sets of rules or
inputs that produce two conclusions will be presented in a sequence diagram.
However, to illustrate this more clearly, we use a UML object diagram.
Figure 4 presents the connections between the questions, rules and
conclusions using object diagrams. This example illustrates the same rule
as was presented in Figure 2, revealing how an object diagram presents the
connections between different parts more clearly than a sequence diagram.
The user-given facts are presented to the left of the figure, the rules and
prestored facts in the middle and the conclusions to the right.
the quality of the system. It is the process of executing the system with
the intention of finding errors or bugs in the software. Unit testing and
software integration testing can be used to test the source code (IPL
2006). Unit testing is testing each unit to verify that it has been correctly
implemented and software integration testing is the testing of progres-
sively larger groups of software components.
For testing systems, test cases are built. A test case is used to specify
how to test the implementation of a particular requirement or a design
decision and to determine the criteria to be used to define success. We
use test cases to test the individual units of software contained within
the knowledge base. The set of test cases specifies the exact process to
be followed. The diagrams in which
a series of inputs gives a specific output specify the test cases. This is like
designing units of code from descriptions.
to build a complete set of test cases. One can argue about the complete-
ness of the test cases, but the assumption has to be made that the knowl-
edge base is as correct as possible when we apply reengineering to it. If
the knowledge base is correct, the set of test cases should be complete for
the knowledge base.
The collected test cases are put in a directory for use when executing
all the cases. Each execution of a test case is noted in a test log that will,
therefore, contain information on all the test cases that have been executed
and any errors obtained. The errors are the outcome of the
execution of the test cases and the result of comparing the test case on
the modified rules with the original rules.
The next step is to make the test-case generator for the rule engine
run automatically. After a change has been made to the knowledge base,
the developer can run all the test cases to ensure that the rules produce
the same output with the same inputs when using the modified rule set.
By using these test cases, the rules are checked and verified either as
being correct or as producing errors. A test case can fail for two
different reasons. Either some of the rules have
been affected by the modification made, in which case the test case error
indicates that the system is not correct, or the failure to produce the
same result indicates that the change is correct and the old result
is no longer correct for the modified system. In the first instance, the reengineering
process can be applied to produce new diagrams showing how
the incorrect result was obtained and, after correcting the problem, the
tests can be rerun until the errors are removed. In the second instance,
however, a new test case needs to be generated.
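The regression step can be sketched as a small runner that executes every stored test case against the modified rules and writes the outcome to a log. Everything named here is hypothetical: `RuleTestCase`, `run_all`, `failures`, `regenerate` and the two stand-in engines. Deciding which kind of failure has occurred remains a judgment for the developer; the code only supports the two follow-up actions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class RuleTestCase:
    name: str
    inputs: FrozenSet[str]
    expected: str  # output recorded from the original rules


def run_all(cases: List[RuleTestCase], engine: Callable[[FrozenSet[str]], str]):
    """Execute every case and return a log with one entry per execution."""
    log = []
    for case in cases:
        actual = engine(case.inputs)
        log.append({"case": case.name, "expected": case.expected,
                    "actual": actual, "passed": actual == case.expected})
    return log


def failures(log):
    """Entries whose output differs from the original rules."""
    return [entry for entry in log if not entry["passed"]]


def regenerate(case: RuleTestCase, engine) -> RuleTestCase:
    """Second kind of failure: the change is intended, so record the new output."""
    return RuleTestCase(case.name, case.inputs, engine(case.inputs))


# Hypothetical stand-ins for the rule engine before and after a change.
def original_engine(inputs):
    return "recent large" if "tropical climate" in inputs else "unknown"


def modified_engine(inputs):
    needed = {"tropical climate", "map tools"}
    return "recent large" if needed <= inputs else "unknown"


cases = [
    RuleTestCase("with tools", frozenset({"tropical climate", "map tools"}),
                 "recent large"),
    RuleTestCase("without tools", frozenset({"tropical climate"}),
                 "recent large"),
]
test_log = run_all(cases, modified_engine)
failed = failures(test_log)
```

If the developer judges the failing case to be an unintended side effect, the rules are corrected and `run_all` is repeated. If the new behavior is intended, `regenerate` replaces the stored expectation.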
ory and educational support for end users using the environmental
impact assessment method for hydropower development and river regu-
lation in tropical environments within developing countries (Håkansson
2004:2)
The outcome of the reengineering has been presented by using an
ontology in graphical representations, such as UML sequence
diagrams, collaboration diagrams and object diagrams, to enable the
knowledge to be modified. The concepts in the ontology correspond to
the rules and facts; the relationships between concepts correspond to
the relationships between the rules and facts. Through the sequence dia-
gram we can show the relationships between rules and facts in a rule set
and relate these to the conclusions. In the collaboration diagram we
show the inputs from the user that are needed by a rule set for it to be
executed correctly and, finally, the object diagram illustrates the inputs,
i.e., the user-given facts, facts stored in the database, rules and conclu-
sions. However, we have not yet worked with the application of reengin-
eering to the source code handling the interpretation engine. It has
previously been shown that the UML activity diagrams and state-chart
diagrams can be used to illustrate source code in KBSs (Håkansson
2001), which explains why they also might be suitable for illustrating
the outcome of the reengineering of the functionality.
Moreover, testing has been applied to diagrams to develop test cases.
These test cases are used in the modification mode to check the correct-
ness, and to check for consistency and completeness. This will assure the
quality of the knowledge base after changing the contents.
Further work is needed to extend our reengineering tool to enable it
to handle reverse engineering for the source code. The outcome from this
will also be illustrated in diagrams. As long as the predicates have a pat-
tern of antecedent-consequent, the UML diagrams can be used. If not,
other diagrams must be investigated and tested for their ability to visually
present the outcomes. Moreover, we would like to distinguish between
REFERENCES
Antonini, P., Benedusi, P., Cantone, G., and Cimitile, A. 1987. Maintenance and
reverse engineering: Low level design documents production and improve-
ment. Proc. conference on software maintenance, IEEE, 91–100.
Babiker, E., Simmons, D., Shannon, R., and Ellis, N. 1997. A model for reengi-
neering legacy expert systems to object-oriented architecture. Expert Systems
with Applications 12(3): 363–371.
Benedusi, P., Cimitile, A., and De Carlini, U. 1992. Reverse engineering pro-
cesses, design document production, and structure charts. Journal of Systems
and Software 19(3): 225–245.
Booch, G., Rumbaugh, J., and Jacobson, I. 1999. The unified modeling language
user guide. New York: Addison Wesley Longman, Inc.
Carriere, J., O’Brian, L., and Verhoef, C. 2003. Reconstruction software architec-
ture. In Software architecture in practice (2nd edition), edited by L. Bass,
P. Clements, and R. Kazman. ISBN 0-321-15495-9, Boston: Addison-Wesley.
Chikofsky, E. J. and Cross, J. H. 1990. Reverse engineering and design recovery:
A taxonomy. Software, IEEE 7(1), ISSN: 0740-7459, pp. 13–17.
Cimitile, A. and De Carlini, U. 1991. Reverse engineering: Algorithms for
Program Graph Production. Software—Practice and Experience 21(5):
519–537.
Durkin, J. 1994. Expert system design and development. Prentice Hall International
Editions. New Jersey: Macmillan Publishing Company.
Gall, H., Klösch, R., and Mittermeir, R. 1996. Using domain knowledge to
improve reverse engineering. International Journal on Software Engineering
and Knowledge Engineering, World-Scientific Publishing, 6(3): 477–505.
Gruber, T. R. 1993. A translation approach to portable ontology specifications.
Knowledge Acquisition 5(2): 199–220.
Håkansson, A. 2001. UML as an approach to Modeling Knowledge in Rule-
based Systems. (ES2001) The Twenty-first SGES International Conference
on Knowledge Based Systems and Applied Artificial Intelligence. Peterhouse
College, Cambridge, UK; December 10th–12th.
Håkansson, A. 2002. Comparing two knowledge based system shells using usabil-
ity testing. Technical report. TRITA-NA-, NADA, KTH, Stockholm,
March, IPLab-199.
Håkansson, A. 2003. Supporting Illustration and Modification of the Reasoning
Strategy by Visualisation. (SCAI’03) The Eighth Scandinavian Conference
on Artificial Intelligence, Bergen, Norway, November 2nd–4th.
Håkansson, A. 2004. An expert system for the environment impact assessment
method. Research Report, ACTA, Sweden 2004:1: Department of Information