Tutorial On Rough Sets
2. $R_*(\emptyset) = R^*(\emptyset) = \emptyset$; $R_*(U) = R^*(U) = U$
3. $R^*(X \cup Y) = R^*(X) \cup R^*(Y)$
4. $R_*(X \cap Y) = R_*(X) \cap R_*(Y)$
5. $R_*(X \cup Y) \supseteq R_*(X) \cup R_*(Y)$
6. $R^*(X \cap Y) \subseteq R^*(X) \cap R^*(Y)$
7. $X \subseteq Y$ implies $R_*(X) \subseteq R_*(Y)$ and $R^*(X) \subseteq R^*(Y)$
8. $R_*(-X) = -R^*(X)$
9. $R^*(-X) = -R_*(X)$
10. $R_*(R_*(X)) = R^*(R_*(X)) = R_*(X)$
11. $R^*(R^*(X)) = R_*(R^*(X)) = R^*(X)$
where $-X$ denotes $U - X$.

[Figure: the set of objects, the granules of knowledge, and the lower and the upper approximation of the set.]
It is easily seen that the lower and the upper approximations of a set are, respectively, the interior and the closure of this set in the topology generated by the indiscernibility relation.
One can define the following four basic classes of rough sets, i.e., four categories of
vagueness:
1. A set X is roughly R-definable, iff $R_*(X) \neq \emptyset$ and $R^*(X) \neq U$.
2. A set X is internally R-undefinable, iff $R_*(X) = \emptyset$ and $R^*(X) \neq U$.
3. A set X is externally R-undefinable, iff $R_*(X) \neq \emptyset$ and $R^*(X) = U$.
4. A set X is totally R-undefinable, iff $R_*(X) = \emptyset$ and $R^*(X) = U$.
The intuitive meaning of this classification is the following.
A set X is roughly R-definable means that with respect to R we are able to decide for some elements of U that they belong to X and for some elements of U that they belong to $-X$.
A set X is internally R-undefinable means that with respect to R we are able to decide for some elements of U that they belong to $-X$, but we are unable to decide for any element of U whether it belongs to X.
A set X is externally R-undefinable means that with respect to R we are able to decide for some elements of U that they belong to X, but we are unable to decide for any element of U whether it belongs to $-X$.
A set X is totally R-undefinable means that with respect to R we are unable to decide for any element of U whether it belongs to X or $-X$.
A rough set X can also be characterized numerically by the following coefficient

$$\alpha_R(X) = \frac{|R_*(X)|}{|R^*(X)|}$$

called the accuracy of approximation, where $|X|$ denotes the cardinality of $X \neq \emptyset$.
Obviously $0 \leq \alpha_R(X) \leq 1$. If $\alpha_R(X) = 1$ then X is crisp with respect to R (X is precise with respect to R), and otherwise, if $\alpha_R(X) < 1$, X is rough with respect to R (X is vague with respect to R).
1.2 Rough Membership Function
Rough sets can also be defined by using, instead of approximations, a rough membership function, proposed in [2].
In classical set theory, either an element belongs to a set or it does not. The corresponding
membership function is the characteristic function for the set, i.e. the function takes values 1
and 0, respectively. In the case of rough sets, the notion of membership is different. The
rough membership function quantifies the degree of relative overlap between the set X and the
equivalence class R(x) to which x belongs. It is defined as follows:
$$\mu_X^R : U \rightarrow [0, 1]$$

where

$$\mu_X^R(x) = \frac{|X \cap R(x)|}{|R(x)|}$$

and $|X|$ denotes the cardinality of X.
The rough membership function expresses the conditional probability that x belongs to X given R, and can be interpreted as the degree to which x belongs to X in view of information about x expressed by R.
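To make the definition concrete, here is a minimal Python sketch (ours, not part of the original text) that computes $\mu_X^R$ directly from the definition, assuming the indiscernibility relation R is induced by a key function mapping each object to a label of its equivalence class:

```python
from fractions import Fraction

def equivalence_class(x, universe, key):
    """R(x): the objects indiscernible from x under the relation induced by key."""
    return {y for y in universe if key(y) == key(x)}

def rough_membership(x, X, universe, key):
    """mu_X^R(x) = |X n R(x)| / |R(x)|."""
    rx = equivalence_class(x, universe, key)
    return Fraction(len(X & rx), len(rx))

# Toy example: objects 0..5 are indiscernible when they have the same parity.
U = set(range(6))
X = {0, 1, 2}
print(rough_membership(1, X, U, key=lambda n: n % 2))  # |{1}| / |{1,3,5}| = 1/3
```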
The meaning of the rough membership function can be depicted as shown in Fig. 2.
Fig. 2
The rough membership function can be used to define the approximations and the boundary region of a set, as shown below:

$$R_*(X) = \{x \in U : \mu_X^R(x) = 1\},$$
$$R^*(X) = \{x \in U : \mu_X^R(x) > 0\},$$
$$RN_R(X) = \{x \in U : 0 < \mu_X^R(x) < 1\}.$$
[Fig. 2 illustrates the three cases: $\mu_X^R(x) = 1$ when $R(x) \subseteq X$, $0 < \mu_X^R(x) < 1$ when $R(x)$ partly overlaps X, and $\mu_X^R(x) = 0$ when $R(x) \cap X = \emptyset$.]
It can be shown that the rough membership function has the following properties [2]:
1. $\mu_X^R(x) = 1$ iff $x \in R_*(X)$
2. $\mu_X^R(x) = 0$ iff $x \in U - R^*(X)$
3. $0 < \mu_X^R(x) < 1$ iff $x \in RN_R(X)$
4. $\mu_{U-X}^R(x) = 1 - \mu_X^R(x)$ for any $x \in U$
5. $\mu_{X \cup Y}^R(x) \geq \max(\mu_X^R(x), \mu_Y^R(x))$ for any $x \in U$
6. $\mu_{X \cap Y}^R(x) \leq \min(\mu_X^R(x), \mu_Y^R(x))$ for any $x \in U$
From the above properties it follows that the rough membership differs essentially from the fuzzy membership, for properties 5 and 6 show that the membership for the union and intersection of sets, in general, cannot be computed, as in the case of fuzzy sets, from the memberships of their constituents. Thus, formally, the rough membership is a generalization of the fuzzy membership. Besides, the rough membership function, in contrast to the fuzzy membership function, has a probabilistic flavour.
The formulae for the lower and upper set approximations can be generalized to some arbitrary level of precision $\pi \in \left(\frac{1}{2}, 1\right]$ by means of the rough membership function in the following way:

$$R_*^{\pi}(X) = \{x \in U : \mu_X^R(x) \geq \pi\}$$
$$R^{*\pi}(X) = \{x \in U : \mu_X^R(x) > 1 - \pi\}$$

Note that the lower and upper approximations as originally formulated are obtained as a special case with $\pi = 1.0$.
Approximations of concepts are constructed on the basis of background knowledge. Obviously, concepts are also related to objects unseen so far. Hence it is very useful to define parameterized approximations, with the parameters tuned in the search for approximations of concepts. This idea is crucial for the construction of concept approximations using rough set methods. For more information about parameterized approximation spaces the reader is referred to [3].
Rough sets can thus approximately describe sets of patients, events, outcomes, etc.
that may be otherwise difficult to circumscribe.
References
[1] Z. Pawlak: Rough sets, Int. J. of Information and Computer Sciences, 11, 5, 341-356, 1982.
[2] Z. Pawlak, A. Skowron: Rough membership function, in: R. R. Yager, M. Fedrizzi and J. Kacprzyk (eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994, 251-271.
[3] A. Skowron, Z. Pawlak, J. Komorowski, L. Polkowski: A rough set perspective on data and knowledge, in: W. Kloesgen, J. Żytkow (eds.), Handbook of KDD, Oxford University Press, Oxford, 2002, 134-149.
2. Rough Sets in Data Analysis
In this section we define the basic concepts of rough set theory in terms of data, in contrast to the general formulation presented in Section 1. This is necessary if we want to apply rough sets in data analysis.
2.1 Information Systems
A data set is represented as a table, where each row represents a case, an event, a patient,
or simply an object. Every column represents an attribute (a variable, an observation, a
property, etc.) that can be measured for each object; the attribute may be also supplied by a
human expert or the user. Such a table is called an information system. Formally, an information system is a pair $S = (U, A)$, where U is a non-empty finite set of objects called the universe and A is a non-empty finite set of attributes such that $a : U \rightarrow V_a$ for every $a \in A$. The set $V_a$ is called the value set of a.
Example 2.1. Let us consider a very simple information system shown in Table 1. The set of objects U consists of seven objects: $x_1, x_2, x_3, x_4, x_5, x_6, x_7$, and the set of attributes includes two attributes: Age and LEMS (Lower Extremity Motor Score).

        Age     LEMS
x1      16-30   50
x2      16-30   0
x3      31-45   1-25
x4      31-45   1-25
x5      46-60   26-49
x6      16-30   26-49
x7      46-60   26-49

Table 1
One can easily notice that objects $x_3$ and $x_4$ as well as $x_5$ and $x_7$ have exactly the same values of attributes. The objects are (pairwise) indiscernible using the available attributes.
In many applications there is an outcome of classification that is known. This a posteriori
knowledge is expressed by one distinguished attribute called decision attribute; the process is
known as supervised learning. Information systems of this kind are called decision systems. A
decision system (a decision table) is any information system of the form $S = (U, A \cup \{d\})$, where $d \notin A$ is the decision attribute. The elements of A are called conditional attributes or
simply conditions. The decision attribute may take several values though binary outcomes are
rather frequent.
Example 2.2. Consider a decision system presented in Table 2. The table includes the same seven objects as in Example 2.1 and one decision attribute (Walk) with two values: Yes, No.

        Age     LEMS    Walk
x1      16-30   50      Yes
x2      16-30   0       No
x3      31-45   1-25    No
x4      31-45   1-25    Yes
x5      46-60   26-49   No
x6      16-30   26-49   Yes
x7      46-60   26-49   No

Table 2
One may again notice that cases $x_3$ and $x_4$ as well as $x_5$ and $x_7$ still have exactly the same values of conditions, but the first pair has different values of the decision attribute while the second pair has the same value.
2.2 Indiscernibility Relation
A decision system expresses all the knowledge about the model. This table may be
unnecessarily large in part because it is redundant in at least two ways. The same or
indiscernible objects may be represented several times, or some of the attributes may be
superfluous. We shall look into these issues now.
Let $S = (U, A)$ be an information system and $B \subseteq A$. A binary relation $IND_S(B)$ defined in the following way

$$IND_S(B) = \{(x, x') \in U^2 \mid \forall a \in B,\ a(x) = a(x')\}$$

is called the B-indiscernibility relation. It is easy to see that $IND_S(B)$ is an equivalence relation. If $(x, x') \in IND_S(B)$, then objects x and x' are indiscernible from each other by attributes from B. The equivalence classes of the B-indiscernibility relation are denoted $[x]_B$. The subscript S in the indiscernibility relation is usually omitted if it is clear which information system is meant.
Some extensions of standard rough sets do not require the relation to be transitive (see, for instance, [8]). Such a relation is called a tolerance or similarity relation.
Example 2.3. In order to illustrate how the decision system from Table 2 defines an indiscernibility relation, we consider the following three non-empty subsets of the conditional attributes: {Age}, {LEMS} and {Age, LEMS}.
If we take into consideration the set {LEMS}, then objects $x_3$ and $x_4$ belong to the same equivalence class; they are indiscernible. For the same reason, $x_5$, $x_6$ and $x_7$ belong to another indiscernibility class. The relation IND defines three partitions of the universe:

$$IND(\{Age\}) = \{\{x_1, x_2, x_6\}, \{x_3, x_4\}, \{x_5, x_7\}\},$$
$$IND(\{LEMS\}) = \{\{x_1\}, \{x_2\}, \{x_3, x_4\}, \{x_5, x_6, x_7\}\},$$
$$IND(\{Age, LEMS\}) = \{\{x_1\}, \{x_2\}, \{x_3, x_4\}, \{x_5, x_7\}, \{x_6\}\}.$$
2.3. Set Approximation
In this subsection, we formally define the approximations of a set using the indiscernibility relation.
Let $S = (U, A)$ be an information system, $B \subseteq A$ and $X \subseteq U$. Now, we can approximate the set X using only the information contained in the set of attributes B by constructing the B-lower and B-upper approximations of X, denoted $\underline{B}X$ and $\overline{B}X$ respectively, where $\underline{B}X = \{x \mid [x]_B \subseteq X\}$ and $\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\}$.
Analogously to the general case, the objects in $\underline{B}X$ can be classified with certainty as members of X on the basis of knowledge in B, while the objects in $\overline{B}X$ can only be classified as possible members of X on the basis of knowledge in B. The set $BN_B(X) = \overline{B}X - \underline{B}X$ is called the B-boundary region of X, and thus consists of those objects that we cannot decisively classify into X on the basis of knowledge in B. The set $U - \overline{B}X$ is called the B-outside region of X and consists of those objects which can be classified with certainty as not belonging to X (on the basis of knowledge in B). A set is said to be rough (respectively crisp) if the boundary region is non-empty (respectively empty).
Example 2.4. Let $X = \{x : Walk(x) = Yes\}$, as given by Table 2. In fact, the set X consists of three objects: $x_1, x_4, x_6$. Now, we want to describe this set in terms of the set of conditional attributes A = {Age, LEMS}. Using the above definitions, we obtain the following approximations: the A-lower approximation $\underline{A}X = \{x_1, x_6\}$, the A-upper approximation $\overline{A}X = \{x_1, x_3, x_4, x_6\}$, the A-boundary region $BN_A(X) = \{x_3, x_4\}$, and the A-outside region $U - \overline{A}X = \{x_2, x_5, x_7\}$. It is easy to see that the set X is rough since the boundary region is not empty. The graphical illustration of the approximations of the set X together with the equivalence classes contained in the corresponding approximations is shown in Fig. 3.
Fig. 3 (the equivalence classes {{x1}, {x6}} lie inside X (Walk = Yes), {{x3, x4}} forms the boundary region, and {{x2}, {x5, x7}} lie outside X (Walk = No))
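A sketch of how the approximations in Example 2.4 can be computed (again our illustrative Python, not from the text): every equivalence class of IND(A) contained in X goes into the lower approximation, and every class intersecting X goes into the upper one.

```python
from collections import defaultdict

TABLE = {  # conditional part of Table 2: object -> (Age, LEMS)
    "x1": ("16-30", "50"),    "x2": ("16-30", "0"),
    "x3": ("31-45", "1-25"),  "x4": ("31-45", "1-25"),
    "x5": ("46-60", "26-49"), "x6": ("16-30", "26-49"),
    "x7": ("46-60", "26-49"),
}

def approximations(X):
    """A-lower and A-upper approximations of X using all attributes."""
    classes = defaultdict(set)
    for x, values in TABLE.items():
        classes[values].add(x)        # equivalence classes of IND(A)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= X: lower |= cls     # [x]_A contained in X: certain members
        if cls & X:  upper |= cls     # [x]_A meets X: possible members
    return lower, upper

X = {"x1", "x4", "x6"}                # Walk = Yes in Table 2
low, up = approximations(X)
print(sorted(low))       # ['x1', 'x6']
print(sorted(up))        # ['x1', 'x3', 'x4', 'x6']
print(sorted(up - low))  # boundary: ['x3', 'x4']
```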
One can easily see the following properties of approximations:
1. $\underline{B}(X) \subseteq X \subseteq \overline{B}(X)$
2. $\underline{B}(\emptyset) = \overline{B}(\emptyset) = \emptyset$, $\underline{B}(U) = \overline{B}(U) = U$
3. $\overline{B}(X \cup Y) = \overline{B}(X) \cup \overline{B}(Y)$
4. $\underline{B}(X \cap Y) = \underline{B}(X) \cap \underline{B}(Y)$
5. $X \subseteq Y$ implies $\underline{B}(X) \subseteq \underline{B}(Y)$ and $\overline{B}(X) \subseteq \overline{B}(Y)$
6. $\underline{B}(X \cup Y) \supseteq \underline{B}(X) \cup \underline{B}(Y)$
7. $\overline{B}(X \cap Y) \subseteq \overline{B}(X) \cap \overline{B}(Y)$
8. $\underline{B}(-X) = -\overline{B}(X)$
9. $\overline{B}(-X) = -\underline{B}(X)$
10. $\underline{B}(\underline{B}(X)) = \overline{B}(\underline{B}(X)) = \underline{B}(X)$
11. $\overline{B}(\overline{B}(X)) = \underline{B}(\overline{B}(X)) = \overline{B}(X)$
where $-X$ denotes $U - X$.
It is easily seen that the lower and the upper approximations of a set are, respectively, the interior and the closure of this set in the topology generated by the indiscernibility relation.
One can define the following four basic classes of rough sets, i.e., four categories of vagueness:
1. A set X is roughly B-definable, iff $\underline{B}(X) \neq \emptyset$ and $\overline{B}(X) \neq U$,
2. A set X is internally B-undefinable, iff $\underline{B}(X) = \emptyset$ and $\overline{B}(X) \neq U$,
3. A set X is externally B-undefinable, iff $\underline{B}(X) \neq \emptyset$ and $\overline{B}(X) = U$,
4. A set X is totally B-undefinable, iff $\underline{B}(X) = \emptyset$ and $\overline{B}(X) = U$.
The intuitive meaning of this classification is the following.
A set X is roughly B-definable means that with the help of B we are able to decide for some elements of U that they belong to X and for some elements of U that they belong to $-X$.
A set X is internally B-undefinable means that using B we are able to decide for some elements of U that they belong to $-X$, but we are unable to decide for any element of U whether it belongs to X.
A set X is externally B-undefinable means that using B we are able to decide for some elements of U that they belong to X, but we are unable to decide for any element of U whether it belongs to $-X$.
A set X is totally B-undefinable means that using B we are unable to decide for any element of U whether it belongs to X or $-X$.
A rough set can also be characterized numerically by the following coefficient

$$\alpha_B(X) = \frac{|\underline{B}(X)|}{|\overline{B}(X)|}$$

called the accuracy of approximation, where $|X|$ denotes the cardinality of $X \neq \emptyset$.
Obviously $0 \leq \alpha_B(X) \leq 1$. If $\alpha_B(X) = 1$, X is crisp with respect to B (X is precise with respect to B), and otherwise, if $\alpha_B(X) < 1$, X is rough with respect to B (X is vague with respect to B).
Example 2.5. Let us consider a decision system shown in Table 3 in order to explain the
above definitions.
Patient Headache Muscle-pain Temperature Flu
p1 no yes high yes
p2 yes no high yes
p3 yes yes very high yes
p4 no yes normal no
p5 yes no high no
p6 no yes very high yes
Table 3
Let $X = \{x : Flu(x) = Yes\}$ = {p1, p2, p3, p6} and the set of attributes B = {Headache, Muscle-pain, Temperature}. The set X is roughly B-definable, because $\underline{B}(X) = \{p1, p3, p6\} \neq \emptyset$ and $\overline{B}(X) = \{p1, p2, p3, p5, p6\} \neq U$. For this case we get $\alpha_B(X) = 3/5$. It means that the set X can be characterized partially employing the symptoms (attributes) Headache, Muscle-pain and Temperature. Taking only one symptom, B = {Headache}, we get $\underline{B}(X) = \emptyset$ and $\overline{B}(X) = U$, which means that the set X is totally undefinable in terms of the attribute Headache, i.e., this attribute is not characteristic for the set X whatsoever. However, taking the single attribute B = {Temperature} we get $\underline{B}(X) = \{p3, p6\}$ and $\overline{B}(X) = \{p1, p2, p3, p5, p6\}$, thus the set X is again roughly definable, but in this case we obtain $\alpha_B(X) = 2/5$, which means that the single symptom Temperature is less characteristic for the set X than the whole set of symptoms, and patient p1 cannot now be classified as having flu in this case.
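The numbers in Example 2.5 can be reproduced with a few lines of Python (illustrative code of ours, not from the text):

```python
from collections import defaultdict
from fractions import Fraction

ROWS = {  # Table 3: patient -> (Headache, Muscle-pain, Temperature)
    "p1": ("no", "yes", "high"),       "p2": ("yes", "no", "high"),
    "p3": ("yes", "yes", "very high"), "p4": ("no", "yes", "normal"),
    "p5": ("yes", "no", "high"),       "p6": ("no", "yes", "very high"),
}
ATTRS = ("Headache", "Muscle-pain", "Temperature")
X = {"p1", "p2", "p3", "p6"}  # Flu = yes

def accuracy(B):
    """alpha_B(X) = |B-lower(X)| / |B-upper(X)|."""
    idx = [ATTRS.index(a) for a in B]
    classes = defaultdict(set)
    for p, values in ROWS.items():
        classes[tuple(values[i] for i in idx)].add(p)
    lower = set().union(*(c for c in classes.values() if c <= X))
    upper = set().union(*(c for c in classes.values() if c & X))
    return Fraction(len(lower), len(upper))

print(accuracy(ATTRS))             # 3/5
print(accuracy(("Temperature",)))  # 2/5
```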
2.4 Rough Sets and Membership Function
In classical set theory, either an element belongs to a set or it does not. The corresponding
membership function is the characteristic function for the set, i.e. the function takes values 1
and 0, respectively. In the case of rough sets, the notion of membership is different. The
rough membership function quantifies the degree of relative overlap between the set X and the
equivalence class $[x]_B$ to which x belongs. It is defined as follows:

$$\mu_X^B : U \rightarrow [0, 1] \quad \text{and} \quad \mu_X^B(x) = \frac{|X \cap [x]_B|}{|[x]_B|}.$$
Obviously $\mu_X^B(x) \in [0, 1]$. A value of the membership function $\mu_X^B(x)$ is a kind of conditional probability, and can be interpreted as a degree of certainty to which x belongs to X (or $1 - \mu_X^B(x)$, as a degree of uncertainty).
The rough membership function can be used to define approximations and the boundary
region of a set, as shown below:
$$\underline{B}(X) = \{x \in U : \mu_X^B(x) = 1\},$$
$$\overline{B}(X) = \{x \in U : \mu_X^B(x) > 0\},$$
$$BN_B(X) = \{x \in U : 0 < \mu_X^B(x) < 1\}.$$
The rough membership function has the following properties [6]:
1. $\mu_X^B(x) = 1$ iff $x \in \underline{B}(X)$,
2. $\mu_X^B(x) = 0$ iff $x \in U - \overline{B}(X)$,
3. $0 < \mu_X^B(x) < 1$ iff $x \in BN_B(X)$,
4. If $IND_S(B) = \{(x, x) : x \in U\}$, then $\mu_X^B(x)$ is the characteristic function of X,
5. If $(x, y) \in IND_S(B)$, then $\mu_X^B(x) = \mu_X^B(y)$,
6. $\mu_{U-X}^B(x) = 1 - \mu_X^B(x)$ for any $x \in U$,
7. $\mu_{X \cup Y}^B(x) \geq \max(\mu_X^B(x), \mu_Y^B(x))$ for any $x \in U$,
8. $\mu_{X \cap Y}^B(x) \leq \min(\mu_X^B(x), \mu_Y^B(x))$ for any $x \in U$.
The rough membership function can be interpreted as a frequency-based estimate of $\Pr(x \in X \mid u)$, the conditional probability that object x belongs to the set X, given the knowledge u of the information signature of x with respect to the attributes B.
The formulae for the lower and upper set approximations can be generalized to some arbitrary level of precision $\pi \in \left(\frac{1}{2}, 1\right]$ by means of the rough membership function [6], as shown below:

$$\underline{B}_{\pi}(X) = \{x : \mu_X^B(x) \geq \pi\}$$
$$\overline{B}_{\pi}(X) = \{x : \mu_X^B(x) > 1 - \pi\}$$

Note that the lower and upper approximations as originally formulated are obtained as a special case with $\pi = 1.0$.
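A direct transcription into Python (our sketch; mu can be any rough membership function as defined above):

```python
def pi_approximations(mu, universe, pi):
    """Lower and upper pi-approximations; requires 1/2 < pi <= 1.

    With pi = 1.0 this reduces to the classical lower and upper
    approximations defined via the rough membership function.
    """
    assert 0.5 < pi <= 1.0
    lower = {x for x in universe if mu(x) >= pi}
    upper = {x for x in universe if mu(x) > 1 - pi}
    return lower, upper
```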
Approximations of concepts are constructed on the basis of background knowledge. Obviously, concepts are also related to objects unseen so far. Hence, it is very useful to define parameterized approximations, with the parameters tuned in the search for approximations of concepts. This idea is crucial for the construction of concept approximations using rough set methods.
2.5 Dependency of Attributes
An important issue in data analysis is discovering dependencies between attributes. Intuitively, a set of attributes D depends totally on a set of attributes C, denoted $C \Rightarrow D$, if all values of attributes from D are uniquely determined by values of attributes from C. In other words, D depends totally on C if there exists a functional dependency between the values of C and D. Formally, such a dependency can be defined in the following way. Let D and C be subsets of A.
We will say that D depends on C in a degree k $(0 \leq k \leq 1)$, denoted $C \Rightarrow_k D$, if

$$k = \gamma(C, D) = \frac{|POS_C(D)|}{|U|},$$

where

$$POS_C(D) = \bigcup_{X \in U/D} \underline{C}(X),$$

called a positive region of the partition U/D with respect to C, is the set of all elements of U that can be uniquely classified to blocks of the partition U/D by means of C.
Obviously

$$\gamma(C, D) = \sum_{X \in U/D} \frac{|\underline{C}(X)|}{|U|}.$$
If k = 1 we say that D depends totally on C, and if k < 1, we say that D depends partially (in a degree k) on C.
The coefficient k expresses the ratio of all elements of the universe that can be properly classified to blocks of the partition U/D employing the attributes C, and will be called the degree of the dependency.
Example 2.6. Let us consider again the decision system shown in Table 3. For example, for the dependency {Headache, Muscle-pain, Temperature} $\Rightarrow$ {Flu} we get k = 4/6 = 2/3, because four out of six patients can be uniquely classified as having flu or not, employing the attributes Headache, Muscle-pain and Temperature.
If we were interested in how exactly patients can be diagnosed using only the attribute Temperature, that is, in the degree of the dependency {Temperature} $\Rightarrow$ {Flu}, we would get k = 3/6 = 1/2, since in this case only three patients, p3, p4 and p6, out of six can be uniquely classified as having flu or not. In contrast to the previous case, patient p1 cannot now be classified as having flu or not. Hence the single attribute Temperature offers a worse classification than the whole set of attributes Headache, Muscle-pain and Temperature. It is interesting to observe that neither Headache nor Muscle-pain can be used to recognize flu, because for both dependencies {Headache} $\Rightarrow$ {Flu} and {Muscle-pain} $\Rightarrow$ {Flu} we have k = 0.
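The degrees of dependency in Example 2.6 can be checked mechanically; an illustrative Python sketch (ours):

```python
from collections import defaultdict
from fractions import Fraction

ROWS = {  # Table 3: patient -> (Headache, Muscle-pain, Temperature, Flu)
    "p1": ("no", "yes", "high", "yes"),       "p2": ("yes", "no", "high", "yes"),
    "p3": ("yes", "yes", "very high", "yes"), "p4": ("no", "yes", "normal", "no"),
    "p5": ("yes", "no", "high", "no"),        "p6": ("no", "yes", "very high", "yes"),
}
ATTRS = ("Headache", "Muscle-pain", "Temperature", "Flu")

def dependency_degree(C, D):
    """k = |POS_C(D)| / |U|: the fraction of objects uniquely classifiable by C."""
    ci, di = [ATTRS.index(a) for a in C], [ATTRS.index(a) for a in D]
    classes = defaultdict(set)
    for p, v in ROWS.items():
        classes[tuple(v[i] for i in ci)].add(p)
    pos = set()
    for cls in classes.values():
        # a C-class belongs to POS_C(D) iff all its objects agree on D
        if len({tuple(ROWS[p][i] for i in di) for p in cls}) == 1:
            pos |= cls
    return Fraction(len(pos), len(ROWS))

print(dependency_degree(("Headache", "Muscle-pain", "Temperature"), ("Flu",)))  # 2/3
print(dependency_degree(("Temperature",), ("Flu",)))                            # 1/2
print(dependency_degree(("Headache",), ("Flu",)))                               # 0
```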
It can be easily seen that if D depends totally on C then $IND(C) \subseteq IND(D)$. This means
that the partition generated by C is finer than the partition generated by D. Let us notice that
the concept of dependency discussed above corresponds to that considered in relational
databases.
Summing up: D is totally (partially) dependent on C, if employing C all (possibly some)
elements of the universe U may be uniquely classified to blocks of the partition U/D.
2.6 Reduction of Attributes
In Section 2.2 we investigated one of the natural aspects of reducing data, which concerns identifying equivalence classes, i.e., objects that are indiscernible using the available attributes. In order to make some savings, only one element of an equivalence class is needed to represent the entire class. The other aspect of data reduction is to keep only those attributes that preserve the indiscernibility relation and, consequently, the set approximations. The rejected attributes are redundant since their removal cannot worsen the classification.
In order to express the above idea more precisely we need some auxiliary notions.
Let $S = (U, A)$ be an information system, $B \subseteq A$, and let $a \in B$.
We say that a is dispensable in B if $IND_S(B) = IND_S(B - \{a\})$; otherwise a is indispensable in B.
A set B is called independent if all its attributes are indispensable.
Any subset B' of B is called a reduct of B if B' is independent and $IND_S(B') = IND_S(B)$.
Hence, a reduct is a set of attributes that preserves the partition. It means that a reduct is a minimal subset of attributes that enables the same classification of elements of the universe as the whole set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe.
There are usually several such subsets of attributes, and those which are minimal are called reducts. Computing equivalence classes is straightforward. Finding a minimal reduct (i.e., a reduct with a minimal number of attributes) among all reducts is NP-hard [6]. It is easy to see that the number of reducts of an information system with m attributes may be equal to

$$\binom{m}{\lceil m/2 \rceil}.$$
This means that computing reducts is not a trivial task. This fact is one of the bottlenecks of the rough set methodology. Fortunately, there exist good heuristics (e.g. [1],[7],[9],[10]), based on genetic algorithms, that compute sufficiently many reducts, often in acceptable time, unless the number of attributes is very high.
The reducts have several important properties. In what follows we will present two of
them. First, we define a notion of a core of attributes.
Let B be a subset of A. The core of B is the set of all indispensable attributes of B.
The following is an important property, connecting the notion of the core and reducts:

$$Core(B) = \bigcap Red(B),$$

where Red(B) is the set of all reducts of B.
Because the core is the intersection of all reducts, it is included in every reduct, i.e., each element of the core belongs to every reduct. Thus, in a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of attributes.
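For tables of this size the reducts and the core can simply be found by exhaustive search (the NP-hardness only bites when there are many attributes); an illustrative Python sketch (ours), run on the conditional part of Table 3:

```python
from collections import defaultdict
from itertools import combinations

ROWS = {  # conditional part of Table 3
    "p1": ("no", "yes", "high"),       "p2": ("yes", "no", "high"),
    "p3": ("yes", "yes", "very high"), "p4": ("no", "yes", "normal"),
    "p5": ("yes", "no", "high"),       "p6": ("no", "yes", "very high"),
}
ATTRS = ("Headache", "Muscle-pain", "Temperature")

def partition(idx):
    """The partition of U induced by the attributes with the given indices."""
    classes = defaultdict(set)
    for p, v in ROWS.items():
        classes[tuple(v[i] for i in idx)].add(p)
    return set(map(frozenset, classes.values()))

def reducts():
    """All minimal attribute subsets preserving the partition of IND(A)."""
    full, found = partition(range(len(ATTRS))), []
    for r in range(1, len(ATTRS) + 1):
        for s in combinations(range(len(ATTRS)), r):
            if partition(s) == full and not any(set(f) <= set(s) for f in found):
                found.append(s)
    return [tuple(ATTRS[i] for i in s) for s in found]

reds = reducts()
core = set(reds[0]).intersection(*map(set, reds))
print(reds)  # [('Headache', 'Temperature')] -- the only reduct of this table
print(core)  # {'Headache', 'Temperature'} -- hence it is also the core
```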
To simplify an information table further, we can eliminate some values of attributes from the table in such a way that we are still able to discern objects in the table as in the original one. To this end we can apply a procedure similar to the elimination of superfluous attributes, which is defined next.
We will say that the value of attribute $a \in B$ is dispensable for x, if $[x]_B = [x]_{B - \{a\}}$; otherwise the value of attribute a is indispensable for x.
If for every attribute $a \in B$ the value of a is indispensable for x, then B will be called orthogonal for x.
A subset $B' \subseteq B$ is a value reduct of B for x, iff B' is orthogonal for x and $[x]_B = [x]_{B'}$.
The set of all indispensable values of attributes in B for x will be called the value core of B for x, and will be denoted $CORE_x(B)$.
Also in this case we have

$$CORE_x(B) = \bigcap Red_x(B),$$

where $Red_x(B)$ is the family of all reducts of B for x.
Suppose we are given a dependency $C \Rightarrow D$. It may happen that the set D depends not on the whole set C but on its subset C', and therefore we might be interested in finding this subset. In order to solve this problem we need the notion of a relative reduct, which will be defined and discussed next.
Let $C, D \subseteq A$. Obviously, if $C' \subseteq C$ is a D-reduct of C, then C' is a minimal subset of C such that $\gamma(C, D) = \gamma(C', D)$.
We will say that attribute $a \in C$ is D-dispensable in C, if $POS_C(D) = POS_{C - \{a\}}(D)$; otherwise the attribute a is D-indispensable in C.
If all attributes $a \in C$ are D-indispensable in C, then C will be called D-independent.
A subset $C' \subseteq C$ is a D-reduct of C, iff C' is D-independent and $POS_C(D) = POS_{C'}(D)$.
The set of all D-indispensable attributes in C will be called the D-core of C, and will be denoted by $CORE_D(C)$. In this case we also have the property

$$CORE_D(C) = \bigcap Red_D(C),$$

where $Red_D(C)$ is the family of all D-reducts of C. If D = C we get the previous definitions.
Example 2.7. In Table 3 there are two relative reducts with respect to Flu, {Headache, Temperature} and {Muscle-pain, Temperature}, of the set of condition attributes {Headache, Muscle-pain, Temperature}. That means that either the attribute Headache or the attribute Muscle-pain can be eliminated from the table, and consequently, instead of Table 3 we can use either Table 4
Patient Headache Temperature Flu
p1 no high yes
p2 yes high yes
p3 yes very high yes
p4 no normal no
p5 yes high no
p6 no very high yes
Table 4
or Table 5
Patient Muscle-pain Temperature Flu
p1 yes high yes
p2 no high yes
p3 yes very high yes
p4 yes normal no
p5 no high no
p6 yes very high yes
Table 5
For Table 3, the relative core with respect to Flu of the set {Headache, Muscle-pain, Temperature} is {Temperature}. This confirms our previous considerations showing that Temperature is the only symptom that enables, at least, a partial diagnosis of patients.
We will also need the concepts of a value reduct and a value core. Suppose we are given a dependency $C \Rightarrow D$, where C is a relative D-reduct. To investigate this dependency further, we might be interested to know exactly how values of attributes from D depend on values of attributes from C. To this end we need a procedure for eliminating those values of attributes from C which do not influence the values of attributes from D.
We say that the value of attribute $a \in C$ is D-dispensable for $x \in U$, if $[x]_C \subseteq [x]_D$ implies $[x]_{C - \{a\}} \subseteq [x]_D$; otherwise the value of attribute a is D-indispensable for x.
If for every attribute $a \in C$ the value of a is D-indispensable for x, then C will be called D-independent (orthogonal) for x.
A subset $C' \subseteq C$ is a D-reduct of C for x (a value reduct), iff C' is D-independent for x and $[x]_C \subseteq [x]_D$ implies $[x]_{C'} \subseteq [x]_D$.
The set of all D-indispensable for x values of attributes in C will be called the D-core of C for x (the value core), and will be denoted $CORE_D^x(C)$.
We also have the following property

$$CORE_D^x(C) = \bigcap Red_D^x(C),$$

where $Red_D^x(C)$ is the family of all D-reducts of C for x.
Using the concept of a value reduct, Table 4 and Table 5 can be simplified as follows:
Patient  Headache  Temperature  Flu
p1       no        high         yes
p2       yes       high         yes
p3       -         very high    yes
p4       -         normal       no
p5       yes       high         no
p6       -         very high    yes

Table 6

Patient  Muscle-pain  Temperature  Flu
p1       yes          high         yes
p2       no           high         yes
p3       -            very high    yes
p4       -            normal       no
p5       no           high         no
p6       -            very high    yes

Table 7

(A dash marks a value eliminated by the value reduct.)
The following important property connects reducts and dependency:

$$B' \Rightarrow B - B', \text{ where } B' \text{ is a reduct of } B.$$

Besides, we have:
If $B \Rightarrow C$, then $B \Rightarrow C'$, for every $C' \subseteq C$;
in particular,
if $B \Rightarrow C$, then $B \Rightarrow \{a\}$, for every $a \in C$.
Moreover, we have:
If B' is a reduct of B, then neither $\{a\} \Rightarrow \{b\}$ nor $\{b\} \Rightarrow \{a\}$ holds for any $a, b \in B'$, i.e., all attributes in a reduct are pairwise independent.
2.7 Discernibility Matrices and Functions
In order to compute reducts and the core easily, we can use the discernibility matrix [6], which is defined below.
Let $S = (U, A)$ be an information system with n objects. The discernibility matrix of S is a symmetric $n \times n$ matrix with entries $c_{ij}$ given by

$$c_{ij} = \{a \in A \mid a(x_i) \neq a(x_j)\} \quad \text{for } i, j = 1, \ldots, n.$$
Each entry thus consists of the set of attributes upon which objects $x_i$ and $x_j$ differ. Since the discernibility matrix is symmetric and $c_{ii} = \emptyset$ (the empty set) for $i = 1, \ldots, n$, this matrix can be represented using only the elements in its lower triangular part, i.e., for $1 \leq j < i \leq n$.
With every discernibility matrix one can uniquely associate a discernibility function, defined below.
A discernibility function $f_S$ for an information system S is a Boolean function of m Boolean variables $a_1^*, \ldots, a_m^*$ (corresponding to the attributes $a_1, \ldots, a_m$) defined as follows:

$$f_S(a_1^*, \ldots, a_m^*) = \bigwedge \left\{ \bigvee c_{ij}^* \mid 1 \leq j < i \leq n,\ c_{ij} \neq \emptyset \right\},$$

where $c_{ij}^* = \{a^* \mid a \in c_{ij}\}$. The set of all prime implicants² of $f_S$ determines the set of all reducts of A.
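The matrix in Table 8 below can be generated mechanically; a sketch in Python (ours, not from the text), which prints the lower triangular entries:

```python
ROWS = [  # Table 3 treated as an information system with attributes H, M, T, F
    ("no", "yes", "high", "yes"),        # p1
    ("yes", "no", "high", "yes"),        # p2
    ("yes", "yes", "very high", "yes"),  # p3
    ("no", "yes", "normal", "no"),       # p4
    ("yes", "no", "high", "no"),         # p5
    ("no", "yes", "very high", "yes"),   # p6
]
NAMES = "HMTF"

# Lower triangular part of the discernibility matrix:
# c_ij = the set of attributes on which objects x_i and x_j differ.
for i in range(len(ROWS)):
    entries = []
    for j in range(i):
        c_ij = [NAMES[k] for k in range(len(NAMES)) if ROWS[i][k] != ROWS[j][k]]
        entries.append(",".join(c_ij))
    print(f"p{i+1}:", "  ".join(entries))
```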
Example 2.8 Using the above definitions for the information system S from Example 2.5 (Table 3), we obtain the discernibility matrix presented in Table 8 and the discernibility function presented below.
p1 p2 p3 p4 p5 p6
p1
p2 H,M
p3 H,T M,T
p4 T,F H,M,T,F H,T,F
p5 H,M,F F M,T,F H,M,T
p6 T H,M,T H T,F H,M,T,F
Table 8
² An implicant of a Boolean function f is any conjunction of literals (variables or their negations) such that if the values of these literals are true under an arbitrary valuation v of variables then the value of the function f under v is also true. A prime implicant is a minimal implicant (with respect to the number of its literals). Here we are interested in implicants of monotone Boolean functions only, i.e. functions constructed without negation (see Subsection ???).
In this table H, M, T, F denote Headache, Muscle-pain, Temperature and Flu, respectively. The discernibility function for this table is

$$f_S(H, M, T, F) = (H \vee M)(H \vee T)(T \vee F)(H \vee M \vee F)\,T$$
$$(M \vee T)(H \vee M \vee T \vee F)\,F\,(H \vee M \vee T)$$
$$(H \vee T \vee F)(M \vee T \vee F)\,H$$
$$(H \vee M \vee T)(T \vee F)$$
$$(H \vee M \vee T \vee F)$$

where $\vee$ denotes disjunction and the conjunction between the parenthesized factors is omitted in the formula.
Let us also notice that each row in the above discernibility function corresponds to one column in the discernibility matrix (the matrix is symmetric, with an empty diagonal). Each parenthesized tuple is a conjunct in the Boolean expression, and the one-letter Boolean variables correspond to the attribute names in an obvious way. After simplifying the discernibility function using the laws of Boolean algebra, we obtain the following expression:

$$H \wedge T \wedge F,$$
which says that there is only one reduct {H,T,F} in the data table and it is the core.
Relative reducts and the core can also be computed using the discernibility matrix, which needs a slight modification:

$$c_{ij} = \{a \in C : a(x_i) \neq a(x_j) \text{ and } w(x_i, x_j)\},$$

where $w(x_i, x_j)$ holds iff $x_i \in POS_C(D)$ and $x_j \notin POS_C(D)$, or $x_i \notin POS_C(D)$ and $x_j \in POS_C(D)$, or $x_i, x_j \in POS_C(D)$ and $(x_i, x_j) \notin IND(D)$, for $i, j = 1, 2, \ldots, n$.
If the partition defined by D is definable by C, then the condition $w(x_i, x_j)$ in the above definition can be reduced to $(x_i, x_j) \notin IND(D)$.
Thus, the entry $c_{ij}$ is the set of all attributes which discern objects $x_i$ and $x_j$ that do not belong to the same equivalence class of the relation IND(D).
If we instead construct a Boolean function by restricting the conjunction to only run over
column k in the discernibility matrix (instead of over all columns), we obtain the so-called k-
relative discernibility function. The set of all prime implicants of this function determines the
set of all k-relative reducts of A. These reducts reveal the minimum amount of information needed to discern $x_k \in U$ (or, more precisely, $[x_k] \subseteq U$) from all other objects.
Example 2.9 Considering the information system S from Example 2.5 as a decision
system, we can illustrate the above considerations by computing relative reducts for the set of
attributes {Headache, Muscle-pain, Temperature} with respect to Flu. The corresponding
discernibility matrix is shown in Table 9.
     p1     p2       p3     p4   p5
p1
p2
p3
p4   T      H,M,T
p5   H,M             M,T
p6                          T    H,M,T

Table 9
The discernibility function for this table is

$$T\,(H \vee M)(H \vee M \vee T)(M \vee T).$$

After simplifying the discernibility function we obtain the following expression:

$$(T \wedge H) \vee (T \wedge M),$$

which represents the two reducts {T, H} and {T, M} in the data table; T is the core.
2.8 Decision Rule Synthesis
The reader has certainly realized that the reducts (of all the various types) can be used to
synthesize minimal decision rules. Once the reducts have been computed, the rules are easily
constructed by overlaying the reducts over the originating decision table and reading off the
values.
Example 2.10 Given the reduct {Headache, Temperature} in Table 4, the rule read off from the first object is "if Headache is no and Temperature is high then Flu is yes".
We shall make these notions precise.
Let $S = (U, A \cup \{d\})$ be a decision system and let $V = \bigcup \{V_a : a \in A\} \cup V_d$. Atomic formulae over $B \subseteq A \cup \{d\}$ and V are expressions of the form a = v; they are called descriptors over B and V, where $a \in B$ and $v \in V_a$. The set $F(B, V)$ of formulae over B and V is the least set containing all atomic formulae over B and V and closed with respect to the propositional connectives $\wedge$ (conjunction), $\vee$ (disjunction) and $\neg$ (negation).
Let $\varphi \in F(B, V)$. $\|\varphi\|_S$ denotes the meaning of $\varphi$ in the decision system S, i.e., the set of all objects in U with the property $\varphi$. These sets are defined as follows:
1. if $\varphi$ is of the form a = v, then $\|\varphi\|_S = \{x \in U \mid a(x) = v\}$;
2. $\|\varphi \wedge \varphi'\|_S = \|\varphi\|_S \cap \|\varphi'\|_S$; $\|\varphi \vee \varphi'\|_S = \|\varphi\|_S \cup \|\varphi'\|_S$; $\|\neg\varphi\|_S = U - \|\varphi\|_S$.
The set $F(B, V)$ is called the set of conditional formulae of S and is denoted $C(B, V)$.
A decision rule for S is any expression of the form $\varphi \Rightarrow d = v$, where $\varphi \in C(B, V)$, $v \in V_d$ and $\|\varphi\|_S \neq \emptyset$. The formulae $\varphi$ and d = v are referred to as the predecessor and the successor of the decision rule $\varphi \Rightarrow d = v$.
A decision rule $\varphi \Rightarrow d = v$ is true in S if, and only if, $\|\varphi\|_S \subseteq \|d = v\|_S$; $\|\varphi\|_S$ is the set of objects matching the decision rule; $\|\varphi\|_S \cap \|d = v\|_S$ is the set of objects supporting the rule.
Example 2.11 Looking again at Tab. 4, some of the rules are, for example:

(Headache = no) $\wedge$ (Temperature = high) $\Rightarrow$ (Flu = yes),
(Headache = yes) $\wedge$ (Temperature = high) $\Rightarrow$ (Flu = yes).

The first rule is true in Tab. 4 while the second one is not true in that table.
Several numerical factors can be associated with a synthesized rule. For example, the support of a decision rule is the number of objects that match the predecessor of the rule. Various frequency-related numerical quantities may be computed from such counts, like the accuracy coefficient equal to

$$\frac{|\,\|\varphi\|_S \cap \|d = v\|_S\,|}{|\,\|\varphi\|_S\,|}.$$
For a systematic overview of rule synthesis see e.g. [1], [9], [10].
Remark. Rough set theory has found many applications in medical data analysis, finance, voice recognition, image processing and others. However, the approach presented in this section is too simple for many real-life applications and was therefore extended in many ways by various authors. A detailed discussion of the above issues can be found in [5], [7] and on the Internet (e.g., http://www.rsds.wsiz.rzeszow.pl).
References
[1] J. W. Grzymała-Busse (1997): A new version of the rule induction system LERS. Fundamenta Informaticae 31, pp. 27-39.
[2] Z. Pawlak: Rough sets, International Journal of Computer and Information Sciences, 11, 341-
356, 1982.
[3] Z. Pawlak: Rough Sets Theoretical Aspects of Reasoning about Data, Kluwer Academic
Publishers, Boston, London, Dordrecht, 1991.
[4] Z. Pawlak, A. Skowron: Rough membership functions, in: R. R Yaeger, M. Fedrizzi and J.
Kacprzyk (eds.), Advances in the Dempster Shafer Theory of Evidence, John Wiley & Sons,
Inc., New York, Chichester, Brisbane, Toronto, Singapore, 1994, 251-271.
[5] L. Polkowski: Rough Sets Mathematical Foundations, Advances in Soft Computing,
Physica-Verlag, Springer-Verlag Company, 2002, 1-534.
[6] A. Skowron, C. Rauszer: The discernibility matrices and functions in information systems, in: R. Słowiński (ed.), Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, Kluwer Academic Publishers, Dordrecht, 1992, 311-362.
[7] A. Skowron et al.: Rough set perspective on data and knowledge, Handbook of Data Mining and Knowledge Discovery (W. Klösgen, J. Żytkow, eds.), Oxford University Press, 2002, 134-149.
[8] A. Skowron, L. Polkowski, J. Komorowski: Learning Tolerance Relations by Boolean
Descriptors, Automatic Feature Extraction from Data Tables. In S. Tsumoto, S. Kobayashi, T.
Yokomori, H. Tanaka, and A. Nakamura (Eds.) (1996), Proceedings of the Fourth
International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD1996),
The University of Tokyo, November 6-8, pp. 11-17.
[9] A. Skowron (1995): Synthesis of adaptive decision systems from experimental data. In A.
Aamodt and J. Komorowski (Eds.)(1995), Proc. Fifth Scandinavian Conference on Artificial
Intelligence. Trondheim, Norway, May 29-31, Frontiers. In: Artificial Intelligence and
Applications 28, IOS Press, pp. 220-238.
[10] J. Stefanowski (1998): On rough set based approaches to induction of decision rules. In L.
Polkowski, A. Skowron (Eds.): Rough Sets in Knowledge Discovery 1. Methodology and
Applications. Physica-Verlag, Heidelberg 1998, pp. 500-529.
3. Rough Sets and Concurrency
Knowledge discovery and data mining [3],[5] is one of the important current research problems. Discovering relationships between data is the main task of so-called machine learning [12].
The aim of this section is to present an overview of a new methodology for discovering concurrent data models and decision algorithms from experimental data tables. This work is a continuation of a new research direction concerning relationships between rough set theory and concurrency. This research direction was started by Z. Pawlak in 1992 [17] and later developed intensively by many authors (see e.g. [6],[8],[13-15],[20],[23-24],[25-29],[30]). In this section, we consider the following two main problems: (1) discovering concurrent data models from data tables; (2) discovering decision algorithms from data tables. The foundations of the proposed approach are rough sets [16], fuzzy sets [1] and Petri nets [11]. The preprocessing of data [3] and Boolean reasoning [2] are also used. The proposed methodology is a result of the author's and his coauthors' long-term research [6],[8],[13-15],[20],[23-24],[25-29],[30].
These problems are current and very important not only with respect to their cognitive aspect but also with respect to their possible applications. Discovering models of concurrent systems from experimental data tables is very interesting and useful in a number of application domains, in particular in Artificial Intelligence, e.g. in speech recognition [3], and in biology and molecular biology, for example, for answering the following question: How does the model of cell evolution depend on changes of the gene code (see e.g. [31], pp. 780-804)? Discovering decision algorithms from data tables is very important for real-time applications in such areas as real-time decision making by groups of intelligent agents (e.g. robots) [19], navigation of intelligent mobile robots [4], searching for information in centralized or distributed databases [5] and, in general, in real-time knowledge-based control systems [21].
3.1 Discovering concurrent data models from data tables
This subsection includes an informal description of the first problem in question, together with a general scheme of its solution. Examples of this problem's possible applications are also indicated. We assume that the reader is familiar with the basic notions of rough set theory [16] and Petri nets [11].
Problem 1. Let an experimental data table be given. Construct a concurrent data model on the basis of the knowledge extracted from the given data table in such a way that its global states are consistent with the extracted knowledge.
We can interpret this problem as the synthesis problem for concurrent systems specified by experimental data tables. We assume that the knowledge extracted from a given data table includes information on the structure as well as the behavior of the modeled system.
The solution
In order to solve this problem, we propose to realize the following three main stages:
Stage 1. Data Representation.
We assume that the data table S (an information system in Pawlak's sense [16]) representing experimental knowledge is given. It consists of a number of rows labeled by elements from the set of objects U, which contain the results of sensor measurements represented by the value vectors of attributes from A. Values of attributes we interpret as states of local processes in the modeled system, while the rows of the data table we interpret as global states of the system. Sometimes, it is necessary to transform the given experimental data table by taking into account other relevant features (new attributes) instead of the original ones. This step is necessary when the concurrent model constructed directly from the original data table is very large or when the complexity of model synthesis from the original table is too high. In this case some additional time is necessary to compute the values of the new features, after the results of sensor measurements are given. The input for our methodology consists of the data table (if necessary, preprocessed in the way described above).
In order to represent data, apart from information systems, it is also possible to use
dynamical information systems introduced in [28] and specialized tables (i.e., matrices of
forbidden states and matrices of forbidden transitions) [14] in our considerations. We assume that information systems include knowledge on global states of the modeled concurrent systems only, whereas dynamical information systems additionally accumulate knowledge on the next-state relation (i.e., transitions between global states in a given system). The specialized tables include information on which global states of the modeled system and which transitions between global states are forbidden. This representation is, in a sense, a dual representation of dynamical information systems. In the following, we use only information systems as models of experimental data tables. The proposed generalizations can also be applied to the remaining data representations mentioned above.
In general, data tables obtained from measurements are relatively large. Besides, such data is usually incomplete, imprecise and vague. Rough set methods allow us to reduce large amounts of source data, as well as to pre-process and analyze it. Moreover, computer tools for data analysis based on the rough set methodology are available (see e.g. [9]). This is the reason why we use these methods.
Stage 2. Knowledge representation.
First, we generate a set of rules (deterministic and non-deterministic) of the form "if ... then ..." from a given data table S, by applying rough set methods. These rules represent the (usually partial) knowledge about the modeled system. We consider rules true in a degree CF (0 < CF ≤ 1) in S, where CF is the ratio of the number of objects from U matching both the left- and the right-hand side of the rule to the number of objects from U matching the left-hand side of the rule. The number CF is called the certainty factor of the rule. We assume that there is an object u matching the left-hand side of the rule. If CF = 1 for a given rule, then we say that the rule is deterministic; otherwise, it is non-deterministic. Next, using the resulting set of rules (or the resulting concurrent model described below) we can get much more information on the behavior of the modeled system by constructing a so-called extension (in a degree CF) of the given information system. An extension S' (in a degree CF) of a given information system S is created by adding to S all new global states (rows) which are consistent with all rules true (in a degree CF) in S. If we use only the set of deterministic rules generated from S, then we can construct the so-called maximal consistent extension of S.
Stage 3. Transformation of data tables into colored Petri nets.
Now, we transform the set of rules obtained in Stage 2 into a concurrent model represented in the form of a colored Petri net with the following property: the reachability set of the resulting net corresponds one-to-one to an extension of the given information system. The construction of a colored Petri net for the given data table consists of three steps (levels). Firstly, a net representing the set of components (functional modules) of the given information system is constructed. Any component (a subset of rules or an information subsystem) in a sense represents the strongest functional module of the system. The components of the system are computed by means of its reducts [25]. Secondly, a net defined by the set of rules of the given information system, corresponding to all nontrivial dependencies (connections or links) between the values of attributes belonging to different components of the information system, is added to the net obtained in the first step. The connections between components represent constraints which must be satisfied when these functional modules coexist in the system. The components together with the connections define a so-called covering of the system. In general, there are many coverings of a given information system. It means that for a given information system we can construct several concurrent models presenting the same behavior but different internal structures. Thirdly, the elements (places, transitions and arcs) of the net defined in steps 1-2 are additionally described according to the definition of a colored Petri net. The modular approach described above makes the appropriate construction of a net much clearer. Moreover, the application of colored Petri nets to represent concurrent systems allows us to obtain coherent and clear models (also hierarchical ones; this aspect is omitted in the paper) suitable for further computer analysis and verification [10].
Such an approach allows the adaptation of the structures of complex concurrent systems to new conditions changing in time, the re-engineering (reconstruction) of the structures of systems' organization together with the optimization of reconstruction costs, and the adaptation of the systems' organization to new requirements [29].
Example 3.1. Consider an information system $S = (U, A)$ where the set of objects $U = \{u_1, u_2, u_3, u_4\}$, the set of attributes $A = \{a, b\}$, and the values of the attributes are defined as in Table 10.

U\A   a   b
u1    0   1
u2    1   0
u3    0   2
u4    2   0

Table 10
By applying methods for generating rules in minimal form, i.e., with a minimal number of descriptors on their left-hand sides, described in [24], we obtain the following set of rules for the system S: if a=1 then b=0, if a=2 then b=0, if b=1 then a=0, if b=2 then a=0; all of them are true in degree CF=1 in S. For generating rules from a given information system, we can also use specialized computer tools, e.g. [9].
After applying the standard Boolean algebra laws to the given set of rules, we obtain the
following Boolean expression: (a=0 AND b=0) OR (a=0 AND b=1) OR (a=0 AND b=2) OR
(a=1 AND b=0) OR (a=2 AND b=0). Using the computed Boolean expression, we can
construct the guard expression corresponding to rules presented above. It has the following
form: [ya =(a=0) AND yb = (b=0) OR ya =(a=0) AND yb =(b=1) OR ya =(a=0) AND yb
=(b=2) OR ya =(a=1) AND yb =(b=0) OR ya =(a=2) AND yb =(b=0)].
Fig. 4 (the net consists of the places $p_a$ and $p_b$, initially marked 1`(a=0) and 1`(b=1), connected to a single transition t by arcs inscribed 1`xa, 1`xb and 1`ya, 1`yb, with the declarations "color a = with a=0 | a=1 | a=2; color b = with b=0 | b=1 | b=2; var xa, ya : a; var xb, yb : b;" and the guard [ya=(a=0) andalso yb=(b=0) orelse ya=(a=0) andalso yb=(b=1) orelse ya=(a=0) andalso yb=(b=2) orelse ya=(a=1) andalso yb=(b=0) orelse ya=(a=2) andalso yb=(b=0)])
The concurrent model of S in the form of a colored Petri net constructed using our approach is shown in Fig. 4. The form of the guard expression associated with the transition t (see Fig. 4) differs slightly from the one presented above. It follows the formal requirements imposed by the syntax of the CPN ML language implemented in the Design/CPN system [10].
The set of markings of the constructed net corresponds to all global states consistent with all rules true in degree CF=1 in S. It is easy to see that the maximal consistent extension S' of S additionally includes the value vector (0,0). This vector is consistent with all rules true in S. Thus, the set of all value vectors corresponding to S' is equal to {(0,1), (1,0), (0,2), (2,0), (0,0)}.
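The claim about the maximal consistent extension of S can be verified mechanically; an illustrative Python sketch (ours, not from the text), which enumerates all value vectors over the value sets of a and b and keeps those consistent with the four deterministic rules:

```python
from itertools import product

# The rules of Example 3.1 as (condition, conclusion) pairs over a vector (a, b).
RULES = [
    (lambda a, b: a == 1, lambda a, b: b == 0),  # if a=1 then b=0
    (lambda a, b: a == 2, lambda a, b: b == 0),  # if a=2 then b=0
    (lambda a, b: b == 1, lambda a, b: a == 0),  # if b=1 then a=0
    (lambda a, b: b == 2, lambda a, b: a == 0),  # if b=2 then a=0
]

def consistent(a, b):
    """A global state is consistent if every applicable rule's conclusion holds."""
    return all(concl(a, b) for cond, concl in RULES if cond(a, b))

extension = [(a, b) for a, b in product(range(3), repeat=2) if consistent(a, b)]
print(extension)  # [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
```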
3.2 Discovering decision algorithms from data tables
This section presents an informal description of the second problem considered here
together with the general scheme of its solution. The examples of possible applications of this
problem are also indicated.
Problem 2. Let a decision data table be given. Construct a concurrent decision algorithm on the basis of the knowledge extracted from the given decision data table in such a way that its computations leading to decision making have minimal length and/or satisfy other additional analysis criteria.
In the following, we consider this problem as the problem of modeling an approximate reasoning process on the basis of knowledge included in experimental decision tables with uncertain, imprecise and vague information. We choose the timed approximate Petri nets defined in [8] as a concurrent model for the constructed decision algorithm.
The solution
The proposed solution of this problem is obtained by realizing the following three stages:
Stage 1. Data Representation.
We assume that the decision table S (a decision system in Pawlak's sense [16])
representing experimental knowledge is given. It consists of a number of rows labeled by
elements from a set of objects U which contain the results of sensor measurements
represented by the value vector of conditional attributes (conditions) from A together with the
decision d corresponding to this vector. The decision is given by a domain expert. Values of
conditions are identified by sensors in a finite, but unknown number of time units.
Sometimes, it is necessary to transform the given experimental decision table in a similar way
as in Problem 1, Stage 1. This step is necessary when the decision algorithm constructed
directly from the original decision table yields an inadequate classification of unseen objects,
or when the complexity of decision algorithm synthesis from the original decision table is too
high. The input for our algorithm consists of the decision table (preprocessed, if necessary).
In the paper, we consider the values of attributes to be crisp [16] or fuzzy [1].
Stage 2. Knowledge representation.
We assume that the knowledge encoded in S is represented by rules automatically
extracted from S, using the standard rough set methods for rules generation. We consider two
kinds of rules: conditional and decision ones. Rules of the first kind express some
relationships between values of conditions. However, rules of the second kind express some
relationships between values of conditions and decision. Besides, each of such rules can be
deterministic or non-deterministic. The rule is active if the values of all attributes on its left
hand side have been measured. An active rule which is true (in a degree CF) in S can be used
to predict the value of the attribute on its right hand side even if its value has not been
measured, yet. Our concurrent model of a decision algorithm propagates information from
sensors (attributes) to other attributes, as soon as possible. It is done by using rules in minimal
form, i.e. with a minimal number of descriptors on its left hand side. We use the method for
generating rules minimal and true in S, described in [24].
Stage 3. Transformation of decision tables into timed approximate Petri nets.
The construction of a timed approximate Petri net for the given decision table consists of
four steps (levels). Each step of the net construction provides one module of the constructed
net. At first, the places representing the set of all conditions of the given decision table are
constructed. Then, the fragments of the net defined by the set of conditional rules generated from the given decision table are added to the places obtained in the first step. Next, the net
obtained in the second step is extended by adding fragments of the net defined by the set of
decision rules generated from the given decision table. Finally, the elements (places,
transitions and arcs) of the net defined in steps 1-3 are additionally described according to the
definition of a timed approximate Petri net. The modular approach described above makes the
appropriate construction of a net much clearer.
The constructed timed approximate Petri net makes it possible to make a decision as soon as a sufficient number of attribute values is known, as a result of measurements and conclusions drawn from the knowledge encoded in S. It is easy to prove that any of its computations
leading to decision making has minimal length, i.e., no prediction of the proper decision based
on the knowledge encoded in S and the measurement of attribute values is possible before the
end of the computation. Each step of the computation of the constructed timed approximate
Petri net consists of two phases. In the first phase, checking is performed to see whether some
new values of conditions have been identified by sensors, and in the second phase, new
information about values is transmitted through the net at high speed. The whole process is
realized by implementation of the rules true (in a degree CF) in the given decision table.
It is worth pointing out that the proposed methodology can be used to model, in an easy way, the decision processes considered by Pawlak in [18].
Example 3.2. Consider a decision system $S = (U, A \cup \{e\})$ where the set of objects $U = \{u_1, u_2, u_3, u_4, u_5, u_6\}$ and the set of conditions $A = \{a, b, c, d\}$. The decision is denoted by e. The possible values of the conditions and the decision in S are defined as in Table 11.

U\A∪{e}   a   b   c   d   e
u1        1   1   1   1   0
u2        1   1   2   1   1
u3        2   0   1   2   0
u4        2   0   2   2   0
u5        2   0   1   1   0
u6        2   0   2   1   0

Table 11
Using standard rough set methods for generating rules and computing certainty factors of
rules, we compute all (deterministic and non-deterministic) decision and conditional rules
from S together with appropriate certainty factors. For the sake of the paper's length, we
consider only a sample of rules for the decision system S. It is as follows: r1: if a=1 then e=0
(CF=0.5, non-deterministic decision rule), r2: if b=1 then e=0 (CF=0.5, non-deterministic
decision rule), r3: if a=1 AND c=2 then e=1 (CF=1, deterministic decision rule), r4: if a=2
OR b=0 OR c=1 OR d=2 then e=0 (CF=1, deterministic decision rule), r5: if b=1 AND c=2
then e=1 (CF=1, deterministic decision rule), r6: if d=1 then e=0 (CF=0.75, non-deterministic
decision rule), r7: if a=1 OR b=1 then d=1 (CF=1, deterministic conditional rule).
The timed approximate Petri net corresponding to these rules is presented in Fig. 5. The
net construction is realized according to the approach described above. More detailed
information on the algorithm for transforming rules representing the given decision table into
an approximate Petri net can be found in [6]. Complete description of the method for
transforming rules into the timed approximate Petri net will be presented in the full version of
this paper. At present, we give an intuitive explanation of such a net construction, taking into
account our example.
In the timed approximate Petri net from Fig. 5, the places $p_a$, $p_b$, $p_c$, $p_d$ represent the conditions a, b, c, d from S, respectively, while the place $p_e$ represents the decision e. The transitions $t_1, \ldots, t_6$ represent the rules r1, ..., r6, respectively, and the transition $t_7$ represents the rule r7.
Time values associated with the transitions of the net are defined as follows: 1 corresponds to transitions $t_1$, $t_2$, $t_6$; 2 corresponds to transitions $t_3$, $t_5$, $t_7$; 4 corresponds to the transition $t_4$. Bi-directional input/output arcs in the net only check whether a suitable transition in the net is enabled at a given marking; after firing that transition, the marking of the input/output places of the transition does not change. The color sets (types) corresponding to the places are defined as follows: a = {a=1, a=2}, b = {b=0, b=1}, c = {c=1, c=2}, d = {d=1, d=2}, e = {e=0, e=1}, respectively.
Fig. 5
The time stamps carried by the tokens for the places p_a, p_b, p_c, p_d are equal to
7, 7, 12, 10, respectively. The operator AND is associated with the transitions t_3, t_5, and
OR with the transitions t_4, t_7. Time stamps and operators are omitted in the figure.
Additionally, in our net model we assume that the truth values of all propositions
(descriptors) associated with places are equal to 1 (true), and that the threshold values of
all transitions are set to 0. Moreover, in the net any transition can fire only once during a
given simulation session. A sketch of one possible encoding of this net is given below.
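Under the assumptions just listed (truth values 1, thresholds 0, fire-once semantics), the
net of Fig. 5 can be summarized by a simple data structure. The sketch below is our own
illustrative encoding, not the construction of [6] or [8]; all names and fields are
hypothetical.

```python
# An illustrative encoding of the timed approximate Petri net of Fig. 5.
# Each transition carries the rule it represents, its input descriptors,
# its output descriptor, the rule's certainty factor, a time value (delay),
# the operator combining input certainties, a threshold, and a fire-once flag.

from dataclasses import dataclass

@dataclass
class Transition:
    rule: str            # rule represented by the transition
    inputs: list         # input descriptors, e.g. ['a=1', 'c=2']
    output: str          # output descriptor, e.g. 'e=1'
    cf: float            # certainty factor of the rule
    delay: int           # time value associated with the transition
    op: str = 'AND'      # operator associated with the transition
    threshold: float = 0.0
    fired: bool = False  # a transition fires at most once per session

NET = [
    Transition('r1', ['a=1'], 'e=0', cf=0.5, delay=1),
    Transition('r2', ['b=1'], 'e=0', cf=0.5, delay=1),
    Transition('r3', ['a=1', 'c=2'], 'e=1', cf=1.0, delay=2),
    Transition('r4', ['a=2', 'b=0', 'c=1', 'd=2'], 'e=0', cf=1.0, delay=4, op='OR'),
    Transition('r5', ['b=1', 'c=2'], 'e=1', cf=1.0, delay=2),
    Transition('r6', ['d=1'], 'e=0', cf=0.75, delay=1),
    Transition('r7', ['a=1', 'b=1'], 'd=1', cf=1.0, delay=2, op='OR'),
]
```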
Place\Time   g=0s   g=8s         g=9s         g=10s        g=14s
p_a                 1/a1@7       1/a1@7       1/a1@7       1/a1@7
p_b                 1/b1@7       1/b1@7       1/b1@7       1/b1@7
p_c                                                        1/c2@12
p_d                              1/d1@9       1/d1@10      1/d1@10
p_e                 0.5/e0@8     0.5/e0@8     0.75/e0@11   0.75/e0@11 + 1/e1@14

Table 12
We assume that the initial marking of each place in the net at the global clock g = 0s is
equal to the empty set. An example of an approximate reasoning process realized in the net
model of Fig. 5 is shown in Table 12. The sign @ in the marking of a place denotes a time
stamp (e.g., 0.5/e0@8 denotes the descriptor e=0 with certainty 0.5 and time stamp 8). From
Fig. 5 and Table 12 it follows, for example, that when the value of the global clock is
g = 8s, the places p_a and p_b are marked. At that moment, the transitions t_1 and t_2 are
ready and they fire, so already at time g = 8s we can make the first decision, i.e., the
place p_e becomes marked. In Table 12 we can also observe that at different time moments we
obtain different markings of the place p_e. Analyzing the approximate reasoning process in
the net model, we can consider different paths of its execution and, as a consequence,
choose the most appropriate path of the reasoning process, i.e., one satisfying some imposed
requirements (criteria). A sketch of this propagation is given below.
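To make the timing arithmetic concrete, the following sketch (continuing the hypothetical
encoding above, so it assumes the Transition class and the NET list) propagates the sensor
tokens through the net. We assume, for illustration only, that an AND transition waits for
all of its inputs and stamps its output with the latest input stamp plus its delay, while an
OR transition needs only one input and uses the earliest available stamp; the firing policy
of [8] may differ in detail. For simplicity, all sensor readings are taken as known upfront,
so the sketch computes only the final marking of Table 12, not the intermediate columns.

```python
# Propagating the sensor tokens of Table 12 through NET (see the sketch above).
# A token is a pair (certainty, time stamp) attached to a descriptor.

tokens = {  # sensor readings: descriptor -> (certainty, time stamp)
    'a=1': (1.0, 7), 'b=1': (1.0, 7), 'd=1': (1.0, 10), 'c=2': (1.0, 12),
}

changed = True
while changed:                        # repeat until no transition can fire
    changed = False
    for t in NET:
        if t.fired:
            continue
        present = [tokens[p] for p in t.inputs if p in tokens]
        if t.op == 'AND' and len(present) < len(t.inputs):
            continue                  # AND requires all inputs to be marked
        if not present:
            continue                  # OR requires at least one marked input
        cf_in = (min if t.op == 'AND' else max)(c for c, _ in present)
        stamp = (max if t.op == 'AND' else min)(s for _, s in present)
        if cf_in <= t.threshold:
            continue                  # thresholds are 0 in our model
        t.fired = changed = True
        new = (t.cf * cf_in, stamp + t.delay)
        # keep the most certain token per descriptor
        if t.output not in tokens or tokens[t.output][0] < new[0]:
            tokens[t.output] = new

print(tokens['e=0'], tokens['e=1'])   # (0.75, 11) (1.0, 14)
```

Run as written, the loop ends with the tokens 0.75/e0@11 and 1/e1@14, in agreement with the
last column of Table 12.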
Remarks. The described data model of concurrent systems discovered from a given
information system makes it possible to better understand the structure and behavior of the
modeled system. Due to this approach, it is possible to represent the dependencies between
the processes in an information system and their dynamic interactions in a graphical way.
The presented approach can be treated as a kind of decomposition of a given information
system. Besides, our methodology can be applied to automatic feature extraction: the
components of the system and the connections between them can be interpreted as new features
of the modeled system. Properties of the constructed concurrent systems (e.g., their
invariants) can be understood as higher-level laws of the experimental data. As a
consequence, this approach also seems useful for state identification in real time.
The presented concurrent model of a decision algorithm allows a comprehensive analysis
of the approximate reasoning process described by a given decision table, including, among
other things, very quick identification of the objects in the decision table. The proposed
net model also makes it possible to analyze the performance of the approximate reasoning
process. Using timed approximate nets, we can answer questions such as how much time is
needed to make a decision, or which execution path of the approximate reasoning process
should be chosen in order to make a decision as quickly as possible. We have implemented two
computer tools for the verification of the proposed methods and algorithms [7], [13].
References
[1] Bandemer, H., Gottwald, S.: Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications,
Wiley, New York 1995.
[2] Brown, F.M.: Boolean Reasoning: The Logic of Boolean Equations, Kluwer Academic
Publishers, Dordrecht 1990.
[3] Cios, K.J., Pedrycz, W., Swiniarski, R.W.: Data Mining Methods for Knowledge Discovery,
Kluwer Academic Publishers, Boston 1998.
[4] Crowley, J.L.: Navigation for an Intelligent Mobile Robot, IEEE Journal of Rob. Auto. RA-1,
1985, 31-41.
[5] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.),
Advances in Knowledge Discovery and Data Mining, The AAAI Press, Menlo Park, CA,
1996.
[6] Fryc, B., Pancerz, K., Suraj, Z.: Approximate Petri Nets for Rule-Based Decision Making, in:
[31], pp. 733-742.
[7] Fryc, B., Makara, Z., Pancerz, K., Suraj, Z.: A Petri Net System, in: Proceedings of the
Workshop on Theory and Applications of Soft Computing (TASC04), L. Polkowski (Ed.),
Warsaw, Poland, November 26, 2004, Polish-Japanese Institute of Information Technology,
Warsaw 2004 (to appear).
[8] Fryc, B., Suraj, Z.: Timed Approximate Petri Nets, in: Proceedings of the Workshop on
Theory and Applications of Soft Computing (TASC04), L. Polkowski (Ed.), Warsaw, Poland,
November 26, 2004, Polish-Japanese Institute of Information Technology, Warsaw 2004 (to
appear).
[9] http://logic.mimuw.edu.pl/rses
[10] http://www.daimi.au.dk/designCPN/
[11] Jensen, K.: Coloured Petri Nets. Basic Concepts, Analysis Methods and Practical Use, Vol. 1,
Springer-Verlag, Berlin 1997.
[12] Kodratoff, Y., Michalski, R. (Eds.): Machine Learning, Vol. 3, Morgan Kaufmann Publishers,
San Mateo, CA, 1990.
[13] Pancerz, K., Suraj, Z.: Discovering Concurrent Models from Data Tables with the ROSECON
System. Fundamenta Informaticae, Vol. 60 (1-4), IOS Press, Amsterdam 2004, 251-268.
[14] Pancerz, K., Suraj, Z.: On Some Approach to Restricted-Based Concurrent System Design, in:
Proceedings of the International Workshop on Concurrency, Specification and Programming
(CS&P2004), Vol. 1, H.D. Burkhard, L. Czaja, G. Lindemann, A. Skowron, H. Schlingloff,
Z. Suraj (Eds.), Caputh, Germany, September 24-26, 2004, Humboldt University, Berlin 2004,
112-123.
[15] Pancerz, K., Suraj, Z.: Automated Discovering of Concurrent Models from Data Tables: An
Overview, in: Proceedings of the 3rd ACS/IEEE International Conference on Computer
Systems and Applications (AICCSA-05), Cairo, Egypt, January 3-6, 2005, American
University in Cairo, 2005 (to appear).
[16] Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning About Data, Kluwer Academic
Publishers, Dordrecht 1991.
[17] Pawlak, Z.: Concurrent versus Sequential: the Rough Sets Perspective, Bulletin of the
EATCS, 48, 1992, 178-190.
[18] Pawlak, Z.: Flow Graphs, their Fusion, and Data Analysis, in: A. Jankowski, A. Skowron, M.
Szczuka (Eds.), Proceedings of the International Workshop on Monitoring, Security and
Rescue Techniques in Multiagent Systems (MSRAS 2004), Płock, Poland, June 7-9, 2004, 3-4.
[19] Payton, D.W., Bihari, T.E.: Intelligent Real-Time Control of Robotic Vehicles, Comm. ACM,
1991, Vol. 34-8, 48-63.
[20] Peters, J.F., Skowron, A., Suraj, Z., Pedrycz, W., Ramanna, S.: Approximate Real-Time
Decision Making: Concepts and Rough Fuzzy Petri Net Models, International Journal of
Intelligent Systems, Vol. 14, John Wiley & Sons, Inc., New York 1999, 805-839.
[21] Schoppers, M.: Real-Time Knowledge-Based Control Systems, Comm. ACM, Vol. 34-8,
1991, 26-30.
[22] Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems,
in: R. Słowiński (Ed.), Intelligent Decision Support. Handbook of Applications and Advances
of Rough Set Theory, Kluwer Academic Publishers, Dordrecht 1992, 331-362.
[23] Skowron, A., Suraj, Z.: Rough Sets and Concurrency, Bulletin of the Polish Academy of
Sciences, Vol. 41, No. 3, 1993, 237-254.
[24] Skowron, A., Suraj, Z. (1996): A Parallel Algorithm for Real-Time Decision Making: A
Rough Set Approach, Journal of Intelligent Information Systems 7, Kluwer Academic
Publishers, Dordrecht, 5-28.
[25] Suraj, Z.: Discovery of Concurrent Data Models from Experimental Tables: A Rough Set
Approach. Fundamenta Informaticae, Vol. 28, No. 3-4, IOS Press, Amsterdam 1996, 353-376.
[26] Suraj, Z.: An Application of Rough Set Methods to Cooperative Information Systems
Reengineering, in: S. Tsumoto, S. Kobayashi, T. Yokomori, H. Tanaka (Eds.), Proceedings of
the Fourth International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery
(RSFD96), Tokyo, Japan, November 6-8, 1996, 364-371.
[27] Suraj, Z.: Reconstruction of Cooperative Information Systems under Cost Constraints:
A Rough Set Approach, Information Sciences: An International Journal 111, Elsevier Inc.,
New York 1998, 273-291.
[28] Suraj, Z.: The Synthesis Problem of Concurrent Systems Specified by Dynamic Information
Systems, in: L. Polkowski, A. Skowron (Eds.), Rough Sets in Knowledge Discovery, 2,
Physica-Verlag, Berlin 1998, 418-448.
[29] Suraj, Z.: Rough Set Methods for the Synthesis and Analysis of Concurrent Processes, in: L.
Polkowski, S. Tsumoto, T.Y. Lin (Eds.), Rough Set Methods and Applications, Springer,
Berlin 2000, 379-488.
[30] Suraj, Z., Pancerz, K.: A Synthesis of Concurrent Systems: A Rough Set Approach, in:
G. Wang, Q. Liu, Y. Yao, A. Skowron (Eds.), Proceedings of the 9th International Conference
on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC'2003),
Chongqing, China, May 26-29, 2003, Lecture Notes in Artificial Intelligence, Vol. 2639,
Springer-Verlag, Berlin-Heidelberg 2003, 299-302.
[31] Tsumoto, S., Słowiński, R., Komorowski, J., Grzymala-Busse, J.W. (Eds.), Proceedings of the
4th International Conference on Rough Sets and Current Trends in Computing (RSCTC'2004),
Uppsala, Sweden, June 1-5, 2004, Lecture Notes in Artificial Intelligence, Vol. 3066,
Springer-Verlag, Berlin Heidelberg, 2004, 733-742.