UML and Data Modeling - A Reconciliation
David C. Hay
Foreword by Sridhar Iyengar
Published by:
Technics Publications, LLC
966 Woodmere Drive
Westfield, NJ 07090 U.S.A.
www.technicspub.com
Edited by Carol Lehn
Cover design by Mark Brye
Cover Origami:
Designed by Tomoko Fusé
Folded by David C. Hay
Photographed by Włodzimierz Kurniewicz
Glossary
Bibliography
Index
Foreword
By Sridhar Iyengar
I had the interesting experience of first meeting David Hay in the late
1990s, a couple of years after the Unified Modeling Language (UML) became
an Object Management Group (OMG) standard. I was giving a talk at
the DAMA Data Warehouse Conference on the topic of modeling and
metadata management using UML and a related standard at OMG called
the Meta Object Facility (MOF). The audience was interested to learn
about UML but somewhat skeptical because the use of Peter Chen’s E/R
modeling notation was well known and established in the data modeling
community. There was one particular attendee (you guessed right - it
was David!) who was a little more vocal than the rest and challenged me
when I asserted that UML and its notation were not just for object
modelers but could also help data modelers. I thoroughly enjoyed the
debate but confess I was a bit irritated because the flow of my talk was
interrupted!
The back-and-forth that followed at this conference, and again at a couple of
follow-on conferences, showed how widespread the
‘impedance mismatch’ was between the community of data
modelers/data architects and that of object modelers/object architects. There
were several debates during talks, and also after talks over cocktails, on
this clash of data and object modelers, and I challenged the audience to be
more open-minded about UML, in part because there was a lot more to
UML than just simple structural modeling of objects.
I was extremely pleased to see David join the effort at OMG in
establishing a new Information modeling and Metadata Management
standard. David was determined to do something that others had tried
but given up too soon. He really wanted to bridge the data modeling and
object/UML modeling communities, not just by using the UML notation in
a superficial manner, but also by addressing concerns that data architects
and data modelers actually faced in their daily work – concerns about
structure and semantics, as well as notation and methodology familiar to
data modelers. I have followed the debates on OMG mailing lists, where
David has, over the years, earned the respect of his object modeling
colleagues (he had clearly already done this in the data modeling community
years ago), ultimately influenced the standard, and along the way
finished this much-needed book, a practical handbook.
David has pulled off the impossible: balancing the need to keep the
notation familiar enough to data modelers while acknowledging the audience
already familiar with UML, and explaining not just the notation, but also
the best practices in data modeling as he leads the reader using practical,
simple-to-understand examples. In that sense, David is ‘with the
reader’ in his/her journey to use the UML notation (with or without
UML tools) effectively for data modeling and architecture. The author’s
experience, pragmatism, and community-building expertise are well
demonstrated in this book. He has even included a historical background
of the two communities involved. (We are both showing our years of
experience and gray hair!)
This book comes at a time when data modelers, object modelers and
semantic web modelers are all beginning to realize the value of modeling
and architecture. My hope is that this book brings those communities
together, because in this world of big data and deep analytics, and the
need to understand both structured and unstructured data – attention to
design and architecture is key to building resilient data-intensive systems
for the mobile and connected world. We are realizing more and more
that the value we derive is not just from the programs that run on various
devices and servers, but from the underlying data. The better we
understand the data, the more we can gain from designing, using, and
analyzing internet-scale data systems.
David – You have done it! Thanks for taking on this very important
work of bridging data modelers and UML modelers, extending the state
of the art and for writing about this important notation and technique
that will guide data modelers for years to come.
I hope you learn and gain as much from reading this book as I did.
Enjoy!
1. Norbert Wiener. 1948, 1961. Cybernetics: or Control and Communication in the Animal and the Machine, second edition. (Cambridge, MA: The MIT Press). 2.
2. Peter Chen. 1977. “The Entity-Relationship Approach to Logical Data Base Design”. The Q.E.D. Monograph Series: Data Management. Wellesley, MA: Q.E.D. Information Sciences, Inc. This is based on his article, “The Entity-Relationship Model: Towards a
6. Norbert Wiener. 1948, 1961. Op. cit. 2.
Oh, and did I mention that the Semantic Web is lurking out there? You
understand, don’t you, that this is an entirely new body of knowledge, one
that will change everything yet again.
The assignment today is to try to reconcile the object-oriented and the
data groups. On the one hand, it is for data modelers to learn how to use
a new technique (a subset of the UML notation) to produce the
business-oriented entity/relationship models they know. On the other
hand, it is for object-oriented UML modelers to improve their knowledge
of a technique they already know, in order to expand their understanding
of just what a data model could be. Moreover, after reading this book, both
groups should have a better understanding of just what makes a “good”
business data model.
David C. Hay
Houston, Texas
Acknowledgements
After 20 years as a data modeling bigot, and 5 of those being UML’s
worst critic, I must thank Dagna Gaythorpe and DAMA International
for signing me up to work with the Object Management Group on the
Information Metadata Model (IMM). I must
admit that I felt a little like a KGB agent in the CIA the first time I
attended an OMG meeting, but all were most friendly and helpful, so the
experience turned out to be profound and extremely valuable.
This of course leads me to offer my true thanks to the OMG IMM team
for finally forcing me to really understand what UML is all about, and
why it is the way it is.
In particular, I want to thank Jim Logan, Ken Hussey, and Pete Rivet for
helping me come to grips with the different thought processes that are
behind UML. Learning a new language––especially when it means
learning a new culture and a new way of looking at the world––is
difficult, and I really appreciate their patience.
I only hope that I have represented that point of view fairly.
My gratitude also goes to my mentors in the data modeling world:
Richard Barker, Cliff Longman, and Mike Lynott. They are the ones who
introduced me to the conceptual (“semantic”) way of looking at the
world. Meeting them as an adult finally showed me what I wanted to do
when I grew up.
In particular, Mike has put in a great deal of effort editing and helping to
shape this book.
Thanks also to Bob Seiner, publisher of The Data Administration Newsletter,
for publishing a series of four articles in 2008, “UML as a Data Modeling
Notation”. These articles served as the seeds for this book.
Much appreciation goes to the people who read the manuscript and
provided useful comments and suggestions: Roland Berg, Harry Ellis,
William Frank, Allan Kolber, Kent Graziano, Frank Palmeri, and Russell
Searle.
Thanks also must go to my publisher, Steve Hoberman, and my editor, Carol
Lehn-Dodson, for helping put this whole work together.
Observations
Before proceeding, three observations should be kept in mind:
There are better and worse data modelers.
There are better and worse UML modelers.
Neither “community” is as homogeneous as the previous
paragraphs would suggest.
7. Peter Coad and Edward Yourdon. 1990. Object-Oriented Analysis (Englewood Cliffs, NJ: Yourdon Press).
As it happens, they were wrong on both counts. Semantic data modeling does account for inheritance through sub-types and super-types. And classification and assembly structures can be represented as well.
8. Ibid. 31.
9. James Rumbaugh, Ivar Jacobson, and Grady Booch. 1999. The Unified Modeling Language Reference Manual. Reading, Massachusetts: Addison-Wesley. 30.
10. Ibid. 185.
When he first developed the technique of data modeling, Dr. Chen
referred to an “entity” as a thing in the world and an “entity type” as
the definition of a class of such things. It was an entity type that was
represented by a box on the diagram. Over time, however, people
became careless and used the word “entity” to refer to the entity type
boxes. With the advent of object-orientation, with its clearer
distinction between “objects” and “classes”, it became apparent that
discipline should be re-introduced to the data modeling world. In the
interests of this, and to recognize the common structures of the two
worlds, entity types will herein be referred to as entity classes.
11. Richard Barker, 1990, CASE*Method: Entity Relationship Modeling (Wokingham, England: Addison-Wesley).
12. James Martin and James Odell, 1995, Object-Oriented Method: A Foundation (Englewood Cliffs, NJ: PTR Prentice Hall).
13. Among others, see David Hay, 1999, “UML Misses the Boat.” East Coast Oracle Users’ Group: ECO 99 (Conference Proceedings / HTML File). Apr 1, 1999. Available at http://essentialstrategies.com/publications/objects/umleco.htm.
Translation for non-Americans: back in the 19th Century, private
clubs would only admit people who revealed their membership by
shaking hands in a particular way. To get in, you “just had to know
the secret handshake”. Much software is the same way. To get access
to a particular feature, “you just have to know…”.
“Business at hand” refers to the subject being modeled, which might
be a business to be sure, but it might also be a microbiology lab or a
space shuttle. The key is that we are interested in describing the
“problem space”, not the “solution space.” For convenience in this
book the term will be “business”, even though the subject matter
could well be other than commercial. In other cases the non-
committal “domain” will be used.
The first group creates what are here called logical data models, while
the second group creates what will here be called conceptual data
models.
Both of these groups find UML to be at least annoying, if not threatening
to their world views. The database modelers are up against the fact that
the object-oriented approach to data is dramatically different from the
relational database approach. Most significantly, object-orientation
makes extensive use of sub-typing (inheritance), while this cannot be
directly represented in a purely relational database.
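To make the contrast concrete, here is a minimal Python sketch. The PARTY and PERSON names, their attributes, and the use of SQLite are assumptions for illustration only, not taken from the text. The sub-type is declared directly in the object-oriented code, while the relational schema has to simulate it, here with one common workaround (table-per-subtype) in which the sub-type table shares its primary key with the super-type table.

```python
import sqlite3
from dataclasses import dataclass


# Object-oriented side: sub-typing is native. Every Person *is a* Party.
@dataclass
class Party:
    party_id: int
    name: str


@dataclass
class Person(Party):
    birth_date: str


# Relational side: tables have no "is a" relationship. In this hypothetical
# table-per-subtype mapping, PERSON's primary key is also a foreign key to PARTY.
SCHEMA = """
CREATE TABLE party (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE person (
    party_id   INTEGER PRIMARY KEY REFERENCES party(party_id),
    birth_date TEXT NOT NULL
);
"""


def save_person(conn: sqlite3.Connection, p: Person) -> None:
    # One object becomes two rows, one per level of the hierarchy.
    conn.execute("INSERT INTO party (party_id, name) VALUES (?, ?)",
                 (p.party_id, p.name))
    conn.execute("INSERT INTO person (party_id, birth_date) VALUES (?, ?)",
                 (p.party_id, p.birth_date))


def load_person(conn: sqlite3.Connection, party_id: int) -> Person:
    # Re-assembling the object requires a join across both tables.
    row = conn.execute(
        "SELECT pa.party_id, pa.name, pe.birth_date "
        "FROM party pa JOIN person pe ON pe.party_id = pa.party_id "
        "WHERE pa.party_id = ?",
        (party_id,),
    ).fetchone()
    return Person(*row)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript(SCHEMA)
save_person(conn, Person(1, "Pat Smith", "1980-04-01"))
print(load_person(conn, 1))
```

Other mappings exist (for example, a single table holding all sub-types, or one table per concrete class); each trades off null columns, joins, and constraint enforcement differently.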
Moreover, while the database modeler views a database as a corporate
resource and is concerned with controlling access to the data and their
definitions, the object-oriented developer is concerned with designing
data (object) structures as a part of program design. As used by an object-
oriented designer, a class in UML refers to a piece of program code that
describes a set of attributes and behaviors, with objects in that class
coming into existence and going out of existence as needed. There are no
formal structures for controlling the definitions of classes, and any data
security measures must be programmed explicitly. Thus, to an object-
oriented designer and programmer, the disciplines and security
constraints being invoked by the database administrator are a hindrance
to the rapid development of systems.
From the point of view of the business concept modelers, UML class
models are different from entity/relationship models (conceptual ones,
at least) because the object-oriented community is not constrained in
specifying what constitutes a class. Pretty much anything (including
elements of the technology itself) can be an object. These then are
collected into a UML class. In the conceptual entity/relationship world,
Words and phrases highlighted with bold italics are defined further in
“Appendix I: Glossary”.
For an industry that is all about getting the business to clean up its
vocabulary to describe its world coherently and consistently, the data
modeling business has its own problem with language. There are at
least two definitions extant for both “conceptual modeling” and
“logical modeling”. These discrepancies will be disposed of shortly.
For the moment, the definitions presented here will have to do.
In fact, some data modelers are a bit casual in the way they define
entities, so it is to be hoped that they too can benefit from this book.
Combined Introduction
The approach to modeling in this book is predicated on an
understanding of both the history and the current perspectives of the two
communities. The history is presented in more detail in Appendix B, but
a few words about it are worth mentioning here. The current
perspectives of the two communities are best understood in terms
originally laid out by John Zachman in his “Framework for Information
Architecture” and refined by your author. This “combined introduction”
addresses both of these points.
Historical Threads
The history describes the concepts that have driven the information
processing industry and how they are interrelated in complex ways. Each
person who created a ground-breaking concept at any time was
undoubtedly well-read and knew a lot about what had gone before—in
various areas of interest. But even so, that person’s readings and
exposure to ideas were invariably selective, reflecting his or her personal
biases and interests.
Appendix B contains an extensive history of the information technology
industry. Most striking in that history are its three main threads:
Data Processing – The origins of computers and programming
Object-oriented Development – The advent of object-oriented
programming and the changes that it brought to the whole
development process.
Data Architecture – Recognition of an enterprise’s data as an
asset as significant as money, human resources, and capital
equipment and the effect of this on how data are managed.
The stream that is “Object-oriented Development” is noteworthy for
being dominated by the technology of computer programming. Both the
languages themselves and the way people organized programs were the
domain of people whose lives were dominated by the task of producing
working, effective, and powerful program code. Data are seen by these
people as elements to be manipulated by programs. Even the insight of
object-orientation–organize the programs around data rather than
processes–did not change the fundamental fact that programs are
themselves “processes”.
The stream that is here labeled “Data Architecture” is noteworthy for not
being dominated by technology at all. While database technologies have
formed a large part of the history of this field, the focus of this industry
is less on the technology than on how the data are gathered, processed,
and–most significantly–used in the business. Data are treated as resources
to be managed, manipulated, and controlled. Technology’s role is to
make this possible. Who produces data? What transportation and
translation occurs before they get to their ultimate consumers? Who has
the right to modify data structures? Who has the right to have access to
the data? These are among the questions addressed by data management.
Programs are written to support the actors that manipulate data
themselves, but it is the organization and security of those data that are
most important to the enterprise.
Architectural Framework
The current perspectives of the two communities are best understood by
looking at the world in terms of an Architectural Framework.
Views of Technology
4. Technology model (designer’s view): This row (called the
“builders’ view” in Mr. Zachman’s version) is the first
14. John Zachman, 1987, “A framework for information systems architecture”, IBM Systems Journal, Vol. 26, No. 3. (IBM Publication G321-5298)
15. David Hay, 2003. Requirements Analysis: From Business Views to Architecture. Englewood Cliffs, NJ: Prentice Hall PTR.
For the best introduction to the Semantic Web and OWL, see Dan Allemang and Jim Hendler. 2009. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. (Boston: Morgan Kaufmann).
Architect’s View
A more powerful way to use the entity/relationship approach is to
describe the architect’s view (“Row Three”). This Essential Data Model
is specifically concerned with identifying the language that describes what
is fundamental to the enterprise as a whole. That is, the model captures its
essence. Here, an entity is a fundamental “thing of significance to the
organization”, and an entity class is a collection of such things. A
relationship associates a hypothetical instance of an entity class with one
or more hypothetical instances of another entity class. This has
traditionally been addressed using one of several data modeling
approaches and notations.
To learn about SBVR there are two sources: 1) The source document is “Semantics of Business Vocabulary and Business Rules”. 2008. Object Management Group. http://www.omg.org/spec/SBVR/1.0/pdf. The descriptions there are dense, but the document is thorough and logically organized, and it includes a case study. 2) A more accessible approach is to preview Graham Witt’s forthcoming book on the subject in a series of articles at http://www.brcommunity.com/b461.php.
This model is translatable into English sentences understandable to the
business, but it is constrained to language that describes fundamental
concepts, not just the more concrete elements that are part of a business’
everyday life.
Designer’s View
This architectural model can then be converted to a logical data model
that more closely reflects the technology of database management. This is
the “Row Four” model, of which there are numerous kinds: relational
tables and columns, object-oriented classes, XML Schema tags, and so on.
Each of the elements shown on the figure has a different view of the
world. Not only does the architect see the world differently from the
designers, but object-oriented designers see the world quite differently
from the relational database designers.
The Unified Modeling Language (UML) was originally designed to
support object-oriented design. This is why it has not been widely accepted
by either architectural modelers or relational database designers.
Summary
This chapter sets the stage for the rest of the book, acknowledging that
there are two very different communities in this industry (the object-oriented
and the data-oriented) whose misunderstanding of each other
has caused difficulties on several fronts. In fact, there are three
communities: object-oriented developers, data-oriented analysts,
and data-oriented developers. The distinctions are three-way:
Those who support data management and define system
requirements on behalf of those who run the business or the
government agency.
Those who design and implement databases either to support
operational systems or to support analytical “data warehouses”.
Impedance Mismatch
The Unified Modeling Language was introduced as the unification of
multiple languages for describing not only object classes, but also processing,
events, use cases, and other artifacts of the system development process.
While the various notations are expected to describe the business that is
paying the bills for systems projects, it is clear that the objective of the class
model was to support object-oriented systems design. This becomes
apparent as we try to use the notation to describe business concepts without
regard for technology.
Even so, the class diagram is closely enough related to the more business-
oriented entity/relationship model, that the approach described in this book
is possible. The design perspective, however, is sufficiently different from the
architectural perspective that the differences must be made explicit to
proceed.
First, it must be recognized that the underlying structure of a relational
database is vastly different from the underlying structure of object-oriented
code, and the problems associated with translating one to the other have led
to the adoption of the term impedance mismatch to describe those
problems.
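A minimal sketch of one aspect of this mismatch, under assumed names (Order, LineItem, and the line_number key are hypothetical, not from the text): in memory, an order simply contains a list of line items, but in a relational database the same structure has to be flattened into separate rows linked by foreign keys, and then regrouped when the object is rebuilt.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    order_number: str
    # In memory, an Order simply holds references to its Line Items.
    line_items: list[LineItem] = field(default_factory=list)


def to_rows(order: Order) -> tuple[tuple, list[tuple]]:
    # Relational tables hold no nested collections: each Line Item becomes a
    # separate row that points back to its Order through a foreign key
    # (order_number) and needs a key of its own (the hypothetical line_number).
    order_row = (order.order_number,)
    item_rows = [
        (order.order_number, line_number, item.product, item.quantity)
        for line_number, item in enumerate(order.line_items, start=1)
    ]
    return order_row, item_rows


def from_rows(order_row: tuple, item_rows: list[tuple]) -> Order:
    # Rows come back in no guaranteed order, so rebuilding the object means
    # selecting the rows for this Order and re-sorting them by line number.
    order = Order(order_row[0])
    mine = [r for r in item_rows if r[0] == order.order_number]
    for _, _, product, quantity in sorted(mine, key=lambda r: r[1]):
        order.line_items.append(LineItem(product, quantity))
    return order


order = Order("A-100", [LineItem("Widget", 3), LineItem("Gadget", 1)])
order_row, item_rows = to_rows(order)
print(item_rows)
```

An object-relational mapping tool automates exactly this kind of translation, but the two structures remain different underneath.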
The term “impedance mismatch” is borrowed from electrical engineering. In
electrical engineering, the term “impedance matching” refers to the use of a
transformer to make the load (impedance) required on a target device (such
16. American Radio Relay League, 1958. The Radio Amateur’s Handbook: The Standard Manual of Amateur Radio Communication. (Concord, New Hampshire: The Rumford Press). 42.
17. Ted Neward. 2006. “The Vietnam of Computer Science”. The Blog Ride: Ted Neward’s Technical Blog. Retrieved 8/6/2001 from http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
18. Scott Ambler, 2009, “The Cultural Impedance Mismatch”, The Data Administration Newsletter, August 1, 2009. Available at: http://www.tdan.com/view-articles/11066.
These two points of view are far from incompatible. Indeed, both are
required in order to build useful systems. The problem is that the body of
knowledge for each has become complex enough to discourage those in the
other camp from mastering it. Hence misunderstandings abound.
This mismatch has several aspects, some philosophical and some
technological. Each of these will be addressed in detail later in this chapter.
But first:
Note that there are “impedance mismatches” among three groups, not just
two. In addition to the object-oriented design world, the data world itself
consists of two groups: those concerned primarily with the technical aspects
of designing databases and squeezing them onto real computers, and those
concerned with the business dimensions of managing the information
derived from those data. The latter group produces the data models that
support both database design and object-oriented program design, but these
are different from both design approaches. The object-oriented designer and
the relational database designer have issues to resolve between themselves, as
well.
Figure 2-1 reproduces Figure 1-1 from Chapter 1, to show again the different
players (and the different kinds of models) involved with creating systems.
The parties we are concerned with here are the architects, the relational
database designers, and the object-oriented designers. Note that their
products are circled on the Figure.
The following sections in this chapter will first address the relationship
between architectural entity/relationship modeling and object-oriented
design (which is the primary concern of this book), and then address the
relationships between object-oriented design and relational database design.
19. Richard Barker, 1990, Op. cit.
Behavior
One area of concern that generates the greatest discussion is that of behavior.
It is, however, the easiest to address: object-oriented object models represent
it; data models don’t.
Object-orientation bundles together, along with the program code that
defines each class, references to the pieces of code that process events
affecting objects in that class. That is, when you define a class, you define not
only its attributes and roles, you also define the processing required to create,
manipulate, and destroy instances of that class. The definition of a class is a
combination of its structure and its behavior.23
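A minimal Python sketch of that bundling, using hypothetical Order and LineItem classes (not from the text): the attributes and the processing that manipulates instances are defined together in one class. In a data model, ORDER would appear only with its attributes and relationships; a rule such as “items can be added only to an open order” would be captured separately, in process models.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product: str
    quantity: int


@dataclass
class Order:
    # Structure: the attributes and roles of the class.
    order_number: str
    line_items: list[LineItem] = field(default_factory=list)
    status: str = "open"

    # Behavior: the processing that creates and manipulates instances is
    # defined as part of the same class definition.
    def add_line_item(self, product: str, quantity: int) -> LineItem:
        if self.status != "open":
            raise ValueError("cannot add items to an order that is not open")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        item = LineItem(product, quantity)
        self.line_items.append(item)
        return item

    def cancel(self) -> None:
        self.status = "cancelled"


order = Order("A-200")
order.add_line_item("Widget", 2)
order.cancel()
```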
This is not done in the data world, at least not in data models. It is true that
relational databases may include stored procedures that are thus integrated
into a database. But they do not come up during the modeling of a business’
data. Data modelers are interested first and foremost in capturing the structure
of data. Separately, analysts address the processes involved with manipulating
them. Data modelers are particularly adamant on this point because the
subject of their models is limited to classes of things of interest to the
business. To describe the behavior of these real-world classes is much more
complicated than simply identifying the computer programs that create and
update data describing them.
20. James Rumbaugh, et al. 1999. Op. cit.
21. James Rumbaugh, et al. 1999. Op. cit.
22. James Martin and James Odell. 1995, Op. cit.
Again, this domain could be an enterprise (or a significant part of one), a government agency, or an area of academic interest. But it is a subject area, not concerned with any technology that might be used to accommodate it.
23. Meilir Page-Jones, 2000, Fundamentals of Object-Oriented Design in UML (New York: Dorset House).
In fact, analysts have several modeling approaches to choose from for
representing business processes, the “behavior” of the enterprise. The
simplest, and one that should be done along with a data model, is the
function hierarchy. This does not describe processing as such, but it does
describe what the enterprise being analyzed does. It begins with the
enterprise’s mission, the ongoing operational activity of the enterprise.24 The
mission is then decomposed into 7 or 8 or 9 principal functions. A function
describes something the enterprise does in support of the mission. This is
independent of any mechanisms that might be involved. Each of these
functions is similarly described in terms of 7 or 8 or 9 principal sub-functions
required to carry it out. This goes on, exploding successive levels of
functions until reaching the most atomic functions that cannot be
meaningfully sub-divided.
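The function hierarchy is simply a tree, so it can be sketched as a small recursive data structure. The Function class, the example functions, and the too_broad check (which flags any function decomposed into more than nine sub-functions, after the rule of thumb above) are hypothetical illustrations, not part of any standard notation.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    sub_functions: list["Function"] = field(default_factory=list)

    def outline(self, depth: int = 0) -> list[str]:
        # One indented line per function, each parent before its children.
        lines = ["  " * depth + self.name]
        for sub in self.sub_functions:
            lines.extend(sub.outline(depth + 1))
        return lines

    def atomic_functions(self) -> list[str]:
        # Leaves of the hierarchy: functions not decomposed any further.
        if not self.sub_functions:
            return [self.name]
        return [n for sub in self.sub_functions for n in sub.atomic_functions()]

    def too_broad(self, limit: int = 9) -> list[str]:
        # Functions split into more sub-functions than the rule of thumb allows.
        found = [self.name] if len(self.sub_functions) > limit else []
        for sub in self.sub_functions:
            found.extend(sub.too_broad(limit))
        return found


# Hypothetical mission and functions, for illustration only.
mission = Function("Sell products to customers", [
    Function("Market products", [
        Function("Set prices"),
        Function("Advertise products"),
    ]),
    Function("Take orders"),
    Function("Fulfill orders", [
        Function("Pick items"),
        Function("Ship items"),
    ]),
])

print("\n".join(mission.outline()))
```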
The business process model is descended from the data flow diagram. This
is, in fact, one of the kinds of UML diagrams available as well. It presents
business processes and the flow of data among them. This model can be
linked to the entity/relationship model in two ways: First, the CRUD Matrix
simply associates each entity class with the business processes that (C)reate,
(R)etrieve, (U)pdate, or (D)elete instances of that entity class. A second
approach, which works for data flow diagrams at least, is to map the data
stores (places and forms where data pause in traversing the model) to the
data model entity classes. Each data store can be considered a SQL-like
“view” of the data model, or a virtual entity class being defined in terms of
sets of basic entity classes and relationships from the data model. This is
analogous to the way “views” might be defined from tables and columns in a
relational database.
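A CRUD matrix can be sketched as a mapping from (business process, entity class) pairs to the operations performed. The processes and entity classes below are invented for illustration. Besides rendering the grid, a matrix like this makes gaps easy to spot: the hypothetical never_created check lists entity classes that no business process creates (here, Product, which would suggest a missing process such as maintaining the product catalog).

```python
# Hypothetical CRUD matrix: (business process, entity class) -> operations.
CRUD = {
    ("Take order", "Order"): "C",
    ("Take order", "Line Item"): "C",
    ("Take order", "Product"): "R",
    ("Ship order", "Order"): "RU",
    ("Ship order", "Line Item"): "R",
    ("Cancel order", "Order"): "U",
    ("Cancel order", "Line Item"): "D",
}


def entity_classes(matrix: dict) -> list[str]:
    return sorted({entity for (_, entity) in matrix})


def processes(matrix: dict) -> list[str]:
    return sorted({process for (process, _) in matrix})


def render(matrix: dict) -> str:
    # Processes as rows, entity classes as columns, blank where unrelated.
    cols = entity_classes(matrix)
    width = max(len(p) for p in processes(matrix))
    rows = [" " * width + " | " + " | ".join(cols)]
    for p in processes(matrix):
        cells = [matrix.get((p, e), "").ljust(len(e)) for e in cols]
        rows.append(p.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(rows)


def never_created(matrix: dict) -> list[str]:
    # Entity classes that no process creates: a common gap the matrix reveals.
    created = {e for (_, e), ops in matrix.items() if "C" in ops}
    return [e for e in entity_classes(matrix) if e not in created]


print(render(CRUD))
print(never_created(CRUD))
```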
One modeling technique (also available as a UML technique) consists of
creating a diagram to describe the “behavior” of either a part of the business
or a particular entity class. The state/transition diagram portrays each state
24. Object Management Group (OMG). 2010. “The Business Motivation Model Version 1.1”. (OMG Document formal/2010-05-01). Available at http://www.omg.org/spec/BMM/1.1/PDF/. 13.
25. For more information about this and all of the techniques described above, see David C. Hay, 2003, Requirements Analysis: From Business Views to Architecture. (Upper Saddle River, NJ: Prentice Hall PTR).
That is, when entity/relationship modelers don’t label relationships, they are not using one approach, but when UML modelers don’t label relationships, they are not using a different approach.
26. G.C. Simsion and Graham C. Witt, 2005, Data Modeling Essentials. Third Edition (Boston: Morgan Kaufmann).
27. Richard Barker, 1990, Op. cit.
28. David C. Hay, 1995, Data Model Patterns: Conventions of Thought (New York: Dorset House).
29. DK Illustrated Oxford Dictionary, 1998. (New York: Oxford University Press). 642.
30. James Rumbaugh, et al. 1999. Op. cit. 414.
Entity/Relationship Predicates
In general, entity/relationship modeling books acknowledge that
relationships can be named with a “verb phrase”, creating a semantic
structure of subject/predicate/object. Specification about what such a verb
phrase might consist of varies from author to author. It can be as simple as
“has”, or “is related to”, or it can be as specific as “lives at” or “is assigned
to”.
This author, however, endorses the Barker-Ellis approach. It adds a semantic
discipline to the structure of these verb phrases. Specifically, in all cases, the
verb is “to be”. This has the effect that this part of the predicate can be used
to specify the relationship’s minimum cardinality—in the form “may be” or
“must be”. (“May be” and “must be”, in grammar, are called verbal
auxiliaries.31)
The field of logic actually has a word for what remains of the predicate: the
copula. This is “a part of the verb be connecting a subject and predicate”.32
Expanding the definition cited above, then, a predicate is more completely
described as “what is affirmed or denied about the subject by means of a
copula”.33 That is, the copula is the preposition (not a verb) that describes the
content of the assertion being made.
Figure 2-2 shows a model using the Barker-Ellis notation. In this case, each
relationship can be expressed as two ordinary English sentences:
Each Order may be composed of one or more Line Items.
Each Line Item must be part of one and only one Order.
That is, reading the relationship name in each direction, you get two strong
assertions about the nature of the domain being modeled. The role names
“part of” and “composed of” are the parts of the predicates that in logic are
the copulas referred to above.
31. Merriam Webster, “Must”. Retrieved on September 28, 2010 from http://www.merriam-webster.com/dictionary/must.
32. DK Illustrated Oxford Dictionary, 1998. Op. cit. 187.
33. Ibid. 642.
The sentences are constructed from the elements in the drawing according to
the following template:
Each
<subject entity class>
must be …if the line next to the subject entity class is solid
(or)
may be …if the line next to the subject entity class is dashed
<predicate copula>
one or more …if the object entity class has a “crow’s foot” (>) next to it
(or)
one and only one …if the object entity class does not have a “crow’s foot” (>) next to it
<object entity class>
The first entity class, then, is the subject of the assertion, and the second
entity class is the object. The predicate “verb phrase”, then, consists of the
“verbal auxiliaries” (“must be” or “may be”), a prepositional phrase copula
All Barker-Ellis models in this book were produced by the data modeling
tool, Oracle Designer. In that tool, all entity class names can only be
shown as all upper case characters.
Ok, the <relationship name>…
38 UML and Data Modeling: A Reconciliation
(“composed of” or “part of”, in this case), plus an auxiliary phrase describing
the maximum cardinality (“one and only one” or “one or more”).
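The template is mechanical enough to sketch in code. The Python fragment below is a minimal illustration, not a feature of Oracle Designer or any other tool. It assumes that a relationship end can be captured as a subject, a copula, an object, and two flags standing for what the reader sees on the drawing: a solid or dashed half-line at the subject, and a crow's foot (or none) at the object. The class `RelationshipEnd`, its fields, and the naive `pluralize` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEnd:
    """One direction of a Barker-Ellis relationship (hypothetical structure)."""
    subject: str                # subject entity class
    copula: str                 # predicate copula, e.g. "composed of"
    obj: str                    # object entity class
    solid_at_subject: bool      # solid half-line next to subject -> "must be"
    crows_foot_at_object: bool  # crow's foot next to object -> "one or more"


def pluralize(name: str) -> str:
    # Naive English plural; adequate only for the examples in this chapter.
    return name + "s"


def sentence(end: RelationshipEnd) -> str:
    """Fill in: Each <subject> must/may be <copula> <cardinality> <object>."""
    auxiliary = "must be" if end.solid_at_subject else "may be"
    if end.crows_foot_at_object:
        cardinality, obj = "one or more", pluralize(end.obj)
    else:
        cardinality, obj = "one and only one", end.obj
    return f"Each {end.subject} {auxiliary} {end.copula} {cardinality} {obj}."


order_to_line = RelationshipEnd("Order", "composed of", "Line Item",
                                solid_at_subject=False, crows_foot_at_object=True)
line_to_order = RelationshipEnd("Line Item", "part of", "Order",
                                solid_at_subject=True, crows_foot_at_object=False)
print(sentence(order_to_line))  # Each Order may be composed of one or more Line Items.
print(sentence(line_to_order))  # Each Line Item must be part of one and only one Order.
```

Notice that only the copula carries business meaning. Everything else in the sentence comes directly from the notation.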
This means the heart of a relationship is the predicate copula—which is
(nearly) always a prepositional phrase. This is appropriate, since the
preposition is the part of speech that describes relationships. You may recall
Grover in the children’s television program, “Sesame Street”, who loved
prepositions. His favorite words were “in”, “above”, “around”, and so forth.
We can think of predicate prepositions here, then, as “Grover Words”.
If the modeler is successful, these relationship sentences appear self-evident
to the viewer. These are perfectly normal, non-technical sentences. Not only
do they sound like conventional English, they are also strong sentences,
such that if the assertions are, in fact, wrong, one cannot simply let them go.
One has to disagree with them.
The entity/relationship-oriented reader will recognize that not everyone in that
community follows the constrained naming conventions described here.
Without this discipline in constructing “verb phrases”, however, modelers
are forced to come up with something like “An Order has zero, one, or more
Line Items” or “A Person drives zero, one, or more Cars”. This is not
conventional business English. Alternatively, the copula might be added, but
it only gets concatenated into something like “a Project Assignment is of one
and only one Person” and “a Project Assignment is the basis for zero, one, or
more Time Sheet Entries”. The problem here is that “is” does not convey
optionality.
Note that on the diagram, the role name is closer to the subject entity
class. This makes the sentence easier to read, since the “must be/may be”
designation is also closest to that entity class. In the UML version, the
role name will be at the other end. (See the full description of this on page 69.)
OK, the copula can also be either a gerund or an infinitive, but the idea is the
same.
By the way, your author has used this approach in Spanish, and a Quebecois
colleague has used French, which suggests that it should work with any
Western language. Moreover, his book Requirements Analysis: From
Business Rules to Architecture has been translated into Chinese, and he is
told that the relationship names translate very well, at least in one
direction. (Apparently the backwards syntax gets a little clumsy in
Chinese.)
In addition, without the structure described here, it is very easy
simply to use a phrase like “has” or “is related to”. Such phrases are
meaningless as relationship names. They are “true” of any relationship,
so they assert nothing. For the sentences to be effective, if they are
not true, that must be obvious to the observer.
Note, however, that coming up with disciplined but self-evident role names
is very difficult. It requires considerable analytic skill to understand the
true nature of the relationship, as well as the linguistic skill to come up
with the right words. You know you have succeeded if the resulting sentence
seems self-evident to the reader or, when it is untrue, if that is equally
evident.
Unfortunately, many modelers don’t have the inclination or the ability to do
so. The final product suffers.
I know, I know: “It depends upon what the meaning of the word ‘is’ is.”
– William J. Clinton. December 22, 2007. (I don’t think he was talking
about this kind of Copula, though.)
If you went into the computer science curriculum because you had
trouble in English class, this may not be the career for you.
As an indication of how important it can be to use the right role name,
the editors of the Hitchhiker’s Guide to the Galaxy were once “sued by
the families of those who had died as a result of taking the entry on the
planet Tral literally (it said ‘Ravenous Bugblatter Beasts often make a very
good meal for visiting tourists’ instead of ‘Ravenous Bugblatter Beasts
often make a very good meal of visiting tourists’).” [Douglas Adams.
1982. The Restaurant at the End of the Universe. (New York: Pocket Books).
37-38.]
We have already made one departure from UML syntax. In a business-
oriented model, to be read by non-technical people, relationship names
contain spaces between words. Also, by convention, they are entirely
lower case. This is discussed further on pages 39 and 69.
Each
<subject entity class>
must be …if the second entity class has “1..” next to it
(or)
may be …if the second entity class has “0..” next to it
<predicate copula> …that is, the <relationship name>
one or more …if the second entity class has “..*” next to it
(or)
one and only one …if the second entity class has “..1” next to it.
<object entity class>
Thus, in Figure 2-4, the role reading from right to left produces the sentence:
Each Order may be composed of one or more Line Items.
From left to right, it reads:
Each Line Item must be part of one and only one Order.
These are exactly the sentences we had in the Barker-Ellis version.
Note that the position of the role names has been reversed from that of
the entity/relationship model in Figure 2-3. In that model, optionality is
shown by a dashed or solid half-line next to the first entity class. Here,
the “must be”/“may be” choice is shown by the “1..” or “0..” at the start of
the multiplicity, which sits next to the second entity class. Placing the
role name there as well makes the sentence more intuitive to read.
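The same reading can be sketched for the UML form, following the convention described above that each role name is placed next to the object end of the association. In this hypothetical Python fragment, the minimum of the object end's multiplicity chooses "must be" or "may be", and the maximum chooses "one and only one" or "one or more". The names `AssociationEnd`, `parse_multiplicity`, and `read_toward` are illustrative only. They are not part of UML or of any modeling tool's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AssociationEnd:
    class_name: str    # entity class at this end
    role_name: str     # predicate shown next to this class, read toward it
    multiplicity: str  # e.g. "0..*", "1..1", or the shorthand "1"


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Split "0..*" into (0, None) and "1..1" into (1, 1); None means unbounded."""
    lower, sep, upper = text.partition("..")
    if not sep:  # a bare "1" is UML shorthand for "1..1"
        upper = lower
    return int(lower), (None if upper == "*" else int(upper))


def read_toward(subject: AssociationEnd, obj: AssociationEnd) -> str:
    """Read the association from the subject class toward the object class."""
    lower, upper = parse_multiplicity(obj.multiplicity)
    auxiliary = "must be" if lower >= 1 else "may be"
    if upper == 1:
        cardinality, name = "one and only one", obj.class_name
    else:
        cardinality, name = "one or more", obj.class_name + "s"  # naive plural
    return f"Each {subject.class_name} {auxiliary} {obj.role_name} {cardinality} {name}."


order = AssociationEnd("Order", "part of", "1..1")
line_item = AssociationEnd("Line Item", "composed of", "0..*")
print(read_toward(order, line_item))  # Each Order may be composed of one or more Line Items.
print(read_toward(line_item, order))  # Each Line Item must be part of one and only one Order.
```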
A business rule, outside this model, may assert that the same Party may
not be both a customer in and a vendor in the same Order.
Product. From the perspective of Line Item, there are two properties that
have the same name.
The MagicDraw tool is adamant about enforcing this, but only if the
association end is owned by (is a “property of”) the subject class. To fix
it, make sure that each role name is owned by the association, not the
subject class. (Unfortunately, the class is the default, so you have to go
through and change them all manually.) Then, if two relationships share a
role name, it isn’t an issue: “for” is a property of each association
separately, so there is no conflict.
34 James Rumbaugh et al. 1999. Op. cit. 449.
Different companies have different conventions for naming tables.
Here, the convention (adopted by the Oracle community) is that entity
class names are singular, because each class represents one concept. Table
names are plural, because each table contains many instances. This is
truly an arbitrary distinction, but it comes in handy in many cases.
UML conventions call for “camel case” (or “camelCase”?). This involves
first removing all spaces. Then, for classes, the first letter of each word is
capitalized. For attributes and role names, all component words have
capital first letters except for the first one.
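As a rough illustration of these two rules, the hypothetical Python helpers below convert the spaced names used in this book into UML camel case. They assume that words are separated by whitespace, and they make no attempt to handle acronyms specially.

```python
def to_class_name(name: str) -> str:
    """Class names: drop spaces, capitalize every word ("Line Item" -> "LineItem")."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


def to_property_name(name: str) -> str:
    """Attributes and role names: as above, but the first word starts lower case."""
    words = name.split()
    if not words:
        return ""
    first = words[0][:1].lower() + words[0][1:]
    return first + "".join(word[:1].upper() + word[1:] for word in words[1:])


print(to_class_name("Internal Organization Type"))  # InternalOrganizationType
print(to_property_name("Delivery Date"))            # deliveryDate
print(to_property_name("composed of"))              # composedOf
```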
35 James Rumbaugh et al. 1999. Op. cit. 415.
36 Richard Barker. 1990. Op. cit.
37 For more on value sets and code sets, see David C. Hay. 2006. Data Model Patterns: A Metadata Map. (Boston: Morgan Kaufmann).
38 James Rumbaugh et al. 1999. Op. cit.
MagicDraw, the software used to produce all the model drawings in this
book, has only one shortcoming that I have discovered: a limit on
attribute name length. Hence “Internal Organization Type” had to be
abbreviated. The idea should be clear, however.
In a typical model of any size there can be many …Type entity classes.
Each one has the attributes “Name” and “Description”. Displaying
them isn’t very useful. Displaying the list of values in each case is much
more useful.
Namespaces
UML is organized around the object-oriented concept of namespace, which
refers to a collection of objects to be treated as a named group. The
architecture of object-oriented programming is predicated on this concept.
The problem with this, however, as described above in the section on
relationships, is that it makes it extremely difficult to treat a
subject/predicate/object as a logical unit, since the object class cannot be
contained in the namespace that is the subject class. A “property” of the
subject class is the role name only, without the class that is being named. For
example, in Figure 2-11, above, “an example of” can be a property of Internal
Organization, but “an example of Internal Organization Type” cannot.
Moreover, since the object-oriented world allows no duplicate role names
within a class, there could not also be “an example of Internal Organization
Class”, or some such, even though the predicate as a whole would not be
duplicated.
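The collision can be sketched with ordinary Python dictionaries standing in for namespaces. This is only an analogy, and it assumes that a namespace behaves like a mapping whose keys must be unique. The helper `declare` is hypothetical, and "Internal Organization Class" is the text's own made-up example.

```python
def declare(namespace: dict, name, element) -> None:
    """Add an element to a namespace, enforcing the rule that names are unique."""
    if name in namespace:
        raise ValueError(f"{name!r} is already declared in this namespace")
    namespace[name] = element


# Object-oriented view: the subject class is the namespace and the key is the
# role name alone, so a second predicate that reuses the role name collides.
internal_organization = {}
declare(internal_organization, "an example of", "Internal Organization Type")
try:
    declare(internal_organization, "an example of", "Internal Organization Class")
    collided = False
except ValueError:
    collided = True

# Entity/relationship view: the unit of meaning is the whole predicate
# (role name plus object class), so the two assertions do not clash.
predicates = {}
declare(predicates, ("an example of", "Internal Organization Type"), "Internal Organization")
declare(predicates, ("an example of", "Internal Organization Class"), "Internal Organization")

print(collided)         # True
print(len(predicates))  # 2
```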
Here is another place where the architectural entity/relationship modeler
must part company from the object-oriented designer.
Persistent Data
At the technology level, one significant difference in point of view between
the object-oriented design and database design camps is the fact that the
latter are concerned with storing data. This means the structure of data is
important, since it determines how to organize data that are relatively
permanent. Object-orientation, on the other hand, originated in real-time
systems, and it is thus primarily concerned with the manipulation of data.
Originally, the data describing the objects manipulated (chiefly fluid flows
and the valves that controlled them) were not expected to survive the end of
the program’s processing. Object-orientation became a force in commercial
data processing in about 1980, with the advent of the personal computer and
its graphical user interfaces. As in process control, windows and cursors are
transient. By virtue of the applications driven by these interfaces, however,
the object-oriented world is now required to deal with persistent data – data
describing objects that continue to exist after the program is complete. Thus,
the issues of connecting to a database, with its different kinds of data
structure, take on new importance. What kinds of transformations are
required to convert to that structure? Who defines class structure, and how
are object-oriented classes mapped to a database? These have become
important questions, which, in each case, require a great deal of negotiation
to resolve.
Inheritance
Most significantly, the way data are organized in an object-oriented program
is very different from the way they are organized in the industry’s most
popular storage medium—the relational database. The structure of object-
oriented data makes extensive use of inheritance (super-types and sub-types)
that cannot be supported directly in relational databases. In addition to being
an issue between object-orientation and data management technologies, the
problem of this impedance mismatch actually occurs within the data world as
well. An architectural data model makes extensive use of this same object-
oriented concept. Invariably, such models have many super-type/sub-type
combinations. From the beginning, it has always been a challenge to translate
an entity/relationship model with sub-types into a relational database, with
its simple two-dimensional tables and their rows and columns.
Within the data world, however, that’s a translation problem that only occurs
when designing a database. Database designers have long recognized three
alternatives for dealing with it. None is perfect, but each is workable under
the right circumstances.
Make one table for the super-type, subsuming all the sub-types. This
has the following effects:
Columns specific to one of the sub-types must accept null values for
the instances that are of another sub-type.
Relationships to one of the sub-types cannot be mandatory.
Make one table for each sub-type including the super-type columns
in each. This has the following effects:
The super-type columns are duplicated for each sub-type table.
Make one table for the super-type with its attributes as columns, and
one for each sub-type with its attributes as columns. A mandatory,
one-to-one foreign key is established from each sub-type to the
super-type. A constraint must then be added to ensure that each
instance of the super-type is referred to by exactly one instance of a
sub-type.
This eliminates the shortcomings of both of the previous approaches,
but the constraint mentioned can be very complex to implement.
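The third approach can be sketched with Python's built-in `sqlite3` module. The table and column names here (`parties`, `persons`, `organizations`, `birth_date`, `tax_number`) are hypothetical and follow the plural table-naming convention mentioned earlier. Because a SQLite `CHECK` constraint cannot refer to other tables, the "exactly one sub-type" rule is shown only as a query that finds violations. Actually enforcing it would require triggers or application code, which is where the complexity mentioned above comes in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE parties (
    party_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL                 -- super-type attribute
);
-- Each sub-type table reuses the super-type key: the primary key makes the
-- link one-to-one, and the foreign key makes it point at an existing party.
CREATE TABLE persons (
    party_id   INTEGER PRIMARY KEY REFERENCES parties(party_id),
    birth_date TEXT NOT NULL               -- a sub-type column can be NOT NULL
);
CREATE TABLE organizations (
    party_id   INTEGER PRIMARY KEY REFERENCES parties(party_id),
    tax_number TEXT NOT NULL
);
""")

# Parties whose key appears in no sub-type table, or in more than one.
VIOLATIONS = """
SELECT party_id FROM (
    SELECT p.party_id,
           (SELECT COUNT(*) FROM persons s WHERE s.party_id = p.party_id)
         + (SELECT COUNT(*) FROM organizations o WHERE o.party_id = p.party_id)
           AS subtype_count
    FROM parties p
) WHERE subtype_count <> 1
"""

conn.executemany("INSERT INTO parties VALUES (?, ?)",
                 [(1, "Jane Smith"), (2, "Acme Ltd"), (3, "Unclassified")])
conn.execute("INSERT INTO persons VALUES (1, '1970-01-01')")
conn.execute("INSERT INTO organizations VALUES (2, 'TX-0042')")

print(conn.execute(VIOLATIONS).fetchall())  # [(3,)]
```

Party 3 is reported because it has no sub-type row. A party with rows in both sub-type tables would be reported as well.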
If this translation is done once, and the issues are resolved for a particular
database design, it is a reasonable task.
When an object-oriented designer or programmer has to map an object-
oriented data structure to a relational database, the same (imperfect)
translation process is required, but it must be carried out repeatedly, on an
ongoing basis. This may make the program more complex, but again,
once the issues have been resolved, implementing the translation should be
straightforward.
This should not be an issue, although it has generated a disproportionate
amount of discussion.
Security
The object-oriented designer and programmer are accustomed to creating
object classes as necessary to solve particular problems. Even if they are
presented with a conceptual data model as a starting point, there is nothing in
their normal work practices to keep them from at least adding system-
oriented classes. Indeed, many of the classes are only defined to last for the
life of the program’s execution. Only “moral persuasion” discourages them
from changing the architectural model’s class structure.
Once stored, the persistent data are the responsibility of the database
administrator, who must implement the mechanisms for enforcing security
policies. Above all, the database administrator is responsible for the integrity
of the database. This includes both the database structure and its contents.
The object-oriented designer and programmer are, in fact, obligated to
respond to the security requirements identified by the database administrator.
Persistent object classes should not be created without approval from the
database administrator. Programs retrieving data and updating them in a
Summary
This chapter laid out the essential distinctions between data modeling and
object-oriented modeling.
The issues are actually more subtle than first appears, however, since the data
modeling world itself is divided between the database design and the
architectural modeling perspectives. So, in this chapter, we first distinguished
between architectural entity/relationship modeling and object-oriented design
(as represented by UML). After that, we distinguished between relational
database design and UML’s object-oriented design.
In the first category, there are five issues between architectural models and
object-oriented design models. Specifically, creating an architectural model
using the UML notation will require:
Limiting the classes addressed to the things of significance to the
domain.
Addressing behavior with models other than the class model.
Instituting a discipline in the naming of relationship/association
“roles”, and modifying the assumptions of UML to allow this
approach to be captured as “role names”.
Making use of data types and enumerations to address the
entity/relationship concept of “domain”.
Understanding the constraints imposed by “namespaces”.
In the second category, there are three differences between the two design
approaches that have generated friction between these two groups for many
years:
Objects created in an object-oriented program do not “persist”
unless provision is made to store them. Storing them in a relational database
is not straightforward.
39 Ted Neward. 2006. Op. cit.
Your author used the MagicDraw software sold by No Magic, Inc. to
prepare his book, Enterprise Model Patterns: Describing the World. He
found it very versatile in dealing with the non-standard UML
structures in this book. It was able to handle every modification
described here. Should you find that other tools make such
adjustments either very difficult or impossible to do, please notify
him at [email protected].
Association (relationship)
Cardinality for attributes and relationships
Exclusive or (xor) Constraint
Use some UML-specific symbols with care
Sub-types and Relationship Sub-types
Enumeration
Derived Attribute
Package
Do not use any other UML symbols
Abstract entities
Association class
Behavior
Composition
Navigation
Ordered
Visibility
Add one symbol
<<ID>> stereotype
(Or use new property {isID})
3. Define Data Model Relationship ends as Predicates, not UML
Roles
4. Define domains
5. Understand “Packages” and “Namespaces”
6. Follow Display Conventions:
Sub-types – Show sub-type boxes within super-type boxes.
Spaces in Names – Include spaces inside multi-word entity
class and attribute names.
Role Positions – Position the predicate next to the object
entity.
How to Draw an Essential Data Model in UML 59
40 Richard Barker. 1990. Op. cit. 4-1.
Also by convention in this book, UML entity class names are shown in
lower case, with an initial upper case character on each word. Where models are
shown using the Barker-Ellis notation, entity class names are shown
entirely in upper case. (This is a restriction of the modeling tool used
for those.)
If the acronym is widely accepted in the organization, and if
everyone agrees on what it means, and if to spell it out would be too
long and clumsy, then it may be permissible to use it in an entity
class name. Maybe.
Attribute
The UML definition of attribute is effectively the same as that for
entity/relationship modeling. That is, it is a characteristic of an entity
class that “serves to qualify, identify, classify, quantify, or express the
state of an entity”.41
Note the phrase “state of an entity”. Each attribute must be about the
entity it is attached to. If it is describing a related entity, it should be an
attribute of that entity, not this one.
As with entity class names, attribute names must be common English
names for the characteristics involved. Abbreviations and acronyms
should be avoided. In general, it is not necessary to include the entity
class name in the attribute name, but in some companies, standards
dictate that the entity class name be inserted in front of at least the
common attributes, as in “Person Name” and “Person ID”.
In Figure 3-2, below, attributes of Order are “Order Number”, “Order
Date”, and “/Total Value”. Attributes of Line Item are “Line Number”,
“Quantity”, “Price”, “Delivery Date”, “/Extended Value”, and “/Order
Number”. That is, the entity class Line Item is defined as the set of
entities that have values for the attributes “Line Number”,
“Quantity”, “Price”, “Delivery Date”, and “/Extended Value”.
41 Richard Barker. 1990. Op. cit. 4-6.
By convention, in other tools, your author always surrounds derived
attributes with parentheses. This is useful for exposition, but it is
not logically connected to the model. In Oracle Designer, however,
the derivation can be documented explicitly in the repository behind
the tool.
Indeed, the only time an attribute might be “read-only” would be if it
were a duplicate of another attribute being maintained. This would be
a flaw in the model.
Association (Relationship)
In architectural entity/relationship modeling, a relationship between two
entity classes consists of two assertions about them (one going each way).
Each assertion is a logical statement of fact about the area of interest
being modeled, not simply recognition that two things are somehow
associated with each other. This can be described in a UML diagram
using the symbol for an association, but it is labeled differently from
the way it is labeled in the UML design context.
It is important to keep the structure and language of architectural
entity/relationship modeling when using UML. A relationship going each
way represents a strong assertion about the nature of the organization
being modeled. The UML concept of role name applies, but its use in
the architectural model is very different from the way it is used in a
design model. (Chapter 2, above, had a much more extensive discussion
of the differences between UML associations and data modeling
relationships. See pp. 34-45.)
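The notation in an architectural model can be read according to the
following template:
Each
<subject entity class>
must be …if the second entity class has “1..” next to it
(or)
may be …if the second entity class has “0..” next to it
<role name>
one or more …if the second entity class has “..*” next to it
(or)
one and only one …if the second entity class has “..1” next to it.
<object entity class>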
For example, the relationship shown in Figure 3-3 translates into the
following two sentences:
Each Order may be composed of one or more Line Items.
Each Line Item must be part of one and only one Order.
See page 69 for a description of positioning these elements on the
drawing.
Cardinality
Both entity/relationship modeling and UML object-oriented modeling
recognize that attributes and relationships are both properties of an entity
class. In entity/relationship notations, maximum and minimum
As mentioned at the beginning of this chapter, your author used the
modeling tool “MagicDraw” from No Magic, Inc. and found it to be
an outstanding tool for developing conceptual UML models. There
was one problem, however: All 1..1 roles were displayed as “1”. It
was several months into the project before the company’s Technical
Support Group finally described to him the “secret handshake” for
forcing it to display “1..1”. Contact him at
[email protected] if you want to know more.
Actually, compressing the optionality configuration to “[0]” for
attributes would make it more symmetrical.
42 David C. Hay. 2006. Data Model Patterns: A Metadata Map. (Boston: Morgan Kaufmann).
and to another Party. Second, the diagram shows that each Employment
Contract must be for one Person and with one Organization.
More significantly, it also shows that:
Employment Contract for Person is a sub-type of Party
Relationship from Party.
Employment Contract with Organization is a sub-type of
Party Relationship to Party.
And of course for the inverses:
Person employed via Employment Contract is a sub-type of
Party on one side of Party Relationship.
Organization employer in Employment Contract is a sub-type of Party
on the other side of Party Relationship.
<<Enumeration>>
Commonly, in entity/relationship models, a …Type entity describes a
list of elements used to qualify another entity class. For example, each
Status may be an example of one and only one Status Type. The attributes
of such an entity class are typically just “Name” (or “Code”) and
“Description”. It is the list of instances that is actually more interesting
to the reader of a model. In this case, instances might be “Pending”, “In
Force”, and “Closed”. Identifying these instances invariably has to be
part of the documentation of the model.
UML has a very clever way of dealing with such “lists of values”: a UML
stereotype that marks a class as defined specifically for this purpose. A
box labeled <<enumeration>> displays the list of values rather than
attributes. An <<enumeration>> is a special kind of class used to capture and
display such a list. In an entity/relationship model, where the list of
instances is relatively short, this UML feature can be a nice enhancement.
Figure 3-7 shows an example where Status Type is a list of possible values
for the Project Status attribute “Status Type”. Note that “Status Type” has
a data type of “Status Type” (the enumeration). This has an effect similar
to that of a foreign key in a relational database, where the attribute name
is equivalent to an enumeration name. It is not necessary to define the
relationship between the entity class and the enumeration explicitly, but
showing it makes clear that the relationship exists, so it is shown in this
drawing.
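A programming-language analogy may help. In the hypothetical Python sketch below, an `Enum` plays the part of the <<enumeration>> Status Type from Figure 3-7, and the Project Status attribute "Status Type" takes the enumeration as its data type. Looking up a value rejects anything that is not on the list, which is roughly the foreign-key-like effect described above. The class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class StatusType(Enum):
    """The list of values is the interesting part, so it is what gets shown."""
    PENDING = "Pending"
    IN_FORCE = "In Force"
    CLOSED = "Closed"


@dataclass
class ProjectStatus:
    status_type: StatusType  # attribute whose data type is the enumeration


def project_status(value: str) -> ProjectStatus:
    # StatusType(value) raises ValueError for a value that is not on the list,
    # much as a foreign key rejects a code with no matching reference row.
    return ProjectStatus(StatusType(value))


print([s.value for s in StatusType])           # ['Pending', 'In Force', 'Closed']
print(project_status("In Force").status_type)  # StatusType.IN_FORCE
```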
Enumeration was discussed in detail in Chapter 2 (page 48 ff.), and is
discussed briefly in the section on Domains, below (page 82).
Beware, though: in entity/relationship models it is also possible to use
the ...Type entity class to reproduce the sub-type structure of the entity
class being “typed”. For example, the entity class Party typically has many
sub-types. The entity class Party Type, then, reproduces all of those sub-
types in a list. That is, the first set of instances of Party Type would be
“Person”, “Organization”, “Company”, “Government Agency”, etc. An
additional relationship asserting that each Party Type may be the super-type
of one or more (other) Party Types neatly accommodates the structure of
Party.
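The recursive structure can be sketched with a small Python mapping that stands in for instances of the "super-type of" relationship between Party Types. The placement of types is an assumption made for the example: "Company" and "Government Agency" are treated as sub-types of "Organization". The helper `sub_types_of` is hypothetical, and it assumes the hierarchy contains no cycles.

```python
from collections import deque

# Each Party Type may be the super-type of one or more (other) Party Types.
# Hypothetical instances; the placement of Company and Government Agency is assumed.
SUPER_TYPE_OF = {
    "Organization": ["Company", "Government Agency"],
    "Person": [],
}


def sub_types_of(party_type: str) -> list:
    """Every Party Type below party_type, found breadth-first (assumes no cycles)."""
    found = []
    queue = deque(SUPER_TYPE_OF.get(party_type, []))
    while queue:
        current = queue.popleft()
        found.append(current)
        queue.extend(SUPER_TYPE_OF.get(current, []))
    return found


print(sub_types_of("Organization"))  # ['Company', 'Government Agency']
print(sub_types_of("Person"))        # []
```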
That is, Party Type, Geographic Location Type, Activity Type, etc.
Derived Attributes
Most entity/relationship notations have no special symbol for derived
attributes, and this is a shortcoming in the available tools. So, by
convention, one can always surround an attribute name with parentheses
if it is derived, and Oracle Corporation’s Designer does provide for
documenting the formulae for such an attribute.
Using derived attributes is an extremely powerful way to present many
important concepts in a model, and using them, when appropriate, is
highly recommended. Back in ancient times (around 1981), your author
encountered a database management system (called “Mitrol Information
Management System”, or MIMS) that had a feature he had never before
(and has never since) encountered. You could specify a “computed field”
in a file. This was not stored, but any time you did a query on the file you
could refer to that field just as though it were any other kind of field. The
computer was then clever enough to compute the value on the fly, based
on other fields in the database. That proved to be an incredibly powerful
feature that allowed development of massive manufacturing applications
in a very short time.
43 Both of these examples were taken from the companion volume to this book: David C. Hay. 2011. Enterprise Model Patterns: Describing the World. Bradley Beach, NJ: Technics Publications.
Of course once you ran a query, the lights tended to dim. Having all the
values calculated at query time proved to be, shall we say, not entirely
satisfactory. Clearly it was useful to calculate many of those values when
you entered the data. This becomes a design tradeoff, depending on how
stable the results are and how frequently they will be requested. The
criteria to be applied to that decision are different now, since computer
processing is many times faster than it was in 1981. The point is still
valid, however, since we can be sure the demands on the computer have
increased as well.
Even so, the concept of computed fields established a profound logical
structure. Even in a properly normalized structure, it is possible to
imagine attributes of entity classes that are not “stored”, but are derived
from others. As a way to fully present and understand complex data, it is
extremely valuable to include them (with formulae properly documented,
of course) in the essential data model.
To be sure, it is a design decision (based on volume and expected usage)
whether to have the calculations done when the data are coming into a
database or when the data are being retrieved. But being able to describe
the calculations meaningfully, in context, has a profound effect on
the thoroughness and readability of a model.
In MIMS, the language for expressing values was simple algebra, plus two
very significant functions:
INFER-THROUGH (<relationship name>, <entity class name>,
<attribute name>) - Treat an attribute in a parent entity class (via the
specified relationship) as though it were in this entity class.
SUM-THROUGH (<relationship name>, <entity class name>,
<attribute name>) - Compute the sum of the values of a specified
attribute in all related instances of the child entity class, via the specified
relationship.
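The two functions translate naturally into derived attributes that are computed on request, which corresponds to the "computed at query time" end of the tradeoff discussed above. The Python sketch below uses the Order and Line Item attributes from Figure 3-2. Python properties stand in for MIMS computed fields, so this shows the idea rather than MIMS syntax. The class layout and the helper `add_line_item` are assumptions.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass(eq=False)
class Order:
    order_number: str
    line_items: List["LineItem"] = field(default_factory=list)  # "composed of"

    @property
    def total_value(self) -> Decimal:
        # /Total Value = SUM-THROUGH(composed of, Line Item, Extended Value)
        return sum((item.extended_value for item in self.line_items), Decimal("0"))


@dataclass(eq=False)
class LineItem:
    order: Order = field(repr=False)  # "part of" one and only one Order
    line_number: int
    quantity: int
    price: Decimal

    @property
    def extended_value(self) -> Decimal:
        # /Extended Value is simple algebra on this entity's own attributes.
        return self.quantity * self.price

    @property
    def order_number(self) -> str:
        # /Order Number = INFER-THROUGH(part of, Order, Order Number)
        return self.order.order_number


def add_line_item(order: Order, quantity: int, price: str) -> LineItem:
    item = LineItem(order, len(order.line_items) + 1, quantity, Decimal(price))
    order.line_items.append(item)
    return item


order = Order("A-100")
add_line_item(order, 3, "2.50")
add_line_item(order, 1, "10.00")
print(order.total_value)                 # 17.50
print(order.line_items[1].order_number)  # A-100
```

Whether such values are computed on every request, as here, or stored when the data are entered is the design decision discussed above. The model itself needs only the formulas.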
Package
In the object-oriented world, a package is “a general-purpose mechanism
for organizing elements into groups”44. Specifically, packages are
“containers whose purpose is to group elements primarily for human
access and understandability, and also to organize models for computer
storage and manipulation during development.”45 In the
entity/relationship context, a package can be defined for a subject area,
Pity that.
44 James Rumbaugh et al. 1999. Op. cit.
45 Ibid. 354.
MagicDraw, for one, is perfectly capable of placing an entity class
into any package that it chooses, if the correct package is not clearly
identified when the entity class is defined. It is valuable to take
control of this. Well organized packages (as with well-organized
directories in MS-Windows) are important in the management of
large models.
Blame Bell Labs in the United States for “octothorpe”. The name
was created by this subsidiary of American Telephone and Telegraph
Company. (The current high-tech company, AT&T, doesn’t advertise
that “telegraph” used to be part of its name.) In 1963 they came up
with this name for the key showing “#” on the newly invented
“Touch-Tone” telephone keypad. In the United States, this is
known as the “number sign” or the “pound sign”. In the United
Kingdom, it is known as the “hash mark”. (It is not called the
“pound sign” since there is already a symbol (£) with that name.) A
neutral, international word like “octothorpe” seemed like a good idea
at the time. Pity it didn’t take off.
[Figure: Barker-Ellis diagram relating the entity classes PERSON, CONSTRAINED PROJECT ASSIGNMENT, OPEN PROJECT ASSIGNMENT, and PROJECT through relationships labeled “of”, “subject to”, and “to”.]
UML stereotypes actually have an advantage over the two most
commonly used entity/relationship notations. The notation for
predicates is more visible than the Barker-Ellis line, and it is less
imposing than the Information Engineering solid and dashed lines.
…rim shot… (See Glossary, if necessary.)
3. Define Domains
In entity/relationship models, the common set of characteristics for a set
of attributes is called a domain. It can be as simple as a format, or it can
be a list of values or some sort of validation formula.
Mr. Barker defines a domain as follows:
“A set of business validation rules, format constraints, and other
properties that apply to a group of attributes: for example:
a list of values
a range
a qualified list or range
any combination of these.
“Note that attributes and columns in the same domain are subject to the
same validation checks.”46
In addition to the things listed above, a domain can also be described by:
data type
length
a list of illegal values
edit rules
a mathematical expression
precision factor
Some entity/relationship-oriented CASE tools have explicit support for
documenting domains behind the scenes as part of an attribute’s
documentation—others do not.
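To make the idea concrete, here is a minimal Python sketch of a domain as one reusable bundle of checks that several attributes share. The `Domain` class, its fields, and the example domains are hypothetical illustrations. They are not taken from any CASE tool or from Mr. Barker's text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain: a named set of validation rules shared by many attributes."""
    name: str
    data_type: type
    max_length: Optional[int] = None         # length
    pattern: Optional[str] = None            # format, as a regular expression
    allowed_values: frozenset = frozenset()  # list of values (empty means unrestricted)
    illegal_values: frozenset = frozenset()  # list of illegal values
    value_range: Optional[tuple] = None      # inclusive (low, high)
    edit_rules: tuple = ()                   # extra rules: callables returning True/False

    def validate(self, value) -> bool:
        """Return True only if the value passes every check in this domain."""
        if not isinstance(value, self.data_type):
            return False
        if self.max_length is not None and len(str(value)) > self.max_length:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, str(value)) is None:
            return False
        if self.allowed_values and value not in self.allowed_values:
            return False
        if value in self.illegal_values:
            return False
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                return False
        return all(rule(value) for rule in self.edit_rules)


# One domain, many attributes: every attribute in a domain gets the same checks.
PERCENTAGE = Domain("Percentage", int, value_range=(0, 100))
STATE_CODE = Domain("State Code", str, max_length=2, pattern=r"[A-Z]{2}",
                    illegal_values=frozenset({"XX"}))
EVEN_QUANTITY = Domain("Even Quantity", int, edit_rules=(lambda q: q % 2 == 0,))

attribute_domains = {
    ("Project", "Percent Complete"): PERCENTAGE,
    ("Employee", "Bonus Percentage"): PERCENTAGE,
    ("Address", "State"): STATE_CODE,
}

print(attribute_domains[("Project", "Percent Complete")].validate(75))    # True
print(attribute_domains[("Employee", "Bonus Percentage")].validate(120))  # False
print(STATE_CODE.validate("NJ"), STATE_CODE.validate("XX"))               # True False
```

The attribute-to-domain mapping is the point of the design. A rule changes in one place, and every attribute in that domain picks up the change.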
UML does not have what entity/relationship modelers are accustomed to
calling “domains”. Its concept of data type, however, is extensible and
can be used in the same way.
First, UML has a basic list of standard “data types”, which describe the
nature of the values that can be used: “String”, “Number”, “Date”, and so
forth. This list can be expanded to include the kinds of domain
constraints just described.
One alternative to specifying a domain is to represent it as a “reference”
entity type. For example, Internal Organization might have had the
attribute “Internal Organization Type” with a domain that is a list of
values such as “Division”, “Department”, “Section”, etc. This could be
documented in the definition of the attribute, or it could be shown as an
entity class. This is a solution for both entity/relationship and UML
modeling.
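The two options can be contrasted in a short Python sketch. It assumes Python's standard `enum` module as a stand-in for a UML <<enumeration>> and uses a plain class as a stand-in for a "reference" entity class. The values come from the example above. The class and function names are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


# Option 1: the list of values is fixed in the model, like a UML <<enumeration>>.
class InternalOrganizationType(Enum):
    DIVISION = "Division"
    DEPARTMENT = "Department"
    SECTION = "Section"


# Option 2: the list of values is data, held as instances of a reference entity class.
@dataclass(frozen=True)
class InternalOrganizationTypeRef:
    name: str
    description: str = ""


reference_types = {
    t.name: t
    for t in (
        InternalOrganizationTypeRef("Division"),
        InternalOrganizationTypeRef("Department"),
        InternalOrganizationTypeRef("Section"),
    )
}


@dataclass
class InternalOrganization:
    name: str
    org_type: InternalOrganizationType  # constrained to the enumeration's values


def add_reference_type(name: str, description: str = "") -> None:
    """With a reference entity class, a new value is new data, not a model change."""
    reference_types[name] = InternalOrganizationTypeRef(name, description)


sales = InternalOrganization("Sales", InternalOrganizationType("Department"))
add_reference_type("Team", "A small working group")
print(sales.org_type.name)      # DEPARTMENT
print(sorted(reference_types))  # ['Department', 'Division', 'Section', 'Team']
```

The trade-off is visibility against flexibility. The enumeration puts the values on the diagram itself. The reference entity class lets the business add values without revising the model.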
In addition, UML, as was described above, can display a “list of values”
explicitly with an <<enumeration>>. (See page 71.) Enumerations are
described in more detail in the last section. (See page 48.) Note in Figure
2-12 (page 30) that Internal Organization has an attribute “Internal Org
Type”, whose data type is “internal”. This is linked to the
<<enumeration>> Internal Organization Type. Instead of displaying the
assumed attributes of “Name” and “Description”, the diagram shows a
46
Richard Barker, 1990, op. cit., G1-3.
84 UML and Data Modeling: A Reconciliation
4. Understand “Namespaces”
In UML, a namespace is “a part of the model in which the names may be
defined and used. Within a namespace, each name has a unique meaning.
All named elements are declared in a namespace, and their names have
scope within it. The top-level namespaces are packages (including
subsystems), containers whose purpose is to group elements primarily for
human access and understandability, and also to organize models for
computer storage and manipulation during development.”47
That is, a namespace is “owner of” a set of objects, and no duplicate
names are allowed within that namespace. In our UML version of an
entity/relationship model, the parent package that is the entire model,
plus any sub-packages we may define, plus each entity class, are all
namespaces. So is each relationship. The rule against duplicate names
only applies at the lowest level of namespace, so you cannot have
duplicate attribute names within a class, but you can have duplicate
attribute names across classes. You cannot have duplicate class names at
the lowest package level, but you can duplicate class names across
packages.
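The uniqueness rule can be sketched in Python with one dictionary per namespace. Names must be unique as keys within a dictionary, but they may repeat across dictionaries. The `Namespace` class and the example names are hypothetical.

```python
class Namespace:
    """Hypothetical model namespace: names are unique within it, not across namespaces."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: dict = {}

    def declare(self, member_name: str, member: object = None) -> object:
        """Add a named member, rejecting a duplicate name in this namespace."""
        if member_name in self.members:
            raise ValueError(f"duplicate name {member_name!r} in namespace {self.name!r}")
        self.members[member_name] = member
        return member


model = Namespace("Enterprise Model")                        # the parent package
parties = model.declare("Parties", Namespace("Parties"))     # a sub-package
products = model.declare("Products", Namespace("Products"))  # another sub-package

person = parties.declare("Person", Namespace("Person"))      # entity classes are
order = products.declare("Order", Namespace("Order"))        # namespaces, too

person.declare("Name")        # attribute names are unique within a class...
order.declare("Name")         # ...but may repeat across classes
products.declare("Category")
parties.declare("Category")   # class names may repeat across packages

try:
    person.declare("Name")    # a duplicate within one class is rejected
except ValueError as err:
    print(err)                # duplicate name 'Name' in namespace 'Person'
```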
As just mentioned, in a UML class model, each class is defined as a
namespace. One class cannot be contained in another class’s namespace,
which means that for a property that is a relationship, the object entity
class in a predicate sentence cannot be contained in the subject class’s
namespace. More significantly, this is why, in the design model, role
names (properties of the subject class) are used only to label object
classes, not to describe the entire relationship. (See the discussion of
“Associations and Relationships” in Chapter Two.)
This is the source of problems we entity/relationship modelers have in
naming relationships, since the object-oriented community sees attributes
and relationships as properties of the subject class. The solution rests in
the fact that the association is also a namespace. Both classes and the
predicate can be members of that namespace.
Name Formats
Since entity/relationship models are intended to be presented to non-
technical people, the names of entity classes, attributes, and relationships
should be as readable as possible. Include spaces between the words of a
multi-word name: “Line Item”, not “LineItem”. Capitalize each word in an
attribute name or an entity class name, but do not capitalize relationship
names at all.
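A small helper can apply these conventions mechanically, for example when names come from a tool that stores them in CamelCase. This is a hypothetical Python sketch, and the function names are illustrative.

```python
import re


def split_words(raw: str) -> list:
    """Split a CamelCase, snake_case, or spaced name into its words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", raw)  # "LineItem" -> "Line Item"
    return spaced.replace("_", " ").split()


def entity_or_attribute_name(raw: str) -> str:
    """Entity class and attribute names: spaced words, each with an initial capital."""
    return " ".join(word[:1].upper() + word[1:] for word in split_words(raw))


def relationship_name(raw: str) -> str:
    """Relationship (predicate) names: spaced words, no capitals at all."""
    return " ".join(word.lower() for word in split_words(raw))


print(entity_or_attribute_name("LineItem"))    # Line Item
print(entity_or_attribute_name("order_date"))  # Order Date
print(relationship_name("PartOf"))             # part of
```

The helper raises only the first letter of each word, so an acronym such as "ID" keeps its capitals.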
Role Positions
Position the predicate next to the object entity class. Since both
cardinality terms are there, it makes it easier to read the relationship
sentence.
For those who use both the Barker/Ellis notation and UML, the position
of role names can be confusing, since relationship names are at the other
end for the Barker/Ellis notation. To keep your sanity, in each notation
always position the predicate so that it can be read in a clockwise
direction. This means that (in UML), for a vertical relationship line, the lower
predicate is to the right and the upper one is to the left. For a horizontal
relationship line, the one to the right should be above and the one to the
left should be below. In each case the cardinality terms are on the other
side of the line at the same end as the role to which they refer.
For example, Figure 3-8, above, is reproduced in Figure 3-11, below.
Note that each relationship sentence is constructed by reading clockwise:
“Each Line Item must be (1..) part of one and only one (..1)
Order.”
“Each Order may be (0..) composed of one or more (..*) Line
Items.”
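The clockwise reading can be generated from the multiplicity at the object end of each role. The Python sketch below assumes a simple mapping. A lower bound of 0 reads "may be" and a lower bound of 1 reads "must be". An upper bound of 1 reads "one and only one" and "*" reads "one or more". It also forms plurals naively by adding "s". The function and parameter names are hypothetical.

```python
def relationship_sentence(subject: str, predicate: str, lower: int, upper: str, obj: str) -> str:
    """Build a relationship sentence from the multiplicity (lower..upper) at the object end."""
    modality = "must be" if lower >= 1 else "may be"
    if upper == "1":
        quantity, object_name = "one and only one", obj
    elif upper == "*":
        quantity, object_name = "one or more", obj + "s"  # naive plural, for illustration
    else:
        raise ValueError(f"unsupported upper bound: {upper!r}")
    return f"Each {subject} {modality} {predicate} {quantity} {object_name}."


print(relationship_sentence("Line Item", "part of", 1, "1", "Order"))
# Each Line Item must be part of one and only one Order.
print(relationship_sentence("Order", "composed of", 0, "*", "Line Item"))
# Each Order may be composed of one or more Line Items.
```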
Cardinality Display
Because it is a very common combination, the cardinality “must be one
and only one”, which should appear as “1..1”, is collapsed into the single
symbol “1” in UML modeling tools. This is unfortunate, since the two
components form the two parts of the natural language sentence
describing the predicate. If at all possible, adjust the tool to make all of
these appear as “1..1”.
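If a tool cannot be adjusted, the shorthand can at least be expanded when the model is exported. In UML, a single value n means n..n, and "*" alone means 0..*. The Python sketch below applies those two rules. The function name is hypothetical.

```python
def expand_multiplicity(text: str) -> str:
    """Expand UML multiplicity shorthand ("1", "*") into explicit lower..upper form."""
    text = text.strip()
    if ".." in text:
        lower, upper = (part.strip() for part in text.split("..", 1))
    elif text == "*":
        lower, upper = "0", "*"   # "*" alone means zero or more
    else:
        lower = upper = text      # a single value n means n..n
    if not lower.isdigit() or not (upper == "*" or upper.isdigit()):
        raise ValueError(f"not a multiplicity: {text!r}")
    if upper != "*" and int(lower) > int(upper):
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return f"{lower}..{upper}"


for shorthand in ("1", "*", "0..1", "1..*"):
    print(f"{shorthand} -> {expand_multiplicity(shorthand)}")
# 1 -> 1..1
# * -> 0..*
# 0..1 -> 0..1
# 1..* -> 1..*
```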
Summary
To draw an architectural entity/relationship model using the UML
notation, follow these steps:
1. Show domain-specific entity classes only.
2. Use symbols selectively.
Use only recommended symbols.
Optionally, use selected UML symbols not available normally
to entity/relationship modeling.
Do not use UML design-oriented symbols.
You should be aware that it took considerable time for your author
to discover the “secret handshake” for doing this in No Magic, Inc.’s
product “MagicDraw”. Eventually I found a technical support
person who knew how to do it, however. Interested parties should
contact me ([email protected]) for the secret.
instance of the second class; “..*” means that it can be associated with an
unlimited number of instances of the second class.48
Thus, the relationship portrayed in Figure 4-1 shows cardinality and
optionality in graphic terms. The optionality and multiplicity can be seen
immediately. The equivalent relationship shown in Figure 4-2 shows
these characteristics portrayed with digits and other characters. Instead of
seeing these concepts graphically, the viewer has to read the symbols
before understanding them.
48
For a comprehensive comparison of data modeling notations,
including all described here, see Appendix A of [Hay, 2003] or go to
http://www.essentialstrategies.com/publications/modeling/compare.htm.
Aesthetic Guidelines and Best Practices 91
This chapter presents principles and best practices for documenting and
presenting an architectural entity/relationship model, regardless of the
notation used. Many data modelers are, shall we say, cavalier about these
practices, but they are important if you are using Information
Engineering, IDEF1X, or Barker-Ellis, and—because of the inherent
disadvantage described here—particularly so if you are using UML.
The basic principles are:
1. Place sub-type boxes inside super-type boxes.
2. Eliminate bent lines.
3. Orient the “many” ends of relationships to the left or top of the
diagram.
This is in addition to the typographical standards described in the last
chapter. The following guidelines apply no matter which notation you
are using.
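[Figure: sub-type boxes nested inside super-type boxes. PARTY (* NAME) contains PERSON (* BIRTH DATE) and ORGANIZATION (* PURPOSE). The boxes within ORGANIZATION include INTERNAL ORGANIZATION, GOVERNMENT, COMPANY, GOVERNMENT AGENCY, POLITICAL ORGANIZATION, and HOUSEHOLD.]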
One Problem
Unfortunately, to use this notation in UML, we have to recognize that it
has already been taken. In UML version 2.0, a composite structure
diagram is used to describe run-time architectures that aren’t clear from a
typical object or class diagram. Specifically, “UML 2 has added a
composite structure diagram that shows the participating elements and
their relationship in the context of a specific classifier such as a use case,
object, collaboration, class, or activity.”49
Solution
According to the UML specification, a composite structure is “a
composition of interconnected elements, representing run-time instances
collaborating over communications links to achieve some common
objectives.”50 That is, it is analogous to a product structure, where the
interior boxes are components of the outer box.
A composite structure diagram is represented by a larger rectangle, with
its components (which may be classes, packages, or operations)
contained as symbols within it. To look at the diagram shown in Figure
4-5 as a composite structure diagram is to imagine that Person and
Organization are components of Party, not sub-types of it.
To resolve this ambiguity, Figure 4-5 includes the generalization lines in
the boxes. This keeps the aesthetic orientation we are looking for, but
signals to UML aficionados that this is inheritance, not composition. This
should not really be an issue because any viewer of this model should
understand that it is an architectural model describing an enterprise, and
not a run-time model describing a physical system.
Constraints
In the approach to entity/relationship modeling described in this book,
there are three constraints on the treatment of super-/sub-types:
49
H.-E. Eriksson, Magnus Penker, Brian Lyons, and David Fado, 2004,
UML 2 Toolkit. (Indianapolis: Wiley Publishing), 34.
50
Object Management Group, “OMG Unified Modeling Language
(OMG UML), Superstructure, V2.1.2”. OMG Document Number:
formal/2007-11-02, 161.
Categories
There are more flexible ways to categorize things, of course, but these
should be represented in a data model separately, without using sub-
types. Figure 4-6 shows a reasonable way to do this for Party, by adding
the entity classes Party Category and Party Categorization. A Party
Categorization is the fact that a particular Party falls into a particular
Party Category for a period of time. That is, each Party Categorization must
be of exactly one Party into exactly one Party Category—at a particular
time. The Party Categorization is effective on an “Effective Date”, and
ceases to be effective on an “Until Date”.
This structure allows a Party to be categorized into multiple Party
Categories at a time, and also allows for that Party Categorization to
change over time.
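A Python sketch of this structure follows. It assumes that a missing Until Date means the categorization is still in effect and that the Until Date is exclusive. The class names follow the entity classes in the text. Everything else is illustrative, including the categories and dates in the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PartyCategory:
    name: str
    defined_by: str  # the one Party appointed steward of this category


@dataclass(frozen=True)
class PartyCategorization:
    of_party: str                   # exactly one Party being categorized
    into_category: PartyCategory    # exactly one Party Category
    by_party: str                   # exactly one Party doing the categorizing
    effective_date: date
    until_date: Optional[date] = None  # assumption: None = still in effect; exclusive end

    def in_effect_on(self, when: date) -> bool:
        return self.effective_date <= when and (self.until_date is None or when < self.until_date)


def categories_of(party: str, when: date, categorizations, by_party: Optional[str] = None) -> set:
    """Names of the categories a Party falls into on a date, optionally per one categorizer."""
    return {
        c.into_category.name
        for c in categorizations
        if c.of_party == party
        and c.in_effect_on(when)
        and (by_party is None or c.by_party == by_party)
    }


suburban = PartyCategory("Suburban", defined_by="Market Research Department")
frequent = PartyCategory("Frequent Buyer", defined_by="Sales")
history = [
    PartyCategorization("Hay family", frequent, "Sales", date(2008, 1, 1), date(2010, 1, 1)),
    PartyCategorization("Hay family", suburban, "Market Research Department", date(2009, 6, 1)),
]

print(sorted(categories_of("Hay family", date(2009, 7, 1), history)))  # ['Frequent Buyer', 'Suburban']
print(sorted(categories_of("Hay family", date(2011, 1, 1), history)))  # ['Suburban']
print(categories_of("Hay family", date(2009, 7, 1), history, by_party="Sales"))  # {'Frequent Buyer'}
```

The Party can be in several categories at once. Each categorization records who made it, so different departments can classify the same Party differently.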
Note that, in addition to being of a single Party, the Party Categorization
must be by a single Party, as well. This supports the concept of data
stewardship, and, moreover, recognizes that different people in an
organization might well categorize Parties differently. Parties may be
subject to different Party Categorizations by different Parties, each for its
own purpose. For example, the Internal Organization “Market Research
Department” might place the Household “Hay family” in a different
Party Category than the Internal Organization “Sales” does.
Moreover, each Party Category must be defined by exactly one Party. The
set of Party Categories that is of interest to Market Research may be very
different from the set of Party Categories that is of interest to
Accounting. A Party must be appointed as a steward for every Party
Category.
For example, a Party Category that would apply to Person could be
“Income Level”. This might be defined by the Internal Organization,
“Market Research Department”. The Person “David Letterman”, then,
would be subject to Party Categorization into the Party Category with the
“name” “Over $500,000”. This might be according to the Person “Sam
Sneed”, who happens to be Mr. Letterman’s gardener.
Sometimes a Government may be the owner of a Company, but that
is a different kind of relationship.
Note that in this case, the one example of crossed lines is not a problem.
The only way to interpret it is as two relationships that happen to cross
each other.
Still, the overall structure is not yet as clear as it could be.
what is describing those things. Tests are performed on Samples, and these
are the source of Measurements.51
Among other things, the resulting drawing can be made more compact.
Presentation
Probably the worst thing to happen to data models was the invention of
the plotter. This permits modelers to create wallpaper-sized models that
are completely unintelligible.
Presentation Rule
If you have a plotter at your disposal, turn it off.
Quietly and carefully…walk away. Pretend it
does not exist.
51
Yes, it’s true that there are some heretics among you (Canadians?)
who prefer to orient the relationships with the “many” end towards
the right and the bottom. “How can this be?” we ask.
(OK, it can be, and it actually is fine, as long as you adopt it as a
convention and use it consistently.)
One sad victim of the ascent to power of the PowerPoint®
presentation is the overhead projector. With physical slides, you can
mark them up, erase the markings, and then mark them up again with
a different color. In ancient times (say ten years ago), the measure of
success in a feedback session was the number of scribbles on the
transparencies. Nothing in the Microsoft world can compare to that.
Summary
What distinguishes an entity/relationship model from either a UML
design model—or a database design, for that matter—is that its first
purpose is to be presented to the business community. It will be
presented to human beings, most of whom have no prior experience with
data models and who have little patience with things technical or
technological. For this reason, the aesthetic characteristics are critical.
This chapter laid out the basic guidelines for producing a model that can
be presented to a non-technical audience. Given UML’s textual
approach to cardinality, some patience will be required from the
audience initially, but if all other guidelines are followed, the presentation
should be a success.
Success, you must note, is measured in terms of the number of
corrections identified. If no changes come out of the meeting, you failed.
The reason the relationship names are as explicit as they are is so that the
viewer will not be able to say simply “Sure, looks ok to me.” In each case,
the assertion should be clearly true or truly false. And the viewer should
be compelled to speak up if it is false.
52
G. A. Miller. 1956. “The Magical Number Seven, Plus or Minus Two:
Some Limits on Our Capacity for Processing Information”, The
Psychological Review, Vol. 63, No. 2 (March, 1956). pp. 81-97. Available
at http://www.musanim.com/miller1956/.
Speed dialing was invented just in time…
The basic principles are:
1. Place sub-type boxes inside super-type boxes.
2. Eliminate bent lines.
3. Orient the “many” ends of relationships to the left or top of the
diagram.
This is in addition to the typographical standards described in the last
chapter.
Chapter 5: An Example: Party
As an example of both the use of UML notation to describe an
architectural model, as well as best practices for presenting any
architectural model, this chapter consists of an excerpt from the
companion book, Enterprise Model Patterns: Describing the World.
Take note of the following characteristics:
1. No more than five entity classes are highlighted in each Figure.
2. Logical business rules (extensions of the logic of the model,
rather than rules that are business policy) are included.
*
Yes, it is possible to define an organization without specifying the
people it contains. The presumption is that even if it is being
postulated, it is expected to consist of human beings.
detail later on, but here for the sake of argument, Person has the attribute
Birth Date, and Organization has the attribute Purpose.
Five kinds of Organization (sub-types) are shown:
Parties
People and organizations share many attributes and relationships to other
entities. A corporation is, after all, a legal Person. Both people and
organizations can be described by “Names” and “Addresses”, and both
may be party to contracts. For this reason, while Person and
Organization are useful entities, so too is the super-set of the two, which
we will here call Party. This is shown in Figure 5-1.
In this example, Party has the common attributes Global Party Identifier
and Name. In Figure 5-1 we saw that Person has the attribute Birth Date,
and Organization has the attribute Purpose. Now we can see that both
Person and Organization also have Global Party Identifier and Name.
Of course, Person actually has two names (plus a middle initial, if you
want to get thorough). This could be handled by moving Name to
Organization and giving Person First Name and Last Name. An
alternative is shown here, with the principal Name being equivalent to a
*
Note that according to the data modeling approach being followed in
this book, sub-types do not overlap. The same Organization may not
be both an Internal Organization and a Company. Note also the
constraint that every instance of a super-type must be an instance of
one sub-type. This constraint is finessed by including Other
Organization for the ones we haven’t thought of yet.
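Both constraints have rough equivalents in an object-oriented language. Single inheritance gives "no overlap" for free, because each object has exactly one class. "Every super-type instance must be one sub-type" can be approximated by refusing to instantiate the super-types directly. Below is a minimal Python sketch, assuming the attributes named in the text (Global Party Identifier, Name, Birth Date, Purpose). The `_abstract` flag is an illustrative device, not a UML construct, and only some of the Organization sub-types are shown.

```python
from datetime import date


class Party:
    """Super-type: every Party must also be exactly one concrete sub-type."""
    _abstract = True  # read from the class's own __dict__, so sub-types do not inherit it

    def __init__(self, global_party_identifier: str, name: str) -> None:
        if type(self).__dict__.get("_abstract", False):
            raise TypeError(f"{type(self).__name__} is a super-type; instantiate a sub-type")
        self.global_party_identifier = global_party_identifier
        self.name = name


class Person(Party):
    def __init__(self, global_party_identifier: str, name: str, birth_date: date) -> None:
        super().__init__(global_party_identifier, name)
        self.birth_date = birth_date


class Organization(Party):
    _abstract = True  # also a super-type: only its sub-types may be instantiated

    def __init__(self, global_party_identifier: str, name: str, purpose: str) -> None:
        super().__init__(global_party_identifier, name)
        self.purpose = purpose


class InternalOrganization(Organization):
    """A department, division, or other unit of the enterprise."""


class Company(Organization):
    """A commercial organization."""


class OtherOrganization(Organization):
    """Catch-all for kinds of Organization not yet thought of."""


smith = Person("P-001", "Smith", date(1970, 5, 17))
acme = Company("P-002", "Acme", "Manufacturing")
# Single inheritance: an object has exactly one class, so sub-types cannot overlap.
print(isinstance(acme, Organization), isinstance(acme, Person))  # True False
try:
    Organization("P-003", "Unknown", "Unknown")
except TypeError as err:
    print(err)  # Organization is a super-type; instantiate a sub-type
```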
Person’s surname, and only the given name is specific to Person. Names
will be treated in a much more comprehensive way, below.*
Notice that Global Party Identifier is specified as a unique identifier for
both Person and Organization. This is a fairly strong assertion: no Person
can have the same identifier as any Organization. The modeler may,
instead, choose to have a separate set of identifiers for Person (say,
Person ID) and for Organization (say, Organization ID).
The answer to the question of which approach to take is, as they say,
beyond the scope of this book. This is, after all, only a “pattern”. It is for
the reader to apply it appropriately.
Note that in Figure 5-2, each Party must be an example of exactly one
Party Type. This is a structure that we’ll see again often. In this case,
Party Type is the definition of a kind of Party. That each Party
must be an example of exactly one Party Type suggests some overlap with
the sub-type structure, where each Party must be either a Person, a
Company, a Government Agency, and so forth. Indeed, Party Type does
exactly reproduce the sub-type structure. That is, instances of Party Type
include “Person”, “Organization”, “Company”, and so forth. Note the
relationship that permits us also to say that, for example, “Company”
may be a sub-type of “Organization”.
The generalization structure of Party is further represented by the
assertion that “each Party Type may be a super-type of one or more other
Party Types”. (Alternatively, “each Party Type may be a sub-type of one
and only one other Party Type”.)
This redundancy is a bit of a gimmick, but it allows us both to see
graphically the principal categories of Party, and to make use of them in
the formulation of business rules. We will see examples of this later.
Business Rule
Instances of Party Type must at least include one for
each sub-type of Party shown. Other, more detailed
instances are permitted.
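The self-referencing Party Type structure and the business rule above can be sketched in Python. Each Party Type records the one Party Type it is a sub-type of, if any. A check then confirms that every sub-type shown in the model has a matching Party Type instance. The names are illustrative, and "Subsidiary" is a hypothetical example of a more detailed instance.

```python
from typing import Optional

# Each Party Type is a sub-type of at most one other Party Type (None at the top).
PARTY_TYPES: dict = {
    "Party": None,
    "Person": "Party",
    "Organization": "Party",
    "Internal Organization": "Organization",
    "Company": "Organization",
    "Government Agency": "Organization",
    "Other Organization": "Organization",
    "Subsidiary": "Company",  # a more detailed instance, which the rule permits
}

# Sub-types drawn in the model (illustrative subset).
MODEL_SUBTYPES = {"Person", "Organization", "Internal Organization", "Company",
                  "Government Agency", "Other Organization"}


def supertypes(party_type: str) -> list:
    """Follow 'is a sub-type of' upward, nearest super-type first."""
    chain = []
    parent: Optional[str] = PARTY_TYPES[party_type]
    while parent is not None:
        chain.append(parent)
        parent = PARTY_TYPES[parent]
    return chain


def is_a(party_type: str, candidate: str) -> bool:
    """True if party_type is candidate or one of its direct or indirect sub-types."""
    return party_type == candidate or candidate in supertypes(party_type)


def missing_party_types(model_subtypes: set) -> set:
    """Business rule check: sub-types in the model that have no Party Type instance."""
    return set(model_subtypes) - set(PARTY_TYPES)


print(supertypes("Subsidiary"))             # ['Company', 'Organization', 'Party']
print(is_a("Subsidiary", "Organization"))   # True
print(missing_party_types(MODEL_SUBTYPES))  # set()
```

Storing the hierarchy as data is what allows business rules to be written in terms of Party Types, as the text suggests.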
*
Henceforth, unless there is reason to display them (as there was here,
to make a point), attributes will only be shown in the first figure
where the entity class appears. For the sake of clarity, they will not
be shown after that.
An Example: Party 111
Party Relationships
People are related to each other; people belong to unions and clubs;
departments are contained in divisions; companies band together into
industrial associations, buying groups, and so forth.
To address this diversity of possible associations among Parties, the
entity class Party Relationship is introduced in Figure 5-3. This allows us
to represent any relationship between two parties. That is, each Party
Relationship is defined to be from one Party and to another Party.
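A minimal Python sketch of Party Relationship follows. The `kind` attribute is a hypothetical label for the nature of each relationship. The sketch also assumes that "another Party" means a different Party. The example parties are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartyRelationship:
    """Each Party Relationship must be from one Party and to another Party."""
    from_party: str
    to_party: str
    kind: str  # hypothetical label for the nature of the relationship

    def __post_init__(self) -> None:
        # Assumption: "another Party" is read as a different Party.
        if self.from_party == self.to_party:
            raise ValueError("a Party Relationship must be to another Party")


def relationships_of(party: str, relationships: list) -> list:
    """Every Party Relationship in which the Party takes part, in either direction."""
    return [r for r in relationships if party in (r.from_party, r.to_party)]


relationships = [
    PartyRelationship("Accounting Department", "Finance Division", "part of"),
    PartyRelationship("George Smith", "Machinists Union", "member of"),
    PartyRelationship("Acme Corp", "Regional Buying Group", "member of"),
    PartyRelationship("Finance Division", "Acme Corp", "part of"),
]

for r in relationships_of("Finance Division", relationships):
    print(f"{r.from_party} is {r.kind} {r.to_party}")
# Accounting Department is part of Finance Division
# Finance Division is part of Acme Corp
```

A single entity class covers department hierarchies, memberships, and industry associations alike, which is the point of the generalization.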
Note the “<<ID>>” symbol next to both the attribute and the role
names. This means that George Smith can have two different values of
Employee ID. There could be a business rule preventing that, but if the
situation exists, it should be captured—and then revealed.
Tables 5-1 and 5-2 show examples of tables derived from these entity
classes. The names of identifying columns are highlighted. In Table 5-2,
the reference to Party Identifier matches the value of Party
Identifier Name shown in Table 5-1. Note that including Identifier
Value in the unique identifier allows George Smith to have two
values for “Employee Number” at two different times. Without that,
George Smith could not be employed more than once.
<<ID>> assigned to Party | <<ID>> for Party Identifier | <<ID>> Identifier Value | issued by Party | Effective Date | Until Date
Marlon Brando | Social Security Number | 234-99-3122 | Social Security Administration | February 4, 1940 | July 1, 2004
The Company Company | Employer Identification Number | 99-1234567 | Social Security Administration | August 4, 1989 | --
George Smith | Employee Number | 241533 | Lockheed Martin | December 15, 2004 | December 16, 2004
George Smith | Employee Number | 242687 | Lockheed Martin | January 24, 2006 | August 15, 2009
The Figure also shows that each Party Identifier must be managed by a
Party. For example, the concept of the Social Security Number is
managed by the U.S. Social Security Administration. The Employee ID is
managed by the company’s Human Resources Department. Along the
same lines, the actual assignment of an Identifier to a Party is issued by a
Party, such as an Internal Organization, or even the Party itself.
Moving from identifying people to the task of identifying data (instances
of Party Identifier Value, in this case), each instance of Party Identifier
Value is identified by the combination of the two relationships just
described, plus the attribute Identifier Value. That is, there is only one
instance of Essential Strategies having an “Employer Identification
Number” of “99-1234-123”. With this configuration, it is possible for
Essential Strategies, Inc. to have another Employer Identification
Number.
To constrain it so that, for example, Essential Strategies, Inc. may never
have more than one Employer Identification Number, simply remove the
designation of Identifier Value as part of the unique identifier. Then,
there could be no more than one instance of the combination of Party
and Party Identifier.
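The effect of including Identifier Value in the unique identifier can be shown with a Python sketch. The sketch treats the unique identifier as a key and reports any key that occurs more than once. The rows are the George Smith entries from the table above. The function is hypothetical, and the key tuples stand in for the <<ID>> designations.

```python
def find_duplicates(rows, key_fields):
    """Return the keys that occur more than once, i.e. rows that violate the unique identifier."""
    seen, duplicates = set(), set()
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


rows = [
    {"party": "George Smith", "identifier": "Employee Number", "value": "241533"},
    {"party": "George Smith", "identifier": "Employee Number", "value": "242687"},
]

# Identifier Value is part of the unique identifier: both rows are allowed.
print(find_duplicates(rows, ("party", "identifier", "value")))  # set()
# Identifier Value removed from the unique identifier: one value per Party and Party Identifier.
print(find_duplicates(rows, ("party", "identifier")))  # {('George Smith', 'Employee Number')}
```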
An obvious attribute for Party, as we have seen, is Name. Well, not
exactly. You can have a name for an organization, but what about
people’s names? In addition to simple first and last name, what about
middle? Or Jr.? What about the title? No, it’s more complex than even
that. For that matter, companies have official names, nicknames, and
stock tickers. And on top of everything else, names change. No, a simple
name attribute just won’t do.
Figure 5-5 shows that, as with Party Identifier, Party Name has been
pulled out from the Party entity class, so that each Party may be labeled by
one or more Party Names. That is, each Party Name must be of one
Party and (significantly) it must be of one Party Name Type. This allows a
woman, for example, to have one Party Name that is of Party Name
Type “married name” and another of Party Name Type “maiden name”.
Note that each Party Name is uniquely identified by both the Name Text
and the Party being named. The name “David Charles Hay” can only
exist once for the fellow who is the author of this book.
The Company IBM may be labeled by the Name Text attribute of Party
Name, “International Business Machines Corp”. That Party Name is an
example of Party Name Type Name of “official name”. IBM is also labeled
by the Party Name Name Text, “IBM”, where that Party Name is an
example of Party Name Type Name of “official abbreviation”.
shows a history of names for the company known since 2000 officially (if
redundantly) as FedEx Express.
Table 5-3: The Names of Federal Express
Party | Effective Date | Party Name Text | Party Name Type Name
Federal Express | 6/1/96 | Fed Ex | Nickname
Federal Express | 1/1/98 (became subsidiary of FDX Corp) | Federal Express | Official name
FDX Corp | 1/1/2000 | Fedex Corp | Official name
Federal Express | 1/1/2000 | FedEx Express | Official name
Note that while each Party Name must be of one Party, it may also be
issued by a Party. Presumably, in the case of a Person, that Person is
responsible at least for reporting h’ own name (rare is the organization
that requires parents to sign off on someone’s name), but in the case of
Organizations, it could be important who specified the Party Name.
The naming problem for people is more complicated than that. As
pointed out above, people’s names have a complex structure. Each Party
Name may be composed of one or more Party Name Components, where each
Party Name Component must be an example of exactly one Party Name
Component Type, such as “first name”, “last name”, “suffix”, “title”, and
so forth. Some Party Name Component Types are constrained to certain
values, though. That is, the Component Name in a Party Name
Component, if it is an example of the Party Name Component Type
“Title”, must then be “Mr.”, “Ms.”, “Dr.”, and so forth.
As a modern man, your author is properly reticent to use “he” to
mean “he or she”. As a grammarian, however, he also objects to
using “their” when only one person is involved. (Although he does
acknowledge that this practice goes back at least to the 17th century.)
He also finds “he or she” to be incredibly clumsy. So, in the interest
of conciseness and logical consistency, he hereby proposes the
following conventions: ‘e means “he or she”; h’ means either “him or
her” or “his or her”.
Remember, you saw it here first.
Figure 5-5 showed annotations with examples of the primary entity types.
Table 5-4 shows the example in tabular form.
Table 5-4: An Example of a Name
Party Name | Party Name Type | Party Name Component Type | Legal Party Name Component Values | Party Name Component
Mr. David Charles Hay II | Birth name | Title | Mr., Mrs., Ms., Dr., Professor | Mr.
 | | Given name | (no list of values) | David
 | | Middle name | (no list of values) | Charles
 | | Surname | (no list of values) | Hay
 | | Suffix | II, III, IV, Esq. | II
Business Rule
If a Party Name Component Type is constrained to one or more
Legal Name Component Values, then any instance of Party
Name Component that is an example of that Party Name
Component Type must have as its Component Name the
“value” of one of those Legal Name Component Values.
Translation: If a Party Name Component Type is constrained
to one or more Party Legal Name Component Values, any
Party Name Component of that type must have one of
those values.
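The business rule can be expressed as a check in Python, using the values of Table 5-4. A component type with no list of values accepts any Component Name. The dictionary and function names are hypothetical.

```python
# Legal Party Name Component Values per Party Name Component Type (from Table 5-4).
# A type that is absent, or maps to an empty set, is not constrained.
LEGAL_VALUES: dict = {
    "Title": {"Mr.", "Mrs.", "Ms.", "Dr.", "Professor"},
    "Suffix": {"II", "III", "IV", "Esq."},
}


def rule_violations(components: list) -> list:
    """Return the (component type, component name) pairs that break the business rule."""
    return [
        (ctype, cname)
        for ctype, cname in components
        if LEGAL_VALUES.get(ctype) and cname not in LEGAL_VALUES[ctype]
    ]


birth_name = [("Title", "Mr."), ("Given name", "David"), ("Middle name", "Charles"),
              ("Surname", "Hay"), ("Suffix", "II")]
print(rule_violations(birth_name))              # []
print(rule_violations([("Title", "Captain")]))  # [('Title', 'Captain')]
```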
Constraints
The structure for identifiers and names has the advantage of being
supremely flexible. Any name value can be given to any Person or
Organization. Any identifier value can be given to any Person or
Organization. But only certain kinds of Party Identifiers are appropriate
for certain kinds of Parties, just as only certain Party Name Types are
appropriate for certain kinds of Parties. For example, the Party Identifier
“Social Security Number” is only appropriate for Person. The Party
Name Type “corporate abbreviation” is only appropriate for Company.
Here is where splitting out Party Type becomes useful. Figure 5-6 shows
that we can specify that each Party Identifier is only appropriate for
particular Party Types.
<<ID>> of Party Name Type | <<ID>> to Party Type
Given Name | Person
Corporate Official Name | Company
Location Name | Government
Note that in a data model, only the existence of constraints such as these
assignments can be shown. Since they will ultimately be implemented via
program code, rather than database structures, the constraints themselves
must be specified separately. The constraints are:
Business Rules
A Party Name Name Text of a Party may only be an
example of a Party Name Type if a Party Name
Assignment exists that is of that Party Name Type
and is to the Party Type that the named Party is an
example of.
That is, a Party may not be labeled by a Party Name,
unless the Party Name Type it is an example of is
subject to a Party Name Assignment to the Party
Type embodied in the Party involved.
Translation: Only certain kinds of Party Names can be used for
each kind of Party.
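Because this constraint is enforced in program code rather than in the database structure, a sketch of that code may help. The Python below assumes the Party Name Assignment examples shown earlier (Given Name to Person, Corporate Official Name to Company, Location Name to Government). It also assumes a hypothetical mapping from each Party to the Party Type it is an example of.

```python
# Party Name Assignments: which Party Name Types may be used for which Party Types.
NAME_ASSIGNMENTS: set = {
    ("Given Name", "Person"),
    ("Corporate Official Name", "Company"),
    ("Location Name", "Government"),
}

# Hypothetical: the Party Type that each Party is an example of.
PARTY_TYPE_OF = {"George Smith": "Person", "IBM": "Company"}


def may_label(party: str, name_type: str) -> bool:
    """A Party may be labeled by a Party Name only if its type is assigned to the Party's type."""
    return (name_type, PARTY_TYPE_OF[party]) in NAME_ASSIGNMENTS


print(may_label("IBM", "Corporate Official Name"))           # True
print(may_label("George Smith", "Corporate Official Name"))  # False
```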
Summary
Yes, architecture modelers, you can create an entity/relationship model
in UML and have it meet all your requirements—if you’re willing to
adjust your way of looking at the problem (and your sense of aesthetics)
just a little. And yes, UML modelers, you can create a genuine
entity/relationship model and present it to business people—if you’re
willing to adjust your way of looking at the problem (and your sense of
aesthetics), just a little.
A model developed according to the principles described in this book is
an architectural entity/relationship model. It remains to be converted to
an object-oriented design model and/or a database design model.
But lest we get too wrapped up in the perfection of our notation or our
approach, we should remember:
“Essentially, all models are wrong, but some are useful.”53
53
Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building
and Response Surfaces. Wiley. p. 424.
Appendix A: A Brief Summary of the Approach
1. Show domain-specific entity classes only. Consider only classes
that are collections of things of significance to the enterprise or
the domain being addressed. These are referred to here as entity
classes.
2. Use UML symbols selectively:
Use the following common symbols from UML:
Class (entity class)
Attribute
Association (relationship)
Cardinality for attributes and relationships
Exclusive or (xor) constraint
Use some UML-specific symbols (with care)
Enumeration
Derived attributes
Package
Do not use any other UML symbols
Abstract Entities
Behavior
Composition
Navigation
Ordered
Visibility
[Figure: timeline of the industry’s evolution. One track covers data processing, time-sharing, personal computers, and graphical user interfaces. A second covers programming languages (FORTRAN, COBOL, PL/I, Pascal, Simula 67, Smalltalk, C++, Java), structured programming, design, and analysis, object-oriented analysis and design, design patterns, and UML. A third covers data modeling: Bachman (IDMS), CODASYL, relational theory, the Three-Schema Architecture, Chen, NIAM/ORM, IDEF1X, Barker/Ellis, Information Engineering, the Zachman Framework, business process re-engineering, business rules, data quality, data model patterns, and data architecture. The “impedance mismatch” between the object and data communities is also marked.]
Data Processing
The first widely-accepted term for this industry was “data processing”,
with an emphasis on “processing”. The first widely accepted languages
were algorithmic languages focused on the work done by computers,
with data being a kind of “raw material” for the processes.
54
Wikipedia. 2009. “Fortran”. Retrieved October 9, 2010 from
http://en.wikipedia.org/wiki/Fortran.
Through FORTRAN 77, the language name was spelled in all
capitals. Since Fortran 90, it has been shown in upper and lower case.
[Wikipedia. Retrieved July 17, 2011 from
http://en.wikipedia.org/wiki/Fortran#cite_note-0.]
55
Wikipedia. 2009. “COBOL”. Retrieved October 9, 2010 from
http://en.wikipedia.org/wiki/Cobol.
56
Grady Booch. 1994. Object-oriented Analysis and Design with Applications.
Redwood City, California: The Benjamin/Cummings Publishing
Company, Inc. Pp. 16-17.
57
Wikipedia. 2010. “Simula”. Retrieved September 15, 2010 from
http://en.wikipedia.org/wiki/Simula-67.
Structured Techniques
Computer programming languages proved to be wonderful intellectual
tools for solving problems. They each provided a language in which
almost anything could be specified—unfortunately. Very quickly
programs became unmanageably complex, to the point where it was
often impossible to prove whether a program actually did what it was
supposed to do. Initially, there was no guidance as to how programs in
any language should be organized.
Structured Programming
In 1976, Edsger Dijkstra published A Discipline of Programming, the
culmination of many years’ work on developing structured
programming.59 This approach to programming required all system logic
to be expressed in terms of sequence, selection (“IF/THEN/ELSE”), and
iteration structures. Doing so makes it possible to reason about, and
even prove, the correctness of a program. Among other things, this
approach eliminated the “GOTO” statements that were a significant
contributor to program complexity. Having these laced throughout a
program made it extremely difficult to understand its logic. As one
anonymous wag once said, though, “It’s not the ‘GOTO’ statements that
are the problem. It’s the fact that there is no ‘COMEFROM’ statement.”
Under structured programming, blocks of code are treated together and
either executed or not, according to easy-to-read conditions.
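As a minimal illustration, the Python function below uses only the three control structures that structured programming allows: sequence, selection (IF/THEN/ELSE), and iteration. Each block has a single entry and a single exit, so there is no “GOTO” to trace. The function name, its parameters, and the discount rule are hypothetical, chosen only for the example.

```python
def order_total(quantities, unit_prices, discount_threshold=100.0, discount_rate=0.1):
    """Total an order using only sequence, selection, and iteration.

    The discount rule (a percentage off orders above a threshold) is a
    hypothetical example, not taken from the text.
    """
    # Sequence: statements executed one after another.
    total = 0.0

    # Iteration: one loop with a single entry and a single exit.
    for qty, price in zip(quantities, unit_prices):
        total += qty * price

    # Selection: IF/THEN/ELSE, with both branches rejoining afterward.
    if total > discount_threshold:
        discount = total * discount_rate
    else:
        discount = 0.0

    # Single exit point for the whole routine.
    return total - discount
```

For example, `order_total([2, 3], [10.0, 5.0])` returns 35.0, and an order totalling 200.0 is reduced by the hypothetical 10% discount.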
58
Grady Booch, 1994, Using the Booch Method: A Rational Approach
(Benjamin-Cummings Pub Co).
59
Edsger W. Dijkstra, 1976, A Discipline of Programming (New York:
Prentice-Hall Series in Automatic Computation).
Structured Design
Based on these insights, Ed Yourdon and Larry Constantine
subsequently introduced the world to structured design.60 This provided
guidance for organizing programs into coherent modules. The approach
addressed the problem of how to divide a large program design into
digestible modules. There were three major components to the approach:
Begin any programming effort by designing, via a drawing, the
successive levels of program structure. The drawing consists of a
box for each program module and a line connecting two boxes
for each subroutine call. Design the “top” module first. This
consists of the calls to component modules. All next-level
modules are simply stubs that can be called, receive parameter
values, and return dummy parameter values. Once all the
problems associated with setting up these subroutine calls have
been dealt with, flesh out the first module at the next level,
and similarly call the next level of subroutines, also initially just
stubs. Write programs in the same order, fixing the structure of
subroutine calls at the first level before moving to successive
levels.
The point of this is that errors in interfaces are much more
difficult to identify and correct than are errors in the code
carrying out the program’s functions. Address the design of those
interfaces first.
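The top-down order of work can be sketched in Python. The “top” module is written first and calls next-level modules that are only stubs: they accept the agreed parameters and return dummy values, so the calling interfaces can be exercised before any real logic exists. All module names here (`process_order`, `validate_order`, `price_order`, `record_order`) are hypothetical.

```python
# Top module: written first. Its only job is to call the next level.
def process_order(order):
    if not validate_order(order):
        return {"status": "rejected"}
    total = price_order(order)
    confirmation = record_order(order, total)
    return {"status": "accepted", "total": total, "confirmation": confirmation}


# Next-level modules: initially stubs that accept the agreed parameters
# and return dummy values, so the interfaces can be tested first.
def validate_order(order):
    return True          # stub: accept every order for now


def price_order(order):
    return 0.0           # stub: dummy price


def record_order(order, total):
    return "CONF-0000"   # stub: dummy confirmation number


# Once the interfaces work, each stub is fleshed out in turn, for example:
# def price_order(order):
#     return sum(line["qty"] * line["price"] for line in order["lines"])
```

Replacing a stub later does not change `process_order`, which is the point of the approach: the structure of calls is settled before the level below it is written.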
60
Edward Yourdon and Larry L. Constantine, 1979, Structured Design:
Fundamentals of a Discipline of Computer Program and Systems Design.
(Englewood Cliffs, NJ: Prentice Hall). (Widely circulated as an
unpublished manuscript in 1976.)
61
Ibid. 85.
62
Ibid. 106.
Data Architecture
CODASYL
During the mid-1960s, while he was working for General Electric,
Charles Bachman developed the Integrated Data Store (IDS), one of the
first database management systems. B.F. Goodrich subsequently adapted it
to create the Integrated Database Management System (IDMS), which was
later marketed by Cullinane. This used a network structure, which made
use of programmed, hard-coded links between records. This meant that
database structures were invariably very complex and difficult to manage,
which further meant, among other things, that they were difficult to change.
It also meant that traversing the network along its predefined links was
economical enough, but if you wanted to search based on the values of
“leaf” nodes, it could get very expensive.63
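The trade-off can be illustrated with a toy model of a network database, in which each owner record holds hard-coded links to its member records. Following the links from an owner is cheap, but finding the owners of members with a particular value means visiting every record. This is a hypothetical sketch in Python; the class and function names are illustrative, and it does not reproduce the actual IDS or IDMS data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    product: str


@dataclass
class Owner:
    po_number: str
    members: list = field(default_factory=list)  # hard-coded links to members


def members_of(owner):
    """Navigational access: simply follow the owner's links."""
    return [m.product for m in owner.members]


def owners_with_product(owners, product):
    """Search on a 'leaf' value: no link leads there, so scan everything.

    Returns the matching owners and the number of member records visited.
    """
    visited = 0
    found = []
    for owner in owners:
        for m in owner.members:
            visited += 1
            if m.product == product:
                found.append(owner.po_number)
                break  # this owner matches; move on to the next one
    return found, visited
```

With three purchase orders, looking up one order's items touches only that order's records, while searching for the orders containing a given product may visit every member record in the database; the cost of the search grows with the size of the whole database.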
The descriptions of various data modeling techniques are taken from
David C. Hay. 2003. Op. cit. Pp. 348-387. See Appendix A of that
book for the complete comparison of the data modeling notations.
63
Your author learned this the hard way in 1971 when using RAMIS, a
database system based on a hierarchical architecture. Its query
language, to be used interactively in a time-sharing environment, was
Some wags have suggested that the terminology could have been
based on Richard Nixon’s “normalizing relations” with China. But as
it happens, his ground-breaking visit didn’t take place until 1972. No,
the terms here are all derived from mathematics.
66
Curt Monash, 2005, “Is Ingres a Serious Player? First Some
History…” Computerworld. June 2, 2005.
67
American National Standards Institute (ANSI), 1975,
“ANSI/X3/SPARC Study Group on Data Base Management
Systems; Interim Report.” FDT (Bulletin of ACM SIGMOD) 7:2.
68
David C. Hay, 2003, Requirements Analysis: From Business Views to
Architecture (Englewood Cliffs, NJ: Prentice Hall PTR).
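[Figure: The three-schema architecture. External Schemas 1, 2, and 3 map to a single Conceptual Schema, which is implemented through Logical Schemas (for example, relational and XML), each with its own Physical Schema; together the logical and physical schemas make up the Internal Schema.]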
69
David C. Hay, 2003, Requirements Analysis: From Business Views to
Architecture (Englewood Cliffs, NJ: Prentice Hall PTR).
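[Figure: The example model in Chen’s notation, showing entity types including CATALOGUE ITEM, PRODUCT, SERVICE, PERSON, and ORGANIZATION, with attributes such as PO number, order date, line number, quantity, actual price, party ID, name, surname, and corporate mission.]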
This same example will be used to demonstrate all the techniques that
follow. The model shows entity types, attributes, and relationships. It also
has examples of both a super-type/sub-type combination and a
constraint between relationships.
In the diagram, each PURCHASE ORDER is related to a single PARTY and to
one or more line items, each of which is for either one PRODUCT or one SERVICE.
The diagram also includes two entity types (EVENT and EVENT
CATEGORY) in an unusual relationship. In most “one-to-many”
relationships, the “one” side is mandatory (“ . . . must be exactly one”),
while the “many” side is optional (“ . . . may be one or more”). In this
example, the reverse is true: Each EVENT may be in one and only one
EVENT CATEGORY (zero or one), and each EVENT CATEGORY must be a
classification for one or more EVENTS (one or more). That is, an EVENT may
exist without being classified, or it may be in one and only one EVENT
CATEGORY. An EVENT CATEGORY can come into existence, however,
only if there is at least one event to put into it.
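The unusual cardinality can be expressed as a small validation sketch in Python. An EVENT may have zero or one EVENT CATEGORY, but an EVENT CATEGORY is valid only if at least one EVENT refers to it. The class and function names are hypothetical, and the sketch assumes that category codes are unique.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventCategory:
    code: str


@dataclass
class Event:
    code: str
    category: Optional[EventCategory] = None  # zero or one category


def invalid_categories(categories, events):
    """Return the codes of categories that classify no event.

    Each EVENT CATEGORY must be a classification for one or more EVENTS,
    so any category returned here violates the model. Unclassified
    events are allowed and are not reported.
    """
    used = {e.category.code for e in events if e.category is not None}
    return [c.code for c in categories if c.code not in used]
```

An unclassified EVENT passes the check, while an EVENT CATEGORY with no EVENTS in it is reported as a violation: the reverse of the usual “one-to-many” pattern.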
Table B-1 shows the standard elements that go into any data model, and
the particular way each is addressed in Dr. Chen’s notation.
70
Peter Chen. 1977. Op cit.
71
Matthew Flavin, 1981, Fundamental Concepts of Information
Modeling (New York: Yourdon Press).
72
R. G. Brown, 1993, “Data Modeling Methodologies – Contrasts in
Style”, Handbook of Data Management. (Boston: Auerbach).
Business Analysis
Structured Analysis
If Dr. Dijkstra’s structured programming described the best practices for
arranging program code, and Messrs. Yourdon’s and Constantine’s
structured design described the best practices for organizing program
modules, then it was structured analysis, developed by Tom DeMarco73
as well as by Chris Gane and Trish Sarson,74 that addressed the question of
what programs should be written in the first place. In two different
books, they introduced data flow diagrams: graphic representations of the
way a business processes data. The two books used different symbols,
but presented fundamentally the same concepts. The diagrams were
expressed in terms of:
Processes, where a process transforms one or more input
kinds of data into one or more output kinds of data,
External entities, which are the organizational units that are the
ultimate sources and destinations of all data,
Data stores, which represent data held between processes, as in
“pending purchase orders”, for example, and
Data flows, the links between all of these elements.
All data originate from one or more external entities and then pass, via
data flows, to processes, and from there to other processes, data
stores, or external entities.
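The rules for connecting these elements can be checked mechanically. The Python sketch below assumes the conventional data flow diagram rule that every data flow must have a process at one end or the other; that is, data cannot move directly between external entities and data stores without a process in between. The element names and the function are hypothetical.

```python
PROCESS, EXTERNAL, STORE = "process", "external entity", "data store"


def invalid_flows(elements, flows):
    """Return the flows that do not touch a process at either end.

    elements maps each element name to its kind; flows is a list of
    (source, destination) pairs.
    """
    return [(src, dst) for src, dst in flows
            if elements[src] != PROCESS and elements[dst] != PROCESS]
```

For example, a flow from a customer (external entity) to a “Receive order” process is valid, as is a flow from that process into “pending purchase orders” (a data store); a flow straight from the customer into the data store would be flagged.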
73
Tom DeMarco, 1978, Structured Analysis and System Specification
(Englewood Cliffs, NJ: Prentice-Hall).
74
Chris Gane and Trish Sarson, 1979, Structured Systems Analysis:
Tools and Techniques (Englewood Cliffs, NJ: Prentice Hall).
75
Stephen McMenamin and John Palmer, 1984, Essential Systems
Analysis. (Englewood Cliffs, NJ: Yourdon Press).
76
Michael Hammer and James Champy, 1993, Reengineering the
Corporation: A Manifesto for Business Revolution (New York:
Harper Business).
an enterprise (wiping the slate clean) was necessary to lower costs and
increase quality of service and that information technology was the key
enabler for that radical change. Hammer and Champy felt that the design
of workflow in most large corporations was based on assumptions about
technology, people, and organizational goals that were no longer valid.
They suggested seven principles of reengineering to streamline the work
process and thereby achieve significant levels of improvement in quality,
time management, and cost:
1. Organize around outcomes, not tasks.
2. Identify all the processes in an organization and prioritize them in
order of redesign urgency.
3. Integrate information processing work into the real work that
produces the information.
4. Treat geographically dispersed resources as though they were
centralized.
5. Link parallel activities in the workflow instead of just integrating
their results.
6. Put the decision point where the work is performed, and build
control into the process.
7. Capture information once and at the source.
By the mid-1990s, BPR had gained the reputation of being a nice way of
saying “downsizing.” According to Hammer, lack of sustained
management commitment and leadership, unrealistic scope and
expectations, and resistance to change prompted management to
abandon the concept of BPR and embrace the next new methodology,
enterprise resource planning (ERP).77
Note that this did not come out of the technology world, so the similarity
of the Business Process Re-engineering approach to data flow diagrams is
coincidental. Its orientation towards results rather than processes, and its
lack of discipline in the notation itself, are a result of the different origins
of the two approaches.
77
This section is taken from: SearchCIO.com. 2001, “Business Process
Engineering”. Available at:
http://searchcio.techtarget.com/definition/business-process-reengineering
78
James Martin and Clive Finkelstein, Nov. 1981, “Information
Engineering”, Technical Report, two volumes (Carnforth, Lancs, UK: Savant
Institute).
79
James Martin and Carma McClure 1985, Diagramming Techniques
for Analysts and Programmers (Englewood Cliffs, NJ: Prentice Hall).
80
Clive Finkelstein, 1989, An Introduction to Information Engineering:
From Strategic Planning to Information Systems (Sydney: Addison-Wesley).
[Figure: The stages of system development: Strategic Planning and Requirements Analysis (about the business); Physical Design and Construction (about the computer); Transition and Production (making it all work).]
81
Oracle Corporation, 1986, “Strategic Planning Course”,
SQL*Development Method (Belmont, CA: Oracle Corporation).
Because of the dual origin of the techniques, there are minor variations
between Mr. Finkelstein’s and Mr. Martin’s notations. The Information
Engineering version of our test case (with some of the notations from
each version) is shown in Figure B-6.
In the example, each PARTY is the vendor in zero, one, or more PURCHASE
ORDERS, each of which initially has zero, one, or more LINE ITEMS, but
eventually must have at least one LINE ITEM. Each LINE ITEM, in turn,
is for either exactly one PRODUCT or exactly one SERVICE. Also, each
EVENT is classified by zero or one EVENT TYPE, while each EVENT TYPE must
classify one or more EVENTS.
[Figure B-6: The example model in Information Engineering notation.]
Table B-5 shows the elements common to all data models and the
particular way Information Engineering addresses each.
Note the symbol that combines the optional circle and the mandatory
bar (|O). This means “initially optional, but eventually mandatory”.
Dr. Finkelstein’s notation is the only modeling notation that can
express this. It is actually quite profound. Conceptually, the
relationship is mandatory, but in practice, it may be some time before
values are captured.
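One way to implement “initially optional, but eventually mandatory” is to let a PURCHASE ORDER exist with no LINE ITEMS while it is being prepared, and to enforce the minimum of one only when it reaches a later state. The Python sketch below assumes hypothetical status values (“draft”, “submitted”) and method names; Information Engineering itself does not prescribe a mechanism.

```python
class PurchaseOrder:
    def __init__(self, po_number):
        self.po_number = po_number
        self.line_items = []   # initially optional: zero items are allowed
        self.status = "draft"

    def add_line_item(self, item):
        self.line_items.append(item)

    def submit(self):
        # Eventually mandatory: at least one LINE ITEM is required here.
        if not self.line_items:
            raise ValueError(f"PO {self.po_number} has no line items")
        self.status = "submitted"
```

A draft order can be saved empty, but an attempt to submit it raises an error until at least one line item has been captured.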
82
Clive Finkelstein, 1992, Op. Cit. 24.
83
James Martin and Carma McClure, 1985, Op. Cit. 245.
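[Figure (detail): The example model in Barker-Ellis notation, showing PARTY (# Party ID, . Name), ORGANIZATION (. Corporate mission), PRODUCT (# Product code, . Description, . Unit price), and SERVICE (# Service ID, . Description, . Rate per hour), with relationships named “for” and “bought via”.]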
84
Richard Barker. 1990. Op. Cit.
85
Oracle Corporation. 1996. Custom Development Method
(Redwood Shores, CA: Oracle Corporation).
may be in one and only one EVENT TYPE, while each EVENT TYPE must
be a classification for one or more EVENTS.
Table B-6 shows the standard model elements for any data modeling
notation, along with the particular way they are addressed in the Barker-
Ellis notation.
Table B-6: Barker-Ellis Model Elements
Element | Barker-Ellis Approach
Entity Class | Round-cornered rectangles.
Attribute | Optionally displayed inside the entity boxes. Annotations represent: * - required; O - optional; # - part of primary unique identifier.
Relationship | Line between two entity classes. Binary only.
Relationship Name | Name on each end (each “role”). Preposition as a predicate. Structure: Each <subject entity class> must be || may be <relationship name> one and only one || one or more <object entity class>.
Relationship Cardinality (Minimum and Maximum) | Minimum: “may be” => dashed line next to subject entity class; “must be” => solid line next to subject entity class. Maximum: “one and only one” => (no symbol); “one or more” => “crow’s foot”.
Identifiers | Attributes identified by octothorpe (#). Roles identified by line across role attached to identified entity class.
Sub-types | Represented as boxes inside super-type boxes.
Inter-relationship Constraints | “Exclusive or” represented by arc across two or more relationships from one entity class.
Note that this was the first modeling approach to adopt a specific
discipline for naming relationships. The “verb phrase” is divided into a
form of “to be” (“must be” or “may be”) to denote minimum cardinality,
a prepositional phrase (to contain the meaning of the relationship) and a
phrase to express the maximum cardinality (“one and only one” and “one
or more”). The result is that each relationship in each direction
represents a strong assertion about the nature of the organization or
domain being modeled.
For example, among other things, the model in Figure B-7 asserts that
“each PURCHASE ORDER may be composed of one or more LINE ITEMS”,
and that “each LINE ITEM must be part of one and only one PURCHASE
ORDER”.
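Because the naming discipline is so regular, these assertions can be generated mechanically. The Python sketch below fills in the Barker-Ellis sentence structure from a relationship’s minimum cardinality, name, and maximum cardinality. The function is hypothetical, and its pluralization (simply appending “S”) is a deliberate simplification.

```python
def relationship_sentence(subject, mandatory, name, many, obj):
    """Build 'Each <subject> must be|may be <name> one and only one|one or more <object>.'"""
    minimum = "must be" if mandatory else "may be"
    maximum = "one or more" if many else "one and only one"
    target = obj + "S" if many else obj  # naive pluralization
    return f"Each {subject} {minimum} {name} {maximum} {target}."
```

Calling it once for each end of the PURCHASE ORDER / LINE ITEM relationship reproduces the two assertions quoted above.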
IDEF1X
IDEF1X is a data-modeling technique that is used by many branches of
the United States Federal Government.86 It was developed to support the
design of relational databases. It represents tables, with the relational
primary keys to represent unique identifiers and foreign keys to represent
relationships. It is therefore not suitable as a notation for conceptual
entity relationship models.
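To see what “representing tables” means in practice, here is a hedged sketch using Python’s built-in `sqlite3` module. The relationship between a purchase order and its line items appears only as a foreign key, and the line item’s unique identifier is a primary key that includes the purchase order’s key. The table and column names are illustrative assumptions, not taken from any IDEF1X model in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE purchase_order (
        po_number  TEXT PRIMARY KEY,
        order_date TEXT
    )""")
conn.execute("""
    CREATE TABLE line_item (
        po_number   TEXT NOT NULL REFERENCES purchase_order (po_number),
        line_number INTEGER NOT NULL,
        quantity    INTEGER,
        PRIMARY KEY (po_number, line_number)  -- identifier includes the parent key
    )""")

conn.execute("INSERT INTO purchase_order VALUES ('PO-1', '2011-07-01')")
conn.execute("INSERT INTO line_item VALUES ('PO-1', 1, 5)")


def insert_orphan_line_item():
    """Try to add a line item for a purchase order that does not exist."""
    try:
        conn.execute("INSERT INTO line_item VALUES ('PO-99', 1, 2)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
```

The relationship exists only as a column that must match a key in another table, which is exactly the design-level view IDEF1X captures, rather than a named, two-way assertion about the business.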
86
Thomas Bruce, 1992, Designing Quality Databases with IDEF1X
Information Models. (New York: Dorset House).
87
Terry Halpin, 2008, Information Modeling and Relational Databases, Second
Edition (Boston: Morgan Kaufmann Publishers).
88
David C. Hay, 1995, Data Model Patterns: Conventions of Thought (New
York: Dorset House).
Architecture Frameworks
89
L. Silverston, 2001, The Data Model Resource Book, Volume 1: A library
of Universal Data Models for All Enterprises (New York: John Wiley &
Sons).
90
L. Silverston, 2001, The Data Model Resource Book, Volume 2: A library of
Universal Data Models by Industry Types (New York: John Wiley &
Sons).
91
L. Silverston and Paul Agnew, 2009, The Data Model Resource Book,
Volume 3 (Indianapolis, IN: Wiley Publishing, Inc).
92
John Zachman, 1987, Op. Cit.
93
John. F. Sowa, and John A. Zachman, 1992, Op. Cit.
94
David C. Hay, 2003, Requirements Analysis: From Business Views to
Architecture. (Englewood Cliffs, NJ: Prentice Hall PTR).
95
David C. Hay, 2006, Data Model Patterns: A Metadata Map. (Boston:
Morgan Kaufmann).
This is, of course, what the people commonly referred to as
“analysts” should be called. But that is probably a lost cause. After
all, who can say “systems synthesist”?
Business Rules
Among the other results of Messrs. Zachman’s and Sowa’s work was the
development of the business rules movement, which took on the issues
raised by the business and the natural constraints of operating an
enterprise. In Architecture Framework terms, business rules are
motivation (“why?”) issues that, as with the other dimensions, look
different from the point of view of the CEO, business owners, architects,
designers, programmers, and maintainers.
96
John Zachman and Stan Locke, 2008, “The Zachman Enterprise
Framework”. More information available at
http://zachmanframeworkassociates.com.
97
The Business Rules Group, 2000, “Defining Business Rules ~ What
Are They Really?” 4th ed., formerly known as the “GUIDE Business
Rules Project Report,” (1995). Available at
http://www.BusinessRulesGroup.org
Data Management
The motivation for all the efforts to rationalize data storage, of course,
was the requirement to improve the quality of data that are used by the
business—both to carry out its operations and to plan for the future.
Before companies were so interconnected and so dependent on
computer technology, data were as accurate as they had to be to carry out
specific tasks. Beginning with the advent of manufacturing planning
systems in the early 1970s, which depended on accurate order and
inventory data to work, the demand for data quality grew faster than
technology could keep up with. If the wrong product is shipped
or raw materials are not available for manufacturing, this can be very
expensive for the organization.
Achieving data quality has four parts:
98
Object Management Group, 2008, “The Semantics of Business
Vocabulary and Business Rules” (SBVR), OMG Available
Specification formal/2008-01-02.
Object-oriented Development
A second major thread in the history of information technology has been the
phenomenal growth in the application of object-oriented design and
programming. By the mid 1980s, the object-oriented community began
to take an interest in modeling data to support design and programming.
99
Sally Shlaer and Stephen J. Mellor, 1988, Object-oriented Systems
Analysis: Modeling the World in Data (Englewood Cliffs, NJ:
Yourdon Press).
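[Figure (detail): The example model in Shlaer-Mellor notation, showing LINE ITEM (* PO number (R1), * Line number, . Item number (R1), . Quantity, . Actual price) with “is a” sub-type links, and EVENT (* Code, . Description) related to EVENT CATEGORY (* Code, . Description) by “is of” and “includes”, with a conditional (C) marker.]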
100
P. Chen, 1977, Op. cit.
101
James Martin, 1977, Computer Data-Base Organization (Englewood
Cliffs, NJ: Prentice Hall).
102
Sally Shlaer and Stephen J. Mellor, 1988, Op. Cit.
Note that in the object-oriented world, a “relationship” is called an
“association”.
[Figure (detail): The example model, showing the classes Person, Organization, Product, and Service.]
103
Peter Coad and Edward Yourdon. 1990. Object-oriented Analysis.
Englewood Cliffs, NJ: Yourdon Press/Prentice Hall.
104
Jim Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy,
and William Lorensen, 1991, Object-Oriented Modeling and Design
(Englewood Cliffs, NJ: Prentice Hall).
other than being itself. For example, relative to a Party, a Purchase Order
simply plays the role of being an “order”.
This issue is discussed extensively in Chapters 2 and 3 of the main body
of this book.
[Figure (detail): Association roles “purchasing vehicle for”, “purchases”, and “purchased thing”.]
Embley/Kurtz/Woodfield (1992)
Many of the notation elements that later became part of UML were first
published by David Embley, Barry Kurtz, and Scott Woodfield in 1992,
in their book, Object-oriented Systems Analysis. This introduced a more
elegant notation for cardinality than was used in ORT, although
relationships were only named in one direction. This book, among other
things, was an assertion that the modeling used to support object-oriented
design could be used to support business systems analysis. The book
made no mention, however, of the entity/relationship work that had
preceded it. Figure B-11 shows our example using their notation.
“The basic underlying concepts that make object-oriented languages and
object-oriented design successful may also be used to enhance systems
analysis. These concepts include developing highly cohesive but
independent object classes, viewing an object not only as having static
information but also as being able to act and to be acted upon, and
exploiting powerful abstraction concepts such as aggregation,
classification, and generalization for describing many important, but
often overlooked relationships among system components.”105
105
David W. Embley, Barry D. Kurtz, and Scott N. Woodfield, 1992, Object-oriented
Systems Analysis: A Model-driven Approach (Englewood Cliffs,
New Jersey: Prentice-Hall). xv.
Booch (1994)
In 1994, Grady Booch published Object-oriented Analysis and Design: With
Applications.106 The example with his notation is shown in Figure B-12.
The way each element is treated is shown in Table B-10.
“Ultimately I am a developer, not just a methodologist. The first question
you should ask any methodologist is if he or she uses their own methods
to develop software”. [Booch, 1994. Page vi.]
106
Booch, Grady. 1994. Object-oriented Analysis and Design: With
Applications (Redwood City, California: The Benjamin/Cummings
Publishing Company). 16-17.
Object Patterns
At the same time that data modeling patterns were being developed, the
object community developed some patterns of its own.
Design Patterns
In 1995, the year that David Hay published Data Model Patterns:
Conventions of Thought, to describe patterns of business data modeling, Erich
Gamma and three colleagues published Design Patterns: Elements of Reusable
Object-Oriented Software.107 These patterns, illustrated with C++ and Smalltalk
code, are designed to be widely applicable.
107
Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides,
1995, Design Patterns: Elements of Reusable Object-Oriented Software
(Reading, MA: Addison-Wesley Publishing Company).
108
Martin Fowler, 1997, Analysis Patterns (Reading, MA: Addison-Wesley).
UML
In 1994, Rational Software Corporation hired James Rumbaugh to join
Grady Booch and the two promoted their two approaches to “object-
oriented modeling”. Mr. Rumbaugh’s Object Modeling Technique was
marketed to support “object-oriented analysis” and Mr. Booch’s “Booch
Method” was promoted for “object-oriented design”. At this point,
however, they also began work to develop an approach that would unify
their two approaches. In 1995, they were joined by Ivar Jacobson, who
had developed the “Object-oriented Software Engineering” method.
In 1996, the “Three Amigos” began work on the truly “Unified Modeling
Language” that became UML. An international consortium, called the
UML Partners, was organized at this time to complete the Unified Modeling
Language (UML) specification and propose it as a response to a “Request
for Proposal” that had been published that year by the Object
Management Group (OMG).
“The Unified Modeling Language” (originally known as “The UML”, but
subsequently shortened to simply “UML”) brought together many
different techniques, including the object modeling techniques described
above. The part of the UML suite of notations called the “Class Model”
provided a single technique for modeling classes and associations. It
combined the cardinality notation developed by Messrs. Embley, Kurtz, and
Woodfield with the segmented rectangles from Mr. Rumbaugh’s Object
Modeling Technique.
In addition to the “Class Model”, the UML umbrella incorporated a
number of techniques for modeling other aspects of the system
development process:
Use cases describing user interaction with a system
Process flows
Events and responses
Software components
109
Object Management Group (OMG), 1997, Op. cit.
110
Object Management Group (OMG). 2005a. “UML 2.0: Infrastructure” (OMG document 05-07-05).
Published as http://www.omg.org/docs/formal/05-07-05.pdf.
111
Object Management Group. 2005. “UML 2.0: Superstructure” (OMG
document 05-07-04). Published as
http://www.omg.org/docs/formal/05-07-04.pdf.
[Figure: Timeline of the developments described in this appendix, extended to include the ARPANET, the Internet, the World Wide Web, and the Semantic Web.]
Computer Time-sharing
Computer time-sharing, created in 1961, made it possible for a person to
sit at a typewriter-like device and send messages directly to a computer that,
thanks to the telephone network, could be far away. The computer would
then send responses directly back to that person, across the same
network. More significantly, multiple people could be working on the
same computer at the same time, with each imagining that he or she was
using it exclusively. Programs could be written, tested, and results examined,
all while sitting at the remote device. Most significantly, the resulting
programs could then be turned over to end-users who could then sit in
front of similar devices and have the same interactive experience of
entering data and then examining the results of processing them.
It would take a decade or so before the concept of “user interface”
entered the language of the industry, but its origins were here.
ARPANET
“The first recorded description of the social interactions that could be
enabled through networking was a series of memos written by J.C.R.
Licklider of MIT in August 1962 discussing his ‘Galactic Network’
concept. He envisioned a globally interconnected set of computers
through which everyone could quickly access data and programs from
any site. In spirit, the concept was very much like the Internet of
today.”114
The only physical network available at the time for connecting computers
was the telephone system. This was a circuit-switched network, which
required a dedicated physical connection between two devices for as long
as they communicated, no matter how far apart they were. This meant that the
network was extremely vulnerable to weather and natural forces that
could disrupt it. More significantly, during the Cold War, it was also
vulnerable to enemy attacks. For this reason, the Department of Defense
Advanced Research Projects Agency (DARPA, at various times called
simply ARPA) sought a way to build a network that would not be
vulnerable in this way.
112
Errol Morris. 2011. “Did My Brother Invent E-Mail With Tom Van
Vleck?”. The New York Times. Five articles, June 19-23, 2011.
113
Tom Van Vleck. “The History of Electronic Mail”. Undated
manuscript reproduced at http://www.multicians.org/thvv/mail-history.html.
114
Barry M. Leiner, V. G. Cerf, D. D. Clark, R. E. Kahn, L. Kleinrock,
D. C. Lynch, Jon Postel, L. G. Roberts, S. Wolff. “Histories of the Internet”.
Internet Society. Retrieved 7/24/2011 from
http://www.isoc.org/internet/history/brief.shtml.
115
Stephen Segaller. Nerds 2.0.1: A Brief History of the Internet. (New York:
TV Books). 33.
116
Bradley Mitchell. 2011. “What Is Packet Switching on Computer
Networks?” About.com Wireless/Networking. Retrieved 7/27/2011
from
http://compnetworking.about.com/od/networkprotocols/f/packet-switch.htm.
117
Wikipedia. 2011. “Communications Protocol”. Retrieved 8/4/2011
from http://en.wikipedia.org/wiki/Communications_protocol.
Thus, each packet finds its own way to its destination and is reunited
with its fellows only when all the packets that constitute a message get
there.
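The idea can be sketched in a few lines of Python: a message is split into numbered packets, the packets arrive in an arbitrary order (simulated here by shuffling, standing in for different routes), and the receiver reassembles them by sequence number once all have arrived. The packet size and function names are hypothetical; real protocols such as TCP/IP add headers, acknowledgments, and retransmission.

```python
import random


def to_packets(message, size=4):
    """Split a message into (sequence number, payload) packets."""
    return [(i, message[start:start + size])
            for i, start in enumerate(range(0, len(message), size))]


def reassemble(packets, expected_count):
    """Rebuild the message once every packet has arrived."""
    if len(packets) < expected_count:
        return None  # still waiting for stragglers
    return "".join(payload for _, payload in sorted(packets))


packets = to_packets("Each packet finds its own way.")
arrived = packets[:]
random.shuffle(arrived)  # packets take different routes and arrive out of order
```

Whatever order `arrived` ends up in, sorting by sequence number restores the original message, and nothing is delivered until the last packet shows up.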
Recognizing “the theoretical feasibility of communications using packets
rather than circuits, was a major step along the path towards computer
networking…In late 1966 Lawrence G. Roberts went to DARPA to
develop the computer network concept and quickly put together his plan
for the ARPANET, publishing it in 1967.”118
One other problem that had to be dealt with was the fact that the
computers involved in all of this communication were made by different
manufacturers and had different languages for communicating with the
outside world.
An Interface Message Processor (IMP) was a small computer that translated
between the network’s common communication language and the language
required by the one local host machine attached to it.
In September 1969, the company Bolt Beranek and Newman (BBN)
installed the first IMP at UCLA, and the first host computer was
connected to a network. Stanford Research Institute (SRI) provided a
second node. By the end of 1969, four computers were attached. They
had a network.
Elaborating on the definition above, a protocol (or, more formally, a
communications protocol) is “a formal description of digital message
formats and the rules for exchanging those messages in or between
computing systems and in telecommunications.
“Protocols may include signaling, authentication and error detection and
correction capabilities.
“The specified behavior is typically independent of how it is to be
implemented. A protocol can therefore be implemented as hardware or
software or both.”119
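To make the definition concrete, the sketch below defines a tiny, hypothetical message format (a 4-byte length, a 4-byte CRC-32 checksum, then the payload) and the rule for receiving it: reject any message whose length or checksum does not match. It uses only Python’s standard `struct` and `zlib` modules, and it illustrates the idea of a protocol rather than any ARPANET or Internet protocol.

```python
import struct
import zlib

HEADER = struct.Struct("!II")  # network byte order: length, CRC-32


def encode(payload: bytes) -> bytes:
    """Frame a payload: header (length, checksum) followed by the data."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode(frame: bytes) -> bytes:
    """Unframe a message, applying the error-detection rule."""
    length, checksum = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ValueError("corrupted or truncated message")
    return payload
```

Because the format is fixed and independent of any particular machine, the same rules could be implemented in hardware, in software, or both, which is the point of the quoted definition.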
“In December 1970 the Network Working Group (NWG) working
under S. Crocker finished the initial ARPANET Host-to-Host protocol,
called the Network Control Protocol (NCP). As the ARPANET sites
118
Barry M. Leiner et al. Op. cit.
119
Wikipedia. 2011. “Communications Protocol”. Op. cit.
The Internet
The original ARPANET grew from a network for government agencies
and universities into the more universally available Internet we know
today. “The Internet was based on the idea that there would be multiple
independent networks of rather arbitrary design. This began with the
120
Barry M. Leiner et al. Op. cit.
121
Ibid.
This can be seen at
http://sloan.stanford.edu/MouseSite/1968Demo.html. It is
definitely worth an hour of your time.
122
Ian Peter. 2011. “The History of Email”. Net History. Retrieved
7/27/2011 from http://www.nethistory.info/History of the
Internet/email.html.
123
Barry M. Leiner et al. Op. cit.
124
Barry M. Leiner et al. Op. cit.
125
Vinton G. Cerf, Robert E. Kahn, “A Protocol for Packet Network
Intercommunication”, IEEE Transactions on Communications, Vol.
22, No. 5, May 1974, pp. 637-648.
“The term packet applies to any message formatted as a packet, while
the term datagram is generally reserved for packets of an “unreliable”
service. An “unreliable” service does not notify the user if delivery
fails.” [Wikipedia, “Datagram”]
126
Wikipedia. 2011. “Internet Protocol”. Retrieved 7/29/2011 from
http://en.wikipedia.org/wiki/Internet_Protocol#Version_history.
As of 2011, this is still being implemented.
It also resulted in the wide distribution of buttons saying “I survived the
TCP/IP transition”.
127
Barry M. Leiner et al. Op. cit.
128
Tim Berners-Lee. 2000. Weaving the Web: The Original Design and
Ultimate Destiny of the World Wide Web. (New York: HarperCollins). 4.
129
Ibid. 15.
130
Vannevar Bush. 1945. “As We May Think”. The Atlantic. July, 1945.
Retrieved 8/2/2011 from
http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/3881/.
131
Lauren Wedeles. 1965. “Professor Nelson Talk: Analyzes P.R.I.D.E.”
Vassar Miscellany News. Retrieved 8/2/2011 from
http://faculty.vassar.edu/mijoyce/MiscNews_Feb65.html.
132
Ibid., 5-6.
133
Ibid. 16.
To make the address more meaningful to the humans who are looking
for information there, however, Mr. Berners-Lee imagined a Universal
Resource Identifier (URI) that would identify each web site. This would
be a meaningful set of words providing a human-readable label as a
synonym for that IP address; such a label is also referred to as a domain
name. The group of colleagues he gathered to manage the World Wide
Web (the World Wide Web Consortium, or W3C) vetoed that term, and
the label has been known ever since as the Uniform Resource Locator
(URL). This is how the hypertext-equipped World Wide Web refers to
other “documents” to be retrieved as part of a given document.
Mapping the domain name in a URL to an IP address is done via the
Domain Name System (DNS). Note that a URL consists of four
components. For example, the URL
http://henriettahay.com/women/99mar12.htm consists of:
http:// - “tells your PC what protocol (what language so to
speak) to use talking with this site. In this case, you are using
HTTP (HyperText Transfer Protocol).”134
henriettahay.com - the domain name. This is the “name” of
the site: a unique name, understandable to humans, ending
with “.com”, “.edu”, “.gov”, etc. This is what the Domain Name
System maps to an IP address. In this case, the site belongs to
Henrietta Hay, a columnist for the Daily Sentinel of Grand Junction,
Colorado. It is the web site where the columns she has written are kept.
/women - one or more directories on the web site. The web
request directs the web server to go to this directory to find the
page. In this example, “/women” is the directory on Henrietta
Hay’s site where you will find her columns on the subject of
women’s politics.
99mar12.htm - the page itself: the file in that directory that the
web server is asked to return.
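The four-part breakdown of the example URL can be reproduced with Python's standard `urllib.parse` module. The lookup table is a hypothetical stand-in for the Domain Name System: the address in it comes from the 192.0.2.0/24 block reserved for documentation, not from any real DNS record, and the names `url_components`, `resolve`, and `TOY_DNS` are illustrative only. A real program would ask a resolver instead, for example through `socket.getaddrinfo`.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical stand-in for DNS: domain name -> IP address.
# 192.0.2.x is reserved for documentation; this is NOT the site's real address.
TOY_DNS = {"henriettahay.com": "192.0.2.10"}


def url_components(url: str) -> dict[str, str]:
    """Split a URL into protocol, domain name, directory, and page."""
    parts = urlsplit(url)
    directory, page = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme + "://",
        "domain": parts.hostname or "",
        "directory": directory,
        "page": page,
    }


def resolve(domain: str) -> str:
    """Map a domain name to an IP address, as DNS would (toy table lookup)."""
    try:
        return TOY_DNS[domain]
    except KeyError:
        raise LookupError(f"no address known for {domain!r}") from None


parts = url_components("http://henriettahay.com/women/99mar12.htm")
print(parts)
# {'protocol': 'http://', 'domain': 'henriettahay.com', 'directory': '/women', 'page': '99mar12.htm'}
print(resolve(parts["domain"]))  # 192.0.2.10
```

Only the domain name is handed to DNS; the directory and page are interpreted later by the web server that the resulting IP address leads to.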
Mr. Berners-Lee kept the idea behind the Universal Resource
Identifier, and put it to use when the Semantic Web was developed.
See the next section.
134
Scott Meyer. 2011. “DNS Tutorial”. GNC Web Creations. Retrieved
8/11/2011 from http://www.gnc-web-creations.com/dns-tutorial.htm.
(This is a particularly good description of the entire
process of locating web sites via the DNS.)
Yes, as you may have guessed, your author is related to Henrietta
Hay. She’s his mother. She was 85 years old when she wrote that
column in 1999. She finally decided to stop writing in 2011, when
she was almost 97 years old. Her eyesight was failing, so it was
becoming hard for her to use her computer. And yes, you can read nearly all
of her columns at http://davehay.com/henrietta. (From 1988 until
about 1995, they were typed onto pieces of paper and mailed to the
newspaper. From 1995, though, they were also recorded on her web
site.)
135
Wikipedia. 2011. “Domain Name System”. Retrieved 8/9/2011 from
http://en.wikipedia.org/wiki/Domain_Name_System.
136
Wikipedia. 2011. “Domain Name System”. Retrieved 8/9/2011 from
http://en.wikipedia.org/wiki/Domain_Name_System.
137
Douglas Brian Terry, Mark Painter, David W. Riggle and Songnian
Zhou. 1984. The Berkeley Internet Name Domain Server. Proceedings
USENIX Summer Conference, Salt Lake City, Utah. June 1984. 23–31.
Retrieved 8/10/2011 from
http://www.eecs.berkeley.edu/Pubs/TechRpts/1984/5957.html.
138
Don Moor. 2004. “DNS Survey”. Retrieved 8/9/2011 from
http://mydns.bboy.net/survey/.
139
World Wide Web Consortium. 1999. Hypertext Transfer Protocol --
HTTP/1.1. Retrieved 7/29/2011 from
http://www.w3.org/Protocols/rfc2616/rfc2616-sec1.html#sec1.1.
140
Dean Allemang and Jim Hendler. 2011. Semantic Web for the Working
Ontologist: Effective Modeling in RDFS and OWL. (Boston: Morgan
Kaufmann).
141
Dean Allemang and Jim Hendler. 2011. Op. cit. 23-24.
assert that “Dave” “owns” “an Acura”. This captures the simple
underlying semantics of the sentences. Each of the components is either a
literal (“Dave”), a description of a relationship (“owns”), or the identifier
of a concept located somewhere on the Web (“Acura”). Note that
predicates (“owns”), as well as subjects and objects, are defined
somewhere on the Web.
RDF Schema – an enhanced version of RDF that includes language
constructs to recognize that some words describe classes, sub-classes,
and properties. For example: “Person / is a / class.” “Dave / is a
member of / Person.” “Person / may own / car.”
Web Ontology Language (OWL) – brings the expressivity of logic to the
Semantic Web. It allows modelers to express detailed constraints
between classes, entities, and properties. Thus, given the RDF Schema
sentences above, OWL allows the computer to infer that “Dave / may
own / car.”
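The example sentences can be imitated with triples stored as Python tuples. This is only a sketch: the single hand-written rule below (if X is a member of class C, and C may own D, then X may own D) is a hypothetical stand-in for the inference the text attributes to OWL. It is not how an OWL reasoner works, and the predicates are plain strings rather than real RDF, RDF Schema, or OWL vocabulary.

```python
# Each fact is a (subject, predicate, object) triple.
Triple = tuple[str, str, str]

facts: set[Triple] = {
    ("Dave", "owns", "Acura"),
    ("Person", "is a", "class"),
    ("Dave", "is a member of", "Person"),
    ("Person", "may own", "car"),
}


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply one toy rule until no new triples appear:
    (X, is a member of, C) and (C, may own, D)  =>  (X, may own, D)."""
    known = set(triples)
    while True:
        new = {
            (x, "may own", d)
            for (x, p1, c) in known if p1 == "is a member of"
            for (c2, p2, d) in known if p2 == "may own" and c2 == c
        } - known
        if not new:
            return known
        known |= new


derived = infer(facts) - facts
print(derived)  # {('Dave', 'may own', 'car')}
```

Repeating the rule until nothing new appears (a fixed point) is what lets derived facts feed further derivations, which a single pass would miss.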
The language underlying all three of these is XML.
Note that as companies began compiling even basic glossaries, it turned
out that simply imposing a single definition for each word wasn’t going
to work. The accounts receivable department has a different definition
for “customer” than the marketing department does. What was required
was a scheme that captured not only each definition but also its
context.
The structure of a Universal Resource Identifier begins with a web
address, which makes it possible to define its context. Because the web
address begins with the identification of the organization that sponsors it,
the definition of a word described at that address can be presumed to be
in the context of that organization. For example, if URIs
described both http://www.bigcompany.com/Marketing/#customer
and http://www.bigcompany.com/AccountsReceivable/#customer,
anyone, anywhere in the world, could retrieve a single definition for each
context. For convenience, http://www.bigcompany.com/Marketing
would be abbreviated “bcmk”, so all references to Big Company,
Inc.’s Marketing Department term for customer would be to
“bcmk:customer”. Similarly, all references to Big Company’s Accounts
Receivable Department’s definition of “customer” would be represented
as “bcar:customer”.
Even the languages of RDF Schema and OWL are themselves expressed in
terms of reserved words that are URIs: “subClassOf”, “subPropertyOf”,
“label”, “unionOf”, etc. These are, respectively, “rdfs:subClassOf”,
“rdfs:subPropertyOf”, “rdfs:label”, and “owl:unionOf”.
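Prefix abbreviations of this kind amount to a lookup from a short name to the beginning of a URI. The Python sketch below expands them. The `rdfs` and `owl` namespace URIs are the standard W3C ones; `bcmk` and `bcar` map to the text's hypothetical Big Company addresses, with a trailing `/#` assumed so that expanding `bcmk:customer` reproduces the full URI used in the example. The `PREFIXES` table and `expand` function are illustrative names, not part of any RDF library.

```python
# Prefix -> namespace URI. rdfs and owl are the standard W3C namespaces;
# bcmk and bcar are the hypothetical Big Company, Inc. departments.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "bcmk": "http://www.bigcompany.com/Marketing/#",
    "bcar": "http://www.bigcompany.com/AccountsReceivable/#",
}


def expand(short_name: str) -> str:
    """Turn 'prefix:localName' into a full URI."""
    prefix, sep, local = short_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {short_name!r}")
    return PREFIXES[prefix] + local


# Same word, two departments: two distinct URIs, and so two definitions.
print(expand("bcmk:customer"))    # http://www.bigcompany.com/Marketing/#customer
print(expand("bcar:customer"))    # http://www.bigcompany.com/AccountsReceivable/#customer
print(expand("rdfs:subClassOf"))  # http://www.w3.org/2000/01/rdf-schema#subClassOf
```

Because the expanded URIs differ, software can keep the two departments' definitions of “customer” apart without either department renaming its term.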
You may think this is a very strange way of looking at the world. If so,
you are right. But that’s how it is…
[Figure: diagram of the historical developments discussed in this book, converging on “The Reconciliation”. Labels include Data Processing, Time-sharing Computers, Personal Computers, Graphical User Interfaces, FORTRAN, COBOL, PL/I, Pascal, Simula 67, SmallTalk, C++, Java, Structured Programming, Structured Design, Structured Analysis, Object-oriented Design, Object-oriented Analysis, Object Modeling, Design Patterns, Analysis Patterns, UML, the ARPANET, the Internet, the World Wide Web, the Semantic Web, CODASYL Standard, Bachman (IDMS), Relational Theory, Early and Viable Relational Databases, Three-Schema Architecture, Chen, NIAM, ORM, IDEF1X, Barker/Ellis, Information Engineering, Zachman Framework, Business Rules, Data Model Patterns, Data Quality, Data Architecture, and Business Process Re-engineering.]
142
Merriam-Webster Online Dictionary. “aesthetics”. Retrieved
7/19/2011 from http://www.merriam-webster.com/dictionary/aesthetics.
143
Wikipedia. 2010. “Assembly Language”. Retrieved October 10, 2010
from http://en.wikipedia.org/wiki/Assembly_language.
144
Wikipedia. 2010. “Cobol”. Retrieved October 10, 2010 from
http://en.wikipedia.org/wiki/Cobol.
145
Edward Yourdon and Larry L. Constantine, 1979, Op. cit. 106.
146
DK Illustrated Oxford Dictionary. 1998, Op. cit. 187.
147
C. Finkelstein, 1992, Information Engineering: Strategic Systems Development
(Sydney: Addison-Wesley) 85.
data flow – In a data flow diagram, the fact that a specified kind of data may flow between two processes or between an external entity and a process. (Appendix B)
148
S. Ambler, 2009, Op. cit.
149
DAMA International, 2009, The DAMA Guide to the Data
Management Body of Knowledge (Bradley Beach, NJ: Technics
Publications, LLC).
150
Paul Leahy. 2011. “Declarative Language”. About.com. Retrieved
7/19/2011 from
http://java.about.com/od/d/g/declarativelang.htm.
151
Among other sources, see David C. Hay, 2003, Op. cit.
152
James Rumbaugh et al. 1999. Op. cit.
153
David C. Hay, 2003, Op. cit. 413.
154
Wikipedia. 2010. “Fortran”. Retrieved October 10, 2010 from
http://en.wikipedia.org/wiki/Fortran.
155
David C. Hay, 2003, Op. cit. 413.
hypertext – The fact that a bit of text on one Web Page can be used to automatically retrieve text from some other Web Page, anywhere on the Internet. (Appendix B)
hyperlink – The path through the Internet by which hypertext is retrieved. (Appendix B)
Hypertext Markup Language – The publishing language of the World Wide Web. It contains the commands necessary to retrieve a desired document, transmit it, and present it on the recipient’s computer. It may also contain pieces of program code to present animation and accept input data. (Appendix B)
hypertext transfer protocol (HTTP) – A networking protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web. (Appendix B)
HTTP functions as a request-response protocol in the client-server computing model. In HTTP, a web browser, for example, acts as a client, while an application running on a computer hosting a web site functions as a server. The client submits an HTTP request message to the server. The server, which stores content, or provides resources, such as HTML files, or performs other functions on behalf of the client, returns a response message to the client. A response contains completion status information about the request and may contain any content requested by the client in its message body.156
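The request-response exchange can be illustrated without any network connection by writing out the two messages as text. The Python sketch below builds a minimal HTTP/1.1 GET request for the Appendix B example page and parses a canned response. The response contents are invented for illustration, the function names are hypothetical, and a real client would use a library such as Python's standard `http.client` module rather than hand-built strings.

```python
CRLF = "\r\n"


def build_request(host: str, path: str) -> str:
    """Client side: a minimal HTTP/1.1 GET request message."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    return CRLF.join(lines) + CRLF + CRLF  # a blank line ends the headers


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Client side: split a response into status code, headers, and body."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_request("henriettahay.com", "/women/99mar12.htm")

# Hypothetical reply from the server (invented for this example).
canned = CRLF.join([
    "HTTP/1.1 200 OK",
    "Content-Type: text/html",
    "Content-Length: 28",
    "",
    "<html><p>A column</p></html>",
])
status, headers, body = parse_response(canned)
print(status, headers["content-type"])  # 200 text/html
```

The status line carries the “completion status information” the definition mentions, and everything after the first blank line is the message body.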
156
Wikipedia. 2011. “HTTP”. Retrieved 7/28/2011 from
http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol.
157
The Business Rules Group. 2005. “The Business Motivation Model:
Business Governance in a Volatile World”.
158
James Rumbaugh et al. 1999. Op. cit.
159
Merriam-Webster. 2010. “object”. Merriam-Webster Online
Dictionary. Retrieved October 1, 2010 from
http://www.merriam-webster.com/dictionary/object.
160
James Rumbaugh et al. 1999. Op. cit.
161
James Rumbaugh et al. 1999. Op. cit. 379.
162
Lee Copland. 2000. “QuickStudy: Packet-Switched vs. Circuit-
Switched Networks” Computerworld. March 20, 2000.
163
DK Illustrated Oxford Dictionary, 1998, Op. cit. 642.
164
The Free Dictionary. “Procedural Programming”. Retrieved
7/19/2011 from
http://encyclopedia2.thefreedictionary.com/procedural+language.
165
World Wide Web Consortium. 2004. OWL Web Ontology Language
Overview. Retrieved 7/20/2011 from
http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.2.
166
Clive Finkelstein, 1992, Information Engineering: Strategic Systems
Development (Sydney: Addison Wesley).
167
David C. Hay, 2003, Requirements Analysis: From Business Views to
Architecture. (Englewood Cliffs, NJ: Prentice Hall PTR). 432.
168
Edward Yourdon and Larry L. Constantine, 1979, Structured Design:
Fundamentals of a Discipline of Computer Program and Systems
Design (Englewood Cliffs, NJ: Prentice Hall). (Widely circulated as
an unpublished manuscript in 1976.)
169
Merriam-Webster. 2010. “subject”. Merriam-Webster Online
Dictionary. Retrieved October 1, 2010 from
http://www.merriam-webster.com/dictionary/subject?show=0&t=1285963736.
170
Wikipedia. 2011. “TCP/IP Model”. Retrieved 7/27/2011 from
http://en.wikipedia.org/wiki/TCP/IP_model.
171
Barry M. Leiner et al. 2011. Op. cit.
172
Merriam-Webster. “auxiliary”. Retrieved September 28, 2010
from http://www.merriam-webster.com/dictionary/auxiliary.
173
Wikipedia. 2011. “World Wide Web”. Retrieved 8/4/2011 from
http://en.wikipedia.org/wiki/World_wide_web.