Relational Database Design: David Wesley, MD
ORIGINAL ARTICLE
Relational databases are the predominant method for storing repetitive data in computers because they allow efficient and flexible storage of that data. While medical directors and underwriters are more likely to use a spreadsheet than a database program to analyze their business, the data they wish to study are often stored in corporate databases. Or the data may be complex enough to require being keyed into or downloaded into a personal computer (PC) database program for storage, even if the data are then output to a spreadsheet for numerical analysis. In many circumstances, one can benefit from an understanding of efficient database design. After a brief overview, the reader is led step-by-step through a practical explanation of database design, from a flat file to a relational model.

Address: General and Cologne Life Re, PO Box 300, Financial Centre, 695 Main St, Stamford, CT 06904-0300.
Correspondent: David Wesley, MD.
Key words: Relational database, database design.
Received: October 28, 1999.
Accepted: January 30, 2000.
JOURNAL OF INSURANCE MEDICINE
WESLEY—RELATIONAL DATABASE DESIGN
there are repeating fields (Imp 1, Imp 2, Imp 3), because each column must be queried separately. This would be a very slow query. The database query engine would make three separate passes through the table, finding matching records in each case. Putting the “diabetes mellitus” search criteria in each of the impairment fields on the same line does not work. This would define an AND selection (ie, only those records in which diabetes mellitus is entered in all three fields; none in this case) when what is desired is an OR selection.

FIRST NORMAL FORM RULE: ELIMINATE REPEATING GROUPS

For each set of related fields, make a table and give that table a primary key.

The data are split into two tables, with the definitions shown in Figure 4.

Note the Impairments table has a multikey index consisting of the concatenation of both the Case # and the Impairment fields. Each case can have more than one impairment and thus more than one record in the table (so, Case # alone cannot be key). Also, each impairment can appear in more than one case, so it cannot be the sole key. But there is no reason for the combination of a case # and a particular impairment to appear twice. Thus, the multikey will be unique.

The impairments are split into the separate table and linked to the UW Cases table via the case number. Both tables are narrower than the original table, and the Impairments table is longer (Figure 5).

Long, narrow tables are more efficient than short, wide tables. Query operations that use the key fields will be particularly fast. Note also, that now we can easily add a fourth impairment, hypertension, for case number 101229.

Now to perform the query described above, Access automatically links the two tables when you add them to the query design, which looks like Figure 6.
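For readers who want to try the first normal form split outside Access, the following is a minimal sketch using Python's standard sqlite3 module. It is not the article's Access design: the table and column names (flat_cases, uw_cases, impairments, case_no, imp1 to imp3) and all sample rows are assumptions made for illustration, apart from case number 101229 and its added hypertension entry, which come from the text. It contrasts the OR and AND selections on the flat table with a single keyed lookup on the split tables, and it does not attempt to measure speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat layout: one column per repeating impairment field (hypothetical names).
    CREATE TABLE flat_cases (
        case_no INTEGER PRIMARY KEY,
        imp1 TEXT, imp2 TEXT, imp3 TEXT
    );
    -- First normal form: the repeating group moves to its own table.
    CREATE TABLE uw_cases (case_no INTEGER PRIMARY KEY);
    CREATE TABLE impairments (
        case_no    INTEGER NOT NULL REFERENCES uw_cases(case_no),
        impairment TEXT NOT NULL,
        PRIMARY KEY (case_no, impairment)   -- the multifield key
    );
""")

# Invented sample rows; only case 101229 and "hypertension" come from the article.
conn.executemany("INSERT INTO flat_cases VALUES (?, ?, ?, ?)", [
    (101229, "obesity", "diabetes mellitus", "asthma"),
    (101230, "diabetes mellitus", None, None),
    (101231, "asthma", None, None),
])
conn.executemany("INSERT INTO uw_cases VALUES (?)",
                 [(101229,), (101230,), (101231,)])
conn.executemany("INSERT INTO impairments VALUES (?, ?)", [
    (101229, "obesity"), (101229, "diabetes mellitus"), (101229, "asthma"),
    (101229, "hypertension"),   # a fourth impairment needs no new column
    (101230, "diabetes mellitus"),
    (101231, "asthma"),
])


def flat_search(term, combine="OR"):
    """Search all three impairment columns of the flat table."""
    if combine not in ("OR", "AND"):
        raise ValueError("combine must be 'OR' or 'AND'")
    where = f" {combine} ".join(f"imp{n} = :t" for n in (1, 2, 3))
    sql = f"SELECT case_no FROM flat_cases WHERE {where} ORDER BY case_no"
    return [row[0] for row in conn.execute(sql, {"t": term})]


def normalized_search(term):
    """Search the split tables with one criterion on one column."""
    sql = """SELECT c.case_no
             FROM uw_cases AS c
             JOIN impairments AS i ON i.case_no = c.case_no
             WHERE i.impairment = ?
             ORDER BY c.case_no"""
    return [row[0] for row in conn.execute(sql, (term,))]


def insert_is_rejected(case_no, impairment):
    """Return True if the multifield key refuses a repeated pair."""
    try:
        conn.execute("INSERT INTO impairments VALUES (?, ?)",
                     (case_no, impairment))
    except sqlite3.IntegrityError:
        return True
    return False
```

With these sample rows, flat_search("diabetes mellitus", "AND") returns no cases, while the OR form and normalized_search("diabetes mellitus") both return cases 101229 and 101230. A second attempt to record hypertension for case 101229 is refused by the multifield key.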
Codes table limits the possible ways to designate an impairment. Notice the tables are getting even narrower, with less duplication of wide field values (see Figure 8).

The query is now created as in Figure 9. This query processes as quickly as the previous one, even though it uses one more table than before, because the links are established on key fields. Relational databases maintain indexes of key fields (in the background, so you don’t see them), and these are used to accelerate processes based on keyed links between tables. The result of this query in the very abbreviated example database is two rows with B. Fife and A. Griffith as entries in the first column.

THIRD NORMAL FORM RULE: ELIMINATE FIELDS NOT DEPENDENT ON KEY

The third normal form rule is similar to the second, but it specifically addresses relationships in tables with only one key field. Consider again the key fields defined for the tables thus far. Because each entry in a key field (or concatenated entries in multifield keys) must be unique, each record in that table must have its own key and thus the key field represents the “data object” of that table.

In the UW Cases table in Figure 8, each case is assigned a unique case number that stands for that case. But each case is also assigned to an underwriter, and there are fields for the underwriter’s name and level. But Level is not an attribute of the case, it is an attribute of the underwriter. A better design would be to leave a UW ID for the underwriter in the UW Cases table (an appropriate attribute of that data object) and split Level and any other pertinent information about the underwriter (region, supervisor, etc.) into a separate table.

The third (and second) forms can defeat a side effect called update anomalies. Should an underwriter marry and change her name, there is no need to go through the error-prone process of changing her name in all her assigned case records (any unchanged records would be update anomalies). All that is required is a single change to her record in the Underwriter table.

One can also separate name, date of birth, and any other personal information about the proposed insured. This is particularly important to medical directors who are concerned about confidentiality of underwriting information. The name of the insured can be kept in a locked table, viewable only by the medical director, while the rest of the database is available to anyone with a business need to study the data.
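The update anomaly point can be illustrated with a small sketch, again using Python's sqlite3 module in place of Access. The table and column names (underwriters, uw_cases, uw_id, case_no), the underwriter's name and level, and the case assignments are all hypothetical. The idea shown is only that the name lives in one row of the underwriter table, so one UPDATE is seen by every case that refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE underwriters (
        uw_id INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        level TEXT            -- an attribute of the underwriter, not of the case
    );
    CREATE TABLE uw_cases (
        case_no INTEGER PRIMARY KEY,
        uw_id   INTEGER NOT NULL REFERENCES underwriters(uw_id)
    );
""")

# Hypothetical underwriter with three assigned cases.
conn.execute("INSERT INTO underwriters VALUES (1, 'T. Lou', 'Senior')")
conn.executemany("INSERT INTO uw_cases VALUES (?, 1)",
                 [(101229,), (101230,), (101231,)])


def rename_underwriter(uw_id, new_name):
    """Change the name once; return the number of rows that were touched."""
    cur = conn.execute("UPDATE underwriters SET name = ? WHERE uw_id = ?",
                       (new_name, uw_id))
    return cur.rowcount


def names_on_cases():
    """List each case with the underwriter name looked up through uw_id."""
    rows = conn.execute("""SELECT c.case_no, u.name
                           FROM uw_cases AS c
                           JOIN underwriters AS u ON u.uw_id = c.uw_id
                           ORDER BY c.case_no""")
    return rows.fetchall()
```

Calling rename_underwriter(1, "T. Lou-Smith") touches a single row, and names_on_cases() then reports the new name for all three cases, with no case record left holding the old name.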
Is the Rating field an attribute of the UW Cases table data object? Here, it is. However, if debits were recorded by impairment, a better database model would have a Rating or a Debit field in the Impairment table.

The third normal form rule specifies that if fields do not contribute to a description of the table’s key, they should be placed in a separate table, shown in Figure 10. The UW ID field is introduced for the same reason noted above; when duplication is necessary, make the duplicated values as small as possible.
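A layout of this kind can be sketched as below with Python's sqlite3 module. This is not a reproduction of Figure 10: the table names, the field lists, the code numbers, the ratings, the underwriter levels, and the sample rows are assumptions chosen for illustration, and the field count does not match the article's. Only the underwriter names and case number 101229 are borrowed from the text. The sketch keeps small ID values (uw_id, pi_id, code) as the only duplicated fields and ends with a lookup that walks from an impairment description to the underwriters who handled matching cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE underwriters (
        uw_id INTEGER PRIMARY KEY, name TEXT NOT NULL, level TEXT);
    CREATE TABLE proposed_insureds (       -- could be access-restricted
        pi_id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_date TEXT);
    CREATE TABLE impairment_codes (
        code INTEGER PRIMARY KEY, description TEXT NOT NULL UNIQUE);
    CREATE TABLE uw_cases (
        case_no INTEGER PRIMARY KEY,
        pi_id   INTEGER NOT NULL REFERENCES proposed_insureds(pi_id),
        uw_id   INTEGER NOT NULL REFERENCES underwriters(uw_id),
        rating  INTEGER);                  -- here an attribute of the case
    CREATE TABLE impairments (
        case_no INTEGER NOT NULL REFERENCES uw_cases(case_no),
        code    INTEGER NOT NULL REFERENCES impairment_codes(code),
        PRIMARY KEY (case_no, code));
""")

# Invented sample data for illustration only.
conn.executemany("INSERT INTO underwriters VALUES (?, ?, ?)",
                 [(1, "A. Griffith", "Senior"), (2, "B. Fife", "Junior")])
conn.executemany("INSERT INTO proposed_insureds VALUES (?, ?, ?)",
                 [(1, "Insured One", "1950-01-01"),
                  (2, "Insured Two", "1960-02-02"),
                  (3, "Insured Three", "1970-03-03")])
conn.executemany("INSERT INTO impairment_codes VALUES (?, ?)",
                 [(1, "diabetes mellitus"), (2, "hypertension"), (3, "asthma")])
conn.executemany("INSERT INTO uw_cases VALUES (?, ?, ?, ?)",
                 [(101229, 1, 2, 150), (101230, 2, 1, 100), (101231, 3, 2, 100)])
conn.executemany("INSERT INTO impairments VALUES (?, ?)",
                 [(101229, 1), (101229, 2), (101230, 1), (101231, 3)])


def underwriters_for(description):
    """Names of underwriters with at least one case carrying the impairment."""
    rows = conn.execute("""SELECT DISTINCT u.name
                           FROM impairment_codes AS ic
                           JOIN impairments AS i ON i.code = ic.code
                           JOIN uw_cases AS c ON c.case_no = i.case_no
                           JOIN underwriters AS u ON u.uw_id = c.uw_id
                           WHERE ic.description = ?
                           ORDER BY u.name""", (description,))
    return [row[0] for row in rows]
```

The proposed_insureds table is never touched by underwriters_for, which is the property that lets personal details sit in a separately protected table while the rest of the data remain open to analysis.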
The PI ID field allows for anonymity for the proposed insured. The sample diabetes mellitus query now requires four tables (see Figure 11).

This query can be interpreted as follows: Using the Impairment Codes table, find the code that corresponds to diabetes mellitus. Then go to the Impairment table and find out which case numbers have this impairment code assigned to them. With case numbers, go to the UW Cases table and find the underwriter ID for each case. Then go to the Underwriter table and extract the names associated with the underwriter IDs.

With the third normal form rule enforced, the information in each table pertains to the key (or the data object for which it stands) of that table only. This whole example database is now contained in 5 tables with a total of 17 fields, instead of 1 table with 10 fields. Because operations such as queries are performed on keys, processing speed is actually faster. You can assign as many impairments as you need to each case, with no wasted
space for those with few or no impairments. And underwriter and proposed insured details are isolated in their own tables.

CONCLUSIONS

One always creates a database with certain goals or questions in mind, and these must be met as a bare minimum. But new purposes or questions arise during the life of a database and solid design at the outset will provide the flexibility to meet the new challenges. Considering that data collection is often difficult and/or a one-shot opportunity, careful database design is time well spent.

The benefit of the first rule of database normalization is fairly obvious. There is no other way to provide for indeterminate numbers of repeating groups. The second and third normal form rules help preserve data integrity. But the most important effect is on queries. If 20,000 or more records need to be queried, a good relational design will allow fast completion where a flat design query may never finish. The results of a database query can be further analyzed or reported in a spreadsheet or other program with which the user is more familiar.

Relational database design has a solid theoretical underpinning. While some of the steps may seem unnecessary and complicated, the resulting database is fast, reliable, and flexible.

REFERENCES

* Roman S. Access Database Design & Programming. 2nd ed. Sebastopol, CA: O’Reilly & Associates, 1999.
* Ehrmann D. Relational Database Design in Paradox: Technical Paper. Chicago, Ill: Kallistra Inc, 1990.

* These are general references to the article and do not relate to any specific citation.