
JOURNAL OF INSURANCE MEDICINE

Copyright © 2000 Journal of Insurance Medicine


J Insur Med 2000;32:63–70

ORIGINAL ARTICLE

Relational Database Design


David Wesley, MD

Relational databases are the predominant method for storing repetitive data in computers because they allow efficient and flexible storage of that data. While medical directors and underwriters are more likely to use a spreadsheet than a database program to analyze their business, the data they wish to study are often stored in corporate databases. Or the data may be complex enough to require being keyed into or downloaded into a personal computer (PC) database program for storage, even if the data are then output to a spreadsheet for numerical analysis. In many circumstances, one can benefit from an understanding of efficient database design. After a brief overview, the reader is led step-by-step through a practical explanation of database design, from a flat file to a relational model.

Address: General and Cologne Life Re, PO Box 300, Financial Centre, 695 Main St, Stamford, CT 06904-0300.
Correspondent: David Wesley, MD.
Key words: Relational database, database design.
Received: October 28, 1999.
Accepted: January 30, 2000.

Databases store information about entities. In the example tables that follow, entity classes include underwriters, underwriting cases, impairments, and proposed insureds. Entities have attributes such as level for each underwriter and a code for each impairment. In the typical table model for a database, the column headings are attribute classes and each row of specific attribute entries defines a specific entity.

Attributes can do three things:

1. provide information about a particular entity,
2. uniquely identify a specific entity, and
3. link individual entities from separate entity classes.

The utility of an information attribute is obvious. The second and third capabilities provide means to relate different entities. A key is one or more attributes that uniquely distinguish an entity from all other possible entities in its entity class. Keys in one table and matching foreign keys in another table serve to link related entities from different entity classes.

Relational databases take their name from the relation, the mathematical term for a table, and they lend themselves to implementing an entity-relationship model for the subject data, which allows for better queries. Queries are the reason for the existence of databases. In the business world, queries can generate mailing labels or billing statements. For the researcher, queries provide tomographic slices of the data. Both the accuracy and the flexibility of these tomographic views depend on the fidelity of the entity-relationship model.

Experts in relational database design have identified six properties called normal forms that characterize efficiency. The forms are numbered one through five (1NF–5NF) and the sixth is called the Boyce-Codd normal form. The easiest way to understand relational database design is to take a flat file and sequentially apply normal forms to it. In my experience, the first three matter the most, and even if only 1NF is applied, the database is much improved. Views from Microsoft® Access™* (Microsoft Corp, Redmond, WA) will be used to demonstrate the process, but the concepts apply to any relational database.

* Microsoft® is a registered trademark and Access™ is a trademark of Microsoft.

FLAT FILES
Spreadsheets can be used like database programs, and they often have special database features. But spreadsheet databases are limited to a flat-file format. There is a natural tendency to place data into flat files, because it is much like logging information with a pen and paper. A spreadsheet or other flat-file database used to study medical underwriting data could result in a table structure much like that shown in Figure 1.

Figure 1. Table design, flat file.

In flat files, as well as in the individual tables of a relational database, data are arranged in a two-dimensional grid, as seen in Figure 2. Each row entry represents a record, and the column headers designate the fields in each record. Key fields (Case # in this example) require unique values and thus help avoid duplicate records.

Figure 2. Data, flat file.

The flat-file definition given above assumes that no person has more than three medical impairments, each of which will be entered in full or the field left blank. The underwriter and the underwriter's level are entered in full for each case. But what happens if a case has more than three impairments? One would have to add a new field, "Imp 4," for just that person. The already inefficient use of data storage will be made worse since the vast majority of cases in the database will have fewer than three impairments.

Here is another problem. What if you want to run a query to select case records for each underwriter where diabetes mellitus is one of the impairments? The query would look like Figure 3.

Figure 3. Query design, flat file.

Notice how the query has as many rows as there are repeating fields (Imp 1, Imp 2, Imp 3), because each column must be queried separately. This would be a very slow query. The database query engine would make three separate passes through the table, finding matching records in each case. Putting the "diabetes mellitus" search criterion in each of the impairment fields on the same line does not work. This would define an AND selection (ie, only those records in which diabetes mellitus is entered in all three fields; none in this case) when what is desired is an OR selection.

FIRST NORMAL FORM RULE: ELIMINATE REPEATING GROUPS
For each set of related fields, make a table and give that table a primary key.

The data are split into two tables, with the definitions shown in Figure 4.

Figure 4. Table design, 1NF.

Note that the Impairments table has a multikey index consisting of the concatenation of both the Case # and the Impairment fields. Each case can have more than one impairment and thus more than one record in the table (so Case # alone cannot be the key). Also, each impairment can appear in more than one case, so it cannot be the sole key. But there is no reason for the combination of a case number and a particular impairment to appear twice. Thus, the multikey will be unique.

The impairments are split into the separate table and linked to the UW Cases table via the case number. Both tables are narrower than the original table, and the Impairments table is longer (Figure 5).

Long, narrow tables are more efficient than short, wide tables. Query operations that use the key fields will be particularly fast. Note also that we can now easily add a fourth impairment, hypertension, for case number 101229.

Now to perform the query described above, Access automatically links the two tables when you add them to the query design, which looks like Figure 6.
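The contrast between the flat-file query and the 1NF query can be sketched outside Access. The following is a minimal illustration using Python's built-in sqlite3 module rather than the Access query grid used in this article; the table names, column names, and all sample rows other than case number 101229 are assumptions made for the example, and the flat table is cut down to a few of its fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat file: one row per case, impairments in repeating columns.
# Column names and sample rows are hypothetical.
cur.execute("""
    CREATE TABLE flat_cases (
        case_no     INTEGER PRIMARY KEY,
        underwriter TEXT,
        imp1 TEXT, imp2 TEXT, imp3 TEXT
    )""")
cur.executemany(
    "INSERT INTO flat_cases VALUES (?, ?, ?, ?, ?)",
    [
        (101229, "B. Fife", "diabetes mellitus", "obesity", None),
        (101230, "A. Griffith", "asthma", "diabetes mellitus", None),
        (101231, "B. Fife", "asthma", None, None),
    ],
)

# The flat-file query needs one OR branch per repeating column.
flat_rows = cur.execute("""
    SELECT DISTINCT underwriter FROM flat_cases
    WHERE imp1 = 'diabetes mellitus'
       OR imp2 = 'diabetes mellitus'
       OR imp3 = 'diabetes mellitus'
    ORDER BY underwriter""").fetchall()

# 1NF: the repeating group moves to its own table, keyed by the
# combination of case number and impairment.
cur.execute(
    "CREATE TABLE uw_cases (case_no INTEGER PRIMARY KEY, underwriter TEXT)"
)
cur.execute("""
    CREATE TABLE impairments (
        case_no    INTEGER REFERENCES uw_cases(case_no),
        impairment TEXT,
        PRIMARY KEY (case_no, impairment)
    )""")
cur.execute("INSERT INTO uw_cases SELECT case_no, underwriter FROM flat_cases")
for col in ("imp1", "imp2", "imp3"):
    cur.execute(
        f"INSERT INTO impairments SELECT case_no, {col} FROM flat_cases "
        f"WHERE {col} IS NOT NULL"
    )

# The 1NF query tests a single column, however many impairments a case has.
nf1_rows = cur.execute("""
    SELECT DISTINCT c.underwriter
    FROM uw_cases AS c
    JOIN impairments AS i ON i.case_no = c.case_no
    WHERE i.impairment = 'diabetes mellitus'
    ORDER BY c.underwriter""").fetchall()

print(flat_rows)
print(nf1_rows)
```

Both queries return the same underwriters on this sample data. The difference is that the 1NF query would not change if a case acquired a fourth or fifth impairment, whereas the flat-file query would need another column and another OR branch.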


Figure 5. Data, 1NF.
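Two properties of the 1NF Impairments table, the unique two-field key and the ability to add a fourth impairment without a new column, can be checked with a small sketch. It again uses Python's sqlite3 module as a stand-in for Access; the column names and the first three impairments listed for case 101229 are assumptions, since the data in Figure 5 are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical rendering of the 1NF Impairments table: the key is the
# combination of case number and impairment.
conn.execute("""
    CREATE TABLE impairments (
        case_no    INTEGER NOT NULL,
        impairment TEXT    NOT NULL,
        PRIMARY KEY (case_no, impairment)
    )""")
conn.executemany(
    "INSERT INTO impairments VALUES (?, ?)",
    [(101229, "diabetes mellitus"), (101229, "obesity"), (101229, "asthma")],
)

# A fourth impairment is just another row; no "Imp 4" column is needed.
conn.execute("INSERT INTO impairments VALUES (?, ?)", (101229, "hypertension"))

# Entering the same case/impairment pair twice violates the two-field key.
try:
    conn.execute("INSERT INTO impairments VALUES (?, ?)", (101229, "hypertension"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute(
    "SELECT COUNT(*) FROM impairments WHERE case_no = ?", (101229,)
).fetchone()[0]
print(count, duplicate_rejected)
```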

The query result is again a list of underwriters who have underwritten cases with diabetes mellitus. But this query will use less memory and less disk space, and it will run faster and more reliably than the first one.

SECOND NORMAL FORM RULE: ELIMINATE REDUNDANT DATA
There are still problems with this layout. Some users may enter "diabetes" or "DM" instead of "diabetes mellitus." There needs to be a way to validate data entries. Also, lengthy impairment descriptions such as "cerebrovascular disease" will be repeated hundreds or even thousands of times depending on the size of the database.
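The data-entry problem is easy to reproduce. The sketch below, which uses Python's sqlite3 module and invented case numbers, shows how free-text variants cause a query on the full description to miss cases; it illustrates the problem only and assumes nothing about how the example database was actually populated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE impairments (
        case_no    INTEGER,
        impairment TEXT,
        PRIMARY KEY (case_no, impairment)
    )""")

# Three users describe the same impairment three different ways
# (case numbers are hypothetical).
conn.executemany(
    "INSERT INTO impairments VALUES (?, ?)",
    [
        (1, "diabetes mellitus"),
        (2, "diabetes"),
        (3, "DM"),
        (4, "cerebrovascular disease"),
    ],
)

# A query on the intended term finds only the case that was keyed in full.
matched = [
    row[0]
    for row in conn.execute(
        "SELECT case_no FROM impairments "
        "WHERE impairment = 'diabetes mellitus' ORDER BY case_no"
    )
]
print(matched)
```

Cases 2 and 3 are silently omitted from the result, which is the kind of error that entry validation is meant to prevent.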
Figure 6. Query design, 1NF.

The second design rule for relational databases, which is to eliminate redundant data, helps avoid these problems. A short ID field can be substituted for the full information field. The full information field can then be moved to another table, conserving drive space because the information appears only once. Microsoft Access tables can be designed to prompt the data entry user to select the desired terminology, for example, "diabetes mellitus", and disallow variations.

Figure 7. Table design, 2NF.

Note that the impairment descriptions are separated into their own table, keyed by Code, which is also used in the Impairment table as part of the key. This table structure facilitates entry validation because the Impairment Codes table limits the possible ways to designate an impairment. Notice that the tables are getting even narrower, with less duplication of wide field values (see Figure 8).

The query is now created as in Figure 9. This query processes as quickly as the previous one, even though it uses one more table than before, because the links are established on key fields. Relational databases maintain indexes of key fields (in the background, so you don't see them), and these are used to accelerate processes based on keyed links between tables. The result of this query in the very abbreviated example database is two rows with B. Fife and A. Griffith as entries in the first column.

THIRD NORMAL FORM RULE: ELIMINATE FIELDS NOT DEPENDENT ON KEY
The third normal form rule is similar to the second, but it specifically addresses relationships in tables with only one key field. Consider again the key fields defined for the tables thus far. Because each entry in a key field (or concatenated entries in multifield keys) must be unique, each record in that table must have its own key and thus the key field represents the "data object" of that table.

In the UW Cases table in Figure 8, each case is assigned a unique case number that stands for that case. But each case is also assigned to an underwriter, and there are fields for the underwriter's name and level. Level, however, is not an attribute of the case; it is an attribute of the underwriter. A better design would be to leave a UW ID for the underwriter in the UW Cases table (an appropriate attribute of that data object) and split Level and any other pertinent information about the underwriter (region, supervisor, etc.) into a separate table.

The third (and second) forms can defeat a side effect called update anomalies. Should an underwriter marry and change her name, there is no need to go through the error-prone process of changing her name in all her assigned case records (any unchanged records would be update anomalies). All that is required is a single change to her record in the Underwriter table.

One can also separate name, date of birth, and any other personal information about the proposed insured. This is particularly important to medical directors who are concerned about confidentiality of underwriting information. The name of the insured can be kept in a locked table, viewable only by the medical director, while the rest of the database is available to anyone with a business need to study the data.
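The single-change remedy for update anomalies can be sketched as follows. The example uses Python's sqlite3 module; the table and column names, the level value, the extra case numbers, and the new surname are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Underwriter details live in one place, keyed by a short ID.
    CREATE TABLE underwriters (
        uw_id INTEGER PRIMARY KEY,
        name  TEXT,
        level TEXT
    );
    -- Each case carries only the underwriter's ID.
    CREATE TABLE uw_cases (
        case_no INTEGER PRIMARY KEY,
        uw_id   INTEGER REFERENCES underwriters(uw_id)
    );
    INSERT INTO underwriters VALUES (1, 'B. Fife', 'Senior');
    INSERT INTO uw_cases VALUES (101229, 1), (101230, 1), (101231, 1);
""")

# A name change touches exactly one record ...
cursor = conn.execute("UPDATE underwriters SET name = 'B. Taylor' WHERE uw_id = 1")
rows_changed = cursor.rowcount

# ... yet every one of her cases now reports the new name.
names = conn.execute("""
    SELECT DISTINCT u.name
    FROM uw_cases AS c
    JOIN underwriters AS u ON u.uw_id = c.uw_id
""").fetchall()
print(rows_changed, names)
```

Because the name is stored once, there is no case record that can be left behind with the old value.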


Figure 8. Data, 2NF.
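The entry validation that the 2NF layout makes possible can be approximated outside Access with a foreign key constraint. The sketch below uses Python's sqlite3 module; the codes "DM" and "CVD", the column names, and case number 101230 are assumptions, and a foreign key is only an analogue of the Access feature that prompts the user to pick from a list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Each description is stored once, under a short code.
    CREATE TABLE impairment_codes (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL UNIQUE
    );
    -- The code replaces the description as part of the two-field key.
    CREATE TABLE impairments (
        case_no INTEGER NOT NULL,
        code    TEXT    NOT NULL REFERENCES impairment_codes(code),
        PRIMARY KEY (case_no, code)
    );
    INSERT INTO impairment_codes VALUES
        ('DM',  'diabetes mellitus'),
        ('CVD', 'cerebrovascular disease');
    INSERT INTO impairments VALUES (101229, 'DM');
""")

# An entry that is not in the code table is refused outright.
try:
    conn.execute("INSERT INTO impairments VALUES (?, ?)", (101230, "diabetes"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```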

Figure 9. Query design, 2NF.

Is the Rating field an attribute of the UW Cases table data object? Here, it is. However, if debits were recorded by impairment, a better database model would have a Rating or a Debit field in the Impairment table.

The third normal form rule specifies that if fields do not contribute to a description of the table's key, they should be placed in a separate table, as shown in Figure 10. The UW ID field is introduced for the same reason noted above; when duplication is necessary, make the duplicated values as small as possible.


Figure 10. Table design, 3NF.

Figure 11. Query design, 3NF.
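As a rough textual counterpart to a four-table query design of this kind, the following sketch chains the Impairment Codes, Impairment, UW Cases, and Underwriter tables with Python's sqlite3 module. All table names, column names, codes, and sample rows are assumptions, apart from case number 101229 and the two underwriter names that appear in this article; the proposed insured table is omitted because the query does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE underwriters (uw_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE uw_cases (
        case_no INTEGER PRIMARY KEY,
        uw_id   INTEGER REFERENCES underwriters(uw_id)
    );
    CREATE TABLE impairment_codes (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE impairments (
        case_no INTEGER REFERENCES uw_cases(case_no),
        code    TEXT    REFERENCES impairment_codes(code),
        PRIMARY KEY (case_no, code)
    );
    INSERT INTO underwriters VALUES (1, 'B. Fife'), (2, 'A. Griffith'), (3, 'C. Example');
    INSERT INTO uw_cases VALUES (101229, 1), (101230, 2), (101231, 3);
    INSERT INTO impairment_codes VALUES ('DM', 'diabetes mellitus'), ('HTN', 'hypertension');
    INSERT INTO impairments VALUES
        (101229, 'DM'), (101229, 'HTN'), (101230, 'DM'), (101231, 'HTN');
""")

# Description -> code -> case numbers -> underwriter IDs -> names.
# Every join is made on a key field.
query = """
    SELECT DISTINCT u.name
    FROM impairment_codes AS ic
    JOIN impairments      AS i ON i.code    = ic.code
    JOIN uw_cases         AS c ON c.case_no = i.case_no
    JOIN underwriters     AS u ON u.uw_id   = c.uw_id
    WHERE ic.description = 'diabetes mellitus'
    ORDER BY u.name
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

On this sample data the query returns the two underwriters whose cases carry the diabetes mellitus code and omits the third.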

The PI ID field allows for anonymity for the proposed insured. The sample diabetes mellitus query now requires four tables (see Figure 11).

This query can be interpreted as follows: Using the Impairment Codes table, find the code that corresponds to diabetes mellitus. Then go to the Impairment table and find out which case numbers have this impairment code assigned to them. With the case numbers, go to the UW Cases table and find the underwriter ID for each case. Then go to the Underwriter table and extract the names associated with the underwriter IDs.

With the third normal form rule enforced, the information in each table pertains to the key (or the data object for which it stands) of that table only. This whole example database is now contained in 5 tables with a total of 17 fields, instead of 1 table with 10 fields. Because operations such as queries are performed on keys, processing speed is actually faster. You can assign as many impairments as you need to each case, with no wasted space for those with few or no impairments. And underwriter and proposed insured details are isolated in their own tables.

CONCLUSIONS
One always creates a database with certain goals or questions in mind, and these must be met as a bare minimum. But new purposes or questions arise during the life of a database, and solid design at the outset will provide the flexibility to meet the new challenges. Considering that data collection is often difficult and/or a one-shot opportunity, careful database design is time well spent.

The benefit of the first rule of database normalization is fairly obvious. There is no other way to provide for indeterminate numbers of repeating groups. The second and third normal form rules help preserve data integrity. But the most important effect is on queries. If 20,000 or more records need to be queried, a good relational design will allow fast completion where a flat design query may never finish. The results of a database query can be further analyzed or reported in a spreadsheet or other program with which the user is more familiar.

Relational database design has a solid theoretical underpinning. While some of the steps may seem unnecessary and complicated, the resulting database is fast, reliable, and flexible.
