
Relational Database Design: Bill Woolfolk Public Health Sciences University of Virginia Woolfolk@virginia - Edu

The document provides an overview of relational database design. It defines what a database is, discusses how databases are used to store and manipulate information, and describes relational database management systems. It also outlines the design process for relational databases, including identifying the purpose, reviewing existing data, creating tables and fields, identifying relationships, and normalization. The goal of good design is to reduce data redundancy, improve data integrity, and make the database more flexible.

Uploaded by

hinasana
Copyright
© Attribution Non-Commercial (BY-NC)

Relational Database Design

Bill Woolfolk
Public Health Sciences
University of Virginia
woolfolk@virginia.edu

Objectives
Understand the definition of a modern relational database
Understand and be able to apply a practical method for designing databases
Recognize and avoid common pitfalls of database design

What's a database?

A collection of logically-related information stored in a consistent fashion


Phone book
Bank records (checking statements, etc.)
Library card catalog
Soccer team roster

The storage format typically appears to users as some kind of tabular list (table, spreadsheet)

What Does a Database Do?


Stores information in a highly organized manner
Manipulates information in various ways, some of which are not available in other applications or are easier to accomplish with a database
Models some real-world process or activity through electronic means
  Often called modeling a business process
  Often replicates the process only in appearance or end result

Databases and the Systems which manage them


Modern electronic databases are created and managed by means of an RDBMS: a Relational Database Management System
An individual data storage structure created with an RDBMS is typically called a database
A database and its attendant views, reports, and procedures are called an application

Database Applications
Database: the actual DB with its attendant storage structure
SQL engine: interprets between the database and the interface/application
Interface or application: the part the user gets to see and use

Relational Database Management Systems

Low-end, proprietary, specific purpose


Email: Outlook, Eudora, Mulberry
Bibliographic: Reference Manager, EndNote, ProCite

Mid-level
Microsoft Access, Lotus Approach, Borland's Paradox
More or less total control of design allows custom builds

High-end
Oracle, Microsoft SQL Server, Sybase, IBM DB2
Professional-level DBs: banks, e-commerce, secure
Amazon.com, Ebay.com, Yahoo.com

Problems with Bad Design


Early computers were slow and had limited storage capacity
Redundant or repeating data slowed operations and took up too much precious storage space
Poor design increased the chance of data errors and of lost or orphaned information

Benefits of Good Design


Computers today are faster and possess much larger storage devices
Rigid structure of modern relational databases helped codify problems and solutions
Design problems are still possible, because the DBMS software won't protect you from poor practices
Good design still increases efficiency of data processes, reduces waste of storage, and helps eliminate data entry errors

Codd's Rules

Edgar F. Codd
Mathematician and researcher at IBM
Devised the relational data model in 1970
Published 12 rules in 1985 defining the ideal relational database; added 6 more in 1990
E. F. Codd: "A Relational Model of Data for Large Shared Data Banks." CACM 13(6): 377-387 (1970) (http://www.acm.org/classics/nov95/toc.html)
Codd, E. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" ComputerWorld, October 14 and October 21.

Modification Anomalies
Customers_Orders_Inventory

Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17" monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer

A search for General Tool Co. would miss General Tool and General Toll. A case-sensitive search for Totally Toys would miss TOTALLY TOYS.

Insertion Anomalies
Customers_Orders_Inventory

Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17" monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer

How would you enter a new item into your inventory if no one had ordered it yet?

Deletion Anomalies
Customers_Orders_Inventory

Customer          OrderNum  ItemNum  Item
General Tool      07456     2246     Pentium Computer
General Toll      08622     3145     HP Printer
General Tool Co.  08622     3967     17" monitor
Totally Toys      06755     2246     Pentium computer
TOTALLY TOYS      08134     3145     Hewlett-Packard Printer
XYZ Inc.          09010     0446     Dot Matrix Printer

If you wanted to stop selling dot matrix printers and remove them from your inventory, you would have to delete the order and customer info for XYZ Inc.

The Fix

Customers
CustomerNum  Customer
7822         XYZ Inc.
8755         Totally Toys
9123         General Tool Co.

Orders
CustomerNum  OrderNum
7822         09010
8755         06755
8755         08134
9123         07456
9123         08622

Order_Items
OrderNum  ItemNum
06755     2246
07456     2246
08134     3145
08622     3145
08622     3967
09010     0446

Products
ItemNum  Item
0446     Dot Matrix Printer
2246     Pentium Computer
3145     Hewlett-Packard printer
3967     17" monitor
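
The four tables can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in RDBMS; the column types and the extra "Scanner" product are assumptions made for illustration and are not part of the original slides.

```python
import sqlite3

# In-memory rebuild of the four tables; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Products  (ItemNum TEXT PRIMARY KEY, Item TEXT NOT NULL);
CREATE TABLE Orders    (OrderNum TEXT PRIMARY KEY,
                        CustomerNum TEXT NOT NULL REFERENCES Customers(CustomerNum));
CREATE TABLE Order_Items (OrderNum TEXT NOT NULL REFERENCES Orders(OrderNum),
                          ItemNum  TEXT NOT NULL REFERENCES Products(ItemNum),
                          PRIMARY KEY (OrderNum, ItemNum));
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("7822", "XYZ Inc."), ("8755", "Totally Toys"),
                  ("9123", "General Tool Co.")])
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("0446", "Dot Matrix Printer"), ("2246", "Pentium Computer"),
                  ("3145", "Hewlett-Packard printer"), ("3967", '17" monitor')])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [("09010", "7822"), ("06755", "8755"), ("08134", "8755"),
                  ("07456", "9123"), ("08622", "9123")])
conn.executemany("INSERT INTO Order_Items VALUES (?, ?)",
                 [("06755", "2246"), ("07456", "2246"), ("08134", "3145"),
                  ("08622", "3145"), ("08622", "3967"), ("09010", "0446")])

# Insertion: a product can exist before anyone orders it (made-up item).
conn.execute("INSERT INTO Products VALUES ('5001', 'Scanner')")

# Modification: each customer name is stored once, so one search finds
# every order placed by General Tool Co.
orders = [row[0] for row in conn.execute("""
    SELECT o.OrderNum FROM Orders o
    JOIN Customers c ON c.CustomerNum = o.CustomerNum
    WHERE c.Customer = 'General Tool Co.'
    ORDER BY o.OrderNum""")]

# Deletion: dropping the dot matrix printer leaves XYZ Inc. in place.
conn.execute("DELETE FROM Order_Items WHERE ItemNum = '0446'")
conn.execute("DELETE FROM Products WHERE ItemNum = '0446'")
xyz = conn.execute(
    "SELECT Customer FROM Customers WHERE CustomerNum = '7822'").fetchone()
product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]

print(orders, xyz, product_count)
```

Each of the three anomalies from the single-table design has a direct counterpart here: the insert, the search, and the delete all succeed without touching unrelated facts.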

The Design Process


1) Identify the purpose of the database
2) Review existing data
3) Make a preliminary list of fields
4) Make a preliminary list of tables and enter fields
5) Identify the key fields
6) Draft the table relationships
7) Enter sample data and normalize the data/tables
8) Review and finalize the design

Database Modeling
Refers to various more-or-less formal methods for designing a database
Some provide precise steps and tools

Ex.: Entity-Relationship (E-R) Modeling
  Widely used, especially by high-end database designers who can't afford to miss things
  Fairly complex process
  Extremely precise

1. Identify purpose of the DB


Clients can tell you what information they want but have no idea what data they need.
  "We need to keep track of inventory"
  "We need an order entry system"
  "I need monthly sales reports"
  "We need to provide our product catalog on the Web"

Be sure to Limit the Scope of the

2. Review Existing Data

Electronic
  Legacy database(s)
  Spreadsheets
  Web forms

Manual
  Paper forms
  Receipts and other printed output

3. Make Preliminary Field List

Make sure fields exist to support needs
  Ex. If the client wants monthly sales reports, you need a date field for orders.
  Ex. To group employees by division, you need a division identifier.

Make sure values are atomic
  Ex. First and last names stored separately
  Ex. Addresses broken down into Street, City, State, etc.

Do not store values that can be calculated from other values


Ex. Age can be calculated from Date of Birth
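
As a small illustration of deriving a value instead of storing it, here is a sketch in Python. The function name age_on and the sample dates are assumptions for the example only.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Return the number of completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_on(date(1970, 6, 15), date(2024, 6, 14)))  # 53
print(age_on(date(1970, 6, 15), date(2024, 6, 15)))  # 54
```

Storing only Date of Birth means the age can never go stale; it is recomputed whenever it is needed.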

4. Make Preliminary Tables


(and insert the fields into them)

Each table holds info about one subject
Don't worry about the quantity of tables
Look for logical groupings of information
Use a consistent naming convention

Naming Conventions

Rules of thumb
Table names must be unique in the DB; should be plural
Field names must be unique in the table(s)
Clearly identify table subject or field data
Be as brief as possible
Avoid abbreviations and acronyms
Use fewer than 30 characters
Use letters, numbers, underscores (_)
Do not use spaces or other special characters
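
The mechanical parts of these rules of thumb (length and allowed characters) can be checked automatically. This is a minimal sketch; the helper name is_valid_name is hypothetical, and the more subjective rules (brevity, avoiding abbreviations) are not covered.

```python
import re

# Letters, digits and underscores only; 1 to 29 characters (under 30).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,29}")

def is_valid_name(name: str) -> bool:
    """Check a table or field name against the character and length rules."""
    return NAME_PATTERN.fullmatch(name) is not None

print(is_valid_name("Order_Items"))  # True
print(is_valid_name("Order Items"))  # False: contains a space
```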

Naming Conventions (cont'd)

Leszynski Naming Convention (LNC)


Example: tblEmployees, qryPartNum
  tbl, qry = tag
  Employees, PartNum = basename

LNC at Microsoft Developers Network

5. Identify the Key Fields

Primary Key(s)
Can never be Null; must hold unique values
Automatically indexed in most RDBMSs
Values rarely (if ever) change
Try to include as few fields as possible

Multi-field Primary Key


Combination of two or more fields that uniquely identify an individual record

Candidate Key
Field or fields that qualify as a primary key
Important in Third and Boyce-Codd Normal Forms

6. Identify Table Relationships


Based on business rules being modeled
Examples:
  Each customer can place many orders
  All employees belong to a department
  Each TA is assigned to one course

Relationship Terminology

Relationship Type
One-to-One: expressed as 1:1
One-to-Many: expressed as 1:N or 1:M or 1:∞
Many-to-Many: expressed as N:N or M:M

Primary or Parent Table


Table on the left side of 1:N relationship

Related or Child Table


Table on the right side of 1:N relationship

Relational Schema
Diagram of table relationships in database

Relationship Terminology (cont'd)


Join
Definition of how related records are returned

Join Line
Visual relationship indicators in schema

Key fields
Primary Key: the linking field on the "one" side of a 1:N relationship
Foreign Key: the primary key from one table that is added to another table so the records can be related
Non-Key Fields: any field that is not part of a primary key, multi-field primary key, or foreign key

One-to-One (1:1)
Each record in Table A relates to one, and only one, record in Table B, and vice versa.
Either table can be considered the Primary, or Parent, Table
Can usually be combined into one table, although that may not be the most efficient design

One-to-Many (1:N)
Each record in Table A may relate to zero, one or many records in Table B, but each record in Table B relates to only one record in Table A.
The potential relationship is what's important: there might be no related records, or only one, but there could be many.
The table on the One (or left) side of a 1:N relationship is considered the Primary Table.

Many-to-Many (N:N)
A record in Table A can relate to many records in Table B, and a record in Table B can relate to many records in Table A.
Most RDBMSs do not support N:N relationships, requiring the use of a linking (or intersection or bridge) table that breaks the N:N relationship down into two 1:N relationships, with the linking table being on the Many side of both new relationships.
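
A linking table can be sketched as follows, assuming Python's sqlite3 module as the RDBMS. The table names Students, Courses and Enrollments and the helper functions are hypothetical; the sample values borrow course and student names used elsewhere in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Courses  (CourseNum TEXT PRIMARY KEY);
-- Linking table: on the Many side of two 1:N relationships.
CREATE TABLE Enrollments (
    StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
    CourseNum TEXT    NOT NULL REFERENCES Courses(CourseNum),
    PRIMARY KEY (StudentID, CourseNum)
);
INSERT INTO Students VALUES (1, 'Jones'), (2, 'Grayson');
INSERT INTO Courses  VALUES ('ENG101'), ('MAT350');
INSERT INTO Enrollments VALUES (1, 'ENG101'), (1, 'MAT350'), (2, 'ENG101');
""")

def courses_for(name):
    """Courses taken by one student (Students 1:N Enrollments)."""
    rows = conn.execute("""
        SELECT e.CourseNum FROM Enrollments e
        JOIN Students s ON s.StudentID = e.StudentID
        WHERE s.Name = ? ORDER BY e.CourseNum""", (name,))
    return [r[0] for r in rows]

def students_in(course):
    """Students enrolled in one course (Courses 1:N Enrollments)."""
    rows = conn.execute("""
        SELECT s.Name FROM Enrollments e
        JOIN Students s ON s.StudentID = e.StudentID
        WHERE e.CourseNum = ? ORDER BY s.Name""", (course,))
    return [r[0] for r in rows]

print(courses_for("Jones"))    # ['ENG101', 'MAT350']
print(students_in("ENG101"))   # ['Grayson', 'Jones']
```

The multi-field primary key on Enrollments keeps the same student from being enrolled in the same course twice.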

Relational Schema
Table 1: Field1_1, Field1_2, Field1_3, Field1_4
Table 2: Field2_1, Field1_1, Field2_2, Field2_3
Join line: Table 1.Field1_1 (1) to Table 2.Field1_1 (N)

7. Normalization
Normal Forms (NF): design standards based on database design theory
Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.
Each successive NF applies an increasingly stringent set of rules

First Normal Form (1NF)


A table is in first normal form if there are no repeating groups.
Repeating groups: a set of logically related fields or values that occur multiple times in one record
  1: a non-atomic value, or multiple values, stored in a field
  2: multiple fields in the same table that hold logically similar values

Sample 1NF Violation - 1


Employee_Projects_Time

EmployeeID  Name            Project                         Time
EN1-26      Sean O'Brien    30-452-T3, 30457-T3, 32244-T3   0.25, 0.40, 0.30
EN1-33      Amy Guya        30-452-T3, 30382-TC, 32244-T3   0.05, 0.35, 0.60
EN1-35      Steven Baranco  30-452-T3, 31238-TC             0.15, 0.80

Sample 1NF Violation - 2


Employee_Projects_Time

EmpID   Last Name  First Name  Proj1      Time1  Proj2      Time2
EN1-26  O'Brien    Sean        30-452-T3  0.25   30-457-T3  0.40
EN1-33  Guya       Amy         30-452-T3  0.05   30-328-TC  0.35

Tables in 1NF
Employees
*EmployeeID  LastName  FirstName
EN1-26       O'Brien   Sean
EN1-33       Guya      Amy
EN1-35       Baranco   Steven

Employees_Projects
EmployeeID  *ProjNum   Time
EN1-33      30-328-TC  0.35
EN1-26      30-452-T3  0.25
EN1-33      30-452-T3  0.05
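
Removing a repeating group of the first kind (several values packed into one field) amounts to splitting each packed record into one row per value. A minimal sketch in Python follows; the function name to_1nf is hypothetical, and the sample record is the EN1-35 row from the first 1NF violation.

```python
def to_1nf(employee_id, projects, times):
    """Split comma-separated project and time lists into one row per pair."""
    project_list = [p.strip() for p in projects.split(",")]
    time_list = [float(t) for t in times.split(",")]
    if len(project_list) != len(time_list):
        raise ValueError("projects and times must have the same number of entries")
    return [(employee_id, p, t) for p, t in zip(project_list, time_list)]

rows = to_1nf("EN1-35", "30-452-T3, 31238-TC", "0.15, 0.80")
for row in rows:
    print(row)
# ('EN1-35', '30-452-T3', 0.15)
# ('EN1-35', '31238-TC', 0.8)
```

Each output row holds exactly one project and one time value, which is the shape an Employees_Projects table expects.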

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF and each non-key field is functionally dependent on the entire primary key.
Functional dependency: a relationship between fields such that the value in one field determines the one value that can be contained in the other field.
Determinant: a field in which the value determines the value in another field.
Example:
  Airport  City
  Dulles   Washington, DC

Sample 2NF Violation


Employees_Projects

*EmpID  Lname    Fname  *ProjNum   ProjTitle
EN1-25  O'Brien  Sean   30-452-T3  STAR Manual
EN1-25  O'Brien  Sean   30-457-T3  ISO Procedures
EN1-25  O'Brien  Sean   31-124-T3  Employee Handbook
EN1-33  Guya     Amy    30-452-T3  STAR Manual
EN1-33  Guya     Amy    30-482-TC  Web site
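
One way to spot the partial dependencies in this sample is to test whether part of the key already determines a non-key field. The sketch below is an illustration in Python; the helper name determines is hypothetical. A sample of rows can only disprove a functional dependency, so a True result means "not contradicted by this data" and still has to be confirmed against the business rules.

```python
def determines(rows, determinant, dependent):
    """True if no determinant value maps to two different dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in determinant)
        value = tuple(row[f] for f in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True

fields = ("EmpID", "Lname", "Fname", "ProjNum", "ProjTitle")
data = [
    ("EN1-25", "O'Brien", "Sean", "30-452-T3", "STAR Manual"),
    ("EN1-25", "O'Brien", "Sean", "30-457-T3", "ISO Procedures"),
    ("EN1-25", "O'Brien", "Sean", "31-124-T3", "Employee Handbook"),
    ("EN1-33", "Guya", "Amy", "30-452-T3", "STAR Manual"),
    ("EN1-33", "Guya", "Amy", "30-482-TC", "Web site"),
]
rows = [dict(zip(fields, record)) for record in data]

# Names depend on EmpID alone and titles on ProjNum alone: each is
# determined by only part of the (EmpID, ProjNum) key.
print(determines(rows, ["EmpID"], ["Lname", "Fname"]))  # True
print(determines(rows, ["ProjNum"], ["ProjTitle"]))     # True
print(determines(rows, ["EmpID"], ["ProjTitle"]))       # False
```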

Tables in 2NF
Employees
*EmployeeID  LastName  FirstName
EN1-26       O'Brien   Sean
EN1-33       Guya      Amy

Employees_Projects
*EmployeeID  *ProjNum
EN1-26       30-452-T3
EN1-33       30-457-T3

Projects
*ProjNum   Title
30-452-T3  STAR manual
30-457-T3  ISO procedure

Third Normal Form (3NF)


A table is in 3NF when it is in 2NF and there are no transitive dependencies.
Transitive dependency: a type of functional dependency in which the value of a non-key field is determined by the value in another non-key field, and that field is not a candidate key.

Sample 3NF Violation


Projects_Managers

*ProjNum   ProjTitle       ProjMgr   Phone
30-452-T3  STAR Manual     Garrison  2756
30-457-T3  ISO Procedures  Jacanda   2954
30-482-TC  Web Site        Friedman  2846
31-124-T3  STAR prototype  Garrison  2756
35-272-TC  Order System    Jacanda   2954

Tables in 3NF
Projects
*ProjNum   ProjTitle       Manager
30-452-T3  STAR manual     Garrison
30-457-T3  ISO procedures  Jacanda

Project Managers
*Manager  Phone
Garrison  2756
Jacanda   2954

Boyce-Codd Normal Form (BCNF)


A table is in BCNF when it is in 3NF and all determinants are candidate keys.
Developed to cover situations that 3NF did not address.
Applies to situations where you have overlapping candidate keys.

Sample Business Rules

Business Rules:
Each course can have many students
Each student can take many courses
Each course can have multiple teaching assistants (TAs)
Each TA is associated with only one course
For each course, each student has one TA

Sample BCNF Violation


Course_Students_TAs

CourseNum  Student  TA
ENG101     Jones    Clark
ENG101     Grayson  Chen
ENG101     Samara   Chen
MAT350     Grayson  Powers
MAT350     Jones    O'Shea
MAT350     Berg     Powers
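
The BCNF problem in this sample can be made concrete by checking which fields act as determinants. This is a sketch in Python with a hypothetical helper named determines; as with any check against sample rows, it can rule a dependency out but cannot prove that the business rules guarantee it.

```python
from collections import defaultdict

def determines(rows, determinant, dependent):
    """True if each determinant value is paired with exactly one dependent value."""
    images = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in determinant)
        images[key].add(tuple(row[f] for f in dependent))
    return all(len(values) == 1 for values in images.values())

rows = [
    {"CourseNum": "ENG101", "Student": "Jones",   "TA": "Clark"},
    {"CourseNum": "ENG101", "Student": "Grayson", "TA": "Chen"},
    {"CourseNum": "ENG101", "Student": "Samara",  "TA": "Chen"},
    {"CourseNum": "MAT350", "Student": "Grayson", "TA": "Powers"},
    {"CourseNum": "MAT350", "Student": "Jones",   "TA": "O'Shea"},
    {"CourseNum": "MAT350", "Student": "Berg",    "TA": "Powers"},
]

# (CourseNum, Student) identifies the TA, so it can serve as a key.
key_determines_ta = determines(rows, ["CourseNum", "Student"], ["TA"])
# TA determines CourseNum ("each TA is associated with only one course") ...
ta_determines_course = determines(rows, ["TA"], ["CourseNum"])
# ... but TA alone does not identify a whole record, so it is not a candidate key.
ta_is_candidate_key = determines(rows, ["TA"], ["CourseNum", "Student"])

print(key_determines_ta, ta_determines_course, ta_is_candidate_key)
# True True False
```

TA is therefore a determinant that is not a candidate key, which is exactly the condition BCNF forbids.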

Tables in BCNF
Courses
*CourseNum  *Student
ENG101      Jones
MAT350      Grayson

Students
*Student  *TA
Jones     Clark
Grayson   Chen

TAs
*CourseNum  *TA
ENG101      Clark
MAT350      Chen

Fourth Normal Form (4NF)


A table is in 4NF when it is in BCNF and there are no multi-valued dependencies.
Multi-valued dependency: occurs when, for each value in field A, there is a set of values for field B and a set of values for field C, but B and C are not related.
Occurs when the table contains fields that are not logically related.

Sample 4NF Violation - 1


Movies

*Movie            *Star            *Producer
Once Upon a Time  Judy Garland     Alfred Brown
Once Upon a Time  Mickey Rooney    Alfred Brown
Once Upon a Time  Judy Garland     Muriel Hemingway
Once Upon a Time  Mickey Rooney    Muriel Hemingway
Moonlight         Humphrey Bogart  Alfred Brown
Moonlight         Judy Garland     Alfred Brown

Tables in 4NF - 1
Stars
*Movie            *Star
Once Upon a Time  Judy Garland
Once Upon a Time  Mickey Rooney
Moonlight         Humphrey Bogart
Moonlight         Judy Garland

Producers
*Movie            *Producer
Once Upon a Time  Alfred Brown
Once Upon a Time  Muriel Hemingway
Moonlight         Alfred Brown
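
That the split into Stars and Producers loses nothing can be checked by joining the two tables back together on Movie. The following is a small sketch using Python sets; the variable names are chosen for the example.

```python
movies = {
    ("Once Upon a Time", "Judy Garland", "Alfred Brown"),
    ("Once Upon a Time", "Mickey Rooney", "Alfred Brown"),
    ("Once Upon a Time", "Judy Garland", "Muriel Hemingway"),
    ("Once Upon a Time", "Mickey Rooney", "Muriel Hemingway"),
    ("Moonlight", "Humphrey Bogart", "Alfred Brown"),
    ("Moonlight", "Judy Garland", "Alfred Brown"),
}

# Project the original table onto the two independent facts.
stars = {(movie, star) for movie, star, _ in movies}
producers = {(movie, producer) for movie, _, producer in movies}

# Join the projections back together on Movie.
rejoined = {
    (movie, star, producer)
    for movie, star in stars
    for movie2, producer in producers
    if movie == movie2
}

print(len(stars), len(producers))  # 4 3
print(rejoined == movies)          # True
```

Seven rows in two tables carry the same information as six rows in one, and adding a third star to a movie now takes one new row instead of one per producer.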

Sample 4NF Violation - 2


Projects_Equipment

DeptCode  ProjNum    ProjMgrID  Equip               PropID
IS        36-272-TC  EN1-15     CD-ROM              657
IS                              VGA monitor         305
AC        36-152-TC  EN1-15     Dot matrix printer  358
AC                              Calculator w/tape   239
AC                              486 PC              275
TW        30-452-T3  EN1-10     Laser Printer       109
TW        30-457-T3  EN1-15
TW        31-124-T3  EN1-15

Tables in 4NF - 2
Equipment
*PropID  Equip               DeptCode
657      CD-ROM              IS
305      VGA monitor         IS
358      Dot matrix printer  AC

Projects
*ProjNum   ProjMgrID  DeptCode
30-452-T3  EN1-15     IS
30-457-T3  EN1-15     AC
35-152-TC  EN1-10     TW

Fifth Normal Form (5NF)


A table is in 5NF when it is in 4NF and there are no cyclic dependencies.
Cyclic dependency: occurs when there is a multi-field primary key with three or more fields (ex. A, B, C) and those fields are related in pairs AB, BC and AC.
Can occur only with a multi-field primary key of three or more fields

Sample 5NF Violation


BUYING

*Buyer  *Product  *Company
Chris   Jeans     Levi
Chris   Jeans     Wrangler
Chris   Shirts    Levi
Lori    Jeans     Levi

Do the math

Our sample has two buyers, two products and two companies, so up to 2 x 2 x 2 = 8 total records.

But what if our store has 20 buyers, 50 products and 100 companies? Up to 20 x 50 x 100 = 100,000 total records.

A Tempting Solution
Buyers
*Buyer  *Product
Chris   Jeans
Chris   Shirts
Lori    Jeans

Products
*Product  *Company
Jeans     Wrangler
Jeans     Levi
Shirts    Levi

The Correct Solution


Buyers
*Buyer  *Product
Chris   Jeans
Chris   Shirts
Lori    Jeans

Products
*Product  *Company
Jeans     Wrangler
Jeans     Levi
Shirts    Levi

Companies
*Buyer  *Company
Chris   Wrangler
Chris   Levi
Lori    Levi
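
Why two tables are tempting but wrong, and why the third table is needed, shows up when the pieces are joined back together. The sketch below uses Python sets built from the BUYING sample; the variable names are chosen for the example.

```python
buying = {
    ("Chris", "Jeans", "Levi"),
    ("Chris", "Jeans", "Wrangler"),
    ("Chris", "Shirts", "Levi"),
    ("Lori", "Jeans", "Levi"),
}

# The three pairwise tables (Buyers, Products, Companies in the slides).
buyer_product = {(b, p) for b, p, _ in buying}
product_company = {(p, c) for _, p, c in buying}
buyer_company = {(b, c) for b, _, c in buying}

# Tempting solution: join only Buyers and Products on Product.
two_way = {
    (b, p, c)
    for b, p in buyer_product
    for p2, c in product_company
    if p == p2
}
spurious = two_way - buying

# Correct solution: also require the (Buyer, Company) pair to exist.
three_way = {(b, p, c) for b, p, c in two_way if (b, c) in buyer_company}

print(spurious)             # {('Lori', 'Jeans', 'Wrangler')}
print(three_way == buying)  # True
```

The two-table join invents a record saying Lori buys Wrangler jeans; the Companies table is what filters it out.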

Check the Math, Again


What if our company has 20 buyers, 50 products and 100 companies?
Buyers = 20 x 50 = 1,000
Products = 50 x 100 = 5,000
Companies = 20 x 100 = 2,000

8,000 total records instead of 100,000!
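
The same arithmetic as a short Python sketch, treating every count as a worst case in which every possible combination occurs (variable names are chosen for the example):

```python
buyers, products, companies = 20, 50, 100

# One three-field table: every buyer/product/company combination.
single_table = buyers * products * companies

# Three two-field tables: every pair within each table.
three_tables = (buyers * products) + (products * companies) + (buyers * companies)

print(single_table)  # 100000
print(three_tables)  # 8000
```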

8. Finalizing the Design


Double-check to ensure good, principle-based design
Evaluate design in light of business model and determine desired deviations from design principles
  Process efficiency
  Security concerns

That's it for Table Design

Watch for repeating values and fields
Check against the Normal Forms
Make new tables when necessary
Re-check all tables against the NFs
Remember the business rules
Use common sense, but check anyway!

Ensuring Data Integrity

Placing constraints on how, when and where data can be entered
Done after or along with table design
Part of the design process because many constraints are established at the database and table levels

Referential Integrity
True relational databases support Referential Integrity: every non-null foreign key value must match an existing primary key value.
In other words, every record in a related table must have a matching record in the primary table.
Preserves the validity of foreign key values.
Enforced at database level.
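
Referential integrity can be seen in action with a short sketch, assuming Python's sqlite3 module as the RDBMS. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other systems enforce them by default. The order numbers 00001 and 00002 and customer number 4040 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE Customers (CustomerNum INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Orders (
    OrderNum    TEXT PRIMARY KEY,
    CustomerNum INTEGER REFERENCES Customers(CustomerNum)
);
INSERT INTO Customers VALUES (9123, 'General Tool Co.');
""")

# A foreign key that matches an existing primary key is accepted.
conn.execute("INSERT INTO Orders VALUES ('07456', 9123)")
# A null foreign key is also accepted: only non-null values must match.
conn.execute("INSERT INTO Orders VALUES ('00001', NULL)")

# A foreign key with no matching primary key is rejected.
try:
    conn.execute("INSERT INTO Orders VALUES ('00002', 4040)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(orphan_rejected, order_count)  # True 2
```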

Cascading Updates
When a primary key value changes, Cascade Update changes the corresponding values in the related records, so no records get orphaned.
Usually only one level deep
  The foreign key is not usually the primary key of the related table (except in 1:1 relationships), hence no other tables are usually related to it

Cascade Deletes
When a primary table record is deleted, all matching records in any related table are also deleted
Can propagate through multiple tables if Cascade Delete is turned on in all relationships between those tables
Another protection against orphan records, only this time by eradicating them instead!
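
Both cascades can be sketched with Python's sqlite3 module, where they are declared on the foreign key with ON UPDATE CASCADE and ON DELETE CASCADE. This assumes SQLite as the RDBMS; desktop tools expose the same options as relationship settings. The new customer number 8800 is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce cascades
conn.executescript("""
CREATE TABLE Customers (CustomerNum INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
CREATE TABLE Orders (
    OrderNum    TEXT PRIMARY KEY,
    CustomerNum INTEGER NOT NULL
        REFERENCES Customers(CustomerNum)
        ON UPDATE CASCADE
        ON DELETE CASCADE
);
INSERT INTO Customers VALUES (8755, 'Totally Toys'), (9123, 'General Tool Co.');
INSERT INTO Orders VALUES ('06755', 8755), ('08134', 8755), ('07456', 9123);
""")

# Cascade update: changing the primary key rewrites the related foreign keys.
conn.execute("UPDATE Customers SET CustomerNum = 8800 WHERE CustomerNum = 8755")
moved = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerNum = 8800").fetchone()[0]

# Cascade delete: removing the customer removes that customer's orders.
conn.execute("DELETE FROM Customers WHERE CustomerNum = 9123")
remaining = [r[0] for r in conn.execute(
    "SELECT OrderNum FROM Orders ORDER BY OrderNum")]

print(moved, remaining)  # 2 ['06755', '08134']
```

Cascade delete is the more dangerous of the two: one delete in a primary table can silently remove many related records.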

Levels of Enforcement
Referential Integrity is enforced at database level because it affects the relationship between two tables.
Many other business rules are enforced at field and table level to ensure data integrity.
Business rule implementation should be documented: how and where it is enforced in the design.
Some rules can't be enforced at table or field level; they must be enforced at the application level.

Testing of Business Rules

Always test business rule implementation
  What happens when the rule is met?
  What happens when the rule is violated?
Not much good as a data entry constraint if it doesn't constrain properly
Good application or interface design will provide feedback when the user violates a constraint or rule

Field Level Integrity

Constraining by use of field properties
  Data type: text, number, Yes/No, Date/Time
  Field size
  Formats

Entry and editing constraints
  Required
  Indexed, with or without duplicates
  Input masks
  Default value
  Validation rule
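
Several of these field properties have direct SQL counterparts. The sketch below assumes Python's sqlite3 module; the Employees columns Email, Division and HoursPerWeek, the 0 to 80 range, and the helper accepted are all hypothetical. Input masks and display formats belong to the interface and have no equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employees (
    EmployeeID   TEXT PRIMARY KEY,
    LastName     TEXT NOT NULL,                              -- Required
    Email        TEXT UNIQUE,                                -- Indexed, no duplicates
    Division     TEXT NOT NULL DEFAULT 'Unassigned',         -- Default value
    HoursPerWeek REAL CHECK (HoursPerWeek BETWEEN 0 AND 80)  -- Validation rule
)""")

def accepted(sql, params=()):
    """Run one statement; report whether the constraints let it through."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok = accepted(
    "INSERT INTO Employees (EmployeeID, LastName, HoursPerWeek) VALUES (?, ?, ?)",
    ("EN1-26", "O'Brien", 40))
missing_name = accepted(
    "INSERT INTO Employees (EmployeeID, HoursPerWeek) VALUES (?, ?)",
    ("EN1-33", 40))
bad_hours = accepted(
    "INSERT INTO Employees (EmployeeID, LastName, HoursPerWeek) VALUES (?, ?, ?)",
    ("EN1-35", "Baranco", 200))
division = conn.execute(
    "SELECT Division FROM Employees WHERE EmployeeID = 'EN1-26'").fetchone()[0]

print(ok, missing_name, bad_hours, division)  # True False False Unassigned
```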

Table Level Integrity

Field Comparisons
Compare value in one field to value in another
Comparison performed before record is saved
Violations could display an error message or force constraint of available values
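
A field comparison can be expressed as a table-level CHECK constraint. This sketch assumes Python's sqlite3 module; the OrderDate and ShipDate columns and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Orders (
    OrderNum  TEXT PRIMARY KEY,
    OrderDate TEXT NOT NULL,   -- ISO dates (YYYY-MM-DD) compare correctly as text
    ShipDate  TEXT,
    CHECK (ShipDate IS NULL OR ShipDate >= OrderDate)
)""")

# Shipped after it was ordered: accepted.
conn.execute("INSERT INTO Orders VALUES ('07456', '2024-03-01', '2024-03-04')")

# Shipped before it was ordered: rejected before the record is saved.
try:
    conn.execute("INSERT INTO Orders VALUES ('08622', '2024-03-10', '2024-03-02')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

saved = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, saved)  # True 1
```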

Validation or Lookup Tables


Store a generally static set of values
Stored values are used to populate new records to ensure accuracy of data entry

Documentation
A good design deserves good documentation
Data Dictionary for database/table design
  Table and field names
  Table and field properties
  Relationships, including primary and foreign keys
  Indexes

Provide reasons for design features, especially if they intentionally violate
