Relational Database Design
Bill Woolfolk
Public Health Sciences, University of Virginia
Woolfolk@virginia.edu
Objectives
Understand the definition of a modern relational database
Understand and be able to apply a practical method for designing databases
Recognize and avoid common pitfalls of database design
What's a database?
The storage format typically appears to users as some kind of tabular list (table, spreadsheet)
Often called modeling a business process
Often replicates the process only in appearance or end result
Database Applications
Database: the actual DB with its attendant storage structure
SQL Engine: interprets between the database and the interface/application
Interface or application: the part the user gets to see and use
Mid-level
Microsoft Access, Lotus Approach, Borland's Paradox
More or less total control of design allows custom builds
High-end
Oracle, Microsoft SQL Server, Sybase, IBM DB2
Professional level DBs: Banks, e-commerce, secure
Amazon.com, Ebay.com, Yahoo.com
Codd's Rules
Edgar F. Codd
Mathematician and Researcher at IBM
Devised the relational data model in 1970
Published 12 rules in 1985 defining the ideal relational database, added 6 more in 1990
E. F. Codd: A Relational Model of Data for Large Shared Data Banks. CACM 13(6): 377-387 (1970) (http://www.acm.org/classics/nov95/toc.html)
Codd, E. (1985). "Is Your DBMS Really Relational?" and "Does Your DBMS Run By the Rules?" ComputerWorld, October 14 and October 21.
Modification Anomalies
Customers_Orders_Inventory
Customer            OrderNum   ItemNum   Item
General Tool        07456      2246      Pentium Computer
General Toll        08622      3145      HP Printer
General Tool Co.    08622      3967      17" monitor
Totally Toys        06755      2246      Pentium computer
TOTALLY TOYS        08134      3145      Hewlett-Packard Printer
XYZ Inc.            09010      0446      Dot Matrix Printer
A search for "General Tool Co." would miss "General Tool" and "General Toll".
A case-sensitive search for "Totally Toys" would miss "TOTALLY TOYS".
Insertion Anomalies
How would you enter a new item into your inventory if no one had ordered it yet?
Deletion Anomalies
If you wanted to stop selling the dot matrix printer and remove it from your inventory, you would also have to delete the order and customer info for XYZ Inc., because all three are stored in the same record.
Orders
CustomerNum   OrderNum
7822          09010
8755          06755
8755          08134
9123          07456
9123          08622

Products
ItemNum   Item
0446      Dot Matrix Printer
2246      Pentium Computer
3145      Hewlett-Packard printer
3967      17" monitor

Customers
CustomerNum   Customer
7822          XYZ Inc.
8755          Totally Toys
9123          General Tool Co.
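The customer, order and product data above can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module, not the deck's own tooling. The column names come from the tables above; the table name Customers, the item number 5001 and the item "Laser Printer" are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
    CREATE TABLE Orders    (OrderNum TEXT PRIMARY KEY, CustomerNum TEXT NOT NULL);
    CREATE TABLE Products  (ItemNum TEXT PRIMARY KEY, Item TEXT NOT NULL);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("7822", "XYZ Inc."), ("8755", "Totally Toys")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [("09010", "7822"), ("06755", "8755"), ("08134", "8755")])
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("0446", "Dot Matrix Printer"), ("2246", "Pentium Computer")])

# Insertion: a new item can be stocked before anyone orders it
# (item 5001 is hypothetical).
conn.execute("INSERT INTO Products VALUES ('5001', 'Laser Printer')")

# Deletion: dropping the dot matrix printer leaves XYZ Inc. untouched.
conn.execute("DELETE FROM Products WHERE ItemNum = '0446'")

customers = [row[0] for row in conn.execute(
    "SELECT Customer FROM Customers ORDER BY CustomerNum")]
items = [row[0] for row in conn.execute(
    "SELECT Item FROM Products ORDER BY ItemNum")]
print(customers)
print(items)
```

Because each subject lives in its own table, adding or removing a product never touches a customer record.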
1) Identify the purpose of the database
2) Review existing data
3) Make a preliminary list of fields
4) Make a preliminary list of tables and enter fields
5) Identify the key fields
6) Draft the table relationships
7) Enter sample data and normalize the data/tables
8) Review and finalize the design
Database Modeling
Refers to various more-or-less formal methods for designing a database
Some provide precise steps and tools
Electronic: legacy database(s), spreadsheets, web forms
Manual: paper forms, receipts and other printed output
Each table holds info about one subject
Don't worry about the quantity of tables
Look for logical groupings of information
Use a consistent naming convention
Naming Conventions
Rules of thumb
Table names must be unique in the DB; should be plural
Field names must be unique in the table(s)
Clearly identify table subject or field data
Be as brief as possible
Avoid abbreviations and acronyms
Use fewer than 30 characters
Use letters, numbers, underscores (_)
Do not use spaces or other special characters
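The mechanical rules of thumb (length and allowed characters) can be checked automatically. The sketch below is one possible helper; the function name check_name is hypothetical, and rules such as "be brief" or "avoid abbreviations" still need human judgment.

```python
import re

# Letters, digits and underscores only, per the rules of thumb above.
ALLOWED = re.compile(r"[A-Za-z0-9_]+")

def check_name(name: str) -> list[str]:
    """Return the list of rule violations for a table or field name."""
    problems = []
    if len(name) >= 30:
        problems.append("30 characters or more")
    if not ALLOWED.fullmatch(name):
        problems.append("uses characters other than letters, numbers, underscores")
    return problems

print(check_name("Employees_Projects"))
print(check_name("Order Details"))
```

An empty list means the name passes both checks; "Order Details" is flagged because of the space.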
Primary Key(s)
Can never be Null; must hold unique values
Automatically indexed in most RDBMSs
Values rarely (if ever) change
Try to include as few fields as possible
Candidate Key
Field or fields that qualify as a primary key
Important in Third and Boyce-Codd Normal Forms
Relationship Terminology
Relationship Type
One-to-One: expressed as 1:1
One-to-Many: expressed as 1:N or 1:M or 1:∞
Many-to-Many: expressed as N:N or M:M
Relational Schema
Diagram of table relationships in database
Join
Definition of how related records are returned
Join Line
Visual relationship indicators in schema
Key fields
Primary Key: the linking field on the "one" side of a 1:N relationship
Foreign Key: the primary key from one table that is added to another table so the records can be related
Non-Key Fields: any field that is not part of a primary key, multi-field primary key, or foreign key
One-to-One (1:1)
Each record in Table A relates to one, and only one, record in Table B, and vice versa.
Either table can be considered the Primary, or Parent, Table.
Can usually be combined into one table, although that may not be the most efficient design.
One-to-Many (1:N)
Each record in Table A may relate to zero, one or many records in Table B, but each record in Table B relates to only one record in Table A.
The potential relationship is what's important: there might be no related records, or only one, but there could be many.
The table on the "One" (or left) side of a 1:N relationship is considered the Primary Table.
Many-to-Many (N:N)
A record in Table A can relate to many records in Table B, and a record in Table B can relate to many records in Table A.
Most RDBMSs do not support N:N relationships directly. They require a linking (or intersection or bridge) table that breaks the N:N relationship down into two 1:N relationships, with the linking table on the "Many" side of both new relationships.
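A linking table can be sketched as follows with Python's sqlite3 module. The tables Students, Courses and Students_Courses, their fields, and the sample rows are assumptions chosen for illustration; the linking table's primary key is the combination of the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, LastName TEXT NOT NULL);
    CREATE TABLE Courses  (CourseID TEXT PRIMARY KEY);
    -- Linking table: on the "many" side of two 1:N relationships.
    CREATE TABLE Students_Courses (
        StudentID INTEGER NOT NULL REFERENCES Students(StudentID),
        CourseID  TEXT    NOT NULL REFERENCES Courses(CourseID),
        PRIMARY KEY (StudentID, CourseID)
    );
    INSERT INTO Students VALUES (1, 'Jones'), (2, 'Grayson');
    INSERT INTO Courses  VALUES ('ENG101'), ('MAT350');
    INSERT INTO Students_Courses VALUES (1, 'ENG101'), (1, 'MAT350'), (2, 'MAT350');
""")

# One student, many courses.
courses_for_jones = [row[0] for row in conn.execute(
    "SELECT CourseID FROM Students_Courses WHERE StudentID = 1 ORDER BY CourseID")]

# One course, many students.
students_in_mat350 = [row[0] for row in conn.execute("""
    SELECT s.LastName
    FROM Students s
    JOIN Students_Courses sc ON sc.StudentID = s.StudentID
    WHERE sc.CourseID = 'MAT350'
    ORDER BY s.LastName""")]

print(courses_for_jones)
print(students_in_mat350)
```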
Relational Schema
Table 1: Field1_1, Field1_2, Field1_3, Field1_4
Table 2: Field2_1, Field1_1, Field2_2, Field2_3
Join line: Table 1.Field1_1 (1) to Table 2.Field1_1 (N)
7. Normalization
Normal Forms (NF): design standards based on database design theory
Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.
Each successive NF applies an increasingly stringent set of rules
1: non-atomic value, or multiple values, stored in a field
2: multiple fields in the same table that hold logically similar values
EN1-33
EN1-35
30-452-T3, 30457-T3, 32244-T3
Amy Guya 30-452-T3, 30382-TC, 32244-T3
Steven Baranco 30-452-T3, 31238-TC
0.15, 0.80
Last Name   First Name   Proj1   Time1   Proj2   Time2
O'Brien     Sean
EN1-33
Guya
Amy
Tables in 1NF
Employees
FirstName   LastName
Sean        O'Brien
Amy         Guya
Steven      Baranco
Employees_Projects
A table is in 2NF if it is in 1NF and each non-key field is functionally dependent on the entire primary key.
Functional dependency: a relationship between fields such that the value in one field determines the one value that can be contained in the other field.
Determinant: a field in which the value determines the value in another field.
Example:
Airport   City
Dulles    Washington, DC
EN1-33   Guya   Amy   30-452-T3   STAR Manual
EN1-33   Guya   Amy   30-482-TC   Web site
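A functional dependency can be tested against sample data. The sketch below is an illustration only: the helper name holds, the field names EmpID and LastName, and the third sample row are assumptions, and passing on sample rows does not prove the rule holds for all possible data.

```python
def holds(rows, determinant, dependent):
    """True if, in these rows, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in determinant)
        value = tuple(row[field] for field in dependent)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Primary key of this sample table is (EmpID, ProjNum).
rows = [
    {"EmpID": "EN1-33", "LastName": "Guya", "ProjNum": "30-452-T3", "ProjTitle": "STAR Manual"},
    {"EmpID": "EN1-33", "LastName": "Guya", "ProjNum": "30-482-TC", "ProjTitle": "Web Site"},
    # Assumed row, added so that one project appears for two employees.
    {"EmpID": "EN1-35", "LastName": "Baranco", "ProjNum": "30-452-T3", "ProjTitle": "STAR Manual"},
]

title_from_projnum = holds(rows, ["ProjNum"], ["ProjTitle"])
name_from_empid = holds(rows, ["EmpID"], ["LastName"])
title_from_empid = holds(rows, ["EmpID"], ["ProjTitle"])
print(title_from_projnum, name_from_empid, title_from_empid)
```

Here ProjTitle depends on ProjNum alone and LastName on EmpID alone, that is, on only part of the two-field key, which is exactly what 2NF rules out.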
Tables in 2NF
Employees
FirstName Sean
EN1-33
Employees_Projects
Guya
Amy
Projects
*ProjNum    Title
30-452-T3   STAR manual
30-457-T3   ISO procedure

*ProjNum    ProjTitle
30-452-T3   STAR Manual
30-457-T3   ISO Procedures
30-482-TC   Web Site
31-124-T3   STAR prototype
35-272-TC   Order System
Tables in 3NF
Projects
*ProjNum 30-452-T3
30-457-T3
Manager Garrison
Business Rules:
Each course can have many students
Each student can take many courses
Each course can have multiple teaching assistants (TAs)
Each TA is associated with only one course
For each course, each student has one TA
TA Clark Chen
ENG101
MAT350 MAT350
Samara
Grayson Jones
Chen
Powers O'Shea
MAT350
Berg
Powers
Tables in BCNF
Courses
MAT350
Students
Grayson
TAs
Judy Garland
Mickey Rooney Humphrey Bogart
Muriel Hemingway
Muriel Hemingway Alfred Brown
Moonlight
Judy Garland
Alfred Brown
Tables in 4NF - 1
Stars
*Movie             *Star
Once Upon a Time   Judy Garland
Once Upon a Time   Mickey Rooney
Moonlight          Humphrey Bogart
Moonlight          Judy Garland

Producers
*Movie             *Producer
Once Upon a Time   Alfred Brown
Once Upon a Time   Muriel Hemingway
Moonlight          Alfred Brown
Tables in 4NF - 2
Equipment *PropID 657 Equip CD-ROM DeptCode IS
305
358 Projects *ProjNum 30-452-T3
VGA monitor
IS
ProjMgrID EN1-15
DeptCode IS
30-457-T3
35-152-TC
EN1-15
EN1-10
AC
TW
Do the math
Our sample is two buyers, two products and two companies, so 2 x 2 x 2 = 8 total records
But, what if our store has 20 buyers, 50 products and 100 companies? 20 x 50 x 100 = 100,000 total records
A Tempting Solution
Buyers
*Buyer   *Product
Chris    Jeans
Chris    Shirts
Lori     Jeans

Products
*Product   *Company
Jeans      Wrangler
Jeans      Levi
Shirts     Levi

Companies
*Buyer   *Company
Chris    Wrangler
Chris    Levi
Lori     Levi
What if our company has 20 buyers, 50 products and 100 companies?
Buyers = 20 x 50 = 1,000
Products = 50 x 100 = 5,000
Companies = 20 x 100 = 2,000
Total = 8,000 records at most, versus 100,000 in a single table
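The arithmetic can be reproduced in a few lines of Python. These figures are upper bounds: they assume every possible combination actually occurs. The variable names are illustrative.

```python
buyers, products, companies = 20, 50, 100

# One table holding every buyer/product/company combination.
single_table = buyers * products * companies

# Three two-field tables instead.
buyers_table = buyers * products        # buyer + product
products_table = products * companies   # product + company
companies_table = buyers * companies    # buyer + company
split_total = buyers_table + products_table + companies_table

print(single_table)
print(buyers_table, products_table, companies_table, split_total)
```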
Placing constraints on how and when and where data can be entered
Done after or along with table design
Part of the design process because many constraints are established at the database and table levels
Referential Integrity
True relational databases support Referential Integrity: every non-null foreign key value must match an existing primary key value.
In other words, every record in a related table must have a matching record in the primary table.
Preserves the validity of foreign key values.
Enforced at database level.
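Referential integrity can be seen in action with Python's sqlite3 module. This is a sketch under stated assumptions: the Customers and Orders tables are simplified, order numbers 9011 and 9012 and customer number 9999 are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE Customers (CustomerNum INTEGER PRIMARY KEY, Customer TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderNum    INTEGER PRIMARY KEY,
        CustomerNum INTEGER REFERENCES Customers(CustomerNum)
    );
""")
conn.execute("INSERT INTO Customers VALUES (7822, 'XYZ Inc.')")
conn.execute("INSERT INTO Orders VALUES (9010, 7822)")  # matches a primary key
conn.execute("INSERT INTO Orders VALUES (9011, NULL)")  # null foreign key is allowed

try:
    # Customer 9999 does not exist, so this row would be an orphan.
    conn.execute("INSERT INTO Orders VALUES (9012, 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(rejected, order_count)
```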
Cascading Updates
When a primary key value changes, Cascade Update changes the corresponding values in the related records, so no records get orphaned.
Usually only one level deep: the foreign key is not usually the primary key of the related table (except in 1:1 relationships), hence no other tables are usually related to it.
Cascade Deletes
When a primary table record is deleted, all matching records in any related table are also deleted.
Can propagate through multiple tables if Cascade Delete is turned on in all relationships between those tables.
Another protection against orphan records, only this time by eradicating them instead!
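Both cascade options can be sketched with SQLite's ON UPDATE CASCADE and ON DELETE CASCADE clauses. Other RDBMSs expose the same idea through their own syntax or relationship dialogs. The new customer number 8756 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customers (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderNum    TEXT PRIMARY KEY,
        CustomerNum TEXT NOT NULL
            REFERENCES Customers(CustomerNum)
            ON UPDATE CASCADE
            ON DELETE CASCADE
    );
    INSERT INTO Customers VALUES ('8755', 'Totally Toys');
    INSERT INTO Orders VALUES ('06755', '8755'), ('08134', '8755');
""")

# Cascade Update: changing the primary key rewrites the foreign keys too.
conn.execute("UPDATE Customers SET CustomerNum = '8756' WHERE CustomerNum = '8755'")
after_update = [row[0] for row in conn.execute(
    "SELECT DISTINCT CustomerNum FROM Orders")]

# Cascade Delete: removing the customer removes its orders as well.
conn.execute("DELETE FROM Customers WHERE CustomerNum = '8756'")
orders_left = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]

print(after_update, orders_left)
```

Cascade Delete is worth enabling only deliberately, since one delete in the primary table silently removes every related record.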
Levels of Enforcement
Referential Integrity is enforced at the database level because it affects the relationship between two tables.
Many other business rules are enforced at the field and table level to ensure data integrity.
Business rule implementation should be documented: how and where it is enforced in the design.
Some rules can't be enforced at the table or field level; they must be enforced at the application level.
Not much good as a data entry constraint if it doesn't constrain properly
Good application or interface design will provide feedback when the user violates a constraint or rule
Field Comparisons
Compare the value in one field to the value in another
Comparison performed before the record is saved
Violations could display an error message or force constraint of available values
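One way to implement a field comparison at the table level is a CHECK constraint. The sketch below uses SQLite; the fields OrderDate and ShipDate, the dates, and the wording of the error message are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderNum  TEXT PRIMARY KEY,
        OrderDate TEXT NOT NULL,   -- ISO dates (YYYY-MM-DD) compare correctly as text
        ShipDate  TEXT,
        CHECK (ShipDate IS NULL OR ShipDate >= OrderDate)
    )
""")
conn.execute("INSERT INTO Orders VALUES ('07456', '2024-03-01', '2024-03-04')")

try:
    # Ship date earlier than order date: rejected before the record is saved.
    conn.execute("INSERT INTO Orders VALUES ('08622', '2024-03-05', '2024-03-02')")
    error_message = None
except sqlite3.IntegrityError as exc:
    # The application turns the database error into feedback for the user.
    error_message = f"Ship date cannot be earlier than order date ({exc})"

saved = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(saved, error_message)
```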
Documentation
A good design deserves good documentation
Data Dictionary for database/table design:
Table and field names
Table and field properties
Relationships, including primary and foreign keys
Indexes
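Much of a data dictionary can be generated from the database itself. The sketch below reads SQLite's catalog through the table_info, foreign_key_list and index_list pragmas; the function name data_dictionary, the sample tables and the index name are assumptions, and other RDBMSs expose the same information through their own system tables.

```python
import sqlite3

def data_dictionary(conn):
    """Collect fields, keys and indexes for every table in a SQLite database."""
    dictionary = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        fields = [{"name": col[1], "type": col[2], "primary_key": bool(col[5])}
                  for col in conn.execute(f"PRAGMA table_info('{table}')")]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [{"field": fk[3], "references": f"{fk[2]}.{fk[4]}"}
                        for fk in conn.execute(f"PRAGMA foreign_key_list('{table}')")]
        # index_list rows: (seq, name, unique, origin, partial)
        indexes = [ix[1] for ix in conn.execute(f"PRAGMA index_list('{table}')")]
        dictionary[table] = {"fields": fields,
                             "foreign_keys": foreign_keys,
                             "indexes": indexes}
    return dictionary

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerNum TEXT PRIMARY KEY, Customer TEXT NOT NULL);
    CREATE TABLE Orders (
        OrderNum    TEXT PRIMARY KEY,
        CustomerNum TEXT NOT NULL REFERENCES Customers(CustomerNum)
    );
    CREATE INDEX idx_orders_customer ON Orders(CustomerNum);
""")

dd = data_dictionary(conn)
for table, entry in dd.items():
    print(table, entry)
```

A generated listing covers names, keys and indexes; the purpose of each table and the business rules behind each constraint still have to be written by hand.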