Normalization
About You
How many of you:
  Currently use SQL?
  Another RDBMS?
  Are responsible for database design?
  Will be in the future?
  Know about database normalization?
Introduction
What Is Database Normalization?
What are the Benefits of Database Normalization?
What are the Normal Forms?
First Normal Form
Second Normal Form
Forming Relationships
Third Normal Form
Joining Tables
De-Normalization
Conclusion
the Spread Sheet Syndrome, i.e. null fields
Store only the minimal amount of information.
Remove redundancies.
Restructure data.
Database Normalization
Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability. In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them. Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
Decreased storage requirements!
  1 VARCHAR(20) converted to 1 TINYINT UNSIGNED in a table of 1 million rows is a savings of ~20 MB
Faster search performance!
  Smaller file for table scans.
  More directed searching.
Improved data integrity!
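The ~20 MB figure can be checked with a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes every VARCHAR(20) value is full length in a single-byte character set (20 data bytes plus 1 length byte) and that a TINYINT UNSIGNED takes 1 byte. The constant names are illustrative.

```python
# Rough storage estimate under the assumptions stated above.
ROWS = 1_000_000
VARCHAR20_BYTES = 20 + 1   # 20 characters + 1 length byte (assumed worst case)
TINYINT_BYTES = 1          # TINYINT UNSIGNED is stored in a single byte

saved_bytes = ROWS * (VARCHAR20_BYTES - TINYINT_BYTES)
saved_mb = saved_bytes / 1_000_000

print(f"Approximate savings: {saved_mb:.0f} MB")
```

Shorter strings or multi-byte character sets would change the result, so treat the number as an order-of-magnitude estimate.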
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Table 1
Title | Author1 | Author2 | ISBN | Subject | Pages | Publisher
Database System Concepts | | | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill
Operating System Concepts | | | 0471694665 | Computers | 944 | McGraw-Hill
First Normal Form
Remove horizontal redundancies
  No two columns hold the same information
  No single column holds more than a single item
Each row must be unique
  Use a primary key
Benefits
  Easier to query/sort the data
  More scalable
  Each row can be identified for updating
Author | ISBN | Subject | Pages | Publisher
 | 0072958863 | Computers | 1168 | McGraw-Hill
Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill
Abraham Silberschatz | 0471694665 | Computers | 944 | McGraw-Hill
We now have two rows for a single book. Additionally, we would be violating the Second Normal Form. A better solution to our problem would be to separate the data into separate tables: an Author table and a Subject table to store our information, removing that information from the Book table.
Subject Table
Subject_ID | Subject
1 | MySQL
2 | Computers

Author Table
Author_ID | Last Name | First Name
1 | Silberschatz | Abraham
2 | Korth | Henry

Book Table
ISBN | Title | Pages | Publisher
0072958863 | Database System Concepts | 1168 | McGraw-Hill
0471694665 | Operating System Concepts | 944 | McGraw-Hill
Each table has a primary key, used for joining tables together when querying the data. A primary key value must be unique within the table (no two books can have the same ISBN). A primary key is also an index, which speeds up data retrieval based on the primary key. Now to define relationships between the tables.
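As a minimal sketch of the primary-key rule, the three tables can be declared and populated as follows. It uses Python's built-in sqlite3 module with an in-memory database in place of MySQL; the lowercase table and column names and the column types are assumptions, not part of the slides.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    subject    TEXT NOT NULL
);
CREATE TABLE author (
    author_id  INTEGER PRIMARY KEY,
    last_name  TEXT NOT NULL,
    first_name TEXT NOT NULL
);
CREATE TABLE book (
    isbn      TEXT PRIMARY KEY,
    title     TEXT NOT NULL,
    pages     INTEGER,
    publisher TEXT
);
""")

conn.executemany("INSERT INTO subject VALUES (?, ?)",
                 [(1, "MySQL"), (2, "Computers")])
conn.executemany("INSERT INTO author VALUES (?, ?, ?)",
                 [(1, "Silberschatz", "Abraham"), (2, "Korth", "Henry")])
conn.executemany("INSERT INTO book VALUES (?, ?, ?, ?)", [
    ("0072958863", "Database System Concepts", 1168, "McGraw-Hill"),
    ("0471694665", "Operating System Concepts", 944, "McGraw-Hill"),
])

# A second book with an existing ISBN violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO book VALUES (?, ?, ?, ?)",
                 ("0072958863", "Another Title", 100, "McGraw-Hill"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(duplicate_rejected, book_count)
```

The rejected insert leaves the Book table with its original two rows, which is the uniqueness guarantee the primary key is there to provide.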
Forming Relationships
Three Forms
  One to (zero or) One
  One to (zero or) Many
  Many to Many
One to One
  Same Table?
One to Many
  Place PK of the One in the Many
Many to Many
  Create a joining table
Relationships
Book_Author Table Book_Subject Table
ISBN
0072958863 2
0471694665 2
0471694665 2
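A joining table for the many-to-many relationship between books and authors might look like the sketch below. It again uses sqlite3 in place of MySQL. The column names are assumptions, and the author-to-book pairings are illustrative sample data rather than the exact rows from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    isbn  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL
);
-- Joining table: one row per (book, author) pair, keyed on both columns.
CREATE TABLE book_author (
    isbn      TEXT    NOT NULL REFERENCES book(isbn),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    PRIMARY KEY (isbn, author_id)
);
INSERT INTO book VALUES
    ('0072958863', 'Database System Concepts'),
    ('0471694665', 'Operating System Concepts');
INSERT INTO author VALUES (1, 'Silberschatz'), (2, 'Korth');
-- Illustrative pairings (assumed for this example).
INSERT INTO book_author VALUES
    ('0072958863', 1), ('0072958863', 2),
    ('0471694665', 1), ('0471694665', 2);
""")

# Re-join the three tables to list the authors of one book.
rows = conn.execute("""
    SELECT b.title, a.last_name
    FROM book AS b
    JOIN book_author AS ba ON ba.isbn = b.isbn
    JOIN author AS a ON a.author_id = ba.author_id
    WHERE b.isbn = ?
    ORDER BY a.last_name
""", ("0471694665",)).fetchall()

for title, last_name in rows:
    print(title, "-", last_name)
```

A Book_Subject table would follow the same pattern, pairing ISBN with Subject_ID.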
Second Normal Form
Table must be in First Normal Form
Remove vertical redundancy
  The same value should not repeat across rows
Composite keys
  All columns in a row must refer to BOTH parts of the key
Benefits
  Increased storage efficiency
  Less data repetition
2NF Table
Publisher Table
Publisher_ID | Publisher Name
1 | McGraw-Hill

Book Table
ISBN | Title | Pages | Publisher_ID
0072958863 | Database System Concepts | 1168 | 1
0471694665 | Operating System Concepts | 944 | 1
2NF
Here we have a one-to-many relationship between the Publisher table and the Book table: a book has only one publisher, and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the Book table (the "many" side), pointing to the primary key of the Publisher table. The other requirement for Second Normal Form is that a table with a composite key cannot contain any data that does not relate to all portions of the composite key.
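The foreign key arrangement described here can be sketched as follows, again with sqlite3 in place of MySQL. The column names and types are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE publisher (
    publisher_id   INTEGER PRIMARY KEY,
    publisher_name TEXT NOT NULL
);
-- The "many" side (book) carries the key of the "one" side (publisher).
CREATE TABLE book (
    isbn         TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    pages        INTEGER,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
INSERT INTO publisher VALUES (1, 'McGraw-Hill');
INSERT INTO book VALUES
    ('0072958863', 'Database System Concepts', 1168, 1),
    ('0471694665', 'Operating System Concepts', 944, 1);
""")

# A book pointing at a publisher that does not exist is rejected.
try:
    conn.execute("INSERT INTO book VALUES ('0000000000', 'Orphan Book', 1, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

books_per_publisher = conn.execute("""
    SELECT p.publisher_name, COUNT(*)
    FROM publisher AS p
    JOIN book AS b ON b.publisher_id = p.publisher_id
    GROUP BY p.publisher_id
""").fetchall()
print(orphan_rejected, books_per_publisher)
```

The publisher name is now stored once, and every book row refers to it by its ID.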
Third Normal Form
Table must be in Second Normal Form
  If your table is 2NF, there is a good chance it is 3NF
All columns must relate directly to the primary key
Benefits
  No extraneous data
Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on something other than a candidate key.
A table is in 3NF if all of the non-primary-key attributes are mutually independent.
There should not be transitive dependencies.
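One way to spot a transitive dependency in sample data is to test whether a functional dependency holds between two non-key columns. The sketch below is a hypothetical helper, not a standard library function, and the book rows that store publisher_name next to publisher_id are an assumed pre-3NF layout used only for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    Sample data can only refute a dependency; a True result means
    "not contradicted by these rows", not "guaranteed by the schema".
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # Two rows with the same lhs values must agree on the rhs values.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Assumed pre-3NF layout: publisher_name is stored alongside publisher_id.
books = [
    {"isbn": "0072958863", "publisher_id": 1, "publisher_name": "McGraw-Hill"},
    {"isbn": "0471694665", "publisher_id": 1, "publisher_name": "McGraw-Hill"},
]

key_determines_name = holds(books, ["isbn"], ["publisher_name"])
non_key_determines_name = holds(books, ["publisher_id"], ["publisher_name"])
publisher_determines_isbn = holds(books, ["publisher_id"], ["isbn"])

print(key_determines_name, non_key_determines_name, publisher_determines_isbn)
```

Here publisher_name depends on publisher_id, which is not a key of the book table, so isbn -> publisher_id -> publisher_name is a transitive dependency and the name belongs in its own Publisher table.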
Boyce-Codd Normal Form (BCNF)
BCNF requires that the table is in 3NF and that the only determinants are candidate keys.
A determinant is a column (or set of columns) on which some other column is fully functionally dependent.
It is a more rigorous version of 3NF, meant to deal with relational tables that have:
  1. Multiple candidate keys
  2. Composite candidate keys
  3. Candidate keys that overlap
In BCNF, it may not be possible to preserve dependencies.
Example
J | K | L
J1 | K1 | L1
J2 | K1 | L1
J3 | K1 | L1
Null | k2 | l2
4NF
A relation is in 4NF if it is in BCNF and all multivalued dependencies are also functional dependencies.
Multivalued dependency: R.A ->> R.B
Student_id | skill
705 | Analysis
705 | Design
705 | Program
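A multivalued dependency A ->> B on a relation with attributes (A, B, C) holds exactly when the relation equals the join of its projections onto (A, B) and (A, C). The sketch below checks that condition on sample rows. The `language` column and its values are hypothetical additions for illustration; the slide itself only shows Student_id and skill.

```python
def satisfies_mvd(rows, a, b, c):
    """Check a ->> b on rows over attributes (a, b, c).

    The dependency holds when the relation equals the natural join of
    its projections onto (a, b) and (a, c).
    """
    relation = {(r[a], r[b], r[c]) for r in rows}
    ab = {(x, y) for x, y, _ in relation}
    ac = {(x, z) for x, _, z in relation}
    rejoined = {(x, y, z) for x, y in ab for x2, z in ac if x == x2}
    return rejoined == relation


skills = ("Analysis", "Design", "Program")
languages = ("English", "French")  # hypothetical second independent fact

# Every skill is paired with every language for student 705.
full = [{"student_id": 705, "skill": s, "language": l}
        for s in skills for l in languages]
partial = full[:-1]  # drop one combination, breaking the independence

mvd_in_full = satisfies_mvd(full, "student_id", "skill", "language")
mvd_in_partial = satisfies_mvd(partial, "student_id", "skill", "language")

# 4NF decomposition: one table per independent multivalued fact.
student_skill = sorted({(r["student_id"], r["skill"]) for r in full})
student_language = sorted({(r["student_id"], r["language"]) for r in full})

print(mvd_in_full, mvd_in_partial, student_skill, student_language)
```

When the dependency holds, splitting the relation into the two smaller tables loses no information and removes the repeated skill and language combinations.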
De-Normalizing Tables
Use with caution
Normalize first, then de-normalize
Use only when you cannot optimize
Try temp tables, UNIONs, VIEWs, subselects first
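As one example of trying a VIEW before de-normalizing, the sketch below keeps the Book and Publisher tables normalized and exposes a flat, pre-joined shape through a view. It uses sqlite3 in place of MySQL; the view name `book_with_publisher` and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (
    publisher_id   INTEGER PRIMARY KEY,
    publisher_name TEXT NOT NULL
);
CREATE TABLE book (
    isbn         TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    pages        INTEGER,
    publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
INSERT INTO publisher VALUES (1, 'McGraw-Hill');
INSERT INTO book VALUES
    ('0072958863', 'Database System Concepts', 1168, 1),
    ('0471694665', 'Operating System Concepts', 944, 1);

-- The view presents the de-normalized shape without duplicating any data.
CREATE VIEW book_with_publisher AS
    SELECT b.isbn, b.title, b.pages, p.publisher_name
    FROM book AS b
    JOIN publisher AS p ON p.publisher_id = b.publisher_id;
""")

flat_rows = conn.execute(
    "SELECT title, publisher_name FROM book_with_publisher ORDER BY title"
).fetchall()

for title, publisher_name in flat_rows:
    print(title, "-", publisher_name)
```

Queries read from the view as if it were a single wide table, while updates to a publisher's name still happen in exactly one place.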
Conclusion
http://dev.mysql.com/tech-resources/articles/intro-tonormalization.html
MySQL Database Design and Optimization, Jon Stephens & Chad Russell, Chapter 3, ISBN 1-59059-332-4
http://www.openwin.org/mike/books
http://www.openwin.org/mike/presentations
QUESTIONS?