Normalization
Course DP DP DP DP DP DP MMC MMC Silberschatz Nederpelt Silberschatz Nederpelt Silberschatz Nederpelt Silberschatz Silberschatz
Lecturer
DP DP DP DP DP DP
DP DP DP DP DP DP
What to do?
The normal forms represent guidelines for record design. The guidelines are meaningful even if one is not using a relational database system. [Date] The normalization rules are designed to prevent update anomalies (refer to previous lectures) and data inconsistencies. With respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form. There is no obligation to fully normalize all records when actual performance requirements are taken into account.
C. J. Date
First normal form deals with the "shape" of a record type. [Codd] Under first normal form, all occurrences of a record type must contain the same number of fields. First normal form excludes variable repeating fields and groups. This is not so much a design guideline as a matter of definition. Relational database theory doesn't deal with records having a variable number of fields.
Or: a table that qualifies as a relation is in 1NF.
A table in First Normal Form must have at least one candidate key, which ensures that the table does not contain any duplicate records. In First Normal Form repeating groups are not allowed; that is, no attribute may occur a different number of times in different records.
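As a minimal sketch of removing a repeating group, the Python below flattens records that carry a variable-length list into fixed-shape rows. The record layout and the names `unnormalized` and `to_first_normal_form` are hypothetical, chosen only for illustration.

```python
# Hypothetical un-normalized records: each employee carries a repeating
# group (a variable number of skills), so records differ in field count.
unnormalized = [
    {"employee": "Jones", "skills": ["typing", "filing"]},
    {"employee": "Smith", "skills": ["welding"]},
]


def to_first_normal_form(records):
    """Return one fixed-shape (employee, skill) row per repeated value."""
    rows = set()  # a set also rules out duplicate rows
    for record in records:
        for skill in record["skills"]:
            rows.add((record["employee"], skill))
    return sorted(rows)


rows_1nf = to_first_normal_form(unnormalized)
for row in rows_1nf:
    print(row)
```

Every resulting row has exactly two fields, and the pair (employee, skill) can serve as the candidate key.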
Second and third normal forms deal with the relationship between non-key and key fields. Under second and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing but the key. In addition, the record must satisfy first normal form.
We deal now only with "single-valued" facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or more fields, and so might X. In the following example, QUANTITY is a fact about the combination of PART and WAREHOUSE.
A relation is in 2NF if all of its non-key attributes are dependent on all of the primary key.
A relation is in 3NF if it is in 2NF and has no determinants except the primary key.
Boyce-Codd Normal Form (BCNF): a relation is in BCNF if every determinant is a candidate key.
Second normal form is violated when a nonkey field is a fact about a subset of a key. It is only relevant when the key is composite, i.e., consists of several fields. Consider the following inventory record:
PART   WAREHOUSE   QUANTITY   WAREHOUSE-ADDRESS
The key here is PART and WAREHOUSE, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone.
[Figure: inventory record with fields PART_NO, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS]
The warehouse address is repeated in every record that refers to a part stored in that warehouse. If the address of the warehouse changes, every record referring to a part stored in that warehouse must be updated. (why ?) Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse. If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's address.
To satisfy second normal form, the record is decomposed into two records:
PART_NO   WAREHOUSE   QUANTITY
WAREHOUSE   WAREHOUSE-ADDRESS
The normalized design enhances the integrity of the data, by minimizing redundancy and inconsistency, but at some possible performance cost for certain retrieval applications. Consider an application that wants the addresses of all warehouses stocking a certain part. In the unnormalized form, the application searches one record type. With the normalized design, the application has to search two record types, and connect the appropriate pairs.
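The trade-off described above can be sketched in Python. This is only an illustration under assumed data: the part names, warehouse codes, and addresses are invented, and `addresses_stocking` is a hypothetical helper, not part of any library.

```python
# Hypothetical inventory rows: (PART, WAREHOUSE, QUANTITY, WAREHOUSE_ADDRESS).
inventory = [
    ("bolt", "W1", 100, "1 Dock Rd"),
    ("nut", "W1", 250, "1 Dock Rd"),
    ("bolt", "W2", 40, "9 Quay St"),
]

# Decompose: QUANTITY depends on the whole key (PART, WAREHOUSE) ...
stock = {(part, wh): qty for part, wh, qty, _ in inventory}
# ... while WAREHOUSE_ADDRESS depends on WAREHOUSE alone.
warehouse_address = {wh: addr for _, wh, _, addr in inventory}


def addresses_stocking(part):
    """Addresses of all warehouses stocking `part` (needs both record types)."""
    return sorted(warehouse_address[wh] for (p, wh) in stock if p == part)


# An address change is now a single update instead of one per part.
warehouse_address["W1"] = "2 Dock Rd"
print(addresses_stocking("bolt"))
```

The retrieval has to visit both `stock` and `warehouse_address` and connect the matching pairs, which is the retrieval cost the text mentions. In exchange, the address is stored once per warehouse.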
Second Normal Form deals with the relationship between composite keys and non-key columns. To reach Second Normal Form, subsets of data that apply to multiple rows of a table are removed and organized in separate tables. This process is repeated until the duplication is reduced.
Third normal form is violated when a non-key field is a fact about another non-key field, as in
EMPLOYEE   DEPARTMENT   LOCATION
The EMPLOYEE is the key. If each department is located in one place, then the LOCATION field is a fact about the DEPARTMENT in addition to being a fact about the EMPLOYEE. The problems with this design are the same as those caused by violations of second normal form.
[Figure: employee record with fields EMPLOYEE, DEPARTMENT, LOCATION]
The department's location is repeated in the record of every employee assigned to that department. If the location of the department changes, every such record must be updated. Because of the redundancy, the data might become inconsistent, with different records showing different locations for the same department. If a department has no employees, there may be no record in which to keep the department's location.
To satisfy third normal form, the record is decomposed into two records:
EMPLOYEE   DEPARTMENT
DEPARTMENT   LOCATION
A record is in second and third normal forms if every field is either part of the key or provides a (single-valued) fact about exactly the whole key and nothing else.
Or: a relation is in 3NF if it is in 2NF and has no determinants except the primary key.
Third Normal Form can be achieved only when a table is already in Second Normal Form. We reach 3NF by eliminating all transitive dependencies among the fields of a record. In Third Normal Form, all non-key columns should depend on the primary key only; that is, any column that does not depend directly on the primary key is moved to a separate table.
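A minimal Python sketch of eliminating the transitive dependency EMPLOYEE -> DEPARTMENT -> LOCATION follows. The employee names, departments, and places are invented for illustration, and `location_of` is a hypothetical helper.

```python
# Hypothetical rows: (EMPLOYEE, DEPARTMENT, LOCATION).
records = [
    ("Ann", "Sales", "Leeds"),
    ("Bob", "Sales", "Leeds"),
    ("Cy", "Audit", "York"),
]

# Split the record so that each non-key field is a fact about its own key.
employee_department = {emp: dept for emp, dept, _ in records}
department_location = {dept: loc for _, dept, loc in records}

# A department with no employees can still record its location.
department_location["Legal"] = "Hull"


def location_of(employee):
    """Follow EMPLOYEE -> DEPARTMENT -> LOCATION across the two record types."""
    return department_location[employee_department[employee]]


# Moving a department is one update, however many employees it has.
department_location["Sales"] = "Bath"
print(location_of("Ann"), location_of("Bob"))
```

After the split, the location is stored once per department, so two employees of the same department can no longer show different locations.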
In relational database theory, second and third normal forms are defined in terms of functional dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there can't be two records having the same X-value.
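The definition can be turned into a small check over sample rows. This is a sketch only: the function name `holds` and the sample data are hypothetical, and a check over sample data can refute a functional dependency but can never prove that it holds for all valid data.

```python
def holds(rows, x_fields, y_fields):
    """True if no two rows share the same X-value but differ in Y-value."""
    seen = {}
    for row in rows:
        x = tuple(row[f] for f in x_fields)
        y = tuple(row[f] for f in y_fields)
        # setdefault stores y the first time x is seen, else returns the old y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data.
employees = [
    {"EMPLOYEE": "Ann", "DEPARTMENT": "Sales", "LOCATION": "Leeds"},
    {"EMPLOYEE": "Bob", "DEPARTMENT": "Sales", "LOCATION": "Leeds"},
    {"EMPLOYEE": "Cy", "DEPARTMENT": "Audit", "LOCATION": "York"},
]

print(holds(employees, ["DEPARTMENT"], ["LOCATION"]))
print(holds(employees, ["LOCATION"], ["EMPLOYEE"]))
```

In this sample, DEPARTMENT determines LOCATION, while LOCATION does not determine EMPLOYEE, because the same location occurs with two different employees.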
There is a slight technical difference between functional dependencies and single-valued facts as we have presented them. Functional dependencies only exist when the things involved have unique and singular identifiers (representations). For example, suppose an employee's address is a single-valued fact, i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be a functional dependency in the data:
PERSON       ADDRESS
John Smith   123 Main St., New York
John Smith   321 Center St., San Francisco
Although each person has a unique address, a given name can appear with several different addresses. Hence we do not have a functional dependency corresponding to our single-valued fact.
Similarly, the address has to be spelled identically in each occurrence in order to have a functional dependency. In the following case the same person appears to be living at two different addresses, again precluding a functional dependency.
PERSON       ADDRESS
John Smith   123 Main St., New York
John Smith   321 Center St., San Francisco
We do wish to point out, however, that functional dependencies and the various normal forms are really only defined for situations in which there are unique and singular identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the formal definitions of the normal forms.
For instance, we as designers know that in the following example there is a single-valued fact about a non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.
EMPLOYEE    FATHER       FATHER'S-ADDRESS
Art Smith   John Smith   123 Main St., New York
Tom Smith   John Smith   321 Center St., San Francisco
Toy Smith   John Smith   123 Main Street, NYC
However, in formal terms, there is no functional dependency here between FATHER'S-ADDRESS and FATHER, and hence no violation of third normal form.
Boyce-Codd Normal Form (BCNF) requires that a table first satisfy Third Normal Form. In addition, the left-hand side of every non-trivial functional dependency must be a superkey.
When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. 3NF does not deal well with a relation that has overlapping candidate keys, i.e. composite candidate keys with at least one attribute in common. BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key.
Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
The first determinant suggests that the primary key of R could be changed from a,b to a,c. If this change were made, all of the non-key attributes present in R could still be determined, and therefore this change is legal.
The second determinant indicates that a,d determines b, but a,d could not be the key of R, as a,d does not determine all of the non-key attributes of R (it does not determine c). We would say that the first determinant is a candidate key, but the second determinant is not a candidate key, and thus this relation is not in BCNF (but is in third normal form).
RECALL: In Third Normal Form, all columns should depend on the primary key only.
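The reasoning about the two determinants can be sketched with an attribute-closure computation. As an assumption, the sketch considers only the two determinants listed for R (a,c -> b,d and a,d -> b) and no other dependencies; the function names `closure` and `bcnf_violations` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency when its whole left side is already known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = {"a", "b", "c", "d"}
# Assumption: only the two determinants listed in the text are considered.
fds = [
    ({"a", "c"}, {"b", "d"}),
    ({"a", "d"}, {"b"}),
]


def bcnf_violations(relation, fds):
    """Determinants whose closure does not cover the whole relation."""
    return [sorted(lhs) for lhs, _ in fds if closure(lhs, fds) != relation]


print(sorted(closure({"a", "c"}, fds)))
print(sorted(closure({"a", "d"}, fds)))
print(bcnf_violations(R, fds))
```

Under these two dependencies, the closure of a,c covers all of R, so a,c behaves as a key, while the closure of a,d never reaches c, so a,d is reported as the determinant that breaks BCNF.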
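[Table: sample rows of DB(Patno, PatName, appNo, time, doctor) for patients 1 to 5. Only the rows for patients 4 and 5 survive in full:]
4   Robert   0   13:00   Killer
5   Zane     1   14:00   Zorro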
This depicts a special dieting clinic where each patient has 4 appointments. On the first they are weighed, on the second they are exercised, on the third their fat is removed by surgery, and on the fourth their mouth is stitched closed. Not all patients need all four appointments! If the Patient Name begins with a letter before P they get a morning appointment, otherwise they get an afternoon appointment. Appointment 1 is either 09:00 or 13:00, appointment 2 is 10:00 or 14:00, and so on. From this (hopefully) make-believe scenario we can extract the following determinants:
Patno -> PatName
Patno,appNo -> time,doctor
time -> appNo
Now we have to decide what the primary key of DB is going to be. From the information we have, we could choose:
DB(Patno,PatName,appNo,time,doctor) with key Patno,appNo (example 1a)
or
DB(Patno,PatName,appNo,time,doctor) with key Patno,time (example 1b)
BCNF: every determinant is a candidate key.
DB(Patno,appNo,time,doctor)
R1(Patno,PatName)
Go through all determinants where ALL of the left hand attributes are present in a relation and at least ONE of the right hand attributes is also present in the relation.
Patno,appNo -> time,doctor: all LHS attributes are present, and time and doctor are also present, so this is relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key. Thus this is OK for BCNF compliance.
time -> appNo: time is present, and so is appNo, so this is relevant. Is this a candidate key? If it were, then we could rewrite DB as DB(Patno,appNo,time,doctor) with time alone as the key. This will not work, as you need both time and Patno together to form a unique key. Thus this determinant is not a candidate key, and therefore DB is not in BCNF. We need to fix this.
BCNF: rewrite to
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)
time is enough to work out the appointment number of a patient.
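The walkthrough can be checked mechanically. The sketch below assumes the three determinants of the clinic scenario are Patno -> PatName, Patno,appNo -> time,doctor and time -> appNo (as this walkthrough reads them), and tests each relation by brute force over its attribute subsets. The helper names `closure` and `is_bcnf` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial determinant inside `relation` is a superkey."""
    relation = set(relation)
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            determined = closure(combo, fds) & relation
            # A subset that determines something extra, but not everything,
            # is a determinant that is not a candidate key.
            if determined != set(combo) and determined != relation:
                return False
    return True


# Assumed determinants for the clinic scenario.
fds = [
    ({"Patno"}, {"PatName"}),
    ({"Patno", "appNo"}, {"time", "doctor"}),
    ({"time"}, {"appNo"}),
]

print(is_bcnf({"Patno", "appNo", "time", "doctor"}, fds))
for rel in ({"Patno", "time", "doctor"}, {"Patno", "PatName"}, {"time", "appNo"}):
    print(sorted(rel), is_bcnf(rel, fds))
```

The subset enumeration is exponential in the number of attributes, which is acceptable for relations of this size but not a general-purpose approach.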
Go through all determinants where ALL of the left hand attributes are present in a relation and at least ONE of the right hand attributes is also present in the relation.
Patno -> PatName: Patno is present in DB, but not PatName, so not relevant.
Functional dependencies:
StudNo -> StudName
CourseNo -> Ctitle,InstrucName
InstrucName -> InstrucLocn
StudNo,CourseNo,Major -> Grade
StudNo,Major -> Advisor
Advisor -> Major

Unnormalised:
Grade_report(StudNo,StudName,(Major,Advisor,(CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)))

1NF (remove repeating groups):
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)
Student: only determinant is StudNo
StudCourse: only determinant is StudNo,Major,CourseNo
Course: only determinant is CourseNo
Instructor: only determinant is InstrucName
StudMajor: the determinants are StudNo,Major and Advisor. Only StudNo,Major is a candidate key.

BCNF:
Student(StudNo,StudName)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName)
Instructor(InstrucName,InstrucLocn)
StudMajor(StudNo,Advisor)
Advisor(Advisor,Major)
This can be achieved by ensuring that every tuple defines a single entity by containing only atomic values OR