Normalization

The document discusses different forms of database normalization including first, second, third normal form and Boyce-Codd normal form. It explains that normalization aims to prevent data inconsistencies by reducing redundancy. The key points covered are that first normal form requires each field to have a fixed number of values, second normal form removes non-key attributes dependent on a subset of the primary key, and third normal form removes attributes dependent on other non-key attributes. Boyce-Codd normal form further requires that every determinant must be a candidate key.

Uploaded by

Smilie Chawla
Copyright
© Attribution Non-Commercial (BY-NC)

Normalisation

Course  Book          Lecturer
DP      Silberschatz  John D
DP      Nederpelt     William M
DP      Silberschatz  William M
DP      Nederpelt     John D
DP      Silberschatz  Christian G
DP      Nederpelt     Christian G
MMC     Silberschatz  John D
MMC     Silberschatz  William M

What to do?

The normal forms represent guidelines for record design. The guidelines are meaningful even if one is not using a relational database system. [Date]

The normalization rules are designed to prevent update anomalies (refer to previous lectures) and data inconsistencies.

With respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be updated frequently. They tend to penalize retrieval, since data which may have been retrievable from one record in an unnormalized design may have to be retrieved from several records in the normalized form. There is no obligation to fully normalize all records when actual performance requirements are taken into account.

C. J. Date

First normal form deals with the "shape" of a record type. [Codd]

Under first normal form, all occurrences of a record type must contain the same number of fields. First normal form excludes variable repeating fields and groups. This is not so much a design guideline as a matter of definition: relational database theory doesn't deal with records having a variable number of fields.

Or: a table that qualifies as a relation is in 1NF.

A table in First Normal Form must have at least one candidate key and must not contain any duplicate records. Repeating groups are not allowed in First Normal Form; that is, there are no attributes that occur a different number of times in different records.
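
As a rough illustration of removing a repeating group, the sketch below flattens records that hold a variable-length list into records that all have the same two fields. The STUDENT and COURSES names and the sample values are hypothetical and are not taken from the lecture.

```python
# Hypothetical un-normalized records: the COURSES field is a repeating group,
# so different records effectively carry a different number of values.
unnormalized = [
    {"STUDENT": "S1", "COURSES": ["DB", "OS", "AI"]},
    {"STUDENT": "S2", "COURSES": ["DB"]},
]


def to_first_normal_form(records):
    """Emit one fixed-shape record per (STUDENT, COURSE) pair."""
    return [
        {"STUDENT": rec["STUDENT"], "COURSE": course}
        for rec in records
        for course in rec["COURSES"]
    ]


flat = to_first_normal_form(unnormalized)

# Every flattened record now has the same number of fields.
field_counts = {len(rec) for rec in flat}
```

In the flattened form the combination of STUDENT and COURSE can serve as a candidate key, provided no pair is listed twice.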

Second and third normal forms deal with the relationship between non-key and key fields. Under second and third normal forms, a non-key field must provide a fact about the key, the whole key, and nothing but the key. In addition, the record must satisfy first normal form.

We deal now only with "single-valued" facts. The fact could be a one-to-many relationship, such as the department of an employee, or a one-to-one relationship, such as the spouse of an employee. Thus the phrase "Y is a fact about X" signifies a one-to-one or one-to-many relationship between Y and X. In the general case, Y might consist of one or more fields, and so might X. In the following example, QUANTITY is a fact about the combination of PART and WAREHOUSE.

A relation is in 2NF if all of its non-key attributes are dependent on all of the primary key.

A relation is in 3NF if it is in 2NF and has no determinants except the primary key.

Boyce-Codd Normal Form (BCNF): a relation is in BCNF if every determinant is a candidate key.

Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant when the key is composite, i.e., consists of several fields. Consider the following inventory record:

PART | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS

The key here consists of PART and WAREHOUSE together, but WAREHOUSE-ADDRESS is a fact about the WAREHOUSE alone.

PART_NO | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS

The warehouse address is repeated in every record that refers to a part stored in that warehouse.
If the address of the warehouse changes, every record referring to a part stored in that warehouse must be updated. (Why?)
Because of the redundancy, the data might become inconsistent, with different records showing different addresses for the same warehouse.
If at some point in time there are no parts stored in the warehouse, there may be no record in which to keep the warehouse's address.
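
The problems listed for the unnormalized inventory record can be reproduced in a few lines. This is only a sketch: the part numbers, warehouse names and addresses are made-up sample data, and the records are plain Python dictionaries rather than rows in a real database.

```python
# Hypothetical sample data for PART_NO, WAREHOUSE, QUANTITY, WAREHOUSE-ADDRESS.
inventory = [
    {"part_no": "P1", "warehouse": "W1", "quantity": 10, "warehouse_address": "1 Dock Rd"},
    {"part_no": "P2", "warehouse": "W1", "quantity": 5, "warehouse_address": "1 Dock Rd"},
    {"part_no": "P1", "warehouse": "W2", "quantity": 7, "warehouse_address": "9 Quay St"},
]


def addresses_by_warehouse(records):
    """Collect every distinct address recorded for each warehouse."""
    seen = {}
    for rec in records:
        seen.setdefault(rec["warehouse"], set()).add(rec["warehouse_address"])
    return seen


def inconsistent_warehouses(records):
    """Return the warehouses that appear with more than one address."""
    found = addresses_by_warehouse(records)
    return sorted(w for w, addrs in found.items() if len(addrs) > 1)


# Update anomaly: W1 moves, but only one of its two records is updated.
inventory[0]["warehouse_address"] = "22 Canal Way"
after_partial_update = inconsistent_warehouses(inventory)

# Deletion anomaly: once W2 holds no parts, its address is no longer recorded.
remaining = [rec for rec in inventory if rec["warehouse"] != "W2"]
w2_address_known = "W2" in addresses_by_warehouse(remaining)
```

After the partial update W1 is reported as inconsistent, and after W2's only record is removed there is nowhere left to look up its address.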

To satisfy second normal form, the record

PART_NO | WAREHOUSE | QUANTITY | WAREHOUSE-ADDRESS

should be decomposed into (replaced by) the two records

PART_NO | WAREHOUSE | QUANTITY

WAREHOUSE | WAREHOUSE-ADDRESS

The normalized design enhances the integrity of the data, by minimizing redundancy and inconsistency, but at some possible performance cost for certain retrieval applications. Consider an application that wants the addresses of all warehouses stocking a certain part. In the unnormalized form, the application searches one record type. With the normalized design, the application has to search two record types, and connect the appropriate pairs.
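
The retrieval cost described above can be made concrete with a small sketch of the normalized design. The sample values are hypothetical, and a dictionary stands in for the WAREHOUSE / WAREHOUSE-ADDRESS record type.

```python
# Hypothetical sample data for the two record types of the normalized design.
stock = [("P1", "W1", 10), ("P2", "W1", 5), ("P1", "W2", 7)]  # PART_NO, WAREHOUSE, QUANTITY
warehouse_address = {"W1": "1 Dock Rd", "W2": "9 Quay St"}    # WAREHOUSE -> WAREHOUSE-ADDRESS


def addresses_stocking(part_no):
    """Search the stock records, then connect each one to its warehouse address."""
    return sorted(warehouse_address[w] for p, w, _ in stock if p == part_no)


p1_addresses = addresses_stocking("P1")

# In exchange for the extra lookup, a change of address is a single update.
warehouse_address["W1"] = "22 Canal Way"
p1_after_move = addresses_stocking("P1")
```

The query touches both record types, which is the retrieval penalty; the address change touches exactly one record, which is the integrity gain.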

Second Normal Form concerns the relationship between composite keys and non-key columns. Data that depends on only a subset of the key is removed and organized into separate tables. The process is repeated until the duplication has been removed.

Third normal form is violated when a non-key field is a fact about another non-key field, as in

EMPLOYEE | DEPARTMENT | LOCATION

The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a fact about the DEPARTMENT in addition to being a fact about the EMPLOYEE. The problems with this design are the same as those caused by violations of second normal form.
EMPLOYEE | DEPARTMENT | LOCATION

The department's location is repeated in the record of every employee assigned to that department.
If the location of the department changes, every such record must be updated.
Because of the redundancy, the data might become inconsistent, with different records showing different locations for the same department.
If a department has no employees, there may be no record in which to keep the department's location.

(These are the same problems we discussed for second normal form.)

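To satisfy third normal form, the record

EMPLOYEE | DEPARTMENT | LOCATION

should be decomposed into (replaced by) the two records

EMPLOYEE | DEPARTMENT

DEPARTMENT | LOCATION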

A record is in second and third normal forms if every field is either part of the key or provides a (single-valued) fact about exactly the whole key and nothing else.

Or: a relation is in 3NF if it is in 2NF and has no determinants except the primary key.
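
A minimal sketch of the third normal form decomposition, assuming made-up employee data: the EMPLOYEE / DEPARTMENT / LOCATION records are split into an employee-to-department set and a department-to-location set, and rejoining the two recovers the original records.

```python
# Hypothetical sample data: (EMPLOYEE, DEPARTMENT, LOCATION).
employees = [
    ("Ann", "Sales", "Leeds"),
    ("Bob", "Sales", "Leeds"),
    ("Cy", "R&D", "York"),
]


def decompose(rows):
    """Split each record into (EMPLOYEE, DEPARTMENT) and (DEPARTMENT, LOCATION)."""
    emp_dept = {(emp, dept) for emp, dept, _ in rows}
    dept_loc = {(dept, loc) for _, dept, loc in rows}
    return emp_dept, dept_loc


def rejoin(emp_dept, dept_loc):
    """Connect each employee to the location of their department."""
    location_of = dict(dept_loc)
    return {(emp, dept, location_of[dept]) for emp, dept in emp_dept}


emp_dept, dept_loc = decompose(employees)

# The location of each department is now stored once, and nothing is lost.
lossless = rejoin(emp_dept, dept_loc) == set(employees)
```

The rejoin step relies on each department having exactly one location, which is the single-valued fact that motivated the split.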

Third Normal Form can be achieved only when a table is already in Second Normal Form. A table is brought into 3NF by eliminating all transitive dependencies among the fields of a record. In Third Normal Form every non-key column depends on the primary key only, so any column that depends on another non-key column is moved out into a separate table.

In relational database theory, second and third normal forms are defined in terms of functional dependencies, which correspond approximately to our single-valued facts. A field Y is "functionally dependent" on a field (or fields) X if it is invalid to have two records with the same X-value but different Y-values. That is, a given X-value must always occur with the same Y-value. When X is a key, then all fields are by definition functionally dependent on X in a trivial way, since there can't be two records having the same X-value.
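
The definition translates directly into a check over a set of records. The helper name `holds` and the sample rows below are illustrative assumptions. Note that a functional dependency is a rule about every valid state of the data, so a check on sample records can only refute a dependency, never prove it.

```python
def holds(rows, x, y):
    """Return True if no two rows share an X-value but differ in their Y-value."""
    seen = {}
    for row in rows:
        x_value = tuple(row[field] for field in x)
        y_value = tuple(row[field] for field in y)
        # setdefault stores the first Y-value seen for this X-value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample records.
rows = [
    {"EMPLOYEE": "Ann", "DEPARTMENT": "Sales", "LOCATION": "Leeds"},
    {"EMPLOYEE": "Bob", "DEPARTMENT": "Sales", "LOCATION": "Leeds"},
    {"EMPLOYEE": "Cy", "DEPARTMENT": "R&D", "LOCATION": "York"},
]

dept_determines_location = holds(rows, ["DEPARTMENT"], ["LOCATION"])
location_determines_employee = holds(rows, ["LOCATION"], ["EMPLOYEE"])
key_determines_everything = holds(rows, ["EMPLOYEE"], ["DEPARTMENT", "LOCATION"])
```

Here LOCATION does not determine EMPLOYEE, because Leeds occurs with both Ann and Bob, while the key EMPLOYEE trivially determines the other fields.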

There is a slight technical difference between functional dependencies and single-valued facts as we have presented them. Functional dependencies only exist when the things involved have unique and singular identifiers (representations). For example, suppose a person's address is a single-valued fact, i.e., a person has only one address. If we don't provide unique identifiers for people, then there will not be a functional dependency in the data:

PERSON      ADDRESS
John Smith  123 Main St., New York
John Smith  321 Center St., San Francisco

Although each person has a unique address, a given name can appear with several different addresses. Hence we do not have a functional dependency corresponding to our single-valued fact.
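
The effect of a missing unique identifier can be seen by grouping the two records first by name and then by an identifier. The `person_id` field is a hypothetical addition for illustration and does not appear in the original table.

```python
from collections import defaultdict

# person_id is a hypothetical unique identifier added for this sketch.
people = [
    {"person_id": 1, "person": "John Smith", "address": "123 Main St., New York"},
    {"person_id": 2, "person": "John Smith", "address": "321 Center St., San Francisco"},
]


def values_per_key(rows, key, value):
    """Group the distinct values of one field by the values of another field."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row[value])
    return dict(groups)


by_name = values_per_key(people, "person", "address")
by_id = values_per_key(people, "person_id", "address")

# A dependency can hold only if every key value maps to exactly one address.
name_determines_address = all(len(addrs) == 1 for addrs in by_name.values())
id_determines_address = all(len(addrs) == 1 for addrs in by_id.values())
```

Grouped by name, "John Smith" maps to two addresses, so no functional dependency exists; grouped by the identifier, each person maps to one.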

Similarly, the address has to be spelled identically in each occurrence in order to have a functional dependency. In the following case the same person appears to be living at two different addresses, again precluding a functional dependency.

PERSON      ADDRESS
John Smith  123 Main St., New York
John Smith  123 Main Street, NYC

We do wish to point out, however, that functional dependencies and the various normal forms are really only defined for situations in which there are unique and singular identifiers. Thus the design guidelines as we present them are a bit stronger than those implied by the formal definitions of the normal forms.

As designers we know that in the following example there is a single-valued fact about a non-key field, and hence the design is susceptible to all the update anomalies mentioned earlier.

EMPLOYEE   FATHER      FATHER'S-ADDRESS
Art Smith  John Smith  123 Main St., New York
Tom Smith  John Smith  321 Center St., San Francisco
Toy Smith  John Smith  123 Main Street, NYC

However, in formal terms, there is no functional dependency here between FATHER'S-ADDRESS and FATHER, and hence no violation of third normal form.

Boyce-Codd Normal Form (BCNF) requires that a table already meets Third Normal Form. In addition, the determinant (left-hand side) of every non-trivial functional dependency must be a superkey.

When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. 3NF does not deal well with the case of a relation with overlapping candidate keys, i.e. composite candidate keys with at least one attribute in common.

BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF if, and only if, every determinant is a candidate key.

Consider the following relation and determinants.

R(a,b,c,d), with primary key a,b
a,c -> b,d
a,d -> b

The first determinant suggests that the primary key of R could be changed from a,b to a,c. If this change were made, all of the non-key attributes present in R could still be determined, and therefore this change is legal.

The second determinant indicates that a,d determines b, but a,d could not be the key of R, as a,d does not determine all of the non-key attributes of R (it does not determine c).

We would say that the first determinant is a candidate key, but the second determinant is not a candidate key, and thus this relation is not in BCNF (but is in third normal form).

Recall: in Third Normal Form, all columns should depend on the primary key only.
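
Whether a determinant can serve as a key of R can be checked mechanically by computing its attribute closure, i.e. everything it determines under the given dependencies. The `closure` helper below is a standard textbook construction written for this example, not something taken from the slides.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left-hand side is available.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = {"a", "b", "c", "d"}
fds = [
    (("a", "c"), ("b", "d")),  # a,c -> b,d
    (("a", "d"), ("b",)),      # a,d -> b
]

ac_closure = closure({"a", "c"}, fds)
ad_closure = closure({"a", "d"}, fds)

ac_is_key = ac_closure == R  # a,c determines all of R
ad_is_key = ad_closure == R  # a,d never reaches c
```

The closure of a,c is the whole of R, so a,c can act as a key, while the closure of a,d is only a,b,d, which is why the second determinant is not a candidate key.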

Patient No  Patient Name  Appointment Id  Time   Doctor
1           John          0               09:00  Zorro
2           Kerr          0               09:00  Killer
3           Adam          1               10:00  Zorro
4           Robert        0               13:00  Killer
5           Zane          1               14:00  Zorro

This depicts a special dieting clinic where each patient has 4 appointments. On the first they are weighed, on the second they are exercised, on the third their fat is removed by surgery, and on the fourth their mouth is stitched closed. Not all patients need all four appointments! If the Patient Name begins with a letter before P they get a morning appointment, otherwise they get an afternoon appointment. Appointment 1 is either 09:00 or 13:00, appointment 2 is 10:00 or 14:00, and so on.

From this (hopefully) make-believe scenario we can extract the following determinants:

DB(Patno,PatName,appNo,time,doctor)
Patno -> PatName
Patno,appNo -> Time,doctor
Time -> appNo

Now we have to decide what the primary key of DB is going to be. From the information we have, we could choose:

DB(Patno,PatName,appNo,time,doctor) with key Patno,appNo (example 1a)
or
DB(Patno,PatName,appNo,time,doctor) with key Patno,time (example 1b)

Example 1a: DB(Patno,PatName,appNo,time,doctor), key Patno,appNo

1NF: Eliminate repeating groups.
Do we have any?

2NF: Eliminate partial key dependencies.
DB(Patno,appNo,time,doctor)
R1(Patno,PatName)

3NF: Eliminate transitive dependencies.
None, so the relations are the same as for 2NF.

BCNF: Every determinant is a candidate key.

DB(Patno,appNo,time,doctor)
R1(Patno,PatName)

Go through all determinants where ALL of the left-hand attributes are present in a relation and at least ONE of the right-hand attributes is also present in the relation.

Patno -> PatName
Patno is present in DB, but not PatName, so not relevant.

Patno,appNo -> Time,doctor
All LHS attributes are present, and time and doctor are also present, so relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key. Thus this is OK for BCNF compliance.

Time -> appNo
Time is present, and so is appNo, so relevant. Is this a candidate key? If it were, we could rewrite DB as DB(Patno,appNo,time,doctor) with time alone as the key. This will not work, as you need both time and Patno together to form a unique key. Thus this determinant is not a candidate key, and therefore DB is not in BCNF. We need to fix this.

BCNF: rewrite to
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)

time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF.
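
The walk through the determinants can be sketched in code. The function below follows the relevance rule stated above (all left-hand attributes and at least one right-hand attribute present in the relation) and then tests whether the determinant's closure covers the relation. The helper names are assumptions made for illustration, and the relevance rule is a simplification of a full projection of the dependencies onto each relation.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the relevant determinants that do not determine the whole relation."""
    relation = set(relation)
    bad = []
    for lhs, rhs in fds:
        relevant = set(lhs) <= relation and bool(set(rhs) & relation)
        if relevant and not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


fds = [
    (("Patno",), ("PatName",)),
    (("Patno", "appNo"), ("time", "doctor")),
    (("time",), ("appNo",)),
]

# Relation DB as it stands after the 3NF step of example 1a.
before = bcnf_violations(("Patno", "appNo", "time", "doctor"), fds)

# The three relations after the BCNF rewrite.
after = [
    bcnf_violations(rel, fds)
    for rel in (("Patno", "time", "doctor"), ("Patno", "PatName"), ("time", "appNo"))
]
```

Only time -> appNo is flagged for the 3NF version of DB, and none of the three rewritten relations is flagged.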


Example 1b: DB(Patno,PatName,appNo,time,doctor), key Patno,time

1NF: Eliminate repeating groups.
None.

2NF: Eliminate partial key dependencies.
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)

3NF: Eliminate transitive dependencies.
None, so the relations are the same as for 2NF.

DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)

Go through all determinants where ALL of the left-hand attributes are present in a relation and at least ONE of the right-hand attributes is also present in the relation.

Patno -> PatName
Patno is present in DB, but not PatName, so not relevant.

Patno,appNo -> Time,doctor
Not all LHS attributes are present, so not relevant.

Time -> appNo
Time is present, and so is appNo, so relevant. Time is currently the key for R2, so this determinant is a candidate key and satisfies the rules for BCNF.

BCNF: as 3NF
DB(Patno,time,doctor)
R1(Patno,PatName)
R2(time,appNo)

This example has demonstrated three things:

BCNF is stronger than 3NF: relations that are in 3NF are not necessarily in BCNF.
BCNF is needed in certain situations to obtain a full understanding of the data model.
There are several routes to take to arrive at the same set of relations in BCNF. Unfortunately there are no rules as to which route will be the easiest one to take.

Example 2

Grade_report(StudNo,StudName,(Major,Advisor,(CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)))

Functional dependencies
StudNo -> StudName
CourseNo -> Ctitle,InstrucName
InstrucName -> InstrucLocn
StudNo,CourseNo,Major -> Grade
StudNo,Major -> Advisor
Advisor -> Major

Unnormalised
Grade_report(StudNo,StudName,(Major,Advisor,(CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)))

1NF: Remove repeating groups
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)

2NF: Remove partial key dependencies
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName,InstrucLocn)

3NF: Remove transitive dependencies
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName)
Instructor(InstrucName,InstrucLocn)

Student: the only determinant is StudNo.
StudCourse: the only determinant is StudNo,CourseNo,Major.
Course: the only determinant is CourseNo.
Instructor: the only determinant is InstrucName.
StudMajor: the determinants are StudNo,Major and Advisor. Of these, only StudNo,Major is a candidate key.

BCNF
Student(StudNo,StudName)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName)
Instructor(InstrucName,InstrucLocn)
StudMajor(StudNo,Advisor)
Advisor(Advisor,Major)
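
To see why StudMajor is the relation that needs attention, its candidate keys can be enumerated by brute force. This is a sketch using a hypothetical `candidate_keys` helper; it tries attribute combinations from smallest to largest and keeps the minimal ones whose closure covers the relation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Enumerate the minimal attribute sets that determine the whole relation."""
    attributes = sorted(relation)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(attributes) <= closure(combo, fds):
                keys.append(combo)
    return keys


# Dependencies that apply to StudMajor(StudNo,Major,Advisor).
fds = [
    (("StudNo", "Major"), ("Advisor",)),
    (("Advisor",), ("Major",)),
]

keys = candidate_keys({"StudNo", "Major", "Advisor"}, fds)
advisor_alone_is_key = ("Advisor",) in keys
```

StudMajor turns out to have two overlapping candidate keys, StudNo,Major and StudNo,Advisor, while the determinant Advisor on its own is not a key, which is the situation in which 3NF and BCNF differ.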


1st Normal Form conversion

This can be achieved by ensuring that every tuple defines a single entity by containing only atomic values OR

REFERENCE for this page: http://coronet.iicm.tugraz.at/Dbase1/scripts/rdbh04.htm

