20 22 Normalization
Course Leaders:
Gp Cpt N Rath VSM
Ami Rai E.
Faculty of Engineering & Technology © Ramaiah University of Applied Sciences
Objectives
Topics
• Normalization of Relations
• Definitions of Keys and Attributes Participating in Keys
• First Normal Form
• Second Normal Form
• Third Normal Form
• BCNF (Boyce-Codd Normal Form)
• Multivalued Dependencies and Fourth Normal Form
• Join Dependencies and Fifth Normal Form
• Inclusion dependencies
Normalization of Relations
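• The normalization process, as first proposed by Codd, takes a relation schema through a series of tests to certify whether it satisfies a certain normal form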
• Normalization
– The process of decomposing unsatisfactory "bad" relations by breaking up their
attributes into smaller relations
• Normalization of data
– A process of analyzing the given relation schemas based on their FDs and primary keys to
achieve the desirable properties of
1. minimizing redundancy and
2. minimizing the insertion, deletion, and update anomalies
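As a small illustration of why redundancy leads to anomalies, the sketch below stores a department name redundantly in every employee row. The relation, its attribute names, and the sample values are assumptions made for illustration only.

```python
# Hypothetical unnormalized relation: Dname is repeated in every employee row.
emp_dept = [
    {"Ssn": "111", "Ename": "Asha", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "222", "Ename": "Ravi", "Dnumber": 5, "Dname": "Research"},
    {"Ssn": "333", "Ename": "Meena", "Dnumber": 4, "Dname": "Admin"},
]


def dname_values(rows, dnumber):
    """Return the set of department names recorded for one department number."""
    return {row["Dname"] for row in rows if row["Dnumber"] == dnumber}


# Update anomaly: renaming department 5 in only one row leaves two names for it.
emp_dept[0]["Dname"] = "R&D"
inconsistent = dname_values(emp_dept, 5)
print(inconsistent)

# Deletion anomaly: deleting the only employee of department 4 loses the department too.
remaining = [row for row in emp_dept if row["Ssn"] != "333"]
print(dname_values(remaining, 4))
```

Keeping department facts in their own relation would remove both problems, since each department name would then be stored exactly once.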
Anomalies - Examples
Normal Forms
• Normal form:
– Condition using keys and FDs of a relation to certify whether a relation schema is in a
particular normal form
Normal Forms contd.
• Definition
The normal form of a relation refers to the highest normal form condition that
it meets, and hence indicates the degree to which it has been normalized
Practical Use of Normal Forms
• Normalization is carried out in practice so that
– the resulting designs are of high quality and meet the desirable properties
• The practical utility of these normal forms becomes questionable when the
constraints on which they are based are hard to understand or to detect
• The database designers need not normalize to the highest possible normal form
– it is usual to normalize only up to 3NF, BCNF, or at most 4NF
• Denormalization:
– the process of storing the join of higher normal form relations as a base relation, which
is in a lower normal form
First Normal Form (1NF)
• 1NF disallows
– composite attributes
– multivalued attributes
– multivalued attributes that are themselves composite (nested relations)
• It states that the domain of an attribute must include only atomic (simple,
indivisible) values
• The value of any attribute in a tuple must be a single value from the domain of that
attribute
• The only attribute values permitted by 1NF are single atomic (or indivisible) values
Normalization into 1NF
• (a) A relation schema that is not in 1NF
Normalization into 1NF contd.
• Normalizing into 1NF by expanding a multivalued attribute into one row per value increases
data redundancy, as the remaining columns repeat the same data in multiple rows
• But each row as a whole will be unique
Normalization into 1NF contd.
• To achieve first normal form for such a relation
– Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT
– The primary key of this relation is the combination {Dnumber, Dlocation}
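The decomposition described above can be sketched as follows. This is a minimal illustration, assuming a DEPARTMENT relation with a hypothetical Dname attribute and made-up sample rows; only Dnumber, Dlocations, Dlocation, and the relation names come from the slide.

```python
# Hypothetical DEPARTMENT rows; Dlocations is multivalued, so this is not in 1NF.
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]


def to_1nf(rows):
    """Split rows into DEPARTMENT (without Dlocations) and DEPT_LOCATIONS."""
    dept = [{k: v for k, v in row.items() if k != "Dlocations"} for row in rows]
    dept_locations = [
        {"Dnumber": row["Dnumber"], "Dlocation": loc}
        for row in rows
        for loc in row["Dlocations"]
    ]
    return dept, dept_locations


dept, dept_locations = to_1nf(department)

# The combination {Dnumber, Dlocation} identifies each DEPT_LOCATIONS row.
keys = [(r["Dnumber"], r["Dlocation"]) for r in dept_locations]
print(len(keys) == len(set(keys)))
```

Every value in both resulting relations is atomic, and DEPARTMENT keeps one row per department.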
Super Key, Candidate Key and Primary Key
• Super key (SK)
– Specifies that no two distinct tuples may have the same value for SK
– May have redundant attributes
• Candidate key
– A super key without redundancy
– Not reducible
• Primary key
– The candidate key chosen to identify tuples in the relation
• Consider the instance of the Book table
BookID  Name  Author
B1      XYZ   A1
B2      ABC   A1
B3      XYZ   A2
B4      PQR   A3
B5      RST   A1
B6      ABC   A3
• Finding a candidate key from a set of FDs
– Attributes that do not appear on the right-hand side of any FD must be part of every candidate key
– (AE) is therefore in the candidate key list
– Find (AE)+, the set of attributes that can be determined from (AE): (AE)+ = {AECDBF}
– All the attributes can be determined from (AE), so (AE) is a candidate key
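The closure test used above can be sketched in a few lines. The FD set below is hypothetical, chosen only so that A and E never appear on a right-hand side; it should not be read as the FD set of the original example.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Hypothetical FDs over R(A, B, C, D, E, F): A -> C, C -> D, D -> B, E -> F.
R = set("ABCDEF")
fds = [
    (set("A"), set("C")),
    (set("C"), set("D")),
    (set("D"), set("B")),
    (set("E"), set("F")),
]

# Attributes that never appear on a right-hand side belong to every candidate key.
rhs_attrs = set().union(*(rhs for _, rhs in fds))
must_have = R - rhs_attrs

# A candidate key is a superkey from which no attribute can be dropped.
is_superkey = closure(must_have, fds) == R
is_minimal = all(closure(must_have - {a}, fds) != R for a in must_have)
print(sorted(must_have), is_superkey, is_minimal)
```

When the attributes missing from all right-hand sides do not already form a superkey, further attributes have to be added and tested in the same way.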
Definitions of Attributes Participating in Keys
Second Normal Form
• Definition - A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R
• The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key
• If the primary key contains a single attribute, the test need not be applied at all
Full Dependency and Partial Dependency
• FDs given for EMP_PROJ
– FD1: {Ssn, Pnumber} → Hours
– FD2: Ssn → Ename
– FD3: Pnumber → {Pname, Plocation}, which is the same as Pnumber → Pname and Pnumber → Plocation
• Considering the primary key {Ssn, Pnumber}, we get
– FD4: {Ssn, Pnumber} → Hours
– FD5: {Ssn, Pnumber} → Ename
– FD6: {Ssn, Pnumber} → Pname
– FD7: {Ssn, Pnumber} → Plocation
• Prime attributes: Ssn, Pnumber
• Nonprime attributes: Hours, Ename, Pname, Plocation
• Hours is fully dependent on the primary key {Ssn, Pnumber}
– neither Ssn → Hours nor Pnumber → Hours holds
• Ename is partially dependent on the primary key {Ssn, Pnumber}
– because Ssn → Ename holds
• Pname?
• Plocation?
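One way to work through the remaining attributes is to test, for each nonprime attribute, whether some proper subset of the primary key already determines it. The sketch below does this with FD1 to FD3; the helper names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FD1, FD2, and FD3 of EMP_PROJ.
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
primary_key = {"Ssn", "Pnumber"}
nonprime = ["Hours", "Ename", "Pname", "Plocation"]


def partial_determinants(attr, key, fds):
    """Return the proper nonempty subsets of key that already determine attr."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                found.append(set(subset))
    return found


report = {a: partial_determinants(a, primary_key, fds) for a in nonprime}
for attr, subsets in report.items():
    kind = "partially" if subsets else "fully"
    print(attr, "is", kind, "dependent on the primary key")
```

An empty list for an attribute means no part of the key determines it on its own, so the dependency on the whole key is full.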
The Testing for Second Normal Form
• The EMP_PROJ relation in the figure on the previous slide is in 1NF but is not in 2NF
– The nonprime attribute Ename violates 2NF because of FD2, as do the nonprime
attributes Pname and Plocation because of FD3
– The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially
dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test
The Testing for Second Normal Form contd.
• If a relation schema is not in 2NF, it can be second normalized or 2NF normalized
into a number of 2NF relations in which nonprime attributes are associated only
with the part of the primary key on which they are fully functionally dependent
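A minimal sketch of this step, assuming the EMP_PROJ dependencies {Ssn, Pnumber} → Hours, Ssn → Ename, and Pnumber → {Pname, Plocation}: each nonprime attribute is grouped with the smallest part of the primary key that determines it. The function name decompose_2nf is illustrative and does not come from the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(key, nonprime, fds):
    """Map each smallest determining part of key to the nonprime attributes it determines."""
    groups = {}
    for attr in nonprime:
        # Try key subsets from smallest to largest; the first hit is smallest in size.
        for size in range(1, len(key) + 1):
            hit = next(
                (s for s in combinations(sorted(key), size) if attr in closure(s, fds)),
                None,
            )
            if hit is not None:
                groups.setdefault(hit, []).append(attr)
                break
    return groups


fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
relations = decompose_2nf({"Ssn", "Pnumber"}, ["Hours", "Ename", "Pname", "Plocation"], fds)
for part, attrs in relations.items():
    print(part, "->", attrs)
```

Each group becomes one 2NF relation whose key is the listed part of the original primary key.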
Third Normal Form
• Definition - A relation schema R is in 3NF if it satisfies 2NF and no nonprime
attribute of R is transitively dependent on the primary key
Third Normal Form - Example
Normalizing into Third Normal Form
• Normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and
ED2 as shown in the Figure
• ED1 and ED2 represent independent entity facts about employees and departments
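The sketch below illustrates such a decomposition on sample data. Since the figure is not reproduced here, the attribute lists are assumptions: EMP_DEPT(Ssn, Ename, Dnumber, Dname, Dmgr_ssn) with Ssn → Dnumber and Dnumber → {Dname, Dmgr_ssn}, split into ED1(Ssn, Ename, Dnumber) and ED2(Dnumber, Dname, Dmgr_ssn). All rows are made up.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    seen = {tuple((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in sorted(seen)]


def natural_join(left, right):
    """Natural join of two lists of dicts on their shared attribute names."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                joined.append({**l, **r})
    return joined


def as_set(rows):
    """Convert rows to a set of frozensets so relations can be compared."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical EMP_DEPT state; department facts repeat for every employee.
emp_dept = [
    {"Ssn": "111", "Ename": "Asha", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "900"},
    {"Ssn": "222", "Ename": "Ravi", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "900"},
    {"Ssn": "333", "Ename": "Meena", "Dnumber": 4, "Dname": "Admin", "Dmgr_ssn": "901"},
]

ed1 = project(emp_dept, ["Ssn", "Ename", "Dnumber"])
ed2 = project(emp_dept, ["Dnumber", "Dname", "Dmgr_ssn"])

# Joining ED1 and ED2 on Dnumber recovers the original relation state.
lossless = as_set(natural_join(ed1, ed2)) == as_set(emp_dept)
print(len(ed1), len(ed2), lossless)
```

Under these assumptions ED2 stores each department once, and the transitive dependency of Dname on Ssn through Dnumber no longer sits inside a single relation.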
Summary of Normal Forms
General Definitions of 2NF and 3NF
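• Definition of 2NF - A relation schema R is in second normal form (2NF) if every
nonprime attribute A in R is not partially dependent on any key of R
• Definition of 3NF - A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either X is a superkey of R or A is a
prime attribute of R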
Normalization into 2NF and 3NF - Example
• Consider the relation schema LOTS shown in the Figure, which describes parcels of
land for sale in various counties of a state
• Suppose that there are two candidate keys
1. Property_id# - Property_id# numbers are unique across counties for the entire state
2. {County_name, Lot#} - lot numbers are unique only within each county
Normalization into 2NF
Normalization into 2NF and 3NF - Example contd.
Normalization into 2NF and 3NF - Example contd.
BCNF (Boyce-Codd Normal Form)
Trivial and Nontrivial FDs
• A functional dependency FD: X → Y is called trivial if Y is a subset of X
• A functional dependency FD: X → Y is called nontrivial if Y is not a subset of X
• Example
• An employee table with three attributes: emp_id, emp_name, emp_address
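Using the employee attributes above, the subset test in the definition can be written directly. The function name is_trivial is illustrative.

```python
def is_trivial(lhs, rhs):
    """An FD lhs -> rhs is trivial exactly when rhs is a subset of lhs."""
    return set(rhs) <= set(lhs)


# {emp_id, emp_name} -> emp_name holds in every relation state, so it is trivial.
print(is_trivial({"emp_id", "emp_name"}, {"emp_name"}))  # True

# emp_id -> emp_name says something about the data, so it is nontrivial.
print(is_trivial({"emp_id"}, {"emp_name"}))  # False
```

Only nontrivial dependencies matter for the normal form tests, because trivial ones are satisfied by any relation.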
Achieving the BCNF by Decomposition
BCNF - Example
Multivalued Dependency
• A multivalued dependency (MVD) specifies the following constraint
– If we have two or more independent multivalued attributes in the same relation schema,
we must repeat every value of one of the attributes with every value of the other
attribute to keep the relation state consistent and to maintain the independence among
the attributes involved
Multivalued Dependency - Example
• An employee may work on several projects and may have several dependents, and the
employee’s projects and dependents are independent of one another
• To keep the relation state consistent, and to avoid any spurious relationship
between the two independent attributes, we must have a separate tuple to
represent every combination of an employee’s dependent and an employee’s
project
• This constraint is specified as a multivalued dependency on the EMP relation
• Informally, whenever two independent 1:N relationships A:B and A:C are mixed in
the same relation, R(A, B, C), an MVD may arise
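The "every combination" requirement can be checked mechanically on a relation state. The sketch below assumes an EMP relation with attributes Ename, Pname, and Dname and uses made-up rows; mvd_holds is an illustrative helper, not a standard library function.

```python
def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows (dicts) whose attributes are partitioned into x, y, z."""
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        groups.setdefault(xv, set()).add((yv, zv))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        # Every Y value must appear with every Z value for the same X value.
        if pairs != {(yv, zv) for yv in ys for zv in zs}:
            return False
    return True


# Hypothetical EMP state: one employee, two projects, two dependents.
emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

full = mvd_holds(emp, ["Ename"], ["Pname"], ["Dname"])
missing_one = mvd_holds(emp[:3], ["Ename"], ["Pname"], ["Dname"])
print(full, missing_one)
```

Dropping any one of the four tuples breaks the MVD, which is why such a relation needs all combinations stored.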
Fourth Normal Form
• Definition. A relation schema R is in 4NF with respect to a set of dependencies F
(that includes functional dependencies and multivalued dependencies) if, for every
nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R
Fourth Normal Form - Example
• The EMP relation in the previous figure is not in 4NF
– because in the nontrivial MVDs Ename →→ Pname and Ename →→ Dname,
Ename is not a superkey of EMP
• EMP is decomposed into the two 4NF relations EMP_PROJECTS and
EMP_DEPENDENTS
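A short sketch of this decomposition, using a hypothetical EMP state written as (Ename, Pname, Dname) tuples: projecting onto the two schemas and joining them on Ename gives back the original state.

```python
# Hypothetical EMP state as (Ename, Pname, Dname) tuples.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
    ("Smith", "Y", "Anna"),
}

# Project onto the two 4NF schemas.
emp_projects = {(e, p) for e, p, _ in emp}
emp_dependents = {(e, d) for e, _, d in emp}

# The natural join on Ename reconstructs EMP.
rejoined = {
    (e1, p, d)
    for e1, p in emp_projects
    for e2, d in emp_dependents
    if e1 == e2
}
print(rejoined == emp)
```

Each project and each dependent is now recorded once per employee instead of once per combination.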
Join Dependencies
• Definition - A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on
relation schema R, specifies a constraint on the states r of R. The constraint states
that every legal state r of R should have a nonadditive (lossless) join decomposition
into R1, R2, ..., Rn. Hence, for every such r we have
∗(π_R1(r), π_R2(r), ..., π_Rn(r)) = r
• A join dependency JD(R1, R2, ..., Rn), specified on relation schema R, is a trivial JD if
one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R
Fifth Normal Form
• Normalization into 5NF is very rarely done in practice
• Definition. A relation schema R is in fifth normal form (5NF) (or project-join normal
form (PJNF)) with respect to a set F of functional, multivalued, and join
dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is,
implied by F), every Ri is a superkey of R
Fifth Normal Form - Example
• (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3)
• (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, R3
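To see what the join dependency means on data, the sketch below assumes SUPPLY(Sname, Part_name, Proj_name) with R1(Sname, Part_name), R2(Sname, Proj_name), and R3(Part_name, Proj_name). The figure is not reproduced here, so these attribute names and all rows are assumptions.

```python
# Hypothetical SUPPLY state as (Sname, Part_name, Proj_name) tuples.
supply = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

# Project onto the three assumed schemas R1, R2, R3.
r1 = {(s, p) for s, p, _ in supply}
r2 = {(s, j) for s, _, j in supply}
r3 = {(p, j) for _, p, j in supply}


def join3(r1, r2, r3):
    """Natural join of R1(Sname, Part), R2(Sname, Proj), and R3(Part, Proj)."""
    return {
        (s, p, j)
        for s, p in r1
        for s2, j in r2
        if s == s2 and (p, j) in r3
    }


# Joining only R1 and R2 produces spurious tuples.
two_way = {(s, p, j) for s, p in r1 for s2, j in r2 if s == s2}

print(len(two_way), len(supply), join3(r1, r2, r3) == supply)
```

In this state the two-way join is lossy while the three-way join is not, which is the situation a join dependency such as JD(R1, R2, R3) describes.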
Summary
• The normalization process is used for achieving good designs by testing relations for
undesirable types of functional dependencies
• The treatment of successive normalization is based on a predefined primary key in
each relation
• The more general definitions of second normal form (2NF) and third normal form (3NF)
relax this requirement and take all candidate keys of a relation into account
• Using the general definition of 3NF a given relation may be analyzed and
decomposed to eventually yield a set of relations in 3NF
• Boyce-Codd normal form (BCNF) is a stronger form of 3NF
• The fourth normal form is based on multivalued dependencies that typically arise
due to mixing independent multivalued attributes into a single relation
Summary contd.
• The fifth normal form is based on join dependencies, which identify a peculiar
constraint that causes a relation to be decomposed into several components so that
they always yield the original relation back after a join
• In practice, most commercial designs have followed the normal forms up to BCNF
• The need to decompose into 5NF rarely arises in practice, and join dependencies are
difficult to detect in most practical situations, making 5NF more of theoretical value