Int 306 Normalization

Normalization in DBMS involves organizing data to minimize redundancy and eliminate anomalies. It divides larger tables into smaller tables and links them using relationships. The document discusses various normal forms including 1NF, 2NF and 3NF. 1NF ensures each attribute contains a single value. 2NF requires non-prime attributes to depend on the whole primary key. 3NF eliminates transitive dependencies to further reduce redundancy.

Uploaded by

sai karthik

Normalization in DBMS

• Normalization is the process of organizing the data in a database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update, and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• Normal forms are used to reduce redundancy in database tables.
emp_id   emp_name   emp_address   emp_dept
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
166      Glenn      Chennai       D004

• Update anomaly: In the above table there are two rows for employee Rick, because he belongs to two departments of the company. To update Rick's address we have to update it in both rows. If the address is updated in one row but not in the other, the database shows two different addresses for Rick, and the data becomes inconsistent.
• Insert anomaly: Suppose a new employee joins the company, is under training, and is not yet assigned to any department. We cannot insert this employee into the table if the emp_dept field does not allow nulls.
• Delete anomaly: Suppose the company closes department D890. Deleting the rows that have emp_dept D890 also deletes the information of employee Maggie, since she is assigned only to this department.
Types of normal form
First Normal Form (1NF)

• A relation is in 1NF if every attribute contains only atomic values.
• An attribute of a table cannot hold multiple values. It must hold a single value in each row.
• First normal form disallows multi-valued attributes, composite attributes, and their combinations.
• Student table
• This table is not in first normal form, because Course holds more than one value in some rows
Roll No   Name    Course
1         Sai     C/C++
2         Harsh   Java
3         Onkar   C/DBMS
How to convert to first normal form
Roll No   Name    Course
1         Sai     C
1         Sai     C++
2         Harsh   Java
3         Onkar   C
3         Onkar   DBMS

Primary key = Roll No + Course (composite key)


Second solution
Roll No   Name    Course1   Course2
1         Sai     C         C++
2         Harsh   Java      NULL
3         Onkar   C         DBMS

Primary key: Roll No


Third Solution
• Divide the table into multiple tables
Roll No (Primary Key)   Name
1                       Sai
2                       Harsh
3                       Onkar

Roll No (Foreign Key)   Course
1                       C
1                       C++
2                       Java
3                       C
3                       DBMS
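
The first and third solutions can be sketched in a few lines of Python. This is only an illustration: the tuple layout, the "/" separator, and the function names to_1nf and split_tables are assumptions made for the example, not part of the slides.

```python
# Rows of the un-normalized Student table; Course holds "/"-separated values.
students = [
    (1, "Sai", "C/C++"),
    (2, "Harsh", "Java"),
    (3, "Onkar", "C/DBMS"),
]

def to_1nf(rows, sep="/"):
    """First solution: one (roll_no, name, course) row per atomic course value."""
    return [
        (roll_no, name, course.strip())
        for roll_no, name, courses in rows
        for course in courses.split(sep)
    ]

def split_tables(rows, sep="/"):
    """Third solution: a Student table plus a separate (roll_no, course) table."""
    student = sorted({(roll_no, name) for roll_no, name, _ in rows})
    enrollment = [(roll_no, course) for roll_no, _, course in to_1nf(rows, sep)]
    return student, enrollment

flat = to_1nf(students)
student_table, enrollment_table = split_tables(students)
```

In flat, the pair (roll_no, course) identifies a row, which matches the composite key of the first solution.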
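• Closure method: a method to find all candidate keys of a table
• R(ABCD)
• FD = {A->B, B->C, C->D}

• A+ = ABCD
• B+ = BCD
• C+ = CD
• D+ = D
• Only A+ contains every attribute, so A is a candidate key.

• Transitive property: A->B and B->C give A->C, and A->C and C->D give A->D
• AB+ = ABCD
• AB also determines every attribute, but it is not minimal because A alone already does. A can be a candidate key, but AB cannot.
• So, A is the only candidate key.
• If you add B to A, the result AB is a super key.

• Prime attribute: A
• Non-prime attributes: B, C, D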
Another Example
• R(ABCD)
• FD ={A->B,B->C,C->D,D->A}
• A+ -> {ABCD}
• B+ -> {BCDA}
• C+ -> {CDAB}
• D+ -> {ABCD}
• Candidate keys: A, B, C and D (each attribute alone is a candidate key)
• Prime attributes: attributes that are part of some candidate key. So, A, B, C and D are all prime attributes
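
The closure method used in the two examples above can be sketched in Python. The function names closure and candidate_keys and the encoding of each FD as a (lhs, rhs) pair of strings are assumptions made for this illustration. The brute-force search over attribute subsets is only practical for small relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a super key
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys

fds_chain = [("A", "B"), ("B", "C"), ("C", "D")]   # first example
fds_cycle = fds_chain + [("D", "A")]               # second example

keys_chain = candidate_keys("ABCD", fds_chain)
keys_cycle = candidate_keys("ABCD", fds_cycle)
```

Under these assumptions, keys_chain contains only {A}, and keys_cycle contains the four single-attribute keys.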
Functional Dependency
• A way to describe the relationship between attributes
• X -> Y means X determines Y, or Y is determined by X
• X: determinant
• Y: dependent

• Example: Sid -> Sname

• Suppose two students have the same Sname (Alok, Alok). How do we distinguish them? By Sid, because each Sid determines exactly one Sname.
• Trivial FD: X -> Y where Y is a subset of X
• Example: Sid -> Sid
• These are always true, because the attribute to be determined is already part of the determinant.
• Another example: Sid Sname -> Sid. The intersection of the two sides cannot be empty.
• Non-trivial dependency
• X -> Y where Y is not a subset of X. In the examples below, X intersection Y is empty.
• Sid -> Sname
• Sid -> Semester
• Sid -> Phone
• The intersection of the two sides is empty in each of these
Trivial functional dependency

• A → B is a trivial functional dependency if B is a subset of A.
• Dependencies such as A → A and B → B are also trivial.
• Example:
• Consider a table with two columns, Employee_Id and Employee_Name.
• {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
• Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies.
Non-trivial functional dependency

• A → B is a non-trivial functional dependency if B is not a subset of A.
• When A intersection B is empty, A → B is called completely non-trivial.
• Example:
• ID → Name
• Name → DOB
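
The three cases (trivial, non-trivial, completely non-trivial) can be told apart with two set operations. The sketch below assumes each side of an FD is given as a Python set of attribute names. The function name classify_fd is hypothetical.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # right side is a subset of the left side
    if lhs & rhs:
        return "non-trivial"             # sides overlap, but rhs adds something new
    return "completely non-trivial"      # the two sides share no attribute

trivial_case = classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"})
complete_case = classify_fd({"ID"}, {"Name"})
# A made-up overlapping case, used only to exercise the middle branch.
overlap_case = classify_fd({"Sid", "Sname"}, {"Sname", "Phone"})
```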
Inference Rule (IR)

• Armstrong's axioms are the basic inference rules.
• Armstrong's axioms are used to infer functional dependencies on a relational database.
• An inference rule is a type of assertion. It can be applied to a set of FDs (functional dependencies) to derive other FDs.
• Using the inference rules, we can derive additional functional dependencies from the initial set.
• There are six inference rules for functional dependencies:
Reflexive Rule (IR1)

• In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y then X → Y
• In particular, any attribute set determines itself
• Example:
• X = {a, b, c, d, e}
• Y = {a, b, c}
• Since Y is a subset of X, X → Y holds
Augmentation Rule (IR2)

• In augmentation, if X determines Y, then XZ determines YZ for any Z.
• If X → Y then XZ → YZ
• Example:
• For R(ABCD), if A → B then AC → BC
• Example:
• Sid -> Sname
• Sid Phoneno -> Sname Phoneno
Transitive Rule (IR3)

• In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
• If X → Y and Y → Z then X → Z
Union Rule (IR4)

• The union rule says that if X determines Y and X determines Z, then X must also determine Y and Z together.
• If X → Y and X → Z then X → YZ
• Proof:
1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmenting with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmenting with Y)
5. X → YZ (using IR3 on 3 and 4)
Decomposition Rule (IR5)

• The decomposition rule is also known as the project rule. It is the reverse of the union rule.
• This rule says that if X determines Y and Z, then X determines Y and X determines Z separately.
• If X → YZ then X → Y and X → Z
• Proof:
1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)
Pseudo transitive Rule (IR6)

• In the pseudo transitive rule, if X determines Y and YZ determines W, then XZ determines W.
• If X → Y and YZ → W then XZ → W
• Proof:
1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
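
Derivations like the ones above can be checked mechanically with the attribute-closure test: X → Y follows from a set of FDs exactly when Y is contained in the closure X+. The sketch below is illustrative only. It encodes each FD as a (lhs, rhs) pair of strings of single-letter attributes, and the names closure and implies are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)

union_ok = implies([("X", "Y"), ("X", "Z")], "X", "YZ")           # IR4
decomposition_ok = implies([("X", "YZ")], "X", "Y")                # IR5
pseudo_transitive_ok = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")  # IR6
reverse_not_implied = implies([("X", "Y")], "Y", "X")              # should not hold
```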

Second Normal Form
• The table must be in first normal form
• There should be no partial dependency
• All non-prime attributes should be fully dependent on the candidate key
• Non-prime attributes: attributes that are not part of any candidate key
Customer ID   Store ID   Location
1             1          Delhi
1             3          Mumbai
2             1          Delhi
3             2          Bangalore
4             3          Mumbai

Prime attributes: Customer ID, Store ID

Non-prime attribute: Location
Location is determined by Store ID alone, which is only part of the candidate key {Customer ID, Store ID}. This is a partial dependency, so the table is not in 2NF.
Convert to second normal form
• Divide the table
Customer ID   Store ID
1             1
1             3
2             1
3             2
4             3

Store ID   Location
1          Delhi
2          Bangalore
3          Mumbai
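
A partial dependency can be detected by checking whether any proper subset of the candidate key already determines a non-prime attribute. The sketch below assumes the single FD StoreID → Location from the example. The function names and the set-based encoding of FDs are hypothetical choices made for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """(key_subset, attribute) pairs where a proper subset of the key
    determines a non-prime attribute, which violates 2NF."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & set(non_prime)):
                found.append((set(subset), attr))
    return found

fds = [({"StoreID"}, {"Location"})]

# Original table: key {CustomerID, StoreID}, non-prime attribute Location.
before = partial_dependencies({"CustomerID", "StoreID"}, {"Location"}, fds)
# Store table after decomposition: key {StoreID}, non-prime attribute Location.
after = partial_dependencies({"StoreID"}, {"Location"}, fds)
```

Here before reports the pair ({StoreID}, Location), while after is empty because a single-attribute key has no proper non-empty subset.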
Second Normal Form (2NF)

• For 2NF, the relation must be in 1NF.
• In the second normal form, all non-key attributes are fully functionally dependent on the primary key.
• Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
Third Normal Form (3NF)

• A relation is in 3NF if it is in 2NF and does not contain any transitive dependency of a non-prime attribute on a key.
• 3NF is used to reduce data duplication. It is also used to achieve data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation is in third normal form.
• A relation is in third normal form if it satisfies at least one of the following conditions for every non-trivial functional dependency X → Y.
• X is a super key.
• Y is a prime attribute, i.e., each element of Y is part of some candidate key.
• Super keys in the employee table:
• {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
• Candidate key: {EMP_ID}
• Non-prime attributes: in the given table, all attributes except EMP_ID are non-prime.
• Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID). This violates the rule of third normal form.
• That is why we need to move EMP_CITY and EMP_STATE to a new <EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.
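
The two 3NF conditions can be turned into a small check. The sketch below is an illustration under assumptions: the FD EMP_ID → {EMP_NAME, EMP_ZIP} is inferred from EMP_ID being the candidate key, and the function names are hypothetical. Each FD is a (lhs_set, rhs_set) pair.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys

def violations_3nf(relation, fds):
    """FDs X -> Y where X is not a super key and Y - X is not all prime."""
    prime = set().union(*candidate_keys(relation, fds))
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        is_super_key = closure(lhs, fds) == set(relation)
        if rhs <= lhs or is_super_key or (rhs - lhs) <= prime:
            continue
        bad.append((lhs, rhs))
    return bad

fd_emp = ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"})      # assumed from the key
fd_zip = ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})
employee = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}

before = violations_3nf(employee, [fd_emp, fd_zip])
after_employee = violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, [fd_emp])
after_zip = violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, [fd_zip])
```

Under these assumptions only EMP_ZIP → {EMP_STATE, EMP_CITY} is reported for the original table, and neither decomposed table has a violation.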
Boyce Codd normal form (BCNF)

• BCNF is an advanced version of 3NF. It is stricter than 3NF.
• A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
• Example: Let's assume there is a company where employees work in more than one department.
• In this table the functional dependencies are as follows:
• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Candidate key: {EMP_ID, EMP_DEPT}
• The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
• Functional dependencies:
• EMP_ID → EMP_COUNTRY
• EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
• Candidate keys:
• For the first table: EMP_ID
• For the second table: EMP_DEPT
• For the third table: {EMP_ID, EMP_DEPT}
• Now, this is in BCNF because the left side of both functional dependencies is a key.
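
The BCNF condition can be tested by checking whether the left side of every non-trivial FD is a super key. In the sketch below the attribute sets of the three decomposed tables are assumptions reconstructed from the listed candidate keys and FDs, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left side is not a super key of relation."""
    relation = set(relation)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue                      # trivial FD, always allowed
        if closure(lhs, fds) >= relation:
            continue                      # left side is a super key
        bad.append((lhs, rhs))
    return bad

fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})
original = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}

before = bcnf_violations(original, [fd_country, fd_dept])

# Assumed attribute sets of the three decomposed tables.
table1 = bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country])
table2 = bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept])
table3 = bcnf_violations({"EMP_ID", "EMP_DEPT"}, [])
```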
Fourth normal form (4NF)

• A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued dependency (other than one whose left side is a super key).
• For a dependency A →→ B, if multiple values of B exist for a single value of A, independently of the other attributes, then the dependency is a multi-valued dependency.
So to bring the above table into 4NF, we can decompose it into two tables:
Fifth normal form (5NF)

• A relation is in 5NF if it is in 4NF, contains no join dependency that is not implied by its candidate keys, and any join of its projections is lossless.
• 5NF is satisfied when the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as project-join normal form (PJ/NF).
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
Relational Decomposition

• When a relation in the relational model is not in an appropriate normal form, the relation must be decomposed.
• In a database, decomposition breaks a table into multiple tables.
• If the relation is not decomposed properly, it may lead to problems such as loss of information.
• Decomposition is used to eliminate some of the problems of bad design, such as anomalies, inconsistencies, and redundancy.
Types of decomposition
Lossless Decomposition

• If no information is lost from the relation that is decomposed, then the decomposition is lossless.
• A lossless decomposition guarantees that joining the decomposed relations results in the same relation that was decomposed.
• A decomposition is said to be lossless if the natural join of all the decomposed relations gives the original relation.
The above relation is decomposed into two relations, EMPLOYEE and DEPARTMENT.
Now, when these two relations are joined on the common column "EMP_ID", the resultant relation will look like:
Employee ⋈ Department

Hence the decomposition is a lossless-join decomposition.
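
The project-then-join test can be sketched in Python on a small table. The rows below are made-up sample data (only the column name EMP_ID comes from the text), and the helper names are hypothetical. The check only shows whether the join is lossless for this particular set of rows.

```python
def project(rows, attrs):
    """Projection of dict rows onto attrs, with duplicate rows removed."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical sample rows.
rows = [
    {"EMP_ID": 1, "EMP_NAME": "Rick", "DEPT_NAME": "Sales"},
    {"EMP_ID": 2, "EMP_NAME": "Maggie", "DEPT_NAME": "Sales"},
    {"EMP_ID": 3, "EMP_NAME": "Glenn", "DEPT_NAME": "HR"},
]

joined_on_emp_id = is_lossless(rows, ["EMP_ID", "EMP_NAME"], ["EMP_ID", "DEPT_NAME"])
joined_on_dept = is_lossless(rows, ["EMP_ID", "DEPT_NAME"], ["DEPT_NAME", "EMP_NAME"])
```

Joining on EMP_ID recovers the original rows, while joining on DEPT_NAME pairs every Sales employee id with every Sales employee name and so produces spurious rows.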


Dependency Preserving

• It is an important constraint of the database.
• In dependency preservation, every dependency must be satisfied by at least one decomposed table.
• If a relation R is decomposed into relations R1 and R2, then the dependencies of R must either be a part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
• For example, suppose there is a relation R(A, B, C, D) with the functional dependency set {A->BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A->BC is a part of relation R1(ABC).
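
Dependency preservation can be checked without computing every projected FD. For each FD X → Y, start from X and keep adding whatever each decomposed relation can derive from the attributes collected so far. The FD is preserved if Y ends up in the collected set. The sketch below is an illustration: FDs are (lhs, rhs) pairs of strings of single-letter attributes, the function names are assumptions, and the second example (R(ABC) split into AB and AC) is a made-up counterexample.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, fds, parts):
    """True if fd can be enforced using only FDs local to the decomposed parts."""
    lhs, rhs = set(fd[0]), set(fd[1])
    collected = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            part = set(part)
            gained = closure(collected & part, fds) & part
            if not gained <= collected:
                collected |= gained
                changed = True
    return rhs <= collected

def is_dependency_preserving(fds, parts):
    return all(is_preserved(fd, fds, parts) for fd in fds)

slide_example = is_dependency_preserving([("A", "BC")], ["ABC", "AD"])
# Hypothetical counterexample: B -> C is lost when R(ABC) is split into AB and AC.
counterexample = is_dependency_preserving([("A", "B"), ("B", "C")], ["AB", "AC"])
```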
Multivalued Dependency

• A multivalued dependency occurs when two attributes in a table are independent of each other but both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that depend on a third attribute, which is why it always requires at least three attributes.
• In the bike example, the columns MANUF_YEAR and COLOR are independent of each other and both depend on BIKE_MODEL, so they can be called multivalued dependent on BIKE_MODEL. The representation of these dependencies is shown below:
• BIKE_MODEL →→ MANUF_YEAR
• BIKE_MODEL →→ COLOR
• This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL multidetermines COLOR".
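
Whether a multivalued dependency X →→ Y holds in a given set of rows can be checked by grouping the rows on X and testing that the Y values and the remaining values occur in every combination. The rows below are made-up sample data, since the original bike table is not reproduced here, and the function name mvd_holds is hypothetical.

```python
from collections import defaultdict

def mvd_holds(rows, x, y):
    """True if x ->> y holds in rows (a list of dicts with identical keys)."""
    rest = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in x)
        groups[key].add((tuple(row[a] for a in y), tuple(row[a] for a in rest)))
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        rest_values = {p[1] for p in pairs}
        # Every y value must appear with every combination of the other values.
        if len(pairs) != len(y_values) * len(rest_values):
            return False
    return True

# Hypothetical sample rows.
bikes = [
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2007, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2007, "COLOR": "Red"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2008, "COLOR": "Red"},
]

full_table = mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"])
missing_row = mvd_holds(bikes[:3], ["BIKE_MODEL"], ["MANUF_YEAR"])
```

With all four combinations present the dependency holds. Dropping one row breaks the independence of year and color, so the check fails.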
Join Dependency

• A join dependency is a further generalization of multivalued dependencies.
• If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.
• Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.
• A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R.
• *((A, B, C), (C, D)) is a JD of R if the join of the two projections over their common attribute C is equal to the relation R.
• Here, *(R1, R2, R3, ...) is used to indicate that relations R1, R2, R3 and so on are a JD of R.
Properties of functional
dependency
