Normalization in DBMS
A large database defined as a single relation may result in data duplication. This
repetition of data can cause several problems as the relation grows.
To handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, well-structured relations that satisfy
desirable properties. Normalization is the process of decomposing relations into
relations with fewer attributes.
The main reason for normalizing relations is to remove insertion, update, and
deletion anomalies. Failure to eliminate them leaves redundant data in place and
can cause data integrity and other problems as the database grows. Normalization
consists of a series of guidelines that help you create a good database structure.
Advantages of Normalization
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o Performance can degrade when relations are normalized to higher normal
forms, e.g., 4NF or 5NF, because queries need more joins.
o It is very time-consuming and difficult to normalize relations of a higher
degree.
o Careless decomposition may lead to a bad database design, leading to
serious problems.
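Example (first normal form)
The Courses column below holds several values in one field, so the relation is not
in first normal form:
ID   Name   Courses
---------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3
Storing one course per row makes every value atomic:
ID   Name   Course
---------------------
1    A      c1
1    A      c2
2    E      c3
3    M      c2
3    M      c3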
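The conversion above can be sketched in a few lines of Python. This is only an illustration: the function name to_first_normal_form and the tuple layout are assumptions made for the sketch, not part of the original example.

```python
# Rows of the unnormalized relation: (ID, Name, Courses as one comma-separated string).
unnormalized = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]


def to_first_normal_form(rows):
    """Emit one (ID, Name, Course) row per course so every value is atomic."""
    result = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            result.append((student_id, name, course.strip()))
    return result


first_nf = to_first_normal_form(unnormalized)
for row in first_nf:
    print(row)
```

Each input row is repeated once per course, which is exactly how the second table is obtained from the first.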
SECOND NORMAL FORM
A relation is in second normal form (2NF) if it is in first normal form and
contains no partial dependency, i.e., no non-prime attribute (an attribute that is
not part of any candidate key) depends on a proper subset of any candidate key of
the table.
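The definition can be checked mechanically with attribute closures. The sketch below assumes a single combined relation (STUD_NO, COURSE_NO, COURSE_FEE) with candidate key {STUD_NO, COURSE_NO} and the functional dependency COURSE_NO -> COURSE_FEE; that combined relation and the helper names closure and partial_dependencies are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(candidate_keys, fds, attributes):
    """List (key_subset, attribute) pairs that violate 2NF."""
    prime = set().union(*candidate_keys)
    non_prime = set(attributes) - prime
    violations = []
    for key in candidate_keys:
        # Only proper, non-empty subsets of a candidate key matter.
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                for attr in sorted(closure(subset, fds) & non_prime):
                    violations.append((subset, attr))
    return violations


# Assumed combined relation, before any decomposition.
attributes = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
fds = [(frozenset({"COURSE_NO"}), frozenset({"COURSE_FEE"}))]
candidate_keys = [frozenset({"STUD_NO", "COURSE_NO"})]

violations = partial_dependencies(candidate_keys, fds, attributes)
print(violations)  # [(('COURSE_NO',), 'COURSE_FEE')]
```

The single violation says that COURSE_FEE depends on COURSE_NO alone, which is only part of the key, so the combined relation is not in 2NF.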
Example
Table 1
STUD_NO   COURSE_NO
---------------------
1         C1
2         C2
1         C4
4         C3
4         C1
Table 2
COURSE_NO   COURSE_FEE
------------------------
C1          1000
C2          1500
C3          1000
C4          2000
C5          2000
2NF reduces the amount of redundant data that is stored. For instance, if 100
students take course C1, we do not need to store its fee of 1000 in all 100
records; instead, we store it once in the second table, which records that the
course fee for C1 is 1000.
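A small sketch of this idea, using the rows of Table 1 and Table 2: the fee is looked up through COURSE_NO when the combined view is needed, so a fee change touches one entry only. The variable names and the new fee of 1200 are illustrative assumptions.

```python
# Table 1: which student takes which course.
enrollments = [(1, "C1"), (2, "C2"), (1, "C4"), (4, "C3"), (4, "C1")]

# Table 2: the fee of each course, stored exactly once per course.
course_fee = {"C1": 1000, "C2": 1500, "C3": 1000, "C4": 2000, "C5": 2000}

# Joining on COURSE_NO rebuilds the combined (STUD_NO, COURSE_NO, COURSE_FEE) rows.
combined = [(stud, course, course_fee[course]) for stud, course in enrollments]

# Changing the fee of C1 is a single update in Table 2 ...
course_fee["C1"] = 1200

# ... and every enrollment in C1 sees the new fee after the join.
updated = [(stud, course, course_fee[course]) for stud, course in enrollments]

for row in updated:
    print(row)
```

Had the fee been stored in every enrollment row, the same change would have required updating each C1 row and could leave the rows inconsistent if one were missed.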
THIRD NORMAL FORM
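A relation is in third normal form (3NF) if it is in 2NF and no non-prime
attribute is transitively dependent on a candidate key.
Transitive dependency: if A -> B and B -> C are two FDs (and B does not in turn
determine A), then A -> C is called a transitive dependency.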
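One way to see the transitive dependency is to compute the closure of A, i.e., the set of attributes that A determines. The following is a minimal sketch; the function name closure and the set-based encoding of FDs are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left-hand side is already determined.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The two FDs from the definition: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_closure = closure({"A"}, fds)
print(sorted(a_closure))  # ['A', 'B', 'C']
```

C appears in the closure of A even though no FD lists A -> C directly; it is reached through B, which is what makes the dependency transitive.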
Example:
A relation R is in Boyce-Codd normal form (BCNF) if and only if, for every
non-trivial functional dependency X -> Y, X is a super key. Every relation in BCNF
is also in third normal form.
Example:
In the above Student_detail relation, Stud_ID is the key and the only prime
attribute. City can be determined by Stud_ID as well as by Zip. Zip is not a super
key, and City is not a prime attribute. Additionally, Stud_ID → Zip → City, so a
transitive dependency exists.
To bring this relation into third normal form, we break it into two relations,
Student_detail and ZipCodes:
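In the above relations, Stud_ID is the super key of Student_detail and Zip is the
super key of ZipCodes. So Stud_ID determines the other attributes of
Student_detail, and
Zip → City
Every determinant is a super key of its relation, so both relations are in BCNF.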
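The super key condition can be tested with a short sketch. It only uses the three attributes named in the text (Stud_ID, Zip, City); any other columns of Student_detail are left out, and the helper names closure and bcnf_violations are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if closure(lhs, fds) >= attributes:
            continue  # left-hand side determines everything: a super key
        violations.append((lhs, rhs))
    return violations


# Relation before decomposition: Stud_ID -> Zip and Zip -> City.
before = bcnf_violations(
    {"Stud_ID", "Zip", "City"},
    [({"Stud_ID"}, {"Zip"}), ({"Zip"}, {"City"})],
)

# The two relations after decomposition, each with the FD that applies to it.
student_detail = bcnf_violations({"Stud_ID", "Zip"}, [({"Stud_ID"}, {"Zip"})])
zip_codes = bcnf_violations({"Zip", "City"}, [({"Zip"}, {"City"})])

print(before)          # [({'Zip'}, {'City'})]
print(student_detail)  # []
print(zip_codes)       # []
```

Before the split, Zip -> City is flagged because Zip does not determine Stud_ID; after the split, each remaining dependency has a super key on its left-hand side.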
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and
has no non-trivial multi-valued dependency whose determinant is not a super key.
Example:
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between them.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency
on STU_ID, which leads to unnecessary repetition of data.
To bring the above table into 4NF, we can decompose it into two tables, one pairing
STU_ID with COURSE and one pairing STU_ID with HOBBY.
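The decomposition can be sketched as two projections followed by a join. The sample rows cover only student 21 and assume the table stores every course/hobby combination for that student; the names student_course and student_hobby are hypothetical.

```python
# STUDENT rows as (STU_ID, COURSE, HOBBY). Because courses and hobbies are
# independent, every course is paired with every hobby of the same student.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

# Project onto the two independent facts.
student_course = {(stu_id, course) for stu_id, course, _ in student}
student_hobby = {(stu_id, hobby) for stu_id, _, hobby in student}

# Natural join on STU_ID rebuilds the original rows.
rejoined = {
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
}

print(sorted(student_course))  # [(21, 'Computer'), (21, 'Math')]
print(sorted(student_hobby))   # [(21, 'Dancing'), (21, 'Singing')]
print(rejoined == student)     # True
```

Each course and each hobby is now stored once per student, and adding a third hobby means inserting one row instead of one row per course.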
o A relation is in fifth normal form (5NF) if it is in 4NF and contains no
non-trivial join dependency beyond those implied by its candidate keys; any
further decomposition must be lossless.
o 5NF is reached when the tables have been broken into as many smaller tables
as possible in order to avoid redundancy.
o 5NF is also known as project-join normal form (PJ/NF).
Example:
o In the above table, John takes both the Computer and Math classes in
Semester 1, but he does not take the Math class in Semester 2. In this case,
the combination of all three fields is required to identify a valid record.
o Suppose we add a new semester, Semester 3, but do not yet know the subject
or who will be teaching it, so we would leave Lecturer and Subject as NULL.
But all three columns together act as the primary key, so we cannot leave
the other two columns blank.
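So to bring the above table into 5NF, we can decompose it into three relations P1,
P2 and P3, one for each pair of the three columns. The pairs {Semester, Subject}
and {Subject, Lecturer} form P1 and P2, and
P3 = {Semester, Lecturer}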
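The effect of the three-way decomposition can be sketched as follows. The sample rows are hypothetical (only John and the Computer and Math subjects come from the text; Asha and Ravi are invented for illustration), and treating P1 as {Semester, Subject} and P2 as {Subject, Lecturer} is an assumption.

```python
# Hypothetical sample rows as (Semester, Subject, Lecturer).
rows = {
    ("Semester 1", "Computer", "John"),
    ("Semester 1", "Math", "John"),
    ("Semester 1", "Computer", "Asha"),
    ("Semester 2", "Math", "Ravi"),
}

# The three pairwise projections.
p1 = {(sem, sub) for sem, sub, _ in rows}  # assumed P1 = {Semester, Subject}
p2 = {(sub, lec) for _, sub, lec in rows}  # assumed P2 = {Subject, Lecturer}
p3 = {(sem, lec) for sem, _, lec in rows}  # P3 = {Semester, Lecturer}

# Joining only P1 and P2 on Subject produces spurious rows.
two_way = {
    (sem, sub, lec)
    for sem, sub in p1
    for other_sub, lec in p2
    if sub == other_sub
}

# Also requiring the (Semester, Lecturer) pair to appear in P3 removes them.
three_way = {(sem, sub, lec) for sem, sub, lec in two_way if (sem, lec) in p3}

print(len(two_way))       # 6
print(three_way == rows)  # True
```

With these rows no two of the projections are enough to rebuild the table, but all three together are, which is the situation a join dependency describes.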