9-DBMS- Normalization

The document discusses data normalization in database management systems, focusing on functional dependencies and their types, such as trivial and non-trivial dependencies. It outlines the objectives of normalization, including eliminating redundancy and ensuring data integrity, while also detailing the different normal forms (1NF, 2NF, 3NF, BCNF) and their characteristics. Additionally, it explains how to achieve these normal forms through examples and the importance of well-structured relations to avoid anomalies.


DBMS: Data Normalization

Functional Dependencies
Akhilesh Arya

Akhilesh Deep Arya: 9460508551


Determinant / Dependent
Functional Dependency

EmpNum → EmpEmail

The attribute on the LHS is known as the determinant and the one on
the RHS is known as the dependent.
• EmpNum is a determinant of EmpEmail
• And we can say that EmpEmail is dependent on EmpNum


Functional Dependency
FD: x → y
If t1.x = t2.x
Then t1.y = t2.y

R.No.  Name      Subject  Marks  Dept   Course
101    Ajay      Python   78     CSE    C1
102    Amit      Java     60     ME     C1
103    Abhay     CPP      65     CSE    C2
104    Ajay      C#       78     EEE    C3
105    Aniket    DBMS     60     EEE    C3
106    Chanchal  Java     92     CSE    C2
107    Chitvan   Python   87     ME     C4
108    Deepak    DBMS     65     Civil  C5
109    Pateek    TOC      55     CSE    C6
110    Aniket    ML       60     ME     C4

Check:
R.No → Name
Name → R.No
R.No → Subject
Dept → Course
Course → Dept
Marks → Dept
Name → Marks
R.No, Name → Subject

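The check list can be worked through mechanically. The sketch below is one possible way to do it in Python, assuming the ten rows of the table are typed in as tuples; the names COLUMNS, ROWS and holds are illustrative and not part of the slides. Note that a table instance can only refute a dependency: a result of True means "not violated by these rows", not "guaranteed by the schema".

```python
# Rows copied from the slide's table: (R.No, Name, Subject, Marks, Dept, Course)
COLUMNS = ("RNo", "Name", "Subject", "Marks", "Dept", "Course")
ROWS = [
    (101, "Ajay", "Python", 78, "CSE", "C1"),
    (102, "Amit", "Java", 60, "ME", "C1"),
    (103, "Abhay", "CPP", 65, "CSE", "C2"),
    (104, "Ajay", "C#", 78, "EEE", "C3"),
    (105, "Aniket", "DBMS", 60, "EEE", "C3"),
    (106, "Chanchal", "Java", 92, "CSE", "C2"),
    (107, "Chitvan", "Python", 87, "ME", "C4"),
    (108, "Deepak", "DBMS", 65, "Civil", "C5"),
    (109, "Pateek", "TOC", 55, "CSE", "C6"),
    (110, "Aniket", "ML", 60, "ME", "C4"),
]


def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    idx = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        x = tuple(row[idx[a]] for a in lhs)
        y = tuple(row[idx[a]] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


CHECKS = [
    (["RNo"], ["Name"]),
    (["Name"], ["RNo"]),
    (["RNo"], ["Subject"]),
    (["Dept"], ["Course"]),
    (["Course"], ["Dept"]),
    (["Marks"], ["Dept"]),
    (["Name"], ["Marks"]),
    (["RNo", "Name"], ["Subject"]),
]

for lhs, rhs in CHECKS:
    print(", ".join(lhs), "->", ", ".join(rhs), ":", holds(lhs, rhs))
```

For example, Name → R.No fails because the two rows for Ajay carry different roll numbers, while Name → Marks happens to hold in this particular data.
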
Cont..
Definition
We say an attribute B has a functional dependency on another
attribute A if any two records that have the same value for A
must also have the same value for B. We illustrate this as:
A → B
Example: Suppose we keep track of employee email addresses, and we
only track one email address for each employee. Suppose each employee is
identified by their unique employee number. We say there is a functional
dependency of email address on employee number:

employee number → email address


Functional Dependencies
EmpNum  EmpEmail           EmpFname  EmpLname
123     [email protected]  John      Doe
456     [email protected]  Peter     Smith
555     [email protected]  Alan      Lee
633     [email protected]  Peter     Doe
787     [email protected]  Alan      Lee

If EmpNum is the PK then the FDs:

EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
must exist.


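Functional Dependencies
3 different ways you might see FDs depicted:

1. One dependency per line:
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname

2. One determinant pointing to several dependents:
EmpNum → EmpEmail, EmpFname, EmpLname

3. Arrows drawn on the relation schema, from EmpNum to each other attribute:
EmpNum EmpEmail EmpFname EmpLname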


Attribute Closure or Closure Set
• The attribute closure of a set of attributes X, written X+, is
the set of all attributes that can be determined or inferred from
X using the functional dependencies of a relation.
• It is used to identify all the attributes that are functionally
dependent on a particular set of attributes.


Example:
• For example, consider a database schema with attributes
– (A, B, C, D, E) and
– functional dependencies (A → B, B → C, C → D, D → E).
• The attribute closure of A, denoted as A+, would be {A, B, C, D,
E}, because all attributes B, C, D and E can be determined from
the value of A through the given functional dependencies.
• Find the closure of B and of AD.
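A closure can be computed by repeatedly applying every dependency whose left-hand side is already covered, until nothing new is added. The following is a minimal Python sketch of that fixed-point loop; the function name closure and the representation of each dependency as a pair of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once its whole left-hand side is in the result.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A -> B, B -> C, C -> D, D -> E
FDS = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"E"})]

print(sorted(closure({"A"}, FDS)))       # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure({"B"}, FDS)))       # ['B', 'C', 'D', 'E']
print(sorted(closure({"A", "D"}, FDS)))  # ['A', 'B', 'C', 'D', 'E']
```

B+ does not contain A, because no dependency has A on its right-hand side.
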


Cont..
• Consider a relation R(A, B, C, D, E)
• And functional dependencies F = {A → B, D → E}
– Find the closure of:
• ABCDE
• ABDE
• ACDE
• ACD
• Find the candidate keys and super keys from these closure sets.


Cont..
• Find all the candidate keys in the given relation
– R(A, B, C, D, E, F)
– F = {AB → C, C → DE, E → F, D → A, C → B}
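One way to approach this exercise is brute force: try attribute sets from smallest to largest, keep those whose closure is the whole relation, and skip any superset of a key already found. The Python sketch below follows that idea; the helper names fd, closure and candidate_keys are illustrative, and the exhaustive search is only practical for small schemas like this one.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a dependency from two strings of single-letter attributes."""
    return set(lhs), set(rhs)


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


F = [fd("AB", "C"), fd("C", "DE"), fd("E", "F"), fd("D", "A"), fd("C", "B")]
keys = candidate_keys("ABCDEF", F)
print(["".join(sorted(k)) for k in keys])  # ['C', 'AB', 'BD']
```

Any attribute set that contains one of these keys is a super key.
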


Trivial functional dependency
• Trivial Functional Dependency: A functional dependency X → Y
is considered trivial if Y is a subset of X.
Y ⊆ X
• In other words, Y can be determined directly from X without
any additional information.
• Trivial functional dependencies are not considered interesting
or useful because they hold in every relation and so do not
provide any meaningful constraint on the data.


Example:
• FirstName → FirstName is a trivial dependency
• ID, FirstName → FirstName is also a trivial dependency, as
FirstName is a subset of {ID, FirstName}

ID   FirstName  LastName  Age  Department
101  Ajay       Kumar     32   CSE
102  Amit       Jain      24   ECE
103  Abhay      Singhvi   35   ME
104  Chanchal   Ahuja     25   ECE
105  Prateek    Surana    27   ME


Non-trivial functional dependency
• Non-trivial Functional Dependency: A functional dependency X
→ Y is considered non-trivial if Y is not a subset of X.
Y ⊈ X
• If, in addition, X and Y have nothing in common (Y ∩ X = ∅),
the dependency is called completely non-trivial.
• In such cases we have to check against the table whether the
functional dependency actually holds.
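Whether a dependency is trivial depends only on its two sides, not on the data, so it can be decided with plain set operations. The sketch below is an illustration in Python; the function name classify and the three labels it returns are assumptions of this example. It treats X → Y as trivial when Y ⊆ X, and among the rest it singles out the dependencies where X and Y share no attribute, which are commonly called completely non-trivial.

```python
def classify(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing its two sides."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"
    if lhs & rhs:
        return "non-trivial"  # Y is not a subset of X, but they overlap
    return "completely non-trivial"


print(classify({"FirstName"}, {"FirstName"}))               # trivial
print(classify({"ID", "FirstName"}, {"FirstName"}))         # trivial
print(classify({"ID", "FirstName"}, {"FirstName", "Age"}))  # non-trivial
print(classify({"ID"}, {"FirstName"}))                      # completely non-trivial
```

Only the non-trivial cases need to be checked against the table data.
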


Partial dependency
A partial dependency exists when a non-key attribute C is functionally
dependent on an attribute B, and B is only one component of a multipart
candidate key (AB).

InvNum LineNum Qty InvDate

Candidate keys: {InvNum, LineNum}
InvDate is partially dependent on {InvNum, LineNum}, as
InvNum is a determinant of InvDate and InvNum is
part of a candidate key.

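The definition translates into a simple test: a dependency is partial when its determinant is a proper subset of the candidate key and it determines at least one attribute outside the key. The Python sketch below applies that test to the invoice example. The function name partial_dependencies is illustrative, and the code assumes the relation has a single candidate key, so that "non-key" simply means "not in that key"; the dependency {InvNum, LineNum} → Qty is added here because it follows from {InvNum, LineNum} being the key.

```python
def partial_dependencies(candidate_key, fds):
    """Return (determinant, non-key dependents) for every partial dependency.

    Assumes a single candidate key, so non-key means 'not in candidate_key'.
    """
    key = set(candidate_key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        # Proper subset of the key determining a non-key attribute.
        if set(lhs) < key and non_key:
            found.append((set(lhs), non_key))
    return found


KEY = {"InvNum", "LineNum"}
FDS = [
    ({"InvNum", "LineNum"}, {"Qty"}),  # implied by the key
    ({"InvNum"}, {"InvDate"}),         # determinant is only part of the key
]

print(partial_dependencies(KEY, FDS))  # [({'InvNum'}, {'InvDate'})]
```
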
Transitive dependency
Consider attributes A, B, and C, where
A → B and B → C.
Functional dependencies are transitive, which means that we also
have the functional dependency A → C.
We say that C is transitively dependent on A through B.


Transitive dependency
EmpNum EmpEmail DeptNum DeptName

EmpNum → DeptNum
DeptNum → DeptName

DeptName is transitively dependent on EmpNum via DeptNum:

EmpNum → DeptName


The main objectives of normalization are:

• Eliminating redundancy: Redundancy refers to the repetition of data
in a database. By normalizing a database, redundant data is
eliminated, and data is stored in a structured and efficient manner.
This helps in reducing storage space and improves data consistency.

• Eliminating data anomalies: Data anomalies are inconsistencies or
irregularities that may occur in a database due to redundant or
poorly structured data. Normalization helps to eliminate these
anomalies, such as insertion anomalies, deletion anomalies, and
update anomalies, by organizing the data in a systematic way.


• Ensuring data integrity: Data integrity refers to the accuracy, consistency,
and reliability of data in a database. Normalization helps in maintaining
data integrity by ensuring that data is stored in a consistent and
non-redundant manner, and by establishing relationships between tables using
primary keys and foreign keys.

• Improving performance: Normalized databases are usually more efficient
in terms of storage and updates than denormalized databases, because
each fact is stored only once. Normalization reduces the amount of
redundant data and simplifies database operations, although queries
that join many tables may need extra processing.

• Facilitating database maintenance and scalability: Normalization makes it
easier to maintain and update a database, as changes to data only need to
be made in one place. It also facilitates scalability, as new data can be
added without disrupting the existing structure.

Well-Structured Relations
• A relation that contains minimal data redundancy and allows users
to insert, delete, and update rows without causing data
inconsistencies
• Goal is to avoid anomalies
– Insertion Anomaly – adding new rows forces user to create duplicate data
– Deletion Anomaly – deleting rows may cause a loss of data that would be
needed for other future rows
– Modification Anomaly – changing data in a row forces changes to other
rows because of duplication



Example – Figure 5.2b

Question – Is this a relation? Answer – Yes: unique rows and no multivalued
attributes
Question – What’s the primary key? Answer – Composite: Emp_ID, Course_Title


Anomalies in this Table
• Insertion – can’t enter a new employee without
having the employee take a class
• Deletion – if we remove employee 140, we lose
information about the existence of a Tax Acc class
• Modification – giving a salary increase to employee
100 forces us to update multiple records

Why do these anomalies exist?
Because we’ve combined two themes (entity types)
into one relation. This results in duplication, and an
unnecessary dependency between the entities

Normalization Types
We discuss four normal forms: first, second, third, and Boyce-Codd normal
forms
1NF, 2NF, 3NF, and BCNF

Normalization is a process that “improves” a database design by generating
relations that are of higher normal forms.

The objective of normalization:
“to create relations where every dependency is on the key, the whole key, and
nothing but the key”.


Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest

Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.



Normalization
[Diagram: nested regions, 1NF containing 2NF containing 3NF containing BCNF]
a relation in BCNF is also in 3NF
a relation in 3NF is also in 2NF
a relation in 2NF is also in 1NF


Normalization
•We consider a relation in BCNF to be fully normalized.

•The benefit of higher normal forms is that update semantics for the affected data
are simplified.

•This means that applications required to maintain the database are simpler.

•A design that has a lower normal form than another design has more redundancy.
Uncontrolled redundancy can lead to data integrity problems.



Steps in normalization



First Normal Form
The key characteristics of a table in 1NF are:

No duplicate rows: Each row in the table must be unique. This is achieved
by having a primary key column that uniquely identifies each row in the
table. No two rows should be identical.

Atomic values: Each column in a table should contain only atomic values,
which are indivisible and cannot be further broken down. This means that a
column should not contain multiple values or arrays of values. If a column
contains multiple values, the values should be moved to separate rows,
typically in a separate relation.

Composite attributes: Each column in a table should contain
non-composite attributes only.


First Normal Form
The following is not in 1NF

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc

EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD


First Normal Form
EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc

To obtain 1NF relations we must, without loss of
information, replace the above with two relations.


First Normal Form
Employee
EmpNum  EmpPhone
123     233-9876
333     233-1231
679     233-1231

EmployeeDegree
EmpNum  EmpDegree
333     BA
333     BSc
333     PhD
679     BSc
679     MSc

An outer join between Employee and EmployeeDegree will
produce the information we saw before
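To see that no information is lost, the two 1NF relations can be joined back together. The sketch below does this in plain Python with the rows from the slide; the function names left_outer_join and regroup are illustrative. A left outer join is used so that employee 123, who has no degree rows, still appears in the result.

```python
EMPLOYEE = [(123, "233-9876"), (333, "233-1231"), (679, "233-1231")]
EMPLOYEE_DEGREE = [
    (333, "BA"), (333, "BSc"), (333, "PhD"),
    (679, "BSc"), (679, "MSc"),
]


def left_outer_join(employees, degrees):
    """Join on EmpNum, keeping employees that have no degree (degree = None)."""
    rows = []
    for emp_num, phone in employees:
        matches = [deg for num, deg in degrees if num == emp_num]
        if matches:
            rows.extend((emp_num, phone, deg) for deg in matches)
        else:
            rows.append((emp_num, phone, None))
    return rows


def regroup(joined):
    """Collect the degrees per employee, as in the original non-1NF table."""
    grouped = {}
    for emp_num, phone, degree in joined:
        entry = grouped.setdefault(emp_num, (phone, []))
        if degree is not None:
            entry[1].append(degree)
    return grouped


for emp_num, (phone, degrees) in regroup(
    left_outer_join(EMPLOYEE, EMPLOYEE_DEGREE)
).items():
    print(emp_num, phone, ", ".join(degrees))
```

An inner join would drop employee 123 entirely, which is why the slide asks for an outer join.
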


Second Normal Form
• 1NF plus every non-key attribute is fully functionally
dependent on the ENTIRE primary key
– Every non-key attribute must be defined by the entire
key, not by only part of the key
– No partial functional dependencies



– Functional Dependencies in EMPLOYEE2

EmpID CourseTitle Name DeptName Salary DateCompleted

Dependency on entire primary key:
EmpID, CourseTitle → DateCompleted

Dependency on only part of the key:
EmpID → Name, DeptName, Salary

Therefore, NOT in 2nd Normal Form!!

Getting it into 2nd Normal Form
• Decomposed into two separate relations

EmpID Name DeptName Salary

EmpID CourseTitle DateCompleted

Both are full functional dependencies


Second Normal Form
Consider this Project table (in 1NF):
Ecode ProjCode Dept Hours

Ecode, ProjCode → Hours
Ecode → Dept

The only candidate key is {Ecode, ProjCode}.

Project is not 2NF since there is a partial
dependency of Dept on Ecode


Second Normal Form
Project
Ecode ProjCode Dept Hours
• The department of a particular employee cannot be
recorded until the employee is assigned a project
• If an employee is shifted to another dept, this information
must be updated in multiple rows
• If the employee completes work on the project and his/her
record is deleted, the information regarding the
department the employee belongs to will also be lost

Decomposed into two 2NF relations:
Ecode ProjCode Hours
Ecode Dept

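The effect of the decomposition can be illustrated with a few rows. The slides give only the schema of Project, so the three rows below are hypothetical, as are the names PROJECT, project, emp_proj and emp_dept. The sketch projects the table onto the two smaller relations and joins them back on Ecode to confirm that the original rows are recovered.

```python
# Hypothetical rows for Project(Ecode, ProjCode, Dept, Hours);
# the slides give only the schema, not the data.
PROJECT = [
    ("E1", "P1", "CSE", 10),
    ("E1", "P2", "CSE", 5),
    ("E2", "P1", "ME", 8),
]


def project(rows, positions):
    """Relational projection: keep the given columns and drop duplicates."""
    return sorted({tuple(row[i] for i in positions) for row in rows})


emp_proj = project(PROJECT, (0, 1, 3))  # (Ecode, ProjCode, Hours)
emp_dept = project(PROJECT, (0, 2))     # (Ecode, Dept)

# Natural join on Ecode rebuilds the original relation.
rejoined = sorted(
    (ecode, proj, dept, hours)
    for (ecode, proj, hours) in emp_proj
    for (ecode2, dept) in emp_dept
    if ecode == ecode2
)

print(emp_dept)                     # [('E1', 'CSE'), ('E2', 'ME')]
print(rejoined == sorted(PROJECT))  # True
```

After the split, each employee's department is stored once, so moving an employee to another department means changing a single row.
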
Third Normal Form
• A relation is in 3NF if the relation is in 2NF and all determinants of
non-key attributes are candidate keys.
That is, for any non-trivial functional dependency X → Y, where Y is a
non-key attribute (or a set of non-key attributes), X is a candidate key
(or, more generally, a super key).
• This definition of 3NF differs from BCNF only in the specification of
non-key attributes: 3NF is weaker than BCNF. (BCNF requires all
determinants to be candidate keys.)
• A relation in 3NF will not have any transitive dependencies
of a non-key attribute on a candidate key through another non-key attribute.


Figure 5-24 -- Relation with transitive dependency

(a) SALES relation with simple data

Figure 5-24(b) Relation with transitive dependency

CustID → Name
CustID → Salesperson
CustID → Region
All this is OK (2nd NF)

BUT
CustID → Salesperson → Region
Transitive dependency (not 3rd NF)

Figure 5.25 -- Removing a transitive dependency

(a) Decomposing the SALES relation

Figure 5.25(b) Relations in 3NF

Salesperson → Region

CustID → Name
CustID → Salesperson

Now, there are no transitive dependencies…
Both relations are in 3rd NF

Third Normal Form
Consider this Employee relation. Candidate keys are? …

EmpNum EmpName DeptNum DeptName

EmpName, DeptNum, and DeptName are non-key attributes.
DeptNum determines DeptName, a non-key attribute, and
DeptNum is not a candidate key.

Is the relation in 3NF? … no
Is the relation in BCNF? … no
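The two answers can be reproduced with a small check over the listed dependencies. The sketch below is written under stated assumptions: EmpNum is taken to be the only candidate key (the slide lists every other attribute as non-key), the dependency EmpNum → EmpName, DeptNum is added because it follows from that key, and the function names closure and violations are illustrative. A dependency violates BCNF when it is non-trivial and its determinant is not a super key; it also violates 3NF when it determines at least one non-key attribute.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(attributes, fds, candidate_keys):
    """Return (BCNF violations, 3NF violations) among the listed dependencies."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)  # attributes that belong to some key
    bcnf, third = [], []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the non-trivial part of the dependency
        if not extra or closure(lhs, fds) >= attributes:
            continue  # trivial, or the determinant is a super key
        bcnf.append((lhs, rhs))
        if not extra <= prime:
            third.append((lhs, rhs))  # determines a non-key attribute
    return bcnf, third


EMPLOYEE = {"EmpNum", "EmpName", "DeptNum", "DeptName"}
FDS = [
    ({"EmpNum"}, {"EmpName", "DeptNum"}),  # assumed: EmpNum is the key
    ({"DeptNum"}, {"DeptName"}),
]

bcnf, third = violations(EMPLOYEE, FDS, [{"EmpNum"}])
print("In BCNF:", not bcnf)  # In BCNF: False
print("In 3NF:", not third)  # In 3NF: False
```

Both lists contain DeptNum → DeptName, the dependency the slide points to.
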


Third Normal Form
EmpNum EmpName DeptNum DeptName

We correct the situation by decomposing the original relation
into two 3NF relations. Note the decomposition is lossless.

EmpNum EmpName DeptNum

DeptNum DeptName

Verify these two relations are in 3NF.
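For a split into two relations, losslessness can be checked from the dependencies alone: the decomposition is lossless when the attributes shared by the two relations functionally determine all attributes of at least one of them. The Python sketch below applies this standard test; the function names closure and is_lossless are illustrative, and EmpNum → EmpName, DeptNum is assumed because EmpNum is the key of the original relation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: common attributes must determine r1 or r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


FDS = [
    ({"EmpNum"}, {"EmpName", "DeptNum"}),  # assumed: EmpNum is the key
    ({"DeptNum"}, {"DeptName"}),
]

R1 = {"EmpNum", "EmpName", "DeptNum"}
R2 = {"DeptNum", "DeptName"}
print(is_lossless(R1, R2, FDS))  # True: DeptNum determines all of R2

# A split with no shared attribute cannot be joined back correctly.
print(is_lossless({"EmpNum", "EmpName"}, {"DeptNum", "DeptName"}, FDS))  # False
```
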


Other Normal Forms
• Boyce-Codd NF
– All determinants are candidate keys…there is no determinant
that is not a unique identifier
• 4th NF
– No non-trivial multivalued dependencies, except those whose
determinant is a super key
• 5th NF
– No join dependencies, except those implied by the candidate keys
• Domain-key NF
– The “ultimate” NF…perfect elimination of all possible
anomalies


Boyce-Codd Normal Form (BCNF)
– A table is in Boyce-Codd normal form (BCNF) if every determinant
in the table is a candidate key.
(A determinant is any attribute whose value
determines other values within a row.)
– If a table contains only one candidate key, the 3NF and the BCNF
are equivalent.
– BCNF is a special case of 3NF: every table in BCNF is also in 3NF,
but not every table in 3NF is in BCNF.


A Table That Is In 3NF
But Not In BCNF



The Decomposition of a Table Structure
to Meet BCNF Requirements



In 3NF, but not in BCNF:

student_no course_no instr_no

Instructor teaches one course only.
Student takes a course and has one instructor.

{student_no, course_no} → instr_no
instr_no → course_no

Not in BCNF, since we have instr_no → course_no, but instr_no is not a
candidate key.


student_no course_no instr_no

is decomposed into:

student_no instr_no

course_no instr_no

{student_no, instr_no} → student_no
{student_no, instr_no} → instr_no
instr_no → course_no

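The split shown here follows the usual BCNF decomposition step: pick a dependency X → Y whose determinant is not a super key, put X and Y together in one relation, and keep everything except Y in the other. The Python sketch below performs one such step; the function names closure and split_on_violation are illustrative, and a full decomposition would repeat the step on each resulting relation until no violation remains.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def split_on_violation(attributes, fds):
    """Split on the first listed dependency that violates BCNF, if any."""
    attributes = set(attributes)
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= attributes:
            continue  # trivial, or the determinant is a super key
        r1 = lhs | rhs                  # the violating dependency on its own
        r2 = attributes - (rhs - lhs)   # everything else, keeping lhs as the link
        return r1, r2
    return None  # no listed dependency violates BCNF


R = {"student_no", "course_no", "instr_no"}
FDS = [
    ({"student_no", "course_no"}, {"instr_no"}),
    ({"instr_no"}, {"course_no"}),
]

r1, r2 = split_on_violation(R, FDS)
print(sorted(r1))  # ['course_no', 'instr_no']
print(sorted(r2))  # ['instr_no', 'student_no']
```

The shared attribute instr_no is the key of the first relation, which is what allows the two relations to be joined back.
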
Sample Data for a BCNF Conversion



Decomposition into BCNF



Denormalization

• Normalization is only one of many database design goals.
• Normalized (decomposed) tables require additional processing
(joins), which can reduce system speed.
• Normalization purity is often difficult to sustain in the modern
database environment. The conflicts between design efficiency,
information requirements, and processing speed are often
resolved through compromises that include denormalization.
