DBMS_UNIT4
Data redundancy in databases refers to the unnecessary duplication of data. It can arise from poor
database design or lack of proper normalization. Redundancy can cause several issues:
1. Wasted Storage
2. Data Anomalies
Update Anomalies: When you have the same piece of data stored in multiple places,
updating it in one place can lead to inconsistency if it's not updated everywhere.
Insertion Anomalies: You might have to insert the same data in several places, which risks inconsistencies. You might also be unable to record one fact without supplying unrelated data along with it.
Deletion Anomalies: Deleting one row might unintentionally remove other information that is still needed elsewhere.
3. Increased Complexity
4. Performance Issues
Duplicate data can slow down search, update, and insert operations.
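The update anomaly can be made concrete with a minimal Python sketch. The two-row table is hypothetical: it borrows roll numbers and the CO/A4 department from the functional-dependency example later in these notes. The new building "B1" and both function names are also invented for illustration.

```python
# Hypothetical un-normalized table: every student row repeats its department's building.
students = [
    {"roll_no": 42, "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 44, "dept_name": "CO", "dept_building": "A4"},
]


def move_building_for_one_row(rows, roll_no, new_building):
    """Update dept_building in a single row only; the other copies are left stale."""
    for row in rows:
        if row["roll_no"] == roll_no:
            row["dept_building"] = new_building
            return


def buildings_of(rows, dept_name):
    """Collect every building recorded for a department; more than one means inconsistency."""
    return {row["dept_building"] for row in rows if row["dept_name"] == dept_name}


move_building_for_one_row(students, 42, "B1")
print(sorted(buildings_of(students, "CO")))  # ['A4', 'B1']
```

In a normalized design the building would be stored once per department, so a single update could not leave conflicting copies.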
Decomposition In DBMS
Decomposition refers to the division of tables into multiple tables to produce consistency in the data.
When we divide a table into multiple tables, or a relation into multiple relations, the process is termed Decomposition in DBMS. It is performed when we need to ensure consistency and remove anomalies and duplicate data from the database. When we perform decomposition, we must ensure that no information or data is lost.
Types of Decomposition
Decomposition is of two types: lossless decomposition and lossy decomposition.
Lossless Decomposition
Lossless decomposition is the process in which the original relation R can be regained by joining the relations formed after decomposition. It is used to remove redundant data from the database while retaining the useful information. A lossless decomposition must ensure the following:
If we perform a natural join on the sub-divided relations, we must get exactly the original relation.
Example: consider the relation R(A, B, C)
A B C
55 16 27
48 52 89
Now we decompose this relation into two sub-relations R1 and R2:
R1(A, B)
A B
55 16
48 52
R2(B, C)
B C
16 27
52 89
After performing the natural join R1 ⋈ R2 on B, we get back the original relation:
A B C
55 16 27
48 52 89
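The lossless check above can be sketched in Python. Relations are modeled as lists of dicts, which is an illustrative representation and not a DBMS API. The sketch projects R onto R1(A, B) and R2(B, C), rejoins them, and compares the result with R as sets of tuples. The join reproduces R here because every B value occurs only once in R2, so B determines C in this instance.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples (set semantics)."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Natural join: merge every pair of rows that agree on all shared attributes."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    out = []
    for lrow in left:
        for rrow in right:
            if all(lrow[a] == rrow[a] for a in shared):
                merged = {**lrow, **rrow}
                if merged not in out:
                    out.append(merged)
    return out


def as_tuples(rows, attrs):
    """Convert rows to a set of tuples so two relations can be compared as sets."""
    return {tuple(row[a] for a in attrs) for row in rows}


R = [{"A": 55, "B": 16, "C": 27}, {"A": 48, "B": 52, "C": 89}]
R1 = project(R, ["A", "B"])
R2 = project(R, ["B", "C"])
joined = natural_join(R1, R2)

print(as_tuples(joined, ["A", "B", "C"]) == as_tuples(R, ["A", "B", "C"]))  # True
```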
Lossy Decomposition
As the name suggests, a decomposition is lossy when joining the sub-relations does not give back the relation that was decomposed. The join produces extra (spurious) tuples that were never in the original relation. These extra tuples cannot be told apart from the genuine ones, so the information about which tuples were originally present is lost.
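Example: consider the relation R(A, B, C)
A B C
1 2 1
2 5 3
3 3 3
Now we decompose R into two sub-relations R1(A, C) and R2(B, C), which share only the attribute C:
R1(A, C)
A C
1 1
2 3
3 3
R2(B, C)
B C
2 1
5 3
3 3
After performing the natural join R1 ⋈ R2 on C, we get:
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
The tuples (2, 3, 3) and (3, 5, 3) are spurious: they were not in the original relation R, so the decomposition is lossy.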
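The spurious tuples can be reproduced with a minimal Python sketch. Relations are modeled as lists of dicts, an illustrative choice. The sketch splits R on the shared attribute C into R1(A, C) and R2(B, C), which matches the five-row join result shown above. C = 3 occurs with two different A values and two different B values, so the join pairs them in every combination.

```python
def project(rows, attrs):
    """Relational projection with set semantics (duplicate tuples are dropped)."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Merge every pair of rows that agree on all attributes the two relations share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if all(lrow[a] == rrow[a] for a in shared)
    ]


COLS = ["A", "B", "C"]
R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 5, "C": 3}, {"A": 3, "B": 3, "C": 3}]

R1 = project(R, ["A", "C"])  # split on the shared attribute C
R2 = project(R, ["B", "C"])

original = {tuple(r[c] for c in COLS) for r in R}
joined = {tuple(r[c] for c in COLS) for r in natural_join(R1, R2)}

print(len(joined))                # 5 tuples instead of 3
print(sorted(joined - original))  # [(2, 3, 3), (3, 5, 3)]: the spurious tuples
```

By contrast, splitting this particular instance into (A, B) and (B, C) would rejoin exactly, because every B value occurs only once.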
Properties of Decomposition
Lossless: Every decomposition that we perform in a database management system should be lossless. No information should be lost when we join the sub-relations to get back the original relation. It helps to remove redundant data from the database.
Issues that can arise from decomposition:
1. Loss of Information
Non-loss decomposition: When a relation is decomposed into two or more smaller relations and the original relation can be perfectly reconstructed by taking the natural join of the decomposed relations, it is termed a lossless decomposition. If not, it is termed a "lossy decomposition."
2. Dependency Preservation
Once tables are decomposed, certain functional dependencies might not be preserved. This can make it impossible to enforce specific integrity constraints within a single table.
Example: Suppose the original table has the functional dependency `A → B`. If no decomposed table contains both `A` and `B`, and `A → B` cannot be derived from the dependencies that hold within the decomposed tables, then this functional dependency is not preserved.
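Dependency preservation can be tested without computing every implied dependency. The Python sketch below uses the standard iterative textbook test. For each X → Y, it repeatedly grows X using attribute closures taken inside each sub-relation Ri, then checks whether Y was reached. The schema R(A, B, C) with F = {A → B, B → C} and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: everything derivable from `attrs` using the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """Iterative test: can X -> Y be derived using only closures taken inside each Ri?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def preserves_dependencies(decomposition, fds):
    """True if every FD in `fds` is preserved by the decomposition."""
    return all(is_preserved(fd, decomposition, fds) for fd in fds)


# Hypothetical schema R(A, B, C) with F = {A -> B, B -> C}.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(preserves_dependencies([{"A", "B"}, {"B", "C"}], F))  # True
print(preserves_dependencies([{"A", "C"}, {"B", "C"}], F))  # False: A -> B is lost
```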
3. Increased Complexity
Decomposition leads to an increase in the number of tables, which can complicate queries
and maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can
mitigate this to some extent, it still adds complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some cases, can even
introduce new redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead to more complex
SQL queries involving multiple joins, which can introduce performance overheads.
Functional Dependency
A functional dependency occurs when one attribute (or set of attributes) uniquely determines another attribute within a relation. It is a constraint that describes how attributes in a table relate to each other. If attribute A functionally determines attribute B, we write A → B. This means that any two tuples that agree on A must also agree on B.
Functional dependencies are used to mathematically express relations among database entities and are essential for understanding advanced concepts in Relational Database Systems.
Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can identify some valid and invalid functional dependencies:
roll_no → {name, dept_name, dept_building}: roll_no determines the values of name, dept_name and dept_building, hence it is a valid functional dependency.
roll_no → dept_name: since roll_no determines the whole set {name, dept_name, dept_building}, it also determines its subset dept_name.
dept_name → dept_building: rows with the same dept_name always have the same dept_building (CO → A4, IT → A3), so dept_name determines dept_building. Different departments may still share a building (EC and ME are both in B2).
name → dept_name is not valid: students with the same name can have different dept_name values (xyz appears in both CO and IT), hence this is not a valid functional dependency.
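The reasoning above can be checked mechanically on a table instance. The Python sketch below uses a hypothetical helper `fd_holds` that reports whether any two rows agree on the determinant but differ on the dependent. A table instance can only refute a functional dependency. It cannot prove that the dependency will hold for future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on `lhs` also agrees on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True


cols = ["roll_no", "name", "dept_name", "dept_building"]
data = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
students = [dict(zip(cols, r)) for r in data]

print(fd_holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))  # True
print(fd_holds(students, ["dept_name"], ["dept_building"]))  # True
print(fd_holds(students, ["name"], ["dept_name"]))  # False
```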
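In Trivial functional dependency, the dependent is always a subset of the determinant, i.e. if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18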
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial
functional dependency.
In Non-trivial functional dependency, the dependent is not a subset of the determinant, i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.
Example:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
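In Multivalued functional dependency, entities of the dependent set are not dependent on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other. name → age does not hold (abc has ages 17 and 19), and age → name does not hold (age 18 belongs to both pqr and xyz).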
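In Transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if a → b and b → c, then according to the axiom of transitivity, a → c.
For example,
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2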
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
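Transitive dependencies fall out of the attribute closure: apply every dependency whose left side is already covered until nothing new is added. The minimal Python sketch below computes the closure of enrol_no under the two dependencies named above. The function name `closure` is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: repeatedly apply FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [(["enrol_no"], ["dept"]), (["dept"], ["building_no"])]
print(sorted(closure(["enrol_no"], fds)))  # ['building_no', 'dept', 'enrol_no']
```

Because building_no appears in the closure of enrol_no, enrol_no → building_no is implied even though it was never stated directly.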
In partial functional dependency, a non-key attribute depends on only a part of the composite key rather than on the whole key. Suppose a relation R has attributes X, Y, Z, where {X, Y} is the composite key and Z is a non-key attribute. Then X → Z, if it holds, is a partial functional dependency in an RDBMS.
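Partial dependencies can be found by taking the closure of every proper subset of the composite key. Any non-key attribute that appears in such a closure is partially dependent on the key. The Python sketch below borrows the TEACHER schema from the 2NF example (key {TEACHER_ID, SUBJECT}, with TEACHER_ID → TEACHER_AGE). The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes derivable from `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_key, fds):
    """List (subset, attribute) pairs where a proper subset of `key` determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            found.extend((subset, attr) for attr in non_key if attr in determined)
    return found


fds = [(["TEACHER_ID"], ["TEACHER_AGE"])]
print(partial_dependencies(["TEACHER_ID", "SUBJECT"], ["TEACHER_AGE"], fds))
```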
Functional dependencies have numerous applications in database management systems. Some of them are listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize redundancy and increase data integrity. Functional dependencies play an important part in data normalization. With their help we can identify the candidate keys of a table and choose a primary key from them, which in turn helps in normalization.
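As a sketch of how functional dependencies identify candidate keys, the Python code below returns every minimal attribute set whose closure is the whole schema. It is brute force and exponential in the number of attributes, so it is only suitable for small schemas. The TEACHER schema and TEACHER_ID → TEACHER_AGE come from the 2NF example. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes derivable from `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # a superset of a known key is a superkey, not a candidate key
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys


schema = ["TEACHER_ID", "SUBJECT", "TEACHER_AGE"]
fds = [(["TEACHER_ID"], ["TEACHER_AGE"])]
print(candidate_keys(schema, fds))  # [('TEACHER_ID', 'SUBJECT')]
```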
2. Query Optimization
With the help of functional dependencies we can decide how tables are connected and which attributes need to be projected to retrieve the required data from the tables. This can help query optimization and improve performance.
3. Consistency of Data
Functional dependencies help ensure the consistency of the data: a schema designed around them removes redundancies and the inconsistencies those redundancies can cause. Enforcing functional dependencies ensures that a change made to one attribute does not leave another set of attributes inconsistent, which maintains the consistency of the data in the database.
4. Data Quality Improvement
Functional dependencies help keep the data in the database accurate, complete and up to date. This improves the overall quality of the data and reduces the errors and inaccuracies that might occur during data analysis and decision making. In this way, functional dependencies help improve the quality of data in the database.
First Normal Form (1NF)
o A relation is in 1NF if every attribute holds only atomic (single) values; an attribute of a table cannot hold multiple values.
o First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
12 Sam 7390372389, 8589830302 Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
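The 1NF conversion above amounts to emitting one row per value of the multi-valued attribute. The Python sketch below holds the phone numbers in a list per employee, which is an illustrative representation of the non-atomic column. The function name `to_1nf` is hypothetical.

```python
# Non-1NF rows: the third field holds several phone numbers at once.
employees = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so every column holds a single atomic value."""
    flat = []
    for emp_id, name, phones, state in rows:
        for phone in phones:
            flat.append((emp_id, name, phone, state))
    return flat


for row in to_1nf(employees):
    print(row)
```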
Second Normal Form (2NF)
o A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
Example: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key. That's why the table violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
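The 2NF decomposition can be checked with plain Python sets. The sketch below projects the TEACHER rows into the two tables above, which stores each teacher's age once. It then joins them back on TEACHER_ID and confirms that the original five rows are recovered. Set-based projection is an illustrative choice, not a DBMS API.

```python
rows = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# Projection onto (TEACHER_ID, TEACHER_AGE): set semantics drop the repeated ages.
teacher_detail = {(tid, age) for tid, _subject, age in rows}
# Projection onto (TEACHER_ID, SUBJECT).
teacher_subject = {(tid, subject) for tid, subject, _age in rows}

# Natural join on TEACHER_ID rebuilds the original rows.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_detail
    if tid == tid2
}

print(len(teacher_detail))    # 3: each teacher's age is now stored once
print(rejoined == set(rows))  # True: the decomposition is lossless
```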
Third Normal Form (3NF)
o A relation will be in 3NF if it is in 2NF and does not contain any transitive dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then it is in third normal form.
A relation is in third normal form if at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e. each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on the super key (EMP_ID), which violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key. The EMPLOYEE table keeps EMP_ZIP so that the two tables can be joined back.
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
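The 3NF rule can be applied mechanically. The Python sketch below compares each listed dependency X → Y against the two conditions stated earlier in this section: X is a super key, or Y is prime. It checks only the listed dependencies, the usual textbook shortcut when testing a schema against its given FDs. The schema (EMP_ID, EMP_ZIP, EMP_STATE, EMP_CITY) and its two dependencies follow the example. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes derivable from `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys


def third_nf_violations(schema, fds):
    """Return (X, attr) pairs where X is not a super key and attr is not prime."""
    prime = {a for key in candidate_keys(schema, fds) for a in key}
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(schema)
        for attr in rhs:
            if attr in lhs:
                continue  # trivial part of the dependency
            if not is_superkey and attr not in prime:
                violations.append((tuple(lhs), attr))
    return violations


schema = ["EMP_ID", "EMP_ZIP", "EMP_STATE", "EMP_CITY"]
fds = [(["EMP_ID"], ["EMP_ZIP"]), (["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"])]
print(third_nf_violations(schema, fds))
# [(('EMP_ZIP',), 'EMP_STATE'), (('EMP_ZIP',), 'EMP_CITY')]
```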
Boyce Codd Normal Form (BCNF)
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key; the candidate key is {EMP_ID, EMP_DEPT}.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now the decomposition is in BCNF because the left-hand side of each functional dependency is a key of its table.
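The BCNF rule can be checked with attribute closures: every non-trivial dependency's left side must determine the whole table. The Python sketch below uses a simplified, hypothetical schema (EMP_ID, EMP_COUNTRY, EMP_DEPT) with only EMP_ID → EMP_COUNTRY from the example. For a sub-relation it passes only the dependencies that apply to it. A full test would project all implied dependencies onto each sub-relation.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes derivable from `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super key of `schema`."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= set(schema):
            bad.append((tuple(lhs), tuple(rhs)))
    return bad


fds = [(["EMP_ID"], ["EMP_COUNTRY"])]
print(bcnf_violations(["EMP_ID", "EMP_COUNTRY", "EMP_DEPT"], fds))  # [(('EMP_ID',), ('EMP_COUNTRY',))]
print(bcnf_violations(["EMP_ID", "EMP_COUNTRY"], fds))  # []
```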
Fourth Normal Form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
o For a dependency A →→ B, if a single value of A is associated with multiple values of B, independently of the other attributes, then A →→ B is a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities: there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data. Strictly, for STU_ID →→ COURSE to hold, each of student 21's courses must appear with each of that student's hobbies, which gives four rows for STU_ID 21.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
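A multi-valued dependency X →→ Y can be tested on a table instance. For any two rows that agree on X, the row that takes X and Y from the first and the remaining attributes from the second must also be present. The Python sketch below runs this check on the STUDENT rows as printed. It also runs the check on a hypothetical completed version in which student 21's two courses are paired with both hobbies. The function name `mvd_holds` is illustrative.

```python
from itertools import product


def mvd_holds(rows, x, y, cols):
    """Check X ->-> Y on a relation instance given as a set of tuples over `cols`."""
    idx = {c: i for i, c in enumerate(cols)}
    rel = set(rows)
    for t1, t2 in product(rel, repeat=2):
        if all(t1[idx[a]] == t2[idx[a]] for a in x):
            # Take X and Y from t1 and every other attribute from t2.
            t3 = tuple(t1[idx[c]] if c in x or c in y else t2[idx[c]] for c in cols)
            if t3 not in rel:
                return False
    return True


cols = ["STU_ID", "COURSE", "HOBBY"]
as_printed = {
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
}
# Hypothetical completed version: both of student 21's courses paired with both hobbies.
completed = as_printed | {(21, "Computer", "Singing"), (21, "Math", "Dancing")}

print(mvd_holds(as_printed, ["STU_ID"], ["COURSE"], cols))  # False
print(mvd_holds(completed, ["STU_ID"], ["COURSE"], cols))  # True
```

Once the dependency holds, splitting into STUDENT_COURSE and STUDENT_HOBBY stores each course and each hobby of a student only once.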
Fifth Normal Form (5NF)
o A relation is in 5NF if it is in 4NF and does not contain any join dependency (other than those implied by its candidate keys), and joining should be lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
Example
Suppose we add a new semester, Semester 3, but do not yet know which subject will be taught or who will teach it, so we would have to leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary key, so those two columns cannot be left blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Properties of Decomposition
Lossless: Every decomposition that we perform in a database management system should be lossless. No information should be lost when the sub-relations are joined to get back the original relation. It helps to remove redundant data from the database.
Dependency Preservation: Every functional dependency of the original relation should still be enforceable after decomposition. It should either hold within a single sub-relation or be derivable from the dependencies that hold within the sub-relations. This lets constraints be checked without joins, which helps database efficiency and maintains consistency and integrity.
Lack of Data Redundancy: Data redundancy means duplicate or repeated data. This property states that the decomposition should not suffer from redundant data, which helps us get rid of unwanted duplicates and keep only the useful data.