Join Dependency in Database

Join Dependency means re-creating the original Table by joining multiple sub-tables of the given Table. It is a further generalization of MVD(multi-valued Dependencies).

When a relation R can be obtained by joining the R1, R2, R3..., Rn where R1, R2, R3..., Rn are sub-relations of R, it is called a Join Dependency.
R1,R2,R3...Rn are the sub-relations composed or derived from the relation R.

Mathematical Representation of Join Dependency

R=(R1 ⨝ R2 ⨝ R3 ⨝ .........Rn) where R1,R2,R3.....Rn are sub-relation of R and ⨝ is Natural Join Operator.

Natural join is used to join the table based on common columns present in both columns.
when all the sub-relations do the natural join and the obtained table is equal to the original table then it is called a non-loss decomposition (Loss-less decomposition).
Join dependency is used for the loss-less decomposition of the Data.

Example

Let's take an example where we have a table R as given below. It has 3 attributes I. e. X, Y, and Z as shown below.

Table R

X	Y	Z
x1	y1	z2
x1	y2	z1
x2	y1	z1
x1	y1	z1

So in the above table, all three attributes are a part of the Primary key. Now let us decompose the given table into R1 which has 2 columns i.e. X and Y and R2 which has 2 columns i.e. Y and Z.

Table R1

X	Y
x1	y1
x1	y2
x2	y1

After composing table R1, the repeated column I.e. (x1, y1) is removed from table R1 resulting in no redundancy in table R1.

All three columns present in R1 indicate all values from the R without repeating the redundant tuples.

Table R2

Y	Z
y1	z2
y2	z1
y1	z1

Similarly, in R2 all values of attributes Y and Z are represented in Table R2 and repeated tuple i.e. (y1, z1) is shown only once in the table. This results in no redundancy in table R2.

Now let's combine R1 and R2 tables using Natural Join.

Table (R1 ⨝ R2)

X	Y	Z
x1	y1	z2
x1	y1	z1
x1	y2	z1
x2	y1	z2
x2	y1	z1

After composing the R1 and R2 which are made by R only, we noticed that the newly formed table has one extra tuple i.e. 4th entry in the above table (x2 y1 z2) which is not present in the original Table. So this indicates the inconsistency in the data.

This results in the lossless decomposition of the table. This happened because both R1 and R2 have column Y as a common column but if we see the data in column Y, they are not unique. That's why after combining both R1 and R2 we got unnecessary tuples which are also known as Spurious Tuple.

To avoid this type of spurious tuple, we have to create another table R3 which will include columns X and Z.

Table R3

X	Z
x1	z2
x1	z1
x2	z1

Here in table R3, the tuples in the original table which can be repeated in the R3 table are avoided and shown only once.

Now if we apply composition(i.e. Natural Join ) on tables R1 , R2, and R3 we will get

Table (R1 ⨝ R2⨝ R3)

X	Y	Z
x1	y1	z1
x1	y1	z2
x1	y2	z1
x2	y1	z1

The original table and the above-obtained table have the same data. The additional tuple which was created due to the natural join of R1 and R2 is removed in the natural join of R1, R2, and R3.

In the above example,

(R1⨝R2⨝R3)=R holds true.

This is how decomposition can be performed in the table and join dependency can be checked.

Use of Join Dependency

Join dependency helped in reducing the redundant data.
It ensures data integrity and consistency as no spurious tuples get generated
It makes the databases normalized
It is used in the 5th normalization form to maintain the data consistency and normalization in the database.

Limitations of Join Dependency

Complexity: The database becomes more and more complex as we have to create multiple sub-tables/sub-relations which leads to an increase in the number of tables/relations
Cost: For large databases, the decomposition can be quite costly
Reduced Performance: Due to the large number of tables, query performance will be reduced as it will take time to traverse through multiple tables/relations present in the database.