
Normalisation Formated Unit 4

Uploaded by rudraghankute07
Copyright © All Rights Reserved

Advance Database

BCA III SEM SECOND YEAR

SEMESTER - III MODULE - 4

INTRODUCTION TO DBMS
MIT-ADT UNIVERSITY

Compiled by

Dr Ashwin Tomar
[MCA, PhD (Computer Science), MBA]

For MSc(CA) Students Page 1



SYLLABUS
Module 3. (5 Hrs) Entity-Relationship model: Basic concepts, Design
process, constraints, Keys, Design issues, E-R diagrams, weak entity
sets, extended E-R features – generalization, specialization,
aggregation, reduction of E-R diagrams to database schemas

Module 4. (5 Hrs) Relational Database design: Functional Dependency
– definition, trivial and non-trivial FD, closure of FD set, closure of
attributes, irreducible set of FD, Normalization – 1NF, 2NF, 3NF,
Decomposition using FD – dependency preservation, BCNF,
Multi-valued dependency, 4NF, Join dependency and 5NF

Relational Database design 3 Hours

4.1 Database Design – ER to Relational

4.2 Functional dependencies

4.3 Normalization: normal forms based on primary keys

(1NF, 2NF, 3NF, BCNF, 4NF, 5NF)

4.4 Lossless joins and dependency-preserving decomposition


4.1 Reduction of ER diagram to Table


A database designed with ER notation can be reduced to a collection of tables.

In the database, every entity set or relationship set can be represented in tabular form.

The ER diagram is given below:

The following rules are used to convert an ER diagram into tables:

o Entity type becomes a table.

In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE form individual
tables.

o Every single-valued attribute becomes a column of the table.

In the STUDENT entity, STUDENT_NAME and STUDENT_ID form the columns of the
STUDENT table. Similarly, COURSE_NAME and COURSE_ID form the columns of the
COURSE table, and so on.


o The key attribute of an entity type is represented by the primary key.

In the given ER diagram, COURSE_ID, STUDENT_ID, SUBJECT_ID, and LECTURE_ID
are the key attributes of their entities.

o The multivalued attribute is represented by a separate table.

In the STUDENT entity, HOBBY is a multivalued attribute, so its multiple values cannot be
stored in a single column of the STUDENT table. Hence we create a separate table
STUD_HOBBY with the columns STUDENT_ID and HOBBY, and use both columns together
as a composite key.

o A composite attribute is represented by its components.

In the given ER diagram, the student address is a composite attribute made up of CITY, PIN,
DOOR#, STREET, and STATE. In the STUDENT table, each of these components becomes an
individual column.

o Derived attributes are not considered in the table.

In the STUDENT table, Age is a derived attribute: it can be calculated at any time as the
difference between the current date and the Date of Birth.

Using these rules, you can convert the ER diagram to tables and columns and assign the
mapping between the tables. Table structure for the given ER diagram is as below:


Figure: Table structure
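The mapping rules above can be sketched with Python's standard `sqlite3` module. Because the original ER diagram is not reproduced here, the column names DOB, DOOR_NO and PIN, and the sample student row, are assumptions chosen for illustration. The sketch shows the key attribute becoming a primary key, the composite address split into columns, the multivalued HOBBY moved to STUD_HOBBY with a composite key, and the derived AGE computed on demand rather than stored.

```python
import sqlite3

# Hypothetical schema: column names and sample data are illustrative assumptions.
DDL = """
CREATE TABLE STUDENT (
    STUDENT_ID   INTEGER PRIMARY KEY,  -- key attribute becomes the primary key
    STUDENT_NAME TEXT NOT NULL,        -- single-valued attribute becomes a column
    DOB          TEXT NOT NULL,        -- stored; the derived attribute AGE is not
    DOOR_NO      TEXT,                 -- composite ADDRESS is split into
    STREET       TEXT,                 -- one column per component
    CITY         TEXT,
    STATE        TEXT,
    PIN          TEXT
);
CREATE TABLE STUD_HOBBY (              -- multivalued HOBBY gets its own table
    STUDENT_ID INTEGER REFERENCES STUDENT(STUDENT_ID),
    HOBBY      TEXT,
    PRIMARY KEY (STUDENT_ID, HOBBY)    -- composite key made of both columns
);
"""


def build(conn: sqlite3.Connection) -> None:
    """Create the tables and insert one hypothetical student with two hobbies."""
    conn.executescript(DDL)
    conn.execute(
        "INSERT INTO STUDENT VALUES "
        "(1, 'Asha', '2003-05-10', '12', 'MG Road', 'Pune', 'MH', '411001')"
    )
    conn.executemany("INSERT INTO STUD_HOBBY VALUES (?, ?)", [(1, "Chess"), (1, "Cricket")])


def age_on(conn: sqlite3.Connection, student_id: int, on_date: str) -> int:
    """Derive an approximate AGE in whole years from DOB instead of storing it."""
    (age,) = conn.execute(
        "SELECT CAST((julianday(?) - julianday(DOB)) / 365.25 AS INTEGER) "
        "FROM STUDENT WHERE STUDENT_ID = ?",
        (on_date, student_id),
    ).fetchone()
    return age


conn = sqlite3.connect(":memory:")
build(conn)
print(age_on(conn, 1, "2024-01-01"))  # 20
```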

4.2 Functional Dependency


The functional dependency is a relationship that exists between two attributes (or sets of
attributes). It typically exists between the primary key and a non-key attribute within a table.

X → Y

Determinant (left side) → Dependent (right side)

The dependent is functionally dependent on the determinant.

Notation: we say “X functionally determines Y”.

We can also say that “Y is functionally determined by X”, i.e. “Y is FD on X”.

The left side of an FD is known as the determinant, and the right side is known
as the dependent.

Functional Dependency (FD): A functional dependency is an association between two attributes
of the same table/relation. One is called the determinant and the other the dependent.

An attribute is said to be functionally dependent on another attribute in the table if it can
take only one value for a given value of the attribute on which it is functionally
dependent.

Supplier Table (SP)

Sno Sname Status City
S1 Ajay 11 Delhi
S2 Mayur 12 Gurgaon
S3 Rahul 13 Delhi

Each value of the determinant is associated with one and only one value of the dependent, so
after combining all the dependencies one can write

SP.Sno → SP.(Sname, Status, City)
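The definition above can be checked against a concrete table. The sketch below (the function name `fd_holds` is hypothetical) records the dependent value seen for each determinant value and reports a violation as soon as one determinant value maps to two different dependent values. Since an FD is a rule about every legal state of the relation, a sample table can only refute an FD; `True` means the data is consistent with it, not that it is guaranteed.

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs is satisfied by this particular table instance."""
    seen: dict = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Same determinant value seen earlier with a different dependent value: violation.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Supplier table SP from the text.
SP = [
    {"Sno": "S1", "Sname": "Ajay", "Status": 11, "City": "Delhi"},
    {"Sno": "S2", "Sname": "Mayur", "Status": 12, "City": "Gurgaon"},
    {"Sno": "S3", "Sname": "Rahul", "Status": 13, "City": "Delhi"},
]

print(fd_holds(SP, ["Sno"], ["Sname", "Status", "City"]))  # True
print(fd_holds(SP, ["City"], ["Sname"]))                   # False: Delhi -> Ajay and Rahul
```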

One can draw Dependency diagram or FD diagram based on the relation.

Fully Functional Dependency (FFD): Y is fully functionally dependent on X if Y is functionally
dependent on X and not on any proper subset of X.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:


Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Types of Functional dependency

1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o The following dependencies are also trivial: A → A, B → B.
Example:

Consider a table with two columns Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.

Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.

2. Non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name,
Name → DOB
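The three cases above reduce to two set tests, sketched below with Python sets (the function name `classify_fd` is hypothetical): B ⊆ A means trivial, A ∩ B = ∅ means completely non-trivial, and anything else is non-trivial.

```python
def classify_fd(lhs: set, rhs: set) -> str:
    """Classify the FD lhs -> rhs using the definitions in the text."""
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if not (lhs & rhs):
        return "completely non-trivial"  # A and B share no attribute
    return "non-trivial"                 # B has an attribute outside A, but overlaps A


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                  # completely non-trivial
print(classify_fd({"ID"}, {"ID", "Name"}))                            # non-trivial
```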


Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of
relations. It is also used to eliminate undesirable characteristics such as insertion,
update and deletion anomalies.
o Normalization divides larger tables into smaller tables and links them using
relationships.
o Normal forms are used to reduce redundancy in database tables.
o Normalisation is a two-step process:
i) data is put in tabular form by removing repeating groups;
ii) duplicate data is removed from the relational tables.

Normalisation is defined as the process of decomposing redundant
schemas by breaking up their attributes into smaller relation schemas
that possess desirable properties. The method of splitting is known as
projection.

The goal is to have only keys (such as the primary key) on the left-hand side of every
non-trivial functional dependency.

o Decomposition and rejoining of tables should be lossless.
o The entire process of normalisation is based on the analysis of relations, their
schemas, their primary keys and their functional dependencies.
o The process of normalisation was proposed by Dr E F Codd. A relation is said to
be in a particular normal form if it satisfies a specified set of constraints.
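Whether a split into two tables is lossless can be tested from the FDs alone: a decomposition of R into R1 and R2 is lossless if the shared attributes R1 ∩ R2 functionally determine all of R1 or all of R2. The sketch below applies that rule using an attribute closure helper. The FD TEACHER_ID → TEACHER_AGE is taken from the 2NF example later in this unit, and the function names are hypothetical.

```python
# FDs are (lhs, rhs) pairs of attribute sets.
FDS = [({"TEACHER_ID"}, {"TEACHER_AGE"})]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff the shared attributes determine all of R1 or all of R2."""
    shared = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared or set(r2) <= shared


print(is_lossless_binary({"TEACHER_ID", "TEACHER_AGE"}, {"TEACHER_ID", "SUBJECT"}, FDS))  # True
print(is_lossless_binary({"SUBJECT", "TEACHER_AGE"}, {"TEACHER_ID", "TEACHER_AGE"}, FDS))  # False
```

The second split shares only TEACHER_AGE, which determines nothing else, so joining its two tables back together could produce spurious rows.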

Objectives of Normalisation:

o i) To create a formal framework for analyzing relation schemas based on their
keys.
o ii) To obtain powerful relational retrieval algorithms based on a collection of
primitive relational operators.
o iii) To free relations from undesirable insertion, update and deletion anomalies.
o iv) To reduce the need for restructuring the relations as new data types are
introduced.


o v) To carry out a series of tests on individual relation schemas so that the
relational database can be normalised to some degree.

Types of Normal Forms

The commonly used normal forms are 1NF, 2NF, 3NF and BCNF, followed by the higher
normal forms 4NF and 5NF.

Un-Normalised Tables: Table entries that have more than one value are called
multivalued entries. Tables with multivalued entries are known as
un-normalised tables.


Normal Form: Description

1NF: A relation is in 1NF if every attribute contains only atomic values
(single, indivisible, no repeating groups).

2NF: A relation R is in 2NF if it is in 1NF and every non-key attribute A is
fully functionally dependent on the primary key.

3NF: A relation is in 3NF if it is in 2NF (every non-key attribute is fully
functionally dependent on the primary key) and no non-key attribute is
transitively dependent on the primary key, i.e. no non-prime attribute
functionally determines any other non-prime attribute.

BCNF: A relation is in BCNF if it is in 3NF and every determinant is a
candidate key.

4NF: A relation is in 4NF if it is in BCNF and has no non-trivial
multi-valued dependency (MVD).

5NF: A relation is in 5NF if it is in 4NF and contains no join dependency
(JD) that is not implied by its candidate keys, so that every decomposition
into projections rejoins losslessly.

All the normal forms above achieve non-loss decomposition by decomposing a
single table into two separate tables, which is possible because the
relational model provides a join operator. In 5NF, however, non-loss
decomposition may only be achievable by decomposing the table into three or
more separate tables, and such a decomposition is not always possible.


First Normal Form (1NF) ----Atomic value


o A relation is in 1NF if every attribute contains only atomic values.
o It states that an attribute of a table cannot hold multiple values; it must
hold only single values.
o First normal form disallows multi-valued attributes, composite
attributes, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued
attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab

Storing one phone number per row gives the EMPLOYEE relation in 1NF:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
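The 1NF conversion above amounts to "one row per value of the repeating group". As a minimal sketch, the un-normalised rows are modelled below with EMP_PHONE as a Python list, and the hypothetical helper `to_1nf` flattens them so each attribute holds a single atomic value.

```python
# Un-normalised EMPLOYEE rows: EMP_PHONE holds a list (a repeating group).
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per phone number so every attribute holds one atomic value."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


for row in to_1nf(employee):
    print(row)
```

This prints the five rows of the 1NF table in the same order as shown above.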


Second Normal Form (2NF) --- primary key


o For 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally
dependent on the primary key.

Example: A school can store the data of teachers and the subjects they teach.
In a school, a teacher can teach more than one subject.

TEACHER table:

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime
attribute TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of
that candidate key. This partial dependency violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
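The 2NF decomposition above can be checked by projecting the original TEACHER rows onto the two new tables and joining them back on TEACHER_ID. In this sketch, tables are modelled as Python sets of tuples (so projection drops duplicate rows, as relational projection does), and `natural_join` is a hypothetical helper written for these two tables only.

```python
# TEACHER rows as (TEACHER_ID, SUBJECT, TEACHER_AGE), taken from the example.
teacher = {
    (25, "Chemistry", 30), (25, "Biology", 30), (47, "English", 35),
    (83, "Math", 38), (83, "Computer", 38),
}

# Projections; set comprehensions remove duplicate rows.
teacher_detail = {(tid, age) for tid, _subject, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _age in teacher}


def natural_join(detail, subject):
    """Join on TEACHER_ID to rebuild (TEACHER_ID, SUBJECT, TEACHER_AGE) rows."""
    return {
        (tid, subj, age)
        for tid, age in detail
        for tid2, subj in subject
        if tid == tid2
    }


print(len(teacher_detail))                                       # 3: each age stored once
print(natural_join(teacher_detail, teacher_subject) == teacher)  # True: lossless
```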


Third Normal Form (3NF) -- Transitive Dependency


o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
o 3NF is used to reduce data duplication. It is also used to achieve
data integrity.
o If a relation in 2NF has no transitive dependency of non-prime attributes on
the key, then it is in third normal form.

A relation is in third normal form if at least one of the following conditions
holds for every non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate
key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

o {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
o Candidate key: {EMP_ID}
o Non-prime attributes: in the given table, all attributes except EMP_ID are
non-prime.
o Here, EMP_STATE and EMP_CITY are dependent on EMP_ZIP, and EMP_ZIP is
dependent on EMP_ID.

The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID). This violates the rule of third normal form.


That's why we need to move EMP_CITY and EMP_STATE to a new
<EMPLOYEE_ZIP> table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
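The two 3NF conditions stated above can be turned into a mechanical test. The sketch below finds the candidate keys by brute force over attribute subsets (fine for small textbook schemas, exponential in general), collects the prime attributes, and reports every FD X → Y where X is not a super key and the non-trivial part of Y is not entirely prime. The FDs are read off the EMPLOYEE_DETAIL example; the function names are hypothetical, and the decomposed tables are checked only against the FDs that lie entirely inside them.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> Y where X is not a super key and Y is not entirely prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the trivial part of an FD never violates 3NF
        if extra and not closure(lhs, fds) >= schema and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


EMPLOYEE_DETAIL = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
FDS = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]

print(violations_3nf(EMPLOYEE_DETAIL, FDS))                           # the EMP_ZIP dependency
print(violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, FDS[:1]))     # []
print(violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, FDS[1:]))  # []
```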


Different Keys Used:


Primary Key:
It is a candidate key that is chosen by the database designer to identify entities within an
entity set.

E.g.: In the ER diagram, the primary key is represented by underlining the primary key attribute.
Ideally a primary key is composed of only a single attribute, but it is possible to have a
primary key composed of more than one attribute.

Candidate Keys
Candidate Keys are super keys for which no proper subset is a super key. In other words
candidate keys are minimal super keys.

Super Keys: Super key stands for superset of a key.


A Super Key is a set of one or more attributes that are taken collectively and can identify all
other attributes uniquely.

Composite Key:
A composite key consists of more than one attribute.

Example: Consider a relation (table) R with attributes A, B, C, D, E.

R(A,B,C,D,E)
A→BCDE This means the attribute A uniquely determines the other attributes B, C, D, E.
BC→ADE This means the attributes B and C jointly determine all the other attributes A, D, E
in the relation.
Figure: Composite key, super key, candidate key and primary key
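For the relation R(A, B, C, D, E) with A → BCDE and BC → ADE, the super keys and candidate keys can be enumerated mechanically. The sketch below treats an attribute set as a super key when its closure is all of R, and as a candidate key when no proper subset is also a super key. Single-letter attribute names are encoded as Python strings, so `set("BC")` means {B, C}.

```python
from itertools import combinations

R = set("ABCDE")
FDS = [(set("A"), set("BCDE")), (set("BC"), set("ADE"))]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# A super key is any attribute set whose closure is the whole relation.
super_keys = [
    set(combo)
    for size in range(1, len(R) + 1)
    for combo in combinations(sorted(R), size)
    if closure(combo, FDS) >= R
]
# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [k for k in super_keys if not any(other < k for other in super_keys)]

print(len(super_keys))                                     # 20
print(sorted("".join(sorted(k)) for k in candidate_keys))  # ['A', 'BC']
```

The count follows directly: 16 subsets contain A, and 4 more contain both B and C but not A. Both A and BC are candidate keys, and BC is also a composite key.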


Examples:
Super key
Super Key in DBMS: A super key is a set of one or more attributes (columns), which can
uniquely identify a row in a table.

How candidate key is different from super key?


Candidate keys are selected from the set of super keys; the only thing we take care of while
selecting a candidate key is that it should not have any redundant attribute. That's the reason
they are also termed minimal super keys.

Table: Employee

Emp_SSN Emp_Number Emp_Name


123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert

Super keys: The above table has the following super keys. Each of these sets is able to
uniquely identify a row of the Employee table.

o {Emp_SSN}
o {Emp_Number}
o {Emp_SSN, Emp_Number}
o {Emp_SSN, Emp_Name}
o {Emp_SSN, Emp_Number, Emp_Name}
o {Emp_Number, Emp_Name}

Candidate Keys: A candidate key is a minimal super key with no redundant attributes. The
following two sets are chosen from the super keys above because they contain no redundant
attributes.

o {Emp_SSN}
o {Emp_Number}

Only these two sets are candidate keys as all other sets are having redundant attributes that
are not necessary for unique identification.


Super key vs Candidate Key


1. First you have to understand that all the candidate keys are super keys. This is because the
candidate keys are chosen out of the super keys.

2. How we choose candidate keys from the set of super keys?

In the above example, we have not chosen {Emp_SSN, Emp_Name} as candidate key
because {Emp_SSN} alone can identify a unique row in the table and Emp_Name is
redundant.

Primary key:
A Primary key is selected from a set of candidate keys. This is done by database admin or
database designer. We can say that either {Emp_SSN} or {Emp_Number} can be chosen as a
primary key for the table Employee.

Candidate Key in DBMS


Definition of Candidate Key in DBMS: A super key with no redundant attribute is known
as a candidate key. Candidate keys are selected from the set of super keys; the only thing we
take care of while selecting a candidate key is that it should not have any
redundant attributes. That's the reason they are also termed minimal super keys.

Candidate Key Example


Table “Employee” has three attributes: Emp_Id, Emp_Number & Emp_Name.

Here Emp_Id and Emp_Number have unique values, while Emp_Name can have
duplicate values, as more than one employee can have the same name.

Emp_Id Emp_Number Emp_Name


------------------------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

How many super keys can the above table have?


1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}

Let's select the candidate keys from the above set of super keys.


1. {Emp_Id} – No redundant attributes


2. {Emp_Number} – No redundant attributes
3. {Emp_Id, Emp_Number} – Redundant attribute. Either of those attributes can be a
minimal super key as both of these columns have unique values.
4. {Emp_Id, Emp_Name} – Redundant attribute Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name} – Redundant attributes. Emp_Id or Emp_Number
alone is sufficient to uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name} – Redundant attribute Emp_Name.

The candidate keys we have selected are:


{Emp_Id}
{Emp_Number}

Note: A primary key is selected from the set of candidate keys. That means we can either
have Emp_Id or Emp_Number as primary key. The decision is made by DBA (Database
administrator)

Foreign key in DBMS


Definition: Foreign keys are the columns of a table that point to the primary key of another
table. They act as a cross-reference between tables.

For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key, as it
points to the primary key of the Student table.

Course_enrollment table:

Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102

Student table:

Stu_Id Stu_Name Stu_Age


101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
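A foreign key is also something the DBMS can enforce. The sketch below uses Python's `sqlite3` module to build the Student and Course_enrollment tables from the example and shows that an enrollment pointing to a non-existent Stu_Id is rejected. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; the helper `try_enroll` and the extra course IDs C08 and C09 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Student (
    Stu_Id   INTEGER PRIMARY KEY,
    Stu_Name TEXT,
    Stu_Age  INTEGER
);
CREATE TABLE Course_enrollment (
    Course_Id TEXT,
    Stu_Id    INTEGER REFERENCES Student(Stu_Id)  -- the foreign key
);
""")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)
conn.executemany(
    "INSERT INTO Course_enrollment VALUES (?, ?)",
    [("C01", 101), ("C02", 102), ("C03", 101), ("C05", 102), ("C06", 103), ("C07", 102)],
)


def try_enroll(course_id: str, stu_id: int) -> bool:
    """Insert an enrollment; return False if the foreign key check rejects it."""
    try:
        conn.execute("INSERT INTO Course_enrollment VALUES (?, ?)", (course_id, stu_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_enroll("C08", 104))  # True: student 104 exists
print(try_enroll("C09", 999))  # False: no student 999
```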


Note: In practice, a foreign key does not have to reference the column tagged as the primary
key of another table; if it points to a unique column (not necessarily the primary key) of
another table, it is still a foreign key. So a more accurate definition is: foreign keys are the
columns of a table that point to a candidate key of another table.

Composite key in DBMS


Definition of Composite key: A key that has more than one attribute is known as a composite
key. It is also known as a compound key.

Note: Any key, such as a super key, primary key or candidate key, can be called a composite
key if it has more than one attribute.

Composite key Example


Consider a table Sales. This table has four columns (attributes) – cust_Id, order_Id,
product_code & product_count.

Table – Sales

cust_Id order_Id product_code product_count


-----------------------------------------
C01 O001 P007 23
C02 O123 P007 19
C02 O123 P230 82
C01 O001 P890 42

None of these columns alone can play a role of key in this table.

Column cust_Id alone cannot become a key, as the same customer can place multiple orders,
so the same customer can have multiple entries.

Column order_Id alone cannot be a primary key, as the same order can contain
multiple products, so the same order_Id can be present multiple times.

Column product_code cannot be a primary key, as more than one customer can place an order
for the same product.

Column product_count alone cannot be a primary key, because two orders can be placed for
the same product count.

Based on this, it is safe to assume that the key must have more than one attribute:
Key in the above table: {cust_Id, product_code}

This is a composite key, as it is made up of more than one attribute.
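The reasoning above can be partly confirmed on the sample rows: the sketch below (helper name `unique_in_sample` is hypothetical) checks whether a column or column combination has any repeated values in the Sales table. A sample can only rule a key out, never prove one. For example, product_count happens to be unique in these four rows, yet the text correctly excludes it on semantic grounds because two orders can share a count.

```python
COLUMNS = ("cust_Id", "order_Id", "product_code", "product_count")
SALES = [
    ("C01", "O001", "P007", 23),
    ("C02", "O123", "P007", 19),
    ("C02", "O123", "P230", 82),
    ("C01", "O001", "P890", 42),
]


def unique_in_sample(rows, cols):
    """True if no two rows share the same values for cols in this sample."""
    positions = [COLUMNS.index(c) for c in cols]
    values = [tuple(row[p] for p in positions) for row in rows]
    return len(values) == len(set(values))


for col in ("cust_Id", "order_Id", "product_code"):
    print(col, unique_in_sample(SALES, [col]))               # each prints False
print(unique_in_sample(SALES, ["cust_Id", "product_code"]))  # True
```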


Alternate/Secondary key in DBMS


Among the candidate keys, only one key gets selected as the primary key; the remaining keys
are known as alternate or secondary keys.

Alternate Key Example

Table Employee has three attributes: Emp_Id, Emp_Number & Emp_Name.

Table: Employee

Emp_Id Emp_Number Emp_Name


------------------------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

There are two candidate keys in the above table:


{Emp_Id}
{Emp_Number}

The DBA (Database administrator) can choose any of the above keys as the primary key. Let's say
Emp_Id is chosen as the primary key.

Since we have selected Emp_Id as the primary key, the remaining key Emp_Number is
called the alternate or secondary key.


Boyce Codd normal form (BCNF) --- Candidate key


o BCNF is the advanced version of 3NF. It is stricter than 3NF.
o A relation is in BCNF if it is in 3NF and, for every functional dependency, the
left-hand side (the determinant) is a candidate key.

Example:

Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a
key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK


EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies
is a key.
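The BCNF test above ("every determinant must be a key") can be sketched as: for each non-trivial FD X → Y, compute the closure of X and flag the FD if that closure does not cover the whole schema. As a simplifying assumption, each decomposed table is checked only against the FDs that lie entirely inside it (which is true for this example), rather than computing a general projection of the FD set. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs X -> Y whose determinant X is not a super key of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


EMPLOYEE = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
FD1 = ({"EMP_ID"}, {"EMP_COUNTRY"})
FD2 = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

print(len(bcnf_violations(EMPLOYEE, [FD1, FD2])))                        # 2
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [FD1]))                 # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [FD2]))  # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                       # []
```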


Fourth normal form (4NF) ---MVD


o A relation is in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.
o A multi-valued dependency A →→ B exists if, for a single value of A, multiple values
of B exist independently of the remaining attributes.

Example

STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on
STU_ID, which leads to unnecessary repetition of data.


So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey


Fifth normal form (5NF) --JD


o A relation is in 5NF if it is in 4NF, contains no join dependency that is
not implied by its candidate keys, and its decompositions rejoin losslessly.
o 5NF is satisfied when the tables are broken into as many tables as
possible in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but
he doesn't take the Math class for Semester 2. In this case, the combination of all these
fields is required to identify valid data.

Suppose we add a new Semester 3 but do not yet know the subject or who will
teach it, so we would like to leave LECTURER and SUBJECT as NULL. But all three
columns together act as the primary key, so we cannot leave the other two
columns blank.

So to make the above table into 5NF, we can decompose it into three relations
P1, P2 & P3:


P1

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

P2

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
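The point of the 5NF decomposition is that no two of P1, P2 and P3 can rebuild the original table, but all three together can. The sketch below models each table as a Python set of tuples, joins P1 and P2 on SUBJECT (which yields two spurious rows: Math/John/Semester 2 and Math/Akash/Semester 1), and then joins with P3 on (SEMESTER, LECTURER), which removes exactly those rows.

```python
# Original rows as (SUBJECT, LECTURER, SEMESTER).
teaching = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, subj) for subj, _lect, sem in teaching}   # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, _sem in teaching}  # SUBJECT, LECTURER
p3 = {(sem, lect) for _subj, lect, sem in teaching}   # SEMESTER, LECTURER

# Join P1 and P2 on SUBJECT.
p1_p2 = {(subj, lect, sem) for sem, subj in p1 for subj2, lect in p2 if subj == subj2}
# Join the result with P3 on (SEMESTER, LECTURER).
p1_p2_p3 = {row for row in p1_p2 if (row[2], row[1]) in p3}

print(len(p1_p2))            # 7: two spurious rows
print(p1_p2_p3 == teaching)  # True: the three-way join is lossless
```

Because the projections are sets, the duplicate "Semester 1 John" row collapses automatically, which is why P3 has four rows.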


Difference Between Super Key and Candidate Key

Two basic keys of any database are the super key and the candidate key. Every candidate key is
a super key, but a super key may or may not be a candidate key.

Comparison Chart (Basis for comparison: Super Key vs Candidate Key)

Basic
Super Key: A single attribute or a set of attributes that uniquely identifies all attributes in
a relation is a super key.
Candidate Key: A super key none of whose proper subsets is also a super key is a candidate key.

One in other
Super Key: It is not compulsory that all super keys will be candidate keys.
Candidate Key: All candidate keys are super keys.

Selection
Super Key: The set of super keys forms the base for the selection of candidate keys.
Candidate Key: The set of candidate keys forms the base for the selection of a single primary key.

Count
Super Key: There are comparatively more super keys in a relation.
Candidate Key: There are comparatively fewer candidate keys in a relation.

Definition of Super key

A super key is a basic key of any relation. It is defined as a key that can identify all other
attributes in a relation.

A super key can be a single attribute or a set of attributes. No two entities have the same
values for the attributes that compose a super key. Every relation has at least one super
key.

A minimal super key is also called a candidate key, so some of the super keys are
verified for being candidate keys.

Let us take a relation R (A, B, C, D, E, F); we have the following dependencies for relation R,
and we have checked each key for being a super key.


Using key, AB we are able to identify rest of the attributes of the table i.e. CDEF. Similarly,
using keys CD, ABD, DF, and DEF we can identify remaining attributes of the table R. So
all these are super keys.

But using the key CB we can only find values for attributes D and F; we cannot find the values
of attributes A and E. Hence, CB is not a super key. The same is the case with key D: we cannot
find the values of all attributes in the table using D, so D is not a super key.

Definition of Candidate Key

A super key none of whose proper subsets is a super key of the same relation is called a
minimal super key. The minimal super key is called a candidate key. Like a super key, a
candidate key also identifies each tuple in a table uniquely. A candidate key's attribute can
accept NULL value.

One of the candidate keys is chosen as the primary key by the DBA, provided that its attribute
values are unique and do not contain NULL. The attributes of a candidate key are called
prime attributes.

In above example, we have found the Super keys for relation R. Now, let us check all the
super keys for being Candidate key.

Super key AB is a proper subset of super key ABD. Since the smaller key AB alone is capable of
identifying all attributes in the table, we do not need the bigger key ABD. Hence, super key AB
is a candidate key, while ABD is only a super key.
Similarly, super key DF is a proper subset of super key DEF. Since DF alone is capable of
identifying all attributes in the relation, we do not need DEF. Hence, super key DF becomes a
candidate key, while DEF is only a super key.

The super key CD is not a proper subset of any other super key. So, we can say CD is a
minimal super key that identifies all attributes in a relation. Hence, CD is a candidate key.

Keys CB and D are not super keys, so they cannot be candidate keys either. Viewing the
above table, you can conclude that every candidate key is a super key, but the inverse is not
true.

Key Differences Between Super Key and Candidate Key


1. A single attribute or a set of attributes that can uniquely identify all attributes of a
particular relation is called a super key. On the other hand, a super key none of whose proper
subsets is also a super key is called a candidate key.
2. All candidate keys are super keys, but the inverse is not true.
3. The set of super keys is examined to find the candidate keys, whereas the set of candidate
keys is examined to select a single primary key.
4. Super keys are comparatively more numerous than candidate keys.

Conclusion:

The super key is the basic key of any relation. Super keys must be identified first, before
recognizing the other keys of the relation, as they form the base for those keys. Candidate keys
are important as they help in recognizing the most important key of any relation, the primary key.

1. Candidate Key: the columns (or sets of columns) in a table that qualify for uniqueness of all
the rows. E.g.: in the Employee table, Employee ID and SSN are candidate keys.

2. Primary Key: the column(s) you choose to maintain uniqueness in a table.

E.g.: in the Employee table you can choose either the Employee ID or the SSN column;
EmployeeID is the preferable choice, as SSN is a sensitive value.

3. Alternate Key: a candidate column other than the primary column; e.g. if EmployeeID is the
PK, then SSN is the alternate key.

4. Super Key: if you add any other column/attribute to a primary key, it becomes a super
key, e.g. EmployeeID + FullName is a super key.

5. Composite Key: if a table does not have a single column that qualifies as a candidate key,
then you have to select 2 or more columns to make a row unique. E.g. if there are no
Employee ID or SSN columns, you can make Full Name + Date Of Birth a composite
primary key, although there can still be a small chance of a duplicate row.

Functional Dependency and Attribute Closure


Functional Dependency


A functional dependency A->B holds in a relation if any two tuples having the same value of
attribute A also have the same value of attribute B. For example, in the relation STUDENT shown
in table 1, the functional dependencies
STUD_NO->STUD_NAME, STUD_NO->STUD_ADDR hold,
but
STUD_NAME->STUD_ADDR does not hold.

How to find functional dependencies for a relation?


Functional dependencies in a relation depend on the domain of the relation. Consider
the STUDENT relation given in Table 1.
o STUD_NO is unique for each student. So STUD_NO->STUD_NAME,
STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY
and STUD_NO->STUD_AGE will all be true.
o Similarly, STUD_STATE->STUD_COUNTRY will be true, as if two records have the same
STUD_STATE, they will have the same STUD_COUNTRY as well.
o For the relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true, as
two records with the same COURSE_NO will have the same COURSE_NAME.

Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set of
all FDs present in the relation.
For example, the FD set for the relation STUDENT shown in table 1 is:

{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }

Attribute Closure: The attribute closure of an attribute set is the set of all attributes
that can be functionally determined from it.

How to find attribute closure of an attribute set?


To find the attribute closure of an attribute set:
o Add the elements of the attribute set to the result set.
o Repeatedly add to the result set every attribute that can be functionally determined from
the elements already in the result set, until no new attribute can be added.
Using the FD set of Table 1, attribute closures can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
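The two steps above can be sketched in Python as follows. This is an illustrative sketch, not code from these notes; the name `closure` and the representation of an FD as a pair of attribute-name tuples are assumptions.

```python
from typing import Iterable, List, Set, Tuple

# An FD is a pair (lhs attributes, rhs attributes).
FD = Tuple[Tuple[str, ...], Tuple[str, ...]]

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)          # step 1: start with the attribute set itself
    changed = True
    while changed:               # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the result, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

STUDENT_FDS: List[FD] = [
    (("STUD_NO",), ("STUD_NAME",)),
    (("STUD_NO",), ("STUD_PHONE",)),
    (("STUD_NO",), ("STUD_STATE",)),
    (("STUD_NO",), ("STUD_COUNTRY",)),
    (("STUD_NO",), ("STUD_AGE",)),
    (("STUD_STATE",), ("STUD_COUNTRY",)),
]

print(sorted(closure(["STUD_STATE"], STUDENT_FDS)))
# ['STUD_COUNTRY', 'STUD_STATE']
```

The loop stops as soon as a full pass over the FD set adds nothing, which is exactly the "until no new attribute can be added" condition.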
How to find Candidate Keys and Super Keys using Attribute Closure?
o If the attribute closure of an attribute set contains all attributes of the relation, the attribute
set is a super key of the relation.
o If, in addition, no proper subset of this attribute set can functionally determine all attributes
of the relation, the set is a candidate key as well. For example, using the FD set of Table 1,
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE,
STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) is a super key but not a candidate key, because the closure of its subset
STUD_NO already contains all attributes of the relation. So STUD_NO is a candidate key.
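Both tests can be written directly on top of attribute closure. The sketch below is hypothetical (the helper names are not from these notes); for brevity, the five FDs with left side STUD_NO are merged into one FD using the union rule, which does not change any closure.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure attrs+ under the FD set fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_super_key(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    # A super key's closure covers every attribute of the relation.
    return closure(attrs, fds) >= schema

def is_candidate_key(attrs: Iterable[str], schema: Set[str], fds: List[FD]) -> bool:
    key = set(attrs)
    if not is_super_key(key, schema, fds):
        return False
    # Minimality: dropping any one attribute must lose the super-key property.
    # Checking single removals suffices, since any superset of a super key is a super key.
    return all(not is_super_key(key - {a}, schema, fds) for a in key)

STUDENT = {"STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
STUDENT_FDS: List[FD] = [
    (("STUD_NO",), ("STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE")),
    (("STUD_STATE",), ("STUD_COUNTRY",)),
]

print(is_super_key({"STUD_NO", "STUD_NAME"}, STUDENT, STUDENT_FDS))      # True
print(is_candidate_key({"STUD_NO", "STUD_NAME"}, STUDENT, STUDENT_FDS))  # False
print(is_candidate_key({"STUD_NO"}, STUDENT, STUDENT_FDS))               # True
```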

GATE Question:
Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional
dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K -> {M}, L -> {N}} on R.
What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Answer: Finding the attribute closure of each given option, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {EFHGIJKLMN}
{E}+ = {E}
Both {EFH}+ and {EFHKL}+ give the set of all attributes, but EFH is minimal, so it is the
candidate key. So the correct option is (B).
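The closures in this answer can be re-checked with a short script. This is a sketch, not part of the GATE solution; it assumes single-letter attribute names so that an attribute set can be written as a string such as "EFH".

```python
from typing import Iterable, List, Set, Tuple

def closure(attrs: Iterable[str], fds: List[Tuple[str, str]]) -> Set[str]:
    """Attribute closure; each FD is a pair of strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("EFGHIJKLMN")
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

for option in ["EF", "EFH", "EFHKL", "E"]:
    c = closure(option, FDS)
    print(option, "".join(sorted(c)), c == R)
# EF EFGIJ False
# EFH EFGHIJKLMN True
# EFHKL EFGHIJKLMN True
# E E False

# EFH is minimal: dropping any one attribute loses the key property.
print(all(closure(set("EFH") - {a}, FDS) != R for a in "EFH"))  # True
```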

How to check whether an FD can be derived from a given FD set?

To check whether an FD A->B can be derived from an FD set F:

1. Find (A)+ using FD set F.
2. If B is a subset of (A)+, then A->B holds; otherwise it does not.
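The two-step test above can be sketched as follows. The function name `implies` and the small FD set F = {A->B, B->C} are hypothetical, chosen only to illustrate the check.

```python
from typing import Iterable, List, Set, Tuple

def closure(attrs: Iterable[str], fds: List[Tuple[str, str]]) -> Set[str]:
    """Attribute closure; each FD is a pair of strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds: List[Tuple[str, str]], lhs: str, rhs: str) -> bool:
    """Step 1: compute lhs+; step 2: lhs -> rhs holds iff rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)

F = [("A", "B"), ("B", "C")]   # hypothetical FD set
print(implies(F, "A", "C"))    # True  (A+ = {A, B, C})
print(implies(F, "C", "A"))    # False (C+ = {C})
```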

GATE Question:

In a schema with attributes A, B, C, D and E, the following set of functional dependencies
is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}

Which of the following functional dependencies is NOT implied by the above set?
(GATE IT 2005)
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Answer: Using the FD set given in the question,
(CD)+ = {CDEAB}, which means CD -> AC holds.
(BD)+ = {BD}, which means BD -> CD cannot hold. So this FD is not implied by the FD set,
and (B) is the required option.
The other options can be checked in the same way: (BC)+ = {BCDEA} and (AC)+ = {ACBDE},
so BC -> CD and AC -> BC both hold.
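All four options can be checked in one pass with the closure-based test. This is a sketch for verification only; it assumes single-letter attribute names, and the option labels simply mirror A to D in the question.

```python
from typing import Iterable, List, Set, Tuple

def closure(attrs: Iterable[str], fds: List[Tuple[str, str]]) -> Set[str]:
    """Attribute closure; each FD is a pair of strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds: List[Tuple[str, str]], lhs: str, rhs: str) -> bool:
    return set(rhs) <= closure(lhs, fds)

FDS = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]
OPTIONS = {"A": ("CD", "AC"), "B": ("BD", "CD"), "C": ("BC", "CD"), "D": ("AC", "BC")}

for label, (lhs, rhs) in OPTIONS.items():
    print(label, f"{lhs} -> {rhs}", implies(FDS, lhs, rhs))
# A CD -> AC True
# B BD -> CD False
# C BC -> CD True
# D AC -> BC True
```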

Prime and non-prime attributes


Attributes that are part of any candidate key of a relation are called prime attributes; all others
are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute;
the others are non-prime attributes.
GATE Question: Consider a relation scheme R = (A, B, C, D, E, H) on which the
following functional dependencies hold: {A->B, BC->D, E->C, D->A}. What are the
candidate keys of R? [GATE 2005]
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Answer: (AE)+ = {ABECD}, which is not the set of all attributes (H is missing). So AE is not a
candidate key. Hence options (a) and (b) are wrong.
(AEH)+ = {ABCDEH}
(BEH)+ = {BEHCDA}
(BCH)+ = {BCHDA}, which is not the set of all attributes. So BCH is not a candidate key. Hence
option (c) is wrong.
(DEH)+ = {DEHABC}, so DEH is also a candidate key.
So the correct answer is (d).
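For small schemas, candidate keys (and hence prime attributes) can also be found by brute force: try attribute sets in order of increasing size and keep every set whose closure is the whole relation and that contains no smaller key. The sketch below is illustrative (the name `candidate_keys` is hypothetical) and takes exponential time, so it is only suitable for exercises like this one.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

def closure(attrs: Iterable[str], fds: List[Tuple[str, str]]) -> Set[str]:
    """Attribute closure; each FD is a pair of strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema: str, fds: List[Tuple[str, str]]) -> List[str]:
    """Brute-force search over attribute sets in order of increasing size."""
    keys: List[Set[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return ["".join(sorted(key)) for key in keys]

FDS = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
keys = candidate_keys("ABCDEH", FDS)
prime = set("".join(keys))
print(keys)                           # ['AEH', 'BEH', 'DEH']
print(sorted(prime))                  # ['A', 'B', 'D', 'E', 'H']  (prime attributes)
print(sorted(set("ABCDEH") - prime))  # ['C']                      (non-prime)
```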

Closure of a Set of Functional Dependencies

Assume that F is a set of functional dependencies for a relation R. The closure of F, denoted
by F+, is the set of all functional dependencies logically implied by F, i.e., F+ is the
set of FDs that can be derived from F.

Furthermore, F+ is the smallest set of FDs that contains F and is closed under the inference
axioms: no FD that can be derived from F by using the axioms lies outside F+. If we have
identified all the functional dependencies in a relation, then we can easily identify super keys,
candidate keys, and other determinants necessary for normalization.

Algorithm: To compute F+, the closure of a set of FDs.
Input: A relation with a set of FDs F.
Output: The closure F+ of F.

Step 1. Initialize F+ = F // F is the set of given FDs

Step 2. While (F+ changes) do

Step 3. For each functional dependency f in F+, apply the reflexivity and augmentation axioms
on f and add the resulting functional dependencies to F+.

Step 4. For each pair of functional dependencies f1 and f2 in F+, apply the transitivity axiom on
f1 and f2; if f1 and f2 can be combined, add the resulting FD to F+.
Step 5. For each functional dependency f in F+, apply the union and decomposition axioms on f
and add the resulting functional dependencies to F+.

Step 6. For each pair of functional dependencies f1 and f2 in F+, apply the pseudotransitivity
axiom on f1 and f2; if f1 and f2 can be combined, add the resulting FDs to F+.

The additional axioms (union, decomposition and pseudotransitivity) make computing F+ easier
by simplifying the work done in Steps 3 and 4. If we want to compute F+ using only Armstrong's
rules, then eliminate Steps 5 and 6.
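Applying the axioms literally, as in Steps 1 to 6, is tedious to program. The sketch below swaps in a different, commonly used technique: it relies on the standard result that X→Y belongs to F+ exactly when Y ⊆ X+, so F+ can be built by computing X+ for every non-empty attribute set X. It uses the FDs of the first example below (X→YZ, DX→W, Y→H), taking R = {H, D, W, X, Y, Z} so that W, which appears in DX→W, is part of the schema. The names `fd_closure` and `in_f_plus` are hypothetical. F+ grows exponentially with the number of attributes (it contains every trivial FD), so this is only practical for small schemas.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def closure(attrs: Iterable[str], fds: List[Tuple[str, str]]) -> Set[str]:
    """Attribute closure; each FD is a pair of strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(items: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of items."""
    ordered = sorted(items)
    return [frozenset(c) for r in range(1, len(ordered) + 1) for c in combinations(ordered, r)]

def fd_closure(schema: str, fds: List[Tuple[str, str]]) -> Set[FD]:
    """F+ as every X -> Y where X is a non-empty subset of R and Y a non-empty subset of X+."""
    result: Set[FD] = set()
    for lhs in subsets(schema):
        for rhs in subsets(closure(lhs, fds)):
            result.add((lhs, rhs))
    return result

FDS = [("X", "YZ"), ("DX", "W"), ("Y", "H")]
F_PLUS = fd_closure("HDWXYZ", FDS)

def in_f_plus(lhs: str, rhs: str) -> bool:
    return (frozenset(lhs), frozenset(rhs)) in F_PLUS

print(in_f_plus("X", "H"))    # True  (decomposition, then transitivity via Y -> H)
print(in_f_plus("X", "W"))    # False (W also needs D)
print(in_f_plus("DX", "WH"))  # True
```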

Example. Consider the relation schema R = {H, D, W, X, Y, Z} and the functional dependencies
X→YZ, DX→W, Y→H. Find the closure F+ of the FDs.

Sol. Applying decomposition on X→YZ gives X→Y and X→Z. Applying transitivity on
X→Y and Y→H gives X→H. Thus the closure F+ contains the given FDs X→YZ, DX→W, Y→H
together with X→Y, X→Z, X→H (trivial FDs and those obtained by augmentation are not listed).

Example. Consider the relation schema R = {A, B, C, D, E} and the functional dependencies
A→BC, CD→E, B→D, E→A.

Sol. Applying decomposition on A→BC gives A→B and A→C. Applying transitivity on A→B
and B→D gives A→D. Applying transitivity on CD→E and E→A gives CD→A. Thus the closure
F+ contains the FDs A→BC, CD→E, B→D, E→A, A→B, A→C, A→D, CD→A (again, trivial FDs
and those obtained by augmentation are not listed).
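The derivation in this solution uses only decomposition and transitivity, and a single round of each is enough to produce the listed FDs. The sketch below mechanises exactly that one round. It is a hypothetical illustration (the helper names are not from these notes) and deliberately does not compute all of F+; the closure-based method is needed for that.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("CD", "E")."""
    return frozenset(lhs), frozenset(rhs)

def decomposition(f: FD) -> List[FD]:
    """X -> YZ gives X -> Y and X -> Z (one FD per right-hand attribute)."""
    lhs, rhs = f
    return [(lhs, frozenset(a)) for a in rhs]

def transitivity(f1: FD, f2: FD) -> Optional[FD]:
    """X -> Y and Y -> Z give X -> Z; None when f1 and f2 do not chain."""
    if f1[1] == f2[0]:
        return f1[0], f2[1]
    return None

def one_round(fds: List[FD]) -> Set[FD]:
    """Apply decomposition to each FD, then transitivity to every pair once."""
    derived: Set[FD] = set(fds)
    for f in fds:
        derived.update(decomposition(f))
    snapshot = list(derived)
    for f1 in snapshot:
        for f2 in snapshot:
            g = transitivity(f1, f2)
            if g is not None and g[0] != g[1]:  # skip trivial X -> X
                derived.add(g)
    return derived

F = [fd("A", "BC"), fd("CD", "E"), fd("B", "D"), fd("E", "A")]
derived = one_round(F)
for lhs, rhs in [("A", "B"), ("A", "C"), ("A", "D"), ("CD", "A")]:
    print(f"{lhs}->{rhs}", fd(lhs, rhs) in derived)  # each line prints True
```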
