CH-4 DBMS Normalisation

Normalization is the process of breaking down unsatisfactory relations to minimize data redundancy and avoid anomalies during database operations. It involves defining normal forms based on functional dependencies and keys, ensuring that relations are designed to avoid issues such as null values and spurious tuples. The document outlines guidelines for designing relational databases, including the importance of functional dependencies, keys, and various normal forms like 1NF, 2NF, and 3NF.

Uploaded by

shreyassupe346

UNIT 4

Normalization of Relations

• Normalization: The process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations

• Normal form: Condition using keys and FDs of a relation to certify whether a
relation schema is in a particular normal form
Purpose of Normalization

➢ Minimize redundancy in data

➢ Remove insert, delete and update anomalies during database activities

➢ Reduce the need to reorganize data when it is modified or enhanced

➢ Reduce a complex user view to a number of smaller, simpler subgroups
Types of Normal Forms
Informal Design Guidelines for Relational Databases

➢ Semantics of the Relation Attributes

➢ Redundant Information in Tuples and Update Anomalies

➢ Null Values in Tuples

➢ Spurious Tuples
Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity
or relationship instance. (Applies to individual relations and their attributes).

• Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be
mixed in the same relation

• Only foreign keys should be used to refer to other entities

• Entity and relationship attributes should be kept apart as much as possible.

Design a schema that can be explained easily relation by relation.
The semantics of attributes should be easy to interpret.
Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from the insertion,
deletion and update anomalies. If there are any present, then note them so
that applications can be made to take them into account.

• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly, wasting storage
• Problems with update anomalies:
   • Insertion anomalies
   • Deletion anomalies
   • Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY

Consider the relation: EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project number P1 from “Billing” to
“Customer-Accounting” requires this update to be made in the tuples of all 100
employees working on project P1.

Insert Anomaly: Cannot insert a project unless an employee is assigned to it.
Conversely, cannot insert an employee unless he/she is assigned to a project.

Delete Anomaly: When a project is deleted, it will result in deleting all the employees who
work on that project. Alternately, if an employee is the sole employee on a project,
deleting that employee would result in deleting the corresponding project.
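
The update and insert anomalies above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table and column names (emp_proj, project, employee, works_on) and the 100 sample rows are hypothetical, chosen to mirror the EMP_PROJ example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Single-relation design: the project name is repeated in every assignment row.
cur.execute(
    "CREATE TABLE emp_proj (emp_no TEXT, proj_no TEXT, ename TEXT, "
    "pname TEXT, no_hours REAL, PRIMARY KEY (emp_no, proj_no))"
)
sample = [(f"E{i}", "P1", f"Employee {i}", "Billing", 10.0) for i in range(1, 101)]
cur.executemany("INSERT INTO emp_proj VALUES (?, ?, ?, ?, ?)", sample)

# Update anomaly: renaming one project rewrites one row per assigned employee.
cur.execute("UPDATE emp_proj SET pname = 'Customer-Accounting' WHERE proj_no = 'P1'")
rows_touched_unnormalized = cur.rowcount

# Decomposed design (hypothetical names): each fact is stored once.
cur.execute("CREATE TABLE project (proj_no TEXT PRIMARY KEY, pname TEXT)")
cur.execute("CREATE TABLE employee (emp_no TEXT PRIMARY KEY, ename TEXT)")
cur.execute(
    "CREATE TABLE works_on (emp_no TEXT REFERENCES employee(emp_no), "
    "proj_no TEXT REFERENCES project(proj_no), no_hours REAL, "
    "PRIMARY KEY (emp_no, proj_no))"
)
cur.execute("INSERT INTO project VALUES ('P1', 'Billing')")
cur.execute("UPDATE project SET pname = 'Customer-Accounting' WHERE proj_no = 'P1'")
rows_touched_normalized = cur.rowcount

# Insert anomaly avoided: a project can exist before anyone is assigned to it.
cur.execute("INSERT INTO project VALUES ('P2', 'Payroll')")
project_count = cur.execute("SELECT COUNT(*) FROM project").fetchone()[0]

print(rows_touched_unnormalized, rows_touched_normalized, project_count)
```

In the decomposed design the rename touches a single row, and deleting the last works_on row for a project no longer deletes the project itself.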
Null Values in Tuples

GUIDELINE 3: Relations should be designed such that their tuples
will have as few NULL values as possible
• Attributes that are NULL frequently could be placed in separate
relations (with the primary key)

• Reasons for nulls:
   • attribute not applicable or invalid
   • attribute value unknown (may exist)
   • value known to exist, but unavailable
Spurious Tuples
• Bad designs for a relational database may result in erroneous
results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results
for join operations

GUIDELINE 4: The relations should be designed to satisfy the lossless
join condition. No spurious tuples should be generated by doing a
natural-join of any relations.
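
A minimal sketch of how a bad decomposition produces spurious tuples, written in plain Python. The relation, its attribute names (ename, pname, plocation) and the two sample tuples are assumptions made for illustration, as is the FD pname -> plocation used for the lossless variant.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join of two non-empty relations on their common attributes."""
    common = set(left[0]) & set(right[0])
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                merged = {**l_row, **r_row}
                if merged not in result:
                    result.append(merged)
    return result


# Hypothetical instance: who works on which project, and where the project is.
works_on = [
    {"ename": "Smith", "pname": "Billing", "plocation": "Houston"},
    {"ename": "Wong", "pname": "Payroll", "plocation": "Houston"},
]

# Lossy decomposition: the shared attribute plocation is a key of neither part.
emp_locs = project(works_on, ["ename", "plocation"])
proj_locs = project(works_on, ["pname", "plocation"])
rejoined = natural_join(emp_locs, proj_locs)
spurious = [t for t in rejoined if t not in works_on]

# Lossless decomposition: pname determines plocation, so joining on pname is safe.
emp_proj = project(works_on, ["ename", "pname"])
proj_info = project(works_on, ["pname", "plocation"])
lossless = natural_join(emp_proj, proj_info)

print(len(rejoined), len(spurious), len(lossless))
```

The lossy join returns four tuples, two of which (Smith on Payroll, Wong on Billing) were never in the original relation; the lossless join returns exactly the original two.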
Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures
of the "goodness" of relational designs

• FDs and keys are used to define normal forms for relations

• FDs are constraints that are derived from the meaning and
interrelationships of the data attributes

• A set of attributes X functionally determines a set of attributes Y if
the value of X determines a unique value for Y
Definition
• X -> Y holds if whenever two tuples have the same value for X, they must have
the same value for Y.

• For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then
t1[Y]=t2[Y]

• X -> Y in R specifies a constraint on all relation instances r(R)

• Written as X -> Y; can be displayed graphically on a relation schema, where it is
denoted by an arrow drawn from X to Y

• FDs are derived from the real-world constraints on the attributes


Examples of FD

• Social security number determines employee name:
SSN -> ENAME

• Project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}

• Employee SSN and project number determine the hours per week that the employee
works on the project:
{SSN, PNUMBER} -> HOURS

• An FD is a property of the attributes in the schema R

• The constraint must hold on every relation instance r(R)

• If K is a key of R, then K functionally determines all attributes in R (since we never have
two distinct tuples with t1[K]=t2[K])
Functional Dependencies

• We say an attribute, B, has a functional dependency on another attribute, A, if for
any two records that have the same value for A, the values for B in these two
records must also be the same. We illustrate this as:
A → B

• Example: Suppose we keep track of employee email addresses, and we
only track one email address for each employee. Suppose each employee is
identified by their unique employee number. We say there is a functional
dependency of email address on employee number:

employee number → email address


EmpNum  EmpEmail           EmpFname  EmpLname
123     [email protected]  John      Doe
456     [email protected]  Peter     Smith
555     [email protected]  Alan      Lee
633     [email protected]  Peter     Doe
787     [email protected]  Alan      Lee

If EmpNum is the PK, then the following FDs hold:

EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
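
As a rough sketch, the definition of an FD can be tested against a relation instance in Python. The helper name fd_holds is hypothetical, and the email column is left out of the sample data because its values are not readable in the table above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# Rows taken from the employee table above (email column omitted).
employees = [
    {"EmpNum": 123, "EmpFname": "John", "EmpLname": "Doe"},
    {"EmpNum": 456, "EmpFname": "Peter", "EmpLname": "Smith"},
    {"EmpNum": 555, "EmpFname": "Alan", "EmpLname": "Lee"},
    {"EmpNum": 633, "EmpFname": "Peter", "EmpLname": "Doe"},
    {"EmpNum": 787, "EmpFname": "Alan", "EmpLname": "Lee"},
]

num_to_fname = fd_holds(employees, ["EmpNum"], ["EmpFname"])
fname_to_lname = fd_holds(employees, ["EmpFname"], ["EmpLname"])
name_to_num = fd_holds(employees, ["EmpFname", "EmpLname"], ["EmpNum"])

print(num_to_fname, fname_to_lname, name_to_num)
```

A single instance can only refute an FD: the two employees named Peter show that EmpFname → EmpLname does not hold, and the two employees named Alan Lee show that the full name does not determine EmpNum. That EmpNum → EmpFname passes here is consistent with the FD but does not prove it, since an FD is a constraint on every legal instance.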
Definitions of Keys and Attributes Participating in Keys

• A superkey of a relation schema R = {A1, A2, ..., An} is a set of
attributes S ⊆ R with the property that no two tuples t1
and t2 in any legal relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property that removal
of any attribute from K will cause K not to be a superkey any
more.

• If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called
secondary keys.

• A prime attribute must be a member of some candidate key

• A nonprime attribute is not a prime attribute, that is, it is not
a member of any candidate key.
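
The key definitions above can be made concrete with attribute closure. The sketch below is one possible way to compute it, not a prescribed algorithm; the function names are hypothetical, and the schema and FDs are the EMP_PROJ examples given earlier (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS). The brute-force key search is exponential and only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying subsets in order of increasing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            subset = set(combo)
            # A superkey that contains an already-found key is not minimal.
            if is_superkey(subset, schema, fds) and not any(k <= subset for k in keys):
                keys.append(subset)
    return keys


schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)
nonprime = schema - prime

print(keys, sorted(prime), sorted(nonprime))
```

For this schema the only candidate key is {SSN, PNUMBER}, so SSN and PNUMBER are the prime attributes and the remaining four attributes are nonprime.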
First Normal Form

• Disallows composite attributes, multivalued attributes, and nested relations, that is,
attributes whose values for an individual tuple are non-atomic

• Considered to be part of the definition of relation
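
One common way to reach 1NF is to move a multivalued attribute into its own relation together with the key. The following Python sketch assumes a hypothetical DEPARTMENT relation whose dlocation attribute holds a list of values; the helper name and the sample data are illustrative only.

```python
def split_multivalued(rows, key, attr):
    """Split a multivalued attribute into a separate relation keyed by `key`."""
    # Base relation: everything except the multivalued attribute.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # Detail relation: one atomic value per tuple, paired with the key.
    detail = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, detail


# Not in 1NF: dlocation holds a set of values per tuple (hypothetical data).
departments = [
    {"dnumber": 5, "dname": "Research", "dlocation": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocation": ["Stafford"]},
]

department, dept_locations = split_multivalued(departments, "dnumber", "dlocation")

print(department)
print(dept_locations)
```

Every value in both resulting relations is atomic, and the primary key of the new relation is the combination {dnumber, dlocation}.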


Second Normal Form
Prime attribute - an attribute that is a member of the primary key K (more generally, of
any candidate key)
Full functional dependency - an FD Y -> Z where removal of any attribute from Y
means the FD does not hold any more

Examples:
• {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS
nor PNUMBER -> HOURS holds

• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency)
since SSN -> ENAME also holds

• A relation schema R is in second normal form (2NF) if every non-prime
attribute A in R is fully functionally dependent on the primary key

• R can be decomposed into 2NF relations via the process of 2NF normalization
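
A hedged sketch of how partial dependencies might be detected and removed, using the EMP_PROJ FDs from the examples above. The function names are hypothetical, and the code assumes the primary key is the only candidate key, so that every attribute outside it is non-prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    key = set(key)
    nonprime = set(schema) - key  # assumption: the primary key is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependents = (closure(subset, fds) - set(subset)) & nonprime
            if dependents:
                found[subset] = dependents
    return found


schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
key = {"SSN", "PNUMBER"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

found = partial_dependencies(schema, key, fds)

# 2NF decomposition: one relation per offending key subset, plus the remainder.
new_relations = [set(subset) | deps for subset, deps in found.items()]
remaining = schema - set().union(*found.values())

print(found, new_relations, remaining)
```

Here ENAME depends on SSN alone and PNAME, PLOCATION depend on PNUMBER alone, so the sketch yields three relations: {SSN, ENAME}, {PNUMBER, PNAME, PLOCATION} and {SSN, PNUMBER, HOURS}.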
Third Normal Form
• Transitive functional dependency - an FD X -> Z that can be derived from two FDs X -> Y and
Y -> Z
Examples:
   • SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
   • SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X
     and X -> ENAME
• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime
attribute A in R is transitively dependent on the primary key
• R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a
candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
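
The 3NF condition and the NOTE above can be sketched as a small check in Python. This is an illustration under assumptions: the function names are hypothetical, the relation is assumed to be in 2NF already, attributes outside the primary key are treated as non-prime, the ENAME attribute is added to the EMP_DEPT example for illustration, and the FD EmpNo -> SSN is added to the EMP example because Emp# is stated to be a candidate key.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(schema, key, fds):
    """List (Y, A) pairs where key -> Y -> A, Y is not a superkey, A is outside the key."""
    schema, key = set(schema), set(key)
    problems = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        if closure(lhs, fds) >= schema:
            continue  # Y is a superkey (e.g. a candidate key): not a 3NF problem
        for attr in sorted(set(rhs) - lhs - key):
            problems.append((tuple(sorted(lhs)), attr))
    return problems


# EMP_DEPT: SSN -> DNUMBER and DNUMBER -> DMGRSSN, as in the example above.
emp_dept_schema = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
emp_dept_fds = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
emp_dept_problems = transitive_dependencies(emp_dept_schema, {"SSN"}, emp_dept_fds)

# EMP from the NOTE: EmpNo is itself a candidate key, so nothing is flagged.
emp_schema = {"SSN", "EmpNo", "Salary"}
emp_fds = [({"SSN"}, {"EmpNo"}), ({"EmpNo"}, {"SSN"}), ({"EmpNo"}, {"Salary"})]
emp_problems = transitive_dependencies(emp_schema, {"SSN"}, emp_fds)

print(emp_dept_problems, emp_problems)
```

For EMP_DEPT the check reports DNUMBER -> DMGRSSN, which suggests moving DNUMBER and DMGRSSN into a separate DEPARTMENT relation; for EMP it reports nothing, matching the NOTE.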

DBMS Interview Questions And Answers

Q #1) What is DBMS used for?

Q #2) What is meant by a Database?

Q #3) Why is the use of DBMS recommended? Explain by listing some of its major
advantages.

Q #4) What is the purpose of normalization in DBMS?

Q #5) What are the different types of languages that are available in the DBMS?

Q #6) What is the purpose of SQL?



Q #7) Explain the concepts of a Primary key and Foreign Key.

Q #8) What are the main differences between Primary key and Unique Key?

Q #9) What is the concept of sub-query in terms of SQL?



Q #10) What is the main difference between UNION and UNION ALL?

Q #11) Explain the concept of ACID properties in DBMS.

Q #12) What is a correlated subquery in DBMS?



Q #13) Explain Entity, Entity Type, and Entity Set in DBMS.

Q #14) What are the different levels of abstraction in the DBMS?

Q #15) What integrity rules exist in the DBMS?



Q #16) What is the E-R model in the DBMS?

Q #17) What is a functional dependency in the DBMS?

Q #18) How is pattern matching done in SQL?



Q #19) What is a join in SQL?

Q #20) What are the different types of joins in SQL?

Q #21) What are the different types of relationships in the DBMS?
