Relational Algebra Basics in DBMS

Uploaded by

udayakn56
Copyright
© © All Rights Reserved

Chapter 4: Relational Algebra and Calculus:

Introduction of Relational Algebra in DBMS

• Relational algebra is a formal language for querying and manipulating relational databases. It
consists of a set of operations such as selection, projection, union, and join.
• It provides a mathematical framework for expressing queries: every operation takes one or two
relations as input and produces a relation as output, so operations can be composed.
• Relational algebra serves as the mathematical foundation for SQL queries.
• Because a query is written as an explicit sequence of operations, relational algebra makes it
easier to understand and optimize query execution.

Key Concepts in Relational Algebra


• Relations: In relational algebra, a relation is a table that consists of
rows and columns, representing data in a structured format. Each
relation has a unique name and is made up of tuples.
• Tuples: A tuple is a single row in a relation, which contains a set of
values for each attribute.
• Attributes: Attributes are the columns in a relation, each
representing a specific characteristic or property of the data. For
example, in a "Students" relation, attributes could be "Name", "Age",
and "Grade".
• Domains: A domain is the set of possible values that an attribute can
have. It defines the type of data that can be stored in each column of
a relation, such as integers, strings, or dates.

Basic Operators in Relational Algebra


Relational algebra provides a set of basic operators for fetching and manipulating data in relational
tables. The basic operators are selection (σ), projection (π), union (U), set difference (−), Cartesian
product (×), and rename (ρ).
1. Selection(σ)

The selection operation filters the rows of a table based on a given condition. It returns only those
rows that satisfy the condition (predicate) written in parentheses after σ.

Example: If we have a relation R with attributes A, B, and C, and we want to select tuples where C >
3, we write:

A B C

1 2 4

2 2 3

3 2 3

4 3 4

σ(C>3)(R) selects the tuples whose C value is greater than 3.

Output:

A B C

1 2 4

4 3 4

Explanation: The selection operation only filters rows; it does not remove columns or change the
order of the rows. Choosing specific columns is the job of the projection operator.
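
The selection example above can be sketched in a few lines of Python. This is only an illustration: the list-of-dicts representation of R and the helper name select are assumptions made for this sketch, not part of relational algebra itself.

```python
# Relation R from the example, one dict per tuple (an illustrative representation).
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]

def select(relation, predicate):
    """Selection: keep only the tuples for which predicate(tuple) is true."""
    return [row for row in relation if predicate(row)]

# sigma(C>3)(R)
result = select(R, lambda row: row["C"] > 3)
for row in result:
    print(row["A"], row["B"], row["C"])
```

The predicate is passed as a function, so any condition on the attributes can be used without changing select.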

2. Projection(π)

While the selection operation works on rows, the projection operation works on columns. It picks the
listed columns from a relation and discards all the others; unlike selection, it takes no condition.

Example: Suppose we want columns B and C from Relation R.

π(B,C)(R) returns the following columns.

Output:

B C

2 4

2 3

3 4

Explanation: By default, the projection operation removes duplicate rows. Here the pair (2, 3) occurs
twice in R but appears only once in the output.
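
A minimal Python sketch of projection with duplicate removal, assuming the same relation R as in the selection example. The helper name project and the list-of-dicts representation are illustrative choices.

```python
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]

def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:  # set semantics: each row appears once
            result.append(reduced)
    return result

# pi(B,C)(R)
result = project(R, ["B", "C"])
for row in result:
    print(row["B"], row["C"])
```

R has four tuples, but the projection has only three because (2, 3) is produced twice and kept once.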

3. Union(U)

The union operator combines the tuples of two relations into a single result. The only condition is
that both relations must have the same number of columns with the same data types. Union in relational
algebra is the same as union in set theory, so a tuple that occurs in both relations appears only once.

Example: Consider the following table of Students having different optional subjects in their course.

FRENCH

Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17

GERMAN

Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25

If FRENCH and GERMAN relations represent student names in two subjects, we can combine their
student names as follows:

π(Student_Name)(FRENCH) U π(Student_Name)(GERMAN)

Output:

Student_Name

Ram

Mohan

Vivek

Geeta

Shyam

Rohan

Explanation: The only constraint on the union of two relations is that both relations must have the
same set of attributes. Vivek and Geeta are enrolled in both subjects but appear only once in the output.

4. Set Difference(-)

Set difference returns the rows that are present in one relation but not in the other. Set difference
in relational algebra is the same as the set difference operation in set theory.

Example: To find students enrolled only in FRENCH but not in GERMAN, we write:

π(Student_Name)(FRENCH) - π(Student_Name)(GERMAN)

Output:

Student_Name

Ram

Mohan

Explanation: The only constraint in the Set Difference between two relations is that both relations
must have the same set of Attributes.
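
Because union and set difference behave exactly like their set-theory counterparts, the FRENCH and GERMAN examples above can be reproduced with Python's built-in set type. The tuple layout and the helper name names are assumptions made for this sketch.

```python
# Each relation is a set of (Student_Name, Roll_Number) tuples.
FRENCH = {("Ram", "01"), ("Mohan", "02"), ("Vivek", "13"), ("Geeta", "17")}
GERMAN = {("Vivek", "13"), ("Geeta", "17"), ("Shyam", "21"), ("Rohan", "25")}

def names(relation):
    """pi(Student_Name): project onto the first attribute."""
    return {name for name, _roll in relation}

# pi(Student_Name)(FRENCH) U pi(Student_Name)(GERMAN)
all_students = names(FRENCH) | names(GERMAN)

# pi(Student_Name)(FRENCH) - pi(Student_Name)(GERMAN)
french_only = names(FRENCH) - names(GERMAN)

print(sorted(all_students))
print(sorted(french_only))
```

Both operands are projected onto the same single attribute first, which is what makes them compatible for union and difference.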

5. Rename(ρ)

The rename operator gives a temporary name to a relation or to its attributes. It is very useful for
avoiding ambiguity, especially in complex queries. Rename is a unary operation.

Example: We can rename an attribute B in relation R to D

A B C

1 2 4

2 2 3

3 2 3

4 3 4

ρ(D/B)(R) renames the attribute 'B' of the relation to 'D'.

Output Table:

A D C

1 2 4

2 2 3

3 2 3

4 3 4

6. Cartesian Product(X)

The Cartesian product combines every row of one table with every row of another table, producing
all possible combinations. It is mostly used as a building block for more complex operations such as
joins. For two relations A and B, the cross product A X B contains all the attributes of A followed by
all the attributes of B, and each record of A is paired with every record of B.

Relation A:

Name Age Sex

Ram 14 M

Sona 15 F

Kim 20 M

Relation B:
ID Course

1 DS

2 DBMS

Output: If relation A has 3 rows and relation B has 2 rows, the Cartesian product A × B will result in 6
rows.

Name Age Sex ID Course

Ram 14 M 1 DS

Ram 14 M 2 DBMS

Sona 15 F 1 DS

Sona 15 F 2 DBMS

Kim 20 M 1 DS

Kim 20 M 2 DBMS

Explanation: If A has 'n' tuples and B has 'm' tuples then A X B will have 'n*m' tuples.
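
The product of relations A and B above can be sketched with itertools.product from the Python standard library. Representing each tuple as a plain Python tuple is an assumption of this sketch.

```python
from itertools import product

A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]
B = [(1, "DS"), (2, "DBMS")]

# A X B: every tuple of A concatenated with every tuple of B.
AxB = [a + b for a, b in product(A, B)]

for row in AxB:
    print(*row)
print(len(AxB))  # n * m = 3 * 2
```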

Derived Operators in Relational Algebra

Derived operators are built using basic operators and include operations like join, intersection, and
division.

1. Join Operators

Join operations in relational algebra combine data from two or more relations based on a related
attribute, allowing for more complex queries and data retrieval. Different types of joins include:

Inner Join

An inner join combines rows from two relations based on a matching condition and only returns rows
where there is a match in both relations. If a record in one relation doesn't have a corresponding
match in the other, it is excluded from the result. This is the most common type of join.
• Conditional Join: A conditional join (also called a theta join) is an inner join where the
matching condition can involve any comparison operator, such as equals (=) or greater than (>).
Example: Joining Employees and Departments on DepartmentID with the extra condition Salary >
50000 returns the employees earning more than 50,000 together with their departments.

• Equi Join: An equi join is a conditional join whose condition uses only equality (=) between
columns of the two relations. Example: Joining Customers and Orders on CustomerID, where both
relations have this column, returns only the matching records.

• Natural Join: A natural join is an equi join on all columns that have the same name and type
in both relations, and it keeps each shared column only once in the result. It is a more concise
way of writing such a join. Example: Joining Students and Enrollments where StudentID is common
to both gives a result in which StudentID appears only once.
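
The inner joins described above are derived operators: each one is a Cartesian product followed by a selection. The sketch below illustrates this; the Students and Enrollments rows and the helper names theta_join and natural_join are hypothetical.

```python
from itertools import product

# Hypothetical sample data for illustration.
Students = [
    {"StudentID": 1, "Name": "Ram"},
    {"StudentID": 2, "Name": "Sona"},
    {"StudentID": 3, "Name": "Kim"},  # has no enrollment
]
Enrollments = [
    {"StudentID": 1, "Course": "DS"},
    {"StudentID": 1, "Course": "DBMS"},
    {"StudentID": 2, "Course": "DBMS"},
]

def theta_join(left, right, condition):
    """Conditional join: Cartesian product followed by a selection."""
    return [(l, r) for l, r in product(left, right) if condition(l, r)]

def natural_join(left, right):
    """Natural join: equality on all shared attribute names, shared columns kept once."""
    joined = []
    for l, r in product(left, right):
        common = set(l) & set(r)
        if all(l[name] == r[name] for name in common):
            joined.append({**l, **r})
    return joined

# Equi join: a theta join whose condition is an equality.
equi = theta_join(Students, Enrollments,
                  lambda l, r: l["StudentID"] == r["StudentID"])
natural = natural_join(Students, Enrollments)
for row in natural:
    print(row)
```

Kim does not appear in either result because an inner join drops tuples that have no match.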

Outer Join

An outer join returns the matching rows of the two relations plus the unmatched rows of one or both
relations. For an unmatched row, the columns that would come from the other relation are filled
with NULL values.

• Left Outer Join: A left outer join returns all rows from the left relation and the matching rows
from the right relation. If there is no match, the result will include NULL values for the right
relation’s attributes. Example: Joining Employees with Departments using a left outer join
ensures all employees are listed, even those who aren't assigned to any department,
with NULL values for the department columns.

• Right Outer Join: A right outer join returns all rows from the right relation and the matching
rows from the left relation. If no match exists, the left relation's columns will
contain NULL values. Example: Joining Departments with Employees using a right outer join
includes all departments, even those with no employees assigned, filling unmatched
employee columns with NULL.

• Full Outer Join: A full outer join returns all rows from both the left and the right relation.
Matching rows are combined; if a row from one relation has no match in the other, NULL values
fill the columns of the missing side. Example: Joining Customers and Orders using a full outer
join returns all customers and all orders, even if there is no corresponding order for a customer
or no customer for an order.
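
A minimal sketch of a left outer join, with Python's None standing in for NULL. The Employees and Departments rows and the function signature are hypothetical choices made for illustration.

```python
# Hypothetical sample data for illustration.
Employees = [
    {"Name": "Asha", "DeptID": 10},
    {"Name": "Ravi", "DeptID": 20},     # DeptID 20 is not in Departments
    {"Name": "Meera", "DeptID": None},  # not assigned to any department
]
Departments = [
    {"DeptID": 10, "DeptName": "Sales"},
    {"DeptID": 30, "DeptName": "Research"},  # has no employees
]

def left_outer_join(left, right, key, right_columns):
    """Keep every left row; pad with None when no right row matches."""
    result = []
    for l in left:
        matches = [r for r in right if l[key] is not None and r[key] == l[key]]
        if matches:
            for r in matches:
                result.append({**l, **{c: r[c] for c in right_columns}})
        else:
            result.append({**l, **{c: None for c in right_columns}})
    return result

rows = left_outer_join(Employees, Departments, "DeptID", ["DeptName"])
for row in rows:
    print(row)
```

A right outer join is the same function with the arguments swapped, and a full outer join additionally keeps the unmatched right rows (here, Research).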

2. Set Intersection(∩)

Set intersection returns only those rows that are common to two relations. Set intersection in
relational algebra is the same as the set intersection operation in set theory.

Example: Consider the following table of Students having different optional subjects in their course.

Relation FRENCH
Student_Name Roll_Number

Ram 01

Mohan 02

Vivek 13

Geeta 17

Relation GERMAN

Student_Name Roll_Number

Vivek 13

Geeta 17

Shyam 21

Rohan 25

From the above table of FRENCH and GERMAN, the Set Intersection is used as follows:

π(Student_Name)(FRENCH) ∩ π(Student_Name)(GERMAN)

Output:

Student_Name

Vivek

Geeta

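Explanation: The only constraint on the set intersection of two relations is that both relations
must have the same set of attributes. Intersection is a derived operator because it can be written
using set difference alone: R ∩ S = R − (R − S).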
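
Intersection counts as a derived operator because it can be rebuilt from set difference. The short sketch below checks the identity R ∩ S = R − (R − S) on the student names from the example, using Python sets as an illustrative stand-in for relations.

```python
french = {"Ram", "Mohan", "Vivek", "Geeta"}
german = {"Vivek", "Geeta", "Shyam", "Rohan"}

direct = french & german                    # built-in intersection
via_difference = french - (french - german) # using only set difference

print(sorted(direct))
print(direct == via_difference)
```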

3. Division (÷)
The Division Operator is used to find tuples in one relation that are related to all tuples in another
relation. It’s typically used for "for all" queries.

Student_Course (Dividend Table):

Student_ID Course_ID

101 C1

101 C2

102 C1

103 C1

103 C2

Course (Divisor Table):

Course_ID

C1

C2

Example: Query is to find students who are enrolled in all courses listed in the Course table. In this
case, students must be enrolled in both C1 and C2.

Student_Course(Student_ID, Course_ID)÷ Course(Course_ID)

Output:

Student_ID

101

103
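
The division example can be sketched in two ways: directly from the "for all" definition, and using only basic operators (projection, Cartesian product, set difference). The set-of-tuples representation and the function name divide are assumptions of this sketch.

```python
Student_Course = {(101, "C1"), (101, "C2"), (102, "C1"), (103, "C1"), (103, "C2")}
Course = {"C1", "C2"}

def divide(dividend, divisor):
    """Students paired with every course in the divisor."""
    candidates = {student for student, _course in dividend}
    return {s for s in candidates
            if all((s, c) in dividend for c in divisor)}

result = divide(Student_Course, Course)

# Same result from basic operators only:
students = {s for s, _c in Student_Course}                  # projection
all_pairs = {(s, c) for s in students for c in Course}      # Cartesian product
missing = all_pairs - Student_Course                        # pairs that do not occur
result_basic = students - {s for s, _c in missing}          # difference

print(sorted(result))
```

Student 102 is removed because the pair (102, C2) is missing from Student_Course.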
Domain Relational Calculus in DBMS


Domain Relational Calculus (DRC) is a non-procedural query language used to retrieve information
from a relational database.

It is based on predicate logic and allows the user to specify what data they want to retrieve, but not
how to retrieve it, making it a declarative query language.

Domain Relational Calculus (DRC)

Domain Relational Calculus (DRC) is a formal query language for relational databases. It describes
queries by specifying a set of conditions or formulas that the data must satisfy.

A general form of a DRC query is written as:

{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }

where <x1, x2, x3, ..., xn> are the domain variables that form the result tuple, and P (x1, x2, x3, ..., xn)
is the condition, a formula of predicate calculus, that these variables must satisfy.

Key Characteristics of DRC:

• Non-procedural: Specifies what data to retrieve without describing the steps for retrieval.

• Based on Predicate Calculus: Utilizes logical expressions (predicates) to describe the query.

• Relational Database Queries: Primarily used for querying relational databases.

Components of Domain Relational Calculus (DRC)


1. Domain Variables
• Domain variables represent the attributes (fields) that will appear in the resulting
relation of the query.
2. Predicate
A predicate is a logical condition or formula that the data must satisfy. It is expressed using
comparison operators, connectives, and quantifiers.
• Comparison operators: =, >, <, >=, <=, !=.
• Connectives: AND, OR, NOT.
• Quantifiers: FOR ALL, EXISTS.
3. Quantifiers

Quantifiers are used to express the scope of a query:

• Existential quantifier (∃): Denotes that there exists at least one instance that satisfies a
condition.

• Universal quantifier (∀): Denotes that all instances in the domain satisfy a condition.

4. Domains and Relations

• A domain refers to a specific set of values (like integers, strings)
that a variable can take. A relation is a table in the database, and the
tuples in the relation represent data that matches the query
conditions.
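
A DRC query reads much like a set comprehension: it states which values are wanted and the condition they satisfy, not how to find them. The sketch below evaluates the query { <name> | ∃ age ∃ grade ( Students(name, age, grade) ∧ age > 18 ) } in Python. The Students tuples are hypothetical sample data.

```python
# Hypothetical Students(Name, Age, Grade) relation.
Students = {("Ram", 17, "A"), ("Sona", 19, "B"), ("Kim", 20, "A")}

# { <name> | EXISTS age, grade ( Students(name, age, grade) AND age > 18 ) }
result = {name for (name, age, grade) in Students if age > 18}

# Universal quantifier: FOR ALL tuples in Students, age > 18 ?
all_over_18 = all(age > 18 for (_name, age, _grade) in Students)

print(sorted(result))
print(all_over_18)
```

Only name appears in the result because it is the only domain variable to the left of the bar; age and grade are bound by the existential quantifier.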
Structured Query Language (SQL):

SQL Aggregate functions

• SQL Aggregate Functions are used to perform calculations on a set of rows and return a
single value.
• These functions are particularly useful when we need to summarize, analyze, or group large
datasets in SQL databases.

They are often used with the GROUP BY clause in SQL to summarize data for each group.

Commonly used aggregate functions include COUNT(), SUM(), AVG(), MIN() and MAX().

Key Features of SQL Aggregate Functions:

• Operate on groups of rows: They work on a set of rows and return a single value.

• Ignore NULLs: Most aggregate functions ignore NULL values, except for COUNT(*).

• Used with GROUP BY: To perform calculations on grouped data, you often use aggregate
functions with GROUP BY.

• Can be combined with other SQL clauses: Aggregate functions can be used
alongside HAVING, ORDER BY, and other SQL clauses to filter or sort results.

Commonly Used SQL Aggregate Functions

Below are the most frequently used aggregate functions in SQL.

1. COUNT()

The COUNT() function returns the number of rows that match a given condition or are present in a
column.

• COUNT(*): Counts all rows.

• COUNT(column_name): Counts non-NULL values in the specified column.

• COUNT(DISTINCT column_name): Counts unique non-NULL values in the column.

Examples:

-- Total number of records in the table

SELECT COUNT(*) AS TotalRecords FROM Employee;

-- Count of non-NULL salaries

SELECT COUNT(Salary) AS NonNullSalaries FROM Employee;

-- Count of unique non-NULL salaries

SELECT COUNT(DISTINCT Salary) AS UniqueSalaries FROM Employee;

2. SUM()
The SUM() function calculates the total sum of a numeric column.

• SUM(column_name): Returns the total sum of all non-NULL values in a column.

Examples:

-- Calculate the total salary

SELECT SUM(Salary) AS TotalSalary FROM Employee;

-- Calculate the sum of unique salaries

SELECT SUM(DISTINCT Salary) AS DistinctSalarySum FROM Employee;

3. AVG()

The AVG() function calculates the average of a numeric column. It divides the sum of the column by
the number of non-NULL rows.

• AVG(column_name): Returns the average of the non-NULL values in the column.

Examples:

-- Calculate the average salary

SELECT AVG(Salary) AS AverageSalary FROM Employee;

-- Average of distinct salaries

SELECT AVG(DISTINCT Salary) AS DistinctAvgSalary FROM Employee;

4. MIN() and MAX()

The MIN() and MAX() functions return the smallest and largest values, respectively, from a column.

• MIN(column_name): Returns the minimum value.

• MAX(column_name): Returns the maximum value.

Examples:

-- Find the highest salary

SELECT MAX(Salary) AS HighestSalary FROM Employee;

-- Find the lowest salary

SELECT MIN(Salary) AS LowestSalary FROM Employee;

Examples of SQL Aggregate Functions

Let's consider a demo Employee table to demonstrate SQL aggregate functions. This table contains
employee details such as their ID, Name, and Salary.
Id Name Salary

1 A 802

2 B 403

3 C 604

4 D 705

5 E 606

6 F NULL

1. Count the Total Number of Employees

SELECT COUNT(*) AS TotalEmployees FROM Employee;

Output:

TotalEmployees

6

2. Calculate the Total Salary

SELECT SUM(Salary) AS TotalSalary FROM Employee;

Output:

TotalSalary

3120

3. Find the Average Salary:


SELECT AVG(Salary) AS AverageSalary FROM Employee;

Output:

AverageSalary

624

4. Find the Highest and Lowest Salary

SELECT MAX(Salary) AS HighestSalary, MIN(Salary) AS LowestSalary FROM Employee;

Output:

HighestSalary LowestSalary

802 403
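
The outputs above can be checked with Python's built-in sqlite3 module. This is a sketch that assumes SQLite as the engine and loads the demo Employee table into an in-memory database; other database systems may format AVG differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "A", 802), (2, "B", 403), (3, "C", 604),
     (4, "D", 705), (5, "E", 606), (6, "F", None)],  # None is stored as NULL
)

row = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary), "
    "MAX(Salary), MIN(Salary) FROM Employee"
).fetchone()
print(row)
```

COUNT(*) counts all six rows, while COUNT(Salary) and AVG(Salary) skip employee F, whose salary is NULL. That is why the average is 3120 / 5 = 624 rather than 3120 / 6.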

Using Aggregate Functions with GROUP BY

SQL GROUP BY allows us to group rows that have the same values in specific columns. We can then
apply aggregate functions to these groups, which helps us summarize data for each group. This is
commonly used with the COUNT(), SUM(), AVG(), MIN(), and MAX() functions.

Example: Total Salary by Each Employee

SELECT Name, SUM(Salary) AS TotalSalary

FROM Employee

GROUP BY Name;

Output:

Name TotalSalary

A 802

B 403

C 604

D 705

E 606

F NULL

Using HAVING with Aggregate Functions

The HAVING clause is used to filter results after applying aggregate functions, unlike WHERE, which
filters rows before aggregation. HAVING is essential when we want to filter based on the result of an
aggregate function.

Example: Find Employees with Salary Greater Than 600

SELECT Name, SUM(Salary) AS TotalSalary

FROM Employee

GROUP BY Name

HAVING SUM(Salary) > 600;

Output:

Name TotalSalary

A 802

C 604

D 705

E 606
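
The difference between WHERE (filters rows before grouping) and HAVING (filters groups after aggregation) can be tried out with sqlite3. The sketch assumes SQLite and the demo Employee table; ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "A", 802), (2, "B", 403), (3, "C", 604),
     (4, "D", 705), (5, "E", 606), (6, "F", None)],
)

# HAVING: the condition is evaluated on each group's aggregate.
having_rows = conn.execute(
    "SELECT Name, SUM(Salary) AS TotalSalary FROM Employee "
    "GROUP BY Name HAVING SUM(Salary) > 600 ORDER BY Name"
).fetchall()

# WHERE: the condition is evaluated on each row before any grouping.
where_rows = conn.execute(
    "SELECT Name, Salary FROM Employee WHERE Salary > 600 ORDER BY Name"
).fetchall()

print(having_rows)
print(where_rows)
```

Employee F is dropped by HAVING because SUM over only NULL values is NULL, and the comparison NULL > 600 is not true. The two queries agree here only because every name occurs once; with several rows per name, HAVING would compare each group's total while WHERE would compare individual salaries.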

Key Takeaways about SQL Aggregate Functions

• Aggregate functions in SQL operate on a group of values and return a single result.

• They are often used with the GROUP BY clause to summarize the grouped data.
• Aggregate functions operate on non-NULL values only (except COUNT(*), which counts every row).

• Commonly used aggregate functions are - MIN(), MAX(), COUNT(), AVG(), and SUM().

SQL - Logical Operators


SQL Logical Operators are essential tools used to test the truth of conditions in SQL
queries. They return boolean values such as TRUE, FALSE, or UNKNOWN, making them
invaluable for filtering, retrieving, or manipulating data.

1. AND Operator
The AND operator is used to combine two or more conditions in an SQL query. It returns
records only when all conditions specified in the query are true. This operator is
commonly used when filtering data that must satisfy multiple criteria simultaneously.
Example
Retrieve the records of employees from the employee table who are located
in 'Allahabad' and belong to 'India', ensuring that both conditions are met.
Query:
SELECT * FROM employee WHERE emp_city = 'Allahabad' AND emp_country = 'India';
Explanation:
In the output, both conditions (emp_city = 'Allahabad' and emp_country = 'India') are
satisfied for the listed employees, so these records are returned by the query.

2. NOT Operator
The NOT operator is used to reverse the result of a condition, returning TRUE when the
condition is FALSE. It is typically used to exclude records that match a specific condition,
making it useful for filtering out unwanted data.
Example
Retrieve the records of employees from the employee table whose city names do not start
with the letter 'A'.
Query:
SELECT * FROM employee WHERE emp_city NOT LIKE 'A%';
Explanation:
In this query, the NOT operator negates the LIKE condition. The LIKE operator is used
to match patterns in string data, and the 'A%' pattern matches any city name that starts with
the letter 'A'. By using the NOT operator, we exclude cities starting with 'A' from the result
set.

3. OR Operator
The OR operator combines multiple conditions in a SQL query and returns TRUE if at
least one of the conditions is satisfied. It is ideal for situations where you want to retrieve
records that meet any of several possible conditions.
Example
Retrieve the records of employees from the employee table who are either
from 'Varanasi' or have 'India' as their country.
Query
SELECT * FROM employee WHERE emp_city = 'Varanasi' OR emp_country = 'India';
Explanation:
In this case, the output includes employees from 'Varanasi' as well as those who
have 'India' as their country, even if they are from different cities. The query returns all
records where at least one of the conditions is true.
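
The three queries above can be run against a small sample table with sqlite3. The rows below are hypothetical (the original table is not shown), and one city is deliberately NULL to show the UNKNOWN truth value in action. SQLite is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, "
    "emp_city TEXT, emp_country TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Allahabad", "India"),
     (2, "Ravi", "Varanasi", "India"),
     (3, "Meera", "Agra", "India"),
     (4, "John", "London", "UK"),
     (5, "Kiran", None, "India")],  # city unknown (NULL)
)

def names(where_clause):
    """Return emp_name values for rows satisfying the WHERE clause."""
    sql = f"SELECT emp_name FROM employee WHERE {where_clause} ORDER BY emp_id"
    return [name for (name,) in conn.execute(sql)]

and_rows = names("emp_city = 'Allahabad' AND emp_country = 'India'")
not_rows = names("emp_city NOT LIKE 'A%'")
or_rows = names("emp_city = 'Varanasi' OR emp_country = 'India'")
print(and_rows, not_rows, or_rows)
```

Kiran is missing from the NOT LIKE result because a comparison with NULL is UNKNOWN, and NOT UNKNOWN is still UNKNOWN. Kiran does appear in the OR result because UNKNOWN OR TRUE evaluates to TRUE.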

In SQL, predicates are conditions used in WHERE or HAVING clauses to filter data. LIKE, BETWEEN, ALIAS,
and DISTINCT are related concepts, though ALIAS and DISTINCT are not strictly predicates in the same
way LIKE and BETWEEN are.

LIKE Predicate

The LIKE predicate is used for pattern matching in string comparisons. It allows searching for values
that match a specified pattern using wildcard characters:
• %: Represents zero or more characters.
• _: Represents a single character.

Code

SELECT customer_name FROM customers WHERE email LIKE '%@[Link]';

BETWEEN Predicate

The BETWEEN predicate checks if a value falls within a specified range (inclusive of both the lower and
upper bounds). It can be used with numeric, text, or date data types.

Code

SELECT product_name FROM products WHERE price BETWEEN 10.00 AND 50.00;

ALIAS
An ALIAS is a temporary name given to a table or a column in a SQL query. Aliases make queries more
readable and can simplify complex queries, especially when dealing with joins or long table/column
names. ALIAS is not a predicate itself but a naming convention.

Code

SELECT c.customer_name AS Name, o.order_date AS OrderDate FROM customers AS c

JOIN orders AS o ON c.customer_id = o.customer_id;

DISTINCT

The DISTINCT keyword is used with the SELECT statement to eliminate duplicate rows from the result
set, returning only unique values. Like ALIAS, DISTINCT is not a predicate but a clause modifier that
affects the output of the query.

Code

SELECT DISTINCT department FROM employees;
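
The following sketch exercises LIKE, BETWEEN, an alias, and DISTINCT together using sqlite3. The products table, its category column, and all rows are hypothetical and exist only for this illustration. SQLite is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Pen", 5.0, "Stationery"), ("Notebook", 10.0, "Stationery"),
     ("Bag", 50.0, "Travel"), ("Lamp", 75.0, "Home")],
)

def column(sql):
    """Run a one-column query and return its values as a list."""
    return [value for (value,) in conn.execute(sql)]

# BETWEEN is inclusive: both 10.00 and 50.00 qualify. 'p' and 'Name' are aliases.
between = column(
    "SELECT p.product_name AS Name FROM products AS p "
    "WHERE p.price BETWEEN 10.00 AND 50.00 ORDER BY Name"
)
# '_a%': any single character, then 'a', then anything.
like = column(
    "SELECT product_name FROM products "
    "WHERE product_name LIKE '_a%' ORDER BY product_name"
)
# DISTINCT collapses the two 'Stationery' rows into one.
distinct = column("SELECT DISTINCT category FROM products ORDER BY category")

print(between, like, distinct)
```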


Armstrong's Axioms in Functional Dependency in DBMS

Armstrong's Axioms refer to a set of inference rules, introduced by William W. Armstrong, that are
used to test the logical implication of functional dependencies. Given a set of functional dependencies
F, the closure of F (denoted as F+) is the set of all functional dependencies logically implied by F.
Armstrong's Axioms, when applied repeatedly, help generate the closure of functional dependencies.

These axioms are fundamental in determining functional dependencies in databases and are used to
derive conclusions about the relationships between attributes.

• Axiom of Reflexivity: If A is a set of attributes and B is a subset of A, then A determines B.
That is, if B ⊆ A then A→B. Such a dependency is called trivial.

• Axiom of Augmentation: If A→B holds and C is any set of attributes, then AC→BC also holds.
That is, adding the same attributes to both sides of a dependency does not invalidate it.

• Axiom of Transitivity: As with the transitive rule in algebra, if A→B holds and B→C holds,
then A→C also holds. (A→B is read as "A functionally determines B".)

Example:

Let’s assume the following functional dependencies:

{A} → {B}
{B} → {C}
{A, C} → {D}

1. Reflexivity: Since any set of attributes determines its subset, we can immediately infer the following:

• {A} → {A} (A set always determines itself).

• {B} → {B}.
• {A, C} → {A}.

2. Augmentation: If we know that {A} → {B}, we can add the same attribute (or set of attributes) to
both sides:

• From {A} → {B}, we can augment both sides with {C}: {A, C} → {B, C}.

• From {B} → {C}, we can augment both sides with {A}: {A, B} → {A, C}.

3. Transitivity: If we know {A} → {B} and {B} → {C}, we can infer that:

• {A} → {C} (Using transitivity: {A} → {B} and {B} → {C}).

Although Armstrong's axioms are sound and complete, there are additional rules for functional
dependencies that are derived from them. These rules are introduced to simplify operations and make
the process easier.

What is Functional Dependency?

A functional dependency occurs when the value of one attribute (or set of attributes) uniquely
determines the value of another attribute within a relation. It is a constraint that describes how
attributes in a table relate to each other. If attribute A functionally determines attribute B, we
write this as A→B.

Example:

roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

From the above table we can conclude some valid functional dependencies:

• roll_no → {name, dept_name, dept_building}: roll_no determines the values of the fields
name, dept_name and dept_building, hence this is a valid functional dependency.

• roll_no → dept_name: since roll_no determines the whole set {name, dept_name,
dept_building}, it also determines its subset dept_name.

• dept_name → dept_building: dept_name identifies the dept_building, since every row with the
same dept_name has the same dept_building (CO is always in A4, IT always in A3). The reverse
does not hold: building B2 houses both EC and ME.

• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
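
Whether a dependency is consistent with the rows of a table can be checked mechanically: X → Y holds in the data if no two rows agree on X but differ on Y. The sketch below applies this to the table above; the function name holds is an illustrative choice.

```python
columns = ["roll_no", "name", "dept_name", "dept_building"]
data = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
rows = [dict(zip(columns, values)) for values in data]

def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

print(holds(rows, ["dept_name"], ["dept_building"]))  # consistent with the data
print(holds(rows, ["dept_building"], ["dept_name"]))  # B2 maps to EC and ME
print(holds(rows, ["name"], ["dept_name"]))           # xyz maps to CO and IT
```

A check like this can only refute a dependency. A functional dependency is a rule about every valid state of the table, so passing on one sample of rows does not prove it.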

Types of Functional Dependencies in DBMS

1. Trivial functional dependency

2. Non-Trivial functional dependency

3. Fully Functional Dependency

1. Trivial Functional Dependency

In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X → Y and
Y is the subset of X, then it is called trivial functional dependency.

Symbolically: A→B is trivial functional dependency if B is a subset of A.

The following dependencies are also trivial: A→A & B→B

Example 1 :

• ABC -> AB

• ABC -> A

• ABC -> ABC

Example 2:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset
of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional
dependency.

2. Non-trivial Functional Dependency

In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e.
If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.

Example 1 :
• Id -> Name

• Name -> DOB

Example 2:

roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}

What is Fully Functional Dependency?

A functional dependency arises when, within one relation, the value of one set of attributes
identifies the value of another attribute. The dependency is full when the whole determinant is
needed: if A determines B and no proper subset of A determines B, then B is said to be fully
functionally dependent on A. In other words, you cannot derive B from a smaller part of A.

An Example of Fully Functional Dependency

Consider a relation called "Employee" with the attributes {Employee_ID, Employee_Name,
Department, Salary}. If the Employee_ID attribute determines the other attributes in the table,
such as Employee_Name, Department, and Salary, then all these attributes are fully functionally
dependent on Employee_ID: the determinant is a single attribute, so it has no smaller part that
could determine them instead.

Importance of Fully Functional Dependency

• Data Integrity: Each determinant value identifies one and only one value of the dependent
attributes, which supports data integrity.

• Normalization: It plays a role in reorganizing large tables into smaller, structured tables, which
reduces redundancy and makes the data easier to handle.

• Efficient Queries: Databases designed around well-understood dependencies tend to respond
to queries more efficiently.

• Update Anomalies: Removing partial dependencies reduces anomalies when data is inserted,
updated, or deleted.

Closure Of Functional Dependency : Introduction


• The closure of a functional dependency means the complete set of all attributes that can be
functionally derived from the attributes on its left-hand side, using the given functional
dependencies and the inference rules known as Armstrong's axioms.

• If "F" is a functional dependency, then its closure is denoted "{F}+".

• There are three steps to calculate the closure of a functional dependency.

Step-1 : Add the attributes which are present on the Left Hand Side of the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of that functional dependency.
Step-3 : Using the attributes collected so far, check which other attributes can be derived from the
other given functional dependencies. A dependency can be applied once all the attributes on its Left
Hand Side are in the set. Repeat this process until no more attributes can be added to the closure.
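
The three steps can be sketched as a small fixed-point loop. The example reuses the dependencies {A} → {B}, {B} → {C}, {A, C} → {D} from the Armstrong's axioms section. Writing each attribute as a single character, and the function name closure, are assumptions of this sketch.

```python
def closure(attributes, fds):
    """Set of attributes derivable from `attributes` under the dependencies `fds`."""
    result = set(attributes)  # start from the given attributes (Step-1)
    changed = True
    while changed:            # repeat until nothing new is added (Step-3)
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once every lhs attribute is already derived (Step-2).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Each dependency is a (left-hand side, right-hand side) pair.
F = [("A", "B"), ("B", "C"), ("AC", "D")]

print(sorted(closure("A", F)))
print(sorted(closure("B", F)))
```

Starting from A, the loop adds B, then C, and then D (because both A and C are now present). Since the closure of A contains every attribute, A is a candidate key of a relation with attributes A, B, C, D under these dependencies.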
Normal Forms in DBMS
Normal Forms are important for ensuring that data is structured logically, reducing redundancy, and
maintaining data integrity.
What is Normalization in DBMS?
Normalization is a systematic approach to organize data within a database to reduce redundancy and
eliminate undesirable characteristics such as insertion, update, and deletion anomalies.
The process involves breaking down large tables into smaller, well-structured ones and defining
relationships between them. This not only reduces the chances of storing duplicate data but also
improves the overall efficiency of the database.
Why is Normalization Important?
• Reduces Data Redundancy: Each fact is stored only once, which saves disk space and
reduces inconsistency.
• Improves Data Integrity: Ensures the accuracy and consistency of data by organizing it in a
structured manner.
• Simplifies Database Design: By following a clear structure, database designs become easier
to maintain and update.
• Optimizes Performance: Reduces the chance of anomalies and increases the efficiency of
database operations.

What are Normal Forms in DBMS?


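Normal forms are a sequence of increasingly strict conditions (1NF, 2NF, 3NF, BCNF)
that a relation can satisfy. Normalization brings a design up through these
forms to reduce redundancy and improve data integrity by organizing data
into tables with proper relationships.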
Let's break down the various normal forms step-by-step to understand the conditions that need to be
satisfied at each level:
1. First Normal Form (1NF): Eliminating Duplicate Records
A table is in 1NF if it satisfies the following conditions:
• All columns contain atomic values (i.e., indivisible values).
• Each row is unique (i.e., no duplicate rows).
• Each column has a unique name.
• The order in which data is stored does not matter.
Example of 1NF Violation: If a table has a column "Phone Numbers" that stores multiple phone
numbers in a single cell, it violates 1NF. To bring it into 1NF, you need to separate phone numbers into
individual rows.
2. Second Normal Form (2NF): Eliminating Partial Dependency
A relation is in 2NF if it satisfies the conditions of 1NF and, additionally, no partial dependency exists,
meaning every non-prime attribute (non-key attribute) must depend on the entire primary key, not just
a part of it.
Example: For a composite key (StudentID, CourseID), if the StudentName depends only
on StudentID and not on the entire key, it violates 2NF. To normalize, move StudentName into a
separate table where it depends only on StudentID.
3. Third Normal Form (3NF): Eliminating Transitive Dependency
A relation is in 3NF if it satisfies 2NF and additionally, there are no transitive dependencies. In simpler
terms, non-prime attributes should not depend on other non-prime attributes.
Example: Consider a table with (StudentID, CourseID, Instructor). If Instructor depends
on CourseID, and CourseID depends on StudentID, then Instructor indirectly depends
on StudentID, which violates 3NF. To resolve this, place Instructor in a separate table linked
by CourseID.
4. Boyce-Codd Normal Form (BCNF): The Strongest Form of 3NF
BCNF is a stricter version of 3NF where for every non-trivial functional dependency (X → Y), X must
be a superkey (a set of attributes that uniquely identifies each record in the table).
Example: Suppose a table (StudentID, CourseID, Instructor) has the key (StudentID, CourseID) and also
the dependency Instructor → CourseID (each instructor teaches only one course). Instructor is a
determinant but not a superkey, so the table violates BCNF. To bring it into BCNF, decompose the
table so that each determinant is a candidate key.
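
The BCNF condition can be tested mechanically: for every non-trivial dependency X → Y, compute the closure of X and check that it covers all attributes. The sketch below does this for a table (StudentID, CourseID, Instructor). The two dependencies used, {StudentID, CourseID} → {Instructor} and {Instructor} → {CourseID}, are assumptions chosen for illustration.

```python
def closure(attributes, fds):
    """Set of attributes derivable from `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attributes, fds):
    """Non-trivial dependencies whose left-hand side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) == all_attributes
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

attributes = {"StudentID", "CourseID", "Instructor"}
fds = [
    ({"StudentID", "CourseID"}, {"Instructor"}),  # assumed key dependency
    ({"Instructor"}, {"CourseID"}),               # assumed: one course per instructor
]

violations = bcnf_violations(attributes, fds)
print(violations)
```

Only the second dependency is reported: the closure of {Instructor} is {Instructor, CourseID}, which does not include StudentID, so Instructor is not a superkey.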
Denormalization in Databases
Denormalization is a database optimization technique in which we add redundant data to one or more
tables. This can help us avoid costly joins in a relational database.
Denormalization does not mean 'reversing normalization' or 'not to normalize'. It is an optimization
technique that is applied after normalization: we start from a normalized schema and selectively
make parts of it non-normalized.
How is Denormalization Different from Normalization?
Normalization and denormalization are both database design techniques, but they work in opposite
directions. Normalization reduces or removes redundancy, so that there are no duplicate entries in
the same table, and it optimizes for data integrity and efficient storage.
Denormalization, in contrast, adds redundancy to normalized tables in order to reduce the running
time of database queries (for example, by avoiding join operations), and it optimizes for
performance and query simplicity.
Advantages of Denormalization
• Improved Query Performance: Denormalization can improve query performance by
reducing the number of joins required to retrieve data.
• Reduced Complexity: By combining related data into fewer tables, denormalization can
simplify the database schema and make it easier to manage.
• Easier Maintenance and Updates: Denormalization can make it easier to update and maintain
the database by reducing the number of tables.
• Improved Read Performance: Denormalization can improve read performance by making it
easier to access data.
• Better Scalability: Denormalization can improve the scalability of a database system by
reducing the number of tables and improving the overall performance.
Disadvantages of Denormalization
• Reduced Data Integrity: By adding redundant data, denormalization can reduce data integrity
and increase the risk of inconsistencies.
• Increased Complexity: While denormalization can simplify the database schema in some
cases, it can also increase complexity by introducing redundant data.
• Increased Storage Requirements: By adding redundant data, denormalization can increase
storage requirements and increase the cost of maintaining the database.
• Increased Update and Maintenance Complexity: Denormalization can increase the
complexity of updating and maintaining the database by introducing redundant data.
• Limited Flexibility: Denormalization can reduce the flexibility of a database system by
introducing redundant data and making it harder to modify the schema.
