Relational Algebra Basics in DBMS
• Relational Algebra is a formal language used to query and manipulate relational databases,
consisting of a set of operations like selection, projection, union, and join.
• It provides a mathematical framework for expressing queries over relations.
• Relational algebra serves as the mathematical foundation for query languages such as SQL.
• Because every operation has a precise definition, relational algebra makes queries easier to
understand and gives query optimizers a basis for rewriting them into equivalent, cheaper forms.
1. Selection(σ)
The Selection operation filters rows from a given table based on a given condition. It returns only
those rows (tuples) that satisfy the condition; all columns are kept.
Example: If we have a relation R with attributes A, B, and C, and we want to select tuples where C >
3, we write:
σ(C > 3)(R)
A B C
1 2 4
2 2 3
3 2 3
4 3 4
Output:
A B C
1 2 4
4 3 4
Explanation: The selection operation only filters rows; it does not remove columns or change the
order of the rows. To keep only specific columns, use the projection operator.
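As a rough illustration, selection can be sketched in Python by modelling the relation R as a list of dictionaries. The `select` helper is a hypothetical name introduced here for illustration; it is not part of any library.

```python
# Relation R from the example; each row is a dict keyed by attribute name.
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]


def select(relation, predicate):
    """Selection: keep only the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# Equivalent of selecting the tuples of R where C > 3.
selected = select(R, lambda row: row["C"] > 3)
for row in selected:
    print(row["A"], row["B"], row["C"])
```

Every surviving row keeps all three attributes, which mirrors the point that selection never drops columns.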
2. Projection(π)
While the Selection operation works on rows, the Projection operation works on columns. It picks
specific columns from a given relation and ignores all the remaining columns. Because a relation is
a set of tuples, duplicate rows in the result are removed.
Example: To keep only columns B and C of the relation R above, we write:
π(B, C)(R)
Output:
B C
2 4
2 3
3 4
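A minimal Python sketch of projection, assuming the relation R is stored as a list of tuples with a separate list of attribute names. The `project` helper is hypothetical; it drops duplicates while keeping the first-seen order so the result matches the table above.

```python
ATTRS = ("A", "B", "C")
R = [(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)]


def project(rows, attrs, keep):
    """Projection: keep only the named columns and drop duplicate rows."""
    positions = [attrs.index(name) for name in keep]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:  # a relation holds no duplicate tuples
            result.append(projected)
    return result


projected = project(R, ATTRS, ("B", "C"))
for row in projected:
    print(*row)
```

Rows (2, 2, 3) and (3, 2, 3) both project to (2, 3), so the output has three rows instead of four.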
3. Union(U)
The Union operator combines the tuples of two relations into a single result and removes duplicate
tuples. The only condition is that both relations must have the same number of columns with the same
data types.
Union operation in relational algebra is the same as union operation in set theory.
Example: Consider the following table of Students having different optional subjects in their course.
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
If FRENCH and GERMAN relations represent student names in two subjects, we can combine their
student names as follows:
π(Student_Name)(FRENCH) U π(Student_Name)(GERMAN)
Output:
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
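Explanation: Vivek and Geeta appear in both relations but only once in the result, because union
removes duplicates. The only constraint on the union of two relations is that both relations must
have the same set of attributes.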
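The union above can be sketched with Python sets of tuples. The helper names `project_names` and `union` are assumptions made for this illustration, and the compatibility check only compares the number of columns, not their data types.

```python
FRENCH = {("Ram", "01"), ("Mohan", "02"), ("Vivek", "13"), ("Geeta", "17")}
GERMAN = {("Vivek", "13"), ("Geeta", "17"), ("Shyam", "21"), ("Rohan", "25")}


def project_names(relation):
    """Project onto Student_Name, keeping one-column tuples."""
    return {(name,) for name, _roll in relation}


def union(r, s):
    """Union of two relations; rejects inputs with different column counts."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s


names = union(project_names(FRENCH), project_names(GERMAN))
print(sorted(name for (name,) in names))
```

Passing FRENCH (two columns) together with a one-column projection would raise the `ValueError`, which corresponds to the same-attributes constraint.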
4. Set Difference(-)
Set difference returns the rows that are present in one table but not in another. Set
Difference in relational algebra is the same set difference operation as in set theory.
Example: To find students enrolled in FRENCH but not in GERMAN, we write:
π(Student_Name)(FRENCH) - π(Student_Name)(GERMAN)
Output:
Student_Name
Ram
Mohan
Explanation: The only constraint in the Set Difference between two relations is that both relations
must have the same set of Attributes.
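A short Python sketch of the same difference, using plain sets of student names. It also shows that the order of the operands matters, which the text does not state explicitly.

```python
french_names = {"Ram", "Mohan", "Vivek", "Geeta"}
german_names = {"Vivek", "Geeta", "Shyam", "Rohan"}

# Students in FRENCH but not in GERMAN.
only_french = french_names - german_names

# Swapping the operands gives a different result: difference is not commutative.
only_german = german_names - french_names

print(sorted(only_french))
print(sorted(only_german))
```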
5. Rename(ρ)
The Rename operator gives a temporary name to a relation or to its columns. It is very useful for
avoiding ambiguity in complex queries, for example when a relation is joined with itself. Rename is
a unary operation.
Example: Renaming attribute B of the relation R below to D (written ρ(D/B)(R) in one common notation):
A B C
1 2 4
2 2 3
3 2 3
4 3 4
Output Table:
A D C
1 2 4
2 2 3
3 2 3
4 3 4
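Renaming can be sketched in Python as rebuilding each row with new keys. The `rename` helper and its `mapping` parameter are hypothetical names used only for this example.

```python
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]


def rename(relation, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in relation
    ]


renamed = rename(R, {"B": "D"})
print(list(renamed[0].keys()))
```

Only the attribute name changes; the values in every row stay the same.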
6. Cartesian Product(X)
The Cartesian product combines every row of one table with every row of another table, producing
all possible combinations. It is mostly used as a building block for more complex operations like
joins. For two relations A and B, the cross product A X B contains all the attributes of A followed
by all the attributes of B, and each record of A is paired with every record of B.
Relation A:
Ram 14 M
Sona 15 F
Kim 20 M
Relation B:
ID Course
1 DS
2 DBMS
Output: If relation A has 3 rows and relation B has 2 rows, the Cartesian product A × B will result in 6
rows.
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Explanation: If A has 'n' tuples and B has 'm' tuples then A X B will have 'n*m' tuples.
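The product above can be reproduced with `itertools.product` from the Python standard library. This is a small sketch that stores each relation as a list of tuples.

```python
from itertools import product

A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]
B = [(1, "DS"), (2, "DBMS")]

# Pair every tuple of A with every tuple of B and concatenate the attributes.
AxB = [a + b for a, b in product(A, B)]

for row in AxB:
    print(*row)
```

The result has len(A) * len(B) rows, matching the n*m rule.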
Derived operators are built using basic operators and include operations like join, intersection, and
division.
1. Join Operators
Join operations in relational algebra combine data from two or more relations based on a related
attribute, allowing for more complex queries and data retrieval. Different types of joins include:
Inner Join
An inner join combines rows from two relations based on a matching condition and only returns rows
where there is a match in both relations. If a record in one relation doesn't have a corresponding
match in the other, it is excluded from the result. This is the most common type of join.
• Conditional Join: A conditional join (also called a theta join) is an inner join where the
matching condition can involve any comparison operator like equals (=), greater than (>), etc.
Example: Joining Employees and Departments on DepartmentID with the extra condition Salary >
50000 returns only the employee and department pairs where the salary is greater than 50,000.
• Equi Join: An equi join is a type of conditional join where the condition is specifically equality
(=) between columns from both relations. Example: Joining Customers and Orders on CustomerID
where both relations have this column, returning only matching records.
• Natural Join: A natural join automatically combines relations based on columns with the
same name and type, and keeps only one copy of each shared column in the result. It is a more
concise way of writing an equi join. Example: Joining Students and Enrollments where StudentID
is common in both, and the result contains only unique columns.
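An inner join can be viewed as a Cartesian product followed by a selection. The sketch below follows that idea; the helper names and the sample Employees and Departments rows are made up for illustration.

```python
from itertools import product

employees = [
    {"Name": "Asha", "DepartmentID": 1, "Salary": 60000},
    {"Name": "Ravi", "DepartmentID": 2, "Salary": 40000},
    {"Name": "Meera", "DepartmentID": 3, "Salary": 70000},  # no matching department
]
departments = [
    {"DepartmentID": 1, "DeptName": "Sales"},
    {"DepartmentID": 2, "DeptName": "HR"},
]


def theta_join(left, right, condition):
    """Conditional join: keep the pairs that satisfy condition(l, r)."""
    return [{**l, **r} for l, r in product(left, right) if condition(l, r)]


def natural_join(left, right):
    """Natural join: match on every column name the two relations share."""
    result = []
    for l, r in product(left, right):
        shared = l.keys() & r.keys()
        if all(l[col] == r[col] for col in shared):
            result.append({**l, **r})  # shared columns appear only once
    return result


matched = natural_join(employees, departments)
high_paid = theta_join(
    employees,
    departments,
    lambda e, d: e["DepartmentID"] == d["DepartmentID"] and e["Salary"] > 50000,
)
print([row["Name"] for row in matched])
print([row["Name"] for row in high_paid])
```

Meera has no matching department, so she is missing from both results, as expected for an inner join.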
Outer Join
An outer join returns the matching rows from both relations plus the unmatched rows from one or both
of them. If there is no match, the result still includes the unmatched row, with NULL values in the
columns that come from the other relation.
• Left Outer Join: A left outer join returns all rows from the left relation and the matching rows
from the right relation. If there is no match, the result will include NULL values for the right
relation’s attributes. Example: Joining Employees with Departments using a left outer join
ensures all employees are listed, even those who aren't assigned to any department,
with NULL values for the department columns.
• Right Outer Join: A right outer join returns all rows from the right relation and the matching
rows from the left relation. If no match exists, the left relation's columns will
contain NULL values. Example: Joining Departments with Employees using a right outer join
includes all departments, even those with no employees assigned, filling unmatched
employee columns with NULL.
• Full Outer Join: A full outer join returns all rows from both the left and the right relation.
If a row from one relation does not have a match in the other, NULL values are
included for the missing side. Example: Joining Customers and Orders using a full outer join
will return all customers and orders, even if there’s no corresponding order for a customer or
no customer for an order.
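A minimal sketch of a left outer join in Python, with `None` standing in for NULL. The function name, the `right_columns` parameter, and the sample rows are assumptions for this example; a right outer join is obtained here by swapping the two inputs.

```python
employees = [
    {"Name": "Asha", "DepartmentID": 1},
    {"Name": "Ravi", "DepartmentID": 2},
    {"Name": "Meera", "DepartmentID": 9},  # no such department
]
departments = [
    {"DepartmentID": 1, "DeptName": "Sales"},
    {"DepartmentID": 2, "DeptName": "HR"},
    {"DepartmentID": 3, "DeptName": "Legal"},  # no employees
]


def left_outer_join(left, right, key, right_columns):
    """Keep every left row; pad with None when no right row matches."""
    result = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            padding = {col: None for col in right_columns}
            result.append({**l, **padding})
    return result


left_result = left_outer_join(employees, departments, "DepartmentID", ["DeptName"])
# Swapping the inputs keeps every department instead of every employee.
right_result = left_outer_join(departments, employees, "DepartmentID", ["Name"])

print(left_result[-1])
print(right_result[-1])
```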
2. Set Intersection(∩)
Set Intersection returns only those rows that are common to two relations. Set Intersection in
relational algebra is the same as the set intersection operation in set theory.
Example: Consider the following table of Students having different optional subjects in their course.
Relation FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
Relation GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
From the above table of FRENCH and GERMAN, the Set Intersection is used as follows:
π(Student_Name)(FRENCH) ∩ π(Student_Name)(GERMAN)
Output:
Student_Name
Vivek
Geeta
Explanation: The only constraint in the Set Intersection between two relations is that both relations
must have the same set of Attributes.
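Intersection is a derived operator because it can be written with set difference alone: A ∩ B = A - (A - B). The short Python check below illustrates that identity on the student names.

```python
french_names = {"Ram", "Mohan", "Vivek", "Geeta"}
german_names = {"Vivek", "Geeta", "Shyam", "Rohan"}

# Intersection expressed only with set difference.
via_difference = french_names - (french_names - german_names)

# Python's built-in intersection, for comparison.
built_in = french_names & german_names

print(sorted(via_difference))
```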
3. Division (÷)
The Division Operator is used to find tuples in one relation that are related to all tuples in another
relation. It’s typically used for "for all" queries.
Student_ID Course_ID
101 C1
101 C2
102 C1
103 C1
103 C2
Course_ID
C1
C2
Example: The query is to find students who are enrolled in all courses listed in the Course table
(the second table above). In this case, students must be enrolled in both C1 and C2. Student 102 is
enrolled only in C1, so it is excluded.
Output:
Student_ID
101
103
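The "for all" check behind division can be sketched in Python with `all()`. The names `enrolled`, `courses`, and `divide` are chosen here for illustration; the tables in the text are not named.

```python
enrolled = {
    (101, "C1"), (101, "C2"),
    (102, "C1"),
    (103, "C1"), (103, "C2"),
}
courses = {"C1", "C2"}


def divide(pairs, divisor):
    """Return the left values that are paired with every value in divisor."""
    candidates = {left for left, _right in pairs}
    return {
        left for left in candidates
        if all((left, right) in pairs for right in divisor)
    }


students_in_all = divide(enrolled, courses)
print(sorted(students_in_all))
```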
Domain Relational Calculus in DBMS
Domain Relational Calculus (DRC) is a formal, non-procedural query language used to retrieve
information from a relational database.
It is based on predicate logic and allows the user to specify what data they want to retrieve, but not
how to retrieve it, making it a declarative query language.
A DRC query describes its result by a set of conditions or formulas that the data must satisfy. The
general form is:
{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }
where <x1, x2, x3, ..., xn> are the domain variables that make up each result tuple, and
P (x1, x2, x3, ..., xn) is a predicate calculus formula that those values must satisfy.
• Non-procedural: Specifies what data to retrieve without describing the steps for retrieval.
• Based on Predicate Calculus: Utilizes logical expressions (predicates) to describe the query.
• Existential quantifier (∃): Denotes that there exists at least one instance that satisfies a
condition.
• Universal quantifier (∀): Denotes that all instances in the domain satisfy a condition.
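Python set comprehensions read much like DRC expressions, so they give a rough feel for the notation. The relation R and both queries below are illustrative assumptions, with `all()` playing the role of the universal quantifier.

```python
# Relation R(A, B, C) as a set of tuples.
R = {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)}

# { <a, b> | there exists c such that <a, b, c> is in R and c > 3 }
exists_query = {(a, b) for (a, b, c) in R if c > 3}

# { <b> | b occurs in R and, for all tuples <a, b, c> in R, c > 3 }
b_values = {b for (_a, b, _c) in R}
forall_query = {
    b for b in b_values
    if all(c > 3 for (_a, b2, c) in R if b2 == b)
}

print(sorted(exists_query))
print(sorted(forall_query))
```

The comprehension states which tuples qualify, not how to scan the data, which is the declarative flavour described above.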
• SQL Aggregate Functions are used to perform calculations on a set of rows and return a
single value.
• These functions are particularly useful when we need to summarize, analyze, or group large
datasets in SQL databases.
• They are often used with the GROUP BY clause in SQL to summarize data for each group.
• Commonly used aggregate functions include COUNT(), SUM(), AVG(), MIN() and MAX().
• Operate on groups of rows: They work on a set of rows and return a single value.
• Ignore NULLs: Most aggregate functions ignore NULL values, except for COUNT(*).
• Used with GROUP BY: To perform calculations on grouped data, you often use aggregate
functions with GROUP BY.
• Can be combined with other SQL clauses: Aggregate functions can be used
alongside HAVING, ORDER BY, and other SQL clauses to filter or sort results.
1. COUNT()
The COUNT() function returns the number of rows that match a given condition. COUNT(*) counts all
rows, while COUNT(column) counts only the rows where that column is not NULL.
2. SUM()
The SUM() function calculates the total sum of a numeric column.
3. AVG()
The AVG() function calculates the average of a numeric column. It divides the sum of the column by
the number of non-NULL rows.
4. MIN() and MAX()
The MIN() and MAX() functions return the smallest and largest values, respectively, from a column.
Let's consider a demo Employee table to demonstrate SQL aggregate functions. This table contains
employee details such as their ID, Name, and Salary.
Id Name Salary
1 A 802
2 B 403
3 C 604
4 D 705
5 E 606
6 F NULL
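Query:
SELECT COUNT(*) AS TotalEmployees FROM Employee;
Output:
TotalEmployees
6
Query:
SELECT SUM(Salary) AS TotalSalary FROM Employee;
Output:
TotalSalary
3120
Query:
SELECT AVG(Salary) AS AverageSalary FROM Employee;
Output:
AverageSalary
624
Query:
SELECT MAX(Salary) AS HighestSalary, MIN(Salary) AS LowestSalary FROM Employee;
Output:
HighestSalary LowestSalary
802 403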
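The figures above can be checked with Python's built-in `sqlite3` module. This is a sketch against an in-memory SQLite database; the column types are an assumption, and other database systems may format the average differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "A", 802), (2, "B", 403), (3, "C", 604),
     (4, "D", 705), (5, "E", 606), (6, "F", None)],  # None is stored as NULL
)

summary = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary), "
    "MAX(Salary), MIN(Salary) FROM Employee"
).fetchone()
conn.close()

# COUNT(*) includes the NULL-salary row; every other aggregate skips it.
print(summary)
```

COUNT(*) and COUNT(Salary) differ by one because employee F has a NULL salary, and the average is computed over the five non-NULL salaries.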
SQL GROUP BY allows us to group rows that have the same values in specific columns. We can then
apply aggregate functions to these groups, which helps us summarize data for each group. This is
commonly used with the COUNT(), SUM(), AVG(), MIN(), and MAX() functions.
SELECT Name, SUM(Salary) AS TotalSalary
FROM Employee
GROUP BY Name;
Output:
Name TotalSalary
A 802
B 403
C 604
D 705
E 606
F NULL
The HAVING clause is used to filter results after applying aggregate functions, unlike WHERE, which
filters rows before aggregation. HAVING is essential when we want to filter based on the result of an
aggregate function.
FROM Employee
GROUP BY Name
Output:
Name TotalSalary
A 802
C 604
D 705
E 606
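The grouping and filtering steps can be tried out with `sqlite3`. The threshold of 500 in the HAVING clause is an assumption picked so that the result matches the output above; the threshold used in the original query is not shown. ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "A", 802), (2, "B", 403), (3, "C", 604),
     (4, "D", 705), (5, "E", 606), (6, "F", None)],
)

grouped = conn.execute(
    "SELECT Name, SUM(Salary) AS TotalSalary FROM Employee "
    "GROUP BY Name ORDER BY Name"
).fetchall()

# HAVING filters whole groups after aggregation (threshold 500 is assumed).
filtered = conn.execute(
    "SELECT Name, SUM(Salary) AS TotalSalary FROM Employee "
    "GROUP BY Name HAVING SUM(Salary) > 500 ORDER BY Name"
).fetchall()
conn.close()

print(grouped)
print(filtered)
```

Group F is dropped by HAVING because its total is NULL, and a comparison with NULL is never true.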
• Aggregate functions in SQL operate on a group of values and return a single result.
• They are often used with the GROUP BY clause to summarize the grouped data.
• Aggregate functions operate on non-NULL values only (except COUNT(*), which counts every row).
• Commonly used aggregate functions are - MIN(), MAX(), COUNT(), AVG(), and SUM().
AND Operator
The AND operator is used to combine two or more conditions in an SQL query. It returns
records only when all conditions specified in the query are true. This operator is
commonly used when filtering data that must satisfy multiple criteria simultaneously.
Example
Retrieve the records of employees from the employee table who are located
in 'Allahabad' and belong to 'India', ensuring that both conditions are met.
Query:
SELECT * FROM employee WHERE emp_city = 'Allahabad' AND emp_country = 'India';
Explanation:
The query returns only the employees for whom both conditions (emp_city = 'Allahabad' and
emp_country = 'India') are satisfied.
NOT Operator
The NOT operator is used to reverse the result of a condition, returning TRUE when the
condition is FALSE. It is typically used to exclude records that match a specific condition,
making it useful for filtering out unwanted data.
Example
Retrieve the records of employees from the employee table whose city names do not start
with the letter 'A'.
Query:
SELECT * FROM employee WHERE emp_city NOT LIKE 'A%';
Explanation:
In this query, the NOT operator negates the LIKE condition. The LIKE operator is used
to match patterns in string data, and the 'A%' pattern matches any city name that starts with
the letter 'A'. By using the NOT operator, we exclude cities starting with 'A' from the result
set.
OR Operator
The OR operator combines multiple conditions in a SQL query and returns TRUE if at
least one of the conditions is satisfied. It is ideal for situations where you want to retrieve
records that meet any of several possible conditions.
Example
Retrieve the records of employees from the employee table who are either
from 'Varanasi' or have 'India' as their country.
Query:
SELECT * FROM employee WHERE emp_city = 'Varanasi' OR emp_country = 'India';
Explanation:
In this case, the output includes employees from 'Varanasi' as well as those who
have 'India' as their country, even if they are from different cities. The query returns all
records where at least one of the conditions is true.
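The three queries can be run against a small in-memory table with `sqlite3`. The `emp_id` and `emp_name` columns and all four rows are invented for this sketch; only `emp_city` and `emp_country` come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, "
    "emp_city TEXT, emp_country TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Allahabad", "India"),
     (2, "Ravi", "Varanasi", "India"),
     (3, "John", "Austin", "USA"),
     (4, "Liu", "Beijing", "China")],
)


def names(where_clause):
    """Return employee names matching the given WHERE clause, ordered by id."""
    sql = f"SELECT emp_name FROM employee WHERE {where_clause} ORDER BY emp_id"
    return [name for (name,) in conn.execute(sql)]


and_result = names("emp_city = 'Allahabad' AND emp_country = 'India'")
not_result = names("emp_city NOT LIKE 'A%'")
or_result = names("emp_city = 'Varanasi' OR emp_country = 'India'")
conn.close()

print(and_result, not_result, or_result)
```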
In SQL, predicates are conditions used in WHERE or HAVING clauses to filter data. LIKE, BETWEEN, ALIAS,
and DISTINCT are related concepts, though ALIAS and DISTINCT are not strictly predicates in the same
way LIKE and BETWEEN are.
LIKE Predicate.
The LIKE predicate is used for pattern matching in string comparisons. It allows searching for values
that match a specified pattern using wildcard characters:
%: Represents zero or more characters.
_: Represents a single character.
BETWEEN Predicate.
The BETWEEN predicate checks if a value falls within a specified range (inclusive of both the lower and
upper bounds). It can be used with numeric, text, or date data types.
Code
SELECT product_name FROM products WHERE price BETWEEN 10.00 AND 50.00;
ALIAS.
An ALIAS is a temporary name given to a table or a column in a SQL query. Aliases make queries more
readable and can simplify complex queries, especially when dealing with joins or long table/column
names. ALIAS is not a predicate itself but a naming convention.
DISTINCT.
The DISTINCT keyword is used with the SELECT statement to eliminate duplicate rows from the result
set, returning only unique values. Like ALIAS, DISTINCT is not a predicate but a modifier that
affects the output of the query.
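LIKE, BETWEEN, aliases, and DISTINCT can be tried together on a small `products` table with `sqlite3`. The `category` column and all sample rows are assumptions for this sketch; `product_name` and `price` follow the BETWEEN query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, price REAL, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Pen", 5.00, "Stationery"), ("Pencil", 10.00, "Stationery"),
     ("Notebook", 12.50, "Stationery"), ("Backpack", 45.00, "Bags"),
     ("Suitcase", 120.00, "Bags")],
)


def column(sql):
    """Run a single-column query and return its values as a list."""
    return [value for (value,) in conn.execute(sql)]


# LIKE: % matches any run of characters, _ matches exactly one character.
starts_with_pe = column(
    "SELECT product_name FROM products WHERE product_name LIKE 'Pe%' ORDER BY product_name")
three_letters = column("SELECT product_name FROM products WHERE product_name LIKE '_en'")

# BETWEEN includes both bounds, so the 10.00 item qualifies.
mid_priced = column(
    "SELECT product_name FROM products WHERE price BETWEEN 10.00 AND 50.00 "
    "ORDER BY product_name")

# Aliases: p names the table, name names the output column.
cursor = conn.execute("SELECT p.product_name AS name FROM products AS p WHERE p.price > 100")
alias_column = cursor.description[0][0]
expensive = cursor.fetchall()

# DISTINCT removes duplicate category values.
categories = column("SELECT DISTINCT category FROM products ORDER BY category")
conn.close()

print(starts_with_pe, three_letters, mid_priced, alias_column, categories)
```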
Armstrong's Axioms refer to a set of inference rules, introduced by William W. Armstrong, that are
used to test the logical implication of functional dependencies. Given a set of functional dependencies
F, the closure of F (denoted as F+) is the set of all functional dependencies logically implied by F.
Armstrong's Axioms, when applied repeatedly, help generate the closure of functional dependencies.
These axioms are fundamental in determining functional dependencies in databases and are used to
derive conclusions about the relationships between attributes.
• Axiom of Reflexivity: If B is a subset of A, then A→B holds. Any set of attributes determines
each of its own subsets.
• Axiom of Augmentation: If A→B holds and C is any set of attributes, then AC→BC also holds. That
is, adding the same attributes to both sides of a dependency does not change the basic dependency.
• Axiom of Transitivity: Same as the transitive rule in algebra, if A→B holds and B→C holds,
then A→C also holds. (A→B is read as "A functionally determines B".) If X→Y and Y→Z,
then X→Z.
Example:
{A} → {B}
{B} → {C}
{A, C} → {D}
1. Reflexivity: Since any set of attributes determines its subset, we can immediately infer the following:
• {B} → {B}.
• {A, C} → {A}.
2. Augmentation: If we know that {A} → {B}, we can add the same attribute (or set of attributes) to
both sides:
• From {A} → {B}, we can augment both sides with {C}: {A, C} → {B, C}.
• From {B} → {C}, we can augment both sides with {A}: {A, B} → {C, B}.
3. Transitivity: If we know {A} → {B} and {B} → {C}, we can infer that:
• {A} → {C}.
Although Armstrong's axioms are sound and complete, there are additional rules for functional
dependencies that are derived from them. These rules are introduced to simplify operations and make
the process easier.
A functional dependency occurs when one attribute uniquely determines another attribute within a
relation. It is a constraint that describes how attributes in a table relate to each other. If attribute A
functionally determines attribute B, we write this as A→B.
Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building }: Here, roll_no can determine the values of the fields
name, dept_name and dept_building, hence it is a valid functional dependency.
• roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name,
dept_building}, it can also determine its subset dept_name.
• dept_name → dept_building: dept_name identifies the dept_building, since every row with the
same dept_name also has the same dept_building (CO is always in A4 and IT is always in A3).
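Whether a dependency holds in a given table can be checked mechanically: no two rows may agree on the left-hand side and differ on the right-hand side. The `fd_holds` helper below is a hypothetical sketch of that check.

```python
COLUMNS = ("roll_no", "name", "dept_name", "dept_building")
ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        (42, "abc", "CO", "A4"),
        (43, "pqr", "IT", "A3"),
        (44, "xyz", "CO", "A4"),
        (45, "xyz", "IT", "A3"),
        (46, "mno", "EC", "B2"),
        (47, "jkl", "ME", "B2"),
    ]
]


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by any pair of rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key, then compares.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(ROWS, ["dept_name"], ["dept_building"]))
print(fd_holds(ROWS, ["name"], ["dept_name"]))
```

A table instance can only disprove a dependency. Passing the check means the current rows are consistent with it, not that it must hold for all future data.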
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X → Y and
Y is the subset of X, then it is called trivial functional dependency.
Example 1 :
• ABC -> AB
• ABC -> A
Example 2:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset
of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional
dependency.
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e.
If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example 1 :
• Id -> Name
Example 2:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}
A functional dependency arises when, within one relation, the value of one attribute (or set of
attributes) identifies the value of another attribute. If A determines B and no proper subset of A
determines B, then B is said to be fully functionally dependent on A. In other words, you cannot
derive B from a smaller part of A.
• Data Integrity: Each value of the determinant maps to one and only one value of the dependent
attribute, which helps preserve data integrity.
• Normalization: Functional dependencies guide the decomposition of large tables into smaller,
structured tables, which reduces redundancy and makes the data easier to manage.
• Update Anomalies: Designing tables around functional dependencies reduces anomalies when data
is inserted, updated, or deleted.
• If “F” is a functional dependency, then the closure of the functional dependency is denoted
“{F}+”.
Step-1 : Add the attributes which are present on the Left Hand Side of the original functional dependency.
Step-2 : Now, add the attributes present on the Right Hand Side of that functional dependency.
Step-3 : Using the attributes collected so far, check which other attributes can be derived from
the other given functional dependencies and add them. Repeat this process until no more attributes
can be added to the closure.
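The steps above can be sketched as a small fixed-point loop in Python. The example reuses the dependencies {A} → {B}, {B} → {C}, and {A, C} → {D} from the Armstrong's Axioms example; the function names `closure` and `implies` are assumptions for this sketch.

```python
def closure(attributes, fds):
    """Return the set of attributes determined by `attributes` under `fds`."""
    result = set(attributes)  # Step 1: start from the left-hand side
    changed = True
    while changed:  # Step 3: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # Step 2: if the whole left side is known, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds exactly when rhs lies in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


FDS = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "C"}, {"D"})]

print(sorted(closure({"A"}, FDS)))
print(sorted(closure({"B"}, FDS)))
print(implies(FDS, {"A"}, {"D"}))
```

Starting from A, the loop adds B, then C, and only then D, because {A, C} → {D} needs both A and C to be known first.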
Normal Forms in DBMS
Normal Forms are important for ensuring that data is structured logically, reducing redundancy, and
maintaining data integrity.
What is Normalization in DBMS?
Normalization is a systematic approach to organize data within a database to reduce redundancy and
eliminate undesirable characteristics such as insertion, update, and deletion anomalies.
The process involves breaking down large tables into smaller, well-structured ones and defining
relationships between them. This not only reduces the chances of storing duplicate data but also
improves the overall efficiency of the database.
Why is Normalization Important?
• Reduces Data Redundancy: Each fact is stored only once instead of being duplicated, saving
disk space and reducing inconsistency.
• Improves Data Integrity: Ensures the accuracy and consistency of data by organizing it in a
structured manner.
• Simplifies Database Design: By following a clear structure, database designs become easier
to maintain and update.
• Optimizes Performance: Reduces the chance of anomalies and increases the efficiency of
database operations.