DATABASE MANAGEMENT SYSTEM
BCS-501
UNIT 2
Prepared by –
ABHISHEK CHAUDHARY
ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER SCIENCE
Relational Data Model Overview
• Developed by E.F. Codd in 1970, the relational model organizes data into tables
(relations) with rows and columns.
• Each table represents real-world entities and relationships, making it easier to
manage and query than hierarchical or network databases.
• Examples of relational databases include MySQL, Oracle, DB2, and SQL Server.
Key Terminologies
• Relation (Table): A table with rows and columns.
Example: A Student table with columns like Stu_No, S_Name, and Gender.
• Tuple (Row): A single record in a table.
Example: A row in the Student table, like (1012, ‘Rama’, ‘F’)
• Attribute (Column): A single property of a relation.
Example: Stu_No, S_Name, Gender.
• Domain: The set of valid values for an attribute.
Example: The domain for Gender is {M, F}.
• Cardinality: The number of rows in a table.
Example: If Student has 5 rows, its cardinality is 5.
• Degree: The number of columns in a table.
Example: If Student has 5 attributes, its degree is 5.
• Relational Schema: Defines the table’s structure, including names and data
types for each column.
Example: The Student schema could be Student(Stu_No: INT,
S_Name: VARCHAR, Gender: CHAR).
• Relational Key: An attribute, or set of attributes, that uniquely identifies each
row (e.g., a primary key).
Example: Stu_No uniquely identifies each student.
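The terminology above can be checked on a small in-memory relation. This is a minimal sketch: the sample rows (other than the Rama tuple from the example) are made up for illustration, and a relation is modelled simply as a schema plus a set of tuples.

```python
# Hypothetical Student relation: a schema plus a set of tuples (sample rows are made up).
schema = ("Stu_No", "S_Name", "Gender")
student = {
    (1012, "Rama", "F"),
    (1013, "Amit", "M"),
    (1014, "Sita", "F"),
}

degree = len(schema)        # number of attributes (columns)
cardinality = len(student)  # number of tuples (rows)

# Domain check: every Gender value must belong to the domain {M, F}.
gender_domain = {"M", "F"}
gender_index = schema.index("Gender")
domain_ok = all(row[gender_index] in gender_domain for row in student)

# Key check: Stu_No is a key if no two tuples share a Stu_No value.
stu_nos = [row[schema.index("Stu_No")] for row in student]
stu_no_is_key = len(stu_nos) == len(set(stu_nos))

print(degree, cardinality, domain_ok, stu_no_is_key)  # 3 3 True True
```

Using a Python set for the rows mirrors the property that a relation has no duplicate tuples and no row order.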
Properties of Relations
• Each table has a unique name, and each attribute also has a unique name.
• Tables do not have duplicate rows; each row is distinct.
• Each cell contains a single (atomic) value.
• The order of rows and columns is not significant: reordering them does not
change the meaning of the data.
Basic Operations
• Insert: Adds new rows to a table.
• Delete: Removes rows from a table.
• Update: Changes values in rows.
• Retrieve: Fetches data based on queries.
Integrity Constraints
• Integrity constraints are rules that ensure data remains accurate, consistent, and
reliable in a database.
• They prevent errors and keep the data meaningful.
1. Entity Integrity
Entity integrity requires that every table have a primary key that uniquely
identifies each row, and that no primary key value be NULL.
• Entity Integrity Constraint: Product_ID (primary key) must not be NULL.
• Explanation: Every product must have a unique, non-NULL Product_ID, because
the primary key is what identifies each row.
• Violation Example: A row with NULL in Product_ID violates the entity integrity
constraint.
2. Referential Integrity
Referential integrity ensures that each foreign key value in one table refers to an
existing primary key value in the related table.
• Referential Integrity Rule: Each StudentID in Enrollment must exist in the
Student table.
• Explanation: In the Enrollment table, StudentID acts as a foreign key
referencing the primary key StudentID in the Student table.
• Violation Example: An Enrollment row (say, EnrollmentID 3) with StudentID 104
violates referential integrity if no student with StudentID 104 exists in the
Student table, because it references a non-existent student.
3. Key Constraints
Key constraints ensure that the value in a column (or set of columns) uniquely
identifies each row. A table can have multiple keys, but one is designated as the
primary key.
• Key Constraint: Customer_ID must be unique.
• Explanation: The database must not allow two customers with the same
Customer_ID.
• Violation Example: If Customer_ID C001 appears in more than one row, the key
constraint is violated.
4. Domain Constraints
Domain constraints define the set of valid values for a given column. They ensure
that only permissible data types and values are allowed in the column.
• Domain Constraint: Salary must be a positive integer.
• Explanation: The Salary column can only contain numerical values that are
positive (no negative values or invalid data types).
• Violation Example: If the salary for an employee is entered as -5000, it would
violate the domain constraint.
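The four kinds of constraints can be tried out with Python's built-in sqlite3 module. The sketch below is only illustrative: the table layouts and the helper `is_rejected` are assumptions made for this example, and SQLite is used as a stand-in for any relational DBMS (note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Product  (Product_ID  TEXT NOT NULL PRIMARY KEY);  -- entity integrity
    CREATE TABLE Customer (Customer_ID TEXT NOT NULL PRIMARY KEY);  -- key constraint
    CREATE TABLE Employee (Emp_ID INTEGER PRIMARY KEY,
                           Salary INTEGER CHECK (Salary > 0));      -- domain constraint
    CREATE TABLE Student  (StudentID INTEGER PRIMARY KEY);
    CREATE TABLE Enrollment (
        EnrollmentID INTEGER PRIMARY KEY,
        StudentID    INTEGER NOT NULL
                     REFERENCES Student(StudentID));                -- referential integrity
    INSERT INTO Customer VALUES ('C001');
    INSERT INTO Student  VALUES (101);
""")

def is_rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

entity = is_rejected("INSERT INTO Product VALUES (NULL)")            # NULL primary key
key = is_rejected("INSERT INTO Customer VALUES ('C001')")            # duplicate key
domain = is_rejected("INSERT INTO Employee VALUES (1, -5000)")       # negative salary
referential = is_rejected("INSERT INTO Enrollment VALUES (3, 104)")  # no student 104
print(entity, key, domain, referential)  # True True True True
```

A valid row, such as an Enrollment for the existing StudentID 101, is accepted by the same schema.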
Relational Algebra
Relational Algebra is a procedural query language used to
work with data in relational databases. It uses a set of
operators to retrieve, filter, and combine data, and it
provides a foundation for SQL and other database query
languages.
Each operator takes relations (tables) as input and outputs
a new relation, making it easy to build complex queries by
combining operations.
Basic Operators in Relational Algebra
1. Selection (σ)
• The selection operator (σ) is a unary operator in relational algebra.
• It selects the tuples (rows) of a relation that satisfy a given condition (predicate).
• It is denoted by sigma (σ).
Notation – $\sigma_{p}(r)$, where r is a relation and p is the selection predicate.
• p is a propositional logic formula that may use the logical connectives ∧ (AND), ∨
(OR), ¬ (NOT) and relational operators like =, ≠, <, >, ≤, ≥ to form the condition.
• The WHERE clause of a SQL command corresponds to the relational select σ.
SQL: SELECT * FROM R WHERE C;
Example: Select tuples from the Student table whose age is greater than 17:
$\sigma_{age > 17}(Student)$
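As a rough illustration, selection can be sketched in Python as a filter over rows. The function name `select`, the sample rows, and the attribute names are assumptions for this example; a relation is modelled as a list of dictionaries.

```python
def select(relation, predicate):
    """Selection: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]

# Hypothetical Student relation.
student = [
    {"Stu_No": 1, "S_Name": "Rama", "Age": 18},
    {"Stu_No": 2, "S_Name": "Amit", "Age": 17},
    {"Stu_No": 3, "S_Name": "Sita", "Age": 19},
]

# Students whose age is greater than 17 (SQL: SELECT * FROM Student WHERE Age > 17).
older = select(student, lambda t: t["Age"] > 17)
print([t["S_Name"] for t in older])  # ['Rama', 'Sita']
```

Selection never changes the schema: every surviving row keeps all of its columns.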
2. Projection (π)
• The projection operator (π) is a unary operator in relational algebra.
• It keeps (projects) the listed columns (attributes) of a relation and discards the
columns that are not in the projection list.
• It is denoted by π.
Notation – $\pi_{A_1, A_2, \ldots, A_n}(r)$
• Where A1, A2, …, An are attribute names of relation r.
• Duplicate rows are automatically eliminated from the result.
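A minimal Python sketch of projection, including the duplicate elimination mentioned above. The function name `project` and the sample data are assumptions for illustration.

```python
def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # duplicate elimination
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Hypothetical Student relation.
student = [
    {"Stu_No": 1, "S_Name": "Rama", "Gender": "F"},
    {"Stu_No": 2, "S_Name": "Amit", "Gender": "M"},
    {"Stu_No": 3, "S_Name": "Sita", "Gender": "F"},
]

genders = project(student, ["Gender"])
print(genders)  # [{'Gender': 'F'}, {'Gender': 'M'}]
```

Three input rows produce only two output rows because the two 'F' values collapse into one. In SQL the same effect needs SELECT DISTINCT, since a plain SELECT keeps duplicates.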
3. Union (∪)
• Suppose R and S are two relations. The Union operation selects all the tuples that
are in R, in S, or in both.
• It eliminates duplicate tuples.
For a union operation to be valid, the following conditions must hold –
1. R and S have the same number of attributes.
2. Corresponding attributes (columns) have the same domain (type).
- The attributes of R and S must occur in the same order.
Symbol: ∪
Notation: R ∪ S
RA: R ∪ S
SQL: (SELECT * FROM R) UNION (SELECT * FROM S);
4. Set Difference (–)
Suppose R and S are two relations. The Set Difference operation selects all the
tuples that are present in the first relation R but not in the second relation S.
For a Set Difference to be valid, the following conditions must hold –
1. R and S have the same number of attributes.
2. Corresponding attributes (columns) have the same domain (type).
- The attributes of R and S must occur in the same order.
Symbol: –
Syntax: R – S
RA: R – S
SQL: (SELECT * FROM R) EXCEPT (SELECT * FROM S);
5. Set Intersection (∩)
Suppose R and S are two relations. The Set Intersection operation selects all
the tuples that are in both relations R and S.
For a Set Intersection to be valid, the following conditions must hold –
1. R and S have the same number of attributes.
2. Corresponding attributes (columns) have the same domain (type).
- The attributes of R and S must occur in the same order.
Symbol: ∩
Syntax: R ∩ S
RA: R ∩ S
SQL: (SELECT * FROM R) INTERSECT (SELECT * FROM S);
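Union, set difference, and set intersection map directly onto Python's built-in set operators when each relation is a set of tuples with the same attribute order. The relations `r` and `s` below are made-up examples.

```python
# Two union-compatible relations: same number of attributes, same types, same order.
r = {(1, "Rama"), (2, "Amit"), (3, "Sita")}
s = {(2, "Amit"), (4, "John")}

union = r | s          # tuples in R, in S, or in both (duplicates removed)
difference = r - s     # tuples in R but not in S
intersection = r & s   # tuples in both R and S

print(sorted(union))         # [(1, 'Rama'), (2, 'Amit'), (3, 'Sita'), (4, 'John')]
print(sorted(difference))    # [(1, 'Rama'), (3, 'Sita')]
print(sorted(intersection))  # [(2, 'Amit')]

# Intersection can be derived from set difference alone: R ∩ S = R – (R – S).
derived = r - (r - s)
```

The last line shows why intersection is not a primitive operator: it can always be rewritten with two set differences.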
6. Rename (ρ)
• The result of a relational algebra expression is itself a relation, but it has no name.
• The RENAME operator gives a name to such a result and can also rename its attributes.
• It is often convenient to break a complicated sequence of operations into steps
and give each intermediate result its own name.
Reasons to rename a relation can be many, like:
- We may want to save the result of a relational algebra expression as a relation so
that we can use it later.
- We may want to join a relation with itself (or take its Cartesian product with
itself). Both copies would then carry the same name, so we rename one of them
before performing the join.
Symbol: rho ρ
Notation 1: $\rho_{X}(E)$
Where the symbol ‘ρ’ denotes the RENAME operator and E is an expression (or
sequence of operations) whose result is saved under the name X.
Notation 2: $\rho_{X(A_1, A_2, \ldots, A_n)}(E)$
It returns the result of expression E under the name X, and with the attributes
renamed to A1, A2, …, An.
Notation 3: $\rho_{(A_1, A_2, \ldots, A_n)}(E)$
It returns the result of expression E with the attributes renamed to A1, A2, …, An.
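A small Python sketch of attribute renaming, as used before joining a relation with itself. The function name `rename`, the Employee relation, and its attribute names are assumptions for illustration.

```python
def rename(relation, attribute_map):
    """Return a copy of the relation with attributes renamed (old name -> new name)."""
    return [{attribute_map.get(a, a): v for a, v in row.items()} for row in relation]

# Hypothetical Employee relation in which each employee refers to a manager.
employee = [
    {"emp_id": 1, "name": "Asha", "manager_id": 2},
    {"emp_id": 2, "name": "Ravi", "manager_id": 2},
]

# A second copy with distinct attribute names, ready to be combined with the first.
manager = rename(employee, {"emp_id": "mgr_id", "name": "mgr_name",
                            "manager_id": "mgr_manager_id"})

# Self-join: pair each employee with the row describing their manager.
pairs = [(e["name"], m["mgr_name"])
         for e in employee for m in manager
         if e["manager_id"] == m["mgr_id"]]
print(pairs)  # [('Asha', 'Ravi'), ('Ravi', 'Ravi')]
```

Without the rename, both copies would expose the same attribute names and the join condition could not say which copy it means.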
7. Cartesian Product (×)
• Cartesian Product is a fundamental operator in relational algebra.
• It combines the information of two different relations into one.
• It is also called Cross Product.
• A Cartesian Product is rarely meaningful when performed alone. It usually
becomes meaningful when it is followed by other operations, most often a
selection.
Symbol: ×
Notation: R1 × R2
1. If relations R1 and R2 have a and b attributes respectively, then the resulting
relation has a + b attributes.
2. If relations R1 and R2 have n1 and n2 tuples respectively, then the resulting
relation has n1 × n2 tuples, one for each possible pair of tuples from the two
relations.
If both input relations have an attribute with the same name, qualify that attribute
with the name of its relation: “relation_name.attribute_name”.
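The attribute and tuple counts can be verified with a short Python sketch. The function name `cartesian_product` and the sample relations are assumptions; the sketch expects non-empty relations stored as lists of dictionaries and qualifies only the clashing attribute names.

```python
from itertools import product

def cartesian_product(r1, r1_name, r2, r2_name):
    """R1 × R2 for non-empty lists of dicts; clashing attribute names are qualified."""
    clashing = set(r1[0]) & set(r2[0])

    def qualify(row, relation_name):
        return {(f"{relation_name}.{a}" if a in clashing else a): v
                for a, v in row.items()}

    return [{**qualify(t1, r1_name), **qualify(t2, r2_name)}
            for t1, t2 in product(r1, r2)]

# Hypothetical relations: 2 tuples x 2 attributes and 3 tuples x 2 attributes.
student = [{"Stu_No": 1, "Name": "Rama"}, {"Stu_No": 2, "Name": "Amit"}]
course = [{"C_No": 10, "Name": "DBMS"}, {"C_No": 20, "Name": "OS"},
          {"C_No": 30, "Name": "CN"}]

pairs = cartesian_product(student, "Student", course, "Course")
print(len(pairs))        # 6  (n1 × n2 = 2 × 3)
print(sorted(pairs[0]))  # ['C_No', 'Course.Name', 'Stu_No', 'Student.Name']
```

The result has 2 + 2 = 4 attributes, and the shared name `Name` appears as `Student.Name` and `Course.Name`.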
Additional Operators in Relational Algebra
• The Cartesian product of two relations (A × B) gives all the possible pairs of
tuples.
• Taking a Cartesian product may not be feasible for huge relations with
thousands of tuples and a considerably large number of attributes.
Join Operation (⋈)
• Join is an additional (derived) operator that simplifies queries but does
not add any new power to basic relational algebra.
• Join is a combination of a Cartesian product followed by a selection process.
Join = Cartesian Product + Selection
• A Join operation pairs two tuples from different relations, if and only if a given
join condition is satisfied.
• Symbol: ⋈
Difference Between Join and Cartesian Product
Joins (⋈):
• Combinations of tuples that satisfy the filtering/matching conditions
• Fewer tuples than the cross product, so often cheaper to compute
Cartesian Product / Cross Product / Cross Join (×):
• All possible combinations of tuples from the relations
• Huge number of tuples and costly to manage
Types of JOINS
1. Inner Join:
Contains only those tuples that satisfy the matching condition
• Theta(θ) join
• Equi join
• Natural join
2. Outer join:
• Extension of join
• Contains the tuples that satisfy the matching condition, along with some or all
tuples that do not
• Keeps all rows from one or both relations
• Left Outer Join
• Right Outer Join
• Full Outer Join
Inner Join
• An Inner join includes only those tuples that satisfy the
matching criteria, while the rest of tuples are excluded.
• Theta Join, Equi join, and Natural Join are called
inner joins.
1. Theta (θ) / Conditional Join
• It combines tuples from different relations provided they satisfy the theta (θ)
condition.
• It is the general case of join, used when we want to join two or more
relations based on some condition.
• The join condition is denoted by the symbol θ.
• Notation: $R_1 \bowtie_{\theta} R_2$
Where θ is a predicate/condition. It can use any comparison operator (<,
>, ≤, ≥, =, ≠).
2. Equi Join
• An equi join is the special case of theta (conditional) join in which the
condition uses only equality (=) comparisons.
Notation: $R_1 \bowtie_{R_1.x = R_2.y} R_2$, where x is an attribute of R1 and y is
an attribute of R2
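The idea "Join = Cartesian Product + Selection" can be sketched in a few lines of Python. The function name `theta_join` and all sample relations are assumptions for illustration; the sketch assumes the two relations have no attribute names in common.

```python
def theta_join(r1, r2, theta):
    """Theta join: form every pair (t1, t2) and keep the pairs where theta holds."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if theta(t1, t2)]

# Hypothetical relations with disjoint attribute names.
employee = [{"emp": "Asha", "salary": 50000, "dept_id": 1},
            {"emp": "Ravi", "salary": 20000, "dept_id": 2}]
grade = [{"grade": "A", "min_salary": 40000},
         {"grade": "B", "min_salary": 10000}]
department = [{"id": 1, "dept_name": "Sales"}, {"id": 3, "dept_name": "HR"}]

# Theta join with a non-equality comparison (>=).
qualifies = theta_join(employee, grade, lambda e, g: e["salary"] >= g["min_salary"])

# Equi join: the condition uses only equality.
works_in = theta_join(employee, department, lambda e, d: e["dept_id"] == d["id"])

print([(t["emp"], t["grade"]) for t in qualifies])
# [('Asha', 'A'), ('Asha', 'B'), ('Ravi', 'B')]
print([(t["emp"], t["dept_name"]) for t in works_in])  # [('Asha', 'Sales')]
```

Both joins return fewer tuples than the full Cartesian product (4 pairs in each case), which is the practical benefit of joining.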
3. Natural Join
• A natural join can only be performed if the two relations have at least one
common attribute (column), i.e., an attribute with the same name and domain in
both relations.
• A natural join does not take an explicit comparison condition.
• It is an equi join performed implicitly on all the common attributes (columns)
of both relations, with the difference that each common attribute appears only
once in the result. The resulting schema therefore changes.
• Notation: A ⋈ B
• The result of the natural join is the set of all combinations of tuples in
relations A and B that are equal on their common attribute names.
Note:
• The Natural Join of two relations can be obtained by applying a Projection
operation to the Equi join of the two relations. In terms of basic operators:
✅ Natural Join = Cartesian product + Selection + Projection
• Natural Join (⋈) is an inner join by default, because tuples that do not
satisfy the join condition do not appear in the result set.
• Natural Join is very important.
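A minimal Python sketch of natural join on two small relations. The function name `natural_join` and the sample data are assumptions; the sketch expects non-empty lists of dictionaries.

```python
def natural_join(r1, r2):
    """Natural join: match on all attributes that have the same name in both relations."""
    common = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2}  # merging the dicts keeps each common attribute only once
            for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in common)]

# Hypothetical relations sharing the attribute Stu_No.
student = [{"Stu_No": 1, "S_Name": "Rama"},
           {"Stu_No": 2, "S_Name": "Amit"},
           {"Stu_No": 3, "S_Name": "Sita"}]
enrollment = [{"Stu_No": 1, "Course": "DBMS"},
              {"Stu_No": 1, "Course": "OS"},
              {"Stu_No": 2, "Course": "DBMS"}]

result = natural_join(student, enrollment)
print(sorted(result[0]))                            # ['Course', 'S_Name', 'Stu_No']
print([(t["S_Name"], t["Course"]) for t in result])
# [('Rama', 'DBMS'), ('Rama', 'OS'), ('Amit', 'DBMS')]
```

The result has three attributes rather than four because Stu_No appears once, and Sita is missing because she has no matching enrollment, which is the inner-join behaviour.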
Outer Join
• An inner join includes only the tuples with matching attributes and discards
the rest. To keep the unmatched tuples of the participating relations in the
resulting relation, we use outer joins.
• The outer join operation is an extension of the join operation that avoids
loss of information.
• An outer join contains the tuples that satisfy the matching condition,
along with some or all of the tuples that do not.
• It contains all rows from one or both relations.
• It fills the missing attributes of unmatched tuples with NULL values.
Outer Join Types
Outer Join = Natural Join + Extra information (from the left table, the right
table, or both tables)
There are three kinds of outer joins:
• Left outer join
• Right outer join
• Full outer join
1. Left outer join (R1 ⟕ R2)
• When a join is applied to two relations R1 and R2, the tuples of R1 or R2 that
do not satisfy the join condition do not appear in the result set.
• In a left outer join, however, all the tuples from the left relation R1 are
included in the resulting relation. The tuples of R1 that do not satisfy the join
condition have NULL for the attributes of R2.
• In short:
• All records from the left table
• Only matching records from the right table
• Symbol: ⟕
• Notation: R1 ⟕ R2
2. Right outer join (R1 ⟖ R2)
• In a right outer join, all the tuples from the right relation R2 are included
in the resulting relation. The tuples of R2 that do not satisfy the join
condition have NULL for the attributes of R1.
• In short:
• All records from the right table
• Only matching records from the left table
• Symbol: ⟖
• Notation: R1 ⟖ R2
3. Full outer join (R1 ⟗ R2)
• In a full outer join, all the tuples from both the left relation R1 and the right
relation R2 are included in the resulting relation. For tuples of R1 or R2 that do
not satisfy the join condition, the attributes of the other relation are set to
NULL.
• In short:
• All records from both tables
• Symbol: ⟗
• Notation: R1 ⟗ R2
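The three outer joins can be sketched in Python on top of a natural-join style match, with `None` standing in for NULL. The function names, the explicit schema lists, and the sample data are assumptions for illustration.

```python
def left_outer_join(r1, r2, r2_attrs):
    """R1 ⟕ R2 on common attribute names; r2_attrs is the schema of R2."""
    result = []
    for t1 in r1:
        common = [a for a in r2_attrs if a in t1]
        matches = [t2 for t2 in r2 if all(t1[a] == t2[a] for a in common)]
        if matches:
            result.extend({**t1, **t2} for t2 in matches)
        else:
            # No partner in R2: keep t1 and pad R2's attributes with NULL (None).
            result.append({**t1, **{a: None for a in r2_attrs if a not in t1}})
    return result

def full_outer_join(r1, r1_attrs, r2, r2_attrs):
    """R1 ⟗ R2: the left outer join plus the unmatched tuples of R2."""
    left = left_outer_join(r1, r2, r2_attrs)
    right = left_outer_join(r2, r1, r1_attrs)  # a right outer join is a mirrored left one
    return left + [row for row in right if row not in left]

# Hypothetical relations; dept_id 3 has no department and department 2 has no employee.
emp_attrs, dept_attrs = ["name", "dept_id"], ["dept_id", "dept_name"]
emp = [{"name": "Asha", "dept_id": 1}, {"name": "Ravi", "dept_id": 3}]
dept = [{"dept_id": 1, "dept_name": "Sales"}, {"dept_id": 2, "dept_name": "HR"}]

left = left_outer_join(emp, dept, dept_attrs)
right = left_outer_join(dept, emp, emp_attrs)
full = full_outer_join(emp, emp_attrs, dept, dept_attrs)
print(len(left), len(right), len(full))  # 2 2 3
```

Here `left` keeps Ravi with `dept_name` set to None, `right` keeps HR with `name` set to None, and `full` contains both of those rows plus the single matching row.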
Division Operator (÷, /)
• Division is a derived operator; it is not a primitive operator of relational algebra.
• It is suited to queries that include the keyword “all” or “every”, like “at all”,
“for all”, “in all”, “at every”, “for every” or “in every”. E.g.:
• Find the persons who have an account in all the banks of a particular city.
• Find sailors who have reserved all boats.
• Find employees who work on all projects of the company.
• Find students who have registered for every course.
• In all these queries, the description after the keyword “all” or “every” defines a
set of elements, and the final result contains those records that are related to
every element of that set.
• Notation:
A ÷ B or A / B
where A and B are two relations
• The division operator can be applied if and only if the attributes of B are a
proper subset of the attributes of A.
o The relation returned by the division operator has the attributes (all
attributes of A – all attributes of B).
o The result contains those tuples of A, restricted to these attributes, that
are associated with every tuple of B.
Expressing A/B Using Basic Operators
• Division is a derived operator (or additional operator).
• Division can be expressed in terms of Cross Product, Set-Difference and
Projection.
• Idea:
For A/B, where A has attributes (x, y) and B has attribute y, compute all x values
that are not ‘disqualified’ by some y value in B.
• An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple
that is not in A.
• Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$
So,
• A/B = $\pi_x(A)$ – all disqualified tuples
• A/B = $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$
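The construction of division from projection, cross product, and set difference can be followed step by step in Python. The function name `divide` and the sailors/boats data are assumptions for illustration; A is a set of (x, y) pairs and B a set of 1-tuples (y,).

```python
def divide(a, b):
    """A / B built only from projection, Cartesian product and set difference."""
    xs = {x for (x, _) in a}                        # projection of A onto x
    all_pairs = {(x, y) for x in xs for (y,) in b}  # every x paired with every y of B
    missing = all_pairs - a                         # pairs that A does not contain
    disqualified = {x for (x, _) in missing}        # x values lacking at least one y
    return xs - disqualified

# Hypothetical data: which sailor reserved which boat, and the list of all boats.
reserves = {("s1", "b1"), ("s1", "b2"), ("s2", "b1"), ("s3", "b2")}
boats = {("b1",), ("b2",)}

print(divide(reserves, boats))  # {'s1'}
```

Only s1 has reserved every boat: s2 is disqualified by the missing pair (s2, b2) and s3 by the missing pair (s3, b1).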
Extended Relational Algebra
Extended relational algebra adds the following operations, increasing its power
over basic relational algebra:
• Generalized Projection
• Aggregate Functions
• Outer Join
1. Generalized Projection
• Normal projection only projects columns, whereas generalized projection also
allows arithmetic operations on the projected columns, i.e., arithmetic
expressions may be used in the projection list.
• Notation: $\pi_{F_1, F_2, \ldots, F_n}(E)$
• E is any relational-algebra expression.
• Each of F1, F2, …, Fn is an arithmetic expression involving constants and
attributes in the schema of E.
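A short Python sketch of generalized projection, where each output column is computed by an expression over the input tuple. The function name `generalized_project`, the Employee relation, and the column names are assumptions; for brevity the sketch does not remove duplicate rows.

```python
def generalized_project(relation, expressions):
    """Generalized projection: each output attribute is an expression over the tuple."""
    return [{name: expr(row) for name, expr in expressions.items()} for row in relation]

# Hypothetical Employee relation with monthly salaries.
employee = [{"name": "Asha", "salary": 5000, "bonus": 500},
            {"name": "Ravi", "salary": 4000, "bonus": 0}]

# Project the name together with two arithmetic expressions.
annual = generalized_project(employee, {
    "name": lambda t: t["name"],
    "annual_salary": lambda t: t["salary"] * 12,
    "annual_total": lambda t: t["salary"] * 12 + t["bonus"],
})
print(annual[0])  # {'name': 'Asha', 'annual_salary': 60000, 'annual_total': 60500}
```

In SQL the same query would be written with expressions in the SELECT list, for example `SELECT name, salary * 12 FROM Employee`.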
Aggregate Functions and Operations
An aggregate function takes a collection of values and returns a single value as a
result.
• avg: average value
• min: minimum value
• max: maximum value
• sum: sum of values
• count: number of values
• These operations can be applied to an entire relation or to groups of
tuples.
• Aggregate functions ignore NULL values; the exception is count(*), which counts
tuples.
• General form of the aggregate operation ($\mathcal{G}$):
${}_{G_1, G_2, \ldots, G_n}\,\mathcal{G}_{F_1(A_1), F_2(A_2), \ldots, F_n(A_n)}(E)$
• E is any relational-algebra expression
• G1, G2, …, Gn is a list of attributes on which to group (can be empty)
• Each Fi is an aggregate function
• Each Ai is an attribute name
• Result of aggregation does not have a name
• Can use rename operation to give it a name
• For convenience, we permit renaming as part of aggregate operation using ‘as’
keyword
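Grouping and aggregation can be sketched in Python as follows. The function name `aggregate`, the `(output_name, function, attribute)` triples, and the sample data are assumptions for illustration; `None` plays the role of NULL and is skipped before a function is applied.

```python
from collections import defaultdict

def aggregate(relation, group_by, aggregates):
    """Group rows by the group_by attributes and apply each aggregate per group.

    aggregates is a list of (output_name, function, attribute) triples.
    """
    groups = defaultdict(list)
    for row in relation:
        groups[tuple(row[g] for g in group_by)].append(row)
    result = []
    for key, rows in groups.items():
        out = dict(zip(group_by, key))
        for name, func, attr in aggregates:
            values = [r[attr] for r in rows if r[attr] is not None]  # ignore NULLs
            out[name] = func(values)  # the output column is named, like 'as' does
        result.append(out)
    return result

# Hypothetical Employee relation; one salary is NULL.
employee = [{"dept": "Sales", "salary": 100}, {"dept": "Sales", "salary": 200},
            {"dept": "HR", "salary": 150}, {"dept": "HR", "salary": None}]

per_dept = aggregate(employee, ["dept"],
                     [("total", sum, "salary"), ("n_salaries", len, "salary")])
overall = aggregate(employee, [], [("max_salary", max, "salary")])
print(per_dept)
# [{'dept': 'Sales', 'total': 300, 'n_salaries': 2}, {'dept': 'HR', 'total': 150, 'n_salaries': 1}]
print(overall)  # [{'max_salary': 200}]
```

With an empty grouping list the whole relation forms a single group, which corresponds to applying the aggregate to the entire relation.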
Relational Calculus
• Relational Calculus is a non-procedural (declarative) query language.
• It uses mathematical predicate calculus (first-order logic) instead of algebra.
• Relational Calculus states what result is wanted but not how to compute it.
• Relational Calculus describes the result of the query, whereas Relational
Algebra gives the procedure for obtaining it.
• When applied to databases, it comes in two flavors:
1. Tuple Relational Calculus (TRC):
• Proposed by Codd in the year 1972
• Works on tuples (or rows)
2. Domain Relational Calculus (DRC):
• Proposed by Lacroix & Pirotte in the year 1977
• Works on domain of attributes (or Columns)
Calculus
• Calculus has variables, constants, comparison operators, logical connectives
and quantifiers.
• TRC: Variables range over tuples.
✔ Like SQL
• DRC: Variables range over domain elements.
✔ Like Query-By-Example (QBE)
• Expressions in the calculus are called formulas.
• Resulting tuple is an assignment of constants to variables that make the
formula evaluate to true.
1. Tuple Relational Calculus (TRC)
• Tuple relational calculus is a non-procedural query language.
• It is used for selecting the tuples in a relation that satisfy
the given condition (or predicate).
• The resulting relation can have one or more tuples.
• A query in TRC is expressed as:
{ t | P(t) }
• Where t denotes resulting tuple and P(t) denotes predicate (or condition) used to
fetch tuple t.
• Result of Query:
It is the set of all tuples t such that predicate P is true for t.
• Notations used:
• t is a tuple variable
• t[A] denotes the value of tuple t on attribute A
• t ∈ r denotes that tuple t is in relation r
• P is a formula similar to that of the predicate calculus
Predicate Calculus Formula
• Set of attributes and constants
• Set of comparison operators: e.g., <, ≤, =, ≠, >, ≥
• Set of connectives: and (∧), or (∨), not (¬)
• Implication (⇒): x ⇒ y, if x is true, then y is true
• Quantifiers: Existential Quantifiers (∃) and Universal Quantifier (∀)
• ∃ t ∈ r (Q(t)) ≡ “there exists” a tuple t in relation r such that predicate
Q(t) is true
• ∀ t ∈ r (Q(t)) ≡ Q is true “for all” tuples t in relation r
Free and Bound variables:
• The use of a quantifier (∃X or ∀X) in a formula is said to bind X in the
formula.
• A variable that is not bound is free.
• Let us revisit the definition of a query:
{ t | P(t) }
• There is an important restriction:
– The variable t that appears to the left of | must be the only free variable
in the formula P(t).
– In other words, all other tuple variables must be bound using a
quantifier.
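TRC queries read almost like Python set comprehensions: the comprehension variable plays the role of the free tuple variable t, while `any` and `all` play the roles of ∃ and ∀ over bound variables. The relations and attribute names below are assumptions for illustration.

```python
from collections import namedtuple

Student = namedtuple("Student", "stu_no name age")
Enrollment = namedtuple("Enrollment", "stu_no course")

# Hypothetical relations.
students = {Student(1, "Rama", 18), Student(2, "Amit", 17), Student(3, "Sita", 19)}
enrollments = {Enrollment(1, "DBMS"), Enrollment(3, "DBMS"), Enrollment(3, "OS")}

# { t | t ∈ Student ∧ t[age] > 17 }
older = {t for t in students if t.age > 17}

# { t | t ∈ Student ∧ ∃ e ∈ Enrollment (e[stu_no] = t[stu_no]) }
enrolled = {t for t in students
            if any(e.stu_no == t.stu_no for e in enrollments)}

# { t | t ∈ Student ∧ ∀ e ∈ Enrollment (e[stu_no] = t[stu_no] ⇒ e[course] = "DBMS") }
# The implication x ⇒ y is written as (not x) or y.
only_dbms = {t for t in students
             if all(e.stu_no != t.stu_no or e.course == "DBMS" for e in enrollments)}

print(sorted(t.name for t in older))      # ['Rama', 'Sita']
print(sorted(t.name for t in enrolled))   # ['Rama', 'Sita']
print(sorted(t.name for t in only_dbms))  # ['Amit', 'Rama']
```

In each query t is the only free variable, and e is bound by a quantifier. Amit satisfies the universally quantified query vacuously because he has no enrollments at all.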
2. Domain Relational Calculus (DRC)
• Domain Relational Calculus is a non-procedural query language.
• In Domain Relational Calculus, records are filtered based on the domains of
the attributes.
• DRC lists the attributes to be selected from the relation based on the
condition (or predicate).
• DRC is similar to TRC, but its variables range over attribute values rather
than whole tuples.
• In DRC, each query is an expression of the form:
{ <a₁, a₂, …, aₙ> | P(a₁, a₂, …, aₙ) }
• a₁, a₂, …, aₙ represent domain variables
• P represents a predicate similar to that of the predicate calculus
• Result of Query: It is the set of all tuples <a₁, a₂, …, aₙ> for which predicate P
is true
Predicate Calculus Formula
• Atom: <a₁, a₂, …, aₙ> ∈ r,
where r is a relation on n attributes and a₁, a₂, …, aₙ are domain variables or domain
constants
• P is a formula similar to that of the predicate calculus
Formula:
• Set of domain variables and constants
• Set of comparison operators: e.g., <, ≤, =, ≠, >, ≥
• Set of connectives: and (∧), or (∨), not (¬)
• Implication (⇒): x ⇒ y, if x is true, then y is true
Quantifiers:
• Existential Quantifier (∃) and Universal Quantifier (∀)
• ∃x (P(x)) and ∀x (P(x)), where x is a domain variable that is free in P(x) and
becomes bound by the quantifier
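A DRC query can be mimicked in Python by unpacking each tuple into one variable per attribute. The Student relation and its (stu_no, name, age) layout below are assumptions for illustration.

```python
# Hypothetical Student relation with attributes (stu_no, name, age).
student = {(1, "Rama", 18), (2, "Amit", 17), (3, "Sita", 19)}

# { <n> | ∃ s, a (<s, n, a> ∈ Student ∧ a > 17) }
# n is the free domain variable; s and a are bound by the existential quantifier.
names = {n for (s, n, a) in student if a > 17}

# { <s, n> | ∃ a (<s, n, a> ∈ Student ∧ a > 17) }
ids_and_names = {(s, n) for (s, n, a) in student if a > 17}

print(sorted(names))          # ['Rama', 'Sita']
print(sorted(ids_and_names))  # [(1, 'Rama'), (3, 'Sita')]
```

The variables s, n, and a range over attribute values, not over whole tuples, which is the difference from the TRC style.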