CS3402 : Chapter 7

Relational Algebra

CS3402 1
Relational Algebra
• Relational algebra: a formal language for the relational model
• The operations in relational algebra enable a user to specify basic retrieval requests (or queries)
• Relational algebra consists of a set of operations on relations to generate relations
  • The result of an operation is a new relation
  • Result relations can be further manipulated using other operations
• A sequence of relational algebra operations forms a relational algebra expression

Importance of Relational Algebra
• Foundation of SQL: Relational algebra forms the theoretical foundation of SQL. SQL is a practical implementation of the concepts and operations defined in relational algebra. By learning relational algebra, you gain a deeper understanding of the fundamental principles that underpin SQL, allowing you to write better SQL queries.
• Query optimization: Relational algebra provides a mathematical framework for reasoning about the efficiency and optimization of queries. Understanding relational algebra helps you analyze the complexity of your queries, identify potential performance bottlenecks, and devise strategies to optimize them.

Relational Algebra Overview
• Relational algebra consists of several groups of operations
  • Unary Relational Operations
    • SELECT (symbol: σ (sigma))
    • PROJECT (symbol: π (pi))
    • RENAME (symbol: ρ (rho))
  • Binary Relational Operations
    • JOIN (several variations of JOIN exist)
    • DIVISION
  • Relational algebra operations from set theory
    • UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, –)
    • CARTESIAN PRODUCT (×)
  • Additional Relational Operations
    • AGGREGATE FUNCTIONS (these compute summaries of information: for example, SUM, COUNT, AVG, MIN, MAX)

Database State for COMPANY
• All examples and query results discussed below refer to the COMPANY database and its database state
[Figures not reproduced: COMPANY database schema and database state]

Unary Relational Operations: SELECT
• The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
• The selection condition acts as a filter
  • Keeps only those tuples that satisfy the qualifying condition
  • Horizontal partitioning
  • Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out)
• The general form of the select operation is:
  σ <condition>(R)

Unary Relational Operations: SELECT
• Example 1:
  • Select the EMPLOYEE tuples whose department number is 4:
    σ DNO=4 (EMPLOYEE)

Equivalent to:
SELECT *
FROM EMPLOYEE
WHERE DNO=4;

Unary Relational Operations: SELECT
• Example 2:
  • Select the EMPLOYEE tuples whose department number is 4 and salary is greater than $25,000, or whose department number is 5 and salary is greater than $30,000:
    σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)

Equivalent to:
SELECT *
FROM EMPLOYEE
WHERE (Dno=4 AND Salary>25000) OR
      (Dno=5 AND Salary>30000);

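The SELECT examples can be mimicked in a few lines of Python. This is only a sketch: a relation is modelled as a list of dicts, the helper name `select` and the sample rows are hypothetical, and only the attribute names Dno and Salary come from the slides.

```python
# Hypothetical sample rows; only the attribute names follow the slides.
EMPLOYEE = [
    {"Fname": "A", "Dno": 4, "Salary": 26000},
    {"Fname": "B", "Dno": 4, "Salary": 24000},
    {"Fname": "C", "Dno": 5, "Salary": 38000},
    {"Fname": "D", "Dno": 5, "Salary": 30000},
    {"Fname": "E", "Dno": 1, "Salary": 55000},
]


def select(condition, relation):
    """sigma <condition>(R): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]


# sigma (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
result = select(
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
    EMPLOYEE,
)
print([t["Fname"] for t in result])  # ['A', 'C']
```

Every surviving tuple keeps all of its attributes, which is the "horizontal partitioning" described earlier.
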
Unary Relational Operations: SELECT
• SELECT Operation Properties
  • The SELECT operation σ <selection condition>(R) produces a relation S that has the same schema (same attributes) as R
  • SELECT σ is commutative:
    σ <condition1>(σ <condition2>(R)) = σ <condition2>(σ <condition1>(R))
  • Because of the commutativity property, a cascade (sequence) of SELECT operations may be applied in any order:
    σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond2>(σ <cond3>(σ <cond1>(R)))

Unary Relational Operations: SELECT
• SELECT Operation Properties
  • A cascade of SELECT operations may be replaced by a single selection with a conjunction (AND) of all the conditions:
    σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond1> AND <cond2> AND <cond3>(R)
  • The number of tuples in the result of a SELECT is less than (or equal to) the number of tuples in the input relation R.
  • The fraction of tuples selected by a selection condition is called the selectivity of the condition.

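The SELECT properties can be checked on a small example. The sketch below assumes a relation is a list of dicts; the relation R, the conditions c1 and c2, and the helper `select` are all made up for illustration.

```python
def select(condition, relation):
    """sigma <condition>(R): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]


# Hypothetical relation with six tuples.
R = [{"Dno": d, "Salary": s} for d in (4, 5) for s in (20000, 30000, 40000)]

c1 = lambda t: t["Dno"] == 5
c2 = lambda t: t["Salary"] > 25000

order_a = select(c1, select(c2, R))               # sigma c1 (sigma c2 (R))
order_b = select(c2, select(c1, R))               # sigma c2 (sigma c1 (R))
combined = select(lambda t: c1(t) and c2(t), R)   # sigma c1 AND c2 (R)

print(order_a == order_b == combined)  # True

# Selectivity: fraction of the input tuples that the condition keeps.
selectivity = len(combined) / len(R)
print(len(combined), len(R))  # 2 6
```
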
Unary Relational Operations: PROJECT
• PROJECT Operation is denoted by π (pi)
• This operation keeps certain attributes from a relation and discards the other attributes
  • PROJECT creates a vertical partitioning
    • The list of specified attributes is kept in each tuple
    • The other attributes in each tuple are discarded
• The general form of the project operation is:
  π <attribute list>(R)
  • π (pi) is the symbol used to represent the project operation
  • <attribute list> is the desired list of attributes from relation R

Unary Relational Operations: PROJECT
• Example: To list each employee’s first and last name and salary, the following is used:
  π LNAME, FNAME, SALARY (EMPLOYEE)

Equivalent to:

SELECT LNAME, FNAME, SALARY
FROM EMPLOYEE;

Examples of applying SELECT and PROJECT operations
• The project operation removes any duplicate tuples
  • This is because the result of the project operation must be a set of tuples
  • Mathematical sets do not allow duplicate elements
• Example: π Sex, Salary (EMPLOYEE)
[Figure not reproduced; annotation in the figure: Duplicated tuples]


Unary Relational Operations: PROJECT
• PROJECT Operation Properties
  • The number of tuples in the result of projection π <list>(R) is always less than or equal to the number of tuples in R: less when duplicates are removed, equal when every projected tuple is unique.
    • If the list of attributes includes a key of R, then the number of tuples in the result of PROJECT is equal to the number of tuples in R.
  • PROJECT is not commutative
    • In general, π <list1>(π <list2>(R)) ≠ π <list2>(π <list1>(R))
    • π <list1>(π <list2>(R)) = π <list1>(R) as long as <list2> contains the attributes in <list1>
      • List1 = LNAME, FNAME
      • List2 = LNAME, FNAME, SALARY

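A small sketch of PROJECT with duplicate elimination, assuming a relation is a list of dicts. The helper `project` and the sample rows are hypothetical; Ssn is assumed to be the key. Note that plain SQL SELECT keeps duplicates unless DISTINCT is used, whereas the algebra works on sets.

```python
def project(attrs, relation):
    """pi <attrs>(R): keep only attrs, then drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # a relation is a set, so keep each tuple once
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result


# Hypothetical rows; Ssn plays the role of the key.
EMPLOYEE = [
    {"Ssn": 1, "Lname": "L1", "Fname": "F1", "Sex": "M", "Salary": 30000},
    {"Ssn": 2, "Lname": "L2", "Fname": "F2", "Sex": "M", "Salary": 40000},
    {"Ssn": 3, "Lname": "L3", "Fname": "F3", "Sex": "F", "Salary": 25000},
    {"Ssn": 4, "Lname": "L4", "Fname": "F4", "Sex": "M", "Salary": 30000},
]

sex_salary = project(["Sex", "Salary"], EMPLOYEE)  # one duplicate removed
with_key = project(["Ssn", "Sex"], EMPLOYEE)       # key included: no loss
print(len(EMPLOYEE), len(sex_salary), len(with_key))  # 4 3 4

# pi list1 (pi list2 (R)) = pi list1 (R) when list2 contains list1.
list1 = ["Lname", "Fname"]
list2 = ["Lname", "Fname", "Salary"]
nested = project(list1, project(list2, EMPLOYEE))
direct = project(list1, EMPLOYEE)
print(nested == direct)  # True
```
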
Relational Algebra Expressions
• We may want to apply several relational algebra operations one after the other
  • Either we can write the operations as a single relational algebra expression by nesting the operations, or
  • We can apply one operation at a time and create intermediate result relations
• In the latter case, we must give names (rename) to the relations that hold the intermediate results

Single expression versus sequence of relational operations
• Example: To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation
• We can write a single relational algebra expression as follows:
  π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))
• OR we can explicitly show the sequence of operations, giving a name to each intermediate relation:
  TEMP ← σ DNO=5 (EMPLOYEE)
  RESULT ← π FNAME, LNAME, SALARY (TEMP)

Unary Relational Operations: RENAME
• The RENAME operator is denoted by ρ (rho)
• The general RENAME operation ρ can be expressed by any of the following forms:
  • ρ S(B1, B2, …, Bn)(R) changes both:
    • the relation name to S, and
    • the attribute names to B1, B2, …, Bn
  • ρ S(R) changes:
    • the relation name only to S
  • ρ (B1, B2, …, Bn)(R) changes:
    • the attribute names only to B1, B2, …, Bn
• In SQL, renaming is done by aliasing with AS, for example:
SELECT E.Fname, E.Lname, E.Salary
FROM EMPLOYEE AS E
WHERE E.Dno = 5;

Unary Relational Operations: RENAME
• If we write:
  • RESULT ← π FNAME, LNAME, SALARY (TEMP)
  • RESULT will have the same attribute names as TEMP
• If we write:
  • ρ R(First_name, Last_name, Salary)(RESULT)
  • The 3 attributes of RESULT are renamed to First_name, Last_name and Salary, respectively; and R is the name of the result relation.

Note: the ← symbol is an assignment operator

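Renaming attributes by position can be sketched as follows. A relation is assumed to be a list of dicts whose key order is the attribute order; the helper `rename` and the rows are hypothetical, and the relation name is simply the Python variable name.

```python
def rename(new_attrs, relation):
    """rho (B1, ..., Bn)(R): rename attributes by position."""
    # dicts preserve insertion order, so values line up with new_attrs
    return [dict(zip(new_attrs, t.values())) for t in relation]


# Hypothetical contents of RESULT <- pi FNAME, LNAME, SALARY (TEMP)
RESULT = [
    {"FNAME": "F1", "LNAME": "L1", "SALARY": 30000},
    {"FNAME": "F2", "LNAME": "L2", "SALARY": 40000},
]

# rho R(First_name, Last_name, Salary)(RESULT)
R = rename(["First_name", "Last_name", "Salary"], RESULT)
print(list(R[0]))  # ['First_name', 'Last_name', 'Salary']
```

Only the attribute names change; the tuples themselves carry the same values.
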
[Figure not reproduced: example of applying multiple operations and RENAME]

Set Theory: UNION
• UNION Operation
  • Binary operation, denoted by ∪
  • The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S
  • Duplicate tuples are eliminated
  • The two operand relations R and S must be “type compatible” (or UNION compatible)
    • R and S must have the same number of attributes
    • Each pair of corresponding attributes must be type compatible (have the same or compatible domains)

Relational Algebra Operations from Set Theory: UNION
• Example:
  • To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below)
  • We can use the UNION operation as follows:
    DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
    RESULT1 ← π SSN (DEP5_EMPS)
    RESULT2 ← π SUPERSSN (DEP5_EMPS)
    RESULT ← RESULT1 ∪ RESULT2
  • The union operation produces the tuples that are in either RESULT1 or RESULT2 or both

[Figure 8.3 not reproduced: result of the UNION operation RESULT ← RESULT1 ∪ RESULT2]

Relational Algebra Operations from Set Theory: INTERSECTION
• INTERSECTION is denoted by ∩
• The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S
• The two operand relations R and S must be “type compatible”

Relational Algebra Operations from Set Theory: SET DIFFERENCE
• SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –
• The result of R – S is a relation that includes all tuples that are in R but not in S
• The two operand relations R and S must be “type compatible”
• R ∩ S = ((R ∪ S) – (R – S)) – (S – R)

[Figures not reproduced: examples illustrating the results of UNION, INTERSECT, and DIFFERENCE]

Requirements of UNION, INTERSECT, and DIFFERENCE
• Type compatibility of operands is required for the binary set operation UNION ∪ (also for INTERSECTION ∩ and SET DIFFERENCE –)
• R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
  • they have the same number of attributes, and
  • the domains of corresponding attributes are type compatible (i.e. dom(Ai) = dom(Bi) for i = 1, 2, ..., n)
• R1 and R2 are not required to have the same attribute names. The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1 – R2) has the same attribute names as the first operand relation R1

Properties of UNION, INTERSECT, and DIFFERENCE
• Notice that both union and intersection are commutative operations; that is
  • R ∪ S = S ∪ R, and R ∩ S = S ∩ R
• The minus operation is not commutative
  • In general, R – S ≠ S – R
• Both union and intersection can be treated as n-ary operations applicable to any number of relations as both are associative operations; that is
  • R ∪ (S ∪ T) = (R ∪ S) ∪ T
  • (R ∩ S) ∩ T = R ∩ (S ∩ T)

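The set operations and their requirements can be sketched with Python sets. Here a relation is assumed to be a dict holding an attribute-name tuple and a set of value tuples; the helper names and the SSN values are hypothetical, and only the number of attributes is checked (domain compatibility is taken on trust).

```python
def check_compatible(r, s):
    # Only the degree is checked here; domain compatibility is assumed.
    if len(r["attrs"]) != len(s["attrs"]):
        raise ValueError("operands must have the same number of attributes")


def union(r, s):
    check_compatible(r, s)
    # The result takes its attribute names from the first operand.
    return {"attrs": r["attrs"], "tuples": r["tuples"] | s["tuples"]}


def intersection(r, s):
    check_compatible(r, s)
    return {"attrs": r["attrs"], "tuples": r["tuples"] & s["tuples"]}


def difference(r, s):
    check_compatible(r, s)
    return {"attrs": r["attrs"], "tuples": r["tuples"] - s["tuples"]}


# Hypothetical contents of RESULT1 and RESULT2 from the UNION example.
RESULT1 = {"attrs": ("Ssn",), "tuples": {("111",), ("222",)}}
RESULT2 = {"attrs": ("Super_ssn",), "tuples": {("222",), ("333",)}}

RESULT = union(RESULT1, RESULT2)
print(RESULT["attrs"], sorted(RESULT["tuples"]))
# ('Ssn',) [('111',), ('222',), ('333',)]

# R ∩ S = ((R ∪ S) – (R – S)) – (S – R)
via_identity = difference(
    difference(union(RESULT1, RESULT2), difference(RESULT1, RESULT2)),
    difference(RESULT2, RESULT1),
)
print(via_identity["tuples"] == intersection(RESULT1, RESULT2)["tuples"])  # True
```
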
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
• CARTESIAN (or CROSS) PRODUCT Operation
  • This operation is used to combine tuples from two relations in a combinatorial fashion
  • Denoted by R(A1, A2, ..., An) × S(B1, B2, ..., Bm)
  • Result is a relation Q of degree n + m, with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order
  • The resulting relation state has one tuple for each combination of tuples, one from R and one from S
  • Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R × S will have nR * nS tuples
  • The two operands R and S do NOT have to be “type compatible”

Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
• Generally, CROSS PRODUCT applied by itself is not a meaningful operation
  • Some tuples in the result do not exist in the mini-world
  • Can become meaningful when followed by other operations
• Example (not meaningful):
  FEMALE_EMPS ← σ SEX='F' (EMPLOYEE)
  EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
  EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
• EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT tuples, whether or not they are actually related

[Figure 8.5 not reproduced: the CARTESIAN PRODUCT (CROSS PRODUCT) operation]

Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
• To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we add a SELECT operation as follows
• Example (meaningful):
  FEMALE_EMPS ← σ SEX='F' (EMPLOYEE)
  EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
  EMP_DEPENDENTS ← EMPNAMES × DEPENDENT
  ACTUAL_DEPS ← σ SSN=ESSN (EMP_DEPENDENTS)
  RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)
• RESULT will now contain the names of female employees and their dependents

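The product-then-select pattern can be sketched as below. Relations are assumed to be lists of dicts with no attribute names in common; the helpers `product` and `select` and all sample values are hypothetical.

```python
def product(r, s):
    """R x S: every combination of one tuple from r and one from s.

    Assumes r and s share no attribute names.
    """
    return [{**a, **b} for a in r for b in s]


def select(condition, relation):
    """sigma <condition>(R): keep the tuples for which condition holds."""
    return [t for t in relation if condition(t)]


# Hypothetical rows.
EMPNAMES = [
    {"Fname": "F1", "Lname": "L1", "Ssn": "111"},
    {"Fname": "F2", "Lname": "L2", "Ssn": "222"},
]
DEPENDENT = [
    {"Essn": "111", "Dependent_name": "D1"},
    {"Essn": "111", "Dependent_name": "D2"},
    {"Essn": "333", "Dependent_name": "D3"},
]

EMP_DEPENDENTS = product(EMPNAMES, DEPENDENT)  # nR * nS combinations
ACTUAL_DEPS = select(lambda t: t["Ssn"] == t["Essn"], EMP_DEPENDENTS)
print(len(EMP_DEPENDENTS), len(ACTUAL_DEPS))  # 6 2
```

Four of the six combinations pair an employee with someone else's dependent, which is why the product alone is not meaningful.
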
[Figure not reproduced: the CARTESIAN PRODUCT (CROSS PRODUCT) operation]

Binary Relational Operations: JOIN
• JOIN Operation (denoted by ⋈)
  • The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to identify and select related tuples from two relations
  • A special operation, called JOIN, combines this sequence into a single operation
  • The general form of a join operation on two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bm) is:
    R ⋈ <join condition> S
  • R and S can be any relations that result from general relational algebra expressions
  • R and S are not required to be type compatible.

Binary Relational Operations: JOIN
• Example: Suppose that we want to retrieve the name of the manager of each department
  • To get the manager’s name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.
    DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE
  • MGRSSN=SSN is the join condition
  • Combines each department record with the employee who manages the department

[Figure 8.6 not reproduced: result of the JOIN operation DEPT_MGR ← DEPARTMENT ⋈ Mgr_ssn=Ssn EMPLOYEE]

Some properties of JOIN
• Consider the following JOIN operation:
  R(A1, A2, ..., An) ⋈ R.Ai=S.Bj S(B1, B2, ..., Bm)
  • Result is a relation Q of degree n + m, with attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order
  • The resulting relation state has one tuple for each combination of tuples r from R and s from S, but only if they satisfy the join condition r[Ai] = s[Bj]
  • Hence, if R has nR tuples and S has nS tuples, then the join result will generally have fewer than nR * nS tuples.

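A join with an arbitrary condition can be sketched as a filtered nested loop. Relations are assumed to be lists of dicts with distinct attribute names; the helper `theta_join` and the sample rows are hypothetical. The second computation spells out the same result as a SELECT over a CARTESIAN PRODUCT.

```python
def theta_join(r, s, condition):
    """R join S on condition: combine pairs (a, b) that satisfy it."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]


# Hypothetical rows.
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "222"},
    {"Dname": "Admin", "Dnumber": 4, "Mgr_ssn": "333"},
]
EMPLOYEE = [
    {"Ssn": "111", "Lname": "L1"},
    {"Ssn": "222", "Lname": "L2"},
    {"Ssn": "333", "Lname": "L3"},
]

# DEPT_MGR <- DEPARTMENT join (Mgr_ssn = Ssn) EMPLOYEE
DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["Mgr_ssn"] == e["Ssn"])
print([(t["Dname"], t["Lname"]) for t in DEPT_MGR])
# [('Research', 'L2'), ('Admin', 'L3')]

# Same result written as sigma Mgr_ssn=Ssn (DEPARTMENT x EMPLOYEE).
cross = [{**d, **e} for d in DEPARTMENT for e in EMPLOYEE]
via_product = [t for t in cross if t["Mgr_ssn"] == t["Ssn"]]
print(DEPT_MGR == via_product, len(DEPT_MGR), len(cross))  # True 2 6
```

Because the condition is an arbitrary function of both tuples, the same helper covers theta-joins; using only equality makes it an equijoin.
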
Theta-join
• The general case of the JOIN operation is called a Theta-join:
  R ⋈ <conditions> S
• The join condition is called theta
• Theta can be any general boolean expression on the attributes of R and S; for example:
  R.Ai < S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)
• Most join conditions involve one or more conditions “AND”ed together; for example:
  R.Ai=S.Bj AND R.Ak>S.Bl AND R.Ap<S.Bq

EQUIJOIN
• EQUIJOIN Operation
• The most common use of join involves join conditions with equality comparisons only
• Such a join, where the only comparison operator used is =, is called an EQUIJOIN
  • In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple
  • The JOIN seen in the previous example was an EQUIJOIN

NATURAL JOIN Operation
• NATURAL JOIN Operation
  • Another variation of JOIN, called NATURAL JOIN and denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition
    • because one of each pair of attributes with identical values is superfluous
  • The standard definition of natural join requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations.
    • e.g. Q ← R(A,B,C,D) * S(C,D,E)
  • The implicit join condition includes each pair of attributes with the same name, “AND”ed together: R.C=S.C AND R.D=S.D
  • Result keeps only one attribute of each such pair: Q(A,B,C,D,E)

NATURAL JOIN
• Example: Suppose we want to combine each PROJECT tuple with the DEPARTMENT tuple that controls the project.
• We first rename the Dnumber attribute of DEPARTMENT to Dnum, so that it has the same name as the Dnum attribute in PROJECT, and then we apply NATURAL JOIN.
  DEPT ← ρ (Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT)
  PROJ_DEPT ← PROJECT * DEPT
• The attribute Dnum is called the join attribute for NATURAL JOIN, because it is the attribute with the same name in both relations.
• Only one join attribute value is kept.

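NATURAL JOIN can be sketched by matching on whatever attribute names the two relations share. Relations are assumed to be non-empty lists of dicts; the helper `natural_join` and the rows are hypothetical, and DEPT is written with Dnum already in place, as if the rename step had been applied.

```python
def natural_join(r, s):
    """R * S: equijoin on all same-named attributes, keeping one copy."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    # {**a, **b} stores each shared attribute once, so no duplicate column
    return [
        {**a, **b}
        for a in r
        for b in s
        if all(a[c] == b[c] for c in common)
    ]


# Hypothetical rows; DEPT is DEPARTMENT after renaming Dnumber to Dnum.
PROJECT = [
    {"Pname": "P1", "Pnumber": 1, "Dnum": 5},
    {"Pname": "P2", "Pnumber": 2, "Dnum": 4},
]
DEPT = [
    {"Dname": "Research", "Dnum": 5},
    {"Dname": "Admin", "Dnum": 4},
]

PROJ_DEPT = natural_join(PROJECT, DEPT)
print(PROJ_DEPT[0])
# {'Pname': 'P1', 'Pnumber': 1, 'Dnum': 5, 'Dname': 'Research'}
```
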
Example of NATURAL JOIN operation

PROJ_DEPT ← PROJECT * DEPT

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

[Figures not reproduced: results of the two NATURAL JOIN operations]

Binary Relational Operations: DIVISION
• DIVISION Operation
  • The division operation is applied to two relations
  • R(Z) ÷ S(X), where X is a subset of Z
  • Let Y = Z – X (and hence Z = X ∪ Y); that is, let Y be the set of attributes of R that are not attributes of S
  • The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S
  • For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.

Binary Relational Operations: DIVISION
• Example: retrieve the Social Security numbers of employees who work on all the projects that ‘John Smith’ works on.
• First, retrieve the list of project numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
  SMITH ← σ Fname=‘John’ AND Lname=‘Smith’ (EMPLOYEE)
  SMITH_PNOS ← π Pno (WORKS_ON ⋈ Essn=Ssn SMITH)
• Next, we create a relation that includes a tuple <Essn, Pno> for each (employee, project) pair recorded in WORKS_ON:
  SSN_PNOS ← π Essn, Pno (WORKS_ON)
• Finally, apply the DIVISION operation to the two relations, which gives the desired employees’ Social Security numbers:
  SSN_PNOS ÷ SMITH_PNOS

Example of DIVISION

SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS

• R = SSN_PNOS, S = SMITH_PNOS, T = SSNS
  Z = {Essn, Pno}, Y = {Essn}, X = {Pno}
• E.g., 123456789 is in SSNS because tuples tR appear in R with tR[Y] = 123456789, and with tR[X] = tS for every tuple tS (i.e., 1 and 2) in S

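DIVISION can be sketched for the two-attribute case used in the example. R is assumed to be a set of (Essn, Pno) pairs and S a set of Pno values; the helper `divide` and every value except 123456789 and projects 1 and 2 are hypothetical. The second computation is an assumption-labelled cross-check using only projection, product, and difference: π Y(R) – π Y((π Y(R) × S) – R).

```python
def divide(r, s):
    """R(Y, X) / S(X): the y values paired in r with every x in s."""
    ys = {y for (y, _) in r}
    return {y for y in ys if all((y, x) in r for x in s)}


# 123456789 with projects 1 and 2 follows the slide; the rest is made up.
SSN_PNOS = {
    ("123456789", 1), ("123456789", 2),
    ("AAA", 1), ("AAA", 2), ("AAA", 3),
    ("BBB", 2),
}
SMITH_PNOS = {1, 2}

SSNS = divide(SSN_PNOS, SMITH_PNOS)
print(sorted(SSNS))  # ['123456789', 'AAA']

# Cross-check with primitive operations only.
ys = {y for (y, _) in SSN_PNOS}                                  # pi Y (R)
missing = {(y, x) for y in ys for x in SMITH_PNOS} - SSN_PNOS    # (pi Y(R) x S) - R
via_primitives = ys - {y for (y, _) in missing}
print(via_primitives == SSNS)  # True
```

BBB is excluded because the pair (BBB, 1) is missing from R, so BBB does not appear in combination with every tuple of S.
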
Complete Set of Relational Operations
• The set of operations including SELECT σ, PROJECT π, UNION ∪, DIFFERENCE –, RENAME ρ, and CARTESIAN PRODUCT × is called a complete set because any other relational algebra expression can be expressed by a combination of these SIX operations
• For example:
  • R ∩ S = (R ∪ S) – ((R – S) ∪ (S – R))
  • R ⋈ <join condition> S = σ <join condition>(R × S)

[Table 8.1 not reproduced: Operations of Relational Algebra]

Additional operators: Grouping and Aggregate Functions
• We can define an AGGREGATE FUNCTION operation, using the symbol ℱ (pronounced script F), as follows:
  <grouping attributes> ℱ <function list> (R)
• <grouping attributes> is a list of attributes of the relation specified in R
• <function list> is a list of (<function> <attribute>) pairs. In each such pair, <function> is one of the allowed functions, such as SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT
• E.g. retrieve each department number, the number of employees in the department, and their average salary

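The grouping example (department number, number of employees, average salary) can be sketched as follows. The relation is assumed to be a list of dicts; the helper `aggregate_by_dno`, the output attribute names Count_ssn and Average_salary, and the rows are hypothetical.

```python
from collections import defaultdict


def aggregate_by_dno(relation):
    """Group by Dno, then compute COUNT of Ssn and AVERAGE of Salary."""
    groups = defaultdict(list)
    for t in relation:
        groups[t["Dno"]].append(t)
    return [
        {
            "Dno": dno,
            "Count_ssn": len(ts),
            "Average_salary": sum(t["Salary"] for t in ts) / len(ts),
        }
        for dno, ts in groups.items()
    ]


# Hypothetical rows.
EMPLOYEE = [
    {"Ssn": "111", "Dno": 5, "Salary": 30000},
    {"Ssn": "222", "Dno": 5, "Salary": 40000},
    {"Ssn": "333", "Dno": 4, "Salary": 25000},
]

result = aggregate_by_dno(EMPLOYEE)
print(result)
# [{'Dno': 5, 'Count_ssn': 2, 'Average_salary': 35000.0},
#  {'Dno': 4, 'Count_ssn': 1, 'Average_salary': 25000.0}]
```

The result has one tuple per distinct value of the grouping attribute.
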
References
• 6e
• Ch. 6, p. 141-157, 167-170
