Introduction To Database Concepts
using such a database; a college of the 1930’s would have kept the same database
in paper form. However, the existence of computers to store and manipulate
the data does change user expectations: we expect to store more data and make
more sophisticated use of these data.
Users interact with database systems through query languages. The query lan-
guage of a DBMS has two broad tasks: to define the data structures that serve
as receptacles for the data of the database, and to allow the speedy retrieval
and modification of data. Accordingly, we distinguish between two components
of a query language: the data definition component and the data manipulation
component.
The main tasks of data manipulation are data retrieval and data update.
Data retrieval entails obtaining data stored in the database that satisfies a
certain specification formulated by the user in a query. Data updates include
data modification, deletion and insertion.
Programming in query languages of DBMSs is done differently from pro-
gramming in higher-level programming languages. The typical program written
in C, Pascal, or PL/1 directly implements an algorithm for solving a problem.
A query written in a database query language merely states what the problem
is and leaves the construction of the code that solves the problem to a special
component of the DBMS software. This approach to programming is called
nonprocedural.
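The contrast between the two styles can be sketched in Python. This is only an illustration: the standard-library sqlite3 module stands in for a DBMS, the three rows are borrowed from the college database, and the table layout and function names are assumptions made for this sketch.

```python
import sqlite3

# A few (stno, name, city) rows borrowed from the college database.
students = [
    ("1011", "Edwards P. David", "Newton"),
    ("2661", "Mixon Leatha", "Brookline"),
    ("3566", "Pierce Richard", "Brookline"),
]


def names_in_city_procedural(rows, city):
    """Procedural style: the program spells out how to scan and filter."""
    result = []
    for _stno, name, row_city in rows:
        if row_city == city:
            result.append(name)
    return sorted(result)


def names_in_city_declarative(rows, city):
    """Nonprocedural style: the query states what is wanted; the DBMS decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (stno TEXT, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM students WHERE city = ? ORDER BY name", (city,)
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names
```

Both functions return the same names; only the second leaves the choice of access strategy to the database engine.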
A central task of DBMSs is transaction management. A transaction is a
sequence of database operations (that usually consists of updates, with possible
retrievals) that must be executed in its entirety or not at all. This property of
transactions is known as atomicity. A typical example includes the transfer of
funds between two account records A and B in the database of a bank. Such a
banking operation should not modify the total amount of funds that the bank
has in its accounts, which is a clear consistency requirement for the database.
The transaction consists of the following sequence of operations:
1. Decrease the balance of account A by d dollars;
2. Increase the balance of account B by d dollars.
If only the first operation is executed, then d dollars will disappear from the
funds deposited with the bank. If only the second is executed, then the total
funds will increase by d dollars. In either case, the consistency of the database
will be compromised. Thus, a transaction transforms one consistent database
state into another consistent database state, a property of transactions known
as consistency.
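The funds transfer can be sketched with Python's standard sqlite3 module. This is a minimal illustration under assumptions: the accounts table, its columns, and the rule that a balance may not become negative are all invented for the example.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move `amount` from `source` to `target` as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (source,)
            ).fetchone()
            # Assumed business rule: a balance may not become negative.
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds or unknown account")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
        return True
    except ValueError:
        return False


def total(conn):
    """Total funds held in all accounts (the consistency invariant)."""
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

If the second update is never reached, the rollback also undoes the first one, so the total returned by `total(conn)` is the same before and after every call to `transfer`, whether it succeeds or fails.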
Typically, at any given moment in time a large number of transactions co-
exist in the database system. The transaction management component ensures
that the execution of one transaction is not influenced by the execution of any
other transaction. This is the isolation property of transactions. Finally, the
effect of a transaction on the state of the database must be durable, that is, it
must persist in the database after the execution of the transaction is completed.
This is the durability property of transactions. Collectively, the four fundamen-
tal properties of transactions outlined above are known as the ACID properties,
the acronym of atomicity, consistency, isolation, and durability.
DBMS software usually contains application development tools in addition
to query languages. The role of these tools is to facilitate user interface develop-
ment. They include forms systems, procedural and nonprocedural programming
languages that integrate database querying with various user interfaces, etc.
architecture that focuses on application uses of DBMSs, and the logical archi-
tecture that describes various levels of data abstractions.
Functionally, a DBMS contains several main components shown in Fig-
ure 1.1:
• the memory manager;
• the query processor;
• the transaction manager.
The query processor converts a user query into instructions the DBMS can
process efficiently, taking into account the current structure of the database
(also referred to as metadata, which means data about data).
The memory manager obtains data from the database that satisfies queries
compiled by the query processor and manages the structures that contain data,
according to the DDL directives.
Finally, the transaction manager ensures that the execution of possibly many
transactions on the DBMS satisfies the ACID properties mentioned above and,
also, provides facilities for recovery from system and media failures.
The standard application architecture of DBMSs is based on a client/server
model. The client, which can be a user or an application, generates a query that
is conveyed to the server. The server processes the query (a process that includes
parsing, generation of optimized execution code, and execution) and returns an
answer to the client. This architecture is known as two-tier architecture. In
general, the number of clients may vary over time.
In large organizations, it is often necessary to create more layers of process-
ing, with, say, a layer of software to concentrate the data activities of a branch
office and organize the communication between the branch and the main data
repository. This leads to what is called a multi-tier architecture. In this setting
data are scattered among various data sources that could be DBMSs, file sys-
tems, etc. These constitute the lowest tier of the architecture, that is, the tier
that is closest to the data. The highest tier consists of users that act through
user interfaces and applications to obtain answers to queries. The intermediate
tiers constitute the middleware, and their role is, in general, to serve as
mediators between the highest and the lowest tiers. Middleware may consist
of web servers and data warehouses, and may be considerably complex. Multi-tier
architecture is virtually a requirement for World Wide Web applications.
The logical architecture, also known as the ANSI/SPARC architecture, was
elaborated at the beginning of the 1970s. It distinguishes three layers of data
abstraction:
1. The physical layer contains specific and detailed information that describes
how data are stored: addresses of various data components, lengths in
bytes, etc. DBMSs aim to achieve data independence, which means that
application programs should not be affected by changes in the database
organization at the physical level.
2. The logical layer describes data in a manner that is similar to, say, defini-
tions of structures in C. This layer has a conceptual character; it shields
the user from the tedium of details contained by the physical layer, but is
[Figure 1.1: main components of a DBMS: the query processor, the transaction manager, and the memory manager, together with the database and its metadata.]
The Entity–Relationship
Model
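[Figure 2.1: E/R diagram of the college database, with the entity sets STUDENTS, INSTRUCTORS, and COURSES and the sets of relationships ADVISING and GRADES.]
[Figure 2.2: the diagram of Figure 2.1 with the edges marked by the roles advisee, advisor, graded, grader, and subject.]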
These roles explain which entities are involved in the relationship and in
which capacity: who is graded, which instructor gave the grade, and
in which course the grade was given. In Figure 2.2 we show the diagram from
Figure 2.1 with the edges marked by the roles discussed above.
2.2 Attributes
Properties of entities and relationships are described by attributes. Each at-
tribute A has an associated set of values, which we refer to as the domain of A
and denote by Dom(A). The set of attributes of a set of entities E is denoted
by Attr(E); similarly, the set of attributes of a set of relationships R is denoted
by Attr(R).
Example 2.2.1 The set of entities STUDENTS of the college database has
the attributes student identification number (stno), student name (name), street
address (addr), city (city), state of residence (state), zip code (zip).
The student Edwards P. David, who lives at 10 Red Rd. in Newton, MA,
02129, has been assigned ID number 1011. The values of his attributes are:
Attribute  Value
stno       ’1011’
name       ’Edwards P. David’
addr       ’10 Red Rd.’
city       ’Newton’
state      ’MA’
zip        ’02129’
The attributes of the sets of entities considered in our current example (the
college database) are summarized in Figure 2.3.
If several sets of entities that occur in the same context each have an at-
tribute A, we qualify the attribute with the name of the entity set to be able to
differentiate between these attributes. For example, because both STUDENTS
and INSTRUCTORS have the attribute name, we use the qualified attributes
STUDENTS.name and INSTRUCTORS.name.
Attributes of relationships may either be attributes of the entities they relate,
or be new attributes, specific to the relationship. For instance, a grade involves
a student, a course, and an instructor, and for these, we use attributes from the
participating entities: stno, cno, and empno, respectively. In addition, we need
to specify the semester and year when the grade was given as well as the grade
itself. For these, we use new attributes: sem, year, and grade. Therefore, the set
of relationships GRADES has the attributes stno (from STUDENTS), cno (from
COURSES), and empno (from INSTRUCTORS), and also its own attributes sem,
year, and grade. By contrast, the set of relationships ADVISING has only attributes
gathered from the entities it relates: stno (from STUDENTS) and empno
(from INSTRUCTORS). The attributes of the sets of relationships GRADES and
ADVISING are listed in Figure 2.4. Note that in our college, grades are integers
(between 0 and 100) rather than letters.
It is a feature of the E/R model that the distinction between entities and re-
lationships is intentionally vague. This allows different views of the constituents
[Figure: E/R diagram of the college database with the roles advisee, advisor, graded, grader, and subject, and with the attributes sem, year, and grade of GRADES and empno, name, rank, roomno, and telno of INSTRUCTORS.]
2.3 Keys
In order to talk about a specific student, you have to be able to identify him. A
common way to do this is to use his name, and generally, this works reasonably
well. So, you can ask something like, “Where does Roland Novak live?” In
database terminology, we are using the student’s name as a “key”, an attribute
(or set of attributes) that uniquely identifies each student. So long as no two
students have the same name, you can use the name attribute as a key.
What would happen, though, if there were two students named “Helen
Rivers”? Then, the question, “Where does Helen Rivers live?” could not be
answered without additional information. The name attribute would no longer
uniquely identify students, so it could not be used as a key for STUDENTS.
The college solves this problem in a common way: it assigns a unique identi-
fier (corresponding to the stno attribute) to each student when he first enrolls.
This identifier can then be used to specify a student unambiguously; i.e., it can
be used as a key. If one Helen Rivers has ID 6568 and the other has ID 4140,
then instead of talking about “Helen Rivers”, leaving your listener wondering
which one is meant, you can talk about “the student with ID number 6568”.
It’s less natural in conversation, but it makes clear which student is meant.
Avoiding ambiguity is especially important for computer programs, so having
an attribute, or a set of attributes, that uniquely identifies each entity in a
collection is generally a necessity for electronic databases.
We discuss the notion of keys for both sets of entities and sets of relation-
ships. We begin with sets of entities.
Let E be a set of entities having $A_1, \ldots, A_n$ as its attributes. The set
$\{A_1, \ldots, A_n\}$ is denoted by $A_1 \ldots A_n$. Unfortunately, this notation conflicts
with standard mathematical notation; however, it has been consecrated by its
use in databases, so we adhere to it when dealing with sets of attributes. Further,
if H and L are two sets of attributes, their union is denoted by concatenation;
namely, we write $HL = A_1 \ldots A_n B_1 \ldots B_m$ for $H \cup L$ if $H = A_1 \ldots A_n$ and
$L = B_1 \ldots B_m$.
Definition 2.3.1 Let E be a set of entities such that $\mathrm{Attr}(E) = A_1 \ldots A_n$. A
key of E is a nonempty subset L of Attr(E) such that the following conditions
are satisfied:
1. For all entities, e, e′ in E, if A(e) = A(e′ ) for every attribute A of L, then
e = e′ (the unique identification property of keys).
2. No proper, nonempty subset of L has the unique identification property
(the minimality property of keys).
Example 2.3.2 In the college database, the value of the attribute stno is suf-
ficient to identify a student entity. Since the set stno has no proper, nonempty
subsets, it clearly satisfies the minimality condition and, therefore, it is a key
for the STUDENTS entity set. For the entity set COURSES of our college, both
cno and cname are keys. Note that this reflects a “business rule”, namely that
no two courses may have the same name, even if they are offered by different
departments.
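The two conditions of Definition 2.3.1 can be checked mechanically against a given collection of entities. The sketch below does so for entities represented as Python dictionaries; the function names and the sample students are assumptions made for illustration. Passing the check on sample data only shows that a set of attributes is consistent with being a key, since a key is a statement about every collection of entities the database may ever hold.

```python
from itertools import combinations


def identifies_uniquely(entities, attrs):
    """True if no two entities agree on every attribute in `attrs`."""
    seen = set()
    for e in entities:
        values = tuple(e[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def is_key(entities, attrs):
    """Check both conditions of Definition 2.3.1 on the given entities."""
    attrs = tuple(attrs)
    if not attrs or not identifies_uniquely(entities, attrs):
        return False
    # Minimality: no proper, nonempty subset may identify entities uniquely.
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if identifies_uniquely(entities, subset):
                return False
    return True


# Hypothetical sample: two students share a name, as in the discussion above.
students = [
    {"stno": "6568", "name": "Helen Rivers"},
    {"stno": "4140", "name": "Helen Rivers"},
    {"stno": "3442", "name": "Roland Novak"},
]
```

On this sample, `is_key(students, ["stno"])` holds, `["name"]` fails unique identification, and `["stno", "name"]` fails minimality.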
Example 2.3.3 Consider the design of the database of the customers of a town
library. We introduce the entity sets PATRONS and BOOKS and the set of
relationships LOANS between BOOKS and PATRONS. The E/R diagram of this
database is represented in Figure 2.6.
The inventory number invno is clearly a key for the set of entities BOOKS.
If the library never buys more than one copy of any title, then the ISBN
number, isbn, is another key, and so is the set of attributes author title publ
year. For the PATRONS set of entities, it is easy to see that the sets H =
name telno date of birth and L = name addr city date of birth are keys. Indeed,
it is consistent with the usual interpretation of these attributes to assume that
a reader can be uniquely identified by his name, his telephone number, and his
date of birth. Note that the set H satisfies the minimality property. Assume,
for example, that we drop the date of birth attribute. In this case, a father and
a son who live in the same household and are both named “John Smith” can-
not be distinguished through the values of the attributes name and telno. On
the other hand, we may not drop the attribute telno because we can have two
different readers with the same name and date of birth. Finally, we may not
drop name from H because we could not distinguish between two individuals
who live in the same household and have the same date of birth (for instance,
between twins). Similar reasoning shows that L is also a key (see Exercise 10).
Example 2.3.3 shows that it is possible to have several keys for a set of
entities. One of these keys is chosen as the primary key; the remaining keys are
alternate keys.
The primary key of a set of entities E is used by other constituents of the
E/R model to refer to the entities of E.
As we now see, the definition of keys for sets of relationships is completely
parallel to the definition of keys for sets of entities.
Definition 2.3.4 Let R be a set of relationships. A subset L of the set of
attributes of R is a key of R if it satisfies the following conditions:
1. If A(r) = A(r′ ) for every attribute A of L, then r = r′ (the unique identi-
fication property of relationships).
2. No proper subset of L has the unique identification property (the mini-
mality property of keys of relationships).
Note that the attributes that form a key of a set R of relationships are
themselves either attributes of R or keys of the entities that participate in
the relationships of R. The presence of the keys of the entities is necessary
to indicate which entities actually participate in the relationships. There is no
logical necessity that any particular key be chosen, but the reason for designating
one of the keys as the primary key is to make sure a single key is used to access
entities of the corresponding set.
Example 2.3.5 For instance, if we designate
H = name telno date of birth
as the primary key for PATRONS and invno as primary key for BOOKS, we
obtain the following primary key for LOANS:
K = name telno date of birth invno date
To account for the possibility that a single patron borrows the same book repeat-
edly, thereby creating several loan relationships, the date attribute is necessary
to distinguish among them.
Definition 2.3.6 A foreign key for a set of relationships is a set of attributes
that is a primary key of a set of entities that participates in the relationship set.
Example 2.3.7 The set of attributes name telno date of birth is a foreign key
for the set of relationships LOANS because it is a primary key of PATRONS.
We conclude this initial presentation of keys by stressing that the identifi-
cation of the primary key and of the alternate keys is a semantic statement: It
reflects our understanding of the role played by various attributes in the real
world. In other words, choosing the primary key from among the available keys
is a choice of the designer.
2.4 Participation Constraints
[Figure: E/R diagram of the college database with participation constraints (1:1, 1:45, 0:7) marked on the edges next to the roles advisee, advisor, graded, grader, and subject.]
The second restriction reflects the fact that a book is on loan to at most one
patron.
Let R be a set of binary relationships involving the sets of entities U and
V . We single out several types of sets of binary relationships because they are
popular in the business-oriented database literature. If every entity in U is
related to exactly one entity in V , then we say that R is a set of one-to-one
relationships. If an entity in U may be related to several entities of V , then R
is a set of one-to-many relationships from U to V . If, on the other hand, many
entities of U are related to a single entity in V , then R is a set of many-to-one
relationships from U to V . And finally, if there are no such limitations between
the entities of U and V , then R is a set of many-to-many relationships.
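As a rough sketch of these four categories, the function below classifies a finite set of (u, v) pairs by the largest number of partners observed on each side. It is an assumption-laden simplification: it inspects one instance only, it checks upper bounds but not whether every entity participates, and the names `classify`, `p1`, `b1`, and so on are hypothetical.

```python
from collections import defaultdict


def classify(pairs):
    """Classify a set of (u, v) pairs by the largest fan-out on each side."""
    partners_of_u = defaultdict(set)
    partners_of_v = defaultdict(set)
    for u, v in pairs:
        partners_of_u[u].add(v)
        partners_of_v[v].add(u)
    max_u = max((len(s) for s in partners_of_u.values()), default=0)
    max_v = max((len(s) for s in partners_of_v.values()), default=0)
    if max_u <= 1 and max_v <= 1:
        return "one-to-one"
    if max_v <= 1:
        return "one-to-many"   # a u may have several v's; each v has one u
    if max_u <= 1:
        return "many-to-one"   # several u's share a v; each u has one v
    return "many-to-many"


# Hypothetical loans as (patron, book) pairs: a patron may hold several
# books, but each book is on loan to at most one patron.
loans = {("p1", "b1"), ("p1", "b2"), ("p2", "b3")}
```

Here `classify(loans)` reports a one-to-many set from patrons to books; swapping the components of each pair reports many-to-one.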
Example 2.4.3 The set of binary relationships LOANS between BOOKS and
PATRONS considered in Example 2.4.2 is a one-to-many set of binary relation-
ships.
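[Figure: a set of binary relationships R between the entity sets U and V, with participation constraints p:q and m:n marked on the edges.]
[Figure: diagram fragment labeled STUDENTS and PREREQ.]
2.5 Weak Entities
[Figure: the entity sets STUDENTS (stno, name, addr, city, zip) and LOANS (source, amount, year) linked by the set of relationships BORROW, with the roles recipient and award and the participation constraints 1:10 and 1:1.]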
The sets of entities STUDENTS and LOANS are related by the one-to-many
set of relationships BORROW.
If a student entity is deleted, the LOANS entities that depend on the student
entity should also be removed. Note that the attributes of the LOANS entity
set (source, amount, year) are not sufficient to identify an entity in this set.
Indeed, if two students (say, the student whose student number is s1 and the
student whose student number is s2 ) both got the “CALS” loan for 1993, valued
at $1000, there is no way to distinguish between these entities using their own
attributes. In other words, the set of entities LOANS does not have a key.
Definition 2.5.1 Let E, E ′ be sets of entities and let R be a set of relationships
between E and E ′ . E is a set of weak entities if the following conditions are
satisfied:
1. The set of entities E does not have a key, and
2. the participation constraint (E, 1, k, R) is satisfied for some k ≥ 1.
The second condition of Definition 2.5.1 states that no entity can exist in E
unless it is involved in a relationship of R with an entity of E ′ . According to
Definition 2.5.1, LOANS is a set of weak entities.
Weak entity sets are represented in E/R diagrams by dashed boxes (see
Figure 2.10).
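The dependence of LOANS on STUDENTS can be sketched with Python's standard sqlite3 module. The table definitions below are assumptions made for illustration: each loan row carries the student number of its owner, the combination (stno, source, year) is assumed to identify a loan, and ON DELETE CASCADE removes a student's loans together with the student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE students (stno TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE loans (
        stno   TEXT NOT NULL REFERENCES students(stno) ON DELETE CASCADE,
        source TEXT,
        amount INTEGER,
        year   INTEGER,
        PRIMARY KEY (stno, source, year)  -- assumed identifying combination
    )
    """
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("s1", "Student One"), ("s2", "Student Two")],  # hypothetical students
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [("s1", "CALS", 1000, 1993), ("s2", "CALS", 1000, 1993)],
)
conn.commit()


def loan_count(conn):
    """Number of loan rows currently stored."""
    return conn.execute("SELECT COUNT(*) FROM loans").fetchone()[0]


def delete_student(conn, stno):
    """Delete a student; the cascade removes the dependent loans."""
    with conn:
        conn.execute("DELETE FROM students WHERE stno = ?", (stno,))
```

The two loans agree on source, amount, and year, so only the owner's student number tells them apart; after `delete_student(conn, "s1")` only the loan of s2 remains.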
Example 2.5.2 Consider a personnel database that contains a set of entities
PERSINFO that contains personal information of the employees of a software
company and a set of entities EMPHIST that contains employment history
records of the employees. A set of recursive relationships REPORTING gives
the reporting lines between employees; the set of entities EMPHIST is related
to PERSINFO through the set of relationships BELONGS_TO.
Note that the existence of an employment history entity in EMPHIST is
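[Figure: the entity set PERSINFO with the recursive set of relationships REPORTING (roles superv and sub) and the set of relationships BELONGS_TO (roles emp and pos), with the participation constraints 1:+, 0:1, 1:+, and 1:1 marked on the edges.]
2.6 Is-a Relationships
[Figure: two diagram notations, (a) and (b), for S is-a T.]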
Example 2.6.2 Teaching assistants are both students and instructors, and
therefore, the corresponding set of entities, TAs, inherits its attributes from
both STUDENTS and INSTRUCTORS.
The phenomenon described in Example 2.6.2 is called multiple inheritance,
and certain precautions must be taken when it occurs. If S is-a U and S is-a V
and both U and V have an attribute A, we must have Dom(U.A) = Dom(V.A),
because otherwise it would be impossible to assign any meaning to the common
restriction of these attributes to S.
The is-a relation between sets of entities is transitive; that is, S is-a T and
T is-a U imply S is-a U . To avoid redundancy in defining the is-a relation
between entity sets (and, consequently, to eliminate redundancies involving the
is-a relationships between entities), we assume that for no set of entities S do
we have S is-a S.
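Transitivity can be sketched as a closure computation over the directly declared is-a pairs. The representation as a set of (S, T) pairs, the function name, and the entity set PERSONS are assumptions introduced only for this illustration.

```python
def isa_closure(direct):
    """Return every (S, T) pair implied by transitivity of is-a."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for s, t in list(closure):
            for t2, u in list(closure):
                if t == t2 and (s, u) not in closure:
                    closure.add((s, u))
                    changed = True
    return closure


def is_well_formed(direct):
    """True if the closure never yields S is-a S."""
    return all(s != t for s, t in isa_closure(direct))


# TAs is-a STUDENTS and TAs is-a INSTRUCTORS come from the text;
# PERSONS is a hypothetical entity set added for the example.
direct = {
    ("TAs", "STUDENTS"),
    ("TAs", "INSTRUCTORS"),
    ("STUDENTS", "PERSONS"),
}
```

The closure adds TAs is-a PERSONS without it being declared, which is why declaring that pair explicitly would be redundant.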
The introduction of is-a relationships can be accomplished through two dis-
tinct processes, called specialization and generalization. Specialization makes a
smaller set of entities by selecting entities from a set. Generalization makes a
single set of entities by combining several sets whose attributes are those the
original sets had in common.
Definition 2.6.3 A set of entities E ′ is derived from a set of entities E through
a specialization process if E ′ consists of all entities of E that satisfy a certain
condition.
If E ′ is obtained from E through specialization, then E ′ is-a E. In this case
we may mark the arrow leading from E ′ to E by is-a(sp).
Example 2.6.4 The set TAs can be regarded as a specialization of both STU-
DENTS and INSTRUCTORS (see Figure 2.14). Therefore, entities of this set
have all attributes applicable to INSTRUCTORS and STUDENTS and, in addi-
[Figure: E/R diagram of the college database showing the attributes of STUDENTS (stno, name, addr, city, zip), GRADES (sem, year, grade), and INSTRUCTORS (empno, name, rank, roomno, telno).]
[Figure 2.14: the college database extended with the entity sets UNDERGRADUATES (sat), GRADUATES (gre), and TAs (stipend), connected through is-a(gen) and is-a(sp) arrows.]
2.7 Exercises
1. Consider the following alternative designs for the college database:
(a) Make GRADES a set of entities, and consider binary sets of relation-
ships between GRADES and each of the sets of entities STUDENTS,
COURSES, and INSTRUCTORS.
(b) Replace the set of relationships GRADES with two binary sets of
relationships: One such set should relate STUDENTS with COURSES and
reflect the results obtained by students in the courses; another one
should relate COURSES with INSTRUCTORS and reflect the teaching
assignment of the instructors.
Explain the advantages and disadvantages of these design choices.
2. Consider a database that has a set of entities CUSTOMERS that consists
of all the customers of a natural gas distribution company. Suppose that
this database also records the meter readings for each customer. Each
meter reading has the date of the reading and the number read from the
meter. Bills are generated after six consecutive readings.
(a) Can you consider the readings to be weak entities?
(b) Draw the E/R diagram for this database; identify relevant participa-
tion constraints.
3. The data for our college will grow quite large. One of the techniques for
dealing with the explosion of data is to remove those items that are not
used. Describe how you would augment the college database to include
information about when something was accessed (either read or written).
To what will you attach this information? How? How detailed will the
information you attach be? Why? What would you suggest be done with
this information once it is available?
4. Design an E/R model for the patient population of a small medical of-
fice. The database must reflect patients, office visits, prescriptions, bills
and payments. Explain your choice of attributes, relationships, and con-
straints.
5. Design an E/R model for the database of a bank. The database must
reflect customers, branch offices, accounts, and tellers. If you like, you
can include other features, such as deposits, withdrawals, charges, inter-
est, transfers between accounts, etc. Explain your choice of attributes,
relationships, and constraints.
6. Design an E/R model for the database of a car-rental business. Specify
entities, relationships, attributes, keys, and cardinality constraints for this
database and explain your design choices. Be sure to include such objects
as vehicles, renters, rental locations, etc.
7. A small manufacturing company needs a database for keeping track of
its inventory of parts and supplies. Parts have part numbers, names,
type, physical characteristics, etc. Some parts used by the company are
installed during the fabrication process as components of other parts that
enter the device produced by the company. Parts are stored at several
manufacturing facilities. The database must contain information about
the vendors and must keep track of the orders placed with vendors. Use
the E/R technique to design the database.
8. Let A, B, C, and D be four entity sets linked by is-a relationships as shown
in Figure 2.15. What is wrong with the choice of these relationships?
9. Let E1 , E2 be two sets of entities.
(a) Assume that E is a nonempty set of entities that is a specialization
of both E1 and E2 . Can you construct the generalization of the sets
E1 and E2 ?
(b) Suppose that E ′ is a generalization of E1 and E2 . Can you construct
a set of entities E ′′ that is a common specialization of E1 and E2 ?
What can you say about |E ′′ |?
10. Explain why the set L in Example 2.3.3 is a key. Are there reasons for
preferring H as the primary key and L as an alternate key?
[Figure 2.15: the entity sets A, B, C, and D linked by is-a(sp) and is-a(gen) relationships.]
3.1 Introduction
3.2 Tables — The Main Data Structure of the Relational Model
3.3 Transforming an E/R Design into a Relational Design
3.4 Entity and Referential Integrity
3.5 Metadata
3.6 Exercises
3.7 Bibliographical Comments
3.1 Introduction
The relational model is the mainstay of contemporary databases. This chapter
presents the fundamental ideas of this model, which focus on data organization
and retrieval capabilities.
Informally, the relational model consists of:
• A class of data structures referred to as tables.
• A collection of methods for building new tables starting from an initial
collection of tables; we refer to these methods as relational algebra opera-
tions.
• A collection of constraints imposed on the data contained in tables.
SCHEDULE
dow    cno      roomno  time
’Mon’  ’cs110’  84      5:00 p.m.
’Mon’  ’cs450’  62      7:00 p.m.
’Wed’  ’cs110’  65      10:00 a.m.
’Wed’  ’cs310’  63      12:00 p.m.
’Thu’  ’cs210’  63      2:00 p.m.
’Thu’  ’cs450’  65      3:00 p.m.
’Thu’  ’cs240’  84      5:00 p.m.
’Fri’  ’cs310’  63      5:00 p.m.
The Cartesian product can generate rather large sets starting from sets that
have a modest size. For example, if $D_1$, $D_2$, $D_3$ are three sets having 1000 elements
each, then $D_1 \times D_2 \times D_3$ contains 1,000,000,000 elements.
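The size of a Cartesian product is the product of the sizes of its factors, which can be confirmed without enumerating a billion tuples. In the sketch below the function name and the small test domains are assumptions; the enumeration is done only on the small domains.

```python
from itertools import product
from math import prod


def product_size(*domains):
    """Number of tuples in the Cartesian product of the given finite domains."""
    return prod(len(d) for d in domains)


# Small hypothetical domains on which full enumeration is cheap.
small = [range(3), range(4), range(5)]
enumerated = list(product(*small))
```

For the small domains the enumerated list has exactly `product_size(*small)` tuples, and the same formula gives 1,000,000,000 for three domains of 1000 elements each.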
Example 3.2.2 Consider the domains of the attributes dow, cno, roomno and
time:
Example 3.2.4 Consider the set D = {1, 2, 3, 4, 5, 6} and the Cartesian prod-
uct D × D, which has 36 pairs. Certain of these pairs (a, b) have the property
that a is less than b, i.e., that they satisfy the relation a < b. With a little bit
of counting, we see that there are 15 such pairs.
One way to characterize this set is to describe it operationally. We could
say that if a and b are in D, then a < b if there is some number k in D such
that a + k = b. This has the advantage of being concise.
However, there is another way to describe < on this set: we could list out
all 15 pairs (a, b) of D × D such that a < b. If we do this in a vertical list, we
get (in no particular order)
(3, 4)
(1, 2)
(2, 6)
(2, 5)
(1, 6)
(2, 3)
(1, 3)
(2, 4)
(1, 5)
(5, 6)
(3, 6)
(4, 5)
(4, 6)
(3, 5)
(1, 4)
With a little reformatting, to remove all those parentheses and commas, this
same list of pairs becomes a table of two columns and 15 rows, where each row
lists two elements of D, such that the first is less than the second. Furthermore,
just as in the list above, all pairs with the first element less than the second
occur as some row in this table.
3 4
1 2
2 6
2 5
1 6
2 3
1 3
2 4
1 5
5 6
3 6
4 5
4 6
3 5
1 4
Thus, we have a table that lists precisely the pairs of D × D that comprise
the < relation. In just this same manner, we can list out all the tuples of
any relation defined on finite sets as rows of a table. It is this correspondence
between tables and relations that is at the heart of the name “relational model.”
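The two descriptions of < on D can be compared directly. In this short sketch the variable names are assumptions: one set is built from the comparison a < b, the other from the operational rule a + k = b with k in D.

```python
D = range(1, 7)  # the set {1, 2, 3, 4, 5, 6}

# The relation as the set of pairs (a, b) of D x D with a < b.
less_than = {(a, b) for a in D for b in D if a < b}

# The operational description: some k in D satisfies a + k = b.
operational = {(a, b) for a in D for b in D if any(a + k == b for k in D)}

# The same relation laid out as the rows of a two-column table.
rows = sorted(less_than)
```

Both constructions produce the same set of pairs, and `rows` is simply that set written out one pair per line.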
we can define course schedules as relations. One possible course schedule is the
relation ρ that consists of the following 8 quadruples:
(’Mon’, ’cs110’, 84, ’5:00 p.m.’), (’Mon’, ’cs450’, 62, ’7:00 p.m.’),
(’Wed’, ’cs110’, 65, ’10:00 a.m.’), (’Wed’, ’cs310’, 63, ’12:00 p.m.’),
(’Thu’, ’cs210’, 63, ’2:00 p.m.’), (’Thu’, ’cs450’, 65, ’3:00 p.m.’),
(’Thu’, ’cs240’, 84, ’5:00 p.m.’), (’Fri’, ’cs310’, 63, ’5:00 p.m.’)
If $D_1, D_2, \ldots, D_n$ are n sets with $k_1, \ldots, k_n$ elements, respectively, then there
are $2^{k_1 k_2 \cdots k_n}$ relations that can be defined on these sets. The number of relations
that can be defined on relatively small sets can be astronomical. For example,
if each of $D_1$, $D_2$ and $D_3$ has ten elements, then there are $2^{1000}$ relations that
can be defined on $D_1$, $D_2$, $D_3$.
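A relation is a subset of the Cartesian product, so the count is the number of subsets of a set with as many elements as the product of the domain sizes. The sketch below, with assumed function names, computes the count and confirms it by brute force on two tiny domains.

```python
from itertools import chain, combinations, product
from math import prod


def relation_count(*sizes):
    """Number of relations on domains of the given sizes: 2 to the product."""
    return 2 ** prod(sizes)


def all_relations(*domains):
    """Enumerate every subset of the Cartesian product (tiny domains only)."""
    tuples = list(product(*domains))
    return list(
        chain.from_iterable(
            combinations(tuples, r) for r in range(len(tuples) + 1)
        )
    )
```

For two domains of two elements each, enumeration yields 16 relations, matching `relation_count(2, 2)`; for three domains of ten elements, the count is a number with 302 decimal digits.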
It is clear now that the content of a table T having the heading H = A1 · · · An
is a relation that consists of tuples from tupl(H), and this is the essential part
of the table.
During the life of a database, the constituent tables of the database may
change through insertions or deletions of tuples, or changes to existing tuples.
Thus, at any given moment we may see a different picture of the database, which
suggests the need to introduce the notion of relational database instance.
Definition 3.2.6 Let $H_1, \ldots, H_n$ be n sets of attributes. A relational database
instance is a finite collection of tables $T_1, \ldots, T_n$ that have the headings
$H_1, \ldots, H_n$, respectively, such that all names of the tables are distinct.
Definition 3.2.7 The tables T and S are compatible if they have the same
headings.
Implicit in the definition of tables is the fact that tables do not contain
duplicate tuples. This is not a realistic assumption, and we shall remove it later,
during the study of SQL, the standard query language for relational databases.
The same relational attribute may occur in several tables of a relational data-
base. Therefore, it is important to be able to differentiate between attributes
that originate from different tables; we accomplish this using the following no-
tion.
Definition 3.2.8 Let T be a table. A qualified attribute is an attribute of the
form T.A. For every qualified attribute of the form T.A, Dom(T.A) is the same
as Dom(A).
Example 3.2.9 The qualified attributes of the table SCHEDULE are SCHEDULE.dow, SCHEDULE.cno, SCHEDULE.roomno, and SCHEDULE.time.
3.2.1 Projections
For a tuple t of a table T having the heading H we may wish to consider only
some of the attributes of t while ignoring others. If L is the set of attributes
we are interested in, then t[L] is the corresponding tuple, referred to as the
projection of t on L.
Example 3.2.10 Let H = dow cno roomno time be the set of attributes that
is the heading of the table SCHEDULE introduced above. The tuple
t = (’Mon’, ’cs110’, 84, ’5:00 p.m.’)
can be restricted to any of the sixteen subsets of the set H. For example, the
restriction of t to the set L = dow roomno is the tuple t[L] = (’Mon’, 84), and
the restriction of t to the set K = cno roomno time is (’cs110’, 84, ’5:00 p.m.’).
The restriction of t to H is, of course, t itself. Also, the restriction of t to
the empty set is t[∅] = (), that is, the empty sequence.
By extension, the table T itself can be projected onto L, giving a new table
named T [L], with heading L, consisting of all tuples of the form t[L], where t
is a tuple in T ; i.e., the rows of T [L] are obtained by projecting the rows of T
on L. Projecting a table often creates duplicate rows, but within the context of
the relational model, which is based on sets, only one copy of each row appears
in the table, as shown in the second projection of Example 3.2.11.
Example 3.2.11 The projection of the table SCHEDULE on the set of at-
tributes dow cno is
SCHEDULE[dow cno]
dow cno
’Mon’ ’cs110’
’Mon’ ’cs450’
’Wed’ ’cs110’
’Wed’ ’cs310’
’Thu’ ’cs210’
’Thu’ ’cs450’
’Thu’ ’cs240’
’Fri’ ’cs310’
The projection of SCHEDULE on the attribute dow gives the following table.
SCHEDULE[dow]
dow
’Mon’
’Wed’
’Thu’
’Fri’
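The two forms of projection can be sketched in Python. This is only an illustration under our own conventions: a table is modelled as a heading (a tuple of attribute names) together with a set of rows, and the helper names project_tuple and project_table are hypothetical. The tuple t is the one consistent with the restrictions given in Example 3.2.10, and the rows are those of SCHEDULE[dow cno] listed in Example 3.2.11.

```python
def project_tuple(heading, t, attrs):
    """Return t[L]: the components of t under the attributes in attrs."""
    return tuple(t[heading.index(a)] for a in attrs)


def project_table(heading, rows, attrs):
    """Return the rows of T[L]; using a set collapses duplicate rows."""
    return {project_tuple(heading, t, attrs) for t in rows}


heading = ("dow", "cno", "roomno", "time")
t = ("Mon", "cs110", 84, "5:00 p.m.")
print(project_tuple(heading, t, ("dow", "roomno")))  # ('Mon', 84)
print(project_tuple(heading, t, ()))                 # ()

# Rows of SCHEDULE[dow cno] as listed in Example 3.2.11.
dow_cno = {
    ("Mon", "cs110"), ("Mon", "cs450"), ("Wed", "cs110"), ("Wed", "cs310"),
    ("Thu", "cs210"), ("Thu", "cs450"), ("Thu", "cs240"), ("Fri", "cs310"),
}
# Projecting further onto dow leaves four rows out of eight.
days = project_table(("dow", "cno"), dow_cno, ("dow",))
print(sorted(days))  # [('Fri',), ('Mon',), ('Thu',), ('Wed',)]
```

Because the result is a set, the eight rows collapse to four, which is the behavior the relational model prescribes for duplicate rows.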
While in the E/R model we dealt with two types of basic constituents, entity
sets and relationship sets, in the relational model, we deal only with tables, and
we use these to represent both sets of entities and sets of relationships. Thus,
it is necessary to reformulate the definition of keys in this new setting. The
conditions imposed on keys are obvious translations of the conditions formulated
in Definition 2.3.1.
Definition 3.3.1 Let T be a table that has the heading H. A set of attributes
K is a key for T if K ⊆ H and the following conditions are satisfied:
1. For all tuples u, v of the table, if u[K] = v[K], then u = v (unique identi-
fication property).
2. There is no proper subset L of K that has the unique identification prop-
erty (minimality property).
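The two conditions of the definition can be tested on a given table instance, as in the following Python sketch. The function names are hypothetical, rows are modelled as dictionaries, and the rows are assumed to be pairwise distinct. Note that a key is a constraint on every possible content of a table, so a check of this kind can refute a candidate key but cannot establish one; the data are the rows of the ADVISING table of the college database.

```python
from itertools import combinations


def has_unique_identification(rows, attrs):
    """True if no two distinct rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """Unique identification plus minimality, checked on this instance only."""
    if not has_unique_identification(rows, attrs):
        return False
    # Minimality: no proper subset may have the unique identification property.
    return not any(
        has_unique_identification(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


advising = [
    {"stno": s, "empno": e}
    for s, e in [("1011", "019"), ("2415", "019"), ("2661", "023"),
                 ("2890", "023"), ("3442", "056"), ("3566", "126"),
                 ("4022", "234"), ("5544", "023"), ("5571", "234")]
]
print(is_key(advising, ("stno",)))          # True
print(is_key(advising, ("empno",)))         # False: 019 advises two students
print(is_key(advising, ("stno", "empno")))  # False: not minimal
```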
STUDENTS
stno name addr city state zip
1011 Edwards P. David 10 Red Rd. Newton MA 02159
2415 Grogan A. Mary 8 Walnut St. Malden MA 02148
2661 Mixon Leatha 100 School St. Brookline MA 02146
2890 McLane Sandy 30 Cass Rd. Boston MA 02122
3442 Novak Roland 42 Beacon St. Nashua NH 03060
3566 Pierce Richard 70 Park St. Brookline MA 02146
4022 Prior Lorraine 8 Beacon St. Boston MA 02125
5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115
5571 Lewis Jerry 1 Main Rd. Providence RI 02904
INSTRUCTORS
empno name rank roomno telno
019 Evans Robert Professor 82 7122
023 Exxon George Professor 90 9101
056 Sawyer Kathy Assoc. Prof. 91 5110
126 Davis William Assoc. Prof. 72 5411
234 Will Samuel Assist. Prof. 90 7024
COURSES
cno cname cr cap
cs110 Introduction to Computing 4 120
cs210 Computer Programming 4 100
cs240 Computer Architecture 3 100
cs310 Data Structures 3 60
cs350 Higher Level Languages 3 50
cs410 Software Engineering 3 40
cs460 Graphics 3 30
GRADES
stno empno cno sem year grade
1011 019 cs110 Fall 2001 40
2661 019 cs110 Fall 2001 80
3566 019 cs110 Fall 2001 95
5544 019 cs110 Fall 2001 100
1011 023 cs110 Spring 2002 75
4022 023 cs110 Spring 2002 60
3566 019 cs240 Spring 2002 100
5571 019 cs240 Spring 2002 50
2415 019 cs240 Spring 2002 100
3442 234 cs410 Spring 2002 60
5571 234 cs410 Spring 2002 80
1011 019 cs210 Fall 2002 90
2661 019 cs210 Fall 2002 70
3566 019 cs210 Fall 2002 90
5571 019 cs210 Spring 2003 85
4022 019 cs210 Spring 2003 70
5544 056 cs240 Spring 2003 70
1011 056 cs240 Spring 2003 90
4022 056 cs240 Spring 2003 80
2661 234 cs310 Spring 2003 100
4022 234 cs310 Spring 2003 75
ADVISING
stno empno
1011 019
2415 019
2661 023
2890 023
3442 056
3566 126
4022 234
5544 023
5571 234
If several keys exist for a table, one of them is designated as the primary
key of the table; the remaining keys are alternate keys. The main role of the
primary key of a table T is to serve as a reference for the tuples of T that can
be used by other tables that refer to these tuples.
Example 3.3.2 The table that results from the translation of the set of entities
PATRONS introduced in Example 2.3.3 has the keys
K = name telno date of birth
and
L = name address city date of birth.
If we consider K to be the primary key, then L is an alternate key.
As a practical matter, if K1 and K2 are both keys, where K1 has fewer
attributes than K2 , we would prefer K1 as the primary key.
Translating sets of relationships is a little more intricate than translating
sets of entities. Let R be a set of relationships that relates the sets of entities
E1 , . . . , En . Suppose that every set Ei has its own primary key Ki for 1 ≤ i ≤ n
and that no two such keys have an attribute in common. We exclude, for the moment,
the is-a relationship and the dependency relationship that relates sets of weak
entities to sets of regular entities. When translating relationships, the entities
involved are represented by their primary key values.
If the set of attributes of R itself is B1 , . . . , Bk , then a relationship r of
R relates the entities e1 , . . . , en with some values, say b1 , . . . , bk . In making
the translation of this particular relationship, each entity ei is represented by
its primary key, ei [Ki ], which may comprise several values, $e_{i1}, \ldots, e_{im_i}$. To
simplify, we will write $\vec{e}_i$ for the primary key of ei . The value of the relationship
itself is represented by the value bj of each attribute Bj . So, r can be translated
to a tuple $w_r = (\vec{e}_1, \ldots, \vec{e}_n, b_1, \ldots, b_k)$. In other words, the translation WR of
the set of relationships R is defined on the set of all attributes that appear in
the primary keys of the entities, K1 , . . . , Kn , as well as attributes B1 , . . . , Bk ;
and the tuple wr is put together from the values of the primary keys of the
participating entities and the values of the attributes of the relationship r.
Example 3.3.3 Consider, for example, the relationship g that belongs to the
set of relationships GRADES, that relates STUDENTS, COURSES, and INSTRUC-
TORS. Further, assume that this relationship involves the student whose student
number is ’1011’, the instructor whose employee number is ’019’, and the course
whose number is ’cs110’. Further, assume that
sem(g) = ’Fall’
year(g) = ’2001’
grade(g) = 40.
Then, the relationship g will be represented by the tuple
wg = (’1011’, ’019’, ’cs110’, ’Fall’, ’2001’, 40).
In turn, the set R is translated into a table named R whose heading contains
B1 , . . . , Bk as well as all attributes that occur in a key K1 , . . . , Kn . The content
of this table consists of all tuples of the form wr for each relationship r.
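The construction of the tuple that represents a relationship can be sketched as follows. This is an illustration only: entities are modelled as Python dictionaries, the function name translate_relationship is hypothetical, and the entity data are taken from the STUDENTS, INSTRUCTORS, and COURSES tables of the college database.

```python
def translate_relationship(entities, primary_keys, rel_values, rel_attrs):
    """Build the tuple for a relationship: the primary-key values of each
    participating entity, followed by the relationship's own attribute values."""
    key_part = tuple(
        entity[a] for entity, key in zip(entities, primary_keys) for a in key
    )
    return key_part + tuple(rel_values[b] for b in rel_attrs)


student = {"stno": "1011", "name": "Edwards P. David"}
instructor = {"empno": "019", "name": "Evans Robert"}
course = {"cno": "cs110", "cname": "Introduction to Computing"}
g = {"sem": "Fall", "year": "2001", "grade": 40}

w_g = translate_relationship(
    [student, instructor, course],
    [("stno",), ("empno",), ("cno",)],
    g,
    ("sem", "year", "grade"),
)
print(w_g)  # ('1011', '019', 'cs110', 'Fall', '2001', 40)
```

Only the key attributes of the entities survive in the tuple; the names, for instance, remain in the tables that represent the entity sets.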
The collection of tables shown in Figure 3.1 represents an instance of the
college database obtained by the transformation of the E/R design shown in
Figure 2.5.
If E is a set of weak entities linked by a dependency relationship R to a set
of entities E ′ , then we map both the set of entities E and the set of relationships
R to a single table T defined as follows. If K is the primary key of the table
T ′ that represents the set of entities E ′ , we define H to be the set of attributes
that includes the attributes of E and the attributes of K. The content of the
table T consists of those tuples t in tupl(H) such that there exists an entity e′
in E ′ and a weak entity e in E such that
\[
t(A) =
\begin{cases}
A(e) & \text{if } A \text{ is an attribute of } E, \\
A(e') & \text{if } A \text{ belongs to } K.
\end{cases}
\]
Example 3.3.4 Consider the set of weak entities LOANS dependent on the set
STUDENTS. Assuming that the primary key of STUDENTS is stno, both the
relationship GRANTS and the weak set of entities LOANS are translated into
the table named LOANS:
LOANS
stno source amount year
1011 CALS 1000 2002
1011 Stafford 1200 2003
3566 Stafford 1000 2002
3566 CALS 1200 2003
3566 Gulf Bank 2000 2003
EMPHIST
empno  position           dept   appt date      term date  salary
1000   ’President’        null   ’1-oct-1999’   null       150000
1005   ’Vice-President’   ’DB’   ’12-oct-1999’  null       120000
1010   ’Vice-President’   ’WWW’  ’1-jan-2000’   null       120000
1015   ’Senior Engineer’  ’DB’   ’25-oct-1999’  null       100000
1020   ’Engineer’         ’DB’   ’1-nov-1999’   null       70000
1025   ’Programmer’       ’DB’   ’10-mar-2000’  null       70000
1030   ’Senior Engineer’  ’WWW’  ’10-jan-2000’  null       90000
1035   ’Programmer’       ’WWW’  ’20-feb-2000’  null       75000
1040   ’Programmer’       ’WWW’  ’1-mar-2000’   null       70000
REPORTING
empno superv
1000 null
1005 1000
1010 1000
1015 1005
1020 1005
1025 1005
1030 1010
1035 1010
1040 1010
GRADUATES
stno name addr city state zip qualdate
3566 Pierce Richard 70 Park St. Brookline MA 02146 2/1/92
4022 Prior Lorraine 8 Beacon St. Boston MA 02125 11/5/93
5544 Rawlings Jerry 15 Pleasant Dr. Boston MA 02115 2/1/92
5571 Lewis Jerry 1 Main Rd. Providence RI 02904 11/5/93
then the table that represents the set of entities STUDENTS obtained by gen-
eralization from UNDERGRADUATES and GRADUATES is the one shown in
Figure 3.1.
If the set of entities E ′ is obtained by specialization from the set of entities
E, the heading of the table that represents E ′ must include the attributes of E
plus the extra attributes that are specific to E ′ whenever such attributes exist
(see Figure 3.3).
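[Figure 3.3: Translation of a specialization. The set of entities E, with attributes A1 , . . . , An , is translated into a table with heading A1 · · · An ; the set of entities E ′ , obtained from E through an is-a (specialization) relationship and having the additional attributes B1 , . . . , Bℓ , is translated into a table with heading A1 · · · An B1 · · · Bℓ .]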
Example 3.3.7 The heading of the table that represents the set of entities TA
consists of the attributes stno, name, addr, city, state, zip, empno, rank, roomno,
telno, stipend. The extension of the table that results from the translation of TA
consists of the translation of all entities that belong to both STUDENTS and
INSTRUCTORS.
null value that occurs under this attribute in the table INSTRUCTORS must
appear in the table ROOMS. This corresponds to the real-world constraint that
either an instructor has no office, in which case the roomno-component is null,
or the instructor’s office is one of the rooms of the college.
Of course, if an S-foreign key is a part of the primary key of a table T (as is
the case with stno for GRADES, for example), then null values are not permitted
in T under the attributes of the S-foreign key.
3.5 Metadata
Metadata is a term that refers to data that describes other data. In the context
of the relational model, metadata are data that describe the tables and their
attributes.
The relational model allows a relational database to contain tables that
describe the database itself. These tables are known as catalog tables, and they
constitute the data catalog or the data dictionary of the database.
Typically, the catalog tables of a database include a table that describes the
names, owners, and some parameters of the headings of the data tables of the
database. The owner of a table is relevant in multi-user relational database
systems, where some users are permitted only limited access to tables they do
not own.
For example, a catalog table named SYSCATALOG that describes the tables
of the college database might look like:
SYSCATALOG
owner  tname        dbspacename
dsim   courses      system
dsim   students     system
dsim   instructors  system
dsim   grades       system
dsim   advising     system
sys    syscolumns   system
...    ...          ...
In the table SYSCATALOG the attribute owner describes the creator of the
table; this coincides, in general, with the owner of that table. The attribute
tname gives the name of the table, while dbspacename indicates the memory
area (also known as the table space) where the table was placed.
Note that the above table mentions the table SYSCOLUMNS (recall that ta-
ble names are case insensitive). SYSCOLUMNS describes various attributes and
domains that occur in the user’s tables. For example, for the college database,
the table may look like:
SYSCOLUMNS
owner  cname  tname    coltype   nulls  length  in pr key
dsim   cno    courses  char      N      5       Y
dsim   cname  courses  char      Y      20      N
dsim   cr     courses  smallint  Y      2       N
dsim   cap    courses  integer   Y      4       N
dsim   stno   grades   char      N      10      Y
dsim   empno  grades   char      N      11      N
dsim   cno    grades   char      N      5       Y
dsim   sem    grades   char      N      6       Y
dsim   year   grades   integer   N      4       Y
dsim   grade  grades   integer   Y      4       N
...    ...    ...      ...       ...    ...     ...
The attributes cname and tname give the name of the column (attribute)
and the name of table where the attribute occurs. The nature of the domain
(character or numeric) is given by the attribute coltype and the size in bytes of
the values of the domain is given by the attribute length. The attribute nulls
specifies whether or not null values are allowed. Finally, the attribute in pr key
indicates whether the attribute belongs to the primary key of the table tname.
The access to and the presentation of metadata is highly dependent on the
specific database system. We examine the approach taken by ORACLE in
section 5.24.
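To illustrate how system-specific this access is, the following sketch reads the catalog of SQLite through Python's standard sqlite3 module. SQLite is our assumption here and is not the system discussed in this text; its catalog table is called sqlite_master and column descriptions are obtained with PRAGMA table_info, so neither the names nor the layout match SYSCATALOG and SYSCOLUMNS above. The table definition follows the rows shown for courses in SYSCOLUMNS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE courses (
        cno   CHAR(5) NOT NULL PRIMARY KEY,
        cname CHAR(20),
        cr    SMALLINT,
        cap   INTEGER
    )
""")

# sqlite_master is SQLite's catalog table; it lists the objects of the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['courses']

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = list(conn.execute("PRAGMA table_info(courses)"))
for cid, name, coltype, notnull, default, pk in columns:
    print(name, coltype, "nulls:", "N" if notnull else "Y",
          "in primary key:", "Y" if pk else "N")
conn.close()
```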
The relational model currently dominates all database systems, and it is
likely to continue to do so for quite some time. Researchers are continually
producing enhancements to the model, adding, e.g., object-oriented and web-
centered features. Some of these features are already implemented in contem-
porary database systems, as we will see when we discuss ORACLE in detail.
3.6 Exercises
1. Convert the alternative E/R models for the college database discussed in
Exercise 1 of Chapter 2 to a relational design.
2. Convert the E/R design of the database of the customers of the natural
gas distribution company to a relational design. Specify the keys of each
relation.
3. Suppose that the set of entities E ′ is obtained by specialization from the
set of entities E and that
τ = (T, A1 . . . An , ρ),
τ ′ = (T ′ , A1 . . . An B1 . . . Bℓ , ρ′ )
are the tables that result from the translation of E and E ′ , respectively.
Show that if e is an entity from E − E ′ and t is the tuple that results from
the translation of e, then t ∈ ρ − ρ′ [A1 . . . An ].
4.1 Introduction
Tables are more than simply places to store data. The real interest in tables is
in how they are used. To obtain information from a database, a user formulates
a question known as a “query.” For example, if we wanted to construct an
honor roll for the college for Fall 2002, we could examine the GRADES table and
select all students whose grades are above some threshold, say 90. Note that the
result can again be stored in a table. In this case, every tuple in the resultant
table actually appears in the original table. However, if we want to know the
names of the students in this table, we cannot find them directly, as students
are represented only by their student numbers in the GRADES table. We have
to add some information from the STUDENTS table to find their names. The
result can again be stored in a table, which we can call HONOR ROLL.
In general, the method of working with relational databases is to modify and
combine tables using specific techniques. These techniques have been studied
and, of course, have names. For example, the method above that generates the
sub-table of GRADES is an example of a “selection.” This table can be thought
of as an “intermediate result” along the path of obtaining HONOR ROLL. The
method of combining this intermediate result with STUDENTS is known as
“joining.” These and various other methods are what we study under the name
“relational algebra.”
Relational algebra is thus a collection of methods for building new tables
starting from existing ones. These methods are referred to as “operations”
44 Data Retrieval in the Relational Model
T ′ (B1 , . . . , Bn ) := T (A1 , . . . , An ).
SUBJECTS := COURSES,
regular program or the continuing education division, then we compute the table
(COURSES ∪ CED COURSES):
(COURSES ∪ CED COURSES)
cno cname cr cap
cs105 Computer Literacy 2 150
cs110 Introduction to Computing 4 120
cs199 Survey of Programming 3 120
cs210 Computer Programming 4 100
cs240 Computer Architecture 3 100
cs310 Data Structures 3 60
cs350 Higher Level Languages 3 50
cs410 Software Engineering 3 40
cs460 Graphics 3 30
Courses offered under both the regular program and the continuing education
division are computed in the table (COURSES ∩ CED COURSES):
(COURSES ∩ CED COURSES)
cno cname cr cap
cs110 Introduction to Computing 4 120
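Since compatible tables are sets of tuples over the same heading, union and intersection behave exactly like the corresponding set operations. The sketch below uses Python sets; the content of ced_courses is an assumption, inferred from the union and the intersection shown above rather than taken from a listing of the table itself.

```python
# Rows of COURSES from the college database: (cno, cname, cr, cap).
courses = {
    ("cs110", "Introduction to Computing", 4, 120),
    ("cs210", "Computer Programming", 4, 100),
    ("cs240", "Computer Architecture", 3, 100),
    ("cs310", "Data Structures", 3, 60),
    ("cs350", "Higher Level Languages", 3, 50),
    ("cs410", "Software Engineering", 3, 40),
    ("cs460", "Graphics", 3, 30),
}

# Assumed content of CED COURSES, inferred from the two result tables.
ced_courses = {
    ("cs105", "Computer Literacy", 2, 150),
    ("cs110", "Introduction to Computing", 4, 120),
    ("cs199", "Survey of Programming", 3, 120),
}

either = courses | ced_courses   # union
both = courses & ced_courses     # intersection

print(len(either))  # 9
print(sorted(row[0] for row in both))  # ['cs110']
```

The row for cs110 occurs in both tables but only once in the union, because a table contains no duplicate tuples.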
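Definition 4.1.7 Let T and S be two distinct tables having the headings
A1 . . . An and B1 . . . Bk , respectively. The product of T and
S is the table named (T × S) whose heading is T.A1 . . . T.An S.B1 . . . S.Bk and
which contains all tuples of the form
(u1 , . . . , un , v1 , . . . , vk ),
where (u1 , . . . , un ) is a tuple of T and (v1 , . . . , vk ) is a tuple of S.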
In short, the product contains all possible combinations of the rows of the
original tables. So, we see that the product operation can create huge tables
starting from tables of modest size; for instance, the product of three tables of
1000 rows apiece yields a table with one billion tuples.
Note that the definition of the product of tables prevents us from considering
the product of a table with itself. Indeed, if we were to try to construct
the product T × T , the attributes of the new table would be
T.A1 , . . . , T.An , T.A1 , . . . , T.An , which contradicts the requirement that all
attributes of a table be distinct. To get around this restriction we create an alias
T ′ by writing T ′ := T ; then, we can compute (T × T ′ ), which has T.A1 , . . . , T.An ,
T ′ .A1 , . . . , T ′ .An as its attributes.
Example 4.1.18 shows a query that requires this kind of special handling.
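The product and the role of the alias can be sketched in Python as follows. The function table_product and the one-attribute table T with rows a1 and a2 are hypothetical and serve only as an illustration; headings are built from qualified attribute names, and the attempt to multiply a table by itself is rejected.

```python
from itertools import product as cartesian


def table_product(name_t, heading_t, rows_t, name_s, heading_s, rows_s):
    """Return the heading (qualified attributes) and the rows of (T x S)."""
    if name_t == name_s:
        raise ValueError("a table cannot be multiplied by itself; use an alias")
    heading = (tuple(f"{name_t}.{a}" for a in heading_t)
               + tuple(f"{name_s}.{b}" for b in heading_s))
    rows = {u + v for u, v in cartesian(rows_t, rows_s)}
    return heading, rows


# A hypothetical one-attribute table T and its alias T' := T.
T_heading, T_rows = ("A",), {("a1",), ("a2",)}
alias_heading, alias_rows = T_heading, set(T_rows)

heading, rows = table_product("T", T_heading, T_rows,
                              "T'", alias_heading, alias_rows)
print(heading)       # ('T.A', "T'.A")
print(sorted(rows))  # [('a1', 'a1'), ('a1', 'a2'), ('a2', 'a1'), ('a2', 'a2')]

# The size of a product is the product of the sizes of its operands.
print(1000 ** 3)     # 1000000000
```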
4.1.3 Selection
Selection is a unary operation (that is, an operation that applies to one table)
that allows us to select tuples that satisfy specified conditions. For instance,
using selection, we can extract the tuples that refer to all students who live in
Massachusetts from the STUDENTS table. To begin, we formalize the notion of
a condition.
Definition 4.1.9 Let H be a set of attributes. An atomic condition on H has
the form A oper a or A oper B, where A, B are attributes of H that have the
same domain, oper is one of =, !=, <, >, ≤, or ≥, and a is a value from the
domain of A.
As is common in query languages, we use != to represent ≠, because ≠ is
not part of the ASCII character set and does not appear on most keyboards.
Example 4.1.10 Consider the table ITEMS that is a part of the database of a
department store and lists items sold by the store. We assume that the heading
consists of the following attributes:
dept = ’Sport’
cost > retprice
cost <= 1.25
are atomic conditions on the attributes of ITEMS. Note that we use quotation
marks for the value ’Sport’, because it is a part of a string domain, but there
are no quotation marks around 1.25, because this value belongs to a numerical
domain.
Starting from these atomic conditions, we can build more complicated
conditions using and, or, and not. So, if we want to list the sports items that
sell for $1.25 or less, we can use the condition dept = ’Sport’ and cost <= 1.25.
This method of building conditions is known as “recursive”, and we use it in
the following definition.
Definition 4.1.11 Conditions on a set of attributes H are defined recursively
as follows:
1. Every atomic condition on H is a condition on H.
2. If C1 , C2 are conditions on H, then
(C1 and C2 ), (C1 or C2 ), and (not C1 )
are conditions on H.
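A recursive definition of this kind translates directly into a recursive evaluation procedure. The Python sketch below is one possible encoding and not part of the formal development: conditions are nested tuples tagged "atom", "and", "or", or "not", the helper names attr and holds are hypothetical, and the sample row of ITEMS is invented for the illustration.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def attr(name):
    """Mark the right-hand operand as an attribute rather than a value."""
    return ("attr", name)


def holds(cond, row):
    """Decide whether the row satisfies the condition, by recursion."""
    kind = cond[0]
    if kind == "and":
        return holds(cond[1], row) and holds(cond[2], row)
    if kind == "or":
        return holds(cond[1], row) or holds(cond[2], row)
    if kind == "not":
        return not holds(cond[1], row)
    # Atomic condition: ("atom", A, oper, a) or ("atom", A, oper, attr(B)).
    _, a, oper, right = cond
    is_attribute = isinstance(right, tuple) and right[0] == "attr"
    value = row[right[1]] if is_attribute else right
    return OPS[oper](row[a], value)


# A hypothetical row of ITEMS.
item = {"dept": "Sport", "cost": 1.10, "retprice": 1.50}

cheap_sport = ("and", ("atom", "dept", "=", "Sport"),
                      ("atom", "cost", "<=", 1.25))
print(holds(cheap_sport, item))                                  # True
print(holds(("atom", "cost", ">", attr("retprice")), item))      # False
print(holds(("not", cheap_sport), item))                         # False
```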
Example 4.1.15 Let us find the list of grades given in cs110 during the spring
semester of 2002. This can be done by applying the following selection operation:
T := (GRADES where cno = ’cs110’ and
sem = ’Spring’ and year = 2002).
This selection gives the table:
T
stno empno cno sem year grade
1011 023 cs110 Spring 2002 75
4022 023 cs110 Spring 2002 60
We conclude the definition of selection with the observation that selection ex-
tracts “horizontal” slices from a table. The next operation extracts vertical
slices from tables.
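Selection can be sketched in Python as a filter over the rows of a table. The function name select is hypothetical, and the table below is only an excerpt of GRADES (all rows for cs110 plus two rows for cs240), which is enough to reproduce the result of Example 4.1.15.

```python
HEADING = ("stno", "empno", "cno", "sem", "year", "grade")

# Excerpt of GRADES: every cs110 row and two cs240 rows.
GRADES = {
    ("1011", "019", "cs110", "Fall", 2001, 40),
    ("2661", "019", "cs110", "Fall", 2001, 80),
    ("3566", "019", "cs110", "Fall", 2001, 95),
    ("5544", "019", "cs110", "Fall", 2001, 100),
    ("1011", "023", "cs110", "Spring", 2002, 75),
    ("4022", "023", "cs110", "Spring", 2002, 60),
    ("3566", "019", "cs240", "Spring", 2002, 100),
    ("5571", "019", "cs240", "Spring", 2002, 50),
}


def select(heading, rows, condition):
    """Return the rows for which the condition holds; the heading is unchanged."""
    return {row for row in rows if condition(dict(zip(heading, row)))}


T = select(
    HEADING, GRADES,
    lambda r: r["cno"] == "cs110" and r["sem"] == "Spring" and r["year"] == 2002,
)
for row in sorted(T):
    print(row)
# ('1011', '023', 'cs110', 'Spring', 2002, 75)
# ('4022', '023', 'cs110', 'Spring', 2002, 60)
```

Every row of the result is a row of the original table, which is what makes the slice "horizontal".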
4.1.4 Projection
Recall that we introduced the projection of tables in Section 3.2. In this section
we re-examine this notion as a relational algebra operation.
A table may contain many attributes, but for any particular query, only
some of these may be relevant; projection allows us to choose these.
Example 4.1.16 Suppose that we wish to produce a list of instructors’ names
and the room numbers of their offices. This can be accomplished by projection:
OFFICE LIST := INSTRUCTORS[name roomno]
and we obtain the table:
OFFICE LIST
name roomno
Evans Robert 82
Exxon George 90
Sawyer Kathy 91
Davis William 72
Will Samuel 90
Example 4.1.17 Projection and selection may be combined, provided the
projection does not eliminate the attributes used in the selection. Consider, for
example, the task of determining the grades of the student whose student number
is 1011. The table T created by
T := (GRADES where stno = ’1011’)[grade]
is
T
grade
40
75
90
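The order of the two operations matters, as the following Python sketch suggests. The helper names select and project are hypothetical, and the table is an excerpt of GRADES containing all rows of student 1011 and two rows of student 4022.

```python
HEADING = ("stno", "empno", "cno", "sem", "year", "grade")

# Excerpt of GRADES: all rows of student 1011 and two rows of student 4022.
GRADES = {
    ("1011", "019", "cs110", "Fall", 2001, 40),
    ("1011", "023", "cs110", "Spring", 2002, 75),
    ("1011", "019", "cs210", "Fall", 2002, 90),
    ("1011", "056", "cs240", "Spring", 2003, 90),
    ("4022", "023", "cs110", "Spring", 2002, 60),
    ("4022", "019", "cs210", "Spring", 2003, 70),
}


def select(heading, rows, condition):
    """Rows for which the condition holds."""
    return {row for row in rows if condition(dict(zip(heading, row)))}


def project(heading, rows, attrs):
    """Rows restricted to attrs; duplicates collapse because the result is a set."""
    positions = [heading.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in rows}


selected = select(HEADING, GRADES, lambda r: r["stno"] == "1011")
T = project(HEADING, selected, ("grade",))
print(sorted(T))  # [(40,), (75,), (90,)]

# Reversing the order fails: after projecting on grade, stno is gone.
only_grades = project(HEADING, GRADES, ("grade",))
try:
    select(("grade",), only_grades, lambda r: r["stno"] == "1011")
except KeyError as missing:
    print("cannot select on", missing)  # cannot select on 'stno'
```

Student 1011 received the grade 90 twice, yet the value appears once in T, since the projection removes duplicate rows.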
A1 · · · Am B1 · · · Bn and B1 · · · Bn C1 · · · Cp ,
respectively. (In other words, assume that the two tables have only the
attributes B1 , . . . , Bn in common.)
The tuples t1 in T1 and t2 in T2 are joinable if
t1 [B1 · · · Bn ] = t2 [B1 · · · Bn ].
A1 . . . Am B1 . . . Bn C1 . . . Cp
such that
t[A1 . . . Am B1 . . . Bn ] = t1 [A1 . . . Am B1 . . . Bn ],
and
t[B1 . . . Bn C1 . . . Cp ] = t2 [B1 . . . Bn C1 . . . Cp ].
Definition 4.1.21 Suppose that B1 , . . . , Bn are the attributes that two tables
T1 , T2 have in common.
The natural join of T1 and T2 , or simply the join, is the table named (T1 ⋈
T2 ) having the heading A1 . . . Am B1 . . . Bn C1 . . . Cp that consists of all tuples
t1 ⋈ t2 such that t1 is in T1 and t2 is in T2 , and t1 is joinable with t2 .
Note that if n = 0 (that is, if the tables T1 , T2 have no attributes in common),
then the joinability condition is satisfied by every tuple t1 of T1 and t2 of T2 .
In this special case, the tables T1 ⋈ T2 and T1 × T2 are virtually identical: they
have the same rows but different names and headings.
Example 4.1.23 Suppose that we need to find the names of all instructors
who have taught cs110. Initially, we extract all grade records involving cs110
using a selection operation:
T1 := (GRADES where cno = ’cs110’).
Then, by joining T1 with the table INSTRUCTORS we extract the records of
instructors who have taught this course:
T2 := (T1 ⋈ INSTRUCTORS).
Finally, a projection on name gives the answer:
ANS := T2 [name].
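The computation of this example can be traced with a small Python sketch of the natural join. Rows are modelled as dictionaries, the function name natural_join is hypothetical, and all rows of a table are assumed to have the same attributes; T1 holds the cs110 rows of GRADES and INSTRUCTORS is the table of the college database.

```python
def natural_join(rows1, rows2):
    """Combine every pair of rows that agree on all the attributes they share."""
    result = []
    for t1 in rows1:
        for t2 in rows2:
            common = t1.keys() & t2.keys()
            if all(t1[b] == t2[b] for b in common):
                joined = {**t1, **t2}
                if joined not in result:  # keep the result duplicate-free
                    result.append(joined)
    return result


grades_heading = ("stno", "empno", "cno", "sem", "year", "grade")
T1 = [dict(zip(grades_heading, row)) for row in [
    ("1011", "019", "cs110", "Fall", 2001, 40),
    ("2661", "019", "cs110", "Fall", 2001, 80),
    ("3566", "019", "cs110", "Fall", 2001, 95),
    ("5544", "019", "cs110", "Fall", 2001, 100),
    ("1011", "023", "cs110", "Spring", 2002, 75),
    ("4022", "023", "cs110", "Spring", 2002, 60),
]]

instr_heading = ("empno", "name", "rank", "roomno", "telno")
INSTRUCTORS = [dict(zip(instr_heading, row)) for row in [
    ("019", "Evans Robert", "Professor", 82, "7122"),
    ("023", "Exxon George", "Professor", 90, "9101"),
    ("056", "Sawyer Kathy", "Assoc. Prof.", 91, "5110"),
    ("126", "Davis William", "Assoc. Prof.", 72, "5411"),
    ("234", "Will Samuel", "Assist. Prof.", 90, "7024"),
]]

T2 = natural_join(T1, INSTRUCTORS)       # the only shared attribute is empno
ANS = sorted({row["name"] for row in T2})
print(ANS)  # ['Evans Robert', 'Exxon George']
```

If the two tables shared no attribute, the test on common attributes would hold for every pair of rows, and the function would return all combinations, as in the product.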
Example 4.1.24 To find the names of all instructors who have ever taught any
four-credit course, we can compute the join:
ANS := T2 [name].
would require the name of the student to be identical with the name of the
instructor (which is, of course, not what is required by this query). Instead, we
can use the product of tables and enforce the “limited joining” through selection:
T1 := (STUDENTS × GRADES × INSTRUCTORS)
T2 := T1 where STUDENTS.stno = GRADES.stno and
GRADES.empno = INSTRUCTORS.empno and
STUDENTS.city = ’Brookline’
Then, by projection, we extract the names of the instructors involved:
ANS := T2 [INSTRUCTORS.name].
Join can be used to express other operations. Note, for instance, that if T
and T ′ are two compatible tables, then T ⋈ T ′ has the same rows as T ∩ T ′ .
Indeed, since the two tables have all their attributes in common, two tuples t
in T and t′ in T ′ are joinable only if they are equal on all attributes, that is, if
they are the same.
4.1.6 Division
Definition 4.1.26 Let T1 , T2 be two tables such that the heading of T1 is
A1 . . . An B1 . . . Bk and the heading of T2 is B1 . . . Bk . The table obtained by
division of T1 by T2 is the table T1 ÷ T2 that has the heading A1 . . . An and
contains those tuples t in tupl(A1 . . . An ) such that t ⋈ t2 is a tuple in T1 for
every tuple t2 of T2 .
In other words, the content of the table obtained by dividing T1 by T2 ,
T1 ÷ T2 , consists of each tuple from tupl(A1 . . . An ) which, when concatenated
with every tuple of T2 , yields a tuple of T1 .
We stress that, in order for two tables T1 and T2 to be involved in a division,
the heading of T2 must be included in the heading of T1 .
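The definition of division can be followed almost literally in Python. In the sketch below the function name divide is hypothetical, and the divisor is assumed to be nonempty, so that it suffices to search for candidates among the tuples of the dividend projected on A1 . . . An. The data are the pairs of employee numbers and course numbers that occur in GRADES, divided by the employee numbers 019 and 023.

```python
def divide(heading1, rows1, heading2, rows2):
    """Return the heading and rows of T1 / T2 (T2 assumed nonempty)."""
    if not set(heading2) <= set(heading1):
        raise ValueError("the heading of T2 must be included in that of T1")
    a_attrs = [a for a in heading1 if a not in heading2]
    t1 = [dict(zip(heading1, row)) for row in rows1]
    t2 = [dict(zip(heading2, row)) for row in rows2]
    candidates = {tuple(d[a] for a in a_attrs) for d in t1}
    result = set()
    for cand in candidates:
        t = dict(zip(a_attrs, cand))
        # t qualifies if combining it with every tuple of T2 gives a tuple of T1.
        if all({**t, **s} in t1 for s in t2):
            result.add(cand)
    return tuple(a_attrs), result


pairs = {("019", "cs110"), ("023", "cs110"), ("019", "cs240"), ("234", "cs410"),
         ("019", "cs210"), ("056", "cs240"), ("234", "cs310")}
professors = {("019",), ("023",)}

heading, rows = divide(("empno", "cno"), pairs, ("empno",), professors)
print(heading, rows)  # ('cno',) {('cs110',)}
```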
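Example 4.1.27 Suppose that we need to determine the courses taught by
all full professors. We can solve this query by first determining the employee
numbers (empno) for all full professors:
T1 := (INSTRUCTORS where rank = ’Professor’)[empno].
Next, we determine the pairs of courses and instructors who have taught them:
T2 := GRADES[cno, empno],
which results in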
4.2 The Basic Operations of Relational Algebra 55
T2
empno cno
019 cs110
023 cs110
019 cs240
234 cs410
019 cs210
056 cs240
234 cs310
Finally, by applying division, we extract the course numbers of courses that are
taught by all full professors:
ANS := (T2 ÷ T1 ),
that is,
ANS
cno
cs110
T3
T1.A  T1.B  T1.D  T2.B  T2.C  T2.D
a2    b1    d1    b1    c1    d1
a1    b2    d4    b2    c2    d4
a3    b1    d1    b1    c1    d1
T4
A   B   D   C
a2  b1  d1  c1
a1  b2  d4  c2
a3  b1  d1  c1
The table T4 contains exactly the same tuples as the join T1 ⋈ T2 .
Example 4.2.2 The query considered in Example 4.1.24 (where we use join to
find the names of instructors who have taught any four-credit course) can now
be solved using product, selection, projection, and renaming by the following
computation:
The renaming in the last step is required to replace the qualified attributes with
unqualified ones; i.e., we must consider all possible combinations (pairs) of pro-
fessors and courses, not just the courses that the various professors taught.
Next, by computing
T1
empno
019
023
GRADES[cno]
cno
cs110
cs210
cs240
cs310
cs410
T3 gives all possible pairs of course numbers and employee numbers for full
professors:
T3
cno empno
cs110 019
cs110 023
cs210 019
cs210 023
cs240 019
cs240 023
cs310 019
cs310 023
cs410 019
cs410 023
T4
cno empno
cs210 023
cs240 023
cs310 019
cs310 023
cs410 019
cs410 023
This means that the courses not taught by every full professor are:
T5
cno
cs210
cs240
cs310
cs410
Finally, the result of the computation is:
T6
cno
cs110
The same series of steps can always be used to calculate the division opera-
tion.
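The series of steps can be replayed with Python sets. This is a sketch of the computation above and not a general implementation: t1 holds the employee numbers of the full professors, t2 the pairs of course numbers and employee numbers that occur in GRADES, and GRADES[cno] is obtained here from t2, which contains the same course numbers.

```python
# T1: employee numbers of the full professors.
t1 = {("019",), ("023",)}

# T2: (cno, empno) pairs that occur in GRADES.
t2 = {("cs110", "019"), ("cs110", "023"), ("cs240", "019"), ("cs410", "234"),
      ("cs210", "019"), ("cs240", "056"), ("cs310", "234")}

all_cno = {(cno,) for cno, _ in t2}           # projection GRADES[cno]
t3 = {c + e for c in all_cno for e in t1}     # product: all conceivable pairs
t4 = t3 - t2                                  # difference: pairs that never occurred
t5 = {(cno,) for cno, _ in t4}                # courses missed by some full professor
t6 = all_cno - t5                             # courses taught by every full professor

print(sorted(t4))
# [('cs210', '023'), ('cs240', '023'), ('cs310', '019'),
#  ('cs310', '023'), ('cs410', '019'), ('cs410', '023')]
print(sorted(t5))  # [('cs210',), ('cs240',), ('cs310',), ('cs410',)]
print(t6)          # {('cs110',)}
```

Only projection, product, and difference are used, which is why division does not need to be counted among the basic operations.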
The arguments just presented show that only six of the nine operations of
relational algebra are required: renaming, union, difference, product, selection,
and projection.
It is natural to ask whether we can eliminate any of these remaining oper-
ations and still retain the full computational power of relational algebra. We
show, however, that the set of six operations just mentioned is minimal. In other
words, if we discard any of these six operations, the remaining five are unable
to do the job of the discarded operation.
The following observation shows that the union operation cannot be dis-
carded.
Consider the one-attribute, one-tuple tables:
T1
A
a1
and
T2
A
a2
If we assume a1 ≠ a2 , then the table T1 ∪ T2 is given by
(T1 ∪ T2 )
A
a1
a2
Note that if we apply the operations of difference, product, selection, projection,
and renaming to tables that consist of at most one tuple, then the result may
contain at most one tuple; therefore, any computation that makes use only of
these operations is not capable of computing a target that contains more than
4.3 Other Relational Algebra Operations 59
one tuple, and therefore is unable to produce a table that has the same content
as T1 ∪ T2 .
The product operation cannot be eliminated from the set of basic operations.
Indeed, suppose that a database consists of two tables T and S that have the
one-attribute headings A and B, respectively. Observe that no computation
that uses renaming, union, difference, selection, and projection is capable of
computing the table (T × S). Indeed, any table that is obtained through such
a computation has only one attribute and, therefore, cannot have the same
heading as T × S, which has two attributes.
Slightly more complicated examples show that the difference, selection, and
projection operations are all essential.
Since every student has one advisor it suffices to compute the θ-join:
The semijoin of table τ1 with table τ2 computes that part of table τ1 that
consists of the tuples of τ1 that are joinable with tuples of τ2 ; in other words,
it computes the “useful” part of τ1 for the join with τ2 . This operation is
very important for distributed databases. In such databases various tables (or
even portions of tables) may reside at different computing sites, and it is often
important to minimize the amount of data traffic through the network that
connects these sites. Suppose, for example, that τ1 is a very large table stored
at site S1 , τ2 is a relatively small table stored at site S2 , and τ1 ⋈ τ2 is needed
at site S2 (see Figure 4.2).
Suppose that the tuples of τ1 and τ2 have approximately the same size.
Also, assume that τ1 contains n1 tuples, τ2 contains n2 tuples and k tuples of
τ1 are joinable with the tuples of τ2 . We need to compare two scenarios:
1) Ship table τ1 to site S2 . The traffic cost is proportional to the size n1 of
τ1 .
[Figure 4.2: τ1 is stored at site S1 and τ2 at site S2 .]
Scenario 1:
1. Ship τ1 to S2 .
Scenario 2:
1. Ship τ2 to S1 .
2. Ship τ1 ⋉ τ2 to S2 .
3. Compute τ1 ⋈ τ2 at S2 as (τ1 ⋉ τ2 ) ⋈ τ2 .
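The two scenarios can be compared on a small example. The Python sketch below is an illustration under stated assumptions: the tables tau1 and tau2 are hypothetical, the function names are ours, and traffic is counted in tuples, which is reasonable only because the tuples of the two tables are assumed to have about the same size. Under these assumptions the first scenario ships n1 tuples, while the second ships n2 tuples in its first step and k tuples in its second.

```python
def joinable(t1, t2):
    """Two rows are joinable if they agree on every attribute they share."""
    return all(t1[b] == t2[b] for b in t1.keys() & t2.keys())


def natural_join(rows1, rows2):
    return [{**t1, **t2} for t1 in rows1 for t2 in rows2 if joinable(t1, t2)]


def semijoin(rows1, rows2):
    """Rows of the first table that are joinable with some row of the second."""
    return [t1 for t1 in rows1 if any(joinable(t1, t2) for t2 in rows2)]


# Hypothetical data: a larger table at site S1 and a small one at site S2.
tau1 = [{"A": i, "B": i % 5} for i in range(20)]
tau2 = [{"B": 0, "C": "x"}]

reduced = semijoin(tau1, tau2)

# Step 3 of scenario 2 yields the same table as joining the full tau1.
same_result = natural_join(reduced, tau2) == natural_join(tau1, tau2)
print(same_result)  # True

n1, n2, k = len(tau1), len(tau2), len(reduced)
print("scenario 1 ships", n1, "tuples")      # scenario 1 ships 20 tuples
print("scenario 2 ships", n2 + k, "tuples")  # scenario 2 ships 5 tuples
```

The second scenario pays off when n2 + k is smaller than n1, that is, when τ2 is small and few tuples of τ1 take part in the join.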
4.4 Exercises
1. Consider a database that consists of one table T :
T
A B
a b
Prove that there is no computation that uses renaming, union, product,
difference, and selection that can compute the projection T [A]. Conclude
that projection is an essential operation.
2. Find the names of students who live in Boston; find the names of students
who live outside Boston.
3. Find all pairs of student names and course names for grades obtained
during Fall of 2001.
4. Find the names of students who took some four-credit courses.
5. Find the names of students who took every four-credit course.