
Unit–III: Relational Database Design

1. Integrity Constraints
Integrity constraints ensure that changes made to the database by authorized users do not
result in a loss of data consistency. Therefore, integrity constraints guard against accidental
damage to the database. Examples of integrity constraints are:
• An instructor name cannot be null.
• No two instructors can have the same instructor ID.
• Every department name in the course relation must have a matching department name in
the department relation.
• The budget of a department must be greater than 0.00.
There are two types of integrity constraints for the E-R model:
1. Key declarations: The condition that certain attributes form a candidate key for a given
entity set.
2. Form of a relationship: Many to many, one to many, one to one.
An integrity constraint can be an arbitrary condition pertaining to the database. However,
arbitrary conditions may be costly to test. Thus, we concentrate on integrity constraints that can
be tested with minimal overhead (cost).
Integrity constraints are usually identified as part of the database schema design process,
and declared as part of the create table command used to create relations. However, integrity
constraints can also be added to an existing relation by using the command alter table table-
name add constraint, where constraint can be any constraint on the relation. When such a
command is executed, the system first ensures that the relation satisfies the specified constraint.
If it does, the constraint is added to the relation; if not, the command is rejected.

a) Constraints on a Single Relation


The create table command may also include integrity-constraint declarations. The allowed
integrity constraints include:
• not null
• unique
• check(<predicate>)

b) Not Null Constraint


The null value is a member of all domains, and as a result is a legal value for every
attribute in SQL by default. For certain attributes, null values may be inappropriate. Consider a
tuple in the student relation where name is null. Such a tuple gives student information for an
unknown student; thus, it does not contain useful information. Similarly, we would not want the

Database Management Systems (DBMS) 1


department budget to be null. In cases such as these, we wish to prevent null values; we
can do so by restricting the domains of the attributes name and budget to exclude null values, by
declaring them as follows:
name varchar(20) not null
budget numeric(12,2) not null
The not null specification prohibits the insertion of a null value for the attribute. Any database
modification that would cause a null to be inserted in an attribute declared to be not null
generates an error diagnostic.
There are many situations where we want to avoid null values. In particular, SQL
prohibits null values in the primary key of a relation schema. In the university example, if the
attribute dept_name is declared as the primary key of the department relation, it cannot take a
null value. As a result, it does not need to be declared explicitly as not null.

c) Unique Constraint
SQL also supports an integrity constraint:
unique(A1, A2, ..., Am)
The unique specification says that attributes A1, A2, ..., Am form a candidate key; that is, no two
tuples in the relation can be equal on all the listed attributes. However, candidate key attributes
are permitted to be null unless they have explicitly been declared to be not null.

d) The check Clause


When applied to a relation declaration, the clause check(P) specifies a predicate P that must be
satisfied by every tuple in a relation.
A common use of the check clause is to ensure that attribute values satisfy specified
conditions. For instance, a clause check(budget > 0) in the create table command for relation
department would ensure that the value of budget is positive.
create table department(
dept_name varchar(20) primary key,
building varchar(15),
budget numeric(12,2) check(budget > 0));
As another example, consider the following:
create table section(
course_id varchar(8),
semester varchar(6) check (semester in(’Fall’, ’Winter’, ’Spring’, ’Summer’)));
Here, we use the check clause to simulate an enumerated type, by specifying that semester must
be one of ’Fall’, ’Winter’, ’Spring’, or ’Summer’.
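A check clause can likewise be demonstrated with SQLite via Python's sqlite3 module; this sketch reuses the department schema above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table department(
                    dept_name varchar(20) primary key,
                    building varchar(15),
                    budget numeric(12,2) check (budget > 0))""")
conn.execute("insert into department values ('Physics', 'Watson', 70000)")

# A non-positive budget violates check(budget > 0) and is rejected.
try:
    conn.execute("insert into department values ('Music', 'Packard', -100)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```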

Question 1: Explain with suitable example what is Referential Integrity.


e) Referential Integrity
Referential integrity ensures that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another relation.
Foreign keys can be specified as part of the SQL create table statement by using the
foreign key clause. (Refer Example on page no 75 to see foreign-key declarations).
By default, a foreign key references the primary-key attributes of the referenced table. SQL also
supports a version of the references clause where a list of attributes of the referenced relation can
be specified explicitly. The specified list of attributes must be declared as a candidate key of the
referenced relation.
Example:
The following short form can be used as part of an attribute definition to declare that the
attribute forms a foreign key:
branch-name char(15) references branch
When a referential-integrity constraint is violated, the normal procedure is to reject the action that
caused the violation. But, a foreign key clause can specify that if a delete or update action on the
referenced relation violates the constraint, then, instead of rejecting the action, the system must
take steps to change the tuple in the referencing relation to restore the constraint. Consider this
definition of an integrity constraint on the relation account:
create table account (
...
foreign key(branch-name) references branch
on delete cascade
on update cascade,
...)
Because of the clause on delete cascade associated with the foreign-key declaration, if a delete of
a tuple in branch results in this referential-integrity constraint being violated, the system does not
reject the delete. Instead, the delete “cascades” to the account relation, deleting the tuple that
refers to the branch that was deleted.
Similarly, the system does not reject an update to a field referenced by the constraint if it
violates the constraint; instead, the system updates the field branch-name in the referencing
tuples in account to the new value as well.
SQL also allows the foreign key clause to specify actions other than cascade, if the
constraint is violated:
The referencing field (here, branch-name) can be set to null (by using set null in place of
cascade), or to the default value for the domain (by using set default).
If there is a chain of foreign-key dependencies across multiple relations, a deletion or
update at one end of the chain can propagate across the entire chain.
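The cascading behavior can be tried directly. The sketch below uses SQLite through Python's sqlite3 module; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the branch/account schema is abbreviated from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this pragma
conn.execute("create table branch(branch_name varchar(15) primary key)")
conn.execute("""create table account(
                    account_number varchar(10) primary key,
                    branch_name varchar(15)
                        references branch
                        on delete cascade
                        on update cascade)""")
conn.execute("insert into branch values ('Perryridge')")
conn.execute("insert into account values ('A-101', 'Perryridge')")

# Deleting the branch cascades to the referencing account tuple.
conn.execute("delete from branch where branch_name = 'Perryridge'")
remaining = conn.execute("select count(*) from account").fetchone()[0]
# remaining == 0: the account row was deleted by the cascade
```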



Null values complicate the semantics of referential integrity constraints in SQL.
Attributes of foreign keys are allowed to be null, provided that they have not been declared to be
non-null.
If all the columns of a foreign key are non-null in a given tuple, the usual definition of
foreign-key constraints is used for that tuple. If any of the foreign-key columns is null, the tuple
is defined automatically to satisfy the constraint. SQL also provides constructs that allow you to
change this behavior with null values.

f) Integrity Constraint Violation During a Transaction


Transactions may consist of several steps, and integrity constraints may be violated
temporarily after one step, but a later step may remove the violation. For instance, suppose we
have a relation person with primary key name, and an attribute spouse, and suppose that spouse is
a foreign key on person. That is, the constraint says that the spouse attribute must contain a name
that is present in the person table. Suppose we wish to note the fact that John and Mary are
married to each other by inserting two tuples, one for John and one for Mary, in the above
relation, with the spouse attributes set to Mary and John. The insertion of the first tuple would
violate the foreign-key constraint, regardless of which of the two tuples is inserted first. After the
second tuple is inserted the foreign-key constraint would hold again.
To handle such situations, the SQL standard allows a clause initially deferred to be added
to a constraint specification; the constraint would then be checked at the end of a transaction, and
not at intermediate steps. A constraint can alternatively be specified as deferrable, which means
it is checked immediately by default, but can be deferred when desired. For constraints declared
as deferrable, executing a statement set constraints constraint-list deferred as part of a
transaction causes the checking of the specified constraints to be deferred to the end of that
transaction.
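Deferred checking can be sketched with SQLite, which accepts the standard deferrable initially deferred phrase on foreign-key clauses (via Python's sqlite3 module; the person/spouse schema follows the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""create table person(
                    name varchar(20) primary key,
                    spouse varchar(20)
                        references person(name)
                        deferrable initially deferred)""")

# Each insert alone violates the foreign key, but checking is
# deferred to the end of the transaction, when both rows exist.
conn.execute("insert into person values ('John', 'Mary')")
conn.execute("insert into person values ('Mary', 'John')")
conn.commit()  # succeeds: the constraint holds at commit time
```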

Question 2: What is assertion? Give SQL construct for assertion and explain with examples.

g) Assertion

Question 3: Define assertion with example.
An assertion is a condition that we wish the database always to satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions. There are many
constraints that cannot be expressed using domain and referential-integrity constraints. For example:
1. The sum of all loan amounts for each branch must be less than the sum of all account
balances at the branch.



2. Every loan has at least one customer who maintains an account with a minimum
balance of 500.00
Syntax:
create assertion <assertion-name> check <condition>;
The assertion-name is used to identify the constraints specified by the assertion and can be used
for modification and deletion of assertion, whenever required. DBMS tests the assertion for its
validity when it is created.
An assertion is implemented by writing a query that retrieves any tuple that violates the
specified condition. Then this query is placed inside a not exists clause, which indicates that the
result of this query must be empty. Hence the assertion is violated whenever the result of this
query is not empty.
Example:
The price of a textbook must not be less than the minimum price of a novel. The assertion
for this requirement can be specified as:
create assertion price-constraint check(
    not exists (select *
                from book
                where category = 'Textbook' and
                      price < (select min(price)
                               from book
                               where category = 'Novel')));
When an assertion is created, the system tests it for validity. If the assertion is valid then
any future modification to the database is allowed only if it does not cause that assertion to be
violated. This testing may introduce a significant amount of overhead if complex assertions have
been made. Hence, assertions should be used with great care.
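Most widely used systems do not implement create assertion, but the underlying idea can be sketched directly: run the violation query and insist that its result be empty (Python's sqlite3 module; the book table and its rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table book(title text, category text, price numeric)")
conn.executemany("insert into book values (?, ?, ?)",
                 [("Database Systems", "Textbook", 80),
                  ("Dracula", "Novel", 12)])

# The assertion holds when the violation query returns no tuples.
violated = conn.execute("""
    select exists (
        select * from book
        where category = 'Textbook'
          and price < (select min(price)
                       from book
                       where category = 'Novel'))""").fetchone()[0]
# violated == 0: no textbook is cheaper than the cheapest novel
```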

2. SQL Data Types and Schemas


We have seen a number of built-in data types supported in SQL, such as integer types, real
types, and character types. SQL supports several additional built-in data types.

a) Date and Time Types in SQL


The SQL standard supports several data types relating to dates and times:
• date: A calendar date containing a (four-digit) year, month, and day of the month.
• time: The time of day, in hours, minutes, and seconds. A variant, time(p), can be used to
specify the number of fractional digits for seconds. It is also possible to store time-zone
information along with the time by specifying time with timezone.
• timestamp: A combination of date and time. A variant, timestamp(p), can be used to



specify the number of fractional digits for seconds. Time-zone information is also
stored if with timezone is specified.
Date and time values can be specified like this:
date’2005-02-21’
time’08:20:00’
timestamp’2005-02-21 09:12:01.45’
Dates must be specified in the format year followed by month followed by day as shown. The
seconds field of time or timestamp can have a fractional part, as in the timestamp above.
We can use an expression of the form cast e as t to convert a character string (or string
valued expression) e to the type t, where t is one of date, time, or timestamp.
SQL defines several functions to get the current date and time. For example,
current_date returns the current date, current_time returns the current time (with time zone),
and local_time returns the current local time (without time zone). Timestamps (date plus time)
are returned by current_timestamp (with time zone) and local_timestamp (local date and time
without time zone).
SQL allows comparison operations on all the types, and it allows both arithmetic and
comparison operations on the various numeric types. SQL also provides a data type called
interval, and it allows computations based on dates and times and on intervals. For example, if x
and y are of type date, then x – y is an interval whose value is the number of days from date y to
date x. Similarly, adding or subtracting an interval to a date or time gives back a date or time.
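As a small illustration of date arithmetic: SQLite (via Python's sqlite3 module) has no interval type, but its julianday function turns a date into a day number, so subtracting two dates yields the number of days between them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# julianday converts a date to a day number; the difference of two
# dates is therefore the interval between them, measured in days.
days = conn.execute(
    "select julianday('2005-02-21') - julianday('2005-02-14')"
).fetchone()[0]
# days == 7.0: one week separates the two dates
```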

b) Default Values
SQL allows a default value to be specified for an attribute as shown by the following
create table statement:
create table student(
ID varchar(5),
name varchar(20) not null,
dept_name varchar(20),
tot_cred numeric(3,0) default 0,
primary key(ID));
The default value of the tot_cred attribute is declared to be 0. As a result, when a tuple is inserted
into the student relation, if no value is provided for the tot_cred attribute, its value is set to 0. The
following insert statement shows how an insertion can omit the value for the tot_cred attribute.
insert into student(ID, name, dept_name)
values(’12789’, ’Newman’, ’Comp. Sci.’);
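The default can be observed as follows (a sketch using Python's sqlite3 module, reusing the student schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table student(
                    ID varchar(5),
                    name varchar(20) not null,
                    dept_name varchar(20),
                    tot_cred numeric(3,0) default 0,
                    primary key (ID))""")

# The insert omits tot_cred, so the declared default is used.
conn.execute("""insert into student(ID, name, dept_name)
                values ('12789', 'Newman', 'Comp. Sci.')""")
tot = conn.execute(
    "select tot_cred from student where ID = '12789'").fetchone()[0]
# tot == 0, the default value
```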

c) Index Creation



Many queries reference only a small proportion of the records in a file. For
example, a query like “Find all instructors in the Physics department” or “Find the tot_cred
value of the student with ID 22201” references only a fraction of the relevant records. It is
inefficient for the system to read every record and check the ID field for the ID “22201,” or the
dept_name field for the value “Physics”.
An index on an attribute of a relation is a data structure that allows the database system
to find those tuples in the relation that have a specified value for that attribute efficiently, without
scanning through all the tuples of the relation. For example, if we create an index on attribute ID
of relation student, the database system can find the record with any specified ID value, such as
22201, or 44553, directly, without reading all the tuples of the student relation. An index can also
be created on a list of attributes, for example, on attributes name and dept_name of student. Many
databases support index creation using the syntax given below.
create index studentID_index on student (ID);
The above statement creates an index named studentID_index on the attribute ID of the relation
student. When a user submits an SQL query that can benefit from using an index, the SQL query
processor automatically uses the index. For example, given an SQL query that selects the student
tuple with ID 22201, the SQL query processor would use the index studentID_index defined
above to find the required tuple without reading the whole relation.
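Whether the optimizer actually uses an index can be inspected with the database's plan facility. In SQLite (via Python's sqlite3 module), explain query plan reports the chosen access path; the schema below is a cut-down version of student:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student(ID varchar(5), name varchar(20))")
conn.executemany("insert into student values (?, ?)",
                 [("22201", "Shankar"), ("44553", "Peltier")])
conn.execute("create index studentID_index on student (ID)")

# explain query plan reports the access path chosen by the optimizer;
# for this equality lookup it should mention studentID_index.
plan = conn.execute(
    "explain query plan select * from student where ID = '22201'"
).fetchall()
print(plan)
```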

d) Large-Object Types
Many current-generation database applications need to store attributes that can be large
(of the order of many kilobytes), such as a photograph, or very large (of the order of many
megabytes or even gigabytes), such as a high-resolution medical image or video clip. SQL
provides large-object data types for character data (clob) and binary data (blob). The letters “lob”
in these data types stand for “Large OBject.” For example, we may declare attributes
book_review clob(10KB)
image blob(10MB)
movie blob(2GB)
For result tuples containing large objects (multiple megabytes to gigabytes), it is inefficient or
impractical to retrieve an entire large object into memory. Instead, an application would usually
use an SQL query to retrieve a “locator” for a large object and then use the locator to manipulate
the object from the host language in which the application itself is written. For instance, the
JDBC application program interface permits a locator to be fetched instead of the entire large
object; the locator can then be used to fetch the large object in small pieces, rather than all at
once.

e) User-Defined Types
SQL supports two forms of user-defined data types. The first form is called distinct



types. The other form, called structured data types, allows the creation of complex data
types with nested record structures, arrays, and multisets.
It is possible for several attributes to have the same data type. For example, the name
attributes for student name and instructor name might have the same domain: the set of all person
names.
A domain constraint not only allows us to test values inserted in the database, but also
permits us to test queries to ensure that the comparisons made make sense.
The create domain clause can be used to define new domains (or new data types). For
example, the statements:
create domain Dollars numeric(12, 2)
create domain Pounds numeric(12, 2)
define the domains Dollars and Pounds to be decimal numbers with a total of 12 digits, two of
which are placed after the decimal point.
An attempt to assign a value of type Dollars to a variable of type Pounds would result in
a syntax error, although both are of the same numeric type.
Values of one domain can be converted (cast) to another domain. If the attribute A of
relation r is of type Dollars, we can convert it to Pounds by writing
cast r.A as Pounds
In an application we would multiply r.A by a currency conversion factor before converting it to
pounds.
• SQL also provides drop domain and alter domain clauses to drop or modify domains
that have been created earlier.
• The check clause in SQL permits domains to be restricted: the clause permits the schema
designer to specify a condition that must be satisfied by any value assigned to a variable
whose type is the domain.
For example, a check clause can ensure that an hourly wage domain allows only values
greater than a specified value (such as the minimum wage):
create domain HourlyWage numeric(5, 2)
constraint test check (value >= 3.00)
The domain HourlyWage has a constraint that ensures that the hourly wage is greater than or
equal to 3.00. The clause constraint test is optional, and is used to give the name test to the
constraint. The name is used to indicate which constraint an update violated. The check clause
can also be used to restrict a domain so that it does not contain null values:
create domain AccountNumber char(10)
constraint null-test check(value is not null)
The domain can be restricted to contain only a specified set of values by using the in clause:
create domain AccountType char(10)
constraint type-test check(value in (’Checking’, ’Saving’))



The check conditions are tested when a tuple is inserted or modified.

f) Create Table Extensions


Applications often require creation of tables that have the same schema as an existing table. SQL
provides a create table like extension to support this task:
create table temp_instructor like instructor;
The above statement creates a new table temp_instructor that has the same schema as
instructor. When writing a complex query, it is often useful to store the result of a query as a new
table; the table is usually temporary. Two statements are required: one to create the table (with
appropriate columns) and a second to insert the query result into the table.
SQL:2003 provides a simpler technique to create a table containing the results of a query.
For example, the following statement creates a table t1 containing the results of a query:
create table t1 as (
select *
from instructor
where dept_name = ’Music’)
with data;
By default, the names and data types of the columns are inferred from the query result. Names
can be explicitly given to the columns by listing the column names after the relation name. If the
with data clause is omitted, the table is created but not populated with data.
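SQLite supports the create table ... as select form, though without a with data clause (the new table is always populated); a sketch via Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor(name varchar(20), dept_name varchar(20))")
conn.executemany("insert into instructor values (?, ?)",
                 [("Mozart", "Music"), ("Einstein", "Physics")])

# Create t1 from a query; column names and types are inferred.
conn.execute("""create table t1 as
                select * from instructor
                where dept_name = 'Music'""")
rows = conn.execute("select name from t1").fetchall()
# rows == [('Mozart',)]
```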

g) Schemas, Catalogs and Environments


To understand the motivation for schemas and catalogs, consider how files are named in
a file system. Early file systems were flat; that is, all files were stored in a single directory.
Current file systems have a directory structure, with files stored within subdirectories. To name a
file uniquely, we must specify the full path name of the file, for example,
/users/cse/dbs-book/chapter1.ppt.
Like early file systems, early database systems also had a single name space for all relations.
Users had to coordinate to make sure they did not try to use the same name for different relations.
Modern database systems provide a three-level hierarchy for naming relations. The top level of
the hierarchy consists of catalogs, each of which can contain schemas. SQL objects such as
relations and views are contained within a schema.
In order to perform any actions on a database, a user (or a program) must first connect to
the database. The user must provide the user name and a password for verifying the identity of
the user. Each user has a default catalog and schema, and the combination is unique to the user.
When a user connects to a database system, the default catalog and schema are set up for the
connection; this corresponds to the current directory being set to the user’s home directory when
the user logs into an operating system. To identify a relation uniquely, a three-part name may be



used, for example,
catalog2.studschema.course
We may omit the catalog component, in which case the catalog part of the name is considered to
be the default catalog for the connection. Thus if catalog2 is the default catalog, we can use
studschema.course to identify the same relation uniquely.
We can create and drop schemas by means of create schema and drop schema
statements. In most database systems, schemas are also created automatically when user accounts
are created, with the schema name set to the user account name. The schema is created in either a
default catalog, or a catalog specified in creating the user account. The newly created schema
becomes the default schema for the user account.

3. Authorization
The administrator may assign a user different forms of authorization on parts of the database:
1. Read authorization: Allows reading, but not modification, of data.
2. Insert authorization: Allows insertion of new data, but not modification of existing data.
3. Update authorization: Allows modification, but not deletion, of data.
4. Delete authorization: Allows deletion of data.
Each of these types of authorizations is called a privilege. We may authorize the user all,
none, or a combination of these types of privileges on specified parts of a database, such as a
relation or a view.
When a user submits a query or an update, the SQL implementation first checks if the
query or update is authorized, based on the authorizations that the user has been granted. If the
query or update is not authorized, it is rejected. In addition to authorizations on data, users may
also be granted authorizations on the database schema, allowing them, for example, to create,
modify, or drop relations. A user who has some form of authorization may be allowed to pass on
(grant) this authorization to other users, or to withdraw (revoke) an authorization that was granted
earlier.

a) Granting and Revoking of Privileges


The SQL standard provides the privileges delete, insert, select, and update. The privilege all
privileges can be used as a short form for all the allowable privileges. A user who creates a new
relation is given all privileges on that relation automatically.
The SQL data-definition language includes commands to grant and revoke privileges.
The grant statement is used to give authorization. The basic form of this statement is:
grant <privilege-list>
on <relation name or view name>
to <user/role list>;
where privilege-list allows the granting of several privileges.



Example:
The select authorization on a relation is required to read tuples in the relation. The
following grant statement grants database users P, Q, and R select authorization on the student
relation:
grant select on student to P, Q, R;
This allows those users to run queries on the student relation.

i) Update Privilege
The update authorization may be given either on all attributes of the relation or on only
some. If update authorization is included in a grant statement, the list of attributes on which
update authorization is to be granted optionally appears in parentheses immediately after the
update keyword. If the list of attributes is omitted, the update privilege will be granted on all
attributes of the relation.

Example:
The following grant statement gives users P, Q, and R update authorization on the name
attribute of the student relation:
grant update (name) on student to P, Q, R;

ii) Insert Privilege


The insert privilege may also specify a list of attributes; any insert to the relation must
then specify only these attributes, and the system either gives each of the remaining attributes
default values or sets them to null.
Example: The following grant statement gives users P, Q, and R insert authorization on the student
relation:
grant insert on student to P, Q, R;

iii) Transfer of Privileges


The references privilege is granted on specific attributes in the same way as the
update privilege.
Example:
The following grant statement allows user P to create relations that reference the attribute
branch-name of the branch relation as a foreign key:
grant references (branch-name) on branch to P;
The privilege all privileges can be used as a short form for all the allowable privileges. Also, the
user name public refers to all current and future users of the system.
By default, a user/role that is granted a privilege is not authorized to grant that privilege
to another user/role. If we wish to grant a privilege and to allow the recipient to pass the privilege
on to other users, we append the with grant option clause to the appropriate grant command. For



example, if we wish to allow P the select privilege on branch and allow P to grant this
privilege to others, we write
grant select on branch to P with grant option

b) Revoke Privileges
To revoke an authorization, we use the revoke statement.
Syntax:
revoke <privilege list> on <relation name or view name> from <user/role list>
[restrict | cascade];
Thus, to revoke the privileges that we granted previously, we write
revoke select on student from P, Q, R;
revoke update (name) on student from P, Q, R;
revoke insert on student from P, Q, R;
revoke references (branch-name) on branch from P;
The revocation of a privilege from a user/role may cause other users/roles also to lose that
privilege. This behavior is called cascading of the revoke. The revoke statement may alternatively
specify restrict:
revoke select on student from P, Q, R restrict
In this case, the system returns an error if there are any cascading revokes, and does not carry out
the revoke action. The following revoke statement revokes only the grant option, rather than the
actual select privilege:
revoke grant option for select on student from P

c) Authorization and Views


A view provides a user with a personalized model of the database. A view can hide data
that a user does not need to see. This ability of views to hide data serves both to simplify usage of
the system and to enhance security.
Views simplify system usage as they restrict the user’s attention to the data of interest.
Although a user may be denied direct access to a relation, that user may be allowed to access part
of that relation through a view.
Therefore, a combination of relational-level security and view-level security limits a
user’s access to precisely the data that the user needs.
In banking example, consider a clerk who needs to know the names of all customers who
have a loan at each branch. This clerk is not authorized to see information regarding specific
loans that the customer may have. Thus, the clerk must be denied direct access to the loan
relation.



But, if the clerk is to have access to the information needed, he must be granted
access to the view cust-loan, which consists of only the names of customers and the
branches at which they have a loan. This view can be defined in SQL as follows:
create view cust-loan as
(select branch-name, customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number)
Suppose the clerk issues the following SQL query:
select *
from cust-loan
The clerk is authorized to see the result of this query. But, when the query processor translates it
into a query on the actual relations in the database, it produces a query on borrower and loan.
Thus, the system must check authorization on the clerk’s query before it begins query processing.

Question 4: What is role? Explain authorization grant graph.

d) Authorization Grant Graph


A user who has been granted some form of authorization may be allowed to pass on this
authorization to other users. But such authorization can be revoked at some future time.
Example:
Suppose the database administrator grants update authorization on loan to users P, Q, and
R, who may in turn pass on this authorization to other users. The passing of authorization from
one user to another can be represented by an authorization graph.
The nodes of this graph are the users. The graph includes an edge P → Q if user P grants
update authorization on loan to Q. The root of the graph is the database administrator. In the
graph shown in Figure 1, user T is granted authorization by both P and Q; S is granted
authorization by only P.
[Figure 1: Authorization-grant graph. Edges: DBA → P, DBA → Q, P → S, P → T, Q → T.]


A user has an authorization if and only if there is a path from the root of the authorization
graph down to the node representing the user. If suppose the database administrator decides to
revoke the authorization of user P. Since S has authorization from P, that authorization should be
revoked as well. But, T was granted authorization by both P and Q. As the database administrator
did not revoke update authorization on loan from Q, T retains update authorization on loan. If Q

Database Management Systems (DBMS) 13


revokes authorization from T, then T loses the authorization.
A pair of tricky users might attempt to overcome the rules for revocation of authorization
by granting authorization to each other, as shown in Figure 2(a). If the database administrator
revokes authorization from Q, Q retains authorization through R, as in Figure 2(b).
If authorization is revoked subsequently from R, R appears to retain authorization
through Q, as in Figure 2(c). But, when the database administrator revokes authorization from R,
the edges from R to Q and from Q to R are no longer part of a path starting with the database
administrator.
Figure 2: Attempt to beat authorization revocation. (a) Q and R grant authorization to each
other; (b) after the DBA revokes authorization from Q; (c) after the DBA revokes authorization
from R; (d) after the edges between Q and R are deleted.
We require that all edges in an authorization graph be part of some path originating with
the database administrator. The edges between Q and R are deleted, and the resulting
authorization graph is shown in Figure 2(d).
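The reachability rule above can be sketched in a few lines of Python (an illustration of the rule only; the function and variable names are ours, not part of any SQL standard):

```python
# Sketch: model the authorization-grant graph as directed edges and treat
# "has authorization" as reachability from the DBA node, as the rule requires.

def authorized_users(edges, root="DBA"):
    """Return the set of users reachable from the root via grant edges."""
    reachable, stack = set(), [root]
    while stack:
        user = stack.pop()
        for grantor, grantee in edges:
            if grantor == user and grantee not in reachable:
                reachable.add(grantee)
                stack.append(grantee)
    return reachable

# Figure 2(a): DBA grants to P, Q, R; Q and R also grant to each other.
edges = {("DBA", "P"), ("DBA", "Q"), ("DBA", "R"), ("Q", "R"), ("R", "Q")}

# DBA revokes from Q: Q still appears authorized via R (Figure 2(b)).
after_first_revoke = authorized_users(edges - {("DBA", "Q")})

# DBA also revokes from R: the Q-R cycle is no longer on any path from the
# DBA, so both Q and R lose authorization (Figure 2(d)).
after_second_revoke = authorized_users(edges - {("DBA", "Q"), ("DBA", "R")})
```

Deleting the edges between Q and R in the last step is exactly what discarding all edges not on a path from the DBA accomplishes.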

e) Role
Consider a college with many students. Each student must have the same types of
authorizations to the same set of relations. Whenever a new student joins, he or she must be
given all these authorizations individually.
A better scheme would be to specify the authorizations that every student is to be given,
and to separately identify which database users are students. The system can use these two pieces
of information to determine the authorizations of each person who is a student.
When a new person joins as a student, a user identifier must be allocated to him or her, and
that identifier must be marked as belonging to a student. Individual permissions given to students
need not be specified again.
A set of roles is created in the database. Authorizations can be granted to roles, in exactly
the same way as they are granted to individual users. Each database user is granted a set of roles
that he or she is authorized to perform.
Another alternative would be to create a student userid, and permit each student to
connect to the database using the student userid. The problem with this scheme is that it would



not be possible to identify exactly which student carried out a transaction, leading to
security risks. The use of roles has the benefit of requiring users to connect to the database
with their own userid.
Any authorization that can be granted to a user can be granted to a role. Roles are granted
to users just as authorizations are. And like other authorizations, a user may also be granted
authorization to grant a particular role to others. Roles can be created in SQL as follows:
create role role-name;
where role-name is the name of roles to create.
Example:
The following statement creates a teller role.
create role teller;
Roles can then be granted privileges just as users can, as shown in the following statement:
grant select on student to teller;

Question 4: Explain with an example the use of triggers. What precautions are to be taken
while using triggers?
Question 5: Define trigger. Explain need for trigger with example.

4. Trigger
A trigger is a procedure that is executed automatically in response to specified changes
made to the database (such as insert, update, or delete), and is specified by the database
administrator. Triggers do not accept arguments. The main aim of a trigger is to maintain data
integrity; triggers can also be designed to record information.
A database that has a set of associated triggers is called an active database. A trigger
description contains three parts:
1. Event: A change to the database that activates the trigger.
2. Condition: A query or test that is run when the trigger is activated.
3. Action: A procedure that is executed when the trigger is activated and its condition is
true.
This model of triggers is referred to as the event-condition-action model. A trigger can be
thought of as a 'tool' that monitors a database and is executed when the database is modified in
a way that matches the event specification.
 A condition in a trigger can be a true/false statement or a query. If the condition part
evaluates to true, the action associated with the trigger is executed.
 A trigger action can examine the answers to the query in the condition part of the trigger,
refer to old and new values of tuples modified by the statement activating the trigger,
execute new queries, and make changes to the database.
An action can even execute a series of data-definition commands (e.g., create new tables,



change authorizations) and transaction-oriented commands or call host-language
procedures.
Syntax:
create trigger <trigger-name>
{before | after} {insert | update | delete} on <relation-name>
[referencing new row as <name>]
[for each row]
when (<condition>)
<statements>;
Example:
When the value of the phone attribute in a newly inserted tuple of relation author is empty,
indicating the absence of a phone number, a trigger that stores a null value in this attribute can be
specified as:
create trigger set-null-phone
before insert on author
referencing new row as nr
for each row
when nr.phone = ' '
set nr.phone = NULL;
 The for each row clause makes sure that the trigger is executed once for every single tuple
processed. Such a trigger is known as a row-level trigger, whereas a trigger that is executed
only once per triggering statement, regardless of the number of tuples affected by that
statement, is known as a statement-level trigger.
 The for each statement clause specifies the trigger as a statement-level trigger. Triggers
can be enabled and disabled using the alter trigger command, and triggers that are no
longer required can be removed using the drop trigger command.

a) Need for Triggers


Triggers are a useful mechanism for dealing with changes made to the database, as they
can be used for the following purposes:
1. Implementing and maintaining complex integrity constraints.
2. Generating a log of events to support auditing and security checks.
3. Automatically signaling other programs that action needs to be taken whenever
changes are made to a relation.
4. Alerting humans or starting certain tasks automatically when certain conditions
are met.
Example:



Suppose instead of allowing negative account balances, the bank deals with
overdrafts by setting the account balance to zero, and creating a loan in the amount of the
overdraft. The bank gives this loan a loan number identical to the account number of the
overdrawn account.
For this example, the condition for executing the trigger is an update to the account
relation that results in a negative balance value. Suppose that Smith withdrew some money from
an account, making the account balance negative. Let t denote the account tuple with a negative
balance value. The actions to be taken are:
 Insert a new tuple s in the loan relation with
s[loan-number] = t[account-number]
s[branch-name] = t[branch-name]
s[amount] = – t[balance]
 Insert a new tuple u in the borrower relation with
u[customer-name] = “Smith”
u[loan-number] = t[account-number]
 Set t[balance] to 0.
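These event-condition-action steps can be sketched as a small Python simulation (the real mechanism would be a SQL create trigger; the dictionary layout of the account tuple and the separate customer argument are our assumptions):

```python
# Sketch: simulating the overdraft trigger's action.  `account` plays the
# role of the updated tuple t; `customer` is the account holder ("Smith").

def handle_overdraft(account, customer, loan, borrower):
    """Event: update on account. Condition: balance < 0. Action: move the
    overdraft into a new loan and set the balance to zero."""
    if account["balance"] < 0:                      # condition
        loan.append({                               # s[loan-number] = t[account-number], etc.
            "loan-number": account["account-number"],
            "branch-name": account["branch-name"],
            "amount": -account["balance"],
        })
        borrower.append({                           # u[customer-name] = "Smith"
            "customer-name": customer,
            "loan-number": account["account-number"],
        })
        account["balance"] = 0                      # set t[balance] to 0

loan, borrower = [], []
t = {"account-number": "A-102", "branch-name": "Perryridge", "balance": -400}
handle_overdraft(t, "Smith", loan, borrower)
```

After the call, the overdraft of 400 has become a loan with the same number as the account, and the balance is back to zero.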

b) When Not to Use Triggers


Triggers should be written with great care, since a trigger error detected at run time
causes the failure of the insert/delete/update statement that set off the trigger. Also, the action of
one trigger can set off another trigger. In the worst case, this could even lead to an infinite chain
of triggering.
For example, suppose an insert trigger on a relation has an action that causes another
(new) insert on the same relation. That insert then triggers yet another insert action, and so on
indefinitely. Database systems typically limit the length of such chains of triggers and
consider longer chains of triggering an error.

5. Features of Good Relational Designs


It is possible to generate a set of relation schemas directly from the E-R design. The
goodness of the resulting set of schemas depends on how good the E-R design was in the first
place. Consider the university database schemas.
classroom(building, roomnumber, capacity)
department(deptname, building, budget)
course(courseid, title, deptname, credits)
instructor(ID, name, deptname, salary)
section(courseid, secid, semester, year, building, roomnumber, timeslotid)
teaches(ID, courseid, secid, semester, year)
student(ID, name, deptname, totcred)



takes(ID, courseid, secid, semester, year, grade)
advisor(sID, iID)
timeslot(timeslotid, day, starttime, endtime)
prereq(courseid, prereqid)
Figure 3: Schema for the university database.

a) Design Alternative: Larger Schemas


Suppose that instead of having the schemas instructor and department, we have the schema:
instdept(ID, name, salary, deptname, building, budget)
This represents the result of a natural join on the relations corresponding to instructor and
department. This seems like a good idea because some queries can be expressed using fewer
joins.
Let us consider the instance of the instdept relation shown in Figure 4. We have to repeat
the department information (“building” and “budget”) once for each instructor in the department.
For example, the information about the Comp. Sci. department (Taylor, 100000) is included in
the tuples of instructors Katz, Srinivasan, and Brandt.
ID name salary deptname building budget
22222 Einstein 95000 Physics Watson 70000
12121 Wu 90000 Finance Painter 120000
32343 El Said 60000 History Painter 50000
45565 Katz 75000 Comp. Sci. Taylor 100000
98345 Kim 80000 Elec. Eng. Taylor 85000
76766 Crick 72000 Biology Watson 90000
10101 Srinivasan 65000 Comp. Sci. Taylor 100000
58583 Califieri 62000 History Painter 50000
83821 Brandt 92000 Comp. Sci. Taylor 100000
15151 Mozart 40000 Music Packard 80000
33456 Gold 87000 Physics Watson 70000
76543 Singh 80000 Finance Painter 120000
Figure 4: The instdept table.
It is important that all these tuples agree as to the budget amount since otherwise our
database would be inconsistent. In our original design using instructor and department, we stored
the amount of each budget exactly once. This suggests that using instdept is a bad idea since it
stores the budget amounts redundantly and runs the risk that some user might update the budget
amount in one tuple but not all, and thus create inconsistency.

Even if we decided to live with the redundancy problem, there is still another problem
with the instdept schema. Suppose we are creating a new department in the university. In the
alternative design above, we cannot represent directly the information concerning a
department (deptname, building, budget) unless that department has at least one instructor at
the university. This is because tuples in the instdept table require values for ID, name, and salary.
This means that we cannot record information about the newly created department until the first
instructor is hired for the new department.
In the old design, the schema department can handle this directly. Under the revised design,
however, we would have to create a tuple with null values for the instructor attributes ID, name,
and salary, and null values are difficult to handle in some cases.

b) Design Alternative: Smaller Schemas


By observing the contents of actual relations on schema instdept, we could note the
repetition of information resulting from having to list the building and budget once for each
instructor associated with a department. However, this is an unreliable process. A real-world
database has a large number of schemas and an even larger number of attributes.
In our example, how would we know that in our university organization, each department
must reside in a single building and must have a single budget amount? In the case of instdept,
our process of creating an E-R design successfully avoided the creation of this schema. However,
this unexpected situation does not always occur. Therefore, we need to allow the database
designer to specify rules such as “each specific value for deptname corresponds to at most one
budget” even in cases where deptname is not the primary key for the schema in question.
In other words, we need to write a rule that says “if there were a schema (deptname,
budget), then deptname is able to serve as the primary key.” This rule is specified as a functional
dependency
deptname → budget
Given such a rule, we now have sufficient information to recognize the problem of the instdept
schema. Because deptname cannot be the primary key for instdept (because a department may
need several tuples in the relation on schema instdept), the amount of a budget may have to be
repeated.
Observations such as these, and the rules that result from them, allow the database
designer to recognize situations where a schema must be split, or decomposed, into two or
more schemas. It is not hard to see that the right way to decompose instdept is into the schemas
instructor and department, as in the original design. Finding the right decomposition is much
harder for schemas with a large number of attributes and several functional dependencies.
Not all decompositions of schemas are helpful. Consider an extreme case where all we had were
schemas consisting of one attribute. No interesting relationships of any kind could be expressed.
Now consider a less extreme case where we choose to decompose the employee schema:
employee(ID, name, street, city, salary)
into the following two schemas:



employee1(ID, name)
employee2(name, street, city, salary)
The fault in this decomposition arises from the possibility that the enterprise has two employees
with the same name. Each person would have a unique employee-id, which is why ID can serve
as the primary key. As an example, let us assume two employees, both named Kim, work at the
university and have the following tuples in the relation on schema employee in the original
design:
(57766, Kim, Main, Perryridge, 75000)
(98776, Kim, North, Hampton, 67000)
The employee relation:
ID name street city salary
.. ... ... ... ...
57766 Kim Main Perryridge 75000
98776 Kim North Hampton 67000
... ... ... ... ...

employee1:
ID name
57766 Kim
98776 Kim

employee2:
name street city salary
Kim Main Perryridge 75000
Kim North Hampton 67000

natural join of employee1 and employee2:
ID name street city salary
57766 Kim Main Perryridge 75000
57766 Kim North Hampton 67000
98776 Kim Main Perryridge 75000
98776 Kim North Hampton 67000

Figure 5: Loss of information via a bad decomposition

Figure 5 shows these tuples, the resulting tuples using the schemas resulting from the
decomposition, and the result if we attempted to regenerate the original tuples using a natural
join. As we see in the figure, the two original tuples appear in the result along with two new
tuples that incorrectly mix data values pertaining to the two employees named Kim. Although we
have more tuples, we actually have less information in the following sense. We can indicate that a
certain street, city, and salary pertain to someone named Kim, but we are unable to distinguish
which of the Kims. Thus, our decomposition is unable to represent certain important facts about
the university employees. Clearly, we would like to avoid such decompositions. We shall
refer to such decompositions as being lossy decompositions and to those that are not as
lossless decompositions.
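The lossy join above is easy to reproduce with ordinary set comprehensions. This Python sketch (illustrative only) decomposes the two Kim tuples and joins them back:

```python
# Sketch: the bad decomposition of employee(ID, name, street, city, salary)
# into employee1(ID, name) and employee2(name, street, city, salary), and the
# natural join that fails to recover the original relation.

employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project out the two smaller schemas.
employee1 = {(i, n) for (i, n, _, _, _) in employee}
employee2 = {(n, s, c, sal) for (_, n, s, c, sal) in employee}

# Natural join on the common attribute name: because both IDs share the name
# "Kim", the join produces four tuples, two of which mix the two employees.
joined = {(i, n, s, c, sal)
          for (i, n) in employee1
          for (n2, s, c, sal) in employee2
          if n == n2}
```

The original two tuples are contained in the result, but so are two spurious ones; the decomposition is lossy.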

6. Atomic Domains and First Normal Form


The E-R model allows entity sets and relationship sets to have attributes that have some
degree of substructure. It allows multivalued attributes such as phone number and composite
attributes (such as an attribute address with component attributes street, city, state, and zip).
When we create tables from E-R designs that contain these types of attributes, we eliminate this
substructure.
For composite attributes, we let each component be an attribute in its own right. For
multivalued attributes, we create one tuple for each item in a multivalued set. In the relational
model, we formalize this idea that attributes do not have any substructure.
A domain is atomic if elements of the domain are considered to be indivisible units. We
say that a relation schema R is in first normal form (1NF) if the domains of all attributes of R
are atomic.
A set of names is an example of a nonatomic value. For example, if the schema of a
relation employee included an attribute children whose domain elements are sets of names, the
schema would not be in first normal form. Composite attributes, such as an attribute address with
component attributes street, city, state, and zip, also have nonatomic domains.
Integers are assumed to be atomic, so the set of integers is an atomic domain; however,
the set of all sets of integers is a nonatomic domain. The domain of all integers would be
nonatomic if we considered each integer to be an ordered list of digits.
Consider an organization that assigns employees identification numbers of the following
form: The first two letters specify the department and the remaining four digits are a unique
number within the department for the employee. Examples of such numbers would be
“CS001” and “EE1127”. Such identification numbers can be divided into smaller units, and are
therefore nonatomic.
If a relation schema had an attribute whose domain consists of identification numbers
encoded as above, the schema would not be in first normal form. When such identification
numbers are used, the department of an employee can be found by writing code that breaks up the
structure of an identification number. Doing so requires extra programming, and information gets
encoded in the application program rather than in the database. Further problems arise if such
identification numbers are used as primary keys: When an employee changes departments, the
employee’s identification number must be changed everywhere it occurs, which can be a difficult
task, or the code that interprets the number would give a wrong result.
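As a tiny illustration of the extra application code such encoded identifiers force (the function name is ours, not from the text):

```python
# Sketch: the kind of code an application must write when the department is
# encoded inside the identifier ("first two letters = department", as above).

def department_of(emp_id):
    """Recover the department abbreviation buried inside the identifier."""
    return emp_id[:2]

# If an employee moves from CS to EE, every stored copy of the identifier
# must change, or this function starts returning the wrong department.
```

Storing the department as its own attribute avoids both the parsing code and the update problem.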
From the above discussion, it may appear that our use of course identifiers such as
“CS-101”, where “CS” indicates the Computer Science department, means that the domain of course



identifiers is not atomic. Such a domain is not atomic as far as humans using the system are
concerned. However, the database application still treats the domain as atomic, as long as it does
not attempt to split the identifier and interpret parts of the identifier as a department abbreviation.
The course schema stores the department name as a separate attribute, and the database
application can use this attribute value to find the department of a course, instead of interpreting
particular characters of the course identifier. Thus, university schema can be considered to be in
first normal form.
The use of set-valued attributes can lead to designs with redundant storage of data, which
in turn can result in inconsistencies. For instance, instead of having the relationship between
instructors and sections being represented as a separate relation teaches, a database designer may
be tempted to store a set of course section identifiers with each instructor and a set of instructor
identifiers with each section. Whenever data pertaining to which instructor teaches which section
is changed, the update has to be performed at two places: in the set of instructors for the section,
and the set of sections for the instructor. Failure to perform both updates can leave the database in
an inconsistent state.

7. Decomposition Using Functional Dependencies


There is a formal methodology for evaluating whether a relational schema should be
decomposed. This methodology is based upon the concepts of keys and functional dependencies.

a) Keys and Functional Dependencies


A database models a set of entities and relationships in the real world. There are usually a
variety of constraints (rules) on the data in the real world. For example, some of the constraints
that are expected to hold in a university database are:
1. Students and instructors are uniquely identified by their ID.
2. Each student and instructor has only one name.
3. Each instructor and student is (primarily) associated with only one department.
4. Each department has only one value for its budget, and only one associated building.
An instance of a relation that satisfies all such real-world constraints is called a legal instance of
the relation; a legal instance of a database is one where all the relation instances are legal
instances. Some of the most commonly used types of real-world constraints can be represented
formally as keys (superkeys, candidate keys and primary keys), or as functional dependencies.
A superkey is a set of one or more attributes that, taken collectively, allows us to identify
uniquely a tuple in the relation. We restate that definition here as follows:
Let r(R) be a relation schema. A subset K of R is a superkey of r(R) if, in any legal
instance of r(R), for all pairs t1 and t2 of tuples in the instance of r, if t1 ≠ t2, then t1[K] ≠ t2[K].
That is, no two tuples in any legal instance of relation r(R) may have the same value on attribute



set K. Clearly, if no two tuples in r have the same value on K, then a K-value uniquely
identifies a tuple in r.
Whereas a superkey is a set of attributes that uniquely identifies an entire tuple, a
functional dependency allows us to express constraints that uniquely identify the values of certain
attributes. Consider a relation schema r(R), and let α ⊆ R and β ⊆ R.
 Given an instance of r(R), we say that the instance satisfies the functional dependency
α → β if for all pairs of tuples t1 and t2 in the instance such that t1[α] = t2[α], it is also
the case that t1[β] = t2[β].
 We say that the functional dependency α → β holds on schema r(R) if every legal
instance of r(R) satisfies the functional dependency.
Using the functional-dependency notation, we say that K is a superkey of r(R) if the
functional dependency K→ R holds on r(R). In other words, K is a superkey if, for every legal
instance of r(R), for every pair of tuples t1 and t2 from the instance, whenever t1[K] = t2[K], it is
also the case that t1[R] = t2[R] (that is, t1 = t2).
Functional dependencies allow us to express constraints that we cannot express with
superkeys. We considered the schema:
instdept(ID, name, salary, deptname, building, budget)
in which the functional dependency deptname → budget holds because for each department
(identified by deptname) there is a unique budget amount. We denote the fact that the pair of
attributes (ID, deptname) forms a superkey for instdept by writing:
ID, deptname → name, salary, building, budget
We shall use functional dependencies in two ways:
1. To test relations to see whether they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional dependencies, we say that
r satisfies F.
2. To specify constraints on the set of legal relations. If we wish to constrain ourselves to
relations on schema R that satisfy a set F of functional dependencies, we say that F holds
on R.
Consider the relation r shown in the table below, to see which functional dependencies are satisfied.
Observe that A → C is satisfied. There are two tuples that have an A value of a1. These tuples
have the same C value, namely c1. Similarly, the two tuples with an A value of a2 have the same
C value, c2. There are no other pairs of distinct tuples that have the same A value. The functional
dependency C → A is not satisfied.
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
Table: Sample instance of relation r.
To see that it is not, consider the tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These
two tuples have the same C value, c2, but they have different A values, a2 and a3, respectively.
Thus, we have found a pair of tuples t1 and t2 such that t1[C] = t2[C], but t1[A] ≠ t2[A].
Many other functional dependencies are satisfied by r, including the functional
dependency AB → D (writing AB as a shorthand for {A, B}). Observe that there is no pair of distinct
tuples t1 and t2 such that t1[AB] = t2[AB]. Therefore, if t1[AB] = t2[AB], it must be that t1 = t2 and,
thus, t1[D] = t2[D]. So, r satisfies AB → D.
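The satisfaction test just described can be sketched directly in Python (an illustration; the instance below takes (a1, b2, c1, d2) as its second tuple, consistent with the observation that no two tuples share the same AB value):

```python
# Sketch: check whether a relation instance satisfies an FD lhs -> rhs by
# grouping tuples on their lhs value; two tuples agreeing on lhs must agree
# on rhs.

def satisfies(rows, lhs, rhs):
    """rows: list of dicts; lhs, rhs: lists of attribute names."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False          # same lhs value, different rhs values
        seen[key] = val
    return True

# The sample instance of relation r.
r = [dict(zip("ABCD", row)) for row in [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c1", "d2"),
    ("a2", "b2", "c2", "d2"),
    ("a2", "b3", "c2", "d3"),
    ("a3", "b3", "c2", "d4"),
]]
```

Running the checks reproduces the discussion: A → C and AB → D are satisfied, C → A is not.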

b) Trivial Functional Dependencies


A functional dependency is trivial if it is satisfied by all relations. For
example, A → A is satisfied by all relations involving attribute A: for all tuples t1 and t2 such that
t1[A] = t2[A], it is trivially the case that t1[A] = t2[A].
Similarly, AB → A is satisfied by all relations involving attribute A. In general, a
functional dependency of the form α → β is trivial if β ⊆ α.
Example:
The customer relation:
customer-name customer-street customer-city
Smith MG NA
John BK AK
Adams RG MU

The loan relation:
loan-number branch-name amount
A-10 ABC 5000
A-20 PQR 1000
A-30 XYZ 2000

The branch relation:
branch-name branch-city assets
ABC NAGPUR 10000
PQR MUMBAI 40000
XYZ GOA 89000
Consider the customer relation shown above; we see that customer-street →
customer-city is satisfied by this instance. But two cities can have streets with the same name.
Thus, it is possible, at some time, to have an instance of the customer relation in which
customer-street → customer-city is not satisfied. So, we would not include customer-street →
customer-city in the set of functional dependencies that hold on customer-schema.
In the loan relation, we see that the dependency loan-number → amount is satisfied.
Thus, we want to require that loan-number → amount be satisfied by the loan relation at all
times; that is, we require that the constraint loan-number → amount hold on loan-schema.
In the branch relation, we see that branch-name → assets is satisfied, as is assets →
branch-name. We want to require that branch-name → assets hold on branch-schema. However, we
do not require that assets → branch-name hold, as it is possible to have several branches that



have the same asset value. In the banking example (see page no. 94), the list of dependencies
includes the following:
o On branch-schema: branch-name → branch-city and branch-name → assets
o On customer-schema: customer-name → customer-city and customer-name → customer-street
o On loan-schema: loan-number → amount and loan-number → branch-name
o On account-schema: account-number → branch-name and account-number → balance
o On borrower-schema: no functional dependencies
o On depositor-schema: no functional dependencies

8. Functional-Dependency Theory
It is useful to be able to reason systematically about functional dependencies as part of a
process of testing schemas for BCNF or 3NF.

a) Closure of a Set of Functional Dependencies


We need to consider all functional dependencies that hold. Given a set F of functional
dependencies, we can prove that certain other functional dependencies hold. We say that such
functional dependencies are “logically implied” by F. In general, given a relational schema R, a
functional dependency f on R is logically implied by a set of functional dependencies F on R if
every relation instance r(R) that satisfies F also satisfies f.
Example: Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of functional
dependencies
A → B
A → C
CG → H
CG → I
B → H
The functional dependency
A → H
is logically implied; i.e., we can show that, whenever the given set of functional dependencies holds
on a relation, A → H must also hold on the relation. Suppose that t1 and t2 are tuples such that
t1[A] = t2[A]
Since we are given that A → B, from the definition of functional dependency, we get
t1[B] = t2[B]
Then, since we are given that B → H, from the definition of functional dependency, we get
t1[H] = t2[H]
Therefore, we have shown that, whenever t1 and t2 are tuples such that t1[A] = t2[A], it must be



that t1[H] = t2[H]. But that is exactly the definition of A → H.
The closure of F, denoted by F+, is the set of all functional dependencies logically
implied by F. Given F, we can compute F+ directly from the definition of functional dependency.
If F were large, this process would be lengthy and difficult, since such a computation of F+
requires arguments of the type just used to show that A → H is in the closure of our example set
of dependencies.
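A standard shortcut, sketched below in Python (illustrative; the function name is ours), is to compute the closure of an attribute set instead of enumerating F+ itself: α → β is in F+ exactly when β is contained in the closure of α.

```python
# Sketch: compute the closure of an attribute set under a set of FDs by
# repeatedly applying any FD whose left side is already covered.

def attribute_closure(attrs, fds):
    """fds: list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The example set of dependencies on R = (A, B, C, G, H, I).
F = [({"A"}, {"B"}), ({"A"}, {"C"}),
     ({"C", "G"}, {"H"}), ({"C", "G"}, {"I"}),
     ({"B"}, {"H"})]
```

For this F, the closure of {A} is {A, B, C, H}, which contains H; hence A → H is logically implied, matching the derivation above.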

b) Multivalued Dependency
Functional dependencies rule out certain tuples from being in a relation. If A → B, then
we cannot have two tuples with the same A value but different B values. Multivalued
dependencies do not rule out the existence of certain tuples. Instead, they require that other tuples
of a certain form be present in the relation.
Thus, FDs are sometimes referred to as equality-generating dependencies, and
multivalued dependencies are referred to as tuple-generating dependencies. Let R be a relation
schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if, in any legal instance of r(R), for all pairs of tuples t1 and t2 such that t1[α] = t2[α],
there exist tuples t3 and t4 such that t1[α] = t2[α] = t3[α] = t4[α], t3[β] = t1[β],
t3[R − β] = t2[R − β], t4[β] = t2[β], and t4[R − β] = t1[R − β].
Example: Consider the customer relation; we see that we want the multivalued dependency
customer-name →→ customer-street, customer-city
to hold. Multivalued dependencies can be used in two ways:
1. To test relations to determine whether they are legal under a given set of functional and
multivalued dependencies.
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with
only those relations that satisfy a given set of functional and multivalued dependencies.
If a relation r fails to satisfy a given multivalued dependency, we can construct a relation
r’ that does satisfy the multivalued dependency by adding tuples to r.
Let D denote a set of functional and multivalued dependencies. The closure D+ of D is
the set of all functional and multivalued dependencies logically implied by D.
We can compute D+ from D, using the definitions of functional dependencies and
multivalued dependencies and derive the following rule:
If α → β, then α →→ β.
That is, every functional dependency is also a multivalued dependency.
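The tuple-generating requirement can be checked mechanically. This Python sketch (illustrative; the inst(ID, child, phone) instance is a hypothetical example, not from the text) tests whether an instance satisfies α →→ β; checking the required tuple t3 for every ordered pair (t1, t2) also covers t4, by symmetry:

```python
# Sketch: test a multivalued dependency alpha ->> beta on a relation
# instance.  For every pair agreeing on alpha, the "mixed" tuple that takes
# alpha and beta from t1 and the remaining attributes from t2 must exist.

def satisfies_mvd(rows, attrs, alpha, beta):
    """rows: set of tuples; attrs: attribute order; alpha, beta: attr lists."""
    index = {a: i for i, a in enumerate(attrs)}
    rest = [a for a in attrs if a not in alpha and a not in beta]
    for t1 in rows:
        for t2 in rows:
            if all(t1[index[a]] == t2[index[a]] for a in alpha):
                want = {a: t1[index[a]] for a in alpha}
                want.update({b: t1[index[b]] for b in beta})
                want.update({c: t2[index[c]] for c in rest})
                if tuple(want[a] for a in attrs) not in rows:
                    return False
    return True

# Hypothetical inst(ID, child, phone): ID ->> child holds only when every
# child appears with every phone number for that ID.
full = {(1, "David", "512"), (1, "David", "555"),
        (1, "William", "512"), (1, "William", "555")}
```

Removing any one of the four tuples breaks the dependency, which illustrates how a relation failing an MVD can be repaired by adding tuples.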

c) Armstrong's Axioms
Axioms, or rules of inference, provide a technique for reasoning about functional
dependencies. In the rules that follow, we use letters (A, B, C, . . .) to denote sets of attributes,
and we write AB to denote A ∪ B.



We can use the following rules to find logically implied functional dependencies.
By applying these rules repeatedly, we can find all of F+, given F. This collection of rules is
called Armstrong’s axioms in honor of the person who first proposed it.
1. Reflexivity: If B is a subset of A, then A → B.
2. Augmentation: If A → B, then A, C → B, C.
3. Transitivity: If A → B and B → C, then A → C.
4. Decomposition: If A → B, C then A → B and A → C.

Proof:
Given A → B, C, then
B, C → B and B, C → C (by reflexivity)
A → B and A → C (by transitivity)

5. Pseudo-transitivity: If A → B and C, B → D, then C, A → D.

Proof:
Given (i) A → B and (ii) B, C → D, then
A, C → B, C (augmentation of (i) FD with C)
A, C → D (transitive rule using (ii) FD)

6. Union: If A → B and A → C, then A → B, C.


Proof:
Given (i) A → B and (ii) A → C, then
A, A → A, B (augmentation of (i) FD with A)
A → A, B (as we are working with sets, AA = A) ….(a)
A, B → B, C (augmentation of (ii) FD with B) ….(b)
A → B, C (by transitivity, using (a) and (b))
The union and decomposition rules together give us some choice as to how we write a
set of FDs. For example, given the FDs
A → BC and A → DE
we could choose to write them as
A → B
A → C
A → D
A → E
or, using the union rule, as the single FD
A → BCDE
Returning to the example set F = {A → B, A → C, CG → H, CG → I, B → H} given earlier, the following is a list of several members of F+:



1. A → H. Since A → B and B → H hold, we apply the transitivity rule. It is easier to
use Armstrong's axioms to show that A → H holds than to argue directly from the definitions.
2. CG → HI. Since CG → H and CG → I, the union rule implies that CG → HI.
3. AG → I. Since A → C and CG → I, the pseudo-transitivity rule implies that AG → I
holds.
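Derivations like these can be checked mechanically with an attribute-closure routine. The following is a minimal Python sketch (toy code, not tied to any DBMS), encoding the FDs from the examples above as pairs of attribute sets:

```python
def closure(attrs, fds):
    """Attribute closure: all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# The FDs used in the examples above, on schema R = (A, B, C, G, H, I):
F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
     ({"C", "G"}, {"I"}), ({"B"}, {"H"})]

print("H" in closure({"A"}, F))              # True: A -> H (transitivity)
print({"H", "I"} <= closure({"C", "G"}, F))  # True: CG -> HI (union)
print("I" in closure({"A", "G"}, F))         # True: AG -> I (pseudo-transitivity)
```

A functional dependency α → β is in F+ exactly when β is contained in the closure of α, which is what each print checks.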

Question 6: What is canonical cover? How to compute canonical cover? Explain it by
finding it for F = {A → BC, B → C, A → B, AB → C}, a set of functional dependencies on
schema (A, B, C).
d) Canonical Cover
A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc, and Fc logically implies all dependencies in F. Also, Fc must have the
following properties:
1. No functional dependency in Fc contains an extraneous attribute: An attribute of a
functional dependency is said to be extraneous if we can remove it without changing the
closure of the set of functional dependencies.
2. Each left side of a functional dependency in Fc is unique. That is, there are no two
dependencies α1 → β1 and α2 → β2 in Fc such that α1 = α2.
A canonical cover for a set of functional dependencies F can be computed by using
following algorithm. When checking if an attribute is extraneous, the check uses the
dependencies in the current value of Fc, and not the dependencies in F.
If a functional dependency contains only one attribute in its right-hand side, for example
A → C, and that attribute is found to be extraneous, we would get a functional dependency with
an empty right-hand side. Such functional dependencies should be deleted.

Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 → β1 and α1 → β2 with α1 → β1 β2.
Find a functional dependency α → β in Fc with an extraneous attribute either in α or
in β.
If an extraneous attribute is found, delete it from α → β.
until Fc does not change.

The canonical cover of F, Fc, can be shown to have the same closure as F; hence, testing
whether Fc is satisfied is equivalent to testing whether F is satisfied. But Fc is minimal in a
certain sense: it does not contain extraneous attributes, and it combines functional
dependencies with the same left side. It is cheaper to test Fc than it is to test F itself.



Example:
Consider the following set F of functional dependencies on schema (A, B, C):
A → BC
B → C
A → B
AB → C
Compute the canonical cover for F. There are two functional dependencies with the same set of
attributes on the left side of the arrow:
A → BC
A → B
We combine these functional dependencies into A → BC.
1. A is extraneous in AB → C because F logically implies (F − {AB → C}) ∪ {B → C}.
This assertion is true because B → C is already in our set of functional dependencies.
2. C is extraneous in A → BC, since A → BC is logically implied by A → B and B → C.
Thus, the canonical cover is
A → B
B → C
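The algorithm above can be sketched in Python. This is a minimal illustration (attribute names are single characters and the FD representation is ad hoc), not production code:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    fc = [(frozenset(l), set(r)) for l, r in fds]
    changed = True
    while changed:
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged.setdefault(lhs, set()).update(rhs)
        fc = list(merged.items())
        changed = False
        for i, (lhs, rhs) in enumerate(fc):
            # Is some attribute of the left side extraneous?
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
            # Is some attribute of the right side extraneous?
            # (Check uses fc with this FD's right side reduced, per the algorithm.)
            for a in sorted(rhs):
                rest = [(l, (r - {a}) if l == lhs else r) for l, r in fc]
                if a in closure(lhs, rest):
                    rhs.discard(a)
                    changed = True
                    break
            if changed:
                break
        fc = [(l, r) for l, r in fc if r]   # drop empty right-hand sides
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in fc)

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(canonical_cover(F))   # [('A', 'B'), ('B', 'C')]
```

Run on the example above, the sketch first merges A → BC and A → B, then drops the extraneous C from A → BC and the extraneous A from AB → C, matching the hand computation.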

e) Lossless Decomposition
Let r(R) be a relation schema, and let F be a set of functional dependencies on r(R). Let
R1 and R2 form a decomposition of R. We say that the decomposition is a lossless decomposition
if there is no loss of information by replacing r(R) with two relation schemas r1(R1) and r2(R2).
In short, we say the decomposition is lossless if, for all legal database instances (that is,
database instances that satisfy the specified functional dependencies and other constraints),
relation r contains the same set of tuples as the result of the following SQL query:
select *
from (select R1 from r)
natural join
(select R2 from r)
This is stated in the relational algebra as:
 R1 (r)  R2 (r) = r
In other words, if we project r onto R1 and R2 and compute the natural join of the projection
results, we get back exactly r. A decomposition that is not a lossless decomposition is called a
lossy decomposition. The terms lossless-join decomposition and lossy-join decomposition are
sometimes used in place of lossless decomposition and lossy decomposition.
As an example of a lossy decomposition, consider the decomposition of the employee
schema into:
employee1(ID, name)



employee2(name, street, city, salary)
The result of employee1 ⋈ employee2 is a superset of the original relation employee, but
the decomposition is lossy since the join result has lost information about which employee
identifiers correspond to which addresses and salaries, in the case where two or more employees
have the same name.
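The lossy join can be demonstrated with Python sets standing in for relations; the employee data below is invented purely for illustration:

```python
# Toy employee relation; two employees happen to share a name.
employee = {(101, "Kim", "Main St", "Pune", 50000),
            (102, "Kim", "Oak St", "Delhi", 60000)}

employee1 = {(i, n) for (i, n, st, c, sal) in employee}           # (ID, name)
employee2 = {(n, st, c, sal) for (i, n, st, c, sal) in employee}  # (name, street, city, salary)

# Natural join on the only common attribute, name:
joined = {(i, n, st, c, sal)
          for (i, n) in employee1
          for (n2, st, c, sal) in employee2 if n == n2}

# The join contains every original tuple plus spurious ones that mix up
# the two Kims' addresses and salaries:
print(employee <= joined and len(joined) > len(employee))  # True: proper superset, lossy
```

With two employees named Kim, the join produces four tuples, so we can no longer tell which ID goes with which address and salary.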
We can use functional dependencies to show when certain decompositions are lossless.
Let R, R1, R2, and F be as above. R1 and R2 form a lossless decomposition of R if at least one of
the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless
decomposition. We can use attribute closure to test efficiently for superkeys. To illustrate this,
consider the schema
instdept(ID, name, salary, deptname, building, budget)
is decomposed into the instructor and department schemas:
instructor(ID, name, deptname, salary)
department(deptname, building, budget)
Consider the intersection of these two schemas, which is deptname. We see that because
deptname → deptname, building, budget, the lossless-decomposition rule is satisfied.

f) Dependency Preservation
Using the theory of functional dependencies, it is easier to characterize dependency
preservation than using the ad-hoc approach. Let F be a set of functional dependencies on a
schema R, and let R1, R2,...,Rn be a decomposition of R. The restriction of F to Ri is the set Fi of all
functional dependencies in F+ that include only attributes of Ri. Since all functional dependencies
in a restriction involve attributes of only one relation schema, it is possible to test such a
dependency for satisfaction by checking only one relation. Note that the definition of restriction
uses all dependencies in F+, not just those in F. For instance, suppose F={A→B, B→C}, and we
have a decomposition into AC and AB. The restriction of F to AC includes A→C, since A→C is
in F+, even though it is not in F.
The set of restrictions F1, F2,...,Fn is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the restrictions is sufficient. Let F' = F1 ∪ F2
∪ ··· ∪ Fn. F ' is a set of functional dependencies on schema R, but, in general, F ' ≠ F. However,
even if F ' ≠ F, it may be that F '+ = F+. If the latter is true, then every dependency in F is
logically implied by F ', and, if we verify that F ' is satisfied, we have verified that F is satisfied.
We say that a decomposition having the property F '+ = F+ is a dependency-preserving
decomposition.
Figure shows an algorithm for testing dependency preservation. The input is a set D={
R1, R2,...,Rn}of decomposed relation schemas, and a set F of functional dependencies. This



algorithm is expensive since it requires computation of F+. Instead of applying the
algorithm, we consider two alternatives.
First, note that if each member of F can be tested on one of the relations of the
decomposition, then the decomposition is dependency preserving. This is an easy way to show
dependency preservation; however, it does not always work. There are cases where, even though
the decomposition is dependency preserving, there is a dependency in F that cannot be tested in
any one relation in the decomposition. Thus, this alternative test can be used only as a sufficient
condition that is easy to check; if it fails we cannot conclude that the decomposition is not
dependency preserving; instead we will have to apply the general test.
compute F+;
for each schema Ri in D do
begin
Fi : = the restriction of F+ to Ri;
end
F ' :=∅
for each restriction Fi do
begin
F ' = F ' ∪ Fi
end
compute F '+;
if(F '+ = F+ ) then return (true)
else return (false);
Figure: Testing for dependency preservation
Second, the test applies the following procedure to each α → β in F.
result = α
repeat
for each Ri in the decomposition
t = (result ∩ Ri)+ ∩ Ri
result = result ∪ t
until (result does not change)
The attribute closure here is under the set of functional dependencies F. If result contains all
attributes in β, then the functional dependency α → β is preserved. The decomposition is
dependency preserving if and only if the procedure shows that all the dependencies in F are
preserved.
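The procedure above translates almost line by line into Python. The sketch below also applies it to the AC/AB decomposition mentioned earlier, where B → C cannot be preserved:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(alpha, beta, decomposition, fds):
    """The polynomial-time test from the text for one FD alpha -> beta."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            t = closure(result & Ri, fds) & Ri   # (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(beta) <= result

F = [({"A"}, {"B"}), ({"B"}, {"C"})]

D1 = [{"A", "B"}, {"B", "C"}]   # each FD is testable in one relation
print(all(preserved(l, r, D1, F) for l, r in F))   # True: dependency preserving

D2 = [{"A", "C"}, {"A", "B"}]   # the AC/AB decomposition from the text
print(preserved({"B"}, {"C"}, D2, F))              # False: B -> C is lost
```

Note that the test never computes F+; it only takes attribute closures under F, which is what makes it polynomial.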

Question 7: Explain, with appropriate examples, when a relation is said to be in 1NF, 2NF and 3NF.



9. Normal Form
A normal form is a property of a relation schema indicating the type of redundancy
that the relation schema exhibits.
a) Purpose of Normalization
• Minimize redundancy in data.
• Remove insert, delete and update anomalies during database activities.
• Reduce the need to reorganize data when it is modified or enhanced.
• Normalization reduces a complex user view to a set of small and stable subgroups of
fields/relations. This process helps to design a logical data model known as a conceptual
data model.

b) Normalization Forms
1. First normal form (1NF): A relation is said to be in the first normal form if it does not
contain any repeating columns or repeating groups of columns.
2. Second normal form (2NF): A relation is said to be in the second normal form if it is
already in the first normal form and it has no partial dependency.
3. Third normal form (3NF): A relation is said to be in the third normal form if it is already
in second normal form and it has no transitive dependency.
4. Boyce-Codd normal form (BCNF): A relation is said to be in Boyce-Codd normal form if
it is already in the third normal form and every determinant is a candidate key. It is a
stronger version of 3NF.
5. Fourth normal form (4NF): A relation is said to be in the fourth normal form if it is
already in BCNF and it has no multivalued dependency.
6. Fifth normal form (5NF): A relation is said to be in 5NF if it is in the fourth normal form
and every join dependency in the table is implied by the candidate keys.
The different terminologies used in various normal forms are:

i) Partial Dependency
If a relation has more than one key field, a subset of non-key fields may depend on all
the key fields, but another subset or a particular non-key field may depend on only one of the key
fields (i.e. may not depend on all the key fields). Such a dependency is called a partial dependency.

ii) Transitive Dependency


In a relation, there may be dependency among non-key fields. Such a dependency is called
a transitive dependency.



iii) Determinant
A determinant is any field (simple field or composite field) on which some other field is
fully functionally dependent.

iv) Multivalued Dependency


Consider three fields X, Y and Z in a relation. If for each value of X there is a well-
defined set of values of Y and a well-defined set of values of Z, and the set of values of Y is
independent of the set of values of Z, then a multivalued dependency exists, written:
X →→ Y | Z

v) Join dependency
A relation which has a join dependency cannot be decomposed by projection into other
relations without any difficulty and undesirable results.

c) First Normal Form


A table is in the first normal form (1NF) if it does not contain any repeating columns or
repeating groups of columns. To understand the application of normalization, consider the
following Order table structure:
Field            Key   Data Type
Order_number     --    integer
Order_date       --    date
Customer_number  --    integer
Item_number      --    integer
Item_name        --    character
Quantity         --    integer
Unit_price       --    integer
Bill_amount      --    integer
(The columns Item_number through Bill_amount would repeat for as many items as we have for a
given order.)
As we can see, the table contains information about the orders received from various
customers. The table identifies an order based on the Order_number column. Similarly,
Customer_number can identify a customer and Item_number can identify an item uniquely.
For a given order several items repeat. Therefore, many columns associated with an item,
such as the item number, item name, quantity, unit price and the bill amount, also repeat. This
duplication of items violates the principle of first normal form. Therefore, we can conclude that
the Order table does not conform to the first normal form in its present state.
When a table is decomposed into two-dimensional tables with all repeating groups of
data eliminated, the table data is said to be in its first normal form. In order to bring the table into
the first normal form, we have to decompose the table and ensure that the decomposition is
lossless. We move all the repeating columns to another table.



The repeating columns are:
• Item_number
• Item_name
• Quantity
• Unit_price
• Bill_amount
These columns can repeat many times in the same order. Therefore, we have to move
these columns to another table in order to bring the Order table in the first normal form.
As a result, we now have two tables: (a) the modified Order table, and (b) a new table,
called Order_item, which contains the repeating columns from the original table.
To join two tables, we add another column to the Order_item table, so that an item sold is
linked to a particular order. Therefore, we need to add the Order_number column to the
Order_item table. This will cause a referential integrity relationship based on this column, to be
established between the two tables. Order_number will be the primary key in the Order table and
the foreign key in the Order_item table. Therefore, the modified table structures will be as shown
in Fig. 6, in which the referential integrity relationship is also illustrated.
Order relation: Order_number, Order_date, Customer_number
Order_item relation: Order_number, Item_number, Item_name, Quantity, Unit_price, Bill_amount
Figure 6: Illustration of first normal form
Now, what should be the primary key of the Order_item table? It cannot be
Order_number because there can be multiple rows for the same Order_number in this table. So,
here the primary key should be a combination of the Order_number and the Item_number. Thus,
the primary key is a composite primary key.

d) Second Normal Form (2NF)


A table is in the second normal form (2NF) if it is in the first normal form and if all non-
key columns in the table depend on the entire primary key, i.e. it has no partial dependency.
We know that the first condition is already satisfied. Let us study the second condition
for both the tables.
1. Let us start with the Order table. This table now contains three columns, namely
Order_number, Order_date and Customer_number. The primary key of the table is
Order_number. From this column (i.e. Order_number), we can derive the other non-key
columns, namely Order_date and Customer_number. Similarly, these non-key columns



do not depend on each other at all. Therefore, we can state that this table is in the
2NF.
2. Now, let us study the Order_item table. This table contains the columns Order_number,
Item_number, Item_name, Quantity, Unit_price and Bill_amount. We know that the
primary key here is a composite key, namely Order_number + Item_number. Can we
determine the values of the other (i.e. non-key) columns of the table by using this
composite primary key? Not quite! We can determine Item_name based on the
Item_number alone. We do not need the Order_number for this purpose. Thus, all non-
key columns do not depend on the entire primary key, but some of them depend only on
a part of the primary key. In other words, the principle of the second normal form is
violated. Same is the case with the Unit_price column, this can also be determined with
the Item_number alone.
Therefore, we conclude that our database does not follow the second normal form. We need to
take some action to rectify this situation. We can move columns that do not depend on the entire
primary key to another table.
The columns that do not depend on the entire primary key are Item_name and Unit_price.
They depend only on the Item_number. Therefore, we need to move these two columns to a new
table, say Item. We also need Item_number as one column in this table and it should also act as
the primary key of that table. Therefore, the tables would now look as shown in Fig. 7.
Order relation: Order_number, Order_date, Customer_number
Order_item relation: Order_number, Item_number, Quantity, Bill_amount
Item relation: Item_number, Item_name, Unit_price
Figure 7: Illustration of Second normal form
There is a referential integrity relationship between the Order_item and Item tables based
on the column Item_number.

e) Third Normal Form (3NF)


A relation is said to be in the third normal form if it is already in second normal form and
it has no transitive dependency. The third normal form requires a table to be in the second normal
form. Additionally, every non-key column in the table must be independent of all other non-key
columns.
Another kind of relationship exists in relational databases as transitive dependency. It is
an indirect relationship between two columns. Given the following conditions:
1. There is a functional dependency between two columns, A and B such that B is
functionally dependent on A.



2. There is a functional dependency between two columns, B and C, such that C is
functionally dependent on B.
We say that C transitively depends on A. We can represent this symbolically as:
ABC
This should be read as: C transitively depends on A. Of course, in the larger picture, B
functionally depends on A and C functionally depends on B.
The third normal form states that we should identify all such transitive dependencies and
get rid of them. Let us look for any possible transitive dependencies in our current table structure.
• Order table: There is no such dependency in the Order table. The Order_date and
Customer_number columns are functionally dependent on the Order_number column.
• Order_item table: In this table, the Item_number and Quantity columns are functionally
dependent on the Order_number column. However, we cannot say the same thing about
the Bill_amount column. After all, the bill amount would be calculated as:
Bill_amount = Quantity × Unit_price
The quantity is available in the Order_item table, and the unit price is available in the
Item table. Thus, the column Bill_amount does not depend directly on the primary key of
the Order_item table. Therefore, this is a case of transitive dependency. Hence, we need
to get rid of the Bill_amount column.
• Item table: Here, the non-key columns Item_name and Unit_price functionally depend on
the primary key of the table, that is Item_number. Therefore, there is no transitive
dependency in this table.
In summary, we need to get rid of the Bill_amount column from the Order_item table in order to
bring the table into the third normal form. We can always find out the unit price for a given item
from the Item table. We can multiply that by the Quantity column in the Order_item table, to find
out the bill amount.
To bring a table into the third normal form, get rid of any transitive dependencies from
the table or remove those non-key columns that depend on other non-key columns. Thus, our
table structure in the third normal form would look as shown in Fig. 8.
Order relation: Order_number, Order_date, Customer_number
Order_item relation: Order_number, Item_number, Quantity
Item relation: Item_number, Item_name, Unit_price
Figure 8: Illustration of third normal form

f) Boyce-Codd Normal Form (BCNF)


We know that if B is functionally dependent on A, that is, A functionally determines B,



then the following notation holds good:
AB
In such a case, we call A the determinant.
A relation is said to be in Boyce-Codd normal form if it is already in the third normal
form and every determinant is a candidate key.
Consider a School table consisting of three columns: Student, Subject and Teacher. One
student can study zero or more subjects. For a given student-subject pair, there is always exactly
one teacher. However, there can be many teachers teaching the same subject (to different
students). Finally, one teacher can teach only one subject. Table shows a sample School table.
Student Subject Teacher
Amol English Meena
Amol Hindi Shrikant
Mahesh English Prasad
Naren Science Mona
Naren English Meena
Table: Sample School table
What are the functional dependencies in this table?
1. Given a student and a subject, we can find out the teacher who is teaching that subject.
Therefore, we have:
{Student, Subject} → Teacher
2. Given a student and a teacher, we can find out the subject that is being taught by the
teacher to the student. Therefore, we have:
{Student, Teacher} → Subject
3. Given a teacher, we can find out the subject that the teacher teaches. Therefore, we have:
Teacher → Subject
Now, let us find out the primary key candidates (i.e. candidate keys) for this table.
1. The Student and Subject columns together can constitute a composite candidate key. This
is because by using these two columns in combination we can determine the remaining
column, that is, the Teacher column. Thus, one candidate key is {Student, Subject}.
2. The Student and Teacher columns together can constitute a composite candidate key.
This is because by using these two columns in combination, we can determine the
remaining column, i.e. the Subject. Thus, another candidate key is {Student, Teacher}.
3. Is Teacher a candidate key? It is not, because given a teacher name, we can only
determine the subject, but not the student name.
We summarize our observations as shown in Table.
Functional dependency            Candidate key
{Student, Subject} → Teacher     {Student, Subject}
{Student, Teacher} → Subject     {Student, Teacher}
Teacher → Subject                None
Table: Functional dependencies and candidate keys for the School table
It can be seen that we have three functional dependencies but only two of them lead to candidate
keys. Thus, we have a situation that can be described as:
• There is a functional dependency (i.e. a determinant) without it being a candidate key.
This violates the principle of BCNF. Thus, we conclude that the table is not in BCNF.
Let us now try to convert the data to BCNF. For this purpose, we perform decomposition
operation on the School table to create the following two tables: Student_Subject and
Subject_Teacher. Their definitions are shown in Fig. 9.
Table Columns
Student_Subject Student, Subject
Subject_Teacher Subject, Teacher
Figure 9: Database in BCNF
There is just one functional dependency in the new database. Given a teacher name, we
can identify the subject that she teaches. The other two functional dependencies from the original
School table no longer exist.
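The BCNF condition ("every determinant is a candidate key", i.e. every determinant is a superkey) can be checked mechanically with attribute closure. A Python sketch for the School table:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = {"Student", "Subject", "Teacher"}
F = [(frozenset({"Student", "Subject"}), {"Teacher"}),
     (frozenset({"Student", "Teacher"}), {"Subject"}),
     (frozenset({"Teacher"}), {"Subject"})]

# BCNF check: collect the determinants whose closure is not all of R.
violations = [set(lhs) for lhs, rhs in F if not closure(lhs, F) >= R]
print(violations)   # [{'Teacher'}]: Teacher -> Subject violates BCNF
```

The first two determinants close to the whole schema, so they are superkeys; Teacher closes only to {Teacher, Subject}, which is exactly the violation identified above.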

g) Fourth Normal Form (4NF)


A relation is said to be in the fourth normal form if it is already in BCNF and it has no
multivalued dependency.
The fourth normal form is related to the concept of a multi-valued dependency (MVD). If
there are two columns, A and B, and if for a given A, there can be multiple values of B, then we
say that an MVD exists between A and B. If there are two or more such unrelated MVD
relationships in a table, then it violates the fourth normal form.
Let us consider a table in which a student name, the subjects she learns and the languages
she knows are stored. One student can learn zero or more subjects and can simultaneously know
zero or more languages. We can see that there are two independent MVD facts about this
relationship:
1. A student can study many subjects.
2. A student can learn many languages.
Table shows some sample data from this Student table.
Student Subject Language
Geeta Mythology English
Geeta Psychology English
Geeta Mythology Hindi
Geeta Psychology Hindi
Shekhar Gardening English
Table: Student table



We can see that Geeta is studying two subjects and knows two languages. Because
these two facts about Geeta (namely, the subjects that she is studying and the languages that she
knows) are independent of each other, we need to have four rows in the table to capture this
information. If Geeta starts studying a third subject (say gardening), we would need to add two
rows in the Student table to depict this fact (one for Language = 'English' and another for
Language = 'Hindi') as shown in Table. This is certainly cumbersome.
The process of bringing this table into the fourth normal form is to split the independent
multi-valued components of the primary key into two tables.
The primary key for the Student table is currently a composite key made up of all the
three columns in the table, Student, Subject and Language. In other words, the primary key of the
table is Student + Subject + Language. This primary key contains two independent multi-valued
dependencies. Hence, the table violates conditions of 4NF.
Thus, we need to split these two independent multi-valued dependencies into two
separate tables. Let us call them as Student_Subject and Student_Language tables. The resulting
tables would be as shown in Table.
Student_Subject table            Student_Language table
Student    Subject               Student    Language
Geeta      Mythology             Geeta      English
Geeta      Psychology            Geeta      Hindi
Shekhar    Gardening             Shekhar    English
We can see that this decomposition reduces redundancy with respect to both the independent
MVD relationships, that is, subjects and languages. Also, if we need to add a new subject for
Geeta now, we have to add just one row to the Student_Subject table.
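The split and its losslessness can be verified with Python sets as toy relations:

```python
# The Student table from above, as (Student, Subject, Language) tuples:
student = {("Geeta", "Mythology", "English"), ("Geeta", "Psychology", "English"),
           ("Geeta", "Mythology", "Hindi"),  ("Geeta", "Psychology", "Hindi"),
           ("Shekhar", "Gardening", "English")}

# Project onto the two independent MVDs:
student_subject  = {(s, sub) for (s, sub, lang) in student}
student_language = {(s, lang) for (s, sub, lang) in student}

# Natural join on Student recovers exactly the original rows:
rejoined = {(s, sub, lang)
            for (s, sub) in student_subject
            for (s2, lang) in student_language if s == s2}
print(rejoined == student)   # True: the 4NF split is lossless
# Adding a new subject for Geeta now takes one row in student_subject,
# instead of one row per language in the unnormalized table.
```

The join reconstructs all four Geeta rows from two subject rows and two language rows, which is precisely why the decomposition removes the redundancy.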

h) Fifth Normal Form (5NF)


A relation is said to be in 5NF, also called project-join normal form, if it is in the fourth
normal form and every join dependency in the table is implied by the candidate keys.
All normal forms up to 5NF perform normalization with the help of lossless
decomposition. This decomposition usually splits one table into two by using a projection
operation. But, in 5NF, the decomposition splits the original table into at least three tables.
Consider the SPL table shown in Table; it shows suppliers (S), the parts they supply (P) and the
locations to which they supply these parts (L).
S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S1 P1 L1
Table: SPL table
The table does not contain any independent MVD relationships. The parts (P) are supplied
to specific locations (L). Hence, they are dependent MVD relationships with the suppliers
(S). Therefore, the table is in 4NF. The table does have some amount of redundancy.
For instance, the row S1-P1 occurs twice but for two different locations (L2 and L1).
Therefore, we cannot decompose the table into two tables to remove this redundancy. This would
lead to loss of information and therefore, would be a lossy decomposition. Let us illustrate this
with actual decomposition of the SPL table into SP and PL tables, as shown in Table.
SP table            PL table
S    P              P    L
S1   P1             P1   L2
S1   P2             P2   L1
S2   P1             P1   L1
Why did we say that this decomposition is lossy? The reason is simple. We can still determine:
• Which suppliers supply which parts?
• Which suppliers supply to which locations?
However, we cannot determine:
• Which suppliers supply which parts to which locations?
We can prove this by carrying out a natural join on the two tables (SP and PL). This join will be
based on the common column P. The result of the join operation is shown in Table. Let us call it
SPL2.
S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S2 P1 L2
S1 P1 L1
Table: SPL2 table after natural join of the SP and PL tables
The fourth row shown in bold (namely S2-P1-L2) is bogus. Compare the data in this table with the
data in the original SPL table. The new table contains this additional row which was not present
in the original SPL table. Therefore, the natural join operation has produced one unwanted bogus
row. This proves that our decomposition is lossy.
If we had decomposed the original SPL table into three tables, namely SP, PL and LS,
then the decomposition would have been lossless. Let us prove this. We have already seen the SP
and PL tables. Let us also view the LS table, as shown in Table.
L    S
L2   S1
L1   S1
L1   S2
Now let us join together the SP, PL and LS tables. This process is shown in Fig. 10.
SP table            PL table            LS table
S    P              P    L              L    S
S1   P1             P1   L2             L2   S1
S1   P2             P2   L1             L1   S1
S2   P1             P1   L1             L1   S2

Join
S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S1 P1 L1
Figure 10: SPL table after joining SP, PL and LS tables
With this process, we obtain the original SPL table, and the bogus row disappears.
This proves that in order to bring a table in 5NF, we need to decompose the original table into at
least three tables.
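We can replay this argument with Python sets as toy relations: the two-way join produces the bogus tuple, and the three-way join recovers SPL exactly:

```python
spl = {("S1", "P1", "L2"), ("S1", "P2", "L1"),
       ("S2", "P1", "L1"), ("S1", "P1", "L1")}

sp = {(s, p) for (s, p, l) in spl}   # projection on (S, P)
pl = {(p, l) for (s, p, l) in spl}   # projection on (P, L)
ls = {(l, s) for (s, p, l) in spl}   # projection on (L, S)

# Joining only two projections produces a bogus tuple:
two_way = {(s, p, l) for (s, p) in sp for (p2, l) in pl if p == p2}
print(two_way - spl)                 # {('S2', 'P1', 'L2')}

# Filtering through the third projection (a join with LS) removes it:
three_way = {(s, p, l) for (s, p, l) in two_way if (l, s) in ls}
print(three_way == spl)              # True: the three-way decomposition is lossless
```

The spurious tuple S2-P1-L2 survives the SP ⋈ PL join but fails the membership test against LS, since S2 supplies nothing to L2 in the original table.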

i) Comparison of BCNF and 3NF


3NF has an advantage over BCNF: we know that it is always possible to obtain a 3NF
design without sacrificing losslessness or dependency preservation.
Nevertheless, there are disadvantages to 3NF: We may have to use null values to represent some
of the possible meaningful relationships among data items, and there is the problem of repetition
of information. Our goals of database design with functional dependencies are:
1. BCNF.
2. Losslessness.
3. Dependency preservation.
Since it is not always possible to satisfy all three, we may be forced to choose between BCNF
and dependency preservation with 3NF. SQL does not provide a way of specifying functional
dependencies, except for the special case of declaring superkeys by using the primary key or
unique constraints. It is possible to write assertions that enforce a functional dependency;
however, currently no database system supports the complex assertions that are required to enforce a
functional dependency, and such assertions would be expensive to test. Thus, even if we had a
dependency-preserving decomposition, if we use standard SQL we can test efficiently only those
functional dependencies whose left-hand side is a key.
Testing functional dependencies may involve a join if the decomposition is not
dependency preserving. We could in principle reduce the cost by using materialized views, which
many database systems support, provided the database system supports primary key constraints
on materialized views. Given a BCNF decomposition that is not dependency preserving, we
consider each dependency in a canonical cover Fc that is not preserved in the decomposition. For
each such dependency α → β, we define a materialized view that computes a join of all relations
in the decomposition, and projects the result on αβ. The functional dependency α → β can be tested
easily on the materialized view, using one of the constraints unique (α) or primary key (α).
On the negative side, there is a space and time overhead due to the materialized view, but
on the positive side, the application programmer need not worry about writing code to keep
redundant data consistent on updates; it is the job of the database system to maintain the
materialized view, that is, keep it up to date when the database is updated. Most current database
systems do not support constraints on materialized views.

10. Database Design Process


We assumed that a relation schema R is given and proceeded to normalize it. There are
several ways in which we could have come up with the schema R:
1. R could have been generated when converting an E-R diagram to a set of tables.
2. R could have been a single relation containing all attributes that are of interest. The
normalization process then breaks up R into smaller relations.
3. R could have been the result of some ad hoc design of relations, which we then test to
verify that it satisfies a desired normal form.

a) E-R Model and Normalization


When we carefully define an E-R diagram, identifying all entities correctly, the tables
generated from the E-R diagram should not need further normalization. But, there can be
functional dependencies between attributes of an entity.
For example, suppose an employee entity had attributes dept-no and dept-address, and
there is a functional dependency dept-no → dept-address. We would then need to normalize the
relation generated from employee.
Examples of such dependencies arise out of poor E-R diagram design. In the above
example, if we did the E-R diagram correctly, we would have created a department entity with
attribute dept-address and a relationship between employee and department.
Similarly, a relationship set involving more than two entity sets may not produce relations
in a desirable normal form; since most relationship sets are binary, such cases are relatively rare.
Functional dependencies can help us detect poor E-R design. If the generated relations
are not in desired normal form, the problem can be fixed in the E-R diagram. That is,
normalization can be done formally as part of data modeling.

Database Management Systems (DBMS) 157


b) Naming of Attributes and Relationships
A desirable feature of a database design is the unique-role assumption, which means that
each attribute name has a unique meaning in the database. This prevents us from using the same
attribute to mean different things in different schemas. For example, we might consider using the
attribute number for phone number in the instructor schema and for room number in the
classroom schema. The join of a relation on schema instructor with one on classroom is
meaningless. While users and application developers can work carefully to ensure use of the right
number in each circumstance, having a different attribute name for phone number and for room
number serves to reduce user errors.
While it is a good idea to keep names of incompatible attributes distinct, if attributes of
different relations have the same meaning, it may be a good idea to use the same attribute name.
For this reason we used the same attribute name “name” for both the instructor and the student
entity sets. If this was not the case, then if we wished to generalize these entity sets by creating a
person entity set, we would have to rename the attribute. Thus, even if we did not currently have
a generalization of student and instructor, if we foresee such a possibility it is best to use the
same name in both entity sets (and relations).
Although the order of attribute names in a schema does not matter, it is conventional to list
primary-key attributes first. This makes reading default output (as from select *) easier. We cannot
always create relationship-set names by simple concatenation; for example, a manager or works-
for relationship between employees would not make much sense if it were called
employee_employee! Similarly, if there are multiple relationship sets possible between a pair of
entity sets, the relationship-set names must include extra parts to identify the relationship set.
Different organizations have different conventions for naming entity sets. For example,
we may call an entity set of students student or students. We have chosen to use the singular form
in our database designs. As schemas grow larger, with increasing numbers of relationship sets,
using consistent naming of attributes, relationships, and entities makes life much easier for the
database designer and application programmers.

c) Denormalization for Performance


Database designers sometimes choose a schema that has redundant information; that is, one that
is not normalized. They use the redundancy to improve performance for specific applications. The
penalty paid for not using a normalized schema is the extra work (in terms of coding time and
execution time) to keep redundant data consistent.
For example, suppose that the name of an account holder has to be displayed along with
the account number and balance, every time the account is accessed. In the normalized schema, this
requires a join of account with depositor.
One alternative to computing the join is to store a relation containing all the attributes of
account and depositor. This makes displaying the account information faster. But, the balance
information for an account is repeated for every person who owns the account, and all copies
must be updated by the application, whenever the account balance is updated. The process
of taking a normalized schema and making it non-normalized is called denormalization, and
designers use it to tune performance of systems to support time-critical operations.
A better alternative, supported by many database systems today, is to use the normalized
schema, and additionally store the join of account and depositor as a materialized view. Like
denormalization, using materialized view does have space and time overheads; but, it has the
advantage that keeping the view up to date is the job of the database system, not the application
programmer.
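A small demonstration of the update burden that denormalization creates (SQLite via Python; the account and depositor schemas follow the text, while the data values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized schema: balances in account, ownership in depositor.
cur.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)")
cur.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
cur.execute("INSERT INTO account VALUES ('A-101', 500.0)")
cur.executemany("INSERT INTO depositor VALUES (?, ?)",
                [('Hayes', 'A-101'), ('Jones', 'A-101')])

# Denormalized copy: one row per owner, so the balance is repeated.
cur.execute("""CREATE TABLE account_info AS
               SELECT customer_name, account_number, balance
               FROM depositor JOIN account USING (account_number)""")

# Updating the balance in the normalized schema touches one row;
# in the denormalized table the application must update every copy.
cur.execute("UPDATE account SET balance = 600.0 WHERE account_number = 'A-101'")
cur.execute("UPDATE account_info SET balance = 600.0 WHERE account_number = 'A-101'")

rows = cur.execute("SELECT COUNT(*) FROM account_info "
                   "WHERE balance = 600.0").fetchone()
print(rows[0])  # 2: both redundant copies had to be updated
```

A materialized view over the same join would make this second update the database system's responsibility rather than the application's.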
d) Other Design Issues
There are some aspects of database design that are not addressed by normalization, and
can thus lead to bad database design.
Consider a company database, where we want to store profit of companies in different
years. A relation profits(company-id, year, amount) could be used to store the profits information.
The only functional dependency on this relation is company-id, year → amount, and the relation
is in BCNF.
An alternative design is to use multiple relations, each storing the profits for a different
year. Let us say the years of interest are 2009, 2010, and 2011; we would then have relations of
the form profits-2009, profits-2010, profits-2011, all of which are on the schema (company-id,
profits). The only functional dependency here on each relation would be company-id → profits,
so these relations are also in BCNF.
In this alternative design, we have to create a new relation every year, and would also
have to write new queries every year, to take each new relation into account. Queries would also
be more complicated as they may have to refer to many relations.
Another way of representing the same data is to have a single relation company-year
(company-id, profits-2009, profits-2010, profits-2011). Here the only functional dependencies are
from company-id to the other attributes, and again the relation is in BCNF.
The above design is also a bad idea, as it has problems similar to those of the previous design: we
would have to modify the relation schema and write new queries, every year. Queries would also
be more complicated, since they may have to refer to many attributes.
Representations such as those in the company-year relation, with one column for each
value of an attribute, are called crosstabs; they are widely used in spreadsheets and reports and in
data analysis tools. While such representations are useful for display to users, they are not
desirable in a database design.
SQL extensions have been proposed to convert data from a normal relational
representation to a crosstab, for display.
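In the absence of such extensions, a crosstab for display can be built portably with conditional aggregation; a sketch using the profits relation from above (the data values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized representation: one row per company and year.
cur.execute("CREATE TABLE profits (company_id TEXT, year INTEGER, amount REAL)")
cur.executemany("INSERT INTO profits VALUES (?, ?, ?)",
                [('C1', 2009, 10.0), ('C1', 2010, 12.0), ('C1', 2011, 15.0),
                 ('C2', 2009, 7.0),  ('C2', 2010, 8.0),  ('C2', 2011, 9.0)])

# Crosstab for display: one column per year, built with conditional
# aggregation (a portable stand-in for vendor-specific PIVOT syntax).
crosstab = cur.execute("""
    SELECT company_id,
           SUM(CASE WHEN year = 2009 THEN amount END) AS profits_2009,
           SUM(CASE WHEN year = 2010 THEN amount END) AS profits_2010,
           SUM(CASE WHEN year = 2011 THEN amount END) AS profits_2011
    FROM profits
    GROUP BY company_id
    ORDER BY company_id""").fetchall()
print(crosstab)  # [('C1', 10.0, 12.0, 15.0), ('C2', 7.0, 8.0, 9.0)]
```

Note that the stored relation stays in the normalized form; the crosstab exists only in the query result, so no schema change is needed when a new year arrives.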

Solution to Question Asked in University Examinations

Question 1: Consider the following relational database


Loan (loan_number, branch_name, amount)
Borrower (customer_name, loan_number)
Give SQL DDL definitions for the above relations. Identify the referential
integrity constraints that should hold, and include them in the DDL
definitions. (Assume a banking database.)
Solution:
create table loan(
    loan_number char(10) primary key,
    branch_name char(15) references branch,
    amount integer);

create table borrower(
    customer_name char(20),
    loan_number char(10),
    primary key (customer_name, loan_number),
    foreign key (customer_name) references customer,
    foreign key (loan_number) references loan);

Question 2: Consider the following relational database:


employee (employee-name, street, city)
works (employee-name, company-name, salary)
company(company-name, city)
manages (employee-name, manager-name)
Give an SQL DDL definition of this database. Identify the referential
integrity constraints that should hold, and include them in the DDL
definition.
Solution:

create table employee(
    employee_name char(20) primary key,
    street char(30),
    city char(30));

create table company(
    company_name char(15) primary key,
    city char(30));

create table works(
    employee_name char(20) primary key,
    company_name char(15) references company,
    salary numeric(10,2),
    foreign key (employee_name) references employee);

create table manages(
    employee_name char(20) primary key,
    manager_name char(20),
    foreign key (employee_name) references employee,
    foreign key (manager_name) references employee);

Question 3: Explain why 4NF is a normal form more desirable than
BCNF. Give an example of a relation schema R and a set of
dependencies such that R is in 4NF but not in PJNF.
Solution:
The relation schema R = (A, B, C, D, E) with the set of multivalued dependencies
A →→ BC
B →→ CD
E →→ AD
is in BCNF but clearly not in 4NF. 4NF is more desirable than BCNF because it reduces the
repetition of information. If we consider a BCNF schema not in 4NF, we observe that
decomposition into 4NF does not lose information, provided that a lossless-join decomposition
is used, yet redundancy is reduced.

Question 4: Use Armstrong's axioms to prove the soundness of the union rule.



Solution:
To prove that:
if A → B and A → C, then A → BC
Proof:
A → B       (given)
AA → AB     (augmentation rule)
A → AB      (union of identical sets: AA = A)
A → C       (given)
AB → CB     (augmentation rule)
A → BC      (transitivity rule and commutativity of set union)
Hence proved.

Question 5: Suppose that we decompose the schema R = (A, B, C, D,


E) into (A, B, C) and (A, D, E). Show that this decomposition is a
lossless-join decomposition if the following set F of functional
dependencies holds:
A → BC, CD → E, B → D, E → A
Solution:
A decomposition {R1, R2} is a lossless-join decomposition if
R1 ∩ R2 → R1
or R1 ∩ R2 → R2.
Let R1 = (A, B, C)
R2 = (A, D, E)
and R1 ∩ R2 = A.
A is a candidate key: A → BC gives A → B and A → C; B → D then gives A → D; and CD → E gives
A → E, so A+ = ABCDE. Therefore R1 ∩ R2 → R1, and the decomposition is lossless-join.
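The claim that A is a candidate key can be verified mechanically with the standard attribute-closure algorithm; a minimal sketch:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F = {A -> BC, CD -> E, B -> D, E -> A}
fds = [({'A'}, {'B', 'C'}), ({'C', 'D'}, {'E'}),
       ({'B'}, {'D'}), ({'E'}, {'A'})]

print(closure({'A'}, fds))  # all of {'A', 'B', 'C', 'D', 'E'}: A is a key
```

Since the closure of A contains every attribute of R, A → R1 holds in particular, which is exactly the lossless-join condition used above.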

Question 6: Consider the following relational database;-


salaried-worker(name, office, phone, salary)
hourly-worker (name, hourly-rate)
address (name, street, city)
Suppose we wish to require that every name that appears in address also appears in either
salaried-worker or hourly-worker, but not necessarily in both. Then:
i) Propose syntax for expressing constraints
ii) Discuss the actions that system must take to enforce a constraint for this form.
Solution:
i) In the create table expression for address, we include



foreign key (name) references salaried-worker or hourly-worker
ii) To enforce this constraint, whenever a tuple is inserted into the address relation, a
lookup on the name value must be made on the salaried-worker relation and (if that
lookup fails) on the hourly-worker relation (or vice-versa).
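Since standard SQL has no "references ... or ..." syntax, one way to implement the lookup described in (ii) is a trigger; a sketch using SQLite (table and trigger names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE salaried_worker (name TEXT PRIMARY KEY, office TEXT,
                              phone TEXT, salary REAL);
CREATE TABLE hourly_worker (name TEXT PRIMARY KEY, hourly_rate REAL);
CREATE TABLE address (name TEXT PRIMARY KEY, street TEXT, city TEXT);

-- Reject an address row whose name appears in neither worker relation.
CREATE TRIGGER address_name_check
BEFORE INSERT ON address
FOR EACH ROW BEGIN
    SELECT RAISE(ABORT, 'name not found in any worker relation')
    WHERE NOT EXISTS (SELECT 1 FROM salaried_worker WHERE name = NEW.name)
      AND NOT EXISTS (SELECT 1 FROM hourly_worker WHERE name = NEW.name);
END;
""")

cur.execute("INSERT INTO hourly_worker VALUES ('Smith', 12.5)")
cur.execute("INSERT INTO address VALUES ('Smith', 'Main St', 'Springfield')")  # accepted

try:
    cur.execute("INSERT INTO address VALUES ('Nobody', 'Elm St', 'Shelbyville')")
    rejected = False
except sqlite3.DatabaseError:  # RAISE(ABORT, ...) surfaces as an error
    rejected = True
print(rejected)  # True: the trigger blocked the dangling name
```

Fully enforcing the constraint would also require triggers on deletes and updates of the two worker relations, mirroring what the system does for ordinary foreign keys.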

Question 7: Use Armstrong's axioms to prove the soundness of the


decomposition rule.

Solution:
To prove that:
if A → BC, then A → B and A → C
Proof:
A → BC      (given)
BC → B      (reflexivity rule)
A → B       (transitivity rule)
BC → C      (reflexivity rule)
A → C       (transitivity rule)
Hence proved.

Question 8: Write an assertion for the bank database to ensure that the assets value for the
Perryridge branch is equal to the sum of all the amounts lent by the Perryridge branch.
Solution:
The assertion-name is arbitrary. We have chosen the name perry. Note that since the
assertion applies only to the Perryridge branch we must restrict attention to only the Perryridge
tuple of the branch relation rather than writing a constraint on the entire relation.
create assertion perry check
(not exists (select *
from branch
where branch-name = ’Perryridge’ and
assets <> (select sum (amount)
from loan
where branch-name = ’Perryridge’)))
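Many systems do not implement create assertion, but the same condition can be checked by running the assertion body as a query, which returns rows exactly when the constraint is violated. A sketch (SQLite, hypothetical data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, assets REAL)")
cur.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, "
            "branch_name TEXT, amount REAL)")
cur.execute("INSERT INTO branch VALUES ('Perryridge', 2500.0)")
cur.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                [('L-15', 'Perryridge', 1500.0),
                 ('L-16', 'Perryridge', 1000.0)])

# The body of the assertion, run as a query: it returns no rows
# exactly when the Perryridge assets equal the sum of its loans.
violations = cur.execute("""
    SELECT * FROM branch
    WHERE branch_name = 'Perryridge'
      AND assets <> (SELECT SUM(amount) FROM loan
                     WHERE branch_name = 'Perryridge')""").fetchall()
print(violations)  # []: 2500 = 1500 + 1000, so the assertion holds
```

In a system without assertions, this query would typically be wrapped in triggers on branch and loan so that any violating modification is rejected.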

