III
Design
1. Integrity Constraints
Integrity constraints ensure that changes made to the database by authorized users do not
result in a loss of data consistency. Therefore, integrity constraints guard against accidental
damage to the database. Examples of integrity constraints are:
An instructor name cannot be null.
No two instructors can have the same instructor ID.
Every department name in the course relation must have a matching department name in
the department relation.
The budget of a department must be greater than 0.00.
There are two types of integrity constraints for the E-R model:
1. Key declarations: The condition that certain attributes form a candidate key for a given
entity set.
2. Form of a relationship: Many to many, one to many, one to one.
An integrity constraint can be an arbitrary predicate pertaining to the database. However,
arbitrary predicates may be costly to test. Thus, we concentrate on integrity constraints that can be tested
with minimal overhead (cost).
Integrity constraints are usually identified as part of the database schema design process,
and declared as part of the create table command used to create relations. However, integrity
constraints can also be added to an existing relation by using the command alter table table-
name add constraint, where constraint can be any constraint on the relation. When such a
command is executed, the system first ensures that the relation satisfies the specified constraint.
If it does, the constraint is added to the relation; if not, the command is rejected.
c) Unique Constraint
SQL also supports an integrity constraint:
unique(A1, A2, ...,Am)
The unique specification says that attributes A1, A2 ,...,Am form a candidate key; that is, no two
tuples in the relation can be equal on all the listed attributes. However, candidate key attributes
are permitted to be null unless they have explicitly been declared to be not null.
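The behaviour of unique described above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the table instructor_contact and its columns are hypothetical names chosen only for illustration. SQLite follows the rule stated here (null values do not violate unique), but some database systems treat nulls in unique columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: (name, email) is declared unique, email may be null.
conn.execute("""
    create table instructor_contact (
        ID    varchar(5) primary key,
        name  varchar(20) not null,
        email varchar(40),
        unique (name, email)
    )
""")
conn.execute(
    "insert into instructor_contact values ('10101', 'Kim', 'kim@example.org')"
)

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("insert into instructor_contact values (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Same (name, email) pair as an existing tuple: rejected by unique.
duplicate_rejected = not try_insert(("10102", "Kim", "kim@example.org"))
# Two tuples that agree on name and have a null email: both accepted.
first_null_ok = try_insert(("10103", "Lee", None))
second_null_ok = try_insert(("10104", "Lee", None))
```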
g) Assertion
Question 3: Define assertion with example:
An assertion is a condition that we wish the database always to satisfy. Domain
constraints and referential-integrity constraints are special forms of assertions. There are many
constraints that cannot be expressed by using only domain and referential-integrity constraints. For example:
1. The sum of all loan amounts for each branch must be less than the sum of all account
balances at the branch.
b) Default Values
SQL allows a default value to be specified for an attribute as shown by the following
create table statement:
create table student(
ID varchar(5),
name varchar(20) not null,
dept_name varchar(20),
tot_cred numeric(3,0) default 0,
primary key(ID));
The default value of the tot_cred attribute is declared to be 0. As a result, when a tuple is inserted
into the student relation, if no value is provided for the tot_cred attribute, its value is set to 0. The
following insert statement shows how an insertion can omit the value for the tot_cred attribute.
insert into student(ID, name, dept_name)
values('12789', 'Newman', 'Comp. Sci.');
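The effect of the default clause can be checked with a short sketch that runs the two statements above. It assumes Python's built-in sqlite3 module and an in-memory database; nothing else is added to the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        ID        varchar(5),
        name      varchar(20) not null,
        dept_name varchar(20),
        tot_cred  numeric(3,0) default 0,
        primary key (ID)
    )
""")
# The insertion omits tot_cred, so the declared default is used.
conn.execute(
    "insert into student (ID, name, dept_name) "
    "values ('12789', 'Newman', 'Comp. Sci.')"
)
tot_cred = conn.execute(
    "select tot_cred from student where ID = '12789'"
).fetchone()[0]
```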
c) Index Creation
d) Large-Object Types
Many current-generation database applications need to store attributes that can be large
(of the order of many kilobytes), such as a photograph, or very large (of the order of many
megabytes or even gigabytes), such as a high-resolution medical image or video clip. SQL
provides large-object data types for character data (clob) and binary data (blob). The letters “lob”
in these data types stand for “Large OBject.” For example, we may declare attributes
book_review clob(10KB)
image blob(10MB)
movie blob(2GB)
For result tuples containing large objects (multiple megabytes to gigabytes), it is inefficient or
impractical to retrieve an entire large object into memory. Instead, an application would usually
use an SQL query to retrieve a “locator” for a large object and then use the locator to manipulate
the object from the host language in which the application itself is written. For instance, the
JDBC application program interface permits a locator to be fetched instead of the entire large
object; the locator can then be used to fetch the large object in small pieces, rather than all at
once.
e) User-Defined Types
SQL supports two forms of user-defined data types. The first form is called distinct
3. Authorization
An administrator may assign a user several forms of authorization on parts of the database:
1. Read authorization: Allows reading, but not modification, of data.
2. Insert authorization: Allows insertion of new data, but not modification of existing data.
3. Update authorization: Allows modification, but not deletion, of data.
4. Delete authorization: Allows deletion of data.
Each of these types of authorizations is called a privilege. We may grant a user all,
none, or a combination of these types of privileges on specified parts of a database, such as a
relation or a view.
When a user submits a query or an update, the SQL implementation first checks if the
query or update is authorized, based on the authorizations that the user has been granted. If the
query or update is not authorized, it is rejected. In addition to authorizations on data, users may
also be granted authorizations on the database schema, allowing them, for example, to create,
modify, or drop relations. A user who has some form of authorization may be allowed to pass on
(grant) this authorization to other users, or to withdraw (revoke) an authorization that was granted
earlier.
i) Update Privilege
The update authorization may be given either on all attributes of the relation or on only
some. If update authorization is included in a grant statement, the list of attributes on which
update authorization is to be granted optionally appears in parentheses immediately after the
update keyword. If the list of attributes is omitted, the update privilege will be granted on all
attributes of the relation.
Example:
The following grant statement gives users P, Q, and R update authorization on the name
attribute of the student relation:
grant update (name) on student to P, Q, R;
b) Revoke Privileges
To revoke an authorization, we use the revoke statement.
Syntax:
revoke <privilege list> on <relation name or view name> from <user/role list>
[restrict | cascade];
Thus, to revoke the privileges that we granted previously, we write
revoke select on student from P, Q, R;
revoke update (name) on student from P, Q, R;
revoke insert on student from P, Q, R;
revoke references (branch-name) on branch from P;
The revocation of a privilege from a user/role may cause other users/roles also to lose that
privilege. This behavior is called cascading of the revoke. The revoke statement may alternatively
specify restrict:
revoke select on student from P, Q, R restrict
In this case, the system returns an error if there are any cascading revokes, and does not carry out
the revoke action. The following revoke statement revokes only the grant option, rather than the
actual select privilege:
revoke grant option for select on student from P
[Figure 2, panels (a) through (d): authorization graphs over the nodes DBA, P, Q, and R]
Figure 2: Attempt to beat authorization revocation
We require that all edges in an authorization graph be part of some path originating with
the database administrator. The edges between Q and R are deleted, and the resulting
authorization graph is shown in Figure 2(d).
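The requirement that every edge lie on some path originating with the database administrator can be sketched as a reachability check. This is a minimal illustration, not the mechanism of any particular database system: the function name prune_authorization_graph and the sample edges are assumptions, and an edge (grantor, grantee) is kept exactly when its grantor is reachable from the DBA node.

```python
from collections import deque

def prune_authorization_graph(edges, root="DBA"):
    """Keep only the edges that lie on some path starting at root.

    edges is a set of (grantor, grantee) pairs. An edge is on a path
    from root exactly when its grantor is reachable from root.
    """
    successors = {}
    for grantor, grantee in edges:
        successors.setdefault(grantor, set()).add(grantee)

    reachable = {root}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for nxt in successors.get(user, ()):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)

    return {(g, r) for (g, r) in edges if g in reachable}

# Hypothetical situation: the DBA still authorizes P, while Q and R hold
# grants only from each other after the DBA's grant to Q was revoked.
remaining = prune_authorization_graph(
    {("DBA", "P"), ("Q", "R"), ("R", "Q")}
)
```

Here the two edges between Q and R are dropped because neither Q nor R can be reached from the DBA any longer.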
e) Role
Consider a college with many students. Each student must have the same
types of authorizations on the same set of relations. Under a naive scheme, whenever a new student joins, he or she must
be given all of these authorizations individually.
A better scheme would be to specify the authorizations that every student is to be given,
and to separately identify which database users are students. The system can use these two pieces
of information to determine the authorizations of each person who is a student.
When a new person joins as a student, a user identifier must be allocated to him or her, and
he or she must be identified as a student. The individual permissions given to students need not be specified
again.
A set of roles is created in the database. Authorizations can be granted to roles, in exactly
the same way as they are granted to individual users. Each database user is granted a set of roles
that he or she is authorized to perform.
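The two pieces of information described above (what a role may do, and who holds the role) can be modelled in a few lines. This is only a sketch of the idea, not how a database system stores roles; the names role_privileges, user_roles, grant_role, and is_authorized, as well as the relations course and takes, are hypothetical.

```python
# Authorizations are granted to the role once.
role_privileges = {
    "student": {("select", "course"), ("select", "takes")},
}
# Each user is granted a set of roles.
user_roles = {}

def grant_role(user, role):
    """Record that the user may act in the given role."""
    user_roles.setdefault(user, set()).add(role)

def is_authorized(user, privilege, relation):
    """A user holds a privilege if any of the user's roles holds it."""
    return any(
        (privilege, relation) in role_privileges.get(role, set())
        for role in user_roles.get(user, ())
    )

# A new student only needs a user identifier and the student role.
grant_role("u12789", "student")
can_read = is_authorized("u12789", "select", "course")
can_delete = is_authorized("u12789", "delete", "course")
```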
Another alternative would be to create a student userid, and permit each student to
connect to the database using the student userid. The problem with this scheme is that it would
4. Trigger
A trigger is a procedure that is automatically executed in response to specified changes
made to the database, such as an insert, update, or delete, and is typically specified by the database
administrator. Triggers do not accept any arguments. The main aim of a trigger is to maintain data
integrity; triggers can also be designed to record information about changes.
A database that has a set of associated triggers is called an active database. A trigger
description contains three parts:
1. Event: A change to the database that activates the trigger.
2. Condition: A query or test that is run when the trigger is activated.
3. Action: A procedure that is executed when the trigger is activated and its condition is
true.
The model of triggers is referred to as the event-condition-action model for triggers. A
trigger can be thought of as a 'tool' that monitors a database, and is executed when the database
is modified in a way that matches the event specification.
A condition in a trigger can be a true/false statement or a query. If the condition part
evaluates to true, the action associated with the trigger is executed.
A trigger action can examine the answers to the query in the condition part of the trigger,
refer to old and new values of tuples modified by the statement activating the trigger,
execute new queries, and make changes to the database.
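The event-condition-action model can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module and SQLite's trigger syntax, which differs in detail from other systems; the tables account and overdraft_log and the trigger name log_overdraft are hypothetical. The event is an update of balance, the condition is that the new balance is negative, and the action records the old and new values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (
        account_number varchar(10) primary key,
        balance        numeric(12,2) not null
    );
    create table overdraft_log (
        account_number varchar(10),
        old_balance    numeric(12,2),
        new_balance    numeric(12,2)
    );
    create trigger log_overdraft
    after update of balance on account      -- event
    for each row
    when new.balance < 0                    -- condition
    begin                                   -- action
        insert into overdraft_log
        values (new.account_number, old.balance, new.balance);
    end;
    insert into account values ('A-101', 100);
""")

# Condition is false for this update, so the action does not run.
conn.execute("update account set balance = 20 where account_number = 'A-101'")
# Condition is true for this update, so one log tuple is written.
conn.execute("update account set balance = -50 where account_number = 'A-101'")

log = conn.execute("select * from overdraft_log").fetchall()
```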
An action can even execute a series of data-definition commands (e.g., create new tables,
Even if we decided to live with the redundancy problem, there is still another problem
with the instdept schema. Suppose we are creating a new department in the university. In the
18 Database Management Systems (DBMS)
alternative design above, we cannot represent directly the information concerning a
department (deptname, building, budget) unless that department has at least one instructor at
the university. This is because tuples in the instdept table require values for ID, name, and salary.
This means that we cannot record information about the newly created department until the first
instructor is hired for the new department.
In the old design, the schema department can handle this, but under the revised design,
we would have to create a tuple with null values for ID, name, and salary. Null
values can be troublesome to handle.
natural join
ID      name   street   city         salary
...     ...    ...      ...          ...
57766   Kim    Main     Perryridge   75000
98776   Kim    North    Hampton      67000
...     ...    ...      ...          ...
Figure 5: Loss of information via a bad decomposition
Figure 5 shows these tuples, the resulting tuples using the schemas resulting from the
decomposition, and the result if we attempted to regenerate the original tuples using a natural
join. As we see in the figure, the two original tuples appear in the result along with two new
tuples that incorrectly mix data values pertaining to the two employees named Kim. Although we
have more tuples, we actually have less information in the following sense. We can indicate that a
certain street, city, and salary pertain to someone named Kim, but we are unable to distinguish
which of the Kims. Thus, our decomposition is unable to represent certain important facts about
the university employees. Clearly, we would like to avoid such decompositions. We shall
refer to such decompositions as being lossy decompositions and to those that are not as
lossless decompositions.
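The loss of information can be reproduced with the two Kim tuples. The sketch below assumes the decomposition is into employee1(ID, name) and employee2(name, street, city, salary), which matches the description above of facts that "pertain to someone named Kim"; the variable names are illustrative.

```python
# The two original tuples: (ID, name, street, city, salary).
employee = {
    ("57766", "Kim", "Main", "Perryridge", 75000),
    ("98776", "Kim", "North", "Hampton", 67000),
}

# Project onto the two (assumed) schemas of the decomposition.
employee1 = {(i, n) for (i, n, s, c, sal) in employee}
employee2 = {(n, s, c, sal) for (i, n, s, c, sal) in employee}

# Natural join on the only common attribute, name.
rejoined = {
    (i, n, s, c, sal)
    for (i, n) in employee1
    for (n2, s, c, sal) in employee2
    if n == n2
}

# Tuples that were not in the original relation.
spurious = rejoined - employee
```

The join returns four tuples: the two originals plus two that pair each ID with the other Kim's address and salary.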
8. Functional-Dependency Theory
It is useful to be able to reason systematically about functional dependencies as part of a
process of testing schemas for BCNF or 3NF.
b) Multivalued Dependency
Functional dependencies rule out certain tuples from being in a relation. If A → B, then
we cannot have two tuples with the same A value but different B values. Multivalued
dependencies do not rule out the existence of certain tuples. Instead, they require that other tuples
of a certain form be present in the relation.
Thus, FDs sometimes are referred to as equality-generating dependencies, and
multivalued dependencies are referred to as tuple-generating dependencies. Let R be a relation
schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[α] = t2[α], there exist tuples t3 and t4 in r such that
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β] and t3[R − β] = t2[R − β]
t4[β] = t2[β] and t4[R − β] = t1[R − β]
Example: Considering the customer relation, we see that we want the multivalued dependency
customer-name →→ customer-street, customer-city
to hold. Multivalued dependencies can be used in two ways:
1. To test relations to determine whether they are legal under a given set of functional and
multivalued dependencies.
2. To specify constraints on the set of legal relations; we shall thus concern ourselves with
only those relations that satisfy a given set of functional and multivalued dependencies.
If a relation r fails to satisfy a given multivalued dependency, we can construct a relation
r’ that does satisfy the multivalued dependency by adding tuples to r.
Let D denote a set of functional and multivalued dependencies. The closure D+ of D is
the set of all functional and multivalued dependencies logically implied by D.
We can compute D+ from D, using the definitions of functional dependencies and
multivalued dependencies and derive the following rule:
If α → β, then α →→ β.
That is, every functional dependency is also a multivalued dependency.
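Both uses of multivalued dependencies mentioned above (testing a relation, and adding tuples to r to obtain an r' that satisfies the dependency) can be sketched on a relation instance. This is a minimal illustration under stated assumptions: tuples are Python dictionaries, the functions satisfies_mvd and complete_mvd are hypothetical names, and the loan_number attribute and sample values are invented for the example.

```python
from itertools import product

def _swap(t1, t2, beta):
    """Tuple that agrees with t1 on beta and with t2 everywhere else."""
    return {**t2, **{b: t1[b] for b in beta}}

def satisfies_mvd(rows, alpha, beta):
    """Check whether alpha ->> beta holds on this relation instance."""
    present = {frozenset(r.items()) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in alpha):
            if frozenset(_swap(t1, t2, beta).items()) not in present:
                return False
    return True

def complete_mvd(rows, alpha, beta):
    """Add the missing tuples until alpha ->> beta holds."""
    result = [dict(r) for r in rows]
    present = {frozenset(r.items()) for r in result}
    changed = True
    while changed:
        changed = False
        for t1, t2 in product(list(result), repeat=2):
            if all(t1[a] == t2[a] for a in alpha):
                t3 = _swap(t1, t2, beta)
                key = frozenset(t3.items())
                if key not in present:
                    present.add(key)
                    result.append(t3)
                    changed = True
    return result

r = [
    {"customer_name": "Kim", "customer_street": "Main",
     "customer_city": "Perryridge", "loan_number": "L-1"},
    {"customer_name": "Kim", "customer_street": "North",
     "customer_city": "Hampton", "loan_number": "L-2"},
]
alpha = ["customer_name"]
beta = ["customer_street", "customer_city"]

holds_before = satisfies_mvd(r, alpha, beta)
r_prime = complete_mvd(r, alpha, beta)
holds_after = satisfies_mvd(r_prime, alpha, beta)
```

In this instance the dependency fails at first because each address appears with only one of the two loans; completing the relation adds the two missing combinations.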
c) Armstrong's Axioms
Axioms, or rules of inference, provide a technique for reasoning about functional
dependencies. In the rules that follow, we use Greek letters (α, β, γ, ...) for sets of attributes, and
uppercase Roman letters from the beginning of the alphabet for individual attributes. We use αβ
to denote α ∪ β.
Proof:
Given A → B, C, then
B, C → B and B, C → C (by reflexivity)
A → B and A → C (by transitivity)
Fc = F
repeat
    Use the union rule to replace any dependencies in Fc of the form
        α1 → β1 and α1 → β2 with α1 → β1 β2.
    Find a functional dependency α → β in Fc with an extraneous attribute either in α or
        in β.
    If an extraneous attribute is found, delete it from α → β.
until Fc does not change.
The canonical cover of F, Fc, can be shown to have the same closure as F; hence, testing
whether Fc is satisfied is equivalent to testing whether F is satisfied. However, Fc is minimal in a
certain sense: it does not contain extraneous attributes, and it combines functional
dependencies with the same left side. It is cheaper to test Fc than it is to test F itself.
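The canonical-cover procedure can be sketched in Python as follows. This is one possible rendering under stated assumptions: attributes are single characters, a dependency is a pair of attribute sets, and extraneous attributes are detected with the standard attribute-closure test. The function names and the sample set F are illustrative, and a different order of removals can yield a different, equally valid canonical cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_one_extraneous(fc):
    """Delete one extraneous attribute from fc in place, if any exists."""
    for i, (lhs, rhs) in enumerate(fc):
        # a in lhs is extraneous if (lhs - a) already determines rhs.
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                fc[i] = (lhs - {a}, rhs)
                return True
        # a in rhs is extraneous if lhs still determines a without it.
        for a in sorted(rhs):
            trial = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in closure(lhs, trial):
                fc[i] = (lhs, rhs - {a})
                return True
    return False

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        before = set(fc)
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = list(merged.items())
        remove_one_extraneous(fc)
        # Drop dependencies whose right side became empty.
        fc = [(l, r) for l, r in fc if r]
        if set(fc) == before:
            return set(fc)

# Illustrative input: F = {A -> BC, B -> C, A -> B, AB -> C}.
fc_example = canonical_cover(
    [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
)
```

For this input the result is the pair of dependencies A → B and B → C.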
e) Lossless Decomposition
Let r(R) be a relation schema, and let F be a set of functional dependencies on r(R). Let
R1 and R2 form a decomposition of R. We say that the decomposition is a lossless decomposition
if there is no loss of information by replacing r(R) with two relation schemas r1(R1) and r2(R2).
In short, we say the decomposition is lossless if, for all legal database instances (that is,
database instances that satisfy the specified functional dependencies and other constraints),
relation r contains the same set of tuples as the result of the following SQL query:
select *
from (select R1 from r)
natural join
(select R2 from r)
This is stated in the relational algebra as:
ΠR1(r) ⋈ ΠR2(r) = r
In other words, if we project r onto R1 and R2 and compute the natural join of the projection
results, we get back exactly r. A decomposition that is not a lossless decomposition is called a
lossy decomposition. The terms lossless-join decomposition and lossy-join decomposition are
sometimes used in place of lossless decomposition and lossy decomposition.
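The SQL query above can be run directly to compare a relation with the natural join of its projections. The sketch below assumes Python's built-in sqlite3 module; the relation r(ID, name, dept_name), its two tuples, and the helper name is_lossless_on_instance are hypothetical. Note that a check on a single instance can expose a lossy decomposition, but passing it does not prove that the decomposition is lossless for all legal instances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table r (ID varchar(5), name varchar(20), dept_name varchar(20));
    insert into r values ('57766', 'Kim', 'Physics');
    insert into r values ('98776', 'Kim', 'Music');
""")

def is_lossless_on_instance(conn, r1_cols, r2_cols):
    """Compare r with the natural join of its projections on R1 and R2."""
    query = (
        "select ID, name, dept_name from "
        f"(select distinct {r1_cols} from r) "
        "natural join "
        f"(select distinct {r2_cols} from r)"
    )
    joined = set(conn.execute(query).fetchall())
    original = set(
        conn.execute("select ID, name, dept_name from r").fetchall()
    )
    return joined == original

# The common attribute ID identifies a tuple, so r is recovered exactly.
split_on_id = is_lossless_on_instance(conn, "ID, name", "ID, dept_name")
# The common attribute name is shared by both tuples, so extra tuples appear.
split_on_name = is_lossless_on_instance(conn, "ID, name", "name, dept_name")
```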
As an example of a lossy decomposition, consider the decomposition of the employee
schema into:
employee1(ID, name)
f) Dependency Preservation
Using the theory of functional dependencies, it is easier to characterize dependency
preservation than using the ad-hoc approach. Let F be a set of functional dependencies on a
schema R, and let R1, R2,...,Rn be a decomposition of R. The restriction of F to Ri is the set Fi of all
functional dependencies in F+ that include only attributes of Ri. Since all functional dependencies
in a restriction involve attributes of only one relation schema, it is possible to test such a
dependency for satisfaction by checking only one relation. Note that the definition of restriction
uses all dependencies in F+, not just those in F. For instance, suppose F={A→B, B→C}, and we
have a decomposition into AC and AB. The restriction of F to AC includes A→C, since A→C is
in F+, even though it is not in F.
The set of restrictions F1, F2,...,Fn is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the restrictions is sufficient. Let F' = F1 ∪ F2
∪ ··· ∪ Fn. F ' is a set of functional dependencies on schema R, but, in general, F ' ≠ F. However,
even if F ' ≠ F, it may be that F '+ = F+. If the latter is true, then every dependency in F is
logically implied by F ', and, if we verify that F ' is satisfied, we have verified that F is satisfied.
We say that a decomposition having the property F '+ = F+ is a dependency-preserving
decomposition.
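Dependency preservation can be tested without computing F+ in full. The sketch below uses the well-known attribute-closure-based test, which is a swapped-in technique rather than a transcription of any figure in this text: for each α → β in F, it repeatedly adds to a result set the attributes that can be derived within a single Ri, and then checks whether β is covered. The function names are illustrative, and attributes are assumed to be single characters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(schemas, fds):
    """True if every dependency in fds is implied by the restrictions."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                ri = set(ri)
                # Attributes derivable from result using only attributes of Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True

F = [("A", "B"), ("B", "C")]
ac_ab = preserves_dependencies(["AC", "AB"], F)
ab_bc = preserves_dependencies(["AB", "BC"], F)
```

With F = {A→B, B→C}, the decomposition into AC and AB does not preserve B→C, since B and C never appear together in one schema and no restriction implies it, whereas the decomposition into AB and BC preserves both dependencies.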
Figure shows an algorithm for testing dependency preservation. The input is a set D={
R1, R2,...,Rn}of decomposed relation schemas, and a set F of functional dependencies. This
b) Normalization Forms
1. First normal form (1NF): A relation is said to be in the first normal form if it does not
contain any repeating columns or repeating groups of columns.
2. Second normal form (2NF): A relation is said to be in the second normal form if it is
already in the first normal form and it has no partial dependency.
3. Third normal form (3NF): A relation is said to be in the third normal form if it is already
in second normal form and it has no transitive dependency.
4. Boyce-Codd normal form (BCNF): A relation is said to be in Boyce-Codd normal form if
it is already in the third normal form and every determinant is a candidate key. It is a
stronger version of 3NF.
5. Fourth normal form (4NF): A relation is said to be in the fourth normal form if it is
already in BCNF and it has no nontrivial multivalued dependency whose left side is not a superkey.
6. Fifth normal form (5NF): A relation is said to be in 5NF if it is in the fourth normal form
and every join dependency in the table is implied by the candidate keys.
The different terminologies used in various normal forms are:
i) Partial Dependency
If a relation has more than one key field (that is, a composite key), a subset of non-key fields may depend on all
the key fields, while another subset or a particular non-key field may depend on only one of the key
fields (i.e., it may not depend on all the key fields). Such a dependency is called a partial dependency.
v) Join dependency
A relation that has a join dependency cannot be decomposed by projection into other
relations without difficulty or undesirable results.
Join
S P L
S1 P1 L2
S1 P2 L1
S2 P1 L1
S1 P1 L1
Figure 10: SPL table after joining SP, PL and LS tables
With this process we obtain the original SPL table, and the bogus row disappears.
This shows that, in order to bring such a table into 5NF, we need to decompose the original table into at
least three tables.
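The three-way join can be replayed on the SPL rows of Figure 10. The sketch below assumes that SP, PL, and LS are the projections of SPL onto (S, P), (P, L), and (L, S); the variable names are illustrative.

```python
# Rows of the SPL table from Figure 10: (S, P, L).
spl = {
    ("S1", "P1", "L2"),
    ("S1", "P2", "L1"),
    ("S2", "P1", "L1"),
    ("S1", "P1", "L1"),
}

# The three projections.
sp = {(s, p) for (s, p, l) in spl}
pl = {(p, l) for (s, p, l) in spl}
ls = {(l, s) for (s, p, l) in spl}

# Joining only SP and PL (on P) produces a row that was never in SPL.
sp_pl = {(s, p, l) for (s, p) in sp for (p2, l) in pl if p == p2}
bogus = sp_pl - spl

# Joining the result with LS (on L and S) removes the bogus row.
rejoined = {(s, p, l) for (s, p, l) in sp_pl if (l, s) in ls}
```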
Solution to Question Asked in University Examinations
Question 4: Use Armstrong's axioms to prove the soundness of the decomposition rule
Solution:
To prove that:
if A → BC then A → B and A → C
Proof:
A → BC    given
BC → B    reflexivity rule
A → B     transitivity rule
BC → C    reflexivity rule
A → C     transitivity rule
Hence proved.
Question 8: Write an assertion for the bank database to ensure that the assets value for the
Perryridge branch is equal to the sum of all the amounts lent by the Perryridge branch.
Solution:
The assertion-name is arbitrary. We have chosen the name perry. Note that since the
assertion applies only to the Perryridge branch we must restrict attention to only the Perryridge
tuple of the branch relation rather than writing a constraint on the entire relation.
create assertion perry check
(not exists (select *
from branch
where branch-name = 'Perryridge' and
assets <> (select sum (amount)
from loan
where branch-name = 'Perryridge')))
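SQLite does not support create assertion, so the condition can only be emulated there by running the check as a query. The sketch below does this with Python's built-in sqlite3 module. The column names use underscores because hyphens are not valid in unquoted SQL identifiers, and the sample tuples, the amounts, and the function name perry_holds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (
        branch_name varchar(20) primary key,
        assets      numeric(14,2)
    );
    create table loan (
        loan_number varchar(10) primary key,
        branch_name varchar(20),
        amount      numeric(12,2)
    );
    insert into branch values ('Perryridge', 3000);
    insert into loan values ('L-15', 'Perryridge', 1500);
    insert into loan values ('L-16', 'Perryridge', 1300);
""")

def perry_holds(conn):
    """Evaluate the predicate of the perry assertion on the current data."""
    row = conn.execute("""
        select not exists (
            select *
            from branch
            where branch_name = 'Perryridge'
              and assets <> (select sum(amount)
                             from loan
                             where branch_name = 'Perryridge'))
    """).fetchone()
    return bool(row[0])

# Loans total 2800 while assets are 3000, so the condition is violated.
holds_before = perry_holds(conn)
# After one more loan of 200 the totals agree.
conn.execute("insert into loan values ('L-17', 'Perryridge', 200)")
holds_after = perry_holds(conn)
```

Unlike a real assertion, this query does not stop a violating update; it only reports whether the condition currently holds.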