UNIT II | Functions of a Database Management System
Learning Objectives:
At the end of this module, you will be able to:
Specify how functions might be implemented for a particular sample DBMS. (LO2 → CO2)
In this module, we discuss building these basic database functions at the physical level, and how other functions, like cataloguing and indexing, can be added to an existing DBMS.
For example, a programmer told to store a database entity representing a bank customer with attributes name, phone no., and address might use a record with fields name, phone number, and address, each of the appropriate data type.
A list of such customers might be stored in an array of records, so that the database would look something like this.
Here, records are stored one after the other in memory, and new records are added to the
end of the array. One disadvantage of this approach is the fixed size of an array, which
imposes a limit on the number of records one can store in the database. This can be solved
by using a linked list, instead of an array, with the resulting increase in program complexity.
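As a rough illustration only (the field and function names below are assumptions, not part of the example above), the following Python sketch models one customer record and an in-memory collection of such records; a Python list grows on demand, so it behaves more like the linked-list variant than a fixed-size array.

from dataclasses import dataclass

@dataclass
class Customer:                  # one "record" with three typed fields
    name: str
    phone_number: str
    address: str

customers = []                   # the in-memory "database" of records

def add_customer(db, name, phone_number, address):
    # New records are simply appended to the end, as in the array example.
    db.append(Customer(name, phone_number, address))

add_customer(customers, "Juan dela Cruz", "555-1234", "Quezon City")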
Unfortunately, we cannot rely on mere records stored in memory for any database meant for actual use, due to the following limitations.
Current memory technology is volatile. This means that once you turn off the power to the computer, all the data in memory is lost.
Memory capacity is insufficient for most applications. Databases comprising more than a few thousand records are already too large to fit in most computer systems' memory spaces.
For this reason, we have to store records on some non-volatile media, like magnetic disk or
tape. To prevent loss of large amounts of data, typically only one or two records are kept in
memory while editing or adding new records, and the rest are kept on disk. Updated records
are written to disk as soon as possible.
Updating the on-disk database typically consists of three functions: adding new records, modifying existing records, and deleting or purging records. We'll take a look at each in turn.
Adding Records
A database programmed in a non-DBMS-specific programming language would require a function that:
Accepts data to be stored in the new record from the user. This is typically accomplished by providing one or more "template" screens containing the fields of the record, and having the user fill these fields in.
The program should check for correct input, i.e., numerical fields should contain numbers, date fields should make sense, etc. One way of checking for correct input is to have the program accept all keyboard input as strings, and convert from string to the appropriate data type. An error in the conversion routine means that the user has typed an invalid value (a sketch of this check appears after this list).
Checks for dependencies in other, related databases. Some fields may be shared between different relational tables, for instance. The function must find these other tables and modify them appropriately.
Updates all tables affected by the new record, typically by saving them to disk.
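A minimal sketch of the string-conversion check described above, in Python; the field kinds, prompts, and date format here are assumptions made for illustration.

from datetime import datetime

def parse_field(text, kind):
    # Convert keyboard input (always a string) to the field's data type.
    # A ValueError here means the user typed an invalid value.
    if kind == "number":
        return float(text)
    if kind == "date":
        return datetime.strptime(text, "%m/%d/%Y")
    return text                            # plain character field

def read_field(prompt, kind):
    while True:
        try:
            return parse_field(input(prompt), kind)
        except ValueError:
            print("Invalid value, please re-enter.")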
Modifying Records
Modifying an existing record first requires that the user identify the particular record to be
updated. The user typically provides a primary key to search on. The record is loaded, and a
template screen similar to the one used for adding records is shown, allowing the user to
change a particular field. An example of this is the dBase EDIT command.
An alternate method can be provided by showing a list of records to the user, and allowing
him/her to pick the record to be edited as in the dBase BROWSE command. A table of
records is shown, and the user can choose one record by selecting it with a cursor.
As with adding new records, care must be taken that relational or functional dependencies
are also updated by the modify routine.
Deleting Records
Here, we have to identify the particular record or records to delete and search for them, either by using a primary key or by allowing the user to pick records off a list. The records to be deleted are typically only "marked" as deleted until a "purge" command (for example, the dBase PACK command) is issued. This allows a user to "undelete" a record if it was marked for deletion by mistake.
The deleted records may also have functional or relational dependencies in other tables.
Two methods can be used in such an eventuality:
The affected records may also be deleted, if they are dependent on the original
record (weak entities). For example, deleting the record for a bank customer may
also delete the records for the accounts that customer holds, since an account must
have an owner.
The affected records may have "null" or "nonexistent" values entered in the affected fields. Using the same example, the account might have "nonexistent" entered as its owner, in which case there must also be a function that finds all such "floating" accounts and either assigns them a new owner or deletes them later.
The method used for a particular database depends on the nature of the relationship or
functional dependency.
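The two strategies can be sketched as follows (a Python illustration; the customer and account structures, and the use of the owner's name as the shared field, are assumptions):

def delete_customer(customers, accounts, key, strategy="cascade"):
    # Remove the customer record itself.
    customers[:] = [c for c in customers if c["name"] != key]
    if strategy == "cascade":
        # Weak entities: delete the dependent account records as well.
        accounts[:] = [a for a in accounts if a["owner"] != key]
    else:
        # Alternative: mark the owner as nonexistent; a separate routine
        # must later reassign or delete these "floating" accounts.
        for a in accounts:
            if a["owner"] == key:
                a["owner"] = None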
For small databases, a linear search (check records from the start of the file until one is
found) is usually simple enough to implement and does not take too long to execute. The file
may also be sorted or indexed to facilitate searching.
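Before turning to sorting and indexing, here is a minimal sketch of such a linear search, assuming one record per line in a text file with fields separated by "|" (the file layout is an assumption for illustration):

def linear_search(filename, key):
    # Check records from the start of the file until one is found.
    with open(filename) as f:
        for position, line in enumerate(f):
            name, phone, address = line.rstrip("\n").split("|")
            if name == key:
                return position, (name, phone, address)
    return None                      # not found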
In the top diagram, the index file contains the record locations of the records in the data file. The data file is unsorted. In the second diagram, the index file has been sorted in alphabetical order of the last name field, but the data file is still unsorted. We can therefore produce a printout of the records of the database in sorted order simply by following the order in the index file, without having to rearrange the actual records in the file.
In actual use, the index file would typically also contain the primary key field, which can also
be used for quicker searching (since the entire index file is in memory, and does not have to
be loaded record by record from disk).
We can also maintain multiple indexes, one for each key. For example, in one index, we can
sort the database by last name, in another, by address or by account number. We can
therefore access the database as if it were sorted in several different ways at once.
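A sketch of this idea in Python, under the same assumed file layout as the earlier search example: each index is just a sorted list of (key value, record position) pairs, and the data file itself is never rearranged. The filename and field positions are hypothetical.

def build_index(filename, field_number):
    index = []
    with open(filename) as f:
        for position, line in enumerate(f):
            fields = line.rstrip("\n").split("|")
            index.append((fields[field_number], position))
    index.sort()                     # sorted by the chosen key
    return index

# Several indexes over the same unsorted data file:
by_last_name = build_index("customers.db", 0)
by_address   = build_index("customers.db", 2)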
Learning Objectives:
At the end of this module, you will be able to:
Multi-user operating systems, like UNIX and Windows NT, have had to develop features to ensure data coherency on their file systems. This is usually accomplished by file locking, where a file on disk can be "locked" or "reserved" for writing by an application. During the period of time that the file is "locked", no other application is allowed to write data to the file, ensuring that two users aren't overwriting each other's updates at the same time.
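On a Unix-like system, this kind of whole-file lock can be sketched with Python's fcntl module (an illustration of operating-system file locking, not of any particular DBMS; the function names are assumptions):

import fcntl

def update_with_file_lock(path, apply_update):
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)    # other writers block here until we finish
        try:
            apply_update(f)              # perform the write while the file is "reserved"
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)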
Multi-user database management systems, on the other hand, often require that multiple users be allowed access to the same files at the same time, in order to read, edit, and modify
records. A simple solution would be to leave data coherency up to the operating system, and
merely disallow access if a file is locked. A user (let's call him User A) trying to edit records
at the same time as User B would find that he has to wait for User B to finish editing, before
he is allowed to open the file.
This is a suboptimal solution (but one which is in use in many smaller database systems), as
a user could conceivably have to wait a long time before getting access to a particular
record, one which may not have been modified at all by the other user. Also, this is not
workable for databases with more than a few simultaneous users, as our hapless User A
may have to wait a long time before all the other users are finished with that particular file.
Transaction Processing
The concept of transaction processing involves separating individual database operations
into transactions, where each transaction consists of a single update to the database
system. For example, the operation:
can be considered one transaction. Each transaction can consist of several database functions (reading or writing data into the database, or modifying data previously read from the database). Transaction T0 may be defined as consisting of the following functions:
Now assume we have a different transaction T1 (possibly being entered at the same time by some other clerk) such that:
The end result would be the same if T1 was executed before T0.
Unfortunately, the same is not true in the rare case where the transactions are run almost
simultaneously. Of course, in a single-processor system, the computer can only execute one
command at a time, but even if we assume that the computer can only do one database
function at any given time, we might end up with something like this:
(For this example, let us assume A contains P5000 and B contains P4000 at the start.)
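The interleaving table itself is not reproduced here, but the scenario it depicts can be reconstructed from the discussion that follows. The sketch below assumes T0 transfers P50 from A to B while T1 transfers P100 from B to A, with a Python dictionary standing in for the on-disk values.

disk = {"A": 5000, "B": 4000}      # on-disk balances at the start

balA_t0 = disk["A"]                # T0: read(A)  -> 5000
balA_t0 -= 50                      # T0 deducts P50 in memory
balA_t1 = disk["A"]                # T1: read(A)  -> 5000 (T0 has not written yet)
balA_t1 += 100                     # T1 adds P100 in memory
disk["A"] = balA_t1                # T1: write(A) -> 5100
disk["A"] = balA_t0                # T0: write(A) -> 4950 (*) T1's update is lost
disk["B"] = disk["B"] + 50         # T0: B gains P50  -> 4050
disk["B"] = disk["B"] - 100        # T1: B loses P100 -> 3950

print(disk)   # {'A': 4950, 'B': 3950} instead of the correct {'A': 5050, 'B': 3950}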
Here, in T0, P50 is deducted from the memory variable balA. But before the updated quantity can be written back to disk, T1 has already updated account A (to 5100). Unfortunately, since T0 does not know that T1 has changed the disk data, it overwrites the disk data with its own result (marked by the *), losing whatever change T1 made to account A. Instead of A = 5050 and B = 3950 (as would have been the case if T0 and T1 were not run simultaneously), we have the erroneous value A = 4950. P100 is now missing from account A.
You can imagine how many errors would now appear in a system that handles dozens or
even hundreds of simultaneous transactions.
Critical Sections
The gray area between "read(A)" and "write(A)" in the table is what we call a critical section.
This is a portion of a transaction which we should not interrupt - in the table, if we had
allowed T0 to write the new value to disk before allowing T1 to continue, we would have gotten the correct answer.
Therefore, we need to identify and preserve critical sections in our code. We need to make
sure that inside this critical section, only that transaction is allowed to manipulate that
particular data item. We can use file/record locking in order to implement critical sections
properly.
For our example, we can define a function lock(A) such that when the function is called, the file containing A is locked from access by any transaction other than the one that called the lock() function. If a transaction attempts to lock() some file already locked by another transaction, it will have to wait for that transaction to unlock() the file. The unlock(A) function, on the other hand, frees up the file for use by other transactions, and allows any waiting lock() by other transactions to proceed.
Our new transactions, using the lock() and unlock() functions, should now look like:
The asterisks (*) mark the point where T1 attempts to get a lock on A, but it is already locked by T0. Therefore, T1 has to pause until T0 unlocks A, at the second asterisk. This prevents T1 from changing the value of A within T0's critical section (the gray area).
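A minimal sketch of this locking discipline, using one Python threading.Lock per data item (a simplification: a real DBMS locks files or records on disk, but the blocking behaviour is the same). With the locks in place, the read-modify-write of each account is a protected critical section, so the final balances are correct regardless of how the two transactions interleave.

import threading

disk  = {"A": 5000, "B": 4000}
locks = {"A": threading.Lock(), "B": threading.Lock()}

def lock(item):                      # waits if another transaction holds the lock
    locks[item].acquire()

def unlock(item):                    # lets any waiting lock() proceed
    locks[item].release()

def transfer(src, dst, amount):      # one transaction
    lock(src)
    balance = disk[src]              # read(src)   -- critical section begins
    disk[src] = balance - amount     # write(src)  -- no other transaction can touch src
    unlock(src)
    lock(dst)
    balance = disk[dst]
    disk[dst] = balance + amount
    unlock(dst)

t0 = threading.Thread(target=transfer, args=("A", "B", 50))
t1 = threading.Thread(target=transfer, args=("B", "A", 100))
t0.start(); t1.start(); t0.join(); t1.join()
print(disk)                          # always {'A': 5050, 'B': 3950}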
Summary
Multi-user multiprocessing database systems are systems that allow more than one user to
access the database at any given time. While this makes a database system more useful, it
also introduces problems with regard to ensuring data integrity and coherence.
Simultaneous updates, in particular, have a chance of corrupting the data if executed in the
incorrect order.
We can identify critical sections in the code, portions where another transaction must not be
allowed to update a particular on-disk record or data item. We can use locking in order to
ensure that no other simultaneous transaction can edit a particular variable, where the other
transaction is paused or stopped if it attempts such an action (by trying to initiate its own
lock).
Learning Objectives:
At the end of this module, you will be able to:
Analyze database integrity and how to manage it using a hypothetical language. (LO1 → CO2)
Bound up with a multi-user (concurrent) database are the problems of recovery and the
notion of transaction processing. This module will explain to you what a transaction is and
what the term transaction processing (or transaction management) means, and introduce the
functions COMMIT and ROLLBACK. Then the problems of recovery and concurrency that
the transaction concept is intended to solve will be discussed. The examples and
discussions will be based specifically on an SQL system. However, the ideas that will be
presented are very general and will apply to numerous other systems, relational or
otherwise, with comparatively little change.
Database Integrity
As mentioned in the introduction, it is a good idea to specify database integrity constraints in a more declarative fashion and let the system do the checking, instead of the user writing long procedural code. Almost all database designers would agree that the specification of integrity constraints could account for as much as 90% of a typical database definition. In this scenario, a system that supports these specifications would relieve application programmers of the burden of writing code for integrity checking. At the same time, it would enable those programmers to become significantly more productive, as they can channel their efforts to other programming and data management tasks.
Consider, for example, the constraint that for all STUDENTS, the GRADE must be positive. If the user attempts to execute an operation that would violate the constraint, the system must then either reject the operation or possibly (in more complicated situations) perform some compensating action on some other part of the database to ensure that the overall result is still a correct state. Thus, any language for specifying integrity constraints should include not only the ability to specify arbitrary conditions, but also facilities for specifying such compensating actions when appropriate.
Example 1. Attribute values (in this example, the attribute AGE of table STUDENT) must be positive:
The example above illustrates that integrity rules must include all four components (name, checking time, constraint, and violation response). However, some obvious simplifications to these rules will be applied in the subsequent examples. These simplifications are:
The checking time(s) will usually be obvious. Thus, it is better to assume that the
system is capable of determining the checking time(s) for itself if there is no explicit
specification of any such time(s).
The constraint in the CHECK clause will almost always begin with a universal
quantifier. Thus, a variable without a quantifier is automatically quantified by the
FORALL quantifier.
Assume also that if the ELSE clause is omitted, the default violation response, which is to REJECT the update operation (with a suitable return code), is implied.
Applying the above simplifications, the example above can be reduced to just
When the CREATE INTEGRITY RULE statement is executed, the system first checks to see
whether the current state of the database satisfies the specified constraint. If it does not, the
new rule is rejected; otherwise it is accepted and enforced from that time on. Enforcement in
the example requires the system to monitor all operations that would insert a value into, or
update a value in, column AGE of table STUDENT.
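What this enforcement amounts to can be sketched in ordinary code (Python here, purely as an illustration; the table is modelled as a list of dictionaries, and the function names are assumptions):

class IntegrityViolation(Exception):
    pass

def check_r1(row):
    # Rule R1: for every STUDENT row, AGE must be positive.
    if not row["AGE"] > 0:
        raise IntegrityViolation("R1 violated: AGE must be positive")

students = []                         # the STUDENT table

def insert_student(row):
    check_r1(row)                     # checked before the insert is accepted
    students.append(row)

def update_student_age(student_id, new_age):
    check_r1({"ID": student_id, "AGE": new_age})   # REJECT (raise) if violated
    for row in students:
        if row["ID"] == student_id:
            row["AGE"] = new_age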
Example 2. A more complex constraint. In this example, assume that the table STUDENT
includes an additional set of attributes MONTH, DAY, and YEAR, each of type character of
widths 2, 2 and 4, respectively, representing a date (also assuming that the system does not
support a data type for dates):
Example 2 assumes the existence of two built-in functions: IS_INTEGER, which tests a character string to see if it represents a legal decimal integer value; and NUM, which converts a character string that represents a decimal value to internal numeric form.
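Counterparts of these assumed built-ins, and the date check they make possible, might look like this in Python (a simplified sketch; the leap-year rule, for instance, is ignored):

def is_integer(s):                    # counterpart of the assumed IS_INTEGER
    return s.strip().isdigit()

def num(s):                           # counterpart of the assumed NUM
    return int(s)

def valid_date(month, day, year):
    # MONTH, DAY and YEAR arrive as character strings of widths 2, 2 and 4.
    if not (is_integer(month) and is_integer(day) and is_integer(year)):
        return False
    m, d, y = num(month), num(day), num(year)
    days_in_month = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return 1 <= m <= 12 and 1 <= d <= days_in_month[m - 1] and y > 0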
Rule R3 applies to the transition between two database states rather than to a single database state. Note
that this rule requires an explicit checking time (BEFORE clause) to be specified to indicate
to the system when the checking is to be done. In this example, if a student is to be given a
new ID, the new ID must be greater than the old.
Rule R4A says that it is illegal to insert an ST record or change the ID value in an ST record
if no corresponding STUDENT record exists after the operation. Rule R4B says that it is
illegal to delete a STUDENT record or change the ID value of a STUDENT record if any
corresponding ST record currently exists.
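Rules R4A and R4B are, in effect, referential-integrity checks, which can be sketched as follows (Python, illustrative only; the tables are again modelled as lists of dictionaries):

students = [{"ID": "S1"}, {"ID": "S2"}]            # STUDENT table
st       = [{"ID": "S1", "EMP_ID": "T1"}]          # ST table

def insert_st(record):                              # enforces R4A
    if not any(s["ID"] == record["ID"] for s in students):
        raise ValueError("R4A: no corresponding STUDENT record")
    st.append(record)

def delete_student(student_id):                     # enforces R4B
    if any(r["ID"] == student_id for r in st):
        raise ValueError("R4B: ST records still refer to this STUDENT")
    students[:] = [s for s in students if s["ID"] != student_id]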
Crash Recovery
Crash recovery is synonymous with transaction recovery. Let us begin our discussion by introducing the fundamental notion of a transaction. A transaction is a logical unit of work. Consider the following example. Suppose that table TEACHER includes the fields EMP_ID and TOT_STUDENTS, while the table ST includes the fields ID and EMP_ID. Here TOT_STUDENTS represents the total number of students a certain teacher handles; in other words, the value of TOT_STUDENTS for any given teacher is equal to the count of all ST.ID values, taken over all ST records for that teacher. Now consider the following sequence of operations, the intent of which is to add a student S5 for teacher T1 to the database:
ON SQLERROR GO TO UNDO;
INSERT INTO ST (ID, EMP_ID)
    VALUES ('S5', 'T1');
UPDATE TEACHER
    SET TOT_STUDENTS = TOT_STUDENTS + 1
    WHERE EMP_ID = 'T1';
COMMIT;
GO TO FINISH;
UNDO:
    ROLLBACK;
FINISH: RETURN;
The INSERT adds the new student (with student ID equal to 'S5') to the ST table, and the UPDATE adjusts the TOT_STUDENTS field for teacher T1 appropriately.
The point of this example is that what is presumably intended to be a single atomic operation, "Add a new student", in fact involves two updates to the database. What is more, the database is not even consistent between those two updates; it temporarily violates the requirement that the value of TOT_STUDENTS for teacher T1 is supposed to be equal to the count of all ST.ID values for teacher T1. Thus, a logical unit of work (i.e., a transaction) is not necessarily just a single database operation; rather, it is in general a sequence of several such operations that transforms a consistent state of the database into another consistent state, without necessarily being consistent at all intermediate points.
Now it is clear that what must not be allowed to happen in the example is for one of the two updates to be executed and the other not (because that would leave the database in an inconsistent state). What is needed, ideally, is a guarantee that both updates will be executed. Unfortunately, it is impossible to provide any absolute guarantee of this: there is always a chance that things will go wrong. For example, a system crash might occur between the two updates, or an arithmetic overflow might occur on the second of them. A system that supports transaction processing, however, provides the next best thing to such a guarantee.
Specifically, it guarantees that if the transaction executes some updates and then a failure occurs (for whatever reason) before the transaction reaches its normal termination, then those updates will be undone. Thus, the transaction either executes in its entirety or is totally canceled (as if it had never executed at all). In this way, a sequence of operations that is fundamentally not atomic can be made to look as if it really were atomic from the outside.
The system component that provides this atomicity (or semblance of atomicity) is known as the transaction manager, and the COMMIT and ROLLBACK operations are the key to the way it works:
In the example, therefore, a COMMIT statement is issued if the two updates ran successfully, which will commit the changes to the database and make them permanent. If anything goes wrong, however, i.e., if either update statement raises the SQLERROR condition, then a ROLLBACK statement is issued instead, to undo any changes made so far.
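The same commit-or-rollback pattern can be written against any SQL interface; the sketch below uses Python's sqlite3 module purely as an illustration (the table definitions and filename are assumptions; only the two updates and the COMMIT/ROLLBACK logic mirror the example).

import sqlite3

conn = sqlite3.connect("school.db")
conn.execute("CREATE TABLE IF NOT EXISTS ST (ID TEXT, EMP_ID TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS TEACHER (EMP_ID TEXT, TOT_STUDENTS INTEGER)")

try:
    conn.execute("INSERT INTO ST (ID, EMP_ID) VALUES (?, ?)", ("S5", "T1"))
    conn.execute("UPDATE TEACHER SET TOT_STUDENTS = TOT_STUDENTS + 1 "
                 "WHERE EMP_ID = ?", ("T1",))
    conn.commit()          # both updates made permanent together
    print("Student added")
except sqlite3.Error:
    conn.rollback()        # undo any changes made so far
    print("Error - student not added")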
Note 1: This module has shown the COMMIT and ROLLBACK operations explicitly, for the
sake of the example. However, some systems will automatically issue a COMMIT for any
program that reaches normal termination, and will automatically issue a ROLLBACK for any
program that does not (regardless of the reason; in particular, if a program terminates
abnormally because of a system failure, a ROLLBACK will be issued on its behalf when the
system is restarted). In the example, therefore, the explicit COMMIT could have been
omitted, but not the explicit ROLLBACK.
Note 2: A realistic application program should not only update the database (or attempt to)
but should also send some kind of message back to the end user indicating what has
happened. In the example, the message "Student added" could have been sent if the
COMMIT is reached, or the message "Error - student not added" otherwise. Message-
handling, in turn, has additional implications for recovery.
At this time, you might be wondering how it is possible to undo an update. The answer is that
the system maintains a log or journal on disk, on which details of all update operations are
recorded. Thus, if it becomes necessary to undo some particular update, the system can use
the corresponding log entry to restore the updated object to its previous value.
One further point: in a relational system, data manipulation statements are set-level and typically operate on multiple records at a time. What happens, then, if something goes wrong in the middle of such a statement? For example, is it possible that a multiple-record UPDATE could update some of its target records and then fail before updating the rest? The answer is no, it is not. SQL statements are required to be individually atomic insofar as their effect on the database is concerned. If an error occurs in the middle of such a statement, then the database will remain totally unchanged.
Synchronization points
Executing either a COMMIT or a ROLLBACK operation establishes what is called a synchronization point. A synchronization point represents the boundary between two consecutive transactions; it thus corresponds to the end of a logical unit of work, and hence to a point at which the database is in a state of consistency. The only operations that establish a synchronization point are COMMIT, ROLLBACK, and program initiation. (Remember, however, that COMMIT and ROLLBACK may often be implicit.) When a synchronization point is established:
All updates made by the program since the previous synchronization point are
committed (COMMIT) or undone (ROLLBACK).
All open cursors are closed and all database positioning is lost (this is true in most systems, but not in all).
All record locks are released.
Note carefully that COMMIT and ROLLBACK terminate the transaction, not the program. In
general, a single program execution will consist of a sequence of several transactions,
running one after another, with each COMMIT or ROLLBACK operation terminating one
transaction and starting the next. However, it is true that very often one program execution
will correspond to just one transaction; and if it does, then it will frequently be possible to
code that program without any explicit COMMIT or ROLLBACK statements at all.
In conclusion, you must now be able to see that transactions are not only the unit of work but
also the unit of recovery. For if a transaction successfully COMMITs, then the system must
guarantee that its updates will be permanently established in the database, even if the
system crashes the very next moment. It is quite possible, for instance, that the system
might crash after the COMMIT has been honored but before the updates have been
physically written to the database (they could still be waiting in a main storage buffer and so
be lost at the time of the crash for example). Even if that happens, the system's restart
procedure will still install those updates in the database; it is able to discover the values to
be written by examining the relevant entries in the log. (It follows then that the log must be
physically written before COMMIT processing can complete. This important rule is known as
the Write-Ahead Log Protocol.) Thus, the restart procedure will recover any units of work
(transactions) that completed successfully but did not manage to get their updates physically
written prior to the crash; hence, as stated earlier, transactions are indeed the unit of
recovery.
Two broad kinds of failure need to be considered:
System failures (e.g., power failure), which affect all transactions currently in progress but do not physically damage the database. A system failure is also known as a soft crash.
Media failures (e.g., a head crash on the disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
System failure
The critical point about a system failure is that the contents of main storage are lost (strictly speaking, it is the contents of the database buffers that are lost). The precise state of any transaction that was in progress at the time of the failure is therefore no longer known. When the system restarts, such a transaction can never be completed and so must be undone (or rolled back). Furthermore, it is necessary at restart time to redo certain transactions that were successfully completed prior to the crash but did not manage to get their updates transferred from the database buffers to the physical database.
Now, how does the system know at restart time which transactions to undo and which to redo? The answer is as follows. At certain prescribed intervals, the system automatically takes a checkpoint. Taking a checkpoint involves (a) physically writing the contents of the database buffers to the physical database, and (b) physically writing a special checkpoint record to the physical log. The checkpoint record lists all transactions that were in progress at the time the checkpoint was taken. Suppose, for example, that:
The most recent checkpoint prior to the time of failure (time tfail) was taken at time tcheck.
Transaction T1 completed prior to time tcheck.
Transaction T2 started prior to time tcheck and completed after time tcheck but before time tfail.
Transaction T3 also started prior to time tcheck but did not complete by time tfail.
Transaction T4 started after time tcheck and completed before time tfail.
Finally, transaction T5 also started after time tcheck but did not complete by time tfail.
When the system is restarted, transactions T3 and T5 must be undone, and transactions of
types T2 and T4 must be redone. Note, however, that transactions of type T1 do not enter into
the restart process at all, because their updates were physically written to the database at
time tcheck as part of the checkpoint process.
At restart time, therefore, the system goes through the following procedure in order to identify all transactions of types T2 through T5 (a sketch of this procedure in code follows the numbered list):
1. Start with two lists of transactions, the UNDO list and the REDO list. Set the UNDO
list equal to the list of all transactions given in the checkpoint record; set the REDO
list to empty.
2. Search forward through the log, starting from the checkpoint record.
3. If a "begin transaction" log entry is found for transaction T, add T to the UNDO list.
4. If a "commit" log entry is found for transaction T, move T from the UNDO list to the
REDO list.
5. When the end of the log is reached, the UNDO list identifies transactions T3 and T5, while the REDO list identifies transactions T2 and T4.
6. The system now works backward through the log, undoing the transactions in the
UNDO list. Then it works forward again, redoing the transactions in the REDO list.
Finally, when all such recovery activity is complete, then (and only then) the system
is ready to accept new work.
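A sketch of steps 1 to 5 in Python, assuming the log is available as a simple list of entries (the entry format here is invented for illustration):

def restart_lists(log, checkpoint_index):
    # Step 1: the UNDO list starts as the transactions named in the checkpoint record.
    undo = set(log[checkpoint_index]["active"])
    redo = set()
    # Step 2: search forward through the log from the checkpoint record.
    for entry in log[checkpoint_index + 1:]:
        if entry["type"] == "begin":          # step 3
            undo.add(entry["txn"])
        elif entry["type"] == "commit":       # step 4
            undo.discard(entry["txn"])
            redo.add(entry["txn"])
    return undo, redo                         # step 5

# A log shaped like the T1..T5 scenario above:
log = [
    {"type": "checkpoint", "active": ["T2", "T3"]},
    {"type": "begin",  "txn": "T4"},
    {"type": "commit", "txn": "T2"},
    {"type": "commit", "txn": "T4"},
    {"type": "begin",  "txn": "T5"},
]
print(restart_lists(log, 0))   # UNDO = {T3, T5}, REDO = {T2, T4}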
Media failure
A media failure is a failure (such as a disk head crash or a disk controller failure) in which
some portion of the database has been physically destroyed. Recovery from such a
failure involves reloading (or restoring) the database from a backup copy (or dump), and
then using the log, both active and archive portions, to redo all transactions that
completed since that backup copy was taken. There is no need to undo transactions that
were still in progress at the time of the failure, since by definition all updates of such
transactions have been "undone" (destroyed) anyway.
Note: Recovery from a media failure implies the need for a dump/restore (or unload/reload) utility. The
dump portion of that utility is used to make backup copies of the database on demand
(backups can be kept on an archive disk or tape). The restore portion of the utility is then
used to recreate the database after a media failure from a specified backup copy.
Summary
In this module, database integrity and crash recovery were discussed. A hypothetical integrity-rule language was described, albeit not in detail, and examples of its use were presented for a better understanding of database integrity. Simplifications of the hypothetical language were also discussed and used in the set of examples presented.