dbmt notes online
Functional Dependency
A functional dependency is a constraint between attributes of a relation: the value of one attribute
(or set of attributes) determines the value of another. Functional dependencies help maintain the
quality of the data in the database.
A functional dependency is written with an arrow (->).
Example of Functional Dependency
Consider the following table.
Example
SSN->ENAME is read as "ENAME is functionally dependent on SSN" or "SSN determines ENAME".
PNUMBER->{PNAME,PLOCATION} (PNUMBER determines PNAME and PLOCATION)
{SSN,PNUMBER}->HOURS (SSN and PNUMBER combined determine HOURS)
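A functional dependency X -> Y holds in a table when no two rows agree on X but differ on Y. The
sketch below checks that condition on a small in-memory table. The helper name fd_holds and the
sample rows are made up for illustration; they are not the table these notes refer to.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows (assumed data, not taken from the notes).
works_on = [
    {"SSN": "111", "ENAME": "Asha", "PNUMBER": 1,
     "PNAME": "Payroll", "PLOCATION": "Pune", "HOURS": 10},
    {"SSN": "111", "ENAME": "Asha", "PNUMBER": 2,
     "PNAME": "Billing", "PLOCATION": "Delhi", "HOURS": 20},
    {"SSN": "222", "ENAME": "Ravi", "PNUMBER": 1,
     "PNAME": "Payroll", "PLOCATION": "Pune", "HOURS": 15},
]

ssn_determines_ename = fd_holds(works_on, ["SSN"], ["ENAME"])
ssn_determines_hours = fd_holds(works_on, ["SSN"], ["HOURS"])
```

On this data SSN -> ENAME holds, while SSN -> HOURS does not, because employee 111 has different
hours on different projects. A check like this can only refute a dependency on sample data; whether
the dependency holds in general is a design decision about the schema.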
Transitive Dependency
A transitive dependency involves at least three attributes: if X -> Y and Y -> Z hold, then X -> Z
holds as well. Removing transitive dependencies is part of normalizing a database to 3NF.
For example, if {Book} -> {Author} and {Author} -> {Age_of_Author}, then by transitivity
{Book} -> {Age_of_Author}. Therefore, if one knows the book, one also knows the age of the author.
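Transitivity can be checked mechanically by computing an attribute closure: keep adding every
attribute that is determined by what is already known. The sketch below is a minimal version of that
idea, taking {Book} -> {Author} and {Author} -> {Age_of_Author} as the assumed starting
dependencies; the function name closure is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Assumed dependencies for the book example.
fds = [({"Book"}, {"Author"}), ({"Author"}, {"Age_of_Author"})]
book_closure = closure({"Book"}, fds)
```

Because Age_of_Author ends up in the closure of {Book}, the dependency {Book} -> {Age_of_Author}
follows from the two given dependencies.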
Database Users:
Users are differentiated by the way they expect to interact with the system:
• Application programmers:
• Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces.
• Rapid application development (RAD) tools are tools that enable an application
programmer to construct forms and reports without writing a program.
• Sophisticated users:
• Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language.
• They submit each such query to a query processor, whose function is to break down
DML statements into instructions that the storage manager understands.
• Specialized users:
• Specialized users are sophisticated users who write specialized database applications
that do not fit into the traditional data-processing framework.
• Among these applications are computer-aided design systems, knowledge base and
expert systems, systems that store data with complex data types (for example,
graphics data and audio data), and environment-modeling systems.
• Naïve users:
• Naïve users are unsophisticated users who interact with the system by invoking one
of the application programs that have been written previously.
• For example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of
money to be transferred, the account from which the money is to be transferred, and
the account to which the money is to be transferred.
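A program like the teller's transfer could look roughly like the sketch below. It uses Python's
built-in sqlite3 module with an in-memory database; the account table, its columns, and the starting
balances are assumptions made for this example only.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move amount from source to target as one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (source,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

transfer(conn, "A", "B", 50)
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The teller only supplies the two accounts and the amount; the SQL stays hidden inside the program,
which is what makes this kind of user a naïve user.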
Database Administrator:
• Coordinates all the activities of the database system. The database administrator has a good
understanding of the enterprise’s information resources and needs.
• Database administrator's duties include:
• Schema definition: The DBA creates the original database schema by executing a
set of data definition statements in the DDL.
• Storage structure and access method definition.
• Schema and physical organization modification: The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
• Granting user authority to access the database: By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access.
• Specifying integrity constraints.
• Monitoring performance and responding to changes in requirements.
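Several of these duties map directly onto DDL statements. The sketch below runs a few of them
through Python's sqlite3 module; the table, column, and index names are assumptions. SQLite has no
GRANT statement, so the authorization duty is left out of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema definition (DDL), including two integrity constraints.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# Schema modification: add a column as requirements change.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

# Access method definition: an index to speed up lookups by owner.
conn.execute("CREATE INDEX idx_account_owner ON account (owner)")

conn.execute("INSERT INTO account (owner, balance) VALUES ('A', 100)")
try:
    conn.execute("INSERT INTO account (owner, balance) VALUES ('B', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint blocked the negative balance

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
```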
Query Processor:
The query processor accepts a query from the user and answers it by accessing the database.
Parts of the query processor:
• DDL interpreter
Interprets DDL statements and records the definitions in the data dictionary.
• DML compiler
a. Translates DML statements in a query language into low-level instructions that
the query evaluation engine understands.
b. A query can usually be translated into any of a number of alternative evaluation plans that
give the same result. The DML compiler also performs query optimization: it picks the
lowest-cost plan among the alternatives.
• Query evaluation engine
Executes the low-level instructions generated by the DML compiler.
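Plan selection can be observed in a real system. The sketch below uses SQLite (through Python's
sqlite3 module) as a stand-in for a DBMS and asks it, with EXPLAIN QUERY PLAN, which plan it picks
for the same query before and after an index exists. The employee table and the index name are
assumptions for this example, and the exact wording of the plan text differs between SQLite
versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, ename TEXT, dept TEXT)")
query = "SELECT ename FROM employee WHERE ssn = '111'"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # no index yet, so the table must be scanned
conn.execute("CREATE INDEX idx_employee_ssn ON employee (ssn)")
plan_after = plan(query)   # the optimizer can now search through the index
```

The query text is identical in both calls; only the set of available access paths changed, and the
optimizer chose a different evaluation plan as a result.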
Storage Manager/Storage Management:
• A storage manager is a program module that provides the interface between the data stored in
a database and the application programs and queries submitted to the system.
• Thus, the storage manager is responsible for storing, retrieving and updating data in the
database.
• The storage manager components include:
• Authorization and integrity manager: Checks for integrity constraints and
authority of users to access data.
• Transaction manager: Ensures that the database remains in a consistent state
despite system failures.
• File manager: Manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager: It is responsible for retrieving data from disk storage into main
memory. It enables the database to handle data sizes that are much larger than the
size of main memory.
• Data structures implemented by the storage manager:
• Data files: Store the database itself.
• Data dictionary: Stores metadata about the structure of the database.
• Indices: Provide fast access to data items.
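The buffer manager's job of keeping only part of the database in main memory can be sketched as a
small page cache. The version below assumes a least-recently-used (LRU) replacement policy, which
these notes do not specify, uses a plain dict as a stand-in for disk storage, and ignores writes;
the class name BufferPool is hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny read-only LRU page cache over a dict that stands in for disk."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> page contents, oldest first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict the least recently used page
        self.disk_reads += 1
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {n: f"page-{n}" for n in range(10)}  # 10 pages on "disk"
pool = BufferPool(disk, capacity=3)         # only 3 fit in "memory"
for page_id in [0, 1, 2, 0, 3, 0, 1]:
    pool.get_page(page_id)
```

Seven page requests cause only five disk reads here, because page 0 is served from memory twice.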
Levels of RAID
Many different ways of distributing data have been standardized into various RAID levels. Each
RAID level offers a different trade-off between data protection, system performance, and storage
space. The levels fall into three categories: standard, nested, and non-standard RAID levels.
Standard RAID Levels
The most popular standard RAID levels are the following.
1. RAID 0 (striped disks)
RAID 0 takes any number of disks and merges them into one large volume. Data is striped across the
disks, so reads and writes hit multiple disks at a time and an individual file can use the speed and
capacity of all the drives in the array. The downside is that RAID 0 is NOT redundant: the loss of
any individual disk causes complete data loss, which makes the array less reliable than a single
disk. There is rarely a situation where you should use RAID 0 in a server environment. You can use
it for cache or other purposes where speed is essential and data loss does not matter.
2. RAID 1 (mirrored disks)
RAID 1 duplicates data across two disks in the array, providing full redundancy. Both disks store
exactly the same data at all times, so data is not lost as long as one disk survives. The total
capacity of the array equals the capacity of the smallest disk in the array.
RAID 1 is capable of much more complicated configurations, but its point is primarily redundancy.
If you completely lose a drive, you can stay up and running off the other drive, and you can then
replace the broken drive with little to no downtime. RAID 1 also gives you the additional benefit
of increased read performance, as data can be read from any of the drives in the array. The
downsides are slightly higher write latency, since the data needs to be written to both drives in
the array, and that you get only a single drive's capacity while needing two drives.
3. RAID 5 (striped disks with single parity)
RAID 5 requires at least three drives. It stripes data across the drives to increase performance
and adds redundancy by distributing parity information across the disks. The array is protected
against the loss of any one disk, and its storage capacity is reduced by one disk's worth of space.
4. RAID 6 (striped disks with double parity)
RAID 6 is similar to RAID 5, but two sets of parity data are written, so the array needs at least
four drives and loses two disks' worth of capacity. The additional parity enables the array to
continue to function even if two disks fail simultaneously. However, this extra protection comes at
a cost: RAID 6 has slower write performance than RAID 5.
The chances that two drives break down at the same moment are small. However, when a drive in a
RAID 5 system dies and is replaced by a new drive, rebuilding the swapped drive takes a lot of
time. If another drive dies during that time, you lose all of your data. A RAID 6 array survives
that second failure.
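The single parity used by RAID 5 is commonly computed as a byte-wise XOR of the data blocks in a
stripe, which is why any one missing block can be recomputed from the rest. The sketch below shows
that for a single stripe; the block contents are made up, and a real array also rotates the parity
block across the disks, which is not modeled here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"DBMS", b"RAID", b"DISK"]  # one stripe spread over three data disks
parity = xor_blocks(data)           # parity block for this stripe

# The disk holding data[1] fails: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

The rebuild has to read every surviving disk, which is one reason rebuilding a large drive takes so
long.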
Nested RAID levels
Some RAID levels are referred to as nested RAID because they are based on a combination of
RAID levels, such as:
1. RAID 10 (1+0)
This level combines RAID 1 and RAID 0 in a single system, which offers higher performance than
RAID 1 but at a much higher cost.
This is a nested or hybrid RAID configuration. It provides redundancy by mirroring all data on
secondary drives while striping across the mirrored sets to speed up data transfers.
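The capacity trade-offs of the levels described above can be summarized in one small function. This
is a simplified sketch that assumes all drives have the same size, that RAID 1 mirrors one drive's
worth of data, and that RAID 10 uses two-way mirrors; the function name usable_capacity is
hypothetical.

```python
def usable_capacity(level, disks, size):
    """Usable capacity of an array of `disks` equal drives of `size` each."""
    if level == 0:
        return disks * size        # striping only, nothing reserved
    if level == 1:
        return size                # every disk holds the same data
    if level == 5:
        return (disks - 1) * size  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * size  # two disks' worth of parity
    if level == 10:
        return disks // 2 * size   # half the disks are mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


# Four 2 TB drives under each level that allows four drives.
four_drive_capacities = {
    level: usable_capacity(level, 4, 2) for level in (0, 5, 6, 10)
}
```

With four 2 TB drives this gives 8 TB for RAID 0, 6 TB for RAID 5, and 4 TB for both RAID 6 and
RAID 10.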
Benefits of RAID
Benefits of RAID include the following.
• Improved cost-effectiveness, because lower-priced disks are used in large numbers.
• Using multiple hard drives lets RAID improve on the performance of a single hard
drive.
• Increased speed and, depending on the configuration, better reliability after a crash.
• Increased availability and resiliency with redundant levels such as RAID 5. With mirroring, a
RAID array has two drives containing the same data, which ensures one will continue to work if
the other fails.
Drawbacks of RAID
RAID has the following drawbacks or disadvantages:
• Nested RAID levels are more expensive to implement than traditional RAID levels because
they require many disks.
• The cost per gigabyte of storage devices is higher for nested RAID because many of the
drives are used for redundancy.
• When a drive fails, the probability that another drive in the array will also fail soon rises,
which would likely result in data loss. This is because all the drives in a RAID array are
typically installed at the same time, so they are subject to the same amount of wear.
• Some RAID levels, such as RAID 1 and 5, can only sustain a single drive failure.
• RAID arrays are in a vulnerable state until a failed drive is replaced and the new disk is
populated with data.
• Rebuilding a failed drive takes much longer now than when RAID was first introduced, because
drives have much greater capacity.
Advantages of Parallel Databases
1. Performance Improvement –
By connecting multiple resources such as CPUs and disks in parallel, we can significantly
increase the performance of the system.
2. High availability –
In a parallel database, nodes have little contact with each other, so the failure of one node
doesn't cause the failure of the entire system. This amounts to significantly higher database
availability.
3. Proper resource utilization –
Due to parallel execution, CPUs spend far less time idle, so resources are better
utilized.
4. Increased reliability –
When one site fails, execution can continue at another available site that holds a
copy of the data, which makes the system more reliable.
Performance Measurement of Databases:
Here, we look at two performance measures, speedup and scale-up. Let's go through them one by
one.
Speedup –
The ability to execute the tasks in less time by increasing the number of resources is called
Speedup.
Speedup = time_original / time_parallel
where
time_original = time required to execute the task using 1 processor
time_parallel = time required to execute the task using 'n' processors
Fig.: 'n' CPUs require 1 min to execute a process by dividing it into smaller tasks
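As a worked example of the speedup formula, the sketch below plugs in made-up timings: a task that
takes 4 minutes on 1 processor and 1 minute on 4 processors. The numbers are assumptions chosen so
that the result is the ideal (linear) case.

```python
def speedup(time_original, time_parallel):
    """Speedup = time on 1 processor / time on n processors."""
    return time_original / time_parallel


# Hypothetical timings in minutes.
linear = speedup(4.0, 1.0)     # 4 processors, 4x faster: linear speedup
sublinear = speedup(4.0, 2.0)  # 4 processors, only 2x faster: sublinear speedup
```

A speedup equal to the number of processors is the ideal case; coordination overhead between
processors usually keeps real systems below it.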
Scale-up –
The ability to maintain the performance of the system when both workload and resources increase
proportionally is called scale-up.
Scaleup = volume_parallel / volume_original
where
volume_parallel = volume executed in a given amount of time using 'n' processors
volume_original = volume executed in a given amount of time using 1 processor
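The scale-up formula can be worked through the same way. The volumes below are made-up numbers:
1 processor handles 100 transactions per minute, and the question is how close 4 processors get to
handling 4 times as many in the same time.

```python
def scaleup(volume_parallel, volume_original):
    """Scaleup = volume handled by n processors / volume handled by 1 processor."""
    return volume_parallel / volume_original


# Hypothetical volumes in transactions per minute.
ideal = scaleup(400, 100)   # 4 processors handle 4x the volume
actual = scaleup(300, 100)  # 4 processors handle only 3x the volume
```

With 4 processors, a value of 4 means performance was fully maintained as the system grew; a value
of 3 means some capacity was lost to overhead.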
Examples Of Deadlock
1. The system has 2 tape drives. P1 and P2 each hold one tape drive and each needs the other
one.
2. Semaphores A and B are each initialized to 1. P0 and P1 deadlock as follows:
• P0 executes wait(A) and is preempted.
• P1 executes wait(B).
• P0 then blocks on wait(B) and P1 blocks on wait(A), so both are deadlocked.
P0            P1
wait(A);      wait(B);
wait(B);      wait(A);
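The semaphore example can be reproduced with Python threads. In the sketch below, two barriers force
the unlucky interleaving, and the second wait uses a timeout so the demonstration terminates instead
of hanging; a real wait() would block forever at that point. The barrier and timeout are additions
for the demo only and are not part of the original example.

```python
import threading

A = threading.Semaphore(1)
B = threading.Semaphore(1)
both_hold_first = threading.Barrier(2)
both_tried_second = threading.Barrier(2)
got_second = {}


def process(name, first, second):
    first.acquire()           # wait(first)
    both_hold_first.wait()    # make sure each thread holds its first semaphore
    # wait(second): the other thread holds it, so this can only time out.
    got_second[name] = second.acquire(timeout=0.2)
    both_tried_second.wait()  # keep holding `first` until both have tried
    if got_second[name]:
        second.release()
    first.release()


p0 = threading.Thread(target=process, args=("P0", A, B))
p1 = threading.Thread(target=process, args=("P1", B, A))
p0.start()
p1.start()
p0.join()
p1.join()
```

Neither thread obtains its second semaphore. Acquiring A and B in the same order in both threads
would remove the circular wait and with it the deadlock.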
3. Assume the space is available for allocation of 200K bytes, and the following sequence of
events occurs.
P0 P1