DBMS Notes

What is functional dependency and transitive dependency (DBMS)?

Functional Dependency
Functional dependency refers to the relation of one attribute of the database to another. With the
help of functional dependency, the quality of the data in the database can be maintained.
The symbol for representing functional dependency is -> (arrow).
Example of Functional Dependency
Consider the following table.

Employee Number   Name    City        Salary
1                 Bob     Bangalore   25000
2                 Lucky   Delhi       40000

The name, city and salary of an employee are determined by the value of the
Employee Number (the employee's id). So it can be said that the Name, City and
Salary attributes are functionally dependent on the attribute Employee Number.

Example
SSN -> ENAME (read as: SSN determines ENAME, or ENAME is functionally
dependent on SSN)
PNUMBER -> {PNAME, PLOCATION} (PNUMBER determines PNAME and PLOCATION)
{SSN, PNUMBER} -> HOURS (SSN and PNUMBER combined determine HOURS)
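A functional dependency X -> Y holds in a table exactly when no two rows agree on X but differ on Y. As a minimal sketch (the table data and column names below are illustrative, taken from the employee example, not from any specific schema), this can be checked directly:

```python
# Check whether a candidate functional dependency X -> Y holds in a table:
# no two rows may agree on the X columns while differing on the Y columns.

def fd_holds(rows, x_cols, y_cols):
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in x_cols)
        y = tuple(row[c] for c in y_cols)
        if x in seen and seen[x] != y:
            return False  # same X value maps to two different Y values
        seen[x] = y
    return True

employees = [
    {"emp_no": 1, "name": "Bob",   "city": "Bangalore", "salary": 25000},
    {"emp_no": 2, "name": "Lucky", "city": "Delhi",     "salary": 40000},
]

# Employee Number determines Name, City and Salary in this table:
print(fd_holds(employees, ["emp_no"], ["name", "city", "salary"]))  # True
```

A real DBMS does not scan tables like this; keys declared in the schema enforce such dependencies. The sketch only makes the definition concrete.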

Transitive Dependency
A transitive dependency arises from a chain of functional dependencies among
three or more attributes. Such dependencies are removed when normalizing the
database to 3NF.

Example of Transitive Dependency


Consider the following table −

Book   Book_Author   Age_of_Author
ABC    Hari          45
PQR    James         60
The dependencies are as follows −
{Book} -> {Book_Author}
{Book_Author} does not -> {Book}
{Book_Author} -> {Age_of_Author}

Hence, by transitivity, {Book} -> {Age_of_Author}. Therefore, if one knows the
book, one can determine the age of its author.
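The derivation above can be sketched by composing the two dependencies as mappings (the data mirrors the example table):

```python
# Book -> Book_Author and Book_Author -> Age_of_Author both hold, so
# Book -> Age_of_Author follows by composing the two mappings.

book_author = {"ABC": "Hari", "PQR": "James"}   # Book -> Book_Author
author_age = {"Hari": 45, "James": 60}          # Book_Author -> Age_of_Author

# Derived (transitive) dependency: Book -> Age_of_Author
book_age = {book: author_age[author] for book, author in book_author.items()}

print(book_age)  # {'ABC': 45, 'PQR': 60}
```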
Database Users:
Users are differentiated by the way they expect to interact with the system:
• Application programmers:
• Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces.
• Rapid application development (RAD) tools are tools that enable an application
programmer to construct forms and reports without writing a program.
• Sophisticated users:
• Sophisticated users interact with the system without writing programs. Instead, they
form their requests in a database query language.
• They submit each such query to a query processor, whose function is to break down
DML statements into instructions that the storage manager understands.
• Specialized users :
• Specialized users are sophisticated users who write specialized database applications
that do not fit into the traditional data-processing framework.
• Among these applications are computer-aided design systems, knowledge base and
expert systems, systems that store data with complex data types (for example,
graphics data and audio data), and environment-modeling systems.
• Naïve users :
• Naive users are unsophisticated users who interact with the system by invoking one
of the application programs that have been written previously.
• For example, a bank teller who needs to transfer $50 from account A to account B
invokes a program called transfer. This program asks the teller for the amount of
money to be transferred, the account from which the money is to be transferred, and
the account to which the money is to be transferred.
Database Administrator:
• Coordinates all the activities of the database system. The database administrator has a good
understanding of the enterprise’s information resources and needs.
• Database administrator's duties include:
• Schema definition: The DBA creates the original database schema by executing a
set of data definition statements in the DDL.
• Storage structure and access method definition.
• Schema and physical organization modification: The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
• Granting user authority to access the database: By granting different types of
authorization, the database administrator can regulate which parts of the database
various users can access.
• Specifying integrity constraints.
• Monitoring performance and responding to changes in requirements.
Query Processor:
The query processor accepts queries from users and answers them by accessing
the database.
Parts of the query processor:
• DDL interpreter
Interprets DDL statements and records the definitions in the data dictionary.
• DML compiler
a. Translates DML statements in a query language into low-level instructions
that the query evaluation engine understands.
b. A query can usually be translated into any of a number of alternative
evaluation plans that give the same result; as part of query optimization, the
DML compiler selects the lowest-cost plan.
• Query evaluation engine
Executes the low-level instructions generated by the DML compiler.
Storage Manager/Storage Management:
• A storage manager is a program module that provides the interface between
the data stored in the database and the application programs and queries
submitted to the system.
• Thus, the storage manager is responsible for storing, retrieving and updating data in the
database.
• The storage manager components include:
• Authorization and integrity manager: Checks for integrity constraints and
authority of users to access data.
• Transaction manager: Ensures that the database remains in a consistent state
although there are system failures.
• File manager: Manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager: It is responsible for retrieving data from disk storage into main
memory. It enables the database to handle data sizes that are much larger than the
size of main memory.
• Data structures implemented by the storage manager:
• Data files: Stored in the database itself.
• Data dictionary: Stores metadata about the structure of the database.
• Indices: Provide fast access to data items.

What is RAID? Explain


RAID or redundant array of independent disks is a data storage virtualization technology
that combines multiple physical disk drive components into one or more logical units for data
redundancy, performance improvement, or both.
It is a way of storing the same data in different places on multiple hard disks or solid-state drives to
protect data in the case of a drive failure. A RAID system consists of two or more drives working in
parallel. These can be hard discs, but there is a trend to use SSD technology (Solid State Drives).
RAID combines several independent and relatively small disks into a single
large storage unit. The disks included in the array are called array members.
The disks can be combined into the array in different ways, known as RAID
levels. Each RAID level has its own characteristics of:
• Fault tolerance is the ability to survive one or several disk failures.
• Performance shows the change in read and write speed of the entire array
compared to a single disk.
• Capacity is determined by the amount of user data that can be written to
the array. It depends on the RAID level and does not always match the sum of
the member disks' sizes. To calculate the capacity of a particular RAID type
and set of member disks, you can use a free online RAID calculator.
RAID systems can be used with several interfaces, including SATA, SCSI, IDE,
or FC (Fibre Channel). Some systems use SATA disks internally but present a
FireWire or SCSI interface to the host system.
Sometimes the disks in a storage system are configured as JBOD, which stands
for Just a Bunch Of Disks. This means that those disks do not use a specific
RAID level and act as stand-alone disks. This is often done for drives that
contain swap files or spooling data.

How RAID Works


RAID works by placing data on multiple disks and allowing input/output
operations to overlap in a balanced way, improving performance. Storing data
redundantly across multiple disks also increases fault tolerance and the mean
time between failures (MTBF).
RAID arrays appear to the operating system as a single logical drive. RAID employs the techniques
of disk mirroring or disk striping.
• Disk mirroring copies identical data onto more than one drive.
• Disk striping partitions data and spreads it over multiple disk drives.
Each drive's storage space is divided into units ranging from 512 bytes up to
several megabytes. The stripes of all the disks are interleaved and addressed
in order.
• Disk mirroring and disk striping can also be combined in a RAID array.
In a single-user system where large records are stored, the stripes are
typically set up to be small (512 bytes) so that a single record spans all the
disks and can be accessed quickly by reading all the disks at the same time.
In a multi-user system, better performance requires a stripe wide enough to hold the typical or
maximum size record, allowing overlapped disk I/O across drives.
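The interleaving described above can be sketched as follows; the stripe size and disk count are illustrative, and real RAID controllers work at the block-device level rather than on in-memory byte strings:

```python
# Sketch of disk striping: split a byte stream into fixed-size stripe units
# and deal them out to the member disks round-robin.

def stripe(data: bytes, num_disks: int, stripe_size: int = 512):
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), stripe_size):
        unit = data[i:i + stripe_size]
        disks[(i // stripe_size) % num_disks].extend(unit)
    return disks

# Four 512-byte stripes dealt across two disks:
disks = stripe(b"A" * 512 + b"B" * 512 + b"C" * 512 + b"D" * 512, num_disks=2)
# Disk 0 holds stripes 0 and 2 (A..., C...); disk 1 holds stripes 1 and 3.
```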

Levels of RAID
Many different ways of distributing data have been standardized into various
RAID levels. Each RAID level offers a trade-off of data protection, system
performance, and storage space. The levels fall into three categories:
standard, nested, and non-standard RAID levels.
Standard RAID Levels
Below are the most popular standard RAID levels.
1. RAID 0 (striped disks)
RAID 0 takes any number of disks and merges them into one large volume. It
increases speed, since reads and writes use multiple disks at a time, and an
individual file can use the speed and capacity of all the drives of the array.
The downside is that RAID 0 is NOT redundant: the loss of any individual disk
causes complete data loss, making this RAID type less reliable than a single
disk. There is rarely a situation where you should use RAID 0 in a server
environment. You can use it for cache or other purposes where speed is
essential and reliability or data loss does not matter.
2. RAID 1 (mirrored disks)
RAID 1 duplicates data across two disks in the array, providing full
redundancy. Both disks store exactly the same data, at the same time, and at
all times. Data is not lost as long as one disk survives. The total capacity
of the array equals the capacity of the smallest disk in the array. At any
given instant, the contents of both disks in the array are identical.
RAID 1 is also capable of more complicated configurations, but its point is
primarily redundancy. If you completely lose a drive, you can stay up and
running off the other drive, and replace the broken drive with little to no
downtime. RAID 1 also gives you the additional benefit of increased read
performance, as data can be read from any of the drives in the array. The
downsides are slightly higher write latency, since the data must be written to
both drives, and that you only get a single drive's capacity while needing two
drives.
3. RAID 5 (striped disks with single parity)
RAID 5 requires at least three drives. It combines these disks to protect
data against the loss of any one disk; the array's storage capacity is reduced
by one disk. It stripes data across multiple drives to increase performance,
and adds redundancy by distributing parity information across the disks.
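The parity idea behind RAID 5 (and, with a second parity scheme, RAID 6) is XOR: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors. A minimal sketch with two tiny illustrative data blocks:

```python
# RAID 5 parity sketch: parity = XOR of all data blocks, so any one lost
# block equals the XOR of the remaining blocks plus the parity block.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1 = b"\x0f\x0f", b"\xf0\x01"   # two data blocks (illustrative)
parity = xor_blocks([d0, d1])       # parity block stored on a third disk

# If the disk holding d1 fails, d1 is recovered from d0 and the parity:
recovered = xor_blocks([d0, parity])
assert recovered == d1
```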
4. RAID 6 (Striped disks with double parity)
RAID 6 is similar to RAID 5, but the parity data are written to two drives. The use of additional
parity enables the array to continue to function even if two disks fail simultaneously. However, this
extra protection comes at a cost. RAID 6 has a slower write performance than RAID 5.
The chances that two drives break down at the same moment are minimal.
However, if a drive in a RAID 5 system dies and is replaced by a new drive,
rebuilding the swapped drive takes a lot of time. If another drive dies during
that time, you still lose all of your data. With RAID 6, the array will
survive even that second failure.
Nested RAID levels
Some RAID levels are referred to as nested RAID because they are based on a combination of
RAID levels, such as:
1. RAID 10 (1+0)
This level combines RAID 1 and RAID 0 in a single system, offering higher
performance than RAID 1 but at a much higher cost.
This is a nested or hybrid RAID configuration. It provides security by mirroring all data on
secondary drives while using striping across each set of drives to speed up data transfers.

Benefits of RAID
Benefits of RAID include the following.
• An improvement in cost-effectiveness because lower-priced disks are used in large numbers.
• The use of multiple hard drives enables RAID to improve the performance of a single hard
drive.
• Increased computer speed and reliability after a crash, depending on the
configuration.
• Increased availability and resiliency. With mirroring, RAID arrays can have
two drives containing the same data, ensuring one will continue to work if
the other fails.
Drawbacks of RAID
RAID has the following drawbacks or disadvantages:
• Nested RAID levels are more expensive to implement than traditional RAID levels because
they require many disks.
• The cost per gigabyte of storage devices is higher for nested RAID because many of the
drives are used for redundancy.
• When a drive fails, the probability that another drive in the array will also soon fail rises,
which would likely result in data loss. This is because all the drives in a RAID array are
installed simultaneously. So all the drives are subject to the same amount of wear.
• Some RAID levels, such as RAID 1 and 5, can only sustain a single drive failure.
• RAID arrays are in a vulnerable state until a failed drive is replaced and the new disk is
populated with data.
• Rebuilding a failed drive takes much longer than when RAID was first
introduced, because drives now have much greater capacity.

Introduction of Parallel Database


Parallel Databases :
Nowadays organizations need to handle huge amounts of data with a high
transfer rate. For such requirements, a client-server or centralized system is
not efficient. To improve the efficiency of the system, the concept of the
parallel database comes into the picture. A parallel database system seeks to
improve performance through parallelism.
Need :
Multiple resources like CPUs and Disks are used in parallel. The operations are performed
simultaneously, as opposed to serial processing. A parallel server can allow access to a single
database by users on multiple machines. It also performs many parallelization operations like data
loading, query processing, building indexes, and evaluating queries.
Advantages :
Here, we will discuss the advantages of parallel databases. Let’s have a look.

1. Performance Improvement –
By connecting multiple resources like CPU and disks in parallel we can significantly
increase the performance of the system.
2. High availability –
In a parallel database, nodes have little contact with each other, so the
failure of one node doesn't cause the failure of the entire system. This
results in significantly higher database availability.
3. Proper resource utilization –
Due to parallel execution, the CPU will never be idle. Thus, proper utilization of resources is
there.
4. Increased reliability –
When one site fails, execution can continue at another available site that
holds a copy of the data, making the system more reliable.
Performance Measurement of Databases :
Here, we will emphasize two performance measurement factors, Speedup and
Scale-up. Let's understand them one by one with the help of examples.
Speedup –
The ability to execute the tasks in less time by increasing the number of resources is called
Speedup.
Speedup = time_original / time_parallel
where:
time_original = time required to execute the task using 1 processor
time_parallel = time required to execute the task using 'n' processors

fig. Ideal Speedup curve


Example –

fig. A CPU requires 3 minutes to execute a process

fig. ‘n’ CPU requires 1 min to execute a process by dividing into smaller tasks
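Plugging the example above into the speedup formula:

```python
# Speedup = time_original / time_parallel, using the example above:
# one CPU requires 3 minutes, 'n' CPUs require 1 minute.
time_original = 3.0   # minutes with 1 processor
time_parallel = 1.0   # minutes with n processors
speedup = time_original / time_parallel
print(speedup)  # 3.0
```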
Scale-up –
The ability to maintain the performance of the system when both workload and resources increase
proportionally.
Scaleup = volume_parallel / volume_original
where:
volume_parallel = volume executed in a given amount of time using 'n' processors
volume_original = volume executed in a given amount of time using 1 processor

fig. Ideal Scaleup curve
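The scaleup formula applied to illustrative numbers (the transaction volumes below are assumptions, not from the notes):

```python
# Scaleup = volume_parallel / volume_original: work completed in a fixed
# amount of time with n processors vs. 1 processor.
volume_original = 100.0   # e.g. transactions done in time T on 1 processor
volume_parallel = 1000.0  # transactions done in time T on n processors
scaleup = volume_parallel / volume_original
print(scaleup)  # 10.0 — ideal if n = 10 processors were used
```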


What is deadlock ? Explain
A deadlock is a situation where a set of processes are blocked because each process is holding a
resource and waiting for another resource acquired by some other process.
Consider an example when two trains are coming toward each other on the same track and there is
only one track, none of the trains can move once they are in front of each other. A similar situation
occurs in operating systems when there are two or more processes that hold some resources and
wait for resources held by other(s). For example, in the below diagram, Process 1 is holding
Resource 1 and waiting for resource 2 which is acquired by process 2, and process 2 is waiting for
resource 1.

Examples Of Deadlock
1. The system has 2 tape drives. P1 and P2 each hold one tape drive and each needs another
one.
2. Semaphores A and B, each initialized to 1; P0 and P1 deadlock as follows:
• P0 executes wait(A) and is then preempted.
• P1 executes wait(B).
• Now P0 blocks on wait(B) and P1 blocks on wait(A): P0 and P1 are
deadlocked.

P0          P1
wait(A);    wait(B);
wait(B);    wait(A);

3. Assume the space is available for allocation of 200K bytes, and the following sequence of
events occurs.
P0                 P1
Request 80KB;      Request 70KB;
Request 60KB;      Request 80KB;

Deadlock occurs if both processes progress to their second request.
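The arithmetic behind example 3 can be checked directly: once the first two requests are granted, the remaining free space cannot satisfy either second request.

```python
# Replay the 200 KB allocation example: after the first requests are granted
# (80 KB + 70 KB), only 50 KB remain, so neither second request (60 KB or
# 80 KB) can be satisfied, and both processes wait forever.
total = 200
allocated = 80 + 70            # both first requests granted
free = total - allocated       # 50 KB left
second_requests = [60, 80]     # P0's and P1's second requests
deadlocked = all(req > free for req in second_requests)
print(free, deadlocked)  # 50 True
```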


Deadlock can arise if the following four conditions hold simultaneously (Necessary
Conditions)
Mutual Exclusion: Two or more resources are non-shareable (only one process
can use a resource at a time).
Hold and Wait: A process is holding at least one resource and waiting for
additional resources held by other processes.
No Preemption: A resource cannot be taken from a process unless the process
releases the resource.
Circular Wait: A set of processes wait for each other in circular form.
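Circular wait shows up as a cycle in the wait-for graph, where an edge P -> Q means "P is waiting for a resource held by Q". A minimal depth-first-search sketch (process names are illustrative, matching the two-process example above):

```python
# Detect circular wait: a deadlock exists iff the wait-for graph has a cycle.

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:     # back edge: cycle found
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in graph)

# Process 1 waits for Process 2, which waits for Process 1 (as in the diagram):
print(has_cycle({"P1": ["P2"], "P2": ["P1"]}))  # True
print(has_cycle({"P1": ["P2"], "P2": []}))      # False
```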
Methods for handling deadlock
There are three ways to handle deadlock
1) Deadlock prevention or avoidance:
Prevention:
The idea is to never let the system enter a deadlock state. The system makes
sure that at least one of the four necessary conditions mentioned above cannot
arise. These techniques are very costly, so we use them in cases where our
priority is making the system deadlock-free.
Prevention is done by negating one of the above-mentioned necessary conditions
for deadlock, in one of four different ways:
1. Eliminate mutual exclusion
2. Solve hold and wait
3. Allow preemption
4. Break circular wait
Avoidance:
Avoidance looks ahead: we must assume that all information about the resources
a process will need is known before the process executes. We use Banker's
algorithm (due to Dijkstra) to avoid deadlock.
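At the heart of Banker's algorithm is a safe-state check. A minimal sketch for a single resource type (the full algorithm handles vectors of resource types; the numbers below are illustrative): repeatedly pick a process whose remaining need fits in the currently available amount, let it finish, and reclaim its allocation. If every process can finish this way, the state is safe.

```python
# Safe-state check for one resource type (core of Banker's algorithm).
# allocation[i] = units currently held by process i
# need[i]       = further units process i may still request

def is_safe(available, allocation, need):
    finished = [False] * len(allocation)
    work = available
    for _ in range(len(allocation)):
        for i, done in enumerate(finished):
            if not done and need[i] <= work:
                work += allocation[i]   # process i finishes, releases resources
                finished[i] = True
                break
        else:
            return False                # no runnable process: unsafe state
    return True

print(is_safe(available=3, allocation=[5, 2, 2], need=[5, 2, 3]))  # True
print(is_safe(available=1, allocation=[5, 2, 2], need=[5, 2, 3]))  # False
```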
In prevention and avoidance, we get the correctness of data but performance decreases.
2) Deadlock detection and recovery: If deadlock prevention or avoidance is not
applied, we can handle deadlock by detection and recovery, which consists of
two phases:
1. In the first phase, we examine the state of the processes and check whether
there is a deadlock in the system.
2. If a deadlock is found in the first phase, we apply an algorithm to recover
from it.
In Deadlock detection and recovery, we get the correctness of data but performance decreases.
Recovery from Deadlock
1. Manual Intervention:
When a deadlock is detected, one option is to inform the operator and let them handle the situation
manually. While this approach allows for human judgment and decision-making, it can be time-
consuming and may not be feasible in large-scale systems.
2. Automatic Recovery:
An alternative approach is to enable the system to recover from deadlock automatically. This
method involves breaking the deadlock cycle by either aborting processes or preempting resources.
Let’s delve into these strategies in more detail.
Recovery from Deadlock: Process Termination:
1. Abort all deadlocked processes:
This approach breaks the deadlock cycle, but it comes at a significant cost. The processes that were
aborted may have executed for a considerable amount of time, resulting in the loss of partial
computations. These computations may need to be recomputed later.
2. Abort one process at a time:
Instead of aborting all deadlocked processes simultaneously, this strategy involves selectively
aborting one process at a time until the deadlock cycle is eliminated. However, this incurs overhead
as a deadlock-detection algorithm must be invoked after each process termination to determine if
any processes are still deadlocked.
Factors for choosing the termination order:
– The process’s priority
– Completion time and the progress made so far
– Resources consumed by the process
– Resources required to complete the process
– Number of processes to be terminated
– Process type (interactive or batch)
Recovery from Deadlock: Resource Preemption:
1. Selecting a victim:
Resource preemption involves choosing which resources and processes should be preempted to
break the deadlock. The selection order aims to minimize the overall cost of recovery. Factors
considered for victim selection may include the number of resources held by a deadlocked process
and the amount of time the process has consumed.
2. Rollback:
If a resource is preempted from a process, the process cannot continue its normal execution as it
lacks the required resource. Rolling back the process to a safe state and restarting it is a common
approach. Determining a safe state can be challenging, leading to the use of total rollback, where
the process is aborted and restarted from scratch.
3. Starvation prevention:
To prevent resource starvation, it is essential to ensure that the same process is not always chosen as
a victim. If victim selection is solely based on cost factors, one process might repeatedly lose its
resources and never complete its designated task. To address this, it is advisable to limit the number
of times a process can be chosen as a victim, including the number of rollbacks in the cost factor.
3) Deadlock ignorance: If deadlock is very rare, then let it happen and reboot
the system. This is the approach that both Windows and UNIX take; it is known
as the ostrich algorithm.
With deadlock ignorance, performance is better than with the above two
methods, but the correctness of data is not guaranteed.
Safe State:
A safe state is a state in which there is no deadlock. A state is safe if:
• Whenever a process needs an unavailable resource, it can wait until the
resource has been released by the process to which it is currently allocated,
and eventually all the requested resources can be allocated to the process.
• If no such sequence exists, the state is unsafe.
