Dbms Chapter 6
Unit-6
Storage Strategies
A computer uses a special ROM chip called the BIOS (Basic Input Output System), which
permanently stores the software needed to access computer hardware such as the hard disk,
load an operating system into RAM, and start executing it.
What is Memory?
Computer memory is any physical device capable of storing information
temporarily or permanently.
Types of memory
3. Programmable Read-Only Memory (PROM) is a memory chip on which you
can store a program. Once the PROM has been programmed, you cannot wipe it
clean and use it to store something else. Like ROMs, PROMs are non-volatile.
E.g. CD-R.
Memory hierarchy (top to bottom): Register, L1 Cache, L2 Cache, Main Memory.
Levels toward the top are faster, more expensive, and have less capacity.
RAID technology
There are 7 standard RAID levels: RAID 0, RAID 1, ..., RAID 6.
These levels share the following characteristics:
•The array contains a set of physical disk drives.
•The operating system views these separate disks as a single logical disk.
•Data is distributed across the physical drives of the array.
•Redundant disk capacity is used to store parity information.
•In case of a disk failure, the parity information can be used to recover the data.
Standard RAID levels
RAID 0
•RAID level 0 provides data striping, i.e., data is spread across multiple
disks. Because it relies on striping alone, if one disk fails then all data in the
array is lost.
•This level doesn't provide fault tolerance, but it increases system performance.
Example:
Disk 0 Disk 1 Disk 2 Disk 3
20 21 22 23
24 25 26 27
28 29 30 31
32 33 34 35
In this figure, blocks 20, 21, 22, 23 form a stripe.
In this level, instead of placing just one block on a disk at a time, we can place
two or more blocks on a disk before moving on to the next one:
Disk 0 Disk 1 Disk 2 Disk 3
20 22 24 26
21 23 25 27
28 30 32 34
29 31 33 35
In the above figures, there is no duplication of data. Hence, a block once lost
cannot be recovered.
Pros of RAID 0:
•In this level, throughput is increased because multiple data requests are probably
not on the same disk.
•This level fully utilizes the disk space and provides high performance.
•It requires a minimum of 2 drives.
Cons of RAID 0:
•It doesn't contain any error detection mechanism.
•RAID 0 is not a true RAID because it is not fault-tolerant.
•In this level, failure of any one disk results in complete data loss in the array.
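The block placement in the two RAID 0 figures can be reproduced with a small sketch. It assumes blocks are numbered consecutively from a hypothetical `first_block` and dealt round-robin to the disks in chunks of `chunk_blocks` blocks (1 for the first figure, 2 for the second); `raid0_location` and `layout` are illustrative names, not part of any real RAID driver.

```python
def raid0_location(block, num_disks=4, chunk_blocks=1, first_block=0):
    """Map a logical block number to (disk, row) under RAID 0 striping."""
    rel = block - first_block              # position relative to the first block
    chunk = rel // chunk_blocks            # which chunk the block belongs to
    disk = chunk % num_disks               # chunks are dealt round-robin to disks
    row = (chunk // num_disks) * chunk_blocks + rel % chunk_blocks
    return disk, row


def layout(first, last, num_disks=4, chunk_blocks=1):
    """Build the per-row view of blocks first..last, one column per disk."""
    rows = {}
    for b in range(first, last + 1):
        d, r = raid0_location(b, num_disks, chunk_blocks, first)
        rows.setdefault(r, [None] * num_disks)[d] = b
    return [rows[r] for r in sorted(rows)]


for row in layout(20, 35, chunk_blocks=2):
    print(*row)
```

With `chunk_blocks=2` this prints the second figure row by row; there is no redundancy anywhere in the mapping, which is why one failed disk loses part of every stripe.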
RAID 1
This level is called mirroring because it copies the data from one drive to
another (e.g., from drive 1 to drive 2). It provides 100% redundancy in case of a failure.
Example:
Disk 0 Disk 1 Disk 2 Disk 3
A A B B
C C D D
E E F F
G G H H
Only half of the total disk space is used to store data. The other half
is just a mirror of the already stored data.
Pros of RAID 1:
•The main advantage of RAID 1 is fault tolerance. In this level, if
one disk fails, then its mirror automatically takes over.
•In this level, the array will function even if any one of the drives fails.
Cons of RAID 1:
•In this level, one extra drive is required per drive for mirroring, so
the expense is higher.
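A toy model of one mirrored pair (for example Disk 0 and Disk 1 in the figure) makes the failover behaviour concrete. This is a minimal sketch under the assumption that a "disk" is just a Python dict of block number to data; `MirroredPair` is a hypothetical name, not a real storage API.

```python
class MirroredPair:
    """Toy RAID 1 pair: every write goes to both disks; a read uses any surviving copy."""

    def __init__(self):
        self.disks = [{}, {}]          # each "disk" maps block number -> data
        self.failed = [False, False]

    def write(self, block, data):
        for d in (0, 1):
            if not self.failed[d]:
                self.disks[d][block] = data   # identical data on both drives (mirroring)

    def read(self, block):
        for d in (0, 1):
            if not self.failed[d]:
                return self.disks[d][block]   # first healthy drive serves the read
        raise OSError("both drives of the mirrored pair have failed")


pair = MirroredPair()
pair.write(0, "A")
pair.write(1, "C")
pair.failed[0] = True                 # simulate a failure of the first drive
print(pair.read(0), pair.read(1))     # A C
```

Every block is stored twice, which is exactly where the 50% capacity cost listed under the cons comes from.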
RAID 2
•RAID 2 consists of bit-level striping using Hamming-code parity.
•In this level, each data bit in a word is recorded on a separate disk, and the ECC
(Hamming) code of the data words is stored on a separate set of disks.
•Due to its high cost and complex structure, this level is not commercially
used. The same performance can be achieved by RAID 3 at a lower cost.
Pros of RAID 2:
•This level uses designated drives to store parity.
•It uses the Hamming code for error detection and correction.
Cons of RAID 2:
•It requires additional drives for error detection.
RAID 3
•RAID 3 consists of byte-level striping with dedicated parity. In
this level, parity information is computed for each stripe (one row across
the data disks) and written to a dedicated parity drive.
Disk 0 Disk 1 Disk 2 Disk 3
A B C P(A, B, C)
D E F P(D, E, F)
•In case of drive failure, the parity drive is accessed, and the data is
reconstructed from the remaining devices. Once the failed drive
is replaced, the missing data can be restored on the new drive.
•In this level, data can be transferred in bulk, so high-speed
data transmission is possible.
Cons of RAID 3:
•It requires an additional drive for parity.
•It performs slowly when operating on small files.
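Byte-level striping with a dedicated parity disk, as in the RAID 3 figure, can be sketched as follows. It assumes three data disks, XOR parity, and zero-padding of a short final row; `raid3_stripe` and `rebuild` are illustrative names only.

```python
def raid3_stripe(data: bytes, data_disks: int = 3):
    """Byte-level striping: each row holds one byte per data disk plus an XOR parity byte."""
    rows = []
    for start in range(0, len(data), data_disks):
        row = list(data[start:start + data_disks])
        row += [0] * (data_disks - len(row))    # pad a short final row with zero bytes
        parity = 0
        for b in row:
            parity ^= b
        rows.append(row + [parity])             # last column is the dedicated parity disk
    return rows


def rebuild(row, lost_disk):
    """Recover one lost byte by XOR-ing every other byte in the row, parity included."""
    value = 0
    for disk, b in enumerate(row):
        if disk != lost_disk:
            value ^= b
    return value


rows = raid3_stripe(b"ABCDEF")
print(rows)                        # [[65, 66, 67, 64], [68, 69, 70, 71]]
print(chr(rebuild(rows[0], 1)))    # B
```

The two rows correspond to A B C P(A, B, C) and D E F P(D, E, F). Every write touches the single parity disk, which is one reason small operations are slow at this level.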
RAID 4
•RAID 4 consists of block-level striping with a parity disk.
Instead of duplicating data, RAID 4 adopts a parity-based approach.
Disk 0 Disk 1 Disk 2 Disk 3
A B C P0
D E F P1
G H I P2
J K L P3
In this figure, we can observe one disk dedicated to parity.
•This level allows recovery from at most 1 disk failure because of the
way parity works. If more than one disk fails, there is no way to
recover the data.
•Level 3 and level 4 both require at least three disks to
implement RAID.
In this level, parity can be calculated using an XOR function:
C1 C2 C3 C4 Parity
0 1 0 0 1
0 0 1 1 0
If the data bits are 0, 1, 0, 0, then the parity bit is XOR(0,1,0,0) = 1.
If the data bits are 0, 0, 1, 1, then the parity bit is XOR(0,0,1,1) = 0.
That means an even number of 1s results in parity 0, and an odd
number of 1s results in parity 1.
Suppose that in the above table, C2 is lost due to a disk
failure. Then, using the values of all the other columns and the
parity bit, we can recompute the data bit stored in C2. This is how
this level allows us to recover lost data.
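The parity table and the recovery of C2 can be checked with a few lines of Python. This sketch works on single bits to match the table; a real RAID 4 array applies the same XOR to whole blocks. The helper names are illustrative.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """XOR of all data bits: 1 if the number of 1s is odd, 0 if it is even."""
    return reduce(xor, bits, 0)


def recover(surviving_bits, parity_bit):
    """The lost bit is the XOR of the surviving data bits and the parity bit."""
    return reduce(xor, surviving_bits, parity_bit)


rows = [[0, 1, 0, 0], [0, 0, 1, 1]]          # columns C1..C4 from the table
for row in rows:
    p = parity(row)
    survivors = row[:1] + row[2:]            # pretend column C2 (index 1) is lost
    print(row, "parity:", p, "recovered C2:", recover(survivors, p))
```

This works because XOR is its own inverse: XOR-ing the parity with the surviving bits cancels them out and leaves only the missing bit. With two missing bits, one parity equation cannot determine both, which is why only one disk failure is recoverable.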
RAID 5
•RAID 5 is a slight modification of RAID 4. The only
difference is that in RAID 5, the parity rotates among the drives.
•It consists of block-level striping with DISTRIBUTED parity.
•As in RAID 4, this level allows recovery from at most 1 disk failure. If
more than one disk fails, there is no way to recover the data.
Disk 0 Disk 1 Disk 2 Disk 3 Disk 4
0 1 2 3 P0
5 6 7 P1 4
10 11 P2 8 9
15 P3 12 13 14
P4 16 17 18 19
This figure shows how the parity block rotates across the disks.
This level was introduced to improve random write performance.
Pros of RAID 5:
•This level is cost-effective and provides high performance.
•In this level, parity is distributed across the disks in the array, so no single
parity disk becomes a bottleneck.
•It improves random write performance.
Cons of RAID 5:
•In this level, recovery from a disk failure takes longer, because the lost data has to be
recomputed from all the remaining drives.
•This level cannot survive two concurrent drive failures.
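The rotation in the RAID 5 figure follows a simple rule: the parity block moves one disk to the left on each row, and data blocks start on the disk just after the parity disk and wrap around. This is a sketch of that rule (it matches what is commonly called the left-symmetric layout); other RAID 5 implementations may rotate parity differently.

```python
def raid5_layout(num_disks, num_rows):
    """Return rows of cell labels: data block numbers and rotating parity P0, P1, ..."""
    layout = []
    block = 0
    for row in range(num_rows):
        pdisk = num_disks - 1 - (row % num_disks)    # parity moves one disk left per row
        cells = [None] * num_disks
        cells[pdisk] = f"P{row}"
        for k in range(num_disks - 1):               # data starts just after the parity disk
            cells[(pdisk + 1 + k) % num_disks] = str(block)
            block += 1
        layout.append(cells)
    return layout


for row in raid5_layout(5, 5):
    print(" ".join(f"{c:>3}" for c in row))
```

Because consecutive rows put parity on different disks, writes to different stripes can update parity on different drives in parallel instead of queuing on one parity disk as in RAID 4.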
RAID 6
•This level is an extension of RAID 5. It uses block-level
striping with two parity blocks per stripe (P and Q in the figure).
•In RAID 6, you can survive 2 concurrent disk failures. Suppose
you are using RAID 5 or RAID 1. When a disk fails, you
need to replace it, because if another disk fails at the same time
you may not be able to recover the data. RAID 6 addresses this case:
it can survive two concurrent disk failures before you run out of
options.
Disk 1 Disk 2 Disk 3 Disk 4
A0 B0 Q0 P0
A1 Q1 P1 D1
Q2 P2 C2 D2
P3 B3 C3 Q3
Pros of RAID 6:
•It can survive two concurrent disk failures.
Cons of RAID 6:
•Two disks' worth of capacity is used for parity.
•Writes are slower, because both parity blocks must be updated on every write.
Note: combining RAID 0 striping with RAID 1 mirroring (striping, then mirroring)
describes a nested level such as RAID 10, not RAID 6. Such a level needs a
number of drives that is a multiple of 2, cannot use 100% of the disk capacity
because half is used for mirroring, and has very limited scalability.
What is a Database Index?
Indexes are special lookup tables that the database search engine can use to speed
up data retrieval.
A database index is a data structure that improves the speed of data retrieval
operations on a database table.
An index in a database is very similar to an index in the back of a book.
Indexes are used to retrieve data from the database quickly. Users cannot see
the indexes; they are just used to speed up searches/queries.
Updating a table with indexes takes more time than updating a table without them
(because the indexes also need to be updated). So, create indexes only on columns that
will be frequently searched against.
Syntax to create and drop an Index
Syntax to create an index:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
Structure of an index entry: search key | pointer
The first column is the search key, which contains a copy of the primary key or a candidate key of
the table. These values are stored in sorted order so that the corresponding data can be
accessed quickly.
The second column is the data reference (pointer), which holds the address of the disk block
where that particular key value can be found.
Types of index: Dense, Sparse
Primary Index (Ordered Index)
If the index is created on the primary key of the table, then it is known as a primary index.
These primary keys are unique to each record.
As primary keys are stored in sorted order, searching is efficient.
Student(RollNo, Name, Address, City, MobileNo)
CREATE INDEX idx_StudentRno
ON Student (RollNo);
In this example, the index is dense: the number of records in the index table is the same as
the number of records in the main table.
Each index record contains a search key value and a pointer to the actual record on the disk.
Index → Main table
103 → 103 Suresh
104 → 104 Mira
105 → 105 Nita
106 → 106 Om
107 → 107 Ajay
108 → 108 Amit
109 → 109 ravi
110 → 110 Nayan
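A dense index lookup is a binary search over the sorted search keys followed by one pointer dereference. In this sketch the "disk address" is simply a list position in a hypothetical in-memory `main_table`; real systems store block addresses instead.

```python
import bisect

# Main table records (list position stands in for the record's disk address).
main_table = [(103, "Suresh"), (104, "Mira"), (105, "Nita"), (106, "Om"),
              (107, "Ajay"), (108, "Amit"), (109, "ravi"), (110, "Nayan")]

# Dense index: one (search key, pointer) entry for every record, sorted by key.
dense_index = sorted((roll, pos) for pos, (roll, _) in enumerate(main_table))
index_keys = [key for key, _ in dense_index]


def lookup(roll_no):
    """Binary-search the dense index, then follow the pointer to the record."""
    i = bisect.bisect_left(index_keys, roll_no)
    if i < len(index_keys) and index_keys[i] == roll_no:
        pointer = dense_index[i][1]
        return main_table[pointer]
    return None   # no index entry means no such record


print(lookup(107))   # (107, 'Ajay')
print(lookup(111))   # None
```

Because every record has an entry, a missing key can be rejected from the index alone without touching the main table.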
Secondary Index (How to find a particular record?)
Example figure: a first-level (primary) index with entries 101 and 201, a second-level
(secondary) index with entries 101 and 111, and the main table (Rno, Name) containing
records such as 101 Raj, 102 ram, 111 Mira and 112 Nita.
If you want to find the record with roll number 112, the search first looks in the
first-level index for the highest entry that is smaller than or equal to 112. It gets
101 at this level.
Then, in the second-level index, it again finds the highest entry that is <= 112 and
gets 111. Using the address stored with 111, it goes to the data block and searches
each record until it finds 112.
This is how a search is performed in this method.
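The two-level search for roll number 112 can be sketched with `bisect`. The block names below are hypothetical, and the sketch assumes each second-level entry points to a data block that begins at that key. The figure's first-level entry 201 points to blocks that are not shown, so it is left out.

```python
import bisect


def floor_pos(keys, target):
    """Position of the largest key <= target, or -1 if every key is larger."""
    return bisect.bisect_right(keys, target) - 1


# Data blocks: only the records visible in the figure (hypothetical block names).
data_blocks = {
    "blk_101": [(101, "Raj"), (102, "ram")],
    "blk_111": [(111, "Mira"), (112, "Nita")],
}
# Second-level (sparse) index: one entry per data block, sorted by key.
second_level = {"sec_101": [(101, "blk_101"), (111, "blk_111")]}
# First-level index: one entry per second-level index block.
first_level = [(101, "sec_101")]


def two_level_search(roll_no):
    i = floor_pos([k for k, _ in first_level], roll_no)
    if i < 0:
        return None
    entries = second_level[first_level[i][1]]
    j = floor_pos([k for k, _ in entries], roll_no)
    if j < 0:
        return None
    for key, name in data_blocks[entries[j][1]]:   # scan the block record by record
        if key == roll_no:
            return name
    return None


print(two_level_search(112))   # Nita
```

Each level narrows the search to one block, so only the final data block needs a sequential scan.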
Indexing Methods (Types)
Clustering Index
Sometimes the index is created on non-primary key columns, which may not be unique for
each record. In this case, to identify records faster, we group two or more columns to get
a unique value and create the index from them. This method is called a clustering index.
The records which have similar characteristics are grouped, and an index entry is created
for each group.
Example (index on Dept):
Index → Main table (Dept, Name)
CE → CE Raj, CE ram
EE → EE Mira, EE Nita
EC → EC Ajay, EC Amit
ME → ME Nayan, ME dipen
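A clustering index stores one entry per group and relies on the records of each group being stored together. This sketch follows the figure and groups on the single Dept column; the function names are illustrative.

```python
# Main table from the example, stored so that records of the same Dept are adjacent.
table = [("CE", "Raj"), ("CE", "ram"), ("EE", "Mira"), ("EE", "Nita"),
         ("EC", "Ajay"), ("EC", "Amit"), ("ME", "Nayan"), ("ME", "dipen")]


def build_clustering_index(rows):
    """One index entry per group: Dept -> position of the group's first record."""
    index = {}
    for pos, (dept, _) in enumerate(rows):
        index.setdefault(dept, pos)      # later records of the same group are skipped
    return index


def find_by_dept(rows, index, dept):
    """Jump to the first record of the group, then read records while Dept matches."""
    pos = index.get(dept)
    if pos is None:
        return []
    names = []
    while pos < len(rows) and rows[pos][0] == dept:
        names.append(rows[pos][1])
        pos += 1
    return names


idx = build_clustering_index(table)
print(idx)                             # {'CE': 0, 'EE': 2, 'EC': 4, 'ME': 6}
print(find_by_dept(table, idx, "EC"))  # ['Ajay', 'Amit']
```

The index stays small (four entries for eight records) because it points only at the start of each group.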
B-tree
B-tree is a data structure that stores data in its nodes in sorted order. A sample B-tree
can be represented as follows.
Figure: sample B-tree (labels: Root Node, Intermediary Node, Leaf Node) with root key 11
and child nodes [3, 6] and [16, 20].
B-tree stores data in such a way that each node contains keys in ascending order.
A node with k keys has k + 1 references to child nodes, so each key sits between two
child references.
The keys in the child node to the left of a key are less than that key, and the keys in
the child node to its right are greater than that key.
B-tree (How to search a particular node?)
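Searching follows the ordering rule described above: start at the root, and if the key is not in the current node, descend into the child between the two keys that bracket it, until the key is found or a leaf is reached. This is a minimal sketch of that search on the sample tree; the `Node` class is a hypothetical in-memory representation, not a disk-based B-tree.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int]                                        # kept in ascending order
    children: List["Node"] = field(default_factory=list)   # empty list means a leaf


def btree_search(node: Node, key: int):
    """Return (found, path) where path lists the keys of each node visited."""
    path = []
    while True:
        path.append(node.keys)
        i = bisect_left(node.keys, key)        # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True, path                  # key is in this node
        if not node.children:
            return False, path                 # reached a leaf without finding it
        node = node.children[i]                # descend between keys[i-1] and keys[i]


# The sample tree: root [11] with children [3, 6] and [16, 20].
root = Node([11], [Node([3, 6]), Node([16, 20])])
print(btree_search(root, 16))   # (True, [[11], [16, 20]])
print(btree_search(root, 7))    # (False, [[11], [3, 6]])
```

Searching for 16 compares it with 11 at the root, goes right, and finds 16 in the leaf; searching for 7 goes left and stops at the leaf [3, 6] without a match.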
Dynamic hashing
Initially, a 1-bit bucket address is used: address 0 holds D1 and address 1 holds D2.
Then D3 has to be inserted.
But the problem is that no bucket address is left for D3.
The bucket address space has to grow dynamically to accommodate D3.
So the address is changed to use 2 bits rather than 1 bit, and the existing data is
updated to have 2-bit addresses.
Then D3 is accommodated:
00 → D1
01 → D2
10 → D3
11 → (empty)
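The growth step above can be sketched as follows. This is a simplified version under stated assumptions: one record per bucket, the address taken from the low-order bits of a hypothetical hash value, and the whole address space doubling with every record rehashed. Real extendible hashing splits only the overflowing bucket and tracks a local depth per bucket.

```python
class DynamicHash:
    """Simplified dynamic hashing: when the target bucket is full, the bucket
    address grows by one bit and every stored record is rehashed."""

    def __init__(self, bits=1, capacity=1, max_bits=16):
        self.bits = bits              # number of low-order hash bits used as the address
        self.capacity = capacity      # records per bucket (1, as in the example)
        self.max_bits = max_bits      # safety limit for hashes that never separate
        self.buckets = {a: [] for a in range(2 ** bits)}

    def address(self, h):
        return h & ((1 << self.bits) - 1)    # keep the low-order `bits` bits

    def insert(self, name, h):
        while len(self.buckets[self.address(h)]) >= self.capacity:
            self._grow()
        self.buckets[self.address(h)].append((name, h))

    def _grow(self):
        if self.bits >= self.max_bits:
            raise OverflowError("hash values collide on every address bit tried")
        records = [r for bucket in self.buckets.values() for r in bucket]
        self.bits += 1                                   # e.g. 1-bit -> 2-bit addresses
        self.buckets = {a: [] for a in range(2 ** self.bits)}
        for name, h in records:                          # move existing data to new addresses
            self.buckets[self.address(h)].append((name, h))

    def show(self):
        return {format(a, f"0{self.bits}b"): [n for n, _ in b] for a, b in self.buckets.items()}


# Hypothetical hash values chosen so the layout matches the example.
table = DynamicHash()
table.insert("D1", 0b00)
table.insert("D2", 0b01)
print(table.show())       # {'0': ['D1'], '1': ['D2']}
table.insert("D3", 0b10)  # bucket 0 is full, so addresses grow to 2 bits
print(table.show())       # {'00': ['D1'], '01': ['D2'], '10': ['D3'], '11': []}
```

D3's hash ends in bit 0, so with 1-bit addresses it collides with D1; with 2-bit addresses it lands in the free bucket 10.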
Questions asked in TU
1. Explain indexing and different types of indexes.
2. Explain binary tree, B-tree and B+ tree.
3. Explain hashing.
Database Management Systems (DBMS)
RAID 2: Hamming code example (4 data bits, 3 parity bits)
Bit positions 7 to 1 hold D7 D6 D5 P4 D3 P2 P1 (parity bits sit at positions 1, 2 and 4).
Transmitted bits
D7 D6 D5 P4 D3 P2 P1
1 0 1 0 1 0 1
Received bits (D6 flipped)
D7 D6 D5 P4 D3 P2 P1
1 1 1 0 1 0 1
Correcting the error: recompute each parity check over the received bits
(P1 over positions 1, 3, 5, 7; P2 over 2, 3, 6, 7; P4 over 4, 5, 6, 7).
P4 P2 P1
1 1 0
Read as a binary number, 110 = 6, so the error is at position 6 (D6), which is flipped back.
Corrected data is
D7 D6 D5 P4 D3 P2 P1
1 0 1 0 1 0 1
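The worked example can be reproduced with an even-parity Hamming(7,4) sketch. Words are stored as `{position: bit}` dicts so the code matches the D7 ... P1 numbering used above; the helper names are illustrative.

```python
# Positions 7..1 hold D7 D6 D5 P4 D3 P2 P1; parity bits sit at positions 1, 2 and 4.
def hamming74_encode(d3, d5, d6, d7):
    """Even-parity Hamming(7,4) codeword as a {position: bit} dict."""
    p1 = d3 ^ d5 ^ d7          # checks positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7          # checks positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7          # checks positions 4, 5, 6, 7
    return {1: p1, 2: p2, 3: d3, 4: p4, 5: d5, 6: d6, 7: d7}


def check_bits(word):
    """Recompute the three checks; read as binary P4 P2 P1 they give the error position."""
    c1 = word[1] ^ word[3] ^ word[5] ^ word[7]
    c2 = word[2] ^ word[3] ^ word[6] ^ word[7]
    c4 = word[4] ^ word[5] ^ word[6] ^ word[7]
    return c4, c2, c1


def correct(word):
    c4, c2, c1 = check_bits(word)
    pos = 4 * c4 + 2 * c2 + c1     # 0 means no single-bit error was detected
    fixed = dict(word)
    if pos:
        fixed[pos] ^= 1            # flip the bit at the reported position
    return fixed, pos


def as_row(word):
    """Format in the notes' order: D7 D6 D5 P4 D3 P2 P1."""
    return " ".join(str(word[p]) for p in range(7, 0, -1))


received = dict(zip(range(7, 0, -1), [1, 1, 1, 0, 1, 0, 1]))
print(check_bits(received))        # (1, 1, 0)
fixed, pos = correct(received)
print(pos, as_row(fixed))          # 6 1 0 1 0 1 0 1
```

Encoding the data bits D7=1, D6=0, D5=1, D3=1 with `hamming74_encode(1, 1, 0, 1)` gives the same codeword 1 0 1 0 1 0 1, confirming that the corrected word is the one that was transmitted.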