
Dbms Chapter 6

The document discusses different types of computer memory and storage strategies used in database management systems. It defines random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM). It also discusses the memory hierarchy and RAID levels 0 through 6, most of which provide data redundancy and allow recovery from disk failures through mirroring or parity information, as well as indexing, B-trees and hashing.

Uploaded by GplsHub.com
Copyright © All Rights Reserved

Database Management Systems (DBMS)

Unit-6
Storage Strategies

Sr. Lec. Sujan Shrestha


9801104103
[email protected]
Outline
• Record organization
• Disks and storage
• Remote backup system
• Hashing concepts, static and dynamic hashing
• Ordered Index (Indexing)
• B-tree index
What is Memory?
• Computer memory is any physical device capable of storing information temporarily or permanently.
• Types of memory
1. Random Access Memory (RAM) is a volatile memory that loses its contents when the computer or hardware device loses power.

2. Read-Only Memory (ROM) is a non-volatile memory: it keeps its contents even if the power is lost.

• A computer uses a special ROM holding the BIOS (Basic Input Output System), which permanently stores the software needed to access computer hardware such as the hard disk, load an operating system into RAM, and start executing it.
3. Programmable Read-Only Memory (PROM) is a memory chip on which you can store a program once. After the PROM has been programmed, you cannot wipe it clean and use it to store something else. Like ROMs, PROMs are non-volatile. A familiar analogue of this write-once behavior is a CD-R.

4. Erasable Programmable Read-Only Memory (EPROM) is a special type of PROM that can be erased by exposing it to ultraviolet light and then reprogrammed. A familiar analogue of this erase-and-rewrite behavior is a CD-RW.

5. Electrically Erasable Programmable Read-Only Memory (EEPROM) is a special type of PROM that can be erased electrically, by applying an electrical charge. Flash memory, used in pen drives, is a type of EEPROM.
What is Memory Hierarchy?
• The hierarchical arrangement of storage in current computer architectures is called the memory hierarchy.
• Its levels, from the top (faster, more expensive, less capacity) to the bottom (slower, cheaper, more capacity), are:
1. Register
2. L1 Cache
3. L2 Cache
4. Main Memory
5. Local secondary storage
6. Remote secondary storage
RAID
RAID stands for Redundant Array of Independent Disks. It is a technology used to combine multiple secondary storage devices for increased performance, data redundancy, or both.
Depending on the RAID level used, it lets the array survive one or more drive failures (RAID 0 survives none).
It consists of an array of disks in which multiple disks are connected to achieve different goals.

RAID technology
There are 7 standard RAID levels: RAID 0, RAID 1, ..., RAID 6.
These levels share the following characteristics:
• The array is a set of physical disk drives.
• The operating system views these separate disks as a single logical disk.
• Data is distributed across the physical drives of the array.
• Redundant disk capacity is used to store parity information (RAID 1 stores mirror copies instead, and RAID 0 has no redundancy).
• In case of a disk failure, the parity information can be used to recover the data.
Standard RAID levels
RAID 0
• RAID level 0 provides data striping, i.e., data is spread across multiple disks. Because it relies on striping alone, if one disk fails, all data in the array is lost.
• This level doesn't provide fault tolerance, but it increases system performance.
Example:
Disk 0   Disk 1   Disk 2   Disk 3
20       21       22       23
24       25       26       27
28       29       30       31
32       33       34       35
In this figure, the blocks in one row (e.g., 20, 21, 22, 23) form a stripe.
Instead of placing just one block on a disk at a time, we can place two or more consecutive blocks on a disk before moving on to the next one:
Disk 0   Disk 1   Disk 2   Disk 3
20       22       24       26
21       23       25       27
28       30       32       34
29       31       33       35
In both figures there is no duplication of data. Hence, a block, once lost, cannot be recovered.
Pros of RAID 0:
• Throughput is increased because multiple data requests are probably not on the same disk.
• This level fully utilizes the disk space and provides high performance.
• It requires a minimum of 2 drives.
Cons of RAID 0:
• It doesn't contain any error detection mechanism.
• RAID 0 is not a true RAID because it is not fault-tolerant.
• Failure of any one disk results in the loss of all data in the array.
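The block-to-disk mapping in the two RAID 0 figures can be sketched as follows. This is a minimal illustration, assuming logical blocks are numbered from 0 (so block 20 in the figures is logical block 0) and that `chunk` is the number of consecutive blocks placed on one disk before moving to the next; the function name `raid0_location` is hypothetical.

```python
def raid0_location(block, num_disks, chunk=1):
    """Map a logical block number to (disk, row) under RAID 0 striping.

    `chunk` is how many consecutive blocks go to one disk before moving
    on to the next disk (1 in the first figure, 2 in the second).
    """
    chunk_index, offset_in_chunk = divmod(block, chunk)
    stripe, disk = divmod(chunk_index, num_disks)
    row = stripe * chunk + offset_in_chunk
    return disk, row


first_block = 20  # numbering used in the figures
for label in (25, 29, 34):
    # prints: 25 (1, 1) (2, 1) / 29 (1, 2) (0, 3) / 34 (2, 3) (3, 2)
    print(label,
          raid0_location(label - first_block, num_disks=4, chunk=1),
          raid0_location(label - first_block, num_disks=4, chunk=2))
```

Each printed pair is (disk, row) and can be checked against the figures; for example, block 29 sits in row 3 of Disk 0 when two blocks are placed per disk.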
RAID 1
This level is called mirroring, as it writes every block to two drives. It provides 100% redundancy in case of a failure.
Example:
Disk 0   Disk 1   Disk 2   Disk 3
A        A        B        B
C        C        D        D
E        E        F        F
G        G        H        H
Only half of the total drive space is used to store data. The other half is just a mirror of the data already stored.
Pros of RAID 1:
• The main advantage of RAID 1 is fault tolerance. If one disk fails, its mirror automatically takes over.
• The array will continue to function even if any one of the drives fails.
Cons of RAID 1:
• One extra drive is required per drive for mirroring, so the expense is higher.
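A toy model of mirroring, assuming two in-memory "disks" (Python dicts) and a hypothetical `MirroredPair` class: every write goes to both disks, and a read is served by whichever disk is still working.

```python
class MirroredPair:
    """Toy RAID 1 pair: each block is written to both disks."""

    def __init__(self):
        self.disks = [{}, {}]          # block number -> data, one dict per disk
        self.failed = [False, False]

    def write(self, block, data):
        for disk, failed in zip(self.disks, self.failed):
            if not failed:
                disk[block] = data     # the same data goes to every working disk

    def fail(self, index):
        self.failed[index] = True
        self.disks[index].clear()      # simulate losing the disk's contents

    def read(self, block):
        for disk, failed in zip(self.disks, self.failed):
            if not failed:
                return disk[block]     # any surviving mirror can answer
        raise IOError("both mirrors have failed")


pair = MirroredPair()
pair.write(0, "A")
pair.write(1, "C")
pair.fail(0)                           # Disk 0 dies ...
print(pair.read(0), pair.read(1))      # ... but its mirror still returns: A C
```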
RAID 2
• RAID 2 consists of bit-level striping using Hamming code parity.

• In this level, each data bit in a word is recorded on a separate disk, and the ECC (Hamming code) bits of the data words are stored on a separate set of disks.

• Due to its high cost and complex structure, this level is not used commercially. The same performance can be achieved by RAID 3 at a lower cost.

Pros of RAID 2:
• Dedicated drives store the Hamming code bits.
• It uses the Hamming code for error detection and single-bit error correction.

Cons of RAID 2:
• It requires additional drives for the error-correcting code.
RAID 3
• RAID 3 consists of byte-level striping with dedicated parity. Parity is computed for each stripe and written to a dedicated parity drive.
• In case of a drive failure, the parity drive is accessed, and the data is reconstructed from the remaining devices. Once the failed drive is replaced, the missing data can be restored on the new drive.
• Data can be transferred in bulk, so high-speed data transmission is possible.
Disk 0   Disk 1   Disk 2   Disk 3
A        B        C        P(A, B, C)
D        E        F        P(D, E, F)
G        H        I        P(G, H, I)
J        K        L        P(J, K, L)

Pros of RAID 3:
• Lost data is regenerated using the parity drive.
• It provides high data transfer rates.
• Data is accessed in parallel.

Cons of RAID 3:
• It requires an additional drive for parity.
• It performs slowly on small files, because even a small request involves every drive in the array.
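Byte-level striping with a dedicated parity drive, and the rebuild of a failed drive, can be sketched as follows. This assumes three data drives held as Python `bytearray`s and parity computed with XOR (the function used in the parity discussion under RAID 4); `stripe_bytes` and `rebuild` are hypothetical helper names.

```python
from functools import reduce

DATA_DISKS = 3


def stripe_bytes(data, n=DATA_DISKS):
    """Byte i goes to data disk i % n; the parity disk holds the XOR of each stripe."""
    padded = data + bytes(-len(data) % n)          # pad with zero bytes to a full stripe
    disks = [bytearray(padded[d::n]) for d in range(n)]
    parity = bytearray(reduce(lambda a, b: a ^ b, stripe)
                       for stripe in zip(*disks))
    return disks, parity


def rebuild(disks, parity, failed):
    """Recompute the failed disk as the XOR of the parity disk and all surviving disks."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return bytearray(reduce(lambda a, b: a ^ b, column)
                     for column in zip(parity, *survivors))


disks, parity = stripe_bytes(b"ABCDEFGHIJKL")
print([bytes(d) for d in disks])   # [b'ADGJ', b'BEHK', b'CFIL'], matching the figure
lost = bytes(disks[1])
disks[1] = None                    # Disk 1 fails
restored = rebuild(disks, parity, failed=1)
print(restored == lost)            # True
```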
RAID 4
• RAID 4 consists of block-level striping with a dedicated parity disk. Instead of duplicating data, RAID 4 adopts a parity-based approach.
Disk 0   Disk 1   Disk 2   Disk 3
A        B        C        P0
D        E        F        P1
G        H        I        P2
J        K        L        P3
In this figure, we can observe one disk (Disk 3) dedicated to parity.
• This level allows recovery from at most 1 disk failure, due to the way parity works. If more than one disk fails, there is no way to recover the data.
• Level 3 and level 4 both require at least three disks to implement RAID.

In this level, parity can be calculated using an XOR function:
C1   C2   C3   C4   Parity
0    1    0    0    1
0    0    1    1    0
If the data bits are 0, 1, 0, 0, then the parity bit is XOR(0,1,0,0) = 1. If the data bits are 0, 0, 1, 1, then the parity bit is XOR(0,0,1,1) = 0. That means an even number of ones results in parity 0, and an odd number of ones results in parity 1.
Suppose that C2 in the table above is lost due to a disk failure. Because XOR-ing a value with itself gives 0, C2 = C1 XOR C3 XOR C4 XOR Parity, so we can re-compute the lost bit from the other columns and the parity bit: for the first row 0 XOR 0 XOR 0 XOR 1 = 1, and for the second row 0 XOR 1 XOR 1 XOR 0 = 0. This level therefore allows us to recover lost data.
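The parity table and the recovery of C2 can be checked with a few lines of Python. `parity_bit` and `recover` are hypothetical helper names, and each row holds the bits C1 to C4 from the table.

```python
from functools import reduce
from operator import xor


def parity_bit(bits):
    """Even parity: XOR of all bits (1 if the number of ones is odd)."""
    return reduce(xor, bits, 0)


def recover(row, parity, lost_index):
    """Rebuild one missing bit as the XOR of the surviving bits and the parity bit."""
    survivors = [b for i, b in enumerate(row) if i != lost_index]
    return parity_bit(survivors + [parity])


rows = [[0, 1, 0, 0], [0, 0, 1, 1]]   # C1 C2 C3 C4 from the table
for row in rows:
    p = parity_bit(row)
    print(row, "parity:", p, "recovered C2:", recover(row, p, lost_index=1))
```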
RAID 5
• RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID 5, the parity rotates among the drives.
• It consists of block-level striping with DISTRIBUTED parity.
• As in RAID 4, this level allows recovery from at most 1 disk failure. If more than one disk fails, there is no way to recover the data.
Disk 0   Disk 1   Disk 2   Disk 3   Disk 4
0        1        2        3        P0
5        6        7        P1       4
10       11       P2       8        9
15       P3       12       13       14
P4       16       17       18       19
This figure shows how the parity block rotates.
This level was introduced to improve random write performance: in RAID 4 every write must update the single parity disk, which becomes a bottleneck, whereas in RAID 5 the parity updates are spread across all disks.

Pros of RAID 5:
• This level is cost-effective and provides high performance.
• Parity is distributed across the disks in the array.
• It improves random write performance.

Cons of RAID 5:
• Recovery from a disk failure takes longer, as the lost data has to be calculated from all the remaining drives.
• This level cannot survive concurrent drive failures.
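The rotation in the figure can be generated programmatically. This is a minimal sketch that assumes the exact placement shown (parity for stripe s on disk n-1-s, wrapping around, with data blocks starting on the disk just after the parity disk); real controllers support several layouts, and `raid5_layout` is a hypothetical name.

```python
def raid5_layout(num_disks, num_stripes):
    """Return the block labels on each disk, row by row, for rotating parity."""
    data_per_stripe = num_disks - 1
    rows = []
    for stripe in range(num_stripes):
        row = [""] * num_disks
        parity_disk = (num_disks - 1 - stripe) % num_disks   # parity moves one disk left per stripe
        row[parity_disk] = f"P{stripe}"
        for k in range(data_per_stripe):
            block = stripe * data_per_stripe + k
            disk = (parity_disk + 1 + k) % num_disks         # data starts right after the parity disk
            row[disk] = str(block)
        rows.append(row)
    return rows


for row in raid5_layout(num_disks=5, num_stripes=5):
    print("  ".join(f"{cell:>3}" for cell in row))
```

The printed rows reproduce the figure, e.g. the second row is `5 6 7 P1 4`.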
RAID 6
• This level is an extension of RAID 5. It uses block-level striping with two independent parity blocks (P and Q) per stripe.
• In RAID 6, you can survive 2 concurrent disk failures. With RAID 5 (or a RAID 1 mirror pair), after a disk fails you must replace it, because if another disk fails before the rebuild is finished, you will not be able to recover the data. RAID 6 lets you survive two concurrent disk failures before you run out of options.
Disk 1   Disk 2   Disk 3   Disk 4
A0       B0       Q0       P0
A1       Q1       P1       D1
Q2       P2       C2       D2
P3       B3       C3       Q3

Pros of RAID 6:
• It can survive two concurrent disk failures.
• Data remains protected while the array is being rebuilt after a single disk failure.

Cons of RAID 6:
• The capacity of two drives is used for parity, and at least 4 drives are required.
• Writes are slower than in RAID 5, because two parity blocks must be updated for every write.
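Computing two independent parity values and rebuilding two lost disks can be sketched for a single byte per disk. This is an illustration under stated assumptions, not how any particular controller works: P is the plain XOR used in RAID 4 and 5, and Q weights data disk i by 2^i in the finite field GF(2^8) with reducing polynomial 0x11D (one widely used construction, e.g. in the Linux md RAID-6 driver). The helper names are hypothetical, and the example uses four data disks rather than the two per stripe in the figure.

```python
# Assumption: Q uses GF(2^8) arithmetic, polynomial 0x11D, generator 2.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    # Nonzero elements form a group of order 255, so a^254 is a's inverse.
    return gf_pow(a, 254)


def pq_parity(data):
    """P is the XOR of the data; Q XORs each byte after multiplying disk i's byte by 2^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data, x, y, p, q):
    """Rebuild data disks x and y (both lost) from the surviving disks, P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                          # leaves Dx ^ Dy
            qxy ^= gf_mul(gf_pow(2, i), d)    # leaves 2^x*Dx ^ 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx


data = [0x41, 0x42, 0x43, 0x44]              # one byte from each of four data disks
p, q = pq_parity(data)
damaged = [data[0], None, data[2], None]      # disks 1 and 3 fail at the same time
print(recover_two(damaged, 1, 3, p, q) == (0x42, 0x44))   # True
```

A single parity (P alone) gives one equation, enough for one unknown; the independent Q equation is what makes two simultaneous unknowns solvable.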
What is database Index?
• Indexes are special lookup tables that the database search engine can use to speed up data retrieval.
• A database index is a data structure that improves the speed of data retrieval operations on a database table.
• An index in a database is very similar to an index in the back of a book.
• Indexes are used to retrieve data from the database quickly. Users cannot see the indexes; they are just used to speed up searches/queries.
• Updating a table with indexes takes more time than updating a table without them (because the indexes also need to be updated). So, only create indexes on columns that will be frequently searched.
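Syntax to create and drop an Index
• Syntax to create an index:
CREATE INDEX index_name
ON table_name (column1, column2, ...);

• Example to create an index:
CREATE INDEX idx_studentname
ON Student (Studentname);

• Syntax to drop an index (SQL Server form; the syntax varies by DBMS, e.g. MySQL uses DROP INDEX index_name ON table_name, while Oracle, PostgreSQL and SQLite use DROP INDEX index_name):
DROP INDEX table_name.index_name;

• Example to drop an index (SQL Server):
DROP INDEX Student.idx_studentname;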
What is Indexing?
• Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.
• It is a data structure technique used to quickly locate and access the data in a database.
Structure of Index in database
• An index is built from a few database columns and has two fields:
search-key | pointer
• The first column is the search key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly.
• The second column is the data reference or pointer, which holds the address of the disk block where that particular key value can be found.

• Indexing techniques are evaluated by the following attributes:
• Access Types: the types of access supported, such as value-based search, range access, etc.
• Access Time: the time needed to find a particular data element or set of elements.
• Insertion Time: the time taken to find the appropriate space and insert new data.
• Deletion Time: the time taken to find an item and delete it, as well as update the index structure.
• Space Overhead: the additional space required by the index.
Indexing Methods (Types)
Index
• Primary (Dense or Sparse)
• Secondary
• Clustering
Primary Index (Ordered Index)
• If the index is created on the primary key of the table, then it is known as a primary index. Primary keys are unique to each record.
• As the primary keys are stored in sorted order, searching is quite efficient.
• Example: Student(RollNo, Name, Address, City, MobileNo)
CREATE INDEX idx_StudentRno
ON Student (RollNo);

• The primary index can be classified into two types:
• Dense index
• Sparse index

Exercise: Create a primary index for Employee(EID, Name, Address, City).


Dense Index
• In a dense index, there is an index record for every search-key value in the database.
• This makes searching faster but requires more space to store index records.
• The number of records in the index table is the same as the number of records in the main table.
• Index records contain the search-key value and a pointer to the actual record on the disk.

Index Table (Rno) -> Main Table (Rno, Name)
101 -> 101 Raj
102 -> 102 Ram
103 -> 103 Suresh
104 -> 104 Mira
105 -> 105 Nita
106 -> 106 Om
107 -> 107 Ajay
108 -> 108 Amit
109 -> 109 Ravi
110 -> 110 Nayan
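A dense index over the main table above can be sketched in Python, assuming the whole table fits in a list and a record's list position stands in for its disk-block pointer; `dense_lookup` is a hypothetical name.

```python
from bisect import bisect_left

# Main table; a record's list position plays the role of its disk address.
main_table = [(101, "Raj"), (102, "Ram"), (103, "Suresh"), (104, "Mira"), (105, "Nita"),
              (106, "Om"), (107, "Ajay"), (108, "Amit"), (109, "Ravi"), (110, "Nayan")]

# Dense index: one (search key, pointer) entry for every record, sorted by key.
dense_index = sorted((rno, pos) for pos, (rno, _) in enumerate(main_table))
keys = [key for key, _ in dense_index]


def dense_lookup(rno):
    """Binary-search the index, then follow the pointer straight to the record."""
    i = bisect_left(keys, rno)
    if i < len(keys) and keys[i] == rno:
        return main_table[dense_index[i][1]]
    return None


print(dense_lookup(106))   # (106, 'Om')
print(dense_lookup(111))   # None
```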


Sparse Index
• In a sparse index, index records are not created for every search key.
• An index record appears only for some of the items in the data file.
• It requires less space and less maintenance overhead for insertions and deletions, but it is slower than a dense index for locating records.
• To search for a record in a sparse index, we find the index entry with the largest value that is less than or equal to the value we are looking for.
• Starting from the record that entry points to, a linear search is performed to retrieve the desired record.
• In sparse indexing, as the size of the main table grows, the size of the index table also grows.

Index Table (Rno) -> Main Table (Rno, Name)
101 -> 101 Raj
       102 Ram
       103 Suresh
104 -> 104 Mira
       105 Nita
       106 Om
107 -> 107 Ajay
       108 Amit
       109 Ravi
110 -> 110 Nayan
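The sparse-index search described above (largest index key that is less than or equal to the target, then a sequential scan) can be sketched as follows, assuming an index entry for every third record as in the figure; `sparse_lookup` is a hypothetical name.

```python
from bisect import bisect_right

main_table = [(101, "Raj"), (102, "Ram"), (103, "Suresh"), (104, "Mira"), (105, "Nita"),
              (106, "Om"), (107, "Ajay"), (108, "Amit"), (109, "Ravi"), (110, "Nayan")]

# Sparse index: an entry only for every third record (101, 104, 107, 110).
sparse_index = [(main_table[pos][0], pos) for pos in range(0, len(main_table), 3)]
keys = [key for key, _ in sparse_index]


def sparse_lookup(rno):
    """Find the largest index key <= rno, then scan the table sequentially from there."""
    i = bisect_right(keys, rno) - 1
    if i < 0:
        return None                    # rno is smaller than every indexed key
    pos = sparse_index[i][1]
    while pos < len(main_table) and main_table[pos][0] <= rno:
        if main_table[pos][0] == rno:
            return main_table[pos]
        pos += 1
    return None


print(keys)                 # [101, 104, 107, 110]
print(sparse_lookup(106))   # (106, 'Om'), found by scanning 104, 105, 106
```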
Secondary Index (How to find a particular record?)
• In this scheme the index has two levels (a multilevel index): a small first-level (primary) index points into a larger second-level (secondary) index, which in turn points to the data blocks.
• To find the record with roll number 112, we first search the first-level index for the highest entry that is less than or equal to 112. This gives 101.
• Then, in the second-level index, we again take the highest entry that is less than or equal to 112, which is 111. Using the address stored with 111, we go to the data block and search each record until we reach 112.
• This is how a search is performed in this method.
• Inserting, updating or deleting is also done in the same manner.

Primary (first-level) index: 101, 201, 301, 401
Secondary (second-level) index: 101, 111, 201, 211, 301, 311
Main table (Rno, Name): 101 Raj, 102 Ram, 111 Mira, 112 Nita, 201 Ajay, 202 Amit, 211 Nayan, 212 Dipen
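The two-level search for 112 can be sketched as follows. This assumes each second-level entry points to a data block of two records and each first-level entry points to a group of second-level entries; entries 301, 311 and 401 are omitted because the figure does not show their records, and `two_level_lookup` is a hypothetical name.

```python
from bisect import bisect_right

main_table = [(101, "Raj"), (102, "Ram"), (111, "Mira"), (112, "Nita"),
              (201, "Ajay"), (202, "Amit"), (211, "Nayan"), (212, "Dipen")]

# Second level: (first key of a data block, position of that block in main_table).
second_level = [(101, 0), (111, 2), (201, 4), (211, 6)]
# First level: (first key of a group, position of that group in second_level).
first_level = [(101, 0), (201, 2)]


def two_level_lookup(rno):
    # Level 1: highest first-level key <= rno selects a group of second-level entries.
    i = bisect_right([k for k, _ in first_level], rno) - 1
    if i < 0:
        return None
    start = first_level[i][1]
    end = first_level[i + 1][1] if i + 1 < len(first_level) else len(second_level)
    group = second_level[start:end]
    # Level 2: highest second-level key <= rno selects the data block.
    j = bisect_right([k for k, _ in group], rno) - 1
    if j < 0:
        return None
    pos = group[j][1]
    # Scan the data block sequentially until the record is found.
    while pos < len(main_table) and main_table[pos][0] <= rno:
        if main_table[pos][0] == rno:
            return main_table[pos]
        pos += 1
    return None


print(two_level_lookup(112))   # (112, 'Nita'), via 101 (level 1) and 111 (level 2)
```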
Clustering Index
• Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
• In this case, the records are stored so that those with the same value of the indexed column are grouped together, and the index holds one entry per distinct value, pointing to the first record of its group. This method is called a clustering index.
• The records that have similar characteristics are grouped, and indexes are created for these groups.

Index Table (Dept) -> Main Table (Dept, Name)
CE -> CE Raj
      CE Ram
EE -> EE Mira
      EE Nita
EC -> EC Ajay
      EC Amit
ME -> ME Nayan
      ME Dipen
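A clustering index over the grouped table can be sketched with a dictionary, assuming the records are already stored grouped by Dept (as in the figure) and a record's list position stands in for its address; `names_in_dept` is a hypothetical name.

```python
# Main table stored grouped by Dept, as in the figure (list position = record address).
main_table = [("CE", "Raj"), ("CE", "Ram"), ("EE", "Mira"), ("EE", "Nita"),
              ("EC", "Ajay"), ("EC", "Amit"), ("ME", "Nayan"), ("ME", "Dipen")]

# Clustering index: one entry per distinct Dept, pointing to the first record of its group.
clustering_index = {}
for pos, (dept, _) in enumerate(main_table):
    clustering_index.setdefault(dept, pos)


def names_in_dept(dept):
    """Jump to the first record of the group, then read records while Dept still matches."""
    pos = clustering_index.get(dept)
    names = []
    while pos is not None and pos < len(main_table) and main_table[pos][0] == dept:
        names.append(main_table[pos][1])
        pos += 1
    return names


print(clustering_index)       # {'CE': 0, 'EE': 2, 'EC': 4, 'ME': 6}
print(names_in_dept("EC"))    # ['Ajay', 'Amit']
```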
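B-tree
• A B-tree is a data structure that stores data in its nodes in sorted order. A sample B-tree can be represented as follows.

Root node:          [11]
Intermediary nodes: [3, 6]   [16, 20]
Leaf nodes:         [1, 2]   [4, 5]   [7, 10]   [12, 13, 14]   [18, 19]   [24, 25]

([3, 6] is the left child of 11 and [16, 20] its right child; the leaves under [3, 6] are [1, 2], [4, 5] and [7, 10], and those under [16, 20] are [12, 13, 14], [18, 19] and [24, 25].)

• A B-tree stores data in such a way that each node contains keys in ascending order.
• Each key has two references: one to the child node on its left and one to the child node on its right.
• The keys in the left child node are less than the current key, and the keys in the right child node are greater than the current key.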
B-tree (How to search a particular node?)
• Suppose we want to search for 18 in the sample B-tree above.
• First, at the root node, 18 > 11, so we follow the right branch to the intermediary node [16, 20].
• In the intermediary node, 18 lies between 16 and 20, so we follow the branch between them.
• We are then redirected to the fifth leaf node, [18, 19]. Here the DBMS performs a sequential search to find 18.
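The search for 18 can be reproduced on the sample tree with a short recursive function. This is a sketch of B-tree lookup only (no insertion or node splitting); the `Node` class and `search` function are hypothetical names, and it uses binary search (`bisect_left`) inside each node where the text describes a sequential scan, which finds the same key.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)   # empty for leaf nodes


def search(node, key, path=None):
    """Return (found, list of key lists visited) for a B-tree search."""
    path = (path or []) + [node.keys]
    i = bisect_left(node.keys, key)                 # first key >= search key
    if i < len(node.keys) and node.keys[i] == key:
        return True, path
    if not node.children:                           # reached a leaf without finding it
        return False, path
    return search(node.children[i], key, path)      # child i holds keys between keys[i-1] and keys[i]


# The sample tree from the figure.
root = Node([11], [
    Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 10])]),
    Node([16, 20], [Node([12, 13, 14]), Node([18, 19]), Node([24, 25])]),
])

print(search(root, 18))   # (True, [[11], [16, 20], [18, 19]])
```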
Hashing
• For a huge database, searching through all levels of an index to reach the destination data block can be very slow.
• Hashing is a technique to compute the location of the desired data on the disk directly, without using an index structure.
• Data is stored in data blocks, also called data buckets, whose addresses are generated by applying a hash function to the search key.
• Hashing uses a hash function with the search key as its parameter to generate the address of a data record.
• Data bucket: Data buckets are the memory locations where the records are stored.
• Hash Function: A hash function is a mapping function that maps the set of search keys to actual record addresses. Generally, the hash function uses the primary key to generate the hash index, i.e., the address of the data block.
• Types of hashing methods are static hashing and dynamic hashing.
Static hashing
• In static hashing, the resultant data bucket address for a given key always remains the same.
• For example, if you generate an address for Student_ID = 10 using the hash function mod 3, the resultant bucket address will always be 1 (10 mod 3 = 1). So the bucket address never changes.
• Therefore, in static hashing, the number of data buckets in memory always remains constant.
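The mod 3 example can be written out directly. The extra Student_IDs (11, 12, 13, 22) are invented for illustration; they show that because the number of buckets is fixed, keys that hash to the same bucket simply pile up there.

```python
NUM_BUCKETS = 3                              # fixed for the life of the file


def bucket_address(student_id):
    """Static hash function from the example: Student_ID mod 3."""
    return student_id % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]   # the number of buckets never changes
for student_id in (10, 11, 12, 13, 22):
    buckets[bucket_address(student_id)].append(student_id)

print(bucket_address(10))   # 1, every time
print(buckets)              # [[12], [10, 13, 22], [11]]
```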
Dynamic hashing
• The drawback of static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks.
• In dynamic hashing, data buckets grow or shrink (are added or removed dynamically) as the number of records increases or decreases.
• A widely used form of dynamic hashing is extendible (also spelled extendable) hashing.
Dynamic hashing
• In dynamic hashing, the hash function is made to produce a large number of values (long bit strings), of which only a prefix is used as the bucket address.
• For example, there are three data records D1, D2 and D3.
• The hash function generates the addresses 0101, 1001 and 1010 respectively.
• This method considers only part of each address, initially only the first bit, to decide where to store the data.
• So it tries to load the three records into buckets 0 and 1:
Bucket 0: D1
Bucket 1: D2, D3
Dynamic hashing
• But, assuming each bucket can hold only one record, there is no room left in bucket 1 for D3.
• The buckets have to grow dynamically to accommodate D3.
• So the address is extended to 2 bits instead of 1, and the existing data is redistributed using 2-bit addresses:
Bucket 00: (empty)
Bucket 01: D1
Bucket 10: D2, D3
Bucket 11: (empty)
• D2 (1001) and D3 (1010) still share the prefix 10, so the address must grow once more, to 3 bits, before D3 gets a bucket of its own: D1 -> 010, D2 -> 100, D3 -> 101.
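The growth of the address prefix in this example can be simulated as follows. This is a deliberately simplified sketch, assuming 4-bit hash values given directly (as in the example) and one record per bucket: the whole address space grows by one bit and every record is redistributed on overflow. Real extendible hashing keeps a local depth per bucket and splits only the overflowing bucket. `PrefixHashTable` is a hypothetical name.

```python
class PrefixHashTable:
    """Simplified dynamic hashing: a record goes to the bucket named by the first
    `depth` bits of its hash value; when any bucket overflows, depth grows by 1
    and every record is redistributed."""

    def __init__(self, hash_bits=4, capacity=1):
        self.hash_bits = hash_bits      # length of the hash values (4 in the example)
        self.capacity = capacity        # records per bucket (assumed 1 here)
        self.depth = 1                  # start by using only the first bit
        self.buckets = {}               # prefix -> list of (record, hash value)

    def insert(self, record, hash_value):
        records = [r for bucket in self.buckets.values() for r in bucket]
        records.append((record, hash_value))
        while True:
            buckets = {}
            for rec, h in records:
                buckets.setdefault(h[:self.depth], []).append((rec, h))
            if all(len(b) <= self.capacity for b in buckets.values()):
                self.buckets = buckets
                return
            if self.depth == self.hash_bits:
                raise OverflowError("records share every hash bit; cannot split further")
            self.depth += 1             # grow the address by one bit and retry


table = PrefixHashTable()
for record, h in (("D1", "0101"), ("D2", "1001"), ("D3", "1010")):
    table.insert(record, h)
    print(record, "depth:", table.depth,
          {prefix: [r for r, _ in b] for prefix, b in sorted(table.buckets.items())})
```

After D3 is inserted the depth is 3 and the buckets are 010: D1, 100: D2, 101: D3, matching the reasoning above.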
Questions asked in TU
1. Explain indexing and different types of indexes.
2. Explain binary tree, B-tree and B+ tree.
3. Explain hashing.
RAID 2 (Hamming code example)
• RAID 2 stores Hamming code bits alongside the data. For 4 data bits, 3 parity bits are needed, giving a 7-bit code word:
Position: 7   6   5   4   3   2   1
Bit:      D7  D6  D5  P4  D3  P2  P1
• We maintain even parity. Each parity bit checks a group of positions:
P1 checks D3, D5, D7 (check 1 bit, skip 1 bit: positions 1, 3, 5, 7)
P2 checks D3, D6, D7 (check 2 bits, skip 2 bits: positions 2, 3, 6, 7)
P4 checks D5, D6, D7 (check 4 bits, skip 4 bits: positions 4, 5, 6, 7)

Encoding example: data bits D7 D6 D5 D3 = 1 0 1 1
P1 = D3 XOR D5 XOR D7 = 1 XOR 1 XOR 1 = 1
P2 = D3 XOR D6 XOR D7 = 1 XOR 0 XOR 1 = 0
P4 = D5 XOR D6 XOR D7 = 1 XOR 0 XOR 1 = 0
So we represent the data as:
D7 D6 D5 P4 D3 P2 P1
1  0  1  0  1  0  1

If any single bit is altered, the error can be detected and corrected.
Transmitted (received) bits, with one bit altered:
D7 D6 D5 P4 D3 P2 P1
1  1  1  0  1  0  1

P1 D3 D5 D7 = 1 1 1 1 -> XOR = 0: even parity holds, so there is no error in this group.
P2 D3 D6 D7 = 0 1 1 1 -> XOR = 1: parity fails, so the error is in P2, D3, D6 or D7.
P4 D5 D6 D7 = 0 1 1 1 -> XOR = 1: parity fails, so the error is in P4, D5, D6 or D7.

Correcting error
Recompute the check bits for even parity and write them in the order P4 P2 P1:
P4 P2 P1
1  1  0
Binary 110 = 6, so the error bit is in position 6, i.e., D6. (D6 is the only bit that is in both failing groups but not in the passing P1 group.)

Corrected data is
D7 D6 D5 P4 D3 P2 P1
1  0  1  0  1  0  1
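The encoding and correction steps of this example can be replayed in code. This is a sketch of a Hamming (7,4) code with even parity, assuming the position layout above (positions 7 to 1 = D7 D6 D5 P4 D3 P2 P1); `encode`, `syndrome` and `show` are hypothetical helper names.

```python
# Hamming (7,4) code with even parity; each parity position checks its group.
PARITY_GROUPS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def encode(d7, d6, d5, d3):
    """Return a {position: bit} code word with even parity in every group."""
    word = {7: d7, 6: d6, 5: d5, 3: d3}
    for p, group in PARITY_GROUPS.items():
        word[p] = sum(word[pos] for pos in group if pos != p) % 2
    return word


def syndrome(word):
    """Sum the parity positions whose group has odd parity: the position of the bad bit (0 = no error)."""
    return sum(p for p, group in PARITY_GROUPS.items()
               if sum(word[pos] for pos in group) % 2 == 1)


def show(word):
    return " ".join(str(word[pos]) for pos in range(7, 0, -1))


sent = encode(1, 0, 1, 1)
print(show(sent))                       # 1 0 1 0 1 0 1

received = dict(sent)
received[6] ^= 1                        # D6 is flipped in transit
print(show(received))                   # 1 1 1 0 1 0 1

error_position = syndrome(received)
print(error_position)                   # 6, i.e. D6
if error_position:
    received[error_position] ^= 1       # flip the bad bit back
print(show(received) == show(sent))     # True
```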
