DBMS Unit5


UNIT-5
STORAGE AND INDEXING
Marketed by: SIA GROUP

PART-A
SHORT QUESTIONS WITH SOLUTIONS
Q1. Explain the properties of indexes.
Answer:
The properties of indexes are as follows,
1. Indexes enhance the performance of the database.
2. They can retrieve the records in a particular sequential order, thereby reducing the workload of the database manager.
3. They are capable of addressing the requirements of the application program.
4. They take less time to locate the file records.
5. They eliminate the need to analyse each entry during query execution, i.e., they reduce the portion of the file that has to be searched.
6. They increase the speed of accessing the data records.
7. Indexes can be added or removed by the database designers without modifying the application logic, due to which the maintenance cost is reduced as the size of the database increases.
8. They make it possible to perform binary search on files of variable-length records.
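The last property above (binary search even when the records themselves have variable length) can be illustrated with a minimal sketch. It assumes a toy in-memory "file" of comma-separated text records of different lengths; the sample records and the names `key_of`, `index` and `lookup` are hypothetical. Because each index entry is a small fixed-size (key, position) pair kept in key order, a lookup binary-searches the index instead of examining every record.

```python
import bisect

# Hypothetical data file: variable-length records stored in arbitrary order.
# A record's position in the list plays the role of its record id.
records = [
    "102,Ravi,CSE,Hyderabad",
    "57,Anita,ECE,Warangal",
    "311,Kiran,MECH,Karimnagar",
    "89,Sita,CSE,Nizamabad",
]


def key_of(record):
    # Assumption: the search key is the first comma-separated field.
    return int(record.split(",", 1)[0])


# The index: fixed-size (key, position) entries kept in sorted key order.
index = sorted((key_of(r), pos) for pos, r in enumerate(records))
index_keys = [k for k, _ in index]


def lookup(key):
    """Binary-search the index instead of scanning every record."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return records[index[i][1]]
    return None
```

For example, `lookup(311)` returns the third record without reading the other three. A real DBMS keeps the index entries in disk pages rather than in a Python list, but the search idea is the same.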
Q2. When must we create a non-clustering index despite the advantages of a clustering index? Explain.
Answer:
In a clustered index, the records in a file are arranged in sequential order of the search key. But if the search key of the index defines an order which differs from the sequential order of the file, then such an index is known as an unclustered index. Both clustered and unclustered indexes have their own advantages and disadvantages. Clustered indexes make the data addition and modification process slower, as the data records also have to be kept sorted along with the entries in the index. In such situations, where data is frequently added or modified, we must create a non-clustering index.
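The cost difference described above can be sketched with Python lists standing in for the data file and the index (an assumption made only for illustration; a real DBMS moves records within and between disk pages). `insert_clustered` and `insert_unclustered` are hypothetical helper names, and records are assumed to be (key, value) tuples.

```python
import bisect


def insert_clustered(data_file, record):
    """Clustered organization: the data records themselves are kept in key
    order, so the new record is placed in position and every later record
    shifts. Returns how many existing records had to move."""
    pos = bisect.bisect_left([r[0] for r in data_file], record[0])
    data_file.insert(pos, record)
    return len(data_file) - pos - 1


def insert_unclustered(heap_file, index, record):
    """Non-clustering organization: the record is simply appended to the
    data file, and only a small (key, rid) entry is added to the sorted
    index. Returns how many existing records had to move (always 0)."""
    rid = len(heap_file)
    heap_file.append(record)
    bisect.insort(index, (record[0], rid))
    return 0
```

Inserting key 15 into a clustered file holding keys 10, 20 and 30 moves two records, while the non-clustering version moves none and only adds one index entry.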
Q3. What is Index and Hashing?
(May-19(R16), Q1)
Answer:
Indexing
Indexing refers to the process of finding a particular record in a file using one or more indexes, or of storing a record in any order (randomly on the disk).

[Figure: Indexing is classified into hash-based indexing and tree-based indexing]

Hashing
Hashing is a technique that makes use of a hash function for mapping pairs (entries) to the corresponding positions in a hash table. Ideally, each position of a hash table can store one pair. A pair with a key K is stored at position f(K) by the hash function f.
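The mapping of a pair with key K to position f(K) can be sketched as a small static hash file. The number of buckets and the hash function f(K) = K mod N are assumptions for illustration, since the text does not fix a particular hash function. Each position holds a list (a bucket), so keys that map to the same position can coexist; `HashFile` is a hypothetical class name.

```python
class HashFile:
    """Static hashing sketch: a record with key K is placed in bucket f(K)."""

    def __init__(self, num_buckets=4):
        self.num_buckets = num_buckets
        # One list per hash-table position; each list is a bucket of (key, record) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def f(self, key):
        # Assumed hash function for integer keys: f(K) = K mod N.
        return key % self.num_buckets

    def insert(self, key, record):
        self.buckets[self.f(key)].append((key, record))

    def search(self, key):
        # Only bucket f(K) is examined, not the whole file.
        for k, record in self.buckets[self.f(key)]:
            if k == key:
                return record
        return None


h = HashFile(4)
h.insert(17, "Ravi")
h.insert(21, "Anita")
h.insert(6, "Kiran")
```

With four buckets, keys 17 and 21 both map to bucket 1, so a search for 21 examines only that bucket rather than the whole file.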
SIA GROUP | ALL-IN-ONE JOURNAL FOR ENGINEERING STUDENTS | JNTU-HYDERABAD
DATABASE MANAGEMENT SYSTEMS [JNTU-HYDERABAD], UNIT-5: Storage and Indexing
Q4. What is the difference between indexing and hashing?
(Model Paper, Q1 | April-18(R16), Q1)
Answer:
S.No | Indexing | Hashing
1 | It is a technique of using a table or a data structure to determine the location of rows in a file. | It is a technique of using a hash function to determine the location of rows in a file, in which the address of the disk block containing the desired record is obtained directly.
2 | In indexing, performance degrades as the file grows. | In hashing, a bad hash function can result in lookup time proportional to the number of search-key values in the file.
3 | The lookup time is proportional to the log of the number of values in the database relation. | The average lookup time of hashing is constant and is independent of the database size.
4 | The values are stored sequentially in sorted order. | The values are scattered randomly across many buckets.
5 | Suitable for queries that look up records based on a range of key values. | Suitable for queries that look up records based on a single key value.

Q5. What is primary and secondary indexing?
(April-18(R16), Q1)
OR
Discuss about primary indexes.
(Refer Only Topic: Primary Index)
OR
What is meant by secondary index?
(May-15(R13), Q1)
(Refer Only Topic: Secondary Index)
Answer:
Primary Index
An index that is defined on the ordering key field of a file is called a primary index. The ordering key field is used to place the file records sequentially on the disk blocks, where each record is guaranteed to have a unique value for that field. The primary index follows the same ordering as that of the file. It is an ordered file consisting of two fields. The first field has the same data type as the ordering key field, called the primary key of the data file. The second field is a pointer to a disk block (block address).

Secondary Index
An index that is defined on a non-ordering field of the data file is called a secondary index. Its ordering differs from that of the file. Like a primary index, it is also an ordered file consisting of two fields. The first field has the same data type as the non-ordering field of the data file. The second field is a pointer to a disk block.

Q6. When would users use a tree-based index?
(Model Paper-I, Q1)
Answer:
Tree-based indexing is used in the following situations,
1. Data ordering is required.
2. Search, insert and delete operations are to be performed.
3. Some data structure is employed for searching.
4. Data elements/records are stored at leaf nodes to obtain fast access/search.
5. Searches are to be performed in multidimensional access methods.
6. Geometric shapes need to be stored.
7. Range as well as equality searches need to be performed.
8. Base tables need to be accessed.
9. Automatic reorganization of the files is required.
10. The index file degradation issue is to be solved.
11. Good space utilization is needed, i.e., fewer tree nodes are to be used.
12. The search-key value is to be determined before reaching the leaf nodes.

Q7. Differentiate between 'primary indexing' and 'secondary indexing'.
Answer:
S.No | Primary Index | Secondary Index
1 | An index that contains a primary key is called the primary index. | An index that does not contain a primary key is called a secondary index.
2 | It does not have duplicates, i.e., it has unique values. | It may have duplicates.
3 | It consumes less storage space. | It consumes more storage space.
4 | The time required to search a particular record is less. | The time required to search a particular record is more.
5 | An index entry is created for each block. | An index entry is created for each record present within the data file.
6 | A primary index can also be referred to as a sparse index. | A secondary index can also be referred to as a dense index.
7 | Block anchors (the first record in each data block) can be used. | Block anchors cannot be used.
8 | The file records are physically ordered based on the primary key field. | The file records are not physically ordered, due to the absence of the primary key field.

Q8. Explain what are the differences between tree-based and hash-based indexes.
(May-19(R16), Q1)
Answer:
Tree-based Indexing
In this type of indexing, the records are arranged in a tree-like structure. The data entries are sorted according to the search key values and are arranged in a hierarchical structure (a hierarchical search data structure), so as to find the correct page of the data entries.

Hash-based Indexing
Hashing is an organization approach in which the desired records can be found quickly, based on the search key value. In this type of indexing, the file records are grouped into BUCKETS; a bucket contains a primary page along with additional pages that are chained together. To determine which bucket a record belongs to, a special function called a HASH FUNCTION is applied to the search key value. Given the bucket number, the primary page of the respective bucket can be retrieved in one or more disk I/O operations.

Q9. Why are tree-structured indexes good for searches, especially range selections?
(May-17(R15), Q1)
Answer:
There are two tree data structures called ISAM and B+ tree. The main focus of these tree structures is on the insertion and deletion of data entries.
For example, consider a file of employee records, and assume that the file is sorted by the gpa field. If a query to be processed includes a range selection such as "Find all employees with gpa greater than 2.5", then it can be processed by applying binary search to find the first such employee and then scanning the file from that point onward. Such query processing can be expensive, since the cost of binary search is proportional to the number of pages fetched.
An alternative approach is to create a second file with one record per page in the original (data) file, in the form of index entries <first key on the page, pointer to the page>, and then sort this file by the search key (the gpa field in our example). Each key acts as a separator for the contents of the pages pointed to by the pointers to its left and right, so an index page contains one more pointer than the number of keys. The query then uses this simple one-level index: binary search is applied to the index file, and the data file is scanned from the page found. Because an entry in the index file is smaller than a page in the data file, and there is only one entry in the index file per page of the data file, the index file is much smaller than the data file and can be searched much faster.
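The one-level index idea in the range selection example above (employees with gpa greater than 2.5) can be sketched as follows. The sketch assumes the data file is already sorted on gpa and represents each page as a Python list; `pages`, `index` and `gpa_greater_than` are hypothetical names. The index holds one entry per data page (the first gpa on that page), a binary search on this small index picks the starting page, and the data file is scanned from there.

```python
import bisect

# Hypothetical data file: pages of (name, gpa) records, sorted by gpa.
pages = [
    [("A", 1.9), ("B", 2.1), ("C", 2.4)],
    [("D", 2.5), ("E", 2.7), ("F", 2.8)],
    [("G", 3.0), ("H", 3.3), ("I", 3.9)],
]

# One-level index: one entry per data page, holding the first gpa on that page.
index = [page[0][1] for page in pages]  # [1.9, 2.5, 3.0]


def gpa_greater_than(threshold):
    """Range selection 'gpa > threshold' using the one-level index."""
    # Binary search the small index for the last page whose first key is
    # <= threshold; every earlier page holds only keys <= threshold.
    start = max(bisect.bisect_right(index, threshold) - 1, 0)
    result = []
    # Scan the data file from that page onward.
    for page in pages[start:]:
        for name, gpa in page:
            if gpa > threshold:
                result.append(name)
    return result
```

`gpa_greater_than(2.5)` starts the scan at the second page and returns `['E', 'F', 'G', 'H', 'I']`; the first data page is never read.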
Q10. Differentiate between sparse and dense indices.
(Model Paper, Q1)
Answer:
S.No | Dense Index | Sparse Index
1 | An index entry is available for every search-key value in the file. | An index entry is available for only some of the search-key values.
2 | It is applicable whether or not the relation is sorted on the search key. | It is applicable only when the relation is stored in sorted order of the search key.
3 | It is simple. | It is complex.
4 | It consumes less time to locate a record. | It consumes more time to locate a record.

Q11. Differentiate between clustered and unclustered indices.
Answer:
S.No | Clustered Index | Unclustered Index
1 | A file organization in which the order of the data records is the same as the order of the data entries in some index is called a clustered index. | A file organization in which the order of the data records is not the same as the order of the data entries in some index is called an unclustered index.
2 | There can be only one clustered index on a data file. | There can be several unclustered indexes on a data file.
3 | The rids in the qualifying data entries point to a contiguous collection of records, so only a few pages need to be retrieved. | The rids in the qualifying data entries point to distinct data pages, so several pages need to be retrieved.
4 | The data addition and modification process is slower. | The data addition and modification process is not as slow as with a clustered index.
5 | A clustered index is relatively expensive to maintain when the file is updated. | An unclustered index is cost effective.

Q12. What is the relationship between files and indexes?
(Model Paper-I, Q1)
Answer:
A file is a collection or sequence of records that forms a disk-based data structure, whereas an index is a list of keys or keywords that can be built and destroyed. An index is allocated onto a file in order to speed up the searching and retrieval of records that satisfy search conditions on the search key fields of the index.

Q13. What are the advantages and disadvantages of B+-trees?
(Nov./Dec.-18(R16), Q1)
Answer:
Advantages of B+-trees
The advantages of B+-trees are as follows,
1. It offers moderate performance for direct access.
2. It offers fast searching.
3. It offers exceptional performance for range and sequential accesses.
Disadvantages of B+-trees
The disadvantages of B+-trees are as follows,
1. It leads to memory wastage when duplicate key values are searched.
2. It has a complicated insertion process.
3. It has a complicated deletion process.

Q14. What is the main difference between ISAM and B+-tree indexes?
(May-17(R15), Q1)
Answer:
S.No | ISAM | B+-tree
1 | ISAM (Indexed Sequential Access Method) is a static indexing structure. | B+-tree is a dynamic indexing structure.
2 | Applicable for static files. | Applicable for dynamic files.
3 | The leaf pages are allocated sequentially. | The leaf pages are allocated randomly.
4 | Due to the static size of ISAM, overflow chains may frequently occur. | Due to the dynamic size of B+-trees, overflow chains rarely occur.
5 | Only leaf pages can be modified. | Leaf as well as index-level pages can be modified.
6 | Scanning is done more efficiently. | Scanning is done less efficiently.
7 | Insertions lead to long overflow chains. | Insertions are handled elegantly without overflow chains.
8 | The number of nodes to be examined is equal to the height of the tree plus the number of overflow pages. | The number of nodes to be examined is equal to the height of the tree.
9 | The performance of ISAM is less efficient. | The performance of B+-trees is more efficient.
10 | Locking overhead of ISAM is less. | Locking overhead of B+-trees is more.

Q15. How to compute the disk access time?
(May-15(R13), Q1)
Answer:
Disk access time is computed by using the following equation.
Access time = Seek time + Latency + Transfer time
(or)
Access time = Disk controller processing time + Rotational delay + Transfer time
Where,
Seek time (or) disk controller processing time refers to the amount of time required by a hard disk controller to search for a particular block of stored data.
Rotational delay (latency) refers to the time taken to position the proper sector under the read/write head.
Transfer time refers to the time taken to complete the data transfer.
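The access-time equation above can be turned into a small worked calculation. The drive parameters used below are hypothetical, and two further assumptions are made: the average rotational delay is taken as half a revolution, and the transfer rate is given in MB/s with 1 MB = 10^6 bytes. `disk_access_time_ms` is a hypothetical helper name.

```python
def disk_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Access time = seek time + rotational delay + transfer time, in ms.

    Assumptions: average rotational delay = half a revolution;
    1 MB = 10**6 bytes for the transfer rate.
    """
    ms_per_revolution = 60_000 / rpm
    rotational_delay_ms = ms_per_revolution / 2
    transfer_ms = block_bytes / (transfer_mb_per_s * 10**6) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms
```

For a 9 ms seek, a 7200 rpm disk and a 4 KB (4096-byte) block read at 100 MB/s, the rotational delay is about 4.17 ms and the transfer time about 0.04 ms, giving an access time of about 13.21 ms. Seek time and rotational delay dominate the total for a single block.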
PART-B
ESSAY QUESTIONS WITH SOLUTIONS

5.1 DATA ON EXTERNAL STORAGE

Q16. List the physical storage media available on the computers you use routinely. Give the speed with which data can be accessed on each medium.
(Model Paper, Q10)
Answer:
Physical Storage Media
The different physical storage media that are used in our daily life are,
1. Magnetic disks
2. Optical disks.

1. Magnetic Disks
Magnetic disks are the most commonly used secondary storage medium. The advantages of these disks over magnetic tapes are,
(a) They provide high storage capacity.
(b) They are more reliable.
(c) They can directly access the stored data.
A magnetic disk comprises a circular plate made of either plastic or metal. This plate is coated with a magnetic oxide layer. Data on these disks is stored as magnetized or demagnetized spots: a bit value of 1 is represented by a magnetized spot and a value of 0 by a demagnetized spot. To carry out the read operation, the data present on the magnetized surface is converted into electrical impulses, which are then sent to the processor for execution. The write operation, on the other hand, is carried out by converting those electrical impulses into magnetic spots. A magnetic disk is enclosed inside a protective shield in order to protect it from dust and other interference.

Storage Organization of Magnetic Disk
Data is organized on a disk by dividing the surface of the disk into various regions. The disk surface is divided into many imaginary tracks and sectors. Tracks are the concentric circles along which the data is stored. Sectors are fixed-size areas that are accessible by the read/write heads of any disk drive. A track sector refers to the area where a track and a sector intersect. Every sector carries certain identification information at its beginning, referred to as the sector header. In order to access data from a disk, the proper location path has to be specified. This path is a combination of the surface, track and sector numbers. The capacity of each sector can exceed 512 bytes of data. To ensure that the data stored on the disk is free from errors, the sectors maintain a provision for error detection and error correction. The sectors on the disk surface do not exist in succession; rather, they are separated by suitable gaps called inter-sector gaps. These gaps are useful in locating the desired sector on a given track.

Accessing Data from Magnetic Disk
Data is stored on the circular tracks by using multiple read/write heads. These heads are capable of accessing the data simultaneously. The read/write heads are mounted on the access arm assembly, which can be moved both inwards and outwards. Following are the steps performed while accessing data from a magnetic disk.
(i) Seek: In this step, the read/write head is placed on the desired track. The time taken to position the heads from one track to a specific track on the disk is referred to as seek time. The seek time may range from 6 to 1… milliseconds.
(ii) Latency: In this step, the read/write head associated with the desired platter is activated, and it has to wait until the required sector comes under it. This waiting time is referred to as latency time or rotational delay. The latency time ranges from 4.2 to 6.7 milliseconds.
(iii) Data Transfer: In this step, the read/write head moves the data from the disk to the primary memory. The rate at which data is read from the disk or written onto the disk is referred to as the data transfer rate, and it is measured in kilobits per second.
Access time is the combination of all three times, i.e.,
Access time = Seek time + Latency time + Data transfer time
Memory access time can be defined as the time taken to transfer a single character from memory to the processor or from the processor to memory. On the other hand, disk access time is the time taken to position the read/write head over the desired data available on the disk. The access time of RAM is 80 nanoseconds and that of a hard disk is 12 to 19 milliseconds.

Magnetic disks can be classified into two types, (a) Floppy disk and (b) Hard disk.

(a) Floppy Disk
A floppy disk is a circular piece of mylar plastic. This disk is coated with ferric oxide, and it is enclosed in a plastic cover that acts as a protective shield. Floppy disks are read and written by a floppy disk drive, which is responsible for carrying out all the operations like rotating the disk, reading data from the disk and writing data onto the disk. Floppy disks were basically used in personal computers to perform software distribution, data transfer and small backups. There were two types of floppy disks that were used,
5¼-inch: This floppy disk has very limited storage space, and its rate of data transfer is very slow when compared to the other disk.
3½-inch: To overcome the problems faced while using the 5¼-inch floppy disk, the 3½-inch floppy disk was developed. This floppy disk has 1.44 MB of storage space, and its data transfer rate is high.

Storage Organization of Floppy Disk
For performing a read/write operation, a floppy disk needs to be inserted into the floppy disk drive. When the disk is inserted, the system drive holds it and rotates it inside a plastic jacket. A set of levers is pressed whenever a disk is inserted. One lever is responsible for opening the metal plate, while the other levers and gears are responsible for moving the two read/write heads. The heads move until they come in contact with the disk on either side. Signals are received by the circuit board. These signals include data as well as the instructions needed to perform the read/write operation. If the signal signifies a write operation, then it is the responsibility of the circuit board to first check that no light is visible in the floppy disk drive. If light is detected by the photo sensor present on the opposite side of the floppy disk, then the disk controller comes to know that the disk is write-protected, because of which data cannot be recorded.
A motor present below the disk rotates a shaft. This shaft comes in contact with a notch on the disk, which causes the disk to rotate. A magnetic field is created by electrical pulses whenever the read/write head is positioned correctly. The magnetic field is created in either one of the read/write heads, in order to write data on either the top or the bottom of the disk. The read operation is carried out when the magnetic field produced by the metallic particles on the disk is converted into electrical impulses, which are transferred to the computer.
Data Access Speed
The access time of a floppy disk drive is 12-25 ms.

(b) Hard Disk
A hard disk is the primary storage unit of the computer system, also known as a hard drive or fixed disk. This disk comprises disk platters. These platters are made of aluminium and have a magnetic material coating. It also consists of a protective layer that protects the disk.
There are different types of hard disk that differ with respect to their storage capacity. A hard disk holds data ranging from several hundreds of megabytes to several gigabytes. The speed of a hard disk is measured in terms of access time.

Storage Organization of Hard Disk
Information is stored in the form of magnetic patterns by using circular, flat platters. These platters are made either of glass or of metal coated with a special material on both sides. The information is stored in tracks, which are in turn divided into small sectors. The platters are rotated by a spindle motor that is connected to the spindle on which the platters are mounted. The read/write operation is carried out by a special type of electromagnetic read/write device present on the sliders, which are mounted onto actuator arms. These arms are connected into a single assembly and placed over the disk surface. The positioning of the arms is done by a device called the actuator.
The write operation is carried out in the same way as in the floppy disk. The disk surface consists of an array of magnetized and demagnetized dots. Binary 1 is used to represent a magnetized dot and binary 0 a demagnetized dot. In order to read data from the disk, the Virtual File Allocation Table (VFAT) or File Allocation Table (FAT) is initially read into the Windows operating system (these tables are set up when the hard disk is partitioned and formatted). Reading the VFAT or FAT helps the operating system in knowing the sector and the track in which the data is stored. Using this information, the read/write head can perform the read operation. The entire data stored on the disk is read sequentially.
In a hard disk, the read/write head does not come in contact with the disk surface (i.e., it floats slightly off the surface). The distance between the read/write head and the disk surface is less than the thickness of a human hair. If the read/write head comes in contact with the disk surface, a head crash is caused. In order to reduce the chances of a head crash, many disk controllers place the read/write head onto a track which is unused.
Data Access Speed
The access time of a hard disk is 12-19 ms.

2. Optical Disks
An optical disk is a form of external storage device, which is most widely used today for storing large volumes of data such as multimedia data (audio, video). The advantage of an optical disk is that a massive volume of data can be stored in very little space. There are optical disks available in the market that differ in their sizes and storage capacities. As the storage capacity of an optical disk is large, the cost of storing a single bit is very low.
One of the popular optical storage devices is the Compact Disk Read Only Memory (CD-ROM), which is a round, flat piece of plastic. This disk is coated with a material on which data is written in the form of highly reflective and less reflective areas. The stored data may be read from the disk by using a laser diode. In order to read data from a CD-ROM, a drive called a CD-ROM drive is required. The drawback of a CD-ROM is that it does not allow a user to write data into it; it only allows the user to read data written by the manufacturer.
Look for the SlA GROUP LOGo
on the TITLE CovER before you buy NLIN-ONE JOURNAL FOR ENGINEERING
STuDENTS
NTU-HYDERABAD
DATABASE MANAGEMENT SYSTEMS JNTU-HYn Indexir
exing
5.8
UNIT-5
Storage and
Following two reels are used for operatine Systems
Therefore, another version, WORM (Write Once Read themaghe Disk Space through OS File data that is read 5.9
any) 1s used for storing archival data. This disk allows the tape, MaB ing nerating system can also manage the disk words, the unit from or written
to disk (in other
space. of I/O) is considered
data to be written only once but can be read several times. Ihe (a) Supply reel sequence of bytes. The operating system or 8 kB. The cost page a page whose s
files as of 1/O i.e., the cost pages size is4 kB
data written can neither be erased nor overwritten. wORM nas Take-up reel. quests such as Read byte b. of hief into of written from
(b) to main memory and read f
longer life-span when compared to other devices. sates i ruction: "Read block of rack t of cylinderitsaSK from main memory
Themovement of the tape is done from evel c usually high than the cost of typical to disk is
Data Access Speed supply insules can also be used to built the database disk cost can be reduced database operations.
This
the take-up reel. The side of the magnetic tape ifa optimized databases
The access time of optical disk is 8-12 ms. with magnetic oxide is passed to the read/write hesCo
teel
whi er
of disk For example, the entire data is stored in one or 1se system is built.

Q17. Explain about tertiary storage media in detail. Whenever the tape comes beneath the read/write
head, en s
p fles and the block are allocated and initialize (by OS)
these fil Now,
this is the job of the disk space manager to
5.2
FILE ORGANIZATION
CLUSTER INDEXES,
AND INDEXING,
Answer read/write operation can be performned. e for
anage the space these OS files.
in SECONDARY
PRIMARY AND
INDEXES
Tertiary Storage Media ii) Magnetic Disks
Bu
nv database systems do not relay on
the OS file Z0. LIst several ways of organizingrecords a
lertiary storage is a type of storage media which lies at For answer refer Unit-V, Q16, Topic: Magnetic and perform their own way or disk management, One of Explain sequential file organization. in file.
the bottom level of the storage device hierarchy, organized Disk sons is that the DBMS may want to access a single file
inQ18. Write a detailed note on disk space managen
accordance to the speed and cost of devices. Followingg are the hnse size is greater than the maximum size of a file which is OR
nent
two devices of tertiary storage media, system. State and explain various file organisation
Answer 3 ApriuMay-12,
Set-4,Q8 GBonthe 32-bit
) Optical disk Disk Space Management 019. How the data is stored external storage?
in methods. Give suitable examples to
each them.
(i) Magnetic tapes The disk space manager manages space (Model Paper-1l, Q10(@) | April-11, Set-1, a2(a)Answer
on disk. The Answer Nov/Dec.-18R16), Q10
(ii) Magnetic disks. space manager uses a page as a unit of data, which
commands to allocate or deallocate a page provides
di Database consists of large volumes of data that cannotOrganizing Records in File
hestored into the main memory. Such persistent data is stored
a
Optical Disk and read or write File organizationisamechanism of physically arringing
page. The page size is cqual to the size of a disk DBMS on some external storage devices like diskS ks and
and
block and the or organizing the records of a file onto a secondary storage
For answer refer Unit-V, Q16, Topic: Optical Disks. pages are stored as disk blocks which
requires one disk inpul uupes. Disks provide random access of data and tapes provide se-
devices such as magnetic disk, tapes or CD-ROM. Some
(i) output to perform the reading or writing of a page. quential access of data. The cost of accessing the data randomly ile of the
Magnetic Tapes organizations supported by DBMS inchude,
more than accessing the data sequentially. The data stored
Magnetic tapes are plastic tapes that have a magnetic The sequence.of pages are stored as contiguous block is in
the disks is usually in the form of files. These files consist I. Sequential file organization
coating around it. In such tapes, data is stored in the form to hold the data that is frequently accessed in sequential order of records that
have a unique identifier known as a record id or 2.
of smal portion of magnetized and demagnetized layer. Heap file organization
The This is advantageous for sequentially accessing disk block id This identifier is used to determine the address of the page
magnetized portion signifies the bit value as 1 whereas the This capability must also be provided to the higher layers 3. Hash (or) direct file organization
demagnetized portion signifies the bit value as 0. There are the DBMS by the disk space manager.
ofie, the record in which the page is stored.
different types of magnetic tapes, each of which differ in their Consider, a database that consists of I million pages. To 4 Indexed sequential file organization.
sizes and their speed (with which the tape moves the read/write The disk space manager hides all the underlying CKecute certain query, the entire database need to
be scanned. 1. Sequential File Organization
hardware details and make the higher layers to think of data as Ifthe main memory contain only 1000
head). Magnetic tapes also differ with respect pages to hold the data
to recording collection of pages. In sequential file organization, records are stored in
density that specifies the amount of data that can
be stored on t
then becomes impossible to bring all the
data into th
a linear inch of tape.
The advantage of magnetic tapes is that they are very durable. Magnetic tapes can be erased and reused many number of times. They are very reliable and are inexpensive when compared to other secondary storage devices.

Magnetic tapes are sequential in nature and cannot perform random access. Data is transmitted at a very slow speed compared to magnetic disks.

Storage Organization of Magnetic Tapes
Magnetic tapes are divided into vertical columns referred to as frames and horizontal rows referred to as tracks. Data is organized in the form of a column string, with one data item per frame. One frame can store one byte of data, and each individual track stores a single bit of that byte. The remaining track is treated as a parity track. When a write operation is performed, the number of 1's in the byte is counted, and a parity bit is appended in order to make the parity even or odd. In even parity the number of 1's is even, and in odd parity the number of 1's is odd. When a read operation is performed, the parity bit is checked to know whether any bit has been lost.
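The even/odd parity rule above can be sketched in a few lines of Python. This is only an illustration: a frame is modelled as an integer holding one 8-bit data byte, and the function names `parity_bit` and `frame_is_intact` are hypothetical.

```python
def parity_bit(byte: int, even: bool = True) -> int:
    """Return the parity bit to store alongside an 8-bit data byte."""
    ones = bin(byte & 0xFF).count("1")      # count the 1's in the byte
    if even:
        return ones % 2                     # makes the total number of 1's even
    return 1 - ones % 2                     # makes the total number of 1's odd


def frame_is_intact(byte: int, stored_parity: int, even: bool = True) -> bool:
    """On a read, recompute the parity and compare it with the stored bit."""
    return parity_bit(byte, even) == stored_parity


data = 0b1011_0010                          # four 1's
p = parity_bit(data)                        # even parity bit is 0
print(p, frame_is_intact(data, p))          # 0 True
corrupted = data ^ 0b0000_0100              # one bit flipped on the tape
print(frame_is_intact(corrupted, p))        # False
```

A single parity bit detects any odd number of flipped bits in a frame; if two bits flip, the count of 1's keeps its parity and the error goes unnoticed.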
Handling of Free Blocks
The disk space manager keeps track of the space on disk. The database may grow or shrink when insertion and deletion operations are performed on it. To manage the disk space, the disk space manager keeps track of the used disk blocks as well as which pages are on which disk blocks. A deletion operation on the disk may create "holes". There are two ways to determine block usage,
1. Using a list of free blocks
2. Using a bitmap.

1. Using a List of Free Blocks
In this method, whenever blocks are deallocated, they are added to the list for future reference. A pointer to the first block on the free list is stored in a known location on the disk.

2. Using Bitmap
The bitmap maintains one bit for each disk block, which indicates whether the block is in use or not. This helps in identifying and allocating contiguous areas on disk very fast, which is very difficult to do with the linked list.
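A minimal sketch of the bitmap method, assuming one bit per block with 1 meaning "in use" (the bit convention and the class name `BitmapAllocator` are hypothetical). It shows why a bitmap makes contiguous allocation easy: the allocator only has to look for a run of consecutive 0 bits.

```python
class BitmapAllocator:
    """Tracks block usage with one bit per disk block (1 = in use, 0 = free)."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)   # all blocks start free

    def in_use(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _set(self, block: int, used: bool) -> None:
        mask = 1 << (block % 8)
        if used:
            self.bits[block // 8] |= mask
        else:
            self.bits[block // 8] &= ~mask & 0xFF

    def allocate_contiguous(self, count: int) -> int:
        """Find `count` consecutive free blocks, mark them used, return the first."""
        run_start, run_len = 0, 0
        for block in range(self.num_blocks):
            if self.in_use(block):                   # run broken by a used block
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._set(b, True)
                return run_start
        raise RuntimeError("no contiguous run of free blocks")

    def free(self, block: int) -> None:
        self._set(block, False)                      # deallocated block becomes a hole


alloc = BitmapAllocator(16)
a = alloc.allocate_contiguous(4)    # blocks 0-3
b = alloc.allocate_contiguous(3)    # blocks 4-6
alloc.free(1)                       # creates a one-block hole
c = alloc.allocate_contiguous(2)    # the hole is too small, so blocks 7-8 are used
print(a, b, c)                      # 0 4 7
```

With a linked free list, finding such a run would mean following block pointers one by one, which is why the text calls it difficult.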
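memory one at a time. Thus, whenever required, the DBMS brings data into the memory for processing. But, if there is no free space in the main memory, then some existing page in main memory must be replaced by the new page by adopting a certain replacement policy. In this way, the DBMS can bring data into the main memory for processing. The DBMS components that read and write data from main memory are,

I. Buffer Manager
The buffer manager is a software layer whose major responsibility is to fetch pages from the disk into the main memory whenever it receives a request from the files and access methods layer. The main memory slots that hold pages are referred to as frames, and pages are identified based on the page id associated with them. Whenever a requested page is not found in the main memory, the buffer manager fetches it from the disk.

II. Disk Space Manager
The disk space manager is the software layer that is used to allocate or deallocate space on the disk, when new records in a file are to be written on the disk or when a block on the disk is no longer in use. The disk space manager keeps the information regarding all the pages processed by the files and access methods layer. Once a page has been processed, the corresponding disk space is freed, which can be used again when a new page has to be stored.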
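The fetch-on-miss behaviour of the buffer manager can be illustrated with a toy buffer pool. The text only says that a page is replaced by adopting a certain policy; least-recently-used (LRU) is assumed here purely for illustration, and the class and attribute names are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool whose frames hold pages fetched from a simulated disk.

    LRU replacement is an assumption made for this sketch.
    """

    def __init__(self, disk: dict, num_frames: int):
        self.disk = disk                       # page id -> page contents
        self.num_frames = num_frames
        self.frames = OrderedDict()            # page id -> contents, oldest first
        self.disk_reads = 0

    def fetch(self, page_id: int):
        if page_id in self.frames:             # hit: page already in main memory
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) == self.num_frames:
            # No free frame: evict the least recently used page and write it
            # back (a real manager would write back only modified pages).
            victim, contents = self.frames.popitem(last=False)
            self.disk[victim] = contents
        self.disk_reads += 1                   # miss: read the page from disk
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {pid: f"page-{pid}" for pid in range(10)}
bm = BufferManager(disk, num_frames=2)
for pid in [1, 2, 1, 3, 2]:
    bm.fetch(pid)
print(list(bm.frames), bm.disk_reads)      # [3, 2] 4
```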
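a particular sequence (i.e., ascending or descending) based on the search key values. Basically, a search key need not be a primary key or even a superkey; it can be a single attribute or a set of attributes. In this type of organization, all the records are stored consecutively (contiguously) onto a physical storage device. The operations that can be performed on sequential file records are,
(a) Search
This operation is performed so as to locate a particular record by using a binary search technique on the search key value.
(b) Insert
In order to perform the insert operation, it is necessary to ensure whether there is a free space in the file block. This can be done by performing a search operation prior to the insert operation. If there is a free space, the new record can be inserted in sequential order; otherwise, an overflow block is created and the record is inserted in it.
(c) Delete
In order to perform the delete operation, it is necessary to ensure whether the record to be deleted exists in the file. If the desired record exists, then it is deleted from the file block.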
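The search and insert steps of a sequential file can be sketched with Python's `bisect` module, assuming the whole file fits in a list kept in search-key order (an idealisation; a real file spans disk blocks). The record tuples are copied from the student example in this unit, and only the first field, the search key, is used.

```python
from bisect import bisect_left

# Records kept in ascending order of the search key (first field).
records = [
    (21, 11, "B.C.A", "O.U"),
    (22, 13, "M.C.A", "O.U"),
    (24, 15, "B.Tech", "JNTU"),
    (28, 11, "B.C.A", "O.U"),
    (30, 19, "M.Tech", "JNTU"),
    (32, 11, "B.C.A", "O.U"),
]
keys = [r[0] for r in records]


def search(key: int):
    """Binary search on the search key; returns the record or None."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None


def insert_position(key: int) -> int:
    """0-based slot that keeps the file in search-key order."""
    return bisect_left(keys, key)


print(search(28))           # (28, 11, 'B.C.A', 'O.U')
print(search(27))           # None
print(insert_position(27))  # 3
```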
Sequential file organization supports the concept of pointers, using which different records are linked together. The advantage of using pointers is that they allow the user to perform fast retrieval of records in search key order. Every record is assigned a pointer that contains the address of the next sequential record in the file. This sort of file organization helps to perform read operations on the records in a certain sorted order. For example, the file structure generated after sequentially arranging the "student" records using pointers is as follows,

21  11  B.C.A   O.U
22  13  M.C.A   O.U
24  15  B.Tech  JNTU
28  11  B.C.A   O.U
30  19  M.Tech  JNTU
32  11  B.C.A   O.U
Figure: Sequential File Organization for Student Records

In the above structure, the pointer in each record points to the address of the next sequential record.

Sequential file organization also reduces block accesses during sequential file processing, by storing the records physically on the storage devices in search key order. However, it is difficult to maintain the physical sequential order whenever insertions and deletions are performed. This is because of the high cost incurred in moving several records when a single insertion or deletion is performed. Therefore, the concept of pointer chains can be used while performing the deletion operation. On the other hand, the following two rules must be applied while performing the insertion operation,
1. Search for the record which is placed before the record that is to be inserted. This search must be performed using the search key order.
2. Check whether there exists any free space for inserting the record. If there is sufficient space, then insert the record. Otherwise, insert the respective record in an overflow block. All the records in the overflow block are linked together using pointers.

Example
If a new record is to be inserted after record 2, then an overflow block is used, because the file doesn't contain any free space. After appending the overflow block, record 2 now points to the address of the new record, which in turn points to the address of its next sequential record, i.e., record 3.

21  11  B.C.A   O.U
22  13  M.C.A   O.U
24  15  B.Tech  JNTU
28  11  B.C.A   O.U
30  19  M.Tech  JNTU
32  11  B.C.A   O.U
Overflow block:
27  17  …       O.U
Figure: Sequential File Organization with an Overflow Block
The efficiency of sequential file organization depends on the type of query being processed. If a simple query is being processed for a specific record that is searched using a record key, then such a query requires the file to be searched sequentially until the desired record is located. On average, this retrieval requires half of the file to be read, which makes it inefficient and time consuming. On the other hand, if a batch query involving multiple records is being processed, then such a query can be processed within a single pass. This is done by initially sorting all the records of the batch on their search key values. This retrieval improves the efficiency as well as reduces the cost of processing.

Advantages of Sequential File Organization
(i) The time consumed for retrieving records based on the search key is very less.
(ii) It is very easy to access the next sequential record using pointers.
(iii) It has the ability of creating automatic backup copies of the file.

Disadvantages of Sequential File Organization
(i) The time taken to search for a specific record in a large file is very high.
(ii) The entire file needs to be scanned while performing multiple key retrieval.
(iii) A new file needs to be created while performing insertion updates.
(iv) Insertion and deletion are expensive, as the records need to remain in physical sequential order.

Heap File Organization
Heap file organization organizes the file records in a random order, i.e., the records are stored in the file in the order in which they are created. The operations that can be performed on heap file records are,
(a) Search
This operation is performed so as to locate a particular record by using a linear search technique on the search key value.
(b) Insert
In order to perform the insert operation, it is necessary to ensure whether there is a free space in the file block. This can be done by performing a search operation prior to the insert operation. If there is a free space, then the new record can be inserted.
(c) Delete
In order to perform the delete operation, it is necessary to ensure whether the record to be deleted exists in the file. If the desired record exists, then it is deleted from the file block.
Example
Consider an Employee records file carrying one page of data. If the capacity of a page is 4 records and it carries 3 records, then inserting one more record makes the page full. So, additional insertions require a new page.

Emp_ID  NAME  Date of Joining
101     ABC   15-10-2005
102     PQR   17-12-2011
103     XYZ   12-1-2016
Table: Employees Records

If two more records are inserted into the above file, a new page is created and the last record is stored in the second page, as follows,

Emp_ID  NAME  Date of Joining
Page 1:
101     ABC   15-10-2005
102     PQR   17-12-2011
103     XYZ   12-1-2016
104     MNO   6-7-2017
Page 2:
105     STU   21-5-2019
Table: Employees Records File After Inserting New Records
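A minimal sketch of the heap-file behaviour in this example, assuming a page capacity of 4 records as stated above; the function names are hypothetical. Records are appended in arrival order, and search is linear because nothing is sorted.

```python
PAGE_CAPACITY = 4   # records per page, as in the example


def heap_insert(pages: list, record) -> None:
    """Append in arrival order; open a new page when the last one is full."""
    if not pages or len(pages[-1]) == PAGE_CAPACITY:
        pages.append([])
    pages[-1].append(record)


def heap_search(pages: list, emp_id: int):
    """Linear search: heap records are unordered, so every page may be scanned."""
    for page_no, page in enumerate(pages, start=1):
        for record in page:
            if record[0] == emp_id:
                return page_no, record
    return None


pages = []
for rec in [(101, "ABC", "15-10-2005"), (102, "PQR", "17-12-2011"),
            (103, "XYZ", "12-1-2016"), (104, "MNO", "6-7-2017"),
            (105, "STU", "21-5-2019")]:
    heap_insert(pages, rec)
print(len(pages), [r[0] for r in pages[1]])   # 2 [105]
print(heap_search(pages, 105))                # (2, (105, 'STU', '21-5-2019'))
```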
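Hash or Direct File Organization
Hash file organization organizes the file records in a random order, based on a hash function which is computed for every search key value. The operations that can be performed on hash file records are,
(a) Search
This operation is performed so as to locate a particular record by computing the hash function on the search key value.
(b) Insert
This operation is performed by initially searching for the bucket in which the record is to be inserted. This is done by computing the hash function. Once the bucket with the required space is found, the new record can be inserted. However, if there is not enough space in the file block, then a new overflow block is created and is chained with the respective file block.
(c) Delete
This operation is performed by initially searching for the bucket in which the record is present. This is done by computing the hash function. Once the bucket with the desired record is found, the respective record can be deleted from the file block, thereby creating a free space that can be used for inserting another record.

In addition to these file organizations, there is another file organization referred to as "Multitable clustering file organization", wherein the interrelated records of different relations are stored in the same file.

Example
Consider the employees table as an example,

E_ID  Name    E_SAL
E_29  Jack    2000
E_36  Donald  4000
E_45  John    9000
E_85  Mickey  4500
E_55  Rhino   5000
E_92  Smike   3500

Assume that the file information is stored in different blocks (buckets), and each block contains multiple records. For example, consider the salary attribute as the search key on which the desired record is located.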
Bucket 0: 9000, 4500
Bucket 1: (empty)
Bucket 2: 2000, 5000, 3500
Bucket 3: (empty)
Bucket 4: 4000
Bucket 5: (empty)
Figure: Buckets for (E_SAL) MOD 6

In this figure, there are six buckets, numbered from 0 to 5. The table contains 6 records, and the desired record is accessed by using the hash function (E_SAL) MOD 6. If the desired record has salary 2500 (for instance, while evaluating the range condition 2500 < E_SAL < 4000), the value 2500 initially undergoes modulo division by 6. The remainder is 4; therefore, the record to be determined is searched for in bucket 4. Consider another salary value, 2000, and modulo divide it by 6. The remainder is 2; therefore, the corresponding record will be at bucket 2. This process is applicable for every record in the employees table.
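The bucket numbers in this example can be checked with a short Python sketch of the hash function (E_SAL) MOD 6. Modelling the buckets as an in-memory dictionary is an assumption for illustration.

```python
NUM_BUCKETS = 6   # hash function from the example: E_SAL mod 6

employees = [("E_29", "Jack", 2000), ("E_36", "Donald", 4000),
             ("E_45", "John", 9000), ("E_85", "Mickey", 4500),
             ("E_55", "Rhino", 5000), ("E_92", "Smike", 3500)]


def bucket_of(salary: int) -> int:
    return salary % NUM_BUCKETS


buckets = {b: [] for b in range(NUM_BUCKETS)}
for emp in employees:
    buckets[bucket_of(emp[2])].append(emp)

for b, recs in buckets.items():
    print(b, [e[2] for e in recs])
# 0 [9000, 4500]
# 1 []
# 2 [2000, 5000, 3500]
# 3 []
# 4 [4000]
# 5 []
print(bucket_of(2500), bucket_of(2000))   # 4 2
```

Because MOD hashing scatters neighbouring salaries across buckets, a range condition such as 2500 < E_SAL < 4000 cannot be answered from a single bucket; every bucket would have to be checked.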

Indexed Sequential File Organization
Indexed sequential file organization is an organization that enables the user to access the records both sequentially and directly. In this organization, records are stored on physical storage devices like magnetic disks along with an associated index. Generally, the primary key is used to order the records that are stored on the disk, and the primary key part of every record is stored in the associated index on the disk. This index responds to user queries by accessing the records directly as well as sequentially. Of these two kinds of access, direct access is preferred over sequential access.

Example
Figure: Indexed Sequential File Organization (a cylinder index listing the highest key on each cylinder; a track index for each of cylinders 1 to 4 listing the highest key on each track; the tracks holding the records in key order)

In this example, the records that are available on the disk are arranged sequentially. In sequential access, the desired record is searched by proceeding from the starting record to the ending record. In direct access, records are accessed directly: the desired key value is compared with the highest key values in the cylinder index. Based on this comparison, the record is searched in a specific cylinder. If the desired key value falls within a specific track in that cylinder's track index, then the key value is searched in that track, and the associated record available in that track is returned as the result.

For example, if a key 46 is to be searched, initially 46 is compared with the highest key values in the cylinder index. Depending on the result, the next searching process is performed on the track index of the selected cylinder, where 46 is compared with the highest key values of the tracks. If a matching track is found, then that particular track is accessed. Here, 46 is the highest key value of its track, so the corresponding record that holds the key value 46 is accessed.
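The direct-access path (cylinder index, then track index, then a single track) can be sketched as follows. The key values and the layout of three cylinders with two tracks each are hypothetical, since the figure's values are not reproduced here; only the lookup procedure follows the text.

```python
from bisect import bisect_left

# Hypothetical layout: cylinder -> list of tracks, each track a sorted list of keys.
cylinders = [
    [[2, 5, 9], [12, 15, 20]],        # cylinder 1
    [[24, 30, 38], [41, 44, 46]],     # cylinder 2
    [[50, 57, 63], [70, 72, 80]],     # cylinder 3
]
# The cylinder index and the track indexes store the highest key of each unit.
cylinder_index = [cyl[-1][-1] for cyl in cylinders]              # [20, 46, 80]
track_indexes = [[trk[-1] for trk in cyl] for cyl in cylinders]


def direct_lookup(key: int):
    """Cylinder index -> track index -> scan of one track only."""
    c = bisect_left(cylinder_index, key)     # first cylinder whose highest key >= key
    if c == len(cylinder_index):
        return None
    t = bisect_left(track_indexes[c], key)   # first track whose highest key >= key
    if key in cylinders[c][t]:
        return c + 1, t + 1                  # 1-based cylinder and track numbers
    return None


def sequential_scan():
    """Sequential access: read every record from the first to the last."""
    return [k for cyl in cylinders for trk in cyl for k in trk]


print(direct_lookup(46))    # (2, 2)
print(direct_lookup(47))    # None
```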
Advantages of Indexed Sequential File Organization
1. It is possible to access the records both sequentially and directly.
2. The accessing speed of records is high.

Disadvantages of Indexed Sequential File Organization
1. It is difficult to maintain and search the indexes.
2. It occupies more storage space.
Q21. Explain about fixed-length file organization with an example.
Answer
Fixed-length File Organization
Fixed-length file organization is a way of arranging fixed-length records within a file. Basically, fixed-length records are records that have the same fixed number of bytes and the same number of fields. In this type of records, the record slots are uniform and are organized in a sequential manner within the file. The primary advantage of fixed-length records is that it is very easy to perform insertion and deletion. This is because the space created by deleting a record is equal to the space required for storing the new record. However, these records waste a lot of memory if the default record size is set greater than the actual size of the record.

Example
Consider a file of "Student" records for a college database. Every record in the file is defined in the following manner,

type course = record
    stu_id number(10,2);
    c_id char(10);
    name char(20);
    university_name char(30);
end

For the above record, let us assume that the space occupied by the numeric and character data types is 8 bytes and 1 byte (per character) respectively. This implies that a student record is of size 68 bytes (8 + 10 + 20 + 30). In order to store all the student records in a file, a simple approach is to reserve 68 bytes for every individual record (i.e., reserve 68 bytes for the student 1 record, 68 bytes for the student 2 record, and so on). Despite being easily implemented, this approach encounters the following problems:
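The 68-byte layout can be checked with Python's `struct` module. This sketch assumes the numeric field is stored as an 8-byte double (`d`) and each character field as a fixed-width, NUL-padded byte string; the sample values are hypothetical.

```python
import struct

# stu_id as an 8-byte numeric, then fixed-width character fields.
STUDENT = struct.Struct("<d10s20s30s")   # '<' = standard sizes, no padding
print(STUDENT.size)                       # 68


def pack_student(stu_id, c_id, name, university):
    # struct pads short byte strings with NUL bytes up to the field width
    return STUDENT.pack(stu_id, c_id.encode(), name.encode(), university.encode())


def unpack_student(raw: bytes):
    stu_id, *fields = STUDENT.unpack(raw)
    return (stu_id, *(f.rstrip(b"\x00").decode() for f in fields))


def slot_offset(i: int) -> int:
    """Record i always starts at byte i * 68 of the file."""
    return i * STUDENT.size


file_bytes = b"".join(pack_student(21 + n, "C1", f"Student{n}", "O.U") for n in range(3))
print(len(file_bytes), slot_offset(2))   # 204 136
print(unpack_student(file_bytes[slot_offset(2):slot_offset(3)]))
# (23.0, 'C1', 'Student2', 'O.U')
```

Because every slot has the same size, the position of record i is a simple multiplication, which is what makes fixed-length files easy to manage.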
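1. Deletion Overhead
If a record is to be deleted from the fixed-length block structure, then it is necessary to initially (prior to deletion) determine whether it is possible to reuse the space of the deleted record for storing other records. If it is possible, then space utilization is done by moving the record which follows the deleted record into the space created after deletion. Next, all the remaining records of the file are moved ahead in a similar fashion.

Record 0   21  11  B.C.A   O.U
Record 1   22  13  M.C.A   O.U
Record 2   24  15  B.Tech  JNTU
Record 3   28  11  B.C.A   O.U
Record 4   30  19  M.Tech  JNTU
Record 5   32  11  B.C.A   O.U
Figure (1): File with Student Records

Record 0   21  11  B.C.A   O.U
Record 1   22  13  M.C.A   O.U
Record 2   24  15  B.Tech  JNTU
Record 4   30  19  M.Tech  JNTU
Record 5   32  11  B.C.A   O.U
Figure (2): File Structure Generated after Deleting Record 3

In the above example, the deletion of record 3 resulted in moving all the records following record 3 (i.e., record 4 and record 5), such that each record is moved ahead and inserted into the space that was occupied by its corresponding previous record. However, this way of reusing the space is highly inefficient, because it requires a larger number of records to be relocated. Therefore, another approach is to move the last record of the file and insert it within the space formerly occupied by the deleted record.

Record 0   21  11  B.C.A   O.U
Record 1   22  13  M.C.A   O.U
Record 2   24  15  B.Tech  JNTU
Record 5   32  11  B.C.A   O.U
Record 4   30  19  M.Tech  JNTU
Figure (3): File Structure Generated by Moving the Last Record after Deleting Record 3

In the above structure, the last record, i.e., record 5, is moved and inserted into the space formerly occupied by the deleted record, i.e., record 3. However, such moving of records requires additional block accesses, and insertions are performed more frequently than deletions. Therefore, it is preferable to leave the space created after deleting a record as it is and to wait for a subsequent insertion to reuse the space. Though it is possible to use a marker on deleted records so as to ignore those records while processing, it is not considered an effective approach, because of the difficulty incurred in searching for available free space in order to perform new insertions.

2. Block Access Overhead
If the block size is not an exact multiple of the fixed record size, then some records need to be stored across multiple blocks, i.e., part of a record is stored in one block and the rest in another. Due to this, an overhead is incurred, since more than one block must be accessed while performing read or write operations on these records.

In order to overcome these overhead issues, it is necessary to introduce additional structures, like a file header and a free list, within the existing block structure. The "file header" is a header which is placed at the beginning of every file. This header consists of information related to the respective file, such as the address of the first deleted record. This record then stores the address of the second deleted record, which in turn stores the address of the next deleted record, and so on. This way of chaining the deleted records creates a linked list called the "free list". The stored addresses can be considered as pointers, as they point to the address of the next deleted record. If a user wishes to perform an insertion operation, then they must initially access the record pointed to by the file header. The new record is inserted within that available space. After insertion, the pointer address of the header is updated such that it now points to the next available deleted record. If no space is available for inserting a record (i.e., if no records are deleted or if the spaces of all deleted records are utilized), then the new record is inserted at the end of the file.

header → Record 1
Record 0   21  11  B.C.A   O.U
Record 1   (deleted, points to Record 3)
Record 2   24  15  B.Tech  JNTU
Record 3   (deleted, end of free list)
Record 4   30  19  M.Tech  JNTU
Record 5   32  11  B.C.A   O.U
Figure (4): File Structure Generated Using Free List After Deleting Records 1, 3

In the above file structure, the file header points to the address of the first deleted record (i.e., record 1), which in turn points to the address of the second deleted record (i.e., record 3), thereby creating a free list of deleted records.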
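A minimal sketch of the file header and free list, assuming slot numbers stand in for record addresses and a deleted slot is marked with the tuple `("FREE", next)`; the class name is hypothetical. Deleting record 3 and then record 1 reproduces the chain header → record 1 → record 3 from Figure (4).

```python
class FixedLengthFile:
    """Fixed-length slots with a file header that heads a free list."""

    def __init__(self, records):
        self.slots = list(records)
        self.free_head = None              # file header: first deleted slot, or None

    def delete(self, i: int) -> None:
        # The deleted slot stores the address of the previous first free slot.
        self.slots[i] = ("FREE", self.free_head)
        self.free_head = i

    def insert(self, record) -> int:
        if self.free_head is None:         # no deleted slot: append at end of file
            self.slots.append(record)
            return len(self.slots) - 1
        i = self.free_head
        self.free_head = self.slots[i][1]  # header now points to the next free slot
        self.slots[i] = record
        return i

    def free_list(self):
        out, i = [], self.free_head
        while i is not None:
            out.append(i)
            i = self.slots[i][1]
        return out


f = FixedLengthFile([21, 22, 24, 28, 30, 32])   # stu_id of records 0-5
f.delete(3)
f.delete(1)
print(f.free_list())                             # [1, 3]
print(f.insert(27), f.insert(29), f.insert(35))  # 1 3 6
```

No records are moved on deletion; the cost is one pointer update in the header, which is the point of the free-list approach.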
Q22. Explain about variable-length file organization with an example.
Answer
Variable-length File Organization
Variable-length file organization is a way of arranging variable-length records within a file. Basically, variable-length records are records of multiple sizes. In contrast to fixed-length records, variable-length records incur some overhead while performing insertion and deletion operations. This is because of the difference between the space created after deleting a record and the space required for inserting a record. While inserting a variable-length record, it is possible that the entire space may be left unused (if the size of the new record is greater than the available space) or the space may be partially filled (if the size of the new record is less than the available space).

Variable-length file organization is used for organizing databases that store data of varying size. However, such relational databases impose a restriction on the size of a record, such that it is less than or equal to the block size. This helps in simplifying buffer management and free space management. There are different ways in which variable-length records arise in a database system. In general, variable-length records arise from,
(i) Storing multiple record types in a file.
(ii) Record types in which it is possible to define variable-length fields.
(iii) Record types in which it is possible to reuse the same field multiple times.
Slotted-page structure is a technique employed for implementing variable-length records. This structure is basically used to store records in a block. A slotted page consists of a header which is placed at the beginning of every individual block. The header stores the information regarding,
(a) The number of record entries present in the header
(b) The end of free space in the block
(c) An array within the block that contains entries specifying the location and size of each record.
Block header (# entries, end of free space, size and location of each record) | Free space | Records
Figure (1): Slotted Page Structure

The allocation of records within the block is done in a sequential manner starting from the end of the block, such that the first entry in the header specifies the size and location of the last record in the block. After allocating all the records of a file, the free space left is placed between the last entry in the block header and the first record. Whenever a record is to be inserted, space is allocated for the new record at the end of the free space. After inserting the record, an entry containing the size and location of the respective record is appended to the header. On the other hand, if a record is deleted from the block, the space occupied by that record is freed and the corresponding entry in the header is marked as deleted. Then the records placed before the deleted record within the block are moved, i.e., the record just before the deleted record occupies the space freed by the deleted record. Next, all the remaining records are moved such that each record occupies the space freed by moving its immediate following record. For instance, if a record is deleted, the record stored immediately before it moves into its space, the record before that moves into the space just vacated, and so on up to the first record in the block. The space freed after moving the records is added to the free space, which again lies between the last entry in the block header and the first record. While changing the positions of the records, the end-of-free-space pointer is also modified simultaneously. In contrast to fixed-length records, the cost of moving the records from one position to another is low, because of the limited block size.

A slotted-page structure doesn't support direct pointers to the records; instead, it supports indirect pointers. These indirect pointers initially point to the entry in the header where the current location of the record is maintained, and then point to the intended record based on that location. The usage of indirect pointers helps in preventing the space fragmentation issue that can be encountered within the block.

Example
Consider Student records to be of variable length. Using the slotted-page structure, these records are organized in the following manner,

Block header: entries E1 to E6 | Free space | Records: 21 11 B.C.A O.U; 22 13 M.C.A O.U; 24 15 B.Tech JNTU; 28 11 B.C.A O.U; 30 19 M.Tech JNTU; 32 11 B.C.A O.U
Figure (2): Slotted-page Structure for "Student" Records

In the above structure, the free space is between the last entry of the block header and the first record (i.e., record 0). The entry E1 points to the last record of the file (i.e., record 5), E2 points to record 4, E3 points to record 3, and so on.
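The insertion, deletion and compaction steps of the slotted-page structure can be sketched as below. It is a simplified model: the header array is kept in a Python list rather than inside the block bytes, the block is only 64 bytes so the example stays small, and the record strings are hypothetical encodings of the student rows.

```python
class SlottedPage:
    """One block of variable-length records, addressed through header slots.

    Simplification: the header array lives in a Python list instead of inside
    the block's bytes, so only the record area is modelled byte-for-byte.
    """

    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.entries = []          # header array: slot -> (location, size) or None
        self.free_end = size       # records are allocated from the end of the block

    def insert(self, record: bytes) -> int:
        start = self.free_end - len(record)
        if start < 0:
            raise ValueError("not enough free space in the block")
        self.data[start:self.free_end] = record
        self.free_end = start                     # free space now ends here
        self.entries.append((start, len(record)))
        return len(self.entries) - 1              # slot number = indirect pointer

    def get(self, slot: int) -> bytes:
        loc, size = self.entries[slot]            # follow the header entry
        return bytes(self.data[loc:loc + size])

    def delete(self, slot: int) -> None:
        loc, size = self.entries[slot]
        self.entries[slot] = None                 # mark the header entry deleted
        # Records stored before the hole (lower addresses) slide up by `size`,
        # so the free space stays between the header and the first record.
        moved = self.data[self.free_end:loc]
        self.data[self.free_end + size:loc + size] = moved
        for i, e in enumerate(self.entries):
            if e is not None and e[0] < loc:
                self.entries[i] = (e[0] + size, e[1])
        self.free_end += size                     # update end-of-free-space pointer


page = SlottedPage(64)
a = page.insert(b"21|11|B.C.A|O.U")
b = page.insert(b"24|15|B.Tech|JNTU")
c = page.insert(b"30|19|M.Tech|JNTU")
page.delete(b)
print(page.get(a), page.get(c), page.free_end)
# b'21|11|B.C.A|O.U' b'30|19|M.Tech|JNTU' 32
```

Record `c` physically moved during the deletion, yet `page.get(c)` still works because callers hold the slot number, not the byte offset: this is the level of indirection the text describes.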
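Q23. Explain about byte-string representation in detail.
Answer
Byte-string representation is a technique used for implementing variable-length records. In this representation, records are stored in the form of a sequential byte string. A special symbol ⊥, which signifies end-of-record, is attached to the end of every record. Once the special symbol has been added, the records can be stored as consecutive bytes. The disadvantages of using this technique are,
(i) It is difficult to reuse the space of a recently deleted record.
(ii) No space is provided for variable-length records to grow if their size increases. In such a situation, the records whose size has increased must be moved from their actual location. Moving the records leads to a high maintenance cost, especially when these records are pinned.

Thus, to overcome these issues, another version of byte-string representation called the "slotted-page structure" is used. In this representation, instead of using the '⊥' symbol, a header is stored at the beginning of every block. This header stores information about,
(a) The total number of record entries present in the header
(b) The position where the free space in the block ends
(c) The location and size of each record, which are stored in an array.

Block header (number of record entries, end of free space, size and location entries) | Free space | Records
Figure: Structure of Slotted-page Representation Technique

The allocation of records within the block starts from the end of the block and is done in a contiguous manner. It is necessary to ensure before allocation that the free space in the block is between the last entry in the header array and the first record. When a record is to be inserted, the block is searched from the end so as to locate a free space. Once the space has been found, the entry associated with the location and size of the record is stored in the header. On the other hand, when a record is to be deleted, the block is searched so as to find the record. After finding the record, the space occupied by that record is freed and the corresponding entry is deleted. All the records that are present before the deleted record are moved so that the freed space is occupied by those records. It must be ensured that the free space remains between the last entry present in the header array and the first record. As the block size is limited (e.g., 4 KB), the cost of moving the available records into the free space after deletion is not very high. When the free space is being occupied, the end-of-free-space pointer is updated at the same time.

In this representation technique, instead of pointing to the records directly, a pointer points to the entry defined in the header, i.e., the entry that maintains the information about the location of the record. The advantage of this level of indirection is that records can be moved so as to avoid the fragmentation issue.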
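A small sketch of the plain byte-string representation, using a Python `str` in place of raw bytes and '⊥' as the end-of-record symbol (both are assumptions for readability, and the sample values are hypothetical). It shows that records are located only by scanning for the terminator, and that a record which grows pushes every later record to a new offset.

```python
EOR = "\u22a5"   # the end-of-record symbol '⊥'


def append_record(file: str, record: str) -> str:
    """Records are laid out back to back, each terminated by EOR."""
    return file + record + EOR


def records(file: str) -> list:
    """There is no directory: records are found by scanning for EOR."""
    return file.split(EOR)[:-1]


def update(file: str, index: int, new_value: str) -> str:
    """A record that grows has no room in place, so every later byte shifts."""
    recs = records(file)
    recs[index] = new_value
    return "".join(r + EOR for r in recs)


f = ""
for r in ["21,B.C.A,O.U", "24,B.Tech,JNTU", "30,M.Tech,JNTU"]:
    f = append_record(f, r)
print(records(f))
print(f.index("24,"))                      # 13: where record 1 starts
f = update(f, 0, "21,B.C.A (Hons),O.U")    # record 0 grows by 7 characters
print(f.index("24,"))                      # 20: record 1 had to move
```

Any pointer that held the old offset of record 1 would now be wrong, which is the pinned-record problem that the slotted-page structure avoids.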
Q24. What is an index? What are the different kinds of indices? Explain the factors based on which the indexing technique is evaluated.
OR
Explain with suitable examples the cluster indexes, primary and secondary indexes. (Model Paper-I, Q11(a) | April/May-12, Set-2, Q8)
Answer
Index
In databases, an index can be defined as a data structure that increases the speed of data retrieval operations performed on a database table. Indexes in databases are analogous to indexes in a text book. An index of a text book enables us to find the desired keyword while avoiding a sequential scan through the complete book. Likewise, an index of a database enables the database program to find the data while avoiding the need for a sequential scan through the entire table.

An index is usually specified on one field of the file, called an indexing field. One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value. Usually, the index file is relatively smaller than the data file. Hence, searching the index using binary search is efficient.
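Types of Indices
In DBMS, there are three types of indices,
1. Primary index
2. Secondary index
3. Clustering index.

1. Primary Index
An index that is defined on the ordering key field of an ordered file is called a primary index. The ordering key field is used to place the file records sequentially on the disk blocks, where each record is guaranteed to have a unique value for that field. A primary index follows the same ordering as that of the file. A primary index is an ordered file consisting of two fields. The first field has the same data type as the ordering key field, called the primary key of the data file. The second field is a pointer to a disk block (block number or address). A primary index includes a single index entry for each block in the data file. Each index entry i holds two field values, represented as <K(i), P(i)>, where K(i) is the primary key field value of the first record in the block, called the block anchor, and P(i) is a pointer to that block.

Primary indices are of two types,
(i) Dense index
(ii) Sparse index.

(i) Dense Index
A dense index has an index record for every search key value in the file. The index record contains the search key value and a pointer to the first data record with that search key value.

Figure (1): Dense Index (index entries: Sai, Ramana, Shivani, Gurudev, Bhavani, Laxmi)

(ii) Sparse Index
A sparse index has an index record for only some of the search key values in the file.

Figure (2): Sparse Index

Example
Consider an example of a primary index shown in the table below. The index is specified on the ordered key attribute of a file, and it contains two fields, an index entry and a pointer to the primary key field of the file.

Table: Primary Index (columns: Greatest Record, Block Number)

In order to search for a row with key '50', the index entries are examined to find the entry whose value is less than '50', and the process then proceeds to the corresponding block 'B'. Then '50' is searched for within that block.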
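The primary-index search for key '50' can be sketched as a binary search over block anchors. The block contents and the block names A, B and C below are hypothetical; a dense index over the same data is built alongside for comparison.

```python
from bisect import bisect_right

# Hypothetical data file ordered on the primary key, split into blocks.
blocks = {"A": [10, 20, 30, 40], "B": [45, 50, 55], "C": [60, 70, 80]}

# Sparse (primary) index: one <K(i), P(i)> entry per block; K(i) is the block anchor.
sparse = [(keys[0], name) for name, keys in blocks.items()]   # [(10,'A'), (45,'B'), (60,'C')]
anchors = [k for k, _ in sparse]

# Dense index: one entry for every search key value.
dense = {k: name for name, keys in blocks.items() for k in keys}


def sparse_lookup(key: int):
    """Take the last entry whose anchor <= key, then search inside that block."""
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None                      # key is smaller than every anchor
    block = sparse[i][1]
    return block if key in blocks[block] else None


print(sparse_lookup(50), dense.get(50))   # B B
print(sparse_lookup(52), dense.get(52))   # None None
```

The sparse index holds 3 entries against 10 for the dense one, at the cost of scanning one block after the index lookup.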
2. Secondary Index
An index that is defined on a non-ordering field of the data file is called a secondary index. A secondary index has a different ordering than that of the file. Like the primary index, a secondary index is also an ordered file consisting of two fields. The first field has the same data type as the non-ordering field of the data file. The second field is a pointer to a disk block. The field on which the secondary index is constructed is called the indexing field. A file can have several secondary indexes. A secondary index on a key field is sometimes called a secondary key; the key field is guaranteed to have a unique value for each record in the data file. A secondary index includes an index entry for each record in the data file, rather than for each block as in the case of a primary index. The reason is that the records of the data file are not ordered according to the values of the secondary key field.

Hence, a secondary index stores a large number of entries, and it needs more space relative to a primary index. However, the improvement in search time for an arbitrary record is greater with a secondary index than with a primary index. The reason is that, if the secondary index does not exist, a linear search must be carried out on the data file, whereas if the primary index does not exist, a binary search can still be carried out on the ordered file.

Example
Consider a secondary index consisting of two fields, an index field and a pointer to the block of the file.

Table: Secondary Index (columns: Secondary Attribute Value, Block Number)

A secondary index attribute can have duplicate values, which need an entry to be created for each of them. The index is generally ordered according to the attribute, even though the file is not ordered, so that the index can be searched quickly.
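A sketch of a secondary index on a non-ordering field, reusing the employee IDs and names from Figure (1) and assuming three records per block (the block size is hypothetical). Duplicate names produce one index entry per record, and the index is sorted even though the data file is not.

```python
from bisect import bisect_left, bisect_right

# Data file (emp_id, name) in stored order; 3 records per block is an assumption.
data_file = [("011E", "Sai"), ("021E", "Ramana"), ("022E", "Ramana"),
             ("031E", "Shivani"), ("032E", "Gurudev"), ("026E", "Bhavani"),
             ("016E", "Bhavani"), ("025E", "Laxmi"), ("024E", "Laxmi")]
RECORDS_PER_BLOCK = 3

# Secondary index on the non-ordering field `name`: one entry per record,
# sorted by name even though the data file is not.
sec_index = sorted((name, pos // RECORDS_PER_BLOCK)
                   for pos, (_, name) in enumerate(data_file))
names = [n for n, _ in sec_index]


def blocks_for(name: str) -> list:
    """Binary-search the index; duplicate values yield several block pointers."""
    lo, hi = bisect_left(names, name), bisect_right(names, name)
    return sorted({blk for _, blk in sec_index[lo:hi]})


print(blocks_for("Bhavani"), blocks_for("Ramana"), blocks_for("Kiran"))
# [1, 2] [0] []
```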
3. Clustering Index
An index defined on the ordering field of an ordered file is called a clustering index. A clustering index has the same ordering as that of the file. In a clustering index, the ordering field of the data file can have the same value for several records in the file, whereas in a primary index the ordering field of the data file must contain a unique value for each record. Like the primary and secondary indexes, a clustering index is an ordered file consisting of two fields. The first field has the same data type as the clustering field of the data file. The second field is a pointer to a disk block. A clustering index includes one index entry for each distinct value of the clustering field, rather than for every record. The index entry points to the first data block that contains a record with that field value.

Example
Consider a clustering index, shown in the table below, consisting of two fields: an index entry and a pointer to the block of a file.

Table: Clustering Index (columns: Clustering Attribute Value, Block Number)

In this table, to retrieve the records with the attribute value '30', the block with number 'B' has to be searched. The block might contain various records with the attribute value '30'.
650 The second field refers to pointer to a disk block
022E Ramána Sing of two fields, an index entry and a pointer to the (a) Inserting Records
400 The field on which the secondary index is constructe
block of a file. ) into the bucket by
Sluvani
031E The records are inserted (or added to
Shivani 700 needed"OVERFLOW" pages
Gurudev called indexing field. A file can have several secondary ind Clustering Block (allocating) assigning the
032P Gurudev 500 in addition to its indexing fields. attribute umber
Bhavani b) Searching Records
9ZEBhavani 600 value
Laxmi Secondary index on a key field is sometimes cu The record can be searched by
applying a hash function
016E Bhavani 350 records, Then
secondary key. The key field is guaranteed to have unique va Dinitially locate the bucket
containing the
025E Laxmi 200 for each record in the data file. are scanned so as to hnd the
all the pages in this bucket
024E Laxni 300 cordin search key. However, if the
Secondary index includes index entry for each rec desired record with a given pages
mary value then all the
Figure (1): Dense Index the data file rather than for each block as in the case o pr record does not have search keý
This table in the file needs to be scanned.
index. The reason is records of the data file are no ittrih consist of a entry for every record, For the
(i) Sparse Index ue '30' with block number B' have to be searched.
according to the values of the secondary key theld But th Retrieving Records
Sparse index has an index record for only some resented Value
he olomight contain various records with the attribute function to the record's
search key,
of the Each index entry, i holds two field values 30 By applyng a hash can be identified
search key values in the file. It is used when records
e Buch index result to be expensive for maintaining the required record
the page containing
arranged sequentially according to search key value.
are K), PG)> Where K() refers to the secondaryblock. eeXact g process involves number of queries to find and retrieved in
one disk l/O.
he record and P() refers to pointer to the disk D act record.
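As a rough illustration of the bucket organization described above, the following Python sketch models each bucket as a primary page followed by chained overflow pages. The class name HashIndex, the page capacity and the hash function h (key modulo the number of buckets; with 4 buckets this keeps the last two binary digits of the key) are assumptions made for illustration only.

```python
class HashIndex:
    """Static hash index sketch: every bucket is a primary page plus chained overflow pages."""

    def __init__(self, num_buckets, page_capacity):
        self.num_buckets = num_buckets
        self.page_capacity = page_capacity
        # bucket number -> list of pages; pages[0] is the primary page,
        # any later pages are overflow pages chained to it
        self.buckets = [[[]] for _ in range(num_buckets)]

    def h(self, key):
        # Hypothetical hash function: with 4 buckets, key % 4 keeps
        # the last two binary digits of the key
        return key % self.num_buckets

    def insert(self, key, record):
        pages = self.buckets[self.h(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])  # last page is full: allocate an overflow page
        pages[-1].append((key, record))

    def search(self, key):
        # Only the pages of the single bucket chosen by h are scanned
        pages = self.buckets[self.h(key)]
        return [rec for page in pages for (k, rec) in page if k == key]


idx = HashIndex(num_buckets=4, page_capacity=2)
for rno in (3, 7, 11, 19):         # all four keys end in binary ...11, so they share bucket 3
    idx.insert(rno, f"student-{rno}")
print(len(idx.buckets[3]))         # 2: the primary page plus one overflow page
print(idx.search(11))              # ['student-11']
```

Keeping the overflow pages in the same list as the primary page mirrors the idea that a search only ever touches the pages of one bucket.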
SPECTRUM ALL-IN-ONE JOURNAL FOR ENGINEERING STUDENTS
SIA GROUP
Look for the SIA GROUP LOGO on the TITLE COVER before you buy
DATABASE MANAGEMENT SYSTEMS [JNTU-HYDERABAD]
UNIT-5 Storage and Indexing
Example
Consider a file Student (which represents the data records) hashed on the key rno. Applying the hash function to rno identifies the bucket that contains the needed record. The hash function h uses the last two digits of the binary value of rno as the bucket identifier. A search key index on the marks obtained, i.e., mrks, contains <mrks, rid> pairs as data entries in an auxiliary index file, as shown in figure (b). The rid (record id) points to the record whose search key value is mrks.

Figure: (a) Index File (Student) Hashed on the rno Key, (b) Auxiliary Index File of <mrks, rid> Pairs Hashed on mrks [figure not reproduced: buckets h(rno) = 00, 01, 10, 11 holding (name, rno, mrks) records such as (Swetha, 19, 75), and buckets h(mrks) = 00 to 11 holding the data entries]

Tree-based Indexing
In this type of indexing, the records are arranged in a tree-like structure. The data entries are sorted according to the search key values and are arranged in a hierarchical structure (the hierarchical search data structure) so as to find the correct page of the data entries.

Example
Consider the student records with search key rno arranged in a tree-structured index. In order to retrieve the nodes (A, B, L1, L2 and L3), we need to perform disk I/O. The lowest leaf level contains the data entries. The additional records with rno < 15 and rno > 40 are added to the left of the leaf node L1 and to the right of the leaf node L3, respectively.

The root node is responsible for initiating the search. The searches are then directed to the correct leaf pages by the non-leaf pages, which contain node pointers separated by search key values. The data entries (in a subtree) smaller than a key value k_i are pointed to by the left node pointer of k_i. Similarly, the data entries greater than or equal to k_i are pointed to by the right node pointer of k_i.

Figure: Tree Structured Indexing [figure not reproduced: root A, non-leaf node B and a leaf level holding (name, rno, mrks) entries such as (Razia, 10, 50), (Swetha, 19, 75), (Ayesha, 30, 49) and (Uzma, 40, 54)]

In order to find the students whose roll numbers lie between 19 and 24 (19 ≤ rno < 24), the search is directed from the root node as shown in the figure.
Figure: Searching for the Roll Numbers Between 19 and 24

In order to find all the students with roll numbers lying between 17 and 40, we first direct the search to the node A and, after analyzing its contents, forward the search to B, followed by the leaf node L1, which actually contains the required data entry. The other leaf nodes L2 and L3 also contain data entries that fulfil our search criteria. For this, all the leaf pages must be organized as a doubly linked list. Thus, L2 can be reached using the NEXT pointer on L1, and L3 can be obtained using the NEXT pointer on L2.

Number of disk I/Os (incurred in a search) = Length of the path from the root to a leaf + Number of leaf pages with qualifying data entries

Q26. Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
Answer :
Closed Hashing
Closed hashing is a type of hash structure in which every data entry is stored in the array of buckets. Basically, closed hashing handles the issue of bucket overflow by using a concept called overflow chaining, in which overflow buckets are chained together in the form of a linked list.

Before inserting a record into a bucket, the bucket must be scanned to know whether there is an unoccupied space for the data entry. If no space is found (i.e., if the bucket is full), then an overflow bucket is chained with the existing bucket and the data entry is inserted into the overflow bucket. However, if this overflow bucket is also full, then another overflow bucket is chained with the first overflow bucket. All the overflow buckets are chained together, due to which the issue of bucket overflow is handled efficiently.

When searching for a data entry, the bucket given by the hash function (computed on the search key), including its overflow buckets, is scanned in a sequential manner (i.e., in the order in which the buckets were chained). The search process continues until the search key is found or an unused space is reached. This unused space specifies that no such key exists in the hash table, and the search terminates.

Advantages
(i) Closed hashing is the preferred hashing technique for database systems.
(ii) It is very easy to perform deletion operation.
(iii) It provides better performance when the records are of larger size.

Disadvantages
(i) Closed hashing technique requires pointers for chaining the overflow buckets, due to which it is difficult to serialize the hash table.
(ii) It requires extra indirection for accessing the initial entry of the chained bucket.
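The overflow chaining used by closed hashing (in the sense of this answer) can be sketched in Python as follows. The names Bucket and ClosedHashTable, the bucket capacity and the hash function key % number_of_buckets are assumptions for illustration; the sketch shows the chain being extended on insert, scanned in chain order on search, and a deletion that only touches one bucket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bucket:
    capacity: int
    keys: list = field(default_factory=list)
    overflow: Optional["Bucket"] = None  # pointer to the next overflow bucket in the chain


class ClosedHashTable:
    """Closed hashing with overflow chaining (hypothetical sketch)."""

    def __init__(self, num_buckets, capacity):
        self.table = [Bucket(capacity) for _ in range(num_buckets)]

    def _home(self, key):
        return self.table[key % len(self.table)]  # hypothetical hash function

    def insert(self, key):
        b = self._home(key)
        while len(b.keys) == b.capacity:  # bucket full: move along (or extend) the chain
            if b.overflow is None:
                b.overflow = Bucket(b.capacity)
            b = b.overflow
        b.keys.append(key)

    def _find(self, key):
        b = self._home(key)
        while b is not None:  # home bucket first, then overflow buckets in chain order
            if key in b.keys:
                return b
            b = b.overflow
        return None  # end of the chain reached: the key is not in the table

    def search(self, key):
        return self._find(key) is not None

    def delete(self, key):
        b = self._find(key)
        if b is None:
            return False
        b.keys.remove(key)  # only this bucket changes; no other entry has to move
        return True


t = ClosedHashTable(num_buckets=4, capacity=2)
for k in (1, 5, 9, 13):  # all four keys hash to bucket 1
    t.insert(k)
print(t.table[1].keys, t.table[1].overflow.keys)  # [1, 5] [9, 13]
print(t.delete(5), t.search(5), t.search(13))     # True False True
```

The explicit overflow pointers are what make deletion simple here, and they are also why the text notes that such a table is harder to serialize.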
Open Hashing
Open hashing is another type of hashing technique, which contains a set of buckets B. The size of B is fixed (i.e., it is not possible to add or delete buckets from the set B). In contrast to closed hashing, open hashing does not use the concept of overflow chaining. While inserting a new data entry, if the bucket is full, then the new entry is inserted in some other bucket of the set B. For instance, assume that the set B consists of buckets b1, b2, ..., bn. If a new entry is to be inserted in bucket b1, then b1 is scanned to check whether there is the required amount of space for inserting the new entry. If the space is found, then the entry is inserted. However, if the bucket b1 is full, then the next consecutive bucket, i.e., b2, is scanned. The process continues until a bucket with space for the entry is found. This process of searching for space within the other buckets is referred to as linear probing.

Searching in open hashing is performed using a hash function that maps the search key values to the addresses of the buckets present in the set B.

Advantages
(i) Open hashing technique is capable of reducing the time overhead incurred while allocating a new entry.
(ii) It does not require extra indirection for accessing the initial entry of every bucket in the set B.
(iii) It provides better locality of reference.
(iv) This technique does not use pointers and therefore it is very easy to serialize the hash tables.

Disadvantages
(i) In open hashing, it is difficult to perform deletion operation.
(ii) Open hashing technique is not preferred for database applications. It is used to construct symbol tables for compilers and assemblers, which need only insertion and search operations and no deletion operations.
(iii) The hash function, once selected, cannot be changed, due to which either space is wasted or bucket overflow issues are encountered.

5.4 COMPARISON OF FILE ORGANIZATIONS
Q27. Compare heap file organization with hashed file organization.
(May-15(R13), Q10)
Answer
The comparison of heap and hashed files is done based on their definitions and the operations performed on those organizations.

Heap Files Organization | Hashed Files Organization
1. Records can be placed anywhere in the file. | Records must be placed according to the hash function applied to the search key.
2. The cost of scanning is B(D + RC). | The cost of scanning is 1.25B(D + RC).
3. Selection is specified on a candidate key. The cost of searching with equality selection is 0.5B(D + RC). | Selection is based on the search key. The cost of searching with equality selection is H + D + 0.5RC.
4. The entire file must be scanned for a search with range selection. The cost is B(D + RC). | The entire file must be scanned for a range selection on the search key. The cost is 1.25B(D + RC).
5. Records are inserted at the end of the file. The cost is 2D + C. | The appropriate page must be located, modified and written back. Cost = cost of search + C + D.
6. Search for the record, remove it from the page and write the modified page back. Cost = cost of search + C + D. | First the record is searched and removed, and then the modified page is written back. Cost = cost of search + C + D.

Q28. Differentiate between sequential, direct and indexed sequential file organization.
OR
Give the comparison of various file organizations.
(Model Paper-I, Q11(a) | April-18(R10), Q10(a))
Answer
Sequential File Organization | Direct File Organization | Indexed Sequential File Organization
1. In sequential file organization, records are stored in sequential access storage devices. Example: Magnetic tapes (audio cassettes). | In direct file organization, records are stored in Direct Access Storage Devices (DASD). Example: Magnetic disks (hard disks). | In indexed sequential file organization, records are often stored in direct access storage devices. Example: Magnetic disks (hard disk).
2. Required records are accessed by searching from the beginning of the file towards the end of the file until the record is found. | Required records are searched randomly using keys. | Desired records are searched either sequentially or randomly.
3. Before processing transactions, records must be sorted in either ascending or descending order. | Before processing transactions, it is not necessary to sort the records stored in memory. | Before processing transactions, there is no need to sort the records, even though sequential access is used.
4. Accessing speed is very low compared to both direct and indexed sequential file organizations. | Accessing speed is higher compared to sequential access and lower compared to indexed sequential access. | Accessing speed is higher compared with both sequential access and direct access, since an index is used.
5. This organization is economical (low cost) compared to both direct access and indexed sequential access. | This organization is more expensive compared to sequential access and less expensive compared to indexed sequential access. | This organization is very expensive compared to both sequential access and direct access, since it requires special software.
6. Time consumption is more compared to both direct and indexed sequential file organizations. | Time consumption is less when compared with sequential file organization and more when compared to indexed sequential file organization. | Time consumption is very less compared to both sequential file organization and direct file organization.

Q29. Explain heap file with unclustered hash index.
Answer
Heap File with Unclustered Hash Index
In a heap file with unclustered hash index, assume the size of each data entry to be one tenth of the size of the data record. Also, assume that the static hashing method is used to locate data records and, for simplicity, consider that no overflow chains exist.

In static hashing, the pages are kept only 80% occupied and the remaining 20% is left for future insertions, so that overflow chains are minimized as the file grows. Thus, the number of pages required to store the data entries is 1.25 times the number of pages needed if the entries were densely packed. This can be numerically represented as,
1.25(0.10B) = 0.125B
The number of data entries stored in every page, with respect to their relative size and occupancy, is,
10(0.80R) = 8R
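The two figures just derived (0.125B index pages and 8R data entries per page) follow from simple arithmetic. The short Python function below reproduces it, assuming B is the number of data pages and R the number of records per data page; the function name index_layout and its parameter names are illustrative only.

```python
def index_layout(B, R, entry_fraction=0.1, occupancy=0.8):
    """Return (index pages, data entries per index page) for an unclustered index.

    B: number of pages in the data file, R: records per data page,
    entry_fraction: size of a data entry relative to a data record (0.1 in the text),
    occupancy: fraction of each index page actually used (0.8 for static hashing).
    """
    packed_pages = entry_fraction * B                  # 0.10B if entries were densely packed
    index_pages = packed_pages / occupancy             # 1.25 * 0.10B = 0.125B
    entries_per_page = occupancy * R / entry_fraction  # 10 * 0.80R = 8R
    return index_pages, entries_per_page


pages, per_page = index_layout(B=1000, R=100)
print(round(pages, 6), round(per_page, 6))  # 125.0 800.0
```

Called with occupancy=0.67, the same function gives about 0.149B pages and 6.7R entries per page; Q30 rounds the page count to 0.15B.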
Operations of Heap File with Unclustered Hash Index
The operations that can be performed on the heap file with unclustered hash index are,

Scanning
The records are scanned from the heap file without considering the search key index, so the retrieved records do not have any order. If the records must be retrieved in search key order, then, for each data entry, one I/O is needed to fetch the corresponding data record. Hence, the total cost is the sum of the cost incurred in the retrieval of all the data entries and one I/O per data record. It is given as,
0.125B(D + 8RC) + BR(D + C)

Insertion
It involves the cost of inserting the record in the heap file, 2D + C, plus the cost of finding the appropriate page of the index, adding the new entry and rewriting the page, H + 2D + C. Hence, the cost is,
2D + C + H + 2D + C

Deletion
It involves the cost of finding the data record and the data entry, H + 2D + 4RC, and the cost of writing back the changed pages in the index and the file, 2D. Hence, the total cost is,
H + 2D + 4RC + 2D

Equality Selection Search
If the selection is not based on the search key value, then the entire file needs to be scanned. But, if the selection is based on the search key value, then the hash function is used to locate the bucket with the corresponding data entry, and the page containing the record is retrieved. The total cost incurred in this search accounts for the following,
(i) The page containing the qualifying entries is identified at the cost H.
(ii) Retrieval of the page, assuming that it is the only page present in the bucket, occurs at D.
(iii) The cost of finding the entry after scanning half a page is 4RC.
(iv) Fetching the record from the file costs D.
Hence, the total cost is,
H + D + 4RC + D = H + 2D + 4RC
In case of many matching records, the cost is,
H + D + 4RC + one I/O for each record that qualifies

Range Selection Search
Hash indices do not support selections that are based on ranges. Thus, the entire file is scanned, at a cost of,
B(D + RC)

Q30. Explain heap file with unclustered tree index.
Answer
Heap File with Unclustered Tree Index
In a heap file with unclustered tree index, the number of leaf pages in the index is based on the number of data entries and their occupancy in pages. Consider the size of each data entry to be one tenth the size of a data record. If the index pages have 67% occupancy, then the number of leaf pages in the index will be,
0.1(1.5B) = 0.15B
In the same way, the number of data entries stored on each page, with respect to their relative size and occupancy, will be,
10(0.67R) = 6.7R

Operations of Heap File with Unclustered Tree Index
The operations that can be performed on the heap file with unclustered tree index are,

Scanning
If the records are scanned from the heap file without considering the search key index, the retrieved records do not have any order. But, if the records have to be scanned in search key order using the index, then following are the steps for scanning a student's file,
(i) Scan the index's leaf level.
(ii) Get the relevant record from the file for each data entry.
(iii) Obtain the data records sorted according to <rno, mrks>.
The cost of reading all the data entries is 0.15B(D + 6.7RC) I/Os. For each index entry, a record has to be fetched in one I/O, because each entry in a leaf page points to some other page. Hence, the cost is,
0.15B(D + 6.7RC) + BR(D + C)
(For comparison, a file with B pages can be sorted in about 4B I/O operations.)

Insertion
A record is first inserted in the student heap file at 2D + C, and then the associated data entry is added to the index. The correct leaf page can be found in D log_F 0.15B + C log_2 6.7R, and rewriting it costs D. Hence, the total cost is,
2D + C + D log_F 0.15B + C log_2 6.7R + D

Deletion
The cost of deletion includes,
Cost of finding the entry in the index + Cost of finding the record in the file + Cost of rewriting the modified pages in the index and the file
Which corresponds to,
D log_F 0.15B + C log_2 6.7R + D + 2D
The record that is to be deleted is located using the index, and then the modified pages of the file and the index are written back.

Equality Selection Search
If the selection is not based on the search key value, then the entire file needs to be scanned. But, if the selection is based on the search key value, then the corresponding leaf page is found by using the index, and then the record is found. The cost involved in this operation is the sum of,
(i) The cost of finding the page containing the matching entry(ies),
(ii) The cost of finding the first matching entry and
(iii) The cost of fetching the first matching record.
Which is given as,
D log_F 0.15B + C log_2 6.7R + D
In case of several qualifying entries whose records are not stored consecutively, the cost is given by,
D log_F 0.15B + C log_2 6.7R + one I/O for each matching record

Range Selection Search
The first data entry that satisfies the range selection is located using the index. Then the subsequent data entries are located sequentially until a data entry that does not satisfy the range selection is reached. For each qualifying data entry, one input/output is needed to fetch the record. Thus, the cost depends on the number of records that satisfy the range selection. For instance, if the selection condition is satisfied by 10% of the data records, it is better to retrieve all the records, sort them and then retain those that satisfy the selection.
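The cost expressions of Q29 and Q30 can be collected into two small Python functions so that the two index choices can be compared for given parameter values. This is a sketch, assuming the unit's cost-model symbols (B data pages, R records per page, D time to read or write a page, C time to process a record, H time to apply the hash function). F (the fan-out used as the base of log_F) and matches (the number of records a selection returns) are parameters introduced here for illustration, and the function names are hypothetical.

```python
import math


def unclustered_hash_costs(B, R, D, C, H):
    """Q29: heap file + unclustered static hash index (80% occupancy, no overflow chains)."""
    return {
        "scan": 0.125 * B * (D + 8 * R * C) + B * R * (D + C),
        "equality": H + 2 * D + 4 * R * C,
        "range": B * (D + R * C),                   # hash index cannot help: full file scan
        "insert": (2 * D + C) + (H + 2 * D + C),    # heap page + index page
        "delete": (H + 2 * D + 4 * R * C) + 2 * D,  # find both, then write both back
    }


def unclustered_tree_costs(B, R, D, C, F, matches=1):
    """Q30: heap file + unclustered tree index (67% occupancy); F = fan-out (assumed)."""
    find_leaf = D * math.log(0.15 * B, F) + C * math.log2(6.7 * R)
    return {
        "scan": 0.15 * B * (D + 6.7 * R * C) + B * R * (D + C),
        "equality": find_leaf + D,
        "range": find_leaf + matches * D,           # one I/O per matching record
        "insert": (2 * D + C) + find_leaf + D,
        "delete": find_leaf + D + 2 * D,
    }


hash_costs = unclustered_hash_costs(B=1000, R=100, D=15.0, C=0.1, H=0.1)
tree_costs = unclustered_tree_costs(B=1000, R=100, D=15.0, C=0.1, F=100, matches=50)
for op in ("equality", "insert", "delete"):
    print(op, round(hash_costs[op], 1), round(tree_costs[op], 1))
```

Setting C and H to zero and D to one reduces each expression to a plain I/O count, which is a convenient way to check the formulas by hand.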
5.5 INDEXES AND PERFORMANCE TUNING
Q31. Explain indexes and performance tuning in detail.
(Model Paper-I, Q10(b))
Answer
Indexes and Performance Tuning
The performance of the system depends greatly on the indexes we choose, and it can be explained in terms of the expected workload.

Workload Impact
Data entries that qualify a particular selection criterion can be retrieved effectively by means of indexes. The two selection types are,
(i) Equality and
(ii) Range selection.
Tree-based indexing supports both selection criteria as well as inserts, deletes and updates, whereas hash-based indexing supports only equality selection, apart from insertion, deletion and updation.

Advantages of Using Tree-Structured Indexes
1. By using tree-structured indexes, insertion and deletion of data entries can be handled effectively.
2. It finds the correct leaf page faster than binary search in a sorted file.

Disadvantage
The pages of a sorted file are stored in the disk's order, hence sequential retrieval of such pages is quicker. This is not possible in tree-structured indexes. This drawback can be overcome by the use of ISAM, which provides fast searching along with the sequential allocation of the leaf pages.

Clustered Index Organization
A file can have at most one clustered index, in addition to several unclustered indexes; this prevents the duplication of large data records.
Example
The search key rno in the student records is a clustered index, whereas mrks is an unclustered index.
Clustered indexes are cheaper to maintain than a fully sorted file, but they are still expensive: the addition of a new record to a leaf page that is full causes the creation of a new page and the assignment of some old records to that new page. All the pointers in the database to the moved records must now point to the new page, which involves many disk I/Os. Hence, clustering is used rarely.

Index-Only Evaluation
As the name implies, the query evaluation here is done solely through the file indexes rather than by accessing the data records.
Advantage
It works even with unclustered indexes, because only the index data entries are used.
Example
If we want to find the average marks obtained by the students in an exam, an index on mrks allows this to be computed by using only the index data entries.

Clustered Indexes - Example
Consider the following database query,
SELECT S.rno FROM Students S WHERE S.mrks > 70
If a B+ tree index on mrks exists, then all the students whose mrks > 70 can be retrieved through it, but whether this pays off depends on the number of students who scored marks > 70. Two cases arise,
Case (i)
If (almost) all the students obtained marks > 70, then sequential scanning is advantageous.
Case (ii)
If only a fraction of the students (say 10%) secured > 70 marks, then it depends on the type of index. If it is an unclustered index, then it involves one I/O per qualifying student, which would be expensive. If it is a clustered index, then it requires only about 10% of the I/Os incurred in a full scan.

Use of Aggregate Operations
Consider the given example,
SELECT S.sno, COUNT(*) FROM Student S GROUP BY S.sno
The purpose of this query is to count the number of students in each section sno. If a hash or B+ tree index exists on sno, then for each value of sno we can count the number of index data entries with that value, because no data records need to be retrieved. Therefore, the type of the index (clustered or unclustered) does not matter here.

Q32. What is a composite search key? What are the pros and cons of composite search keys?
Answer
Composite Search Keys
A search key that contains several fields is called a composite (or concatenated) search key.
Example
Consider a student record with the fields name, rno and mrks, which is sorted by name. Composite key indexes with the various keys are shown in the figure.
Figure: Composite Key Indexes [figure not reproduced: the student records sorted by name, with indexes on <rno, mrks>, <mrks, rno>, <rno> and <mrks>]

Difference Between Equality and Range Queries
Equality Query | Range Query
1. An equality query for a composite search key is one in which each field of the search key is bound to a constant. | A range query for a composite search key is one in which not all the fields of the search key are bound to constants.
2. Example: Data entries (in a student file) where rno = 15 and mrks is a fixed value can be retrieved by using an equality query. | Example: Data entries (in a student file) where rno = 15 with any mrks can be retrieved.
3. It is supported by hash-file organization. | Hash-file organization is not suitable for range queries.

Advantage
It supports a larger number of queries, because more selection conditions can be matched by the index, and it creates more opportunities for the index-only evaluation technique.

Disadvantages
(i) On any operation (insert/delete/update) that modifies a field of the search key (composite key), the index needs to be updated because of the change in its search key.
(ii) A composite key is larger than a single-attribute search key, which leads to a larger index and, for a tree index, a larger number of levels.
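The difference between equality and range queries on a composite key can be illustrated with a small Python sketch. The data entries below are hypothetical; a sorted list of (rno, mrks, rid) tuples stands in for the sorted leaf level of a tree index on <rno, mrks>, and a dictionary keyed by (rno, mrks) stands in for a hash index on the whole key.

```python
from bisect import bisect_left

# Hypothetical data entries of a tree index on the composite key <rno, mrks>,
# kept sorted as (rno, mrks, rid) the way a tree index keeps its leaf level sorted.
entries = sorted([(14, 70, "r1"), (15, 75, "r2"), (15, 80, "r3"), (19, 80, "r4"), (11, 90, "r5")])


def equality_query(rno, mrks):
    """Every field of the composite key is bound to a constant."""
    lo = bisect_left(entries, (rno, mrks))
    hi = bisect_left(entries, (rno, mrks + 1))  # assumes integer marks
    return [rid for (_, _, rid) in entries[lo:hi]]


def prefix_range_query(rno):
    """Only rno is bound (rno = 15 with any mrks): a contiguous run of sorted entries."""
    lo = bisect_left(entries, (rno,))
    hi = bisect_left(entries, (rno + 1,))  # assumes integer roll numbers
    return [rid for (_, _, rid) in entries[lo:hi]]


# A hash index on the whole key <rno, mrks> answers equality queries directly...
hash_index = {}
for rno, mrks, rid in entries:
    hash_index.setdefault((rno, mrks), []).append(rid)

print(equality_query(15, 80), hash_index.get((15, 80)))  # ['r3'] ['r3']
print(prefix_range_query(15))                            # ['r2', 'r3']
# ...but for "rno = 15 with any mrks" it has to examine every key.
print([rid for (r, _), rids in hash_index.items() if r == 15 for rid in rids])
```

In the sorted structure the partially bound query is still a single contiguous run, whereas the hash index has to fall back to examining every key.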
5.6 INTUITIONS FOR TREE INDEXES, INDEXED SEQUENTIAL ACCESS METHOD (ISAM)
Q33. What is the intuition behind tree-structured indexes?
(Model Paper, Q11)
Answer
Intuition for Tree Indexes
There are two tree data structures, called ISAM and B+ tree. The main focus of these tree structures is on the insertion and deletion of data entries.

For example, consider a file of employee records, and assume that the file is sorted by the gpa field. If a query includes a range selection such as "Find all employees with gpa greater than 4", then it can be processed by a binary search for the first such employee and then scanning the file from that point onward. Even so, such query processing is expensive, because the cost of the binary search is proportional to the number of pages fetched.

An alternative approach is to create a second file with one record per page of the original (data) file, in the form (first key on the page, pointer to the page), and then sort this file by the search key (the gpa field in our example). Such an index file will have the following format,
Figure: Index Page Format [index entries P0, K1, P1, K2, P2, ..., Km, Pm, where K = key and P = pointer to a page]
Each key acts as a separator for the contents of the pages pointed to by the pointers on its left and right. An index page contains one more pointer than the number of keys.

For the example query, a binary search of the index file is done to identify the page that contains the first record with the required search key (gpa) value, and the pointer is then followed to that page to get the first data record with that key value. Thereafter, the data pages are scanned to identify the records that satisfy the condition. Hence, the query uses a simple one-level index file. The structure of a one-level index file is shown below,
Figure: Structure of One-level Index [index file with one entry per data page, each pointing into the data file]

Because an entry in the index file is much smaller than a page of the data file, and there is only one index entry per page of the data file, the index file will be smaller than the data file. Therefore, the binary search on the index file will be much faster than the binary search on the data file. However, the index file may still be large, and performing insertion or deletion on it will be very expensive. The large size of the index file brings up the idea of tree indexing, in which the one-level index structure is repeated, leading to a tree structure.

Q34. Explain about ISAM along with its pros and cons.
OR
Explain deletion and insertion operations in ISAM with examples.
(May-19(R16), Q11(a) | May-17(R15), Q10(b))
(Refer Only Topic: Operations of ISAM)
OR
What are the Pros and Cons of ISAM?
(Nov./Dec.-18(R16), Q11(b))
(Refer Only Topics: Pros of ISAM, Cons of ISAM)
Answer
Indexed Sequential Access Method (ISAM)
In the ISAM data structure, the number of leaf pages in the tree is fixed at the time of file creation, and the data entries (i.e., the records of the index file) of the ISAM index are stored in the leaf pages of the tree. ISAM is a static structure, hence additional overflow pages are added to a leaf page if the page is full and more entries are to be inserted in that leaf page. All these overflow pages are chained to the leaf pages. Every node in the tree corresponds to a disk page, and all the data is stored in the leaf pages. All the pages in the ISAM tree are organized carefully so that the page boundaries correspond to the physical properties of the underlying storage device. Figure (1) illustrates the ISAM index structure.
Figure (1): Structure of ISAM [non-leaf pages above the leaf pages; overflow pages are chained to the primary leaf pages]

The data entries of an ISAM index can be created with one of the following alternatives,
(1) A data entry in the index is actually a data record with search key value k.
(2) A data entry is a pair (k, rid), where k is the search key value and rid is the id of a data record.
(3) A data entry is a pair (k, rid_list), where k is the search key value and rid_list is a list of data record ids.
If the second alternative is used, then the data records are stored in a separate file.

When the file is created, first all the leaf pages are allocated sequentially and the data entries are sorted on the search key value. (If the file is created using choice (2) or (3), then first the data records are created and sorted, and then the leaf pages of the ISAM index are allocated and the (k, rid) pairs are stored in them.) Then all the non-leaf pages are allocated. Additional overflow pages are added from an overflow area to a leaf page if there are more insertions. Figure (2) shows the page allocation.
Figure (2): Allocation of Pages in ISAM [data pages, followed by index pages, followed by overflow pages]

Operations of ISAM
An ISAM structure supports the basic operations, i.e., insertion, deletion and search, very well. An equality selection search is processed by starting the search at the root node and determining which subtree to search by comparing the search value with the keys in the current node. If a match is found, then the search is successful; otherwise the search fails. A range query is also processed in the same way, by determining the starting point in the data (or leaf) level and then retrieving the data pages sequentially. For insertion and deletion, the appropriate page is determined as for search, and then the record is inserted or deleted. For insertion, if the leaf page is full, then overflow pages are added.

Example
Consider the tree shown in the following figure to illustrate the ISAM index structure.
Figure (3): Example of ISAM Tree [figure not reproduced: root, non-leaf pages and primary leaf pages of the example tree]
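Before tracing the example, the search, insertion, deletion and range search described under Operations of ISAM can be sketched in Python. The sketch makes several simplifying assumptions. The name ToyISAM and the keys are hypothetical, chosen so that key 33 falls into a full second page as in the walkthrough. The non-leaf levels are collapsed into one sorted list of separator keys, and all overflow pages of a leaf are flattened into one list.

```python
from bisect import bisect_right


class ToyISAM:
    """Static ISAM sketch: the primary leaf pages are fixed when the file is created."""

    def __init__(self, sorted_keys, capacity=2):
        self.capacity = capacity
        # Primary leaf pages, allocated sequentially and filled in key order.
        self.leaves = [list(sorted_keys[i:i + capacity])
                       for i in range(0, len(sorted_keys), capacity)]
        # Overflow entries of each leaf (its overflow pages flattened into one list).
        self.overflow = [[] for _ in self.leaves]
        # Separator keys of the never-changing non-leaf level: first key of each later leaf.
        self.separators = [page[0] for page in self.leaves[1:]]

    def _leaf(self, key):
        # Equivalent to descending from the root: pick the leaf whose key range holds key.
        return bisect_right(self.separators, key)

    def search(self, key):
        i = self._leaf(key)
        return key in self.leaves[i] or key in self.overflow[i]

    def insert(self, key):
        i = self._leaf(key)
        if len(self.leaves[i]) < self.capacity:
            self.leaves[i].append(key)
            self.leaves[i].sort()
        else:
            self.overflow[i].append(key)  # primary page full: goes to an overflow page

    def delete(self, key):
        i = self._leaf(key)
        for page in (self.overflow[i], self.leaves[i]):
            if key in page:
                page.remove(key)  # freed primary space stays; no leaf is ever removed
                return True
        return False

    def range_search(self, lo, hi):
        out, i = [], self._leaf(lo)
        # Leaves were allocated sequentially, so the "next leaf" is simply i + 1.
        while i < len(self.leaves) and (i == 0 or self.separators[i - 1] <= hi):
            out += [k for k in self.leaves[i] + self.overflow[i] if lo <= k <= hi]
            i += 1
        return sorted(out)


isam = ToyISAM([20, 25, 30, 37, 40, 47])  # hypothetical keys, two entries per page
print(isam.search(37))                    # True
isam.insert(33)                           # second page [30, 37] is full, so 33 overflows
print(isam.overflow[1])                   # [33]
print(isam.range_search(25, 40))          # [25, 30, 33, 37, 40]
```

Because the separators never change after creation, repeated inserts into the same key range only lengthen that leaf's overflow list. That is the effect described under Overflow Pages and Locking.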
To search, the search begins at the root. Consider an example to find a record with the key value 37. Since 37 is less than the key in the root, we follow the left pointer. At the next level we follow the middle pointer, since 37 lies within the key range covered by that pointer. This leads to the leaf page in which the record with key value 37 is found.

For a range search, we again start the search from the root and locate the first data entry in the range. We then retrieve the leaf pages sequentially. Since the primary leaf pages are assumed to be allocated sequentially, no pointer is needed to find the next leaf page.

Now, consider the insertion of a record with key value 33. This entry belongs to the second leaf page. It is assumed that a leaf page can contain only 2 entries, and that page already contains two entries. Hence, the record with key 33 is inserted by allocating an overflow page and putting this entry in that overflow page. The ISAM structure after insertion of a record is shown below.

Figure (4): ISAM Tree after Insertion of a Record (root, non-leaf pages, primary leaf pages and overflow pages)

A record with data entry k* is deleted by simply removing the entry from its page. If this record is in an overflow page and the page becomes empty after the deletion, the overflow page can be removed. If the record belongs to a primary leaf page, the space created by the deletion is left unchanged and is used for future insertions. Records from overflow pages are not moved into the primary leaf page, even when a deletion on the primary page creates space. Hence, the number of primary leaf pages is fixed in the ISAM structure.

Overflow Pages and Locking
Once the ISAM file is created, the number of leaf pages is fixed. The contents of the leaf pages are affected only by the insertion and deletion operations. This design results in long chains of overflow pages when many records are inserted into the same page. Long chains increase the time to retrieve a record, because all the overflow pages need to be searched when the search reaches that leaf page. This problem can be reduced by keeping about 20% of the space in each page free at the time the file is created. Once this free space is used up, further insertions into that page again create overflow pages. One possible way to eliminate the overflow chains is to reorganize the file.

Since only the leaf pages can be modified, ISAM has an important advantage with respect to concurrent access. A requester locks a page when it gets access to it, so that other users can't modify it. A page must be locked in "exclusive mode" to modify its contents. Other users wait in a queue to access a page locked by some user. Such queues can become a performance bottleneck if all the users are waiting to access a page near the root of an index structure. In the ISAM structure, this locking step is omitted for the index-level pages, because they are never modified. So the ISAM structure has the advantage of not locking index-level pages over a dynamic structure like a B+ tree. If the data distribution and size are relatively static (which means overflow chains are rare), then the ISAM structure is preferable to B+ trees.

Pros of ISAM
The pros of ISAM are as follows,
(i) It consumes less time for searching a record in a large database.
(ii) It offers both partial retrieval and range retrieval of records.

Cons of ISAM
The cons of ISAM are as follows,
(i) It needs additional space for storing index values on the disk.
(ii) It requires reconstruction of the file while inserting new records so as to maintain the sequence.
(iii) It requires releasing the space on deleting a record so as to prevent the database performance from being affected.

5.7 B+ TREES: A DYNAMIC INDEX STRUCTURE

Q35. Explain about the B+-tree and the structure of B+-tree in detail with an example.
Answer:
A B+-tree index structure represents a balanced tree satisfying the following properties,
(i) All paths from the root to a leaf node are of the same length.
(ii) Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
(iii) A leaf node has between ⌈(n − 1)/2⌉ and n − 1 values.

Structure of B+-tree
A B+-tree is a multilevel index, but its structure is very different from that of a multilevel index-sequential file. The structure of a typical node of a B+-tree is shown below.

P1 | K1 | P2 | K2 | ... | Pn−1 | Kn−1 | Pn

Figure (1): Node of a B+-tree

In the above figure, K1, K2, ..., Kn−1 represent search-key values and P1, P2, ..., Pn represent pointers to records or buckets of records (or, in a non-leaf node, to child nodes). The search-key values in a node are kept in sorted order, i.e.,
K1 < K2 < K3 < ... < Kn−1

The structure of a B+-tree is understood by considering the structure of a leaf node as well as the structure of a non-leaf node.

Structure of Leaf Node
For i = 1, 2, ..., n − 1, pointer Pi refers either to a file record having search-key value Ki or to a bucket of pointers, where each pointer points to a file record having search-key value Ki.

The following figure illustrates one leaf node of a B+-tree for an employee file. In this example, the search key is employee name and the value of n is 3. The employee file is ordered according to the search-key value, i.e., by employee name. Hence, in the leaf node, the pointers point directly to the file records.

Leaf node: | Sai | Ramana |

Employee file:
011E   Sai      12000
021E   Ramana   13000
031E   Shiva    11000

Figure (2): Leaf Node for Employee B+-tree Index

Figure (3): B+-tree for Employee File

Each leaf node can carry up to (n − 1) values. The values present in the leaf nodes do not overlap with each other. If Li and Lj are leaf nodes and i < j, then every search-key value in Li is less than every search-key value in Lj. Pointer Pn is used to chain the leaf nodes together in the order of the search-key values.
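The leaf-node rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's code. The class `LeafNode`, its fields and the helper `range_scan` are hypothetical names, and `next_leaf` stands in for the chaining pointer Pn. The sample keys reuse the employee names from Figure (2), placed in sorted order as the rule K1 < K2 < ... < Kn−1 requires. The record pointers are represented simply by employee numbers.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import Any, List, Optional


@dataclass
class LeafNode:
    """Hypothetical leaf node of a B+-tree whose nodes hold n pointers.

    keys[i] plays the role of K(i+1) and pointers[i] of P(i+1), a reference to
    a file record (or bucket); next_leaf plays the role of Pn, the pointer to
    the next leaf in search-key order.
    """
    n: int
    keys: List[Any] = field(default_factory=list)
    pointers: List[Any] = field(default_factory=list)
    next_leaf: Optional["LeafNode"] = None

    def is_valid(self, is_root: bool = False) -> bool:
        """Check K1 < K2 < ... and the occupancy bound ceil((n-1)/2) .. n-1."""
        in_order = all(a < b for a, b in zip(self.keys, self.keys[1:]))
        min_keys = 0 if is_root else ceil((self.n - 1) / 2)
        return in_order and min_keys <= len(self.keys) <= self.n - 1


def range_scan(start_leaf: LeafNode, lo: Any, hi: Any) -> List[Any]:
    """Collect pointers whose key k satisfies lo <= k <= hi by walking the Pn chain."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.pointers):
            if key > hi:
                return result  # keys only grow along the chain, so stop here
            if key >= lo:
                result.append(ptr)
        leaf = leaf.next_leaf
    return result


if __name__ == "__main__":
    # Illustrative data: employee names as keys, employee numbers as "pointers".
    leaf1 = LeafNode(3, ["Ramana", "Sai"], ["021E", "011E"])
    leaf2 = LeafNode(3, ["Shiva"], ["031E"])
    leaf1.next_leaf = leaf2
    print(range_scan(leaf1, "Sai", "Shiva"))  # ['011E', '031E']
```

A real range query would first locate the starting leaf with the search procedure and then walk the chain from there. The sketch starts from a given leaf to keep the focus on the Pn pointer.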
Structure of Non-leaf Node
In a B+-tree, the non-leaf nodes form a multilevel sparse index on the leaf nodes. In a non-leaf node, all the pointers point to tree nodes. A non-leaf node can carry up to n pointers and must carry at least ⌈n/2⌉ pointers. For a non-leaf node with m pointers,
(i) All search keys in the subtree to which P1 points are less than K1.
(ii) For 2 ≤ i ≤ m − 1, all the search keys in the subtree to which Pi points have values greater than or equal to Ki−1 and less than Ki.
(iii) All search keys in the subtree to which Pm points have values greater than or equal to Km−1.

Q36. Explain all the operations on B+-trees by taking a sample example.
OR
Discuss insert, delete, search operations on B+-trees.
Nov/Dec-18(R10), Q11
OR
Describe the insertion and deletion operations in B+ trees.
April-18(R10), Q10(b)
(Refer Only Topics: Insertion, Deletion)
OR
Explain deletion and insertion operation in B+ trees.
May-17(R15), Q11(b)
(Refer Only Topics: Insertion, Deletion)
OR
Explain the insertion and deletion operations in B+-trees with example.
(Refer Only Topics: Insertion, Deletion)
(Model Paper-II, Q11 | May-19(R16), Q10(b))
Answer:
The operations on a B+-tree include,
(i) Search
(ii) Insertion
(iii) Deletion.

(i) Search
Searching a B+-tree for a key value always begins at the root node. A search for a single key value always follows one path from the root node to a leaf node.

The search algorithm for B+-trees is as follows,

m ← root node;
read m;
while (m is not a leaf node) do
begin
    t ← number of tree pointers in node m;
    if K ≤ m.K1    /* m.Ki represents the ith search field value in node m */
        then m ← m.P1    /* m.Pi represents the ith tree pointer in node m */
    else if K > m.Kt−1
        then m ← m.Pt
    else
    begin
        search node m for an entry i such that m.Ki−1 < K ≤ m.Ki;
        m ← m.Pi
    end;
    read m
end;
search block m for entry (Ki, Pri) with K = Ki;
if found
    then read the data file block with address Pri and retrieve the record
    else the record with search-key value K is not in the data file;

(ii) Insertion
Inserting a record into a B+-tree initially assumes that the tree contains only the root node, which is also treated as a leaf node. When the level of the tree is incremented by 1, the tree is divided into leaf nodes and non-leaf (internal) nodes. It is important to note that every search-key value appears in a leaf node, because the leaf nodes contain the pointers to the data records. However, some search-key values also exist in non-leaf nodes to guide the search for records in the index. Another point to note is that every search-key value in a non-leaf node also exists as the right-most value in the leaf level of the subtree pointed at by the tree pointer to the left of that value.

To insert a record, if the leaf node is not full, insert the record in the correct position in the leaf node. Otherwise, i.e., if the leaf node is full with pleaf record pointers, split the node into two nodes. After splitting, the first j = ⌈(pleaf + 1)/2⌉ records are placed in the original node and the rest of the records are moved to the new leaf node. The jth search value is inserted in the parent non-leaf node. In addition, an extra pointer to the new node is created and inserted in the parent node. If the internal node is not full, insert the entry in the correct position in the internal node. Otherwise, i.e., if the internal node is full with p tree pointers, split the node into two nodes. After splitting, the entries up to tree pointer Pj are kept in the same node and the entries from tree pointer Pj+1 are moved to the new internal node, where j = ⌊(p + 1)/2⌋. The search value Kj moves up to the parent. If the splitting propagates up to the root, a new root node is created.

The insertion algorithm is as follows,

m ← root node;
read m;
set stack S to empty;    /* S represents a stack */
while (m is not a leaf node) do
begin
    push address of node m on stack S;
    t ← number of tree pointers in node m;
    if K ≤ m.K1
        then m ← m.P1
    else if K > m.Kt−1
        then m ← m.Pt
    else
    begin
        search node m for an entry i such that m.Ki−1 < K ≤ m.Ki;
        m ← m.Pi
    end;
    read m
end;
search m for entry (Ki, Pri) with K = Ki;    /* searching the leaf node */
if found
    then the record already exists in the file
else    /* insert the record in the B+-tree */
begin
    create entry (K, Pr);    /* Pr points to the new record */
    if leaf node m is not full
        then insert entry (K, Pr) in the correct position in m
    else
    begin    /* leaf node is full */
        temp ← m;    /* temp refers to an oversize leaf node */
        insert entry (K, Pr) in temp;
        new ← a new empty leaf node;
        new.Pnext ← m.Pnext;
        j ← ⌈(pleaf + 1)/2⌉;
        m ← first j entries in temp;
        m.Pnext ← new;
        new ← remaining entries in temp;
        K ← Kj;
        /* now (K, new) must be inserted in the parent internal node; if the parent is full, the split may propagate */
        finished ← false;
        repeat
            if stack S is empty
            then    /* no parent node: a new root node is created */
            begin
                root ← a new empty internal node;
                root ← ⟨m, K, new⟩;
                finished ← true
            end
            else
            begin
                m ← pop stack S;
                if internal node m is not full
                then
                begin
                    insert (K, new) in node m;
                    finished ← true
                end
                else
                begin    /* internal node m is full */
                    temp ← m;
                    insert (K, new) in temp;
                    new ← a new empty internal node;
                    j ← ⌊(p + 1)/2⌋;
                    m ← entries up to tree pointer Pj in temp;
                    new ← entries from tree pointer Pj+1 in temp;
                    K ← Kj
                end
            end
        until finished
    end
end;

Example
In this example, assume that the order is 3 and that the maximum number of keys in each leaf node is 2. The insert sequence is 6, 9, 2, 8, 4, ...

Initially, the B+-tree contains a single leaf node. The leaf node consists of one or more data pointers and a pointer to its right sibling.

Inserting Key Value 6
The B+-tree contains one leaf node, L1, and it is empty. So, insert key value 6 in leaf node L1.

Inserting Key Value 9
To insert key value 9, search for the location where the key is expected to occur. It is found to be leaf node L1. So, insert key value 9 in leaf node L1.

Inserting Key Value 2
To insert key value 2, search for the location where the key is expected to occur. It is found to be leaf node L1. But leaf node L1 is full, i.e., it already contains the maximum of two keys, so inserting a new key results in overflow. Hence, split the leaf node L1 into two nodes, L1 and L2. The first node contains the first half of the keys (2, 6) and the second node contains the second half of the keys (9). Now, a new root node is required to point to leaf nodes L1 and L2. So, create a root node holding key value 6, the right-most key of L1. Each leaf node is now at least half full.

Inserting Key Value 8
To insert key value 8, search for the location where the key is expected to occur. It is found to be leaf node L2. So, insert key value 8 in leaf node L2.

Inserting Key Value 4
To insert key value 4, search for the location where the key is expected to occur. The location is found to be leaf node L1. Since L1 is full, split the node into two nodes, L1 and L3. The right-most key in L1, i.e., 4, is now moved up to the root node. The root now holds the keys 4 and 6 and points to L1 (2, 4), L3 (6) and L2 (8, 9).
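The search and insertion algorithms above can be traced in Python. This is a hedged sketch rather than the book's implementation, and several details are assumptions:

- The names `BPlusTree` and `Node` are invented for the sketch.
- `p` is the maximum number of tree pointers per internal node, and `p_leaf` is the maximum number of entries per leaf.
- It follows the same convention as the pseudocode: a key K ≤ Ki is routed to the left of Ki, and the right-most key of a split leaf is copied into the parent.
- Instead of building a separate oversize `temp` node, it inserts in place and then splits, which has the same effect.

With p = 3 and p_leaf = 2 it reproduces the example.

```python
import bisect
from math import ceil


class Node:
    """Hypothetical node: in a leaf, children[i] is the record pointer for
    keys[i] and `next` is P_next; in an internal node, children are tree pointers."""
    def __init__(self, leaf):
        self.leaf, self.keys, self.children, self.next = leaf, [], [], None


class BPlusTree:
    def __init__(self, p, p_leaf):
        self.p = p            # assumed: max tree pointers in an internal node
        self.p_leaf = p_leaf  # assumed: max (key, record pointer) entries in a leaf
        self.root = Node(leaf=True)

    @staticmethod
    def _child_index(node, key):
        # As in the pseudocode: K <= K1 -> P1, K > K(t-1) -> Pt,
        # otherwise the Pi with K(i-1) < K <= Ki (returned as a 0-based index).
        return bisect.bisect_left(node.keys, key)

    def search(self, key):
        node = self.root
        while not node.leaf:
            node = node.children[self._child_index(node, key)]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.children[i]  # record pointer
        return None  # the key is not in the data file

    def insert(self, key, record):
        node, stack = self.root, []
        while not node.leaf:  # descend, remembering the path on stack S
            stack.append(node)
            node = node.children[self._child_index(node, key)]
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return False  # record already exists in the file
        node.keys.insert(i, key)
        node.children.insert(i, record)
        if len(node.keys) <= self.p_leaf:
            return True
        # Leaf overflow: the first j entries stay, the rest move to a new leaf.
        j = ceil((self.p_leaf + 1) / 2)
        new = Node(leaf=True)
        new.keys, node.keys = node.keys[j:], node.keys[:j]
        new.children, node.children = node.children[j:], node.children[:j]
        new.next, node.next = node.next, new
        up_key = node.keys[-1]  # Kj, the right-most key of the left leaf
        while True:
            if not stack:  # no parent node: create a new root
                root = Node(leaf=False)
                root.keys, root.children = [up_key], [node, new]
                self.root = root
                return True
            parent = stack.pop()
            idx = parent.children.index(node)
            parent.keys.insert(idx, up_key)
            parent.children.insert(idx + 1, new)
            if len(parent.children) <= self.p:
                return True
            # Internal overflow: P1..Pj stay, Kj moves up, P(j+1).. go to a new node.
            j = (self.p + 1) // 2
            sibling = Node(leaf=False)
            up_key = parent.keys[j - 1]
            sibling.keys, sibling.children = parent.keys[j:], parent.children[j:]
            parent.keys, parent.children = parent.keys[:j - 1], parent.children[:j]
            node, new = parent, sibling

    def leaf_keys(self):
        """Keys of every leaf, left to right, following the P_next chain."""
        node = self.root
        while not node.leaf:
            node = node.children[0]
        out = []
        while node is not None:
            out.append(list(node.keys))
            node = node.next
        return out


if __name__ == "__main__":
    tree = BPlusTree(p=3, p_leaf=2)
    for k in [6, 9, 2, 8, 4]:
        tree.insert(k, f"rec{k}")
    print(tree.root.keys)    # [4, 6]
    print(tree.leaf_keys())  # [[2, 4], [6], [8, 9]]
```

The stack of visited internal nodes mirrors stack S in the pseudocode. It lets a split propagate upward without parent pointers in the nodes.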

In a B+-tree file organization, maintaining good space utilization is even more important than in a B+-tree index, because records usually take more space than keys and pointers. Space utilization in a B+-tree can be improved by involving sibling nodes in redistribution during splits and merges. This technique can be used for leaf nodes as well as non-leaf nodes.
During insertion, if a block does not contain enough space for the new entry, the system creates space by redistributing some of its records to one of the adjacent blocks. If the adjacent block is also full, the system splits the block into two blocks. It then redistributes the records evenly among the adjacent block and the two blocks obtained by the split. Each block must then hold at least ⌊2n/3⌋ records, where n represents the number of records that the block can hold.
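The even three-way split described above can be checked with a small Python sketch. It assumes two adjacent full blocks of n records each plus one new record, giving 2n + 1 records in total. The helper names `redistribute` and `insert_with_sibling_split` are hypothetical, and records are represented only by their keys.

```python
def redistribute(records, num_blocks):
    """Spread sorted records over num_blocks blocks as evenly as possible."""
    base, extra = divmod(len(records), num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)  # the first `extra` blocks get one more
        blocks.append(records[start:start + size])
        start += size
    return blocks


def insert_with_sibling_split(block, sibling, new_key, n):
    """Hypothetical helper: `block` and its adjacent `sibling` are both full
    (n records each), so the new record forces a split. The 2n + 1 records
    are redistributed evenly over three blocks instead of two."""
    assert len(block) == n and len(sibling) == n, "both blocks must be full"
    merged = sorted(block + sibling + [new_key])
    return redistribute(merged, 3)


if __name__ == "__main__":
    n = 6
    parts = insert_with_sibling_split(list(range(0, 12, 2)), list(range(12, 24, 2)), 5, n)
    print([len(b) for b in parts], "minimum required:", 2 * n // 3)  # [5, 4, 4] minimum required: 4
```

Spreading 2n + 1 records over three blocks gives every block at least ⌊(2n + 1)/3⌋ ≥ ⌊2n/3⌋ records, which is the bound stated in the text.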

During deletion, if the number of records in a block becomes less than ⌊2n/3⌋, the system borrows a record from one of the sibling nodes. If both sibling nodes contain exactly ⌊2n/3⌋ records, the system does not borrow an entry. Instead, it redistributes the records in the node and its two siblings evenly between two of the nodes and removes the third node. This works because the total number of records is 3⌊2n/3⌋ − 1, which is less than 2n.

If three sibling blocks are involved in redistribution (four nodes in all), each node will have at least ⌊3n/4⌋ records. In general, if m nodes (m − 1 siblings) are used in redistribution, each node will have at least ⌊(m − 1)n/m⌋ records.
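The following short derivation shows why the merge during deletion is safe. It assumes, as the text implies, that after the deletion the deficient node holds one record fewer than the minimum L, while each of its m − 1 siblings holds exactly L records, where L = ⌊(m − 1)n/m⌋.

```latex
\begin{aligned}
L &= \left\lfloor \frac{(m-1)\,n}{m} \right\rfloor,
\qquad T = (L-1) + (m-1)L = mL - 1, \\
mL &\le (m-1)\,n \;\Longrightarrow\; T \le (m-1)\,n - 1 < (m-1)\,n, \\
T &= (m-1)L + (L-1) \;\ge\; (m-1)L \qquad (L \ge 1), \\
m = 3 &: \quad T = 3\left\lfloor \frac{2n}{3} \right\rfloor - 1 < 2n .
\end{aligned}
```

The second line shows that the T records fit in m − 1 nodes. The third line shows that spreading them evenly over those m − 1 nodes still leaves each node with at least L records. Setting m = 3 gives the condition 3⌊2n/3⌋ − 1 < 2n used in the text.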

B+-tree file organization can also be used to store large objects, such as binary large objects (blobs) and character large objects (clobs). These large objects are divided into smaller records, which are then stored and organized in a B+-tree file organization.
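As an illustration of splitting a large object into smaller records, the Python sketch below stores each chunk as a separate record keyed by (object id, chunk number). It rests on several assumptions. A sorted list searched with the standard `bisect` module stands in for the B+-tree file. The names `SortedRecordFile`, `store_large_object` and `load_large_object` are hypothetical, and the 4-byte `CHUNK_SIZE` is chosen only for illustration.

```python
import bisect

CHUNK_SIZE = 4  # assumed payload bytes per record; tiny so the example splits


class SortedRecordFile:
    """Stand-in for a B+-tree file organization: (key, record) pairs kept
    in key order so that a key-range scan returns records in order."""

    def __init__(self):
        self.keys, self.records = [], []

    def insert(self, key, record):
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.records.insert(i, record)

    def range_scan(self, lo, hi):
        """Yield records whose key k satisfies lo <= k < hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        while i < len(self.keys) and self.keys[i] < hi:
            yield self.records[i]
            i += 1


def store_large_object(f, obj_id, data):
    """Split `data` into CHUNK_SIZE pieces keyed by (obj_id, chunk number)."""
    for chunk_no, start in enumerate(range(0, len(data), CHUNK_SIZE)):
        f.insert((obj_id, chunk_no), data[start:start + CHUNK_SIZE])


def load_large_object(f, obj_id):
    """Reassemble an object with one range scan over its (integer) obj_id."""
    return b"".join(f.range_scan((obj_id, 0), (obj_id + 1, 0)))


if __name__ == "__main__":
    f = SortedRecordFile()
    store_large_object(f, 7, b"hello, large object")
    store_large_object(f, 3, b"other")
    print(len(f.keys), load_large_object(f, 7))  # 7 b'hello, large object'
```

The chunk keys of one object are contiguous in key order, so reassembling the object is a single range scan. A B+-tree file organization supports such scans through its chained leaves.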

EXERCISE QUESTIONS
1. Discuss dynamic multi-level indexing.
2. Write the differences between variable-length and fixed-length file organization.
3. Implement the open hashing technique with an example.
4. Distinguish between B-tree and B+-tree.
5. Construct a B+-tree for the following key values: (3, 5, 9, 10, 16, 18, 21, 25). Assume that the tree is initially empty and that the values are added in ascending order. Consider the cases where the number of pointers that will fit in one node is as follows,
(a) Four
(b) Five
(c) Eight.
