DBMS Unit5
UNIT-5
STORAGE AND INDEXING
Marketed by: SIA GROUP
PART-A
SHORT QUESTIONS WITH SOLUTIONS
Q1. Explain the properties of indexes.
Answer
The properties of indexes are as follows,
1. Indexes enhance the performance level of the databases.
2. They can retrieve the records in a particular sequence order, thereby reducing the workload of the database manager.
3. They are capable of addressing the requirements of the application program.
4. They consume less time to locate the file records.
5. They eliminate the need to analyse each entry during query execution i.e., they reduce the amount of the file that is to be searched.
6. They increase the speed of accessing the data records.
7. Indexes can be added or removed by the database designers without modifying the application logic, due to which the maintenance cost is decreased whenever the size of the database is increased.
8. They can perform binary search on variable-length file records.
Q2. When must we create non-clustering index despite the advantages of a clustering index? Explain.
Answer
In a clustered index, the records in a file are arranged in sequential order. But if the search key of the index defines an order which differs from the sequential order of the file, then such an index is known as the unclustered index. Both clustered and unclustered indexes have advantages and disadvantages. Clustered indexes make the data addition and modification process slower, as the data also has to be sorted along with that in the index. In these types of situations, we must create a non-clustering index.
Q3. What is Index and Hashing?
May-19(R16), Q1
Answer
Indexing
Indexing is a technique of using a table or a data structure to determine the location of rows in a file.
Hashing
Hashing is a technique that makes use of a hash function for mapping pairs (entries) to the corresponding positions in a hash table. Usually, a pair with a key 'K' is stored at a position f(K) by the hash function f. Each hash table position can store a pair.
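The mapping of a pair with key K to position f(K) can be sketched as below. This is only a minimal illustration, not a definitive implementation: the class name `HashTable`, the table size, and the hash function (sum of character codes modulo the table size) are assumptions made for the example, and each position holds a small list so that two keys mapping to the same position do not overwrite each other.

```python
class HashTable:
    """Toy hash table: a pair with key K is stored at position f(K)."""

    def __init__(self, size=8):
        self.size = size
        # One list per position so colliding pairs can share a position
        # (an assumption of this sketch, not stated in the text).
        self.positions = [[] for _ in range(size)]

    def f(self, key):
        # Hypothetical hash function: sum of character codes modulo table size.
        return sum(ord(ch) for ch in str(key)) % self.size

    def put(self, key, value):
        slot = self.positions[self.f(key)]
        for i, (existing_key, _) in enumerate(slot):
            if existing_key == key:
                slot[i] = (key, value)  # overwrite an existing pair
                return
        slot.append((key, value))

    def get(self, key):
        # Only the single position f(key) is examined, not the whole table.
        for existing_key, value in self.positions[self.f(key)]:
            if existing_key == key:
                return value
        return None


table = HashTable()
table.put("K1", "record-1")
table.put("K2", "record-2")
print(table.f("K1"), table.get("K1"))
```

A lookup computes f(K) once and inspects only that position, which is why the lookup time of hashing does not depend on the number of stored pairs as long as the hash function spreads the keys well.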
DATABASE MANAGEMENT SYSTEMS [JNTU-HYDERABAD]
SIA GROUP: ALL-IN-ONE JOURNAL FOR ENGINEERING STUDENTS
Q4. What is the difference between indexing and hashing?
(Model Paper | April-18(R16))
Answer
Indexing | Hashing
1. It is a technique of using a table or a data structure to determine the location of rows in a file. | 1. It is a technique of using a hash function for determining the location of rows in a file, in which we obtain the address of the disk block containing the desired record directly.
2. In indexing, performance degrades as the file grows. | 2. In hashing, a bad hash function results in lookup time proportional to the number of search key values.
3. The lookup time is proportional to the log of the number of values in the database relation. | 3. The average lookup time of hashing is constant and independent of the database size.
4. The values are stored sequentially in the sorted order. | 4. The values are scattered randomly across many buckets.
5. Suitable for queries that look up records based on a range of key values. | 5. Suitable for queries that look up records based on a single key value.

Q5. What is primary and secondary indexing?
April-18(R16), Q1
OR
Discuss about primary indexes.
(Refer Only Topic: Primary Index)
OR
What is meant by secondary index?
(Refer Only Topic: Secondary Index)
May-15(R13), Q10
Answer
Primary Index
An index that is defined based on the ordering key field of an ordered file is called primary index. The use of the ordering key field is to place file records sequentially on the disk block, where each record is guaranteed to have a unique value for that field. It follows the same ordering as that of the file. It is an ordered file consisting of two fields. The first field maintains the same data type as that of the ordering key field, called the primary key of the data file. The second field refers to a pointer to a disk block (block address).
Secondary Index
An index that is defined based on a non-ordering field of the data file is called secondary index. It has a different ordering than the one of the file. Like primary index, it is also an ordered file consisting of two fields. The first field maintains the same data type as that of the non-ordering field of the data file. The second field refers to a pointer to a disk block.

Q6. When would users use a tree-based index?
Answer
Tree-based indexing is used in the following situations,
1. Data ordering is required.
2. Search, insert and delete operations are to be performed.
3. Some data structure is employed for searching.
4. Data elements/records are stored at leaf nodes to obtain a fast access/search.
5. Research is to be performed in multidimensional access methods.
6. Geometric shapes need to be stored.
7. Range as well as equality searches need to be performed.
8. Base tables need to be accessed.
9. Automatic reorganization of the files is required.
10. Index file degradation issue is to be solved.
11. Good space utilization is needed i.e., less number of tree nodes is to be used.
12. Search-key value is to be determined before reaching the leaf nodes.

Q7. Differentiate between 'primary indexing' and 'secondary indexing'.
Answer
Primary Index | Secondary Index
1. An index that contains a primary key is called the primary index. | 1. An index that does not contain a primary key is called secondary index.
2. It may not have any duplicates i.e., it has unique values in each record. | 2. It may have duplicates.
3. It consumes less storage space. | 3. It consumes more storage space.
4. The time required to search a particular record is less. | 4. The time required to search a particular record is more.
5. An index entry is created for each block. | 5. An index entry is created for each record present within the data file.
6. Primary index can also be referred to as sparse index. | 6. Secondary index can also be referred to as dense index.
7. Block anchors (the first record in each data block) can be used. | 7. Block anchors cannot be used.
8. The file records are physically ordered based on the primary key field. | 8. The file records are not physically ordered due to the absence of the primary key field.

Q8. Explain what are the differences between tree-based and hash-based indexes.
May-19(R16), Q10
Answer
Tree-based Indexing
In this type of indexing, the records are arranged in a tree-like structure. The data entries are sorted according to the search key values and are arranged in a hierarchical structure (by the hierarchical search data structure) so as to find the correct page of the data entries.
Hash-based Indexing
Hashing is an organization approach, wherein it is possible to find the desired records quickly, based on the search key value. In this type of indexing, a group of file records, known as a BUCKET, contains a primary page, along with other additional pages that are chained together. In order to determine to which bucket a record belongs, a special function called a HASH FUNCTION can be applied to the search key value. By providing a bucket number, the primary page for the respective bucket can be retrieved in one or more disk I/O operations.

Q9. Why are tree-structure indexes good for searches, especially range selections?
(Model Paper-I | May-17(R15), Q10)
Answer
There are two tree data structures called ISAM and B+-tree. The main focus of these tree structures is on insertion and deletion of data entries. For example, consider a file of employee records. Assume that the file is sorted by the gpa field. If a query to be processed includes a range selection such as "Find all employees with gpa greater than 2.5", then it can be processed by applying a binary search for such employees and then scanning the file from that point onward. Such a query processing will be expensive, if the cost of binary search is proportional to the number of pages fetched.
An alternative approach is to create a second file with one record per page in the original (data) file in the form of index entries (first key on page, pointer to page) and then sort the file by the search key (gpa field in our example). Because an entry in the index file will be smaller than the page size in the data file, and there will be only one entry in the index file per page of the data file, the index file is much smaller than the data file. In an index page, the number of keys is one less than the number of pointers. Each key acts as a separator for the contents of the pages pointed to by the pointers on its left and right. Hence, the query uses a simple one-level index file.
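The one-level index idea for the range selection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the textbook's algorithm: the function names `build_index` and `range_search`, the in-memory list-of-pages layout, and the sample records are all hypothetical. The index keeps one entry per data page (first key on page, page number), a binary search over the index finds the starting page, and the file is scanned from that page onward.

```python
from bisect import bisect_right


def build_index(pages):
    """One index entry per data page: (first key on the page, page number)."""
    return [(page[0][0], page_no) for page_no, page in enumerate(pages)]


def range_search(pages, index, low):
    """Return all records whose key is greater than `low`."""
    first_keys = [key for key, _ in index]
    # The last page whose first key is <= low may still hold qualifying records;
    # every page before it holds only keys <= low and can be skipped.
    start = max(bisect_right(first_keys, low) - 1, 0)
    result = []
    for page in pages[start:]:
        result.extend(record for record in page if record[0] > low)
    return result


# Hypothetical data file sorted by gpa, two records per page.
pages = [
    [(2.0, "A"), (2.4, "B")],
    [(2.5, "C"), (3.1, "D")],
    [(3.4, "E"), (3.9, "F")],
]
index = build_index(pages)
print(range_search(pages, index, 2.5))
```

The binary search touches only the small index, so the number of data pages fetched is limited to the pages that can actually contain qualifying records.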
Q10. Differentiate between sparse and dense indices.
(Model Paper, Q1)
Answer
Sparse Index | Dense Index
1. Index entry is available for only few of the search key values. | 1. Index entry is available for every search key value in the file.
2. It is applicable only when the relation is present in sorted order of the search key. | 2. It is applicable in either case.
3. It is complex. | 3. It is simple.
4. It consumes more time. | 4. It consumes less time.
5. It locates the records indirectly. | 5. It locates the records directly.

Q11. Differentiate between clustered and unclustered indices.
Answer
Clustered Index | Unclustered Index
1. A file organization where the order of data records is similar to the order of data entries in some index is called clustered index. | 1. A file organization where the order of data records is not similar to the order of data entries in some index is called unclustered index.
2. There can be only one clustered index on a data file. | 2. There can be several unclustered indexes on a data file.
3. The rids in qualifying data entries point to a contiguous collection of records, which leads to retrieving only few pages. | 3. The rids in qualifying data entries point to distinct data pages, which leads to retrieving several pages.
4. The data addition and modification process is slower. | 4. The data addition and modification process is not as slow as with a clustered index.
5. Clustered index is relatively expensive to maintain when the file is updated. | 5. Unclustered index is cost effective.

Q12. What is the relationship between files and indexes?
(Model Paper-I, Q1)
Answer
A file is a collection or sequence of records that form a disk based data structure, whereas an index is a list of keys or keywords that can be built and destroyed. Allocation of an index onto a file is done in order to speed up the searching and retrieval of records that satisfy search conditions on the search key fields of the index.

Q13. What are the advantages and disadvantages of B+-trees?
Nov./Dec.-18(R16), Q10
Answer
Advantages of B+-trees
The advantages of B+-trees are as follows,
1. It offers a moderate performance for direct access.
2. It offers fast searching.
3. It offers an exceptional performance for range and sequential accesses.
Disadvantages of B+-trees
The disadvantages of B+-trees are as follows,
1. It leads to memory wastage when duplicate key values are stored.
2. It has complicated insertion process.
3. It has complicated deletion process.

Q14. What is the main difference between ISAM and B+-tree indexes?
May-17(R15), Q10
Answer
ISAM | B+-tree
1. ISAM (Indexed Sequential Access Method) is a static indexing structure. | 1. B+-tree is a dynamic indexing structure.
2. Applicable for static files. | 2. Applicable for dynamic files.
3. The leaf pages are allocated sequentially. | 3. The leaf pages are allocated randomly.
4. Due to static size of ISAM, overflow chains may frequently occur. | 4. Due to dynamic size of B+-trees, overflow chains rarely occur.
5. Only leaf pages can be modified. | 5. Leaf as well as index-level pages can be modified.
6. Scanning is done more efficiently. | 6. Scanning is done less efficiently.
7. Insertions lead to long overflow chains. | 7. Insertions are handled elegantly without overflow chains.
8. The number of nodes to be examined is equal to the height of the index tree plus the number of overflow pages. | 8. The number of nodes to be examined is equal to the height of the tree.
9. The performance of ISAM is less efficient. | 9. The performance of B+-trees is more efficient.
10. Locking overhead of ISAM is less. | 10. Locking overhead of B+-trees is more.

Q15. How to compute the disk access time?
May-15(R13), Q10
Answer
Disk access time is computed by using the following equation.
Access time = Seek time + Latency + Transfer time
(or)
Access time = Disk controller processing time + Rotational delay + Transfer time.
Where,
Seek time (or) disk controller processing time refers to the amount of time required by a hard disk controller to search a particular block of stored data.
Rotational delay refers to the time taken to position the proper sector under the read/write head.
Transfer time refers to the time taken for completion of data transfer.
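The access time equation can be worked through with a short calculation. This is a minimal sketch: the function names and the sample figures (6 ms seek, 7200 rpm, 0.5 ms transfer) are hypothetical values chosen for illustration, and the helper that derives the average rotational delay from the spindle speed assumes the usual approximation that, on average, the disk must turn half a revolution before the required sector arrives.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay: time for half a revolution, in milliseconds.

    Assumption of this sketch: on average the required sector is half a turn away.
    """
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


def disk_access_time_ms(seek_ms, latency_ms, transfer_ms):
    """Access time = Seek time + Latency + Transfer time (all in milliseconds)."""
    return seek_ms + latency_ms + transfer_ms


# Hypothetical drive: 6 ms seek, 7200 rpm spindle, 0.5 ms transfer.
latency = avg_rotational_delay_ms(7200)
total = disk_access_time_ms(6.0, latency, 0.5)
print(f"latency = {latency:.2f} ms, access time = {total:.2f} ms")
```

In such a calculation the seek and rotational components usually dominate the transfer time for a single block, which is why sequential placement of related blocks matters.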
PART-B
ESSAY QUESTIONS WITH SOLUTIONS
5.1 DATA ON EXTERNAL STORAGE
Q16. List the physical storage media available on the computers you use routinely. Give the speed with which data can be accessed on each medium.
(Model Paper-4, Q10)
Answer
Physical Storage Media
The different physical storage media that are used in our daily life are,
1. Magnetic disks
2. Optical disks.
1. Magnetic Disks
Magnetic disks are the most commonly used secondary storage medium. The advantages of these disks over the magnetic tapes are,
(a) They provide high storage capacity.
(b) They are much more reliable.
(c) They have the ability to directly access the stored data.
A magnetic disk comprises a circular plate made of either plastic or metal. This plate is coated with a magnetic oxide layer. Data in these disks is stored either on the magnetized or demagnetized layer. The bit-value of 1 is represented on a magnetized spot and the value of 0 is represented on a demagnetized spot. In order to carry out the read operation, the data present on the magnetized surface is converted into electrical impulses which are then sent to the processor for execution. On the other hand, the write operation can be carried out by converting those electrical impulses into magnetic spots. A magnetic disk is enclosed inside a protective shield in order to protect it from dust or other interferences.
Storage Organization of Magnetic Disk
Data is organized on a normal disk by dividing the surface of the disk into various regions. The disk surface is fragmented into many imaginary tracks and sectors. Tracks are the concentric circles along which the data is stored. Sectors are fixed-sized areas that are accessible by the read/write heads of any disk drive. Track sector refers to an area where both track and sector intersect. Every sector possesses certain identification information at its initiation, referred to as the sector header. In order to access data from a disk, the proper location path has to be specified. This path is a combination of surface, track and sector number respectively. The capacity of each sector can exceed 512 bytes of data. For ensuring that data stored in the disk is free from errors, the sectors maintain a provision for error detection and error correction. The sectors on the disk surface do not exist in succession, rather they are separated by suitable gaps called inter-sector gaps. These gaps are useful in locating the desired sector on a given track.
Accessing Data from Magnetic Disk
Data is stored on the circular tracks by using multiple read/write heads. These heads are capable of accessing the tracks simultaneously. The read/write head is mounted on the access arm assembly, that can be moved in both inward and outward directions. Following are the steps performed while accessing data from magnetic disk.
Seek - In this step, the read/write head is placed on the desired track. The time taken for positioning the heads from one track to a specific track on the disk is referred to as seek time. The seek time may range from 6 to ... milliseconds.
Latency - In this step, the read/write head associated with the desired platter is activated. The time taken by the head to wait until the required sector comes under it is referred to as latency time or rotational delay time. The latency time ranges from 4.2 to 6.7 milliseconds.
Data Transfer - In this step, the read/write head moves the data from the disk to the primary memory. The time taken to read data from the disk or write data onto the disk is referred to as data transfer time. The speed of this transfer, the data transfer rate, is measured in kilobits per second.
Access time is the combination of all the three times i.e.,
Access time = Seek time + Latency time + Data transfer time
Memory access time can be defined as the time taken to transfer a single character from memory to the processor or from processor to memory. On the other hand, disk access time is the time taken to position the read/write head over the desired data available on the disk. The access time of RAM is 80 nanoseconds and that of a hard disk is 12 to 19 milliseconds.
Magnetic disks can be classified into two types,
(a) Floppy disk
(b) Hard disk.
(a) Floppy Disk
A floppy disk is a circular piece of mylar plastic. This disk is coated with ferric oxide and is enclosed in a plastic cover that acts as a protective shield. Floppy disks are read and written by a floppy disk drive, which is responsible for carrying out all the basic operations like rotating the disk, reading data from the disk and writing data onto the disk. Floppy disks were basically used in personal computers so as to perform software distribution, data transfer and small backups.
There were two types of floppy disks that were used,
5¼-inch - This floppy disk has very limited storage space and the rate of data transfer is very slow, when compared to the other disk.
3½-inch - To overcome the problems faced while using the 5¼-inch floppy disk, 3½-inch floppy disks were developed. This floppy disk has 1.44 MB of storage space and the data transfer rate is high.
Storage Organization of Floppy Disk
For performing a read/write operation, a floppy disk needs to be inserted into the floppy disk drive. When the disk is inserted, the drive holds it and rotates it inside a plastic jacket. A system of levers is pressed whenever a disk is inserted. One lever is responsible for opening the metal plate while the other levers and gears are responsible for moving two read/write heads. The heads move until they come in contact with the disk on either side. Signals are received by the circuit board. These signals include data as well as instructions needed to perform the read/write operation. If the signal signifies a write operation, then it is the responsibility of the circuit board to initially check that the light is invisible in the floppy disk drive. If a light is detected by the photo sensor present on the opposite side of the floppy disk, then it comes to know that the disk is write-protected, because of which data cannot be recorded.
A motor which is present below the disk rotates the shaft. This shaft comes in contact with a notch on the disk hub that causes the disk to rotate. A magnetic field is created by electrical impulses whenever the read/write head is positioned correctly. The magnetic field is created in any one of the read/write heads in order to write data either on the top or bottom of the disk. The read operation can be carried out when the electrical impulse is transferred to the computer from the respective read/write head. The magnetic field is produced by the metallic particles present on the disk.
Data Access Speed
The access time of a floppy disk drive is 12-25 ms.
(b) Hard Disk
Hard disk is the primary storage unit of the computer system, also known as hard drive or fixed disk. This disk comprises a set of disk platters. These platters are made of aluminium and have a magnetic material coating. It also has a protective layer that protects the disk.
There are different types of hard disks that differ with respect to their storage capacity. A hard disk holds data ranging from several hundreds of megabytes to several gigabytes. The speed of a hard disk is measured in terms of access time.
Storage Organization of Hard Disk
Information is stored in the form of magnetic patterns by using circular, flat platters. These platters are made of either glass or metal coated with a special material on both sides. The information is stored in tracks. These tracks are in turn divided into small sectors. The rotation of the platters is done by a spindle motor that is connected to the spindle on which the platters are mounted. The read/write operation is carried out by using a special type of electromagnetic read/write devices present on the sliders, which are mounted onto actuator arms. These arms are connected into a single assembly and placed on the disk surface. The positioning of the arms is done by a device called actuator.
The write operation is carried out in the same way as done in the floppy disk. The disk surface consists of an array of magnetized and demagnetized dots. Binary 1 is used to represent a magnetized dot and Binary 0 is used to represent a demagnetized dot. In order to read data from disk, the Virtual File Allocation Table (VFAT) or File Allocation Table (FAT) is initially read into the Windows operating system, when the hard disk is being partitioned. Reading of VFAT, FAT helps the operating system in knowing the sector and the track in which data is stored. Using this information the read/write head can perform the read operation. The entire data stored on the disk is read sequentially.
In a hard disk, the read/write head does not come in contact with the disk surface (i.e., it floats slightly off the surface). The distance between the read/write head and the disk surface is less than the thickness of a human hair. In case the read/write head comes in contact with the disk surface, a head crash is caused. In order to reduce the chances of a head crash, many disk controllers place the read/write head onto a track which is unused.
Data Access Speed
The access time of a hard disk is 12-19 ms.
2. Optical Disks
Optical disk is a form of external storage device, which is most widely used today for storing large volumes of data such as multimedia data (audio, video). The advantage of optical disk is that a massive volume of data can be stored in very less space.
There are many optical disks available in the market that differ in their sizes and storage capacities. As the storage capacity of the optical disk is large, the cost of storing a single bit is very low.
One of the popular optical storage devices is the Compact Disk Read Only Memory (CD-ROM), which is a round, flat piece of plastic disk. This disk is coated with a material on which data is written in the form of highly reflective areas and less reflective areas. The stored data may be read by using a laser diode. In order to read or write data from and to the CD-ROM, a drive called CD-ROM drive is required. The drawback of CD-ROM is that it does not allow a user to write data into it. It only allows to read data written by the manufacturer.
Therefore, another version, WORM (Write Once Read Many), is used for storing archival data. This disk allows the data to be written only once but it can be read several times. The data written can neither be erased nor overwritten. WORM has a longer life-span when compared to other devices.
Data Access Speed
The access time of optical disk is 8-12 ms.

Q17. Explain about tertiary storage media in detail.
Answer
Tertiary Storage Media
Tertiary storage is a type of storage media which lies at the bottom level of the storage device hierarchy, organized in accordance to the speed and cost of devices. Following are the three devices of tertiary storage media,
(i) Optical disk
(ii) Magnetic tapes
(iii) Magnetic disks.
(i) Optical Disk
For answer refer Unit-V, Q16, Topic: Optical Disks.
(ii) Magnetic Tapes
Magnetic tapes are plastic tapes that have a magnetic coating around them. In such tapes, data is stored in the form of small portions of magnetized and demagnetized layer. The magnetized portion signifies the bit value as 1 whereas the demagnetized portion signifies the bit value as 0. There are different types of magnetic tapes, each of which differ in their sizes and their speed (with which the tape moves past the read/write head). Magnetic tapes also differ with respect to recording density that specifies the amount of data that can be stored on a linear inch of tape.
The advantage of magnetic tapes is that they are very durable. The magnetic tapes can be erased and even reused many number of times. Magnetic tapes are very much reliable and are inexpensive when compared to other secondary storage devices.
Magnetic tapes are sequential in nature and cannot perform random access. The data is transmitted at a very slow speed in comparison to the magnetic disks.
Storage Organization of Magnetic Tapes
Magnetic tapes are fragmented into vertical ...
Following two reels are used for operating the magnetic tape,
(a) Supply reel
(b) Take-up reel.
The movement of the tape is done from the supply reel to the take-up reel. The side of the magnetic tape coated with magnetic oxide is passed to the read/write head. Whenever the tape comes beneath the read/write head, the read/write operation can be performed.
(iii) Magnetic Disks
For answer refer Unit-V, Q16, Topic: Magnetic Disks.

Q18. Write a detailed note on disk space management.
April/May-12, Set-4, Q8
Answer
Disk Space Management
The disk space manager manages space on disk. The disk space manager uses a page as the unit of data and provides commands to allocate or deallocate a page and to read or write a page. The page size is equal to the size of a disk block and the pages are stored as disk blocks, so that reading or writing a page requires one disk input/output.
A sequence of pages is stored as contiguous blocks to hold the data that is frequently accessed in sequential order. This is advantageous for sequentially accessing disk blocks. This capability must also be provided to the higher layers of the DBMS by the disk space manager.
The disk space manager hides all the underlying hardware details and makes the higher layers think of data as a collection of pages.
Handling of Free Blocks
The disk space manager keeps track of the space on the disk. The database may grow or shrink when the insertion and deletion operations are performed on it. To manage the disk space, the disk space manager will keep track of used disk blocks as well as which pages are on which disk blocks. The deletion operation on the disk may create "holes".
There are two ways to determine block usage,
1. Using a list of free blocks
2. Using bitmap.
Managing Disk Space through OS File Systems
The operating system can also manage the disk space. The operating system supports files as a sequence of bytes. The operating system translates requests such as "Read byte i of file f" into the low-level instruction: "Read block m of track t of cylinder c of disk d". OS files can also be used to build the database disk space manager. For example, the entire data is stored in one or more OS files and the blocks are allocated and initialized (by OS) for these files. Now, it is the job of the disk space manager to manage the space in these OS files.
But, many database systems do not rely on the OS file system and perform their own way of disk management. One of the reasons is that the DBMS may want to access a single file whose size is greater than the maximum size of a file, which is 4 GB on a 32-bit system.

Q19. How the data is stored in external storage?
(Model Paper-II, Q10(a) | April-11, Set-1, Q2(a))
Answer
A database consists of large volumes of data that cannot be stored into the main memory. Such persistent data is stored by the DBMS on some external storage devices like disks and tapes. Disks provide random access of data and tapes provide sequential access of data. The cost of accessing the data randomly is more than accessing the data sequentially. The data stored in the disks is usually in the form of files. These files consist of records that have a unique identifier known as a record id or rid. This identifier is used to determine the address of the page i.e., the page in which the record is stored.
The data that is read from or written to disk (in other words, the unit of I/O) is considered as a page whose size is 4 kB or 8 kB. The cost of page I/O i.e., the cost of pages read from disk to main memory and written from main memory to disk, is usually higher than the cost of typical database operations. This cost can be reduced if an optimized database system is built.
Consider a database that consists of 1 million pages. To execute a certain query, the entire database needs to be scanned. If the main memory can hold only 1000 pages of data, then it becomes impossible to bring all the data into the memory at one time. Thus, whenever required, the DBMS brings data into the memory for processing. But, if there is no space in the main memory, then some existing page from the primary memory must be replaced by the new page by adopting a certain replacement policy. In this way the DBMS can bring data into the main memory for processing.
The DBMS components that read and write data from main memory are,
1. Buffer Manager
It is a software layer whose major responsibility is to fetch the pages into the main memory whenever it receives a request from the files and access methods layer.

5.2 FILE ORGANIZATION AND INDEXING, CLUSTER INDEXES, PRIMARY AND SECONDARY INDEXES
Q20. List several ways of organizing records in a file. Explain sequential file organization.
OR
State and explain various file organisation methods. Give suitable examples to each of them.
Nov./Dec.-18(R16), Q10
Answer
Organizing Records in File
File organization is a mechanism of physically arranging or organizing the records of a file onto secondary storage devices such as magnetic disk, tapes or CD-ROM. Some of the file organizations supported by DBMS include,
1. Sequential file organization
2. Heap file organization
3. Hash (or) direct file organization
4. Indexed sequential file organization.
1. Sequential File Organization
In sequential file organization, records are stored in a particular sequence (i.e., ascending or descending) based on the search key values. Basically, a search key need not be a primary key or a superkey; it is a set of attributes or a single attribute. In this type of organization all the records are contiguously (consecutively) stored onto a physical storage device. The operations that can be performed on sequential file records are,
(a) Search
This operation is performed so as to locate a particular record by using a binary search technique on the search key value.
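The search operation on a sequential file (a binary search on the search key value) can be sketched as follows. This is only an illustrative sketch: the function name `binary_search`, the in-memory list standing in for the file, and the sample records are assumptions for the example. It relies on the records being stored in ascending search key order, as sequential file organization requires.

```python
def binary_search(records, key):
    """Locate the record with the given search key in a file sorted by that key.

    `records` is a list of (search_key, data) tuples in ascending key order.
    Returns the matching record, or None when the key is absent.
    """
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        mid_key = records[mid][0]
        if mid_key == key:
            return records[mid]
        if mid_key < key:
            low = mid + 1   # the key can only be in the upper half
        else:
            high = mid - 1  # the key can only be in the lower half
    return None


# Hypothetical sequential file, sorted on the first field.
records = [(101, "ABC"), (102, "PQR"), (103, "XYZ"), (104, "MNO"), (105, "STU")]
print(binary_search(records, 104))
print(binary_search(records, 110))
```

Each comparison halves the remaining part of the file, so the number of records examined grows with the logarithm of the file size rather than with the file size itself.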
For example, the file structure generated after sequentially arranging the "Student" records using pointers is as follows,

21    11    B.C.A     O.U
22    13    M.C.A     O.U
24    15    B.Tech    JNTU
28    11    B.C.A     O.U
30    19    M.Tech    JNTU
32    11    B.C.A     O.U

Figure: Sequential File Organization for Student Records

In the above structure, the pointer in each record points to the address of the next sequential record.

Sequential file organization also reduces the number of block accesses during sequential file processing, because the records are physically stored on the storage device in search key order. However, it is difficult to maintain the physical sequential order whenever insertions and deletions are performed, because a single insertion or deletion may require moving several records, which is costly. Therefore, pointer chains can be used while performing the deletion operation. On the other hand, the following two rules must be applied while performing the insertion operation,

(i) Search for the record that is placed before the record to be inserted. This search must be performed in search key order.
(ii) Check whether there is any free space for inserting the record. If there is sufficient space, then insert the record. Otherwise, insert the record in an overflow block. All the records in the overflow block are linked together using pointers.

Example
If a new record is to be inserted after record 2, then an overflow block is used because the file doesn't contain any free space. After appending the overflow block, record 2 points to the address of the new record, which in turn points to the address of its next sequential record i.e., record 3.

21    11    B.C.A     O.U
22    13    M.C.A     O.U
24    15    B.Tech    JNTU
28    11    B.C.A     O.U
30    19    M.Tech    JNTU
32    11    B.C.A     O.U
Overflow block: 27    ...    O.U

Figure: Sequential File Organization with an Overflow Block

On the other hand, if a batch query is being processed on multiple records, then such a query can be processed within a single pass. This is done by initially sorting all the records on the search key values. This retrieval improves the efficiency as well as reduces the cost of processing.

Advantages of Sequential File Organization
(i) The time consumed for retrieving a record based on the search key is very less.
(ii) It is very easy to access the next sequential record using pointers.
(iii) It has the ability of creating automatic backup copies of the file.

Disadvantages of Sequential File Organization
(i) The time taken to search a specific record in a large file is very high.
(ii) The entire file needs to be scanned while performing multiple key retrieval.
(iii) A new file needs to be created while performing insertion updates.
(iv) Insertion and deletion are expensive as the records need to remain in a physical sequential order.

2. Heap File Organization
Heap file organization organizes the records in a random order i.e., the records are stored in the file in the order in which they are created. The operations that can be performed on heap records are,

(a) Search
This operation is performed so as to locate a particular record by using a linear search technique on the search key value.

(b) Insert
In order to perform the insert operation, it is necessary to ensure whether there is free space in the file block. This can be done by performing a search operation prior to the insert operation. If there is free space, then the new record can be inserted.

(c) Delete
In order to perform the delete operation, it is necessary to ensure whether the record to be deleted exists in the file. If the desired record exists, then it is deleted from the file block.

Example
Consider an Employees records file carrying one page of data. If the capacity of a page is 4 records and the page carries 3 records, then inserting one more record makes the page full. So, additional insertions require a new page to be inserted into the file.

Emp_ID    NAME    Date of Joining
101       ABC     15-10-2005
102       PQR     17-12-2011
103       XYZ     12-1-2016

Table: Employees Records File

If two more records are inserted into the above file, a new page is created and the last record is stored in the second page as follows,

          Emp_ID    NAME    Date of Joining
Page-1    101       ABC     15-10-2005
          102       PQR     17-12-2011
          103       XYZ     12-1-2016
          104       MNO     6-7-2017
Page-2    105       STU     21-5-2019

Table: Employees Records File After Inserting New Records

3. Hash or Direct File Organization
Hash file organization organizes the file records in a random order based on the hash function, which is computed for every search key value. The operations that can be performed on hash file records are,

(a) Search
This operation is performed so as to locate a particular record by computing the hash function on the search key value.

(b) Insert
This operation is performed by initially searching for the bucket into which the record is to be inserted. This is done by computing the hash function. Once the bucket with the required space is found, the new record can be inserted. However, if there is not enough space in the file block, then a new overflow block is created and is chained with the respective file block.

(c) Delete
This operation is performed by initially searching for the bucket in which the record is present. This is done by computing the hash function. Once the bucket with the desired record is found, the respective record can be deleted from the file block, thereby creating free space that can be used for inserting another record.

In addition to these file organizations, there is another file organization referred to as "Multitable clustering file organization", wherein the interrelated records of different relations are stored in the same file.

Example
Consider the employees table as an example,

E_ID    Name      E_SAL
E_29    Jack      2000
E_36    Donald    4000
E_45    John      9000
E_85    Mickey    4500
E_55    Rhino     5000
E_92    Smike     3500

Assume that the file information is stored in different blocks. Each block contains multiple records. For example, consider the salary attribute of the desired record.
Figure: Hash File Organization of the Employees Records (Bucket 0: E_SAL 9000, 4500; Bucket 2: E_SAL 2000, 5000, 3500)

In this figure, the records are distributed over several buckets, numbered from 0. The records available in the table are 6. Therefore, the desired record is accessed by using the hash function (E_SAL) MOD 6. If the desired record has salary 2500, then the range query on E_SAL between 2500 and 4000 for that record is to be executed on the table. Initially, the record value (2500) undergoes modulo division by 6.

Indexed Sequential File Organization
Indexed sequential file organization is an organization that enables the user to access the records both sequentially and directly. In this organization, records are stored on physical storage devices like magnetic disks along with an associated index. Generally, the primary key is used to order the records that are stored on the disk. Along with the primary key, an associated index for every record is stored on the disk. This index responds to user queries by accessing the records directly as well as sequentially. Among these two accesses, direct access is more preferable than sequential access.

Example

Figure: Track Index for Cylinder 1, Cylinder 3 and Cylinder 4 (columns: Track, Highest Key)
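
A track index of the kind named in the figure can be illustrated with a small sketch. It assumes each index entry stores a track number and the highest key on that track, so a direct access picks the first track whose highest key is not smaller than the search key and then scans that track sequentially. The key values, the `TRACKS` contents and the function names are hypothetical and are not taken from the figure.

```python
from bisect import bisect_left

# Hypothetical track index for one cylinder: (track number, highest key on the track).
TRACK_INDEX = [(1, 120), (2, 250), (3, 400), (4, 575)]

# Hypothetical track contents: records kept in key order on each track.
TRACKS = {
    1: [(100, "rec-100"), (120, "rec-120")],
    2: [(200, "rec-200"), (250, "rec-250")],
    3: [(300, "rec-300"), (400, "rec-400")],
    4: [(500, "rec-500"), (575, "rec-575")],
}


def find_track(track_index, key):
    """Return the track that may hold `key`, or None if the key is too large."""
    highest_keys = [highest for _, highest in track_index]
    pos = bisect_left(highest_keys, key)  # first track with highest key >= key
    if pos == len(track_index):
        return None
    return track_index[pos][0]


def direct_access(track_index, tracks, key):
    """Use the index to jump to one track, then scan that track sequentially."""
    track = find_track(track_index, key)
    if track is None:
        return None
    for record_key, payload in tracks[track]:
        if record_key == key:
            return payload
    return None
```

Sequential access in this sketch would simply walk the tracks in index order, which is why the same file supports both access styles.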
length records within a file. Basically, fixed-length records are the records that have the same fixed number of bytes and the same number of fields. In these type of records, the record slots are uniform and are organized in a sequential manner within the file. The primary advantage of fixed-length records is that it is very easy to perform insertion and deletion. This is because the space created by deleting a record is equal to the space required for storing the new record. However, these sort of records waste a lot of memory if the default record size is greater than the actual size of the record.

Example
Consider a file of "Student" records for a college database. Every record in the file is defined in the following manner,

type course = record
    stu_id number (10,2)
    c_id char (10);
    name char (20);
    university_name char (30);
end

For the above record, let us assume that the space occupied by a numeric value and by each character is 8 bytes and 1 byte respectively. This implies that a student record is of size 8 + 10 + 20 + 30 = 68 bytes. In order to store all the student records in a file, a simple approach is to reserve 68 bytes for every individual record (i.e., reserve 68 bytes for the student 1 record, 68 for the student 2 record and so on). Despite being easily implemented, this encounters the following problems:

1. Deletion Overhead
If a record is to be deleted from the fixed block structure, then it is necessary for a user to ensure initially (prior to deletion) whether it is possible,

(i) To Reuse the Space of the Deleted Record for Storing other Records
If it is possible, then space utilization is done by moving the record which follows the deleted record and inserting it within the space created after deletion. Next, all the remaining records of the file are moved ahead in the similar fashion.

Record 0    21    11    B.C.A     O.U
Record 1    22    13    M.C.A     O.U
Record 2    24    15    B.Tech    JNTU
Record 4    30    19    M.Tech    JNTU
Record 5    32    11    B.C.A     O.U

Figure (2): File Structure Generated by Deleting Record 3

In the above example, the deletion of record 3 resulted in moving all the records following record 3 (i.e., record 4, record 5) such that each record is moved ahead and inserted into the space that was occupied by its corresponding previous record. However, this way of reusing the space is highly inefficient because it requires a larger number of records to be relocated. Therefore, another approach is to move the last record of the file and insert it within the space formerly occupied by the deleted record.

Record 0    21    11    B.C.A     O.U
Record 1    22    13    M.C.A     O.U
Record 2    24    15    B.Tech    JNTU
Record 5    32    11    B.C.A     O.U
Record 4    30    19    M.Tech    JNTU

Figure (3): File Structure Generated by Moving the Last Record after Deleting Record 3

In the above structure, the last record i.e., record 5 is moved and inserted into the space formerly occupied by the deleted record i.e., record 3. However, such a way of moving the record requires additional block accesses, as insertions are performed more often than deletions. Therefore, it is preferable to leave the space created after deleting a record as free space and to wait for a subsequent insertion before reusing the space.

(ii) To Use a Marker on the Deleted Record
Though it is possible to use a marker on deleted records so as to ignore those records while processing, it is not considered an effective approach because of the difficulty incurred in searching for the available space in order to perform new insertions.

... (i.e., if no records are deleted or if the spaces of all deleted records are utilized), in such situation the new record is inserted at the end of the file.

header (points to record 1)
Record 0    21    11    B.C.A     O.U
Record 1    (deleted, points to record 3)
Record 2    24    15    B.Tech    JNTU
Record 3    (deleted)
Record 4    30    19    M.Tech    JNTU
Record 5    32    11    B.C.A     O.U

Figure (4): File Structure Generated Using Free List After Deleting Records 1, 3

In the above file structure, the file header points to the address of the first deleted record (i.e., record 1), which in turn points to the address of the second deleted record (i.e., record 3), thereby creating a free list of deleted records.

Q22. Explain about variable-length file organization with an example.
Answer :
Variable-length File Organization
Variable-length file organization is a way of arranging variable-length records within a file. Basically, variable-length records are the records of multiple sizes. In contrast to fixed-length records, variable-length records incur some overhead while performing insertion and deletion operations. This is because of the difference between the space created after deleting a record and the space required for inserting a record. While inserting a variable-length record, it is possible that the entire space may be left unused (since the size of the new record is greater than the available space) or the space may be partially filled (since the size of the new record is less than the available space).

However, variable-length file organization is used for organizing the databases that store data whose size is greater than ... Relational databases impose a restriction on the size of the record such that it is less than or equal to the block size. This restriction helps in simplifying the buffer management and free space management. There are different ways in which variable-length records may be stored in the database system. In general, variable-length records include a record consisting of,

- Multiple record types in a file
- Record types in which it is possible to define variable-length fields
- Record types in which it is possible to reuse the same field multiple times.

Slotted-page structure is a technique employed for implementing variable-length records. This structure is basically used for organizing the records in a block. A slotted page consists of a header which is placed at the beginning of every individual block. The header stores the information regarding,

- Number of record entries present in the header
- End of free space within the block
- Array that contains entries specifying the location and size of the record
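
Returning to the fixed-length case, the free-list idea (a file header that points to the first deleted slot, with each deleted slot pointing to the next one) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class and attribute names are hypothetical, slots are list positions instead of byte offsets, and the most recently deleted slot is placed at the head of the list.

```python
class FreeSlot:
    """Marks a deleted slot and stores the index of the next deleted slot."""

    def __init__(self, next_free):
        self.next_free = next_free


class FixedLengthFile:
    def __init__(self):
        self.slots = []        # one entry per fixed-length record slot
        self.free_head = None  # file header: index of the first deleted slot

    def insert(self, record):
        # No deleted slot available: append the record at the end of the file.
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        # Otherwise reuse the first slot on the free list.
        slot = self.free_head
        self.free_head = self.slots[slot].next_free
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        # The deleted slot becomes the new head of the free list.
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot
```

Deleting record 3 and then record 1 in this sketch leaves the header pointing to slot 1, which points to slot 3, so the next two insertions reuse those slots before the file grows.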
Figure (1): Slotted Page Structure (block header with size and location entries, free space, records)

The allocation of records within the block is done in a sequential manner such that the first entry in the block header specifies the size and location of the last record in the block (i.e., the allocation is done starting from the end of the block). After allocating all the records of a file, the free space left is placed between the last entry in the block header and the first record. Whenever a record is to be inserted, space is allocated for the new record at the end of the free space. After inserting the record, an entry containing the size and location of the respective record is appended to the header. On the other hand, if a record is deleted from the block, the space occupied by that record is freed and the corresponding entry in the header is marked as deleted. Then the positions of the records placed before the deleted record within the block are changed i.e., the record before the deleted record occupies the space freed by the deleted record. Next, all the remaining records are moved such that each record occupies the space freed by moving its immediately following record. The space freed after the last move is added to the free space, which again lies between the last entry in the block header and the first record. While changing the positions of the records, the end-of-free-space pointer is also modified simultaneously.

In contrast to fixed-length records, moving the records from one position to another is less costly because of the limited block size.

Slotted-page structure doesn't support direct pointers to the records; instead it supports indirect pointers. An indirect pointer initially points to the entry in the header where the current location of the record is maintained, and then points to the intended record based on its location. The usage of indirect pointers helps in preventing the space fragmentation issue that can be encountered within the block.

Example
Consider Student records to be of variable length. Using slotted-page structure, these records are organized in the following manner,

Block header (size and location entries) | Free space | Records
21    11    B.C.A     O.U
22    13    M.C.A     O.U
24    15    B.Tech    JNTU
28    11    B.C.A     O.U

Q23. Explain about byte-string representation in detail.
Answer :
Byte-string representation is a technique used for implementing variable-length records. In this representation, a special symbol, ⊥, which signifies end-of-record, is attached to the end of every record. Once the special symbol has been added, the records can be stored in the form of a sequential byte string. The disadvantages of using this technique are,

1. It is difficult to reuse the space of a recently deleted record.
2. No space can be provided to the variable-length records if their size increases. In such situation, the records whose size has increased must be moved. Moving the records from their actual location leads to high maintenance cost, especially when these records are pinned.

Thus, to overcome these issues, another version of byte-string representation called "slotted-page structure" is used. In this representation, instead of using the '⊥' symbol, a header is stored at the beginning of every block. This header stores the information about,

(a) Total number of record entries present in the header
(b) Position where the free space in the block ends
(c) Location and size of each record, which is stored in an array.

Figure: Structure of Slotted-page Representation Technique (header fields: number of record entries, end-of-free-space, record size and location)

The allocation of records within the block initiates from the end of the block and is done in a contiguous manner. It is necessary to ensure before allocation that the free space in the block is between the last entry in the header array and the first record. When a record is to be inserted, the block is searched from the end so as to locate free space. Once the space has been found, the entry associated with the location and size of the record is stored in the header. On the other hand, when a record is to be deleted, the block is searched so as to find the record. After finding the record, the space occupied by that record is freed and its corresponding entry is deleted. All the records that are present before the deleted record are moved so that the freed space is occupied by those records. It must be ensured that the free space remains available between the last entry present in the header array and the first record. As the block size is limited (i.e., 4 kB), the cost of moving the records into the free space after deletion is not very high. When the free space is being occupied, the end-of-free-space pointer is also updated at the same time.

In this representation technique, instead of pointing to the records directly, a pointer points to the entry defined in the header i.e., the entry that maintains the information about the location of the record. The advantage of this level of indirection is that records can be moved so as to avoid the fragmentation issue.
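
The slotted-page behaviour described above can be sketched in a few lines. This is a simplified illustration, not a storage-engine implementation: the header array is kept as a Python list instead of being stored inside the block, the block size of 64 bytes is arbitrary, and the names `SlottedPage`, `slots` and `free_end` are hypothetical. Records are allocated from the end of the block, and deletion shifts the records stored before the deleted one so that the free space stays contiguous.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []       # header array: (offset, length), or None if deleted
        self.free_end = size  # end-of-free-space pointer; records live in data[free_end:]

    def insert(self, record: bytes) -> int:
        # Simplification: the space taken by the header itself is not modelled.
        if len(record) > self.free_end:
            raise ValueError("not enough free space in the block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1  # callers keep the slot number, not the offset

    def read(self, slot: int):
        entry = self.slots[slot]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        offset, length = self.slots[slot]
        # Shift every record stored before the deleted one towards the block end.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for i, entry in enumerate(self.slots):
            if entry is not None and entry[0] < offset:
                self.slots[i] = (entry[0] + length, entry[1])
        self.slots[slot] = None
        self.free_end += length
```

Because callers hold slot numbers, the records can be shifted during deletion without invalidating any external pointer, which is the benefit of the indirection.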
Define an index. What are the different kinds of indices? Explain the factors based on which the technique of indexing is evaluated.
OR
Answer with suitable examples the Cluster Indexes, primary and secondary indexes.
(Model Paper-1, Q11(a) | April/May-12, Set-2, Q8)
Answer :
Index
... file is not ordered so that the index can be searched quickly.

Primary index includes a single index entry for each block in the data file. Each index entry i holds two field values represented as <K(i), P(i)>, where K(i) refers to the primary key field value of the first record in a block, called the block anchor, and P(i) refers to a pointer to that block.

Primary indices are of two types,
(i) Dense index
(ii) Sparse index.

(i) Dense Index
Dense index has an index record for every search key value in the file. The index record contains the search key value and a pointer to the first data record with that search key value.

Figure (1): Dense Index (one index record for each of the search key values Sai, Ramana, Shivani, Gurudev, Bhavani and Laxmi, each pointing to the first data record with that value)

(ii) Sparse Index
Sparse index has an index record for only some of the search key values in the file. It is used when records are arranged sequentially according to the search key value.

In order to search for a row with key '50', the first entry in the given table, which is less than '50', is examined and then the process proceeds to the block 'B'. Then '50' is searched within the block.

Secondary Index
An index that is defined on a non-ordering field of the data file is called a secondary index. A secondary index has a different ordering than that of the file.

Like the primary index, the secondary index is also an ordered file consisting of two fields. The first field has the same data type as that of the non-ordering field of the data file. The second field refers to a pointer to a disk block.

The field on which the secondary index is constructed is called the indexing field. A file can have several secondary indexes in addition to its indexing fields.

A secondary index on a key field is sometimes called a secondary key. The key field is guaranteed to have a unique value for each record in the data file.

A secondary index includes an index entry for each record in the data file rather than for each block as in the case of the primary index. The reason is that the records of the data file are not ordered according to the values of the secondary key field. Each index entry i holds two field values <K(i), P(i)>, where K(i) refers to the secondary key value of the record and P(i) refers to a pointer to the disk block.

Clustering Index
An index defined on the ordering field of an ordered file is called a clustering index. A clustering index has the same ordering as that of the file.

In a clustering index, the ordering field of the data file can have the same value for several records in the file. But, in a primary index, the ordering field of the data file must contain a unique value for each record in the file.

Like the primary index and secondary index, the clustering index is an ordered file consisting of two fields. The first field has the same data type as the clustering field of the data file. The second field refers to a pointer to a disk block. A clustering index includes one index entry for each distinct value of the clustering field rather than for every record. The index entry points to the first data block that contains a record with that field value.

Example
Consider a clustering index shown in the below table, consisting of two fields, an index entry and a pointer to the block of a file.

Table: Clustering Index (columns: Clustering attribute value, Block number)

This table consists of an entry for every clustering attribute value. For the attribute value '30', the block with block number 'B' has to be searched. The block might contain various records with the attribute value 30. Such an index turns out to be expensive to maintain, and the searching process involves a number of queries to find the exact record.

Index Data Structures
The two methods in which file data entries can be organized (arranged) are,
1. Hash-based indexing and
2. Tree-based indexing.

Hash-based Indexing
Hashing is an organization approach wherein it is possible to find the desired records quickly, based on the search key value. In this type of indexing, a group of the file records, known as a BUCKET, contains a primary page along with other additional pages that are chained together. In order to determine which bucket a record belongs to, a special function called a HASH FUNCTION is applied to the search key value. Given a bucket number, the primary page for the respective bucket can be retrieved in one or more disk I/O operations.

(a) Inserting Records
The records are inserted (or added) into the bucket by assigning (allocating) the needed "OVERFLOW" pages.

(b) Searching Records
The record can be searched by applying a hash function to initially locate the bucket containing the records. Then all the pages in this bucket are scanned so as to find the desired record with the given search key. However, if the search key value of the record is not available, then all the pages in the file need to be scanned.

(c) Retrieving Records
By applying a hash function to the record's search key, the page containing the required record can be identified and retrieved in one disk I/O.
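
The bucket layout described for hash-based indexing (a primary page plus chained overflow pages) can be sketched as below. The page capacity of two entries, the hash function `key % number_of_buckets` and all names are assumptions made for illustration; `pages_read` only counts how many pages of the bucket a search touches, as a rough stand-in for disk I/Os.

```python
PAGE_CAPACITY = 2  # assumed number of data entries per page


class HashIndex:
    def __init__(self, num_buckets=4):
        # Each bucket is a list of pages; pages[0] is the primary page and
        # any further pages are overflow pages chained to it.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _pages(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, rid):
        pages = self._pages(key)
        if len(pages[-1]) == PAGE_CAPACITY:
            pages.append([])  # allocate an overflow page when the last page is full
        pages[-1].append((key, rid))

    def search(self, key):
        """Return (matching rids, number of pages read in the bucket)."""
        matches, pages_read = [], 0
        for page in self._pages(key):
            pages_read += 1
            matches.extend(rid for k, rid in page if k == key)
        return matches, pages_read
```

A search touches a single page as long as the bucket has no overflow pages, which matches the "one disk I/O" case above; long overflow chains make the search proportionally slower.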
Example
Consider a file student (which represents data records) with a hash key rno. Applying the hash function to the rno identifies the bucket that contains the needed record. The hash function 'h' uses the last two digits of the binary value of the rno as the bucket identifier.

A search key index on the marks obtained i.e., mrks, contains <mrks, rid> pairs as data entries in an auxiliary index file, which is shown in figure (b). The rid (record id) points to the record whose search key value is mrks.

Figure: (a) Index File ('Student') Based on rno Key; (b) Auxiliary Index File, Pairs Hashed on mrks

Tree-based Indexing
In this type of indexing, the records are arranged in a tree-like structure. The data entries are sorted according to the search key values and are arranged in a hierarchical structure (by the hierarchical search data structure) so as to find the correct page of the data entries.

Example
Consider the student records with a search key rno, arranged in a tree-structured index. In order to retrieve the nodes (A, B and the leaf nodes) we need to perform disk I/O.

The lowest leaf level contains the data entries. Additional records with rno < 15 and rno > 40 are added to the left of the first leaf node and to the right of the last leaf node respectively.

The root node is responsible for initiating the search. These searches are then directed to the correct leaf pages by the non-leaf pages, which contain node pointers separated by the search key values. The data entries (in a subtree) smaller than a key value k are pointed to by the left node pointer of k. Similarly, the data entries greater than k are pointed to by the right node pointer of k.

In order to find the students whose roll numbers lie between '19' and '24', the direction of the search is shown in the figure.

Figure: Searching for the Roll Numbers Between 19 and 24 (search starts at the root node and ends at a leaf node)

Consider finding all the students with roll numbers lying between 17 and 40. For this, we first direct the search to the node A and, after analyzing its contents, we then forward the search to B, followed by the leaf node which actually contains the required data entry. The other leaf nodes also contain data entries that fulfil our search criteria. For this, all the leaf pages must be designed using a doubly linked list. Thus, each subsequent leaf node can be fetched using the NEXT pointer on the previous leaf node.

Number of disk I/Os (occurring in a search) = Length of the path from the root to a leaf + The number of leaf pages with satisfying data entries

Q26. Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
Answer :
Closed Hashing
Closed hashing is a type of hash structure in which every data entry is stored in the array of buckets. Basically, closed hashing handles the issue of bucket overflow by using a concept called overflow chaining. In this, overflow buckets are chained together in the form of a linked list.

Before inserting a record into the bucket, the bucket must be scanned to know whether there is an unoccupied space for the record. If no space is found (i.e., if the bucket is full), then an overflow bucket is chained with the existing bucket and the data entry is inserted into that bucket. However, if this overflow bucket is full then another overflow bucket is chained with the first overflow bucket. In this way the overflow buckets are chained together, due to which the issue of bucket overflow is handled efficiently.

While searching for a data entry, the bucket including the overflow buckets is scanned in a sequential manner (i.e., in the order in which the buckets were inserted) using the hash function, computed based on the search key. The search process continues until either the search key or an unused array space is found. This unused space specifies that no such key exists in the hash table, and the search terminates.

Advantages
1. Closed hashing is the preferred hashing technique for database systems.
2. It is very easy to perform the deletion operation.
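
Overflow chaining, including the easy deletion noted in the advantages, can be sketched as a linked list of buckets. The bucket capacity of two entries, the hash function `key % num_buckets` and the class names are illustrative assumptions.

```python
class ChainedBucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []   # (key, value) pairs stored in this bucket
        self.next = None    # link to the next overflow bucket, if any


class ClosedHashTable:
    def __init__(self, num_buckets=3, capacity=2):
        self.capacity = capacity
        self.table = [ChainedBucket(capacity) for _ in range(num_buckets)]

    def _head(self, key):
        return self.table[key % len(self.table)]

    def insert(self, key, value):
        bucket = self._head(key)
        # Walk the chain until a bucket with unoccupied space is found.
        while len(bucket.entries) == bucket.capacity:
            if bucket.next is None:
                bucket.next = ChainedBucket(self.capacity)  # chain a new overflow bucket
            bucket = bucket.next
        bucket.entries.append((key, value))

    def search(self, key):
        bucket = self._head(key)
        while bucket is not None:
            for k, value in bucket.entries:
                if k == key:
                    return value
            bucket = bucket.next
        return None

    def delete(self, key):
        bucket = self._head(key)
        while bucket is not None:
            for i, (k, _) in enumerate(bucket.entries):
                if k == key:
                    del bucket.entries[i]  # frees space for a later insertion
                    return True
            bucket = bucket.next
        return False

    def chain_length(self, key):
        length, bucket = 0, self._head(key)
        while bucket is not None:
            length, bucket = length + 1, bucket.next
        return length
```

Deletion only removes the entry from its bucket, and the freed space is reused by the next insertion that hashes to the same chain.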
Disadvantages
... it involves the cost of finding the data record and the data entry at H + 2D + 4RC and writing back the changed index and file at 2D. Hence, the total cost is,
H + 2D + 4RC + 2D = H + 4D + 4RC

Equality Selection Search
If the selection is not based on the search key value, then the entire file needs to be scanned. But, if the selection is based on the search key value, then the hash function must be used to locate the bucket with the corresponding data entry. By doing this, the page containing the record can be retrieved.

The total cost incurred in this search accounts to the following,
(i) The page containing the qualifying entries is identified at the cost H.
(ii) Retrieval of the page, assuming that it is the only page present in the bucket, occurs at D.
(iii) The cost of finding an entry after scanning half the page is 4RC.
(iv) Fetching a record from the file is D.
Hence, the total cost is,
H + D + 4RC + D = H + 2D + 4RC

In case of many matched records the cost is,
H + D + 4RC + One I/O for each record that qualifies

Range Selection Search
Hash indices do not support selections that are based on ranges. Thus, the cost incurred while scanning the file is,
B(D + RC)

Q30. Explain heap file with unclustered tree index.
Answer :
Heap File with Unclustered Tree Index
In a heap file with an unclustered tree index, the number of leaf pages in the index is based on the number of data entries per page. Consider the size of each data entry to be one tenth the size of a data record. If the index pages have 67% occupancy, then the number of leaf pages in the index will be,
0.1(1.5B) = 0.15B
In the same way, the number of data entries stored on each page with respect to its relative size and occupancy will be,
10(0.67R) = 6.7R

In the cost expressions below, F denotes the fan-out of the tree index.

Insertion
During insertion, a record is first inserted at 2D + C in the students heap file and the associated entry is added to the index. The correct leaf page can be found in D log_F(0.15B) + C log_2(6.7R), followed by the addition of a new entry and rewriting in D.

Deletion
The cost of deletion includes,
Cost of finding the record in the file + Cost of finding the entry in the index + Cost of rewriting the modified pages in the index and the file
Which corresponds to,
D log_F(0.15B) + C log_2(6.7R) + D + 2D
The record that is to be deleted is searched using the index, followed by modifying and rewriting the relevant pages of the index and the file.

Equality Selection Search
If the selection of the page containing the record is not based on the search key value, then the entire file needs to be scanned. But, if the selection of the page is done based on the search key value, then by using the index the corresponding leaf page of the record is found. The cost involved in this operation is the sum of,
(i) The cost of finding the page containing the matched entry (or entries)
(ii) The cost of finding the first matched entry and
(iii) The cost of finding the first matched record.
Which is given as,
D log_F(0.15B) + C log_2(6.7R) + D
In case of several qualifying entries that are nonconsecutive, the cost is given by,
D log_F(0.15B) + C log_2(6.7R) + One I/O for each matched record

Range Selection Search
The range selection is matched with the composite key. Now, the record that satisfies the range selection is retrieved and the other subsequent data entries are sequentially located until the range selection is not satisfied by a data entry. To fetch each data entry, one input/output cost is needed. Thus, the cost required for fetching the data entries depends on the number of records that satisfy the range selection. For instance, if the selection condition is satisfied by 10% of the data records, then all these records are retrieved, sorted and then maintained.
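
The cost expressions for a heap file with an unclustered tree index can be evaluated numerically with a short sketch. The symbols follow the usual cost-model reading, which is assumed here and not defined in this passage: B data pages, R records per page, D as the time for one page I/O, C as the time to process one record, and F as the index fan-out. The function names and the sample values in the usage note are hypothetical.

```python
import math


def leaf_pages(B):
    """Number of index leaf pages: 0.1 * (1.5 * B)."""
    return 0.15 * B


def entries_per_leaf(R):
    """Data entries per leaf page: 10 * (0.67 * R)."""
    return 6.7 * R


def find_leaf_entry_cost(B, R, D, C, F):
    """D log_F(0.15B) + C log_2(6.7R): descend the tree, then search within the leaf."""
    return D * math.log(leaf_pages(B), F) + C * math.log2(entries_per_leaf(R))


def equality_search_cost(B, R, D, C, F):
    """Locate the data entry, then one more I/O to fetch the data record."""
    return find_leaf_entry_cost(B, R, D, C, F) + D


def delete_cost(B, R, D, C, F):
    """Equality search plus 2D to write back the modified index and file pages."""
    return equality_search_cost(B, R, D, C, F) + 2 * D
```

For example, with B = 1000, R = 100, D = 15, C = 1 and F = 150, the index has about 150 leaf pages, so the tree descent contributes one D and the equality search costs roughly 30 + log_2(670) time units.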
5.5 INDEXES AND PERFORMANCE TUNING
Q81. Explain indexes and performance tuning in detail.
Model Paper-I, Q10(b)
Answer
Indexes and Performance Tuning
The performance of the system depends greatly on the indexes we choose and can be explained in terms of the expected work load.
Work Load Impact
Data entries that qualify a particular selection criteria can be retrieved effectively by means of indexes. Two selection types are,
(i) Equality and
(ii) Range selection.
Tree-based indexing supports both the selection criteria as well as inserts, deletes and updates, whereas only equality selection is supported by hash-based indexing apart from insertion, deletion and updation.
Advantages of Using Tree-Structured Indexes
1. By using tree-structured indexes, insertion and deletion of data entries can be handled effectively.
2. It finds the correct leaf page faster than binary search in a sorted file.
Use of Aggregate Operations
Consider the given example.
SELECT S.sno, COUNT(*) FROM Student S GROUP BY S.sno
The purpose of this query is to count the number of students in each section sno. If a hash or B+ tree index exists on sno, then for each value of sno we count the number of the index's data entries. The type of index doesn't matter because of the absence of the retrieval operation.
Q19. What is a composite search key? What are the pros and cons of composite search keys?
Answer:
Composite Search Keys
A search key that contains many fields is called a composite/concatenated search key.
Example
Consider a record with the fields name, rno and mrks which is sorted by name. The composite key index with the various keys is shown in the figure.
Figure: Composite key indexes <rno, mrks> and <mrks, rno>
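The effect of field order in a composite search key can be sketched in a few lines. The records, the helper names `build_index` and `prefix_range`, and the integer field values below are all hypothetical and chosen only for illustration; the sketch simply sorts (key, name) data entries on <rno, mrks> and on <mrks, rno>.

```python
from bisect import bisect_left

# Hypothetical sample records with the fields (name, rno, mrks), sorted by name.
records = [
    ("Anil", 12, 75),
    ("Bina", 11, 80),
    ("Chitra", 13, 75),
    ("Dev", 10, 80),
]
FIELD_POS = {"name": 0, "rno": 1, "mrks": 2}


def build_index(records, key_fields):
    """Return data entries (composite_key, name) sorted on the composite key."""
    entries = [(tuple(r[FIELD_POS[f]] for f in key_fields), r[0]) for r in records]
    return sorted(entries)


def prefix_range(index, first_value):
    """Entries whose FIRST key field equals first_value (integer fields assumed)."""
    keys = [key for key, _ in index]
    lo = bisect_left(keys, (first_value,))
    hi = bisect_left(keys, (first_value + 1,))
    return index[lo:hi]


index_rno_mrks = build_index(records, ("rno", "mrks"))
index_mrks_rno = build_index(records, ("mrks", "rno"))
same_marks = prefix_range(index_mrks_rno, 75)   # contiguous run in <mrks, rno>
```

In the <mrks, rno> index all entries with mrks = 75 sit next to each other, so an equality or range condition on mrks touches one contiguous run. In the <rno, mrks> index the same entries are scattered, which is why the order of fields in a composite key matters.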
binary search for such employees and then scanning the file from that point. But the cost of binary search is proportional to the number of pages fetched.
An alternative approach is to create a second file with one record per page in the original (data) file in the form of (Key, page) and then sort the file by the search key (gpa field in our example). Such an index file will have the following format.
Index Entry
k = Key
P = Pointer to a Page
Figure: Index Page Format
Each key acts as a separator for the contents of the pages pointed to by the pointers on its left and right. In an index page, the number of pointers is one more than the number of keys.
For this example query, a binary search of the index file is done to identify a page that contains the records starting with the search key (gpa) value, and then the pointer to that page is followed to get the first data record with that key value. Thereby the data pages are scanned to identify the records that satisfy the conditions.
Hence, the query uses a simple one-level indexing file. The structure of a one-level index file is shown below,
Figure: Structure of One-level Index (Index File, Data File)
Because an entry in the index file is smaller than a page in the data file, and there is only one entry in the index file per page of the data file, the index file will be smaller than the data file. Therefore, binary search on the index file will be much faster than binary search on the data file. However, the index file will still be large, and performing insertion or deletion on it will be very expensive. The large size of the index file brings on the idea of tree indexing, in which the one-level index structure is repeated, which leads to a tree structure.
Q34. Explain about ISAM along with its pros and cons.
OR
Explain deletion and insertion operations in ISAM with examples.
(May-19(R16), Q11(a) | May-17(R15), Q10(b))
Figure (1): Structure of ISAM (Index Pages, Primary Pages, Overflow Pages)
The ISAM index can be created in one of the following alternatives.
1. A data entry in index is actually a data record with the search key value k.
2. A data entry is a pair of (k, rid) where k is the search key value and rid is the id of a data record.
3. A data entry is a pair of (k, rid_list) where k is the search key value and rid_list is a list of data record ids.
If the second alternative is used then the data records are stored in a separate file and the pairs (k, rid) are stored in the leaf pages of the ISAM index. When the file is created, first all the leaf pages are sequentially allocated and then sorted using the search key value k. (If the file is created using choice (2) or (3) then first the data records are created and sorted and then the leaf pages of ISAM index are allocated). Then all the non-leaf pages are allocated. The additional overflow pages are added from an overflow area to a leaf page if there are more insertions. The figure (2) shows page allocation.
Figure (2): Allocation of Page in ISAM (Data Pages, Index Pages, Overflow Pages)
Operations of ISAM
An ISAM structure supports the basic operations i.e., insertion, deletion and search very well. An equality selection search is processed by starting the search at the root node and determining which of the subtrees to search by comparing the search values present in both the given record and the current node. If a match is found then the search is successful, otherwise the search has failed. A range query is also handled in the same way by determining the starting point in the data (or leaf) level and then retrieving the data pages sequentially.
For insertion and deletion, the appropriate page is determined similarly as for search and then the record is inserted or deleted. For insertion, if the leaf page is full then overflow pages are added.
Example
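As a rough illustration of the ISAM operations (search, insertion with overflow pages, and deletion), the following sketch keeps a static one-level index over sequentially allocated leaf pages. The class name `ISAM`, the `page_size` of two keys, and the use of plain Python lists as pages are assumptions made for the example; a real ISAM file has several index levels and stores pages on disk.

```python
from bisect import bisect_right


class ISAM:
    """Toy one-level ISAM: fixed primary leaf pages plus overflow chains."""

    def __init__(self, sorted_keys, page_size=2):
        self.page_size = page_size
        # Primary leaf pages are allocated sequentially from the sorted keys.
        self.leaves = [sorted_keys[i:i + page_size]
                       for i in range(0, len(sorted_keys), page_size)]
        # Static index: first key of every leaf page except the first one.
        self.separators = [page[0] for page in self.leaves[1:]]
        # One overflow chain per primary leaf page.
        self.overflow = [[] for _ in self.leaves]

    def _leaf_no(self, key):
        # Compare the key with the separators to choose the leaf page.
        return bisect_right(self.separators, key)

    def search(self, key):
        n = self._leaf_no(key)
        return key in self.leaves[n] or key in self.overflow[n]

    def insert(self, key):
        n = self._leaf_no(key)
        if len(self.leaves[n]) < self.page_size:
            self.leaves[n].append(key)
            self.leaves[n].sort()
        else:
            # Leaf page is full: the key goes to an overflow page.
            self.overflow[n].append(key)

    def delete(self, key):
        n = self._leaf_no(key)
        for page in (self.overflow[n], self.leaves[n]):
            if key in page:
                page.remove(key)      # the index entries are never changed
                return True
        return False


idx = ISAM([10, 20, 30, 40])
idx.insert(23)            # first leaf [10, 20] is full, so 23 goes to overflow
idx.delete(20)            # frees a slot in the first primary page
idx.insert(15)            # now fits in the primary page
```

The index (`separators`) is built once and never modified, which mirrors the static nature of ISAM: only leaf and overflow pages change after the file is created.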
It requires to release the space on deleting a record so as to prevent the database performance from getting ...
... according to the value of search key.
Structure of Non-leaf Node
In B+-tree, the non-leaf nodes form a multilevel sparse index on the leaf nodes. In the structure of non-leaf node, the pointers point to tree nodes. A non-leaf node can carry upto n pointers and must carry at least ⌈n/2⌉ pointers.
For a non-leaf node with m pointers,
(i) All search keys in the subtree to which P_1 points are less than K_1.
(ii) For 2 ≤ i ≤ m-1, all the search keys in the subtree to which P_i points have values greater than or equal to K_{i-1} and less than K_i.
(iii) All search keys in the subtree to which P_m points have values greater than or equal to K_{m-1}.
Q36. Explain all the operations on B+-trees by taking a sample example.
OR
Discuss insert, delete, search operations on B+ trees.
(Nov./Dec.-18(R16), Q11)
OR
Describe the insertion and deletion operations in B+ trees.
April-18(R16), Q10(b)
(Refer Only Topics: Insertion, Deletion)
OR
Explain deletion and insertion operation in B+ trees.
May-17(R15), Q11(b)
(Refer Only Topics: Insertion, Deletion)
OR
Explain the insertion and deletion operations in B+ trees with example.
(Refer Only Topics: Insertion, Deletion)
(Model Paper-II, Q11(b) | May-19(R16), Q10(b))
Answer
The operations on B+-tree include,
(i) Search
(ii) Insertion
(iii) Deletion.
(i) Search
Searching a B+-tree for a key value always begins at the root node. A search for a single key value always follows one path from the root node to a leaf node.
The search algorithm for B+-trees is as follows,
m := root node;
read m;
while (m is not leaf node) do
begin
    t := number of tree pointers in node m;
    if K ≤ m.K_1    /* m.K_i represents i-th search field value in node m */
    then m := m.P_1    /* m.P_i represents i-th tree pointer in node m */
    else if K > m.K_{t-1}
    then m := m.P_t
    else
    begin
        search node m for an entry i such that m.K_{i-1} < K ≤ m.K_i;
        m := m.P_i
    end;
    read m
end;
search block m for entry (K_i, Pr_i) with K = K_i;
if found
then read data file block with address Pr_i and retrieve a record
else record with search key value 'K' is not in the data file;
(ii) Insertion
Inserting a record in B+-tree initially assumes that the tree contains only a root node, which is also treated as a leaf node. When the level of the tree is incremented by 1, the tree is divided into leaf nodes and non-leaf (internal) nodes.
It is important to note that every search-key value appears in a leaf node. The reason is that all leaf nodes contain pointers to data records. However, some search key values exist at non-leaf nodes to guide the search for records in the index. Another point to be noted is that every search-key value that exists in a non-leaf node also exists as the right-most value in the leaf level of the subtree pointed at by the tree pointer to the left of the value.
To insert a record, if the leaf node is not full then insert the record in the correct position in the leaf node. Otherwise i.e., if the leaf node is full with P_leaf record pointers then split the node into two nodes. After splitting, the first j = ⌈(P_leaf + 1)/2⌉ records are placed in the original node and the rest of the records are moved to the new leaf node. In the non-leaf node of the parent, the j-th search value is inserted. In addition to this, an extra pointer to the new node is created and inserted in the parent node.
If the internal node is not full, then insert the record in the correct position in the internal node. Otherwise i.e., if the internal node is full with p tree pointers then split the node into two nodes. After splitting, the records upto tree pointer P_j are placed in the same node and the records from tree pointer P_{j+1} are moved to the new internal node.
The insertion algorithm is as follows,
m := root node;
read m;
set S to empty;    /* S represents stack */
while (m is not a leaf node) do
begin
    push address of node 'm' on stack S;
    t := number of tree pointers in node m;
    if K ≤ m.K_1
    then m := m.P_1
    else if K > m.K_{t-1}
    then m := m.P_t
    else
    begin
        search node m for record i such that m.K_{i-1} < K ≤ m.K_i;
        m := m.P_i
    end;
    read m
end;
search m for record (K_i, Pr_i) with K = K_i;    /* searching for leaf node */
if found
then record already exists in file
else    /* insert record in B+-tree */
begin
    create record (K, Pr);    /* Pr points to the new record */
    if leaf node m is not full
    then insert entry (K, Pr) in correct position in m
    else    /* leaf node is full */
    begin
        temp := m;    /* temp refers to over size leaf node */
        insert entry (K, Pr) in temp;
        new := a new empty leaf node;
        new.P_next := m.P_next;
        j := ⌈(P_leaf + 1)/2⌉;
        m := first j records in temp;
        m.P_next := new;
        new := remaining entries in temp;
        K := K_j;
        finished := false;
        repeat
            if stack S is empty
            then    /* no parent node */
            begin
                root := a new empty internal node;
                root := <m, K, new>;
                finished := true
            end
            else
            begin
                m := pop stack S;
                if internal node m is not full
                then
                begin
                    insert (K, new) in node m;
                    finished := true
                end
                else
                begin    /* internal node m is full */
                    temp := m;
                    insert (K, new) in temp;
                    new := a new empty internal node;
                    j := ⌊(p + 1)/2⌋;
                    m := records up to tree pointer P_j;
                    new := records from tree pointer P_{j+1};
                    K := K_j
                end
            end
        until finished
    end
end;
Inserting Key Value 6
Initially, the B+-tree contains one leaf node, L1, and it is empty. So, insert key value 6 in leaf node L1.
Inserting Key Value 9
To insert a key value 9, search for the location where the key is expected to occur. It is found to be leaf node L1. So, insert the key value 9 in leaf node L1.
Inserting Key Value 2
To insert a key value 2, search for the location where the key is expected to occur. It is found to be leaf node L1. But leaf node L1 is full i.e., it contains maximum two records. So, inserting a new record results in overflow. So, split the leaf node L1 into two nodes. The first node contains the first half of the keys and the second node contains the second half of the keys.
Now, a new root node is required to point to leaf nodes L1 and L2. So, create a root node. Each leaf node is at least half full.
Inserting Key Value 8
To insert a key value 8, search for the location where the key is expected to occur. It is found to be leaf node L2. So, insert the key value 8 in leaf node L2.
Inserting Key Value 4
To insert a key value 4, search for the location where the key is expected to occur. The location is found to be leaf node L1, which is full. So, split the node into two nodes.
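The search and insertion steps described for B+-trees can be tried out with a small in-memory sketch. It follows the same conventions as the algorithm (a key K with K ≤ K_1 goes to the first pointer, a full leaf keeps its first j = ⌈(P_leaf + 1)/2⌉ keys, a full internal node splits around j = ⌊(p + 1)/2⌋), but the class names, the use of Python lists instead of disk blocks, and the internal-node capacity of p = 3 pointers are assumptions made for illustration. Leaf chaining (P_next) and deletion are left out.

```python
from bisect import bisect_left
from math import ceil


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []      # tree pointers (internal nodes only)


class BPlusTree:
    def __init__(self, p_leaf=2, p=3):
        self.p_leaf = p_leaf    # maximum keys in a leaf node
        self.p = p              # maximum tree pointers in an internal node (assumed)
        self.root = Node(leaf=True)

    def _find_leaf(self, key):
        node, stack = self.root, []
        while not node.leaf:
            stack.append(node)  # remember parents in case of a split
            # Pointer i is followed when K_{i-1} < key <= K_i.
            node = node.children[bisect_left(node.keys, key)]
        return node, stack

    def search(self, key):
        leaf, _ = self._find_leaf(key)
        return key in leaf.keys

    def insert(self, key):
        node, stack = self._find_leaf(key)
        if key in node.keys:
            return False                         # record already exists
        node.keys.insert(bisect_left(node.keys, key), key)
        if len(node.keys) <= self.p_leaf:
            return True                          # leaf was not full
        # Leaf overflow: first j keys stay, the rest move to a new leaf.
        j = ceil((self.p_leaf + 1) / 2)
        new = Node(leaf=True)
        new.keys, node.keys = node.keys[j:], node.keys[:j]
        up = node.keys[-1]                       # K_j is copied to the parent
        while True:
            if not stack:                        # no parent: create a new root
                root = Node(leaf=False)
                root.keys, root.children = [up], [node, new]
                self.root = root
                return True
            parent = stack.pop()
            i = parent.children.index(node)
            parent.keys.insert(i, up)
            parent.children.insert(i + 1, new)
            if len(parent.children) <= self.p:
                return True                      # parent was not full
            # Internal overflow: K_j moves up, pointers are split around it.
            j = (self.p + 1) // 2
            new = Node(leaf=False)
            up = parent.keys[j - 1]
            new.keys, new.children = parent.keys[j:], parent.children[j:]
            parent.keys, parent.children = parent.keys[:j - 1], parent.children[:j]
            node = parent


def leaf_keys(node):
    """Collect the key lists of all leaves from left to right."""
    if node.leaf:
        return [node.keys]
    return [keys for child in node.children for keys in leaf_keys(child)]


tree = BPlusTree()
for k in (6, 9, 2, 8, 4):
    tree.insert(k)
```

Replaying the key values 6, 9, 2, 8 and 4 with this sketch leaves a root holding the keys 4 and 6 over three leaves [2, 4], [6] and [8, 9]. The first split (on inserting 2) and the second split (on inserting 4) happen at the same points as in the worked example.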
In the B+-tree file organization, maintaining good space utilization is more necessary because records usually take more space than keys and pointers. The space utilization can be improved in a B+ tree by including more sibling nodes in redistribution during splits and merges. This technique can be used for leaf nodes as well as non-leaf nodes.
During insertion, if a block does not contain enough memory space for a new entry, then, in order to create space for it, the system redistributes some of its records to one of the adjacent blocks. If the adjacent block is also full, then the system divides the block into two blocks and redistributes the records evenly among the adjacent block and the two blocks obtained by the split. Each block must hold at least ⌊2n/3⌋ records, where 'n' represents the number of records that the block can hold.
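The redistribution rule above can be illustrated with a short sketch. The function name `insert_with_sibling` and the representation of blocks as sorted Python lists are hypothetical, and the sketch assumes the adjacent block is the right-hand sibling, so every record in `block` (and the new record) sorts before the records in `sibling`.

```python
def insert_with_sibling(block, sibling, record, n):
    """Insert record into block; n is the number of records a block can hold.

    Returns the resulting list of blocks (two or three of them).
    """
    merged = sorted(block + [record])
    if len(merged) <= n:
        return [merged, sibling]            # enough room, nothing to redistribute
    pool = merged + sibling
    if len(sibling) < n:
        # Adjacent block has room: share the records between the two blocks.
        half = (len(pool) + 1) // 2
        return [pool[:half], pool[half:]]
    # Both blocks are full: 2n + 1 records are spread evenly over three blocks.
    base, extra = divmod(len(pool), 3)
    blocks, start = [], 0
    for i in range(3):
        size = base + (1 if i < extra else 0)
        blocks.append(pool[start:start + size])
        start += size
    return blocks


n = 4
result = insert_with_sibling([10, 20, 30, 40], [50, 60, 70, 80], 25, n)
```

With n = 4, two full blocks and one new record give three blocks of three records each, and every block holds at least ⌊2n/3⌋ = 2 records, as the text requires.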
EXERCISE QUESTIONS
1. Discuss about dynamic multi-level indexing.
2. Write the differences between variable-length and fixed-length file organization.
3. Implement open hashing technique with an example.
4. Distinguish between B-tree and B+-tree.
(a) Four
(b) Five
(c) Eight.