SingleLevelIndexing Examples

This document discusses the types of single-level indexes that can be used to access records in a data file: primary indexes, clustering indexes, and secondary indexes. A primary index is defined on a data file ordered on a key field, with one index entry per data block holding the key of the first record in that block. A clustering index is also defined on an ordered file, but the file is ordered on a non-key field. A secondary index is defined on a nonordering field; on a key field it has one index entry per record pointing to that record's location. A secondary index needs more storage space and a slightly longer search than a primary index, but it gives a much larger improvement in search time, because without it the file would have to be searched linearly.
NOTES VI

File Organizations and Indexing


7 Indexes as Access Paths
8 Types of Single-level Indexes
8.1 Primary Indexes
8.2 Clustering Indexes
8.3 Secondary Indexes
Indexes as Access Paths
1 Introduction
- Indexes: access structures
  o The index is called an access path on the field
  o Used to speed up the retrieval of records in response to certain search conditions
  o Indexing fields: used to construct the index
- A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file
  - The index is usually specified on one field of the file (although it could be specified on several fields)
  - One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value
  - The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller
  - A binary search on the index yields a pointer to the file record
Example: Given the following data file:
EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ...)
Suppose that:
record size R = 150 bytes
block size B = 512 bytes
r = 30000 records
Then, we get:
blocking factor Bfr = B div R = 512 div 150 = 3 records/block
number of file blocks b = (r/Bfr) = (30000/3) = 10000 blocks
For an index on the SSN field, assume the field size V_SSN = 9 bytes and the record pointer size P_R = 7 bytes. Then:
index entry size R_I = (V_SSN + P_R) = (9 + 7) = 16 bytes
index blocking factor Bfr_I = B div R_I = 512 div 16 = 32 entries/block
number of index blocks b_I = ⌈r/Bfr_I⌉ = ⌈30000/32⌉ = 938 blocks
binary search needs ⌈log2 b_I⌉ = ⌈log2 938⌉ = 10 block accesses
This is compared to an average linear search cost on the data file of:
(b/2) = 10000/2 = 5000 block accesses
If the file records are ordered, the binary search cost on the data file would be:
⌈log2 b⌉ = ⌈log2 10000⌉ = 14 block accesses
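The index-size arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names mirror the symbols used above, and the assumption (taken from the example) is one index entry per record.

```python
from math import ceil, log2

B = 512        # block size in bytes
R = 150        # record size in bytes
r = 30000      # number of records in the data file
V_SSN = 9      # size of the SSN field in bytes
P_R = 7        # size of a record pointer in bytes

bfr = B // R                      # records per data block
b = ceil(r / bfr)                 # number of data file blocks
R_I = V_SSN + P_R                 # size of one index entry
bfr_I = B // R_I                  # index entries per block
b_I = ceil(r / bfr_I)             # index blocks, one entry per record
index_search = ceil(log2(b_I))    # block accesses for a binary search on the index

print(bfr, b, R_I, bfr_I, b_I, index_search)   # 3 10000 16 32 938 10
```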
2 Types of Single-Level Indexes
2.1 Primary Index
- Defined on an ordered data file
- The data file is ordered on a key field
- Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor
  o A primary index is an ordered file whose records are of fixed length with two fields.
    The first field is of the same data type as the ordering key field (called the primary key) of the data file, and
    the second field is a pointer to a disk block (a block address).
    We refer to the two field values of index entry i as <K(i), P(i)>.
- Examples (refer to figure)
  o We use the NAME field as primary key, because that is the ordering key field of the file (assuming that each value of NAME is unique).
  o Each entry in the index has a NAME value and a pointer. The first three index entries are as follows:
    <K(1) = (Aaron, Ed), P(1) = address of block 1>
    <K(2) = (Adams, John), P(2) = address of block 2>
    <K(3) = (Alexander, Ed), P(3) = address of block 3>
- Indexes can also be characterized as dense or sparse.
  o A dense index has an index entry for every search key value (and hence every record) in the data file.
  o A sparse (or nondense) index has index entries for only some of the search values.
- A primary index is hence a nondense (sparse) index,
  o since it includes an entry for each disk block of the data file rather than for every search value (or every record).
- The index file for a primary index needs substantially fewer blocks than does the data file, for two reasons.
  o First, there are fewer index entries than there are records in the data file.
  o Second, each index entry is typically smaller in size than a data record because it has only two fields;
    consequently, more index entries than data records can fit in one block.
  A binary search on the index file hence requires fewer block accesses than a binary search on the data file.
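To make the dense/sparse distinction concrete, here is a small Python sketch. The toy file of three blocks with two integer keys each is an assumption made purely for illustration; block numbers stand in for block addresses.

```python
# Toy ordered data file: 3 blocks, 2 records per block, sorted by key.
blocks = [
    [10, 20],
    [30, 40],
    [50, 60],
]

# Dense index: one <key, block number> entry for every record.
dense = [(key, blk_no) for blk_no, blk in enumerate(blocks) for key in blk]

# Sparse (nondense) index: one entry per block, keyed on the block anchor
# (the first key in the block), as a primary index does.
sparse = [(blk[0], blk_no) for blk_no, blk in enumerate(blocks)]

print(len(dense), len(sparse))   # 6 3
```

The sparse index has as many entries as there are blocks, the dense one as many as there are records, which is the first of the two reasons given above.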
- A record whose primary key value is K lies in the block whose address is P(i),
  o where K(i) ≤ K < K(i + 1).
  o The i-th block in the data file contains all such records because of the physical ordering of the file records on the primary key field.
  o To retrieve a record, given the value K of its primary key field,
    we do a binary search on the index file to find the appropriate index entry i, and
    then retrieve the data file block whose address is P(i).
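The retrieval steps above can be sketched in Python as follows. This is a minimal illustration, not a disk-based implementation: the in-memory `blocks` list, the integer keys, and the helper name `find` are all hypothetical, and a list position plays the role of the block address P(i).

```python
from bisect import bisect_right

# Toy ordered data file: records are (key, payload) pairs, sorted on the key.
blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e"), (60, "f")],
]

# Primary index: the block anchors K(i); entry i points to block i.
anchors = [blk[0][0] for blk in blocks]


def find(key):
    """Return the payload for key, or None if no such record exists."""
    # Binary search for the largest i with K(i) <= key.
    i = bisect_right(anchors, key) - 1
    if i < 0:
        return None          # key is smaller than the first anchor
    # One data block access, then a search inside the block.
    for k, payload in blocks[i]:
        if k == key:
            return payload
    return None
```

For instance, `find(40)` selects the second block, because 30 ≤ 40 < 50, and returns `"d"`.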
- Example 1
- Suppose that we have an ordered file
  o with r = 30,000 records stored on a disk
  o with block size B = 1024 bytes.
  o File records are of fixed size and are unspanned,
    with record length R = 100 bytes.
  o The blocking factor for the file would be bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block.
  o The number of blocks needed for the file is
    b = ⌈r/bfr⌉ = ⌈30,000/10⌉ = 3000 blocks.
  o A binary search on the data file would need approximately
    ⌈log2 b⌉ = ⌈log2 3000⌉ = 12 block accesses.
  o Now suppose that
    the ordering key field of the file is V = 9 bytes long,
    a block pointer is P = 6 bytes long, and
    we have constructed a primary index for the file.
    The size of each index entry is R_i = (9 + 6) = 15 bytes,
    so the blocking factor for the index is
    bfr_i = ⌊B/R_i⌋ = ⌊1024/15⌋ = 68 entries per block.
    The total number of index entries r_i is equal to the number of blocks in the data file, which is 3000.
    The number of index blocks is hence
    b_i = ⌈r_i/bfr_i⌉ = ⌈3000/68⌉ = 45 blocks.
    To perform a binary search on the index file would need
    ⌈log2 b_i⌉ = ⌈log2 45⌉ = 6 block accesses.
    To search for a record using the index, we need one additional block access to the data file for a total of 6 + 1 = 7 block accesses,
    an improvement over binary search on the data file, which required 12 block accesses.
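The figures in this example can be reproduced with a short Python calculation. The variable names follow the symbols in the example; the only assumption is that floors and ceilings are applied as for unspanned, fixed-length records.

```python
from math import ceil, log2

r, B, R = 30000, 1024, 100   # records, block size, record size (bytes)
V, P = 9, 6                  # ordering key size, block pointer size (bytes)

bfr = B // R                          # 10 records per block
b = ceil(r / bfr)                     # 3000 data blocks
data_binary = ceil(log2(b))           # 12 accesses without an index

R_i = V + P                           # 15 bytes per index entry
bfr_i = B // R_i                      # 68 entries per index block
r_i = b                               # primary index: one entry per data block
b_i = ceil(r_i / bfr_i)               # 45 index blocks
index_binary = ceil(log2(b_i))        # 6 accesses on the index
total = index_binary + 1              # plus one data block access

print(data_binary, b_i, index_binary, total)   # 12 45 6 7
```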
- A major problem with a primary index (as with any ordered file) is insertion and deletion of records.
  o If we attempt to insert a record in its correct position in the data file,
    we have to not only move records to make space for the new record but also change some index entries,
    since moving records will change the anchor records of some blocks.
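A small Python sketch shows how one insertion can change block anchors. It assumes, for illustration only, a fully packed file with two records per block and a naive insertion that shifts every later record; the helper `anchors` is hypothetical.

```python
from bisect import insort


def anchors(keys, bfr):
    """Block anchors of an ordered file packed bfr records per block."""
    return [keys[i] for i in range(0, len(keys), bfr)]


keys = [10, 20, 30, 40, 50, 60]    # ordered file, fully packed
before = anchors(keys, 2)          # [10, 30, 50]

insort(keys, 25)                   # insert in key order; later records shift
after = anchors(keys, 2)           # [10, 25, 40, 60]

print(before, after)
```

Every anchor after the insertion point changes here, so the corresponding index entries would have to be rewritten as well.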
2.2 Clustering Index
- Defined on an ordered data file
- The data file is ordered on a non-key field
- A clustering index is also an ordered file with two fields;
  o the first field is of the same type as the clustering field of the data file, and
  o the second field is a block pointer.
- There is one entry in the clustering index for each distinct value of the clustering field, containing
  o the value and
  o a pointer to the first block in the data file that has a record with that value for its clustering field.
- Record insertion and deletion still cause problems, because the data records are physically ordered.
  o To alleviate the problem of insertion, it is common to reserve a whole block (or a cluster of contiguous blocks) for each value of the clustering field;
  o all records with that value are placed in the block (or block cluster).
    This makes insertion and deletion relatively straightforward.
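As a rough Python sketch of a clustering index, assume a toy file ordered on an integer non-key field, with block numbers standing in for block pointers. The names `blocks`, `clustering_index`, and `blocks_with` are illustrative only, and this version does not reserve a whole block per value.

```python
# Toy data file ordered on a non-key clustering field (values repeat).
blocks = [
    [1, 1, 1],
    [1, 2, 2],
    [3, 3, 3],
    [3, 4, 5],
]

# One entry per distinct value, pointing to the FIRST block holding it.
clustering_index = {}
for blk_no, blk in enumerate(blocks):
    for value in blk:
        clustering_index.setdefault(value, blk_no)   # keep the first block only

index_entries = sorted(clustering_index.items())     # ordered <value, block> file


def blocks_with(value):
    """Block numbers that hold records with this clustering value."""
    first = clustering_index.get(value)
    if first is None:
        return []
    result = []
    # The file is ordered, so matching blocks are contiguous from `first`.
    for blk_no in range(first, len(blocks)):
        if value not in blocks[blk_no]:
            break
        result.append(blk_no)
    return result
```

Here `blocks_with(3)` returns `[2, 3]`: the index gives the first block, and the scan continues forward until a block without the value is reached.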
2.3 Secondary Index
- Defined on an unordered data file
- Can be defined on
  o a key field (with a unique value) or
  o a non-key field with duplicate values
- A secondary index is also an ordered file with two fields.
  o The first field is of the same data type as some nonordering field of the data file that is an indexing field.
  o The second field is either a block pointer or a record pointer.
- There can be many secondary indexes (and hence, indexing fields) for the same file.
- We first consider a secondary index access structure on a key field that has a distinct value for every record.
  o Such a field is sometimes called a secondary key.
  o In this case there is one index entry for each record in the data file.
    The index entry contains
    the value of the secondary key for the record and
    a pointer either to the block in which the record is stored or to the record itself.
- Indexes can also be characterized as dense or sparse.
  o A dense index has an index entry for every search key value (and hence every record) in the data file.
  o A sparse (or nondense) index has index entries for only some of the search values.
- Therefore, a secondary index on a key field is dense.
- We refer to the two field values of index entry i as <K(i), P(i)>.
  o The entries are ordered by value of K(i), so we can perform a binary search.
  o Because the records of the data file are not physically ordered by values of the secondary key field,
    we cannot use block anchors.
    That is why an index entry is created for each record in the data file, rather than for each block, as in the case of a primary index.
- The following figure illustrates a secondary index in which the pointers P(i) in the index entries are block pointers, not record pointers.
  o Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.
- A secondary index usually needs more storage space and longer search time than does a primary index,
  o because of its larger number of entries.
  o However, the improvement in search time for an arbitrary record is much greater for a secondary index than for a primary index,
    since we would have to do a linear search on the data file if the secondary index did not exist.
  o For a primary index, we could still use a binary search on the main file, even if the index did not exist.
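A dense secondary index on a key field, with block pointers, might be sketched in Python like this. The unordered toy file, the integer keys, and the names `index`, `keys`, and `find` are assumptions made for illustration.

```python
from bisect import bisect_left

# Unordered data file: (secondary key, payload) records in arbitrary order.
blocks = [
    [(42, "a"), (7, "b")],
    [(19, "c"), (88, "d")],
    [(3, "e"), (61, "f")],
]

# Dense secondary index: one <K(i), P(i)> entry per record, ordered by K(i).
# P(i) is a block pointer (here, a block number), not a record pointer.
index = sorted((key, blk_no) for blk_no, blk in enumerate(blocks) for key, _ in blk)
keys = [k for k, _ in index]


def find(key):
    """Return the payload for key, or None if it is not in the file."""
    i = bisect_left(keys, key)             # binary search on the index
    if i == len(keys) or keys[i] != key:
        return None
    blk_no = index[i][1]                   # follow the block pointer
    for k, payload in blocks[blk_no]:      # search within the fetched block
        if k == key:
            return payload
    return None
```

Without the index, finding key 88 would require scanning the blocks one by one, because the file is not ordered on that field.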
- Example: the improvement in number of blocks accessed.
  o Consider the file of Example 1,
    with r = 30,000 fixed-length records of size R = 100 bytes stored on a disk
    with block size B = 1024 bytes.
    The file has b = 3000 blocks, as calculated in Example 1.
    To do a linear search on the file, we would require b/2 = 3000/2 = 1500 block accesses on the average.
  o Suppose that we construct a secondary index on a nonordering key field of the file that is V = 9 bytes long.
    As in Example 1, a block pointer is P = 6 bytes long,
    so each index entry is R_i = (9 + 6) = 15 bytes, and
    the blocking factor for the index is bfr_i = ⌊B/R_i⌋ = ⌊1024/15⌋ = 68 entries per block.
    In a dense secondary index such as this,
    o the total number of index entries r_i is equal to the number of records in the data file, which is 30,000.
    o The number of blocks needed for the index is hence
      b_i = ⌈r_i/bfr_i⌉ = ⌈30,000/68⌉ = 442 blocks.
    o A binary search on this secondary index needs
      ⌈log2 b_i⌉ = ⌈log2 442⌉ = 9 block accesses.
    To search for a record using the index,
    we need an additional block access to the data file for a total of 9 + 1 = 10 block accesses,
    a vast improvement over the 1500 block accesses needed on the average for a linear search,
    but slightly worse than the seven block accesses required for the primary index.
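The same numbers can be recomputed in Python. As before, the variable names simply mirror the symbols in the example, and the index is assumed to be dense with one entry per record.

```python
from math import ceil, log2

r, B, R = 30000, 1024, 100   # records, block size, record size (bytes)
V, P = 9, 6                  # secondary key size, block pointer size (bytes)

b = ceil(r / (B // R))                # 3000 data blocks
linear_avg = b // 2                   # 1500 accesses on average without an index

bfr_i = B // (V + P)                  # 68 entries per index block
r_i = r                               # dense index: one entry per record
b_i = ceil(r_i / bfr_i)               # 442 index blocks
index_binary = ceil(log2(b_i))        # 9 accesses on the index
total = index_binary + 1              # plus one data block access

print(linear_avg, b_i, index_binary, total)   # 1500 442 9 10
```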
- Creating a secondary index on a nonkey field of a file.
- In this case, numerous records in the data file can have the same value for the indexing field.
  o There are several options for implementing such an index:
  o Option 1 is to include several index entries with the same K(i) value, one for each record. This would be a dense index.
  o Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer.
    We keep a list of pointers <P(i,1), ..., P(i,k)> in the index entry for K(i),
    one pointer to each block that contains a record whose indexing field value equals K(i).
    In either option 1 or option 2, the binary search algorithm on the index must be modified appropriately.
  o Option 3, which is more commonly used, is
    to keep the index entries themselves at a fixed length and have a single entry for each index field value, but
    to create an extra level of indirection to handle the multiple pointers.
    In this nondense scheme,
    the pointer P(i) in index entry <K(i), P(i)> points to a block of record pointers;
    each record pointer in that block points to one of the data file records with value K(i) for the indexing field.
    If some value K(i) occurs in too many records, so that their record pointers cannot fit in a single disk block, a cluster or linked list of blocks is used.
    This technique is illustrated in the following figure.
    Retrieval via the index requires one or more additional block accesses because of the extra level, but the algorithms for searching the index and (more importantly) for inserting new records in the data file are straightforward.
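Option 3 can be sketched in Python as follows. This is an illustration under simplifying assumptions: the data file is a small in-memory list, a record pointer is a (block number, slot) pair, and each "block of record pointers" is an unbounded Python list, so the overflow case with a cluster or linked list of blocks is not modeled. All names are hypothetical.

```python
from bisect import bisect_left

# Unordered data file: (record id, nonkey field value) records.
data_file = [
    [("r0", 3), ("r1", 5)],
    [("r2", 3), ("r3", 8)],
    [("r4", 5), ("r5", 3)],
]

# Extra level of indirection: for each distinct value K(i),
# a block of record pointers (block number, slot).
pointer_blocks = {}
for blk_no, blk in enumerate(data_file):
    for slot, (_, value) in enumerate(blk):
        pointer_blocks.setdefault(value, []).append((blk_no, slot))

# Index: one fixed-length entry per distinct value, ordered by K(i).
# P(i) is represented by the lookup pointer_blocks[K(i)].
index = sorted(pointer_blocks)


def find_all(value):
    """Return the ids of all records whose indexing field equals value."""
    i = bisect_left(index, value)               # binary search on the index
    if i == len(index) or index[i] != value:
        return []
    # Follow P(i) to the pointer block, then each record pointer.
    return [data_file[b][s][0] for b, s in pointer_blocks[value]]
```

With this layout the index has three entries for six records, which is why the scheme is nondense, and inserting a new record only appends one pointer to the relevant pointer block.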
Types of Indexes

             | Ordering field   | Nonordering field
Key field    | Primary index    | Secondary index (key)
Nonkey field | Clustering index | Secondary index (nonkey)
Properties of Index Types

Type of Index      | Number of (First-level) Index Entries | Dense or Nondense | Block Anchoring on the Data File
Primary            | Number of blocks in data file         | Nondense          | Yes
Clustering         | Number of distinct index field values | Nondense          | Yes/no (Note a)
Secondary (key)    | Number of records in data file        | Dense             | No
Secondary (nonkey) | Number of records (Note b) or number of distinct index field values (Note c) | Dense or Nondense | No
