Indexed Structures


Democratic and Popular Republic of Algeria

Ministry of Higher Education and Scientific Research

Ecole supérieure en sciences et technologies de l’informatique et du numérique

Indexed sequential structures

Presented by : Dr. Daoudi Meroua

Academic year: 2024/2025


Files with Indexes

➔ Searching for a record in a sequential file structure is generally costly:

→ sequential search

→ binary search in a (very) large file

➔ Indexing is a data structure technique that allows efficient retrieval of file records based on the attributes on which the index has been built.

2024/2025
2ème année CP Pr Hidouci W.K. (http://hidouci.esi.dz) / SFSD / ESI 2
Files with Indexes
The attribute (or group of attributes) used to search for records is called a "search key."
For example, in a file of meteorological measurements whose records have the form < city, date, temperature >:
Search example:
→ Find the record(s) where city = 'DJELFA'
Result:
‘DJELFA’, ‘2015-06-23’, 21
‘DJELFA’, ‘2013-10-04’, 15
‘DJELFA’, ‘2015-06-22’, 20
‘DJELFA’, ‘2020-07-16’, 29
Files with Indexes
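An index is an ordered table in main memory (MC, for "mémoire centrale"), containing, among other things, pairs < key, address >.

[Figure: an index table with columns Key and adr, whose entries point to records in the data file]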

Files with Indexes
Example: Search for the record with the attribute value A1 = 54

→ Perform a binary search for 54 in the index table in main memory (MC):
result adr = <4,2>

→ LireDir(F, 4, buf) and retrieve the record buf.tab[2]
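
The lookup above can be sketched in Python as follows. This is only an illustration: the dictionary `data_file`, the helper `read_block` (standing in for LireDir) and all key values other than 54 at address <4,2> are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical data file: block number -> array of records.
data_file = {
    4: [(31, "rec-31"), (54, "rec-54")],
    7: [(12, "rec-12")],
}

# Index table sorted by key: (key, block number, offset in the block).
index = [(12, 7, 1), (31, 4, 1), (54, 4, 2)]

def read_block(block_no):
    """Stand-in for LireDir(F, block_no, buf): returns the block's records."""
    return data_file[block_no]

def lookup(key):
    keys = [entry[0] for entry in index]
    pos = bisect_left(keys, key)            # binary search in the index (MC)
    if pos == len(index) or index[pos][0] != key:
        return None                         # the key is not in the index
    _, block_no, offset = index[pos]
    buf = read_block(block_no)              # a single block transfer
    return buf[offset - 1]                  # offsets are 1-based, as in buf.tab[2]

print(lookup(54))
```

The search costs one binary search in memory plus one block read, instead of a scan of the data file.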

Files with Indexes

[Figure: the index table in main memory (MC), with the data file and the index file in secondary storage (MS)]

Files with Indexes
Key values can be unique, or the same value can occur in several records (multiple values).

Example of an index on a key attribute with multiple values:

[Figure: index table with columns Key and adr]

Files with Indexes
Different representations of index tables with multiple values:

1) One entry per key value.

2) Multiple entries per key value.

[Figures: index tables with columns Key and adr illustrating each representation]

Files with Indexes
The data file can be ordered by the key or not.

1) If the data file is ordered (by the key attribute)
⇒ Non-dense index (Clustered Index): it does not contain all the values of the key attribute.

In this example, each entry in the index table contains the largest key of a group of two consecutive blocks.
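
A search through such a non-dense index might look like the sketch below. The block contents, the 0-based block numbers and the names `sparse_index` and `search` are assumptions for illustration; only the rule "one entry per group of two blocks, holding the group's largest key" comes from the slide.

```python
from bisect import bisect_left

# Ordered data file: each block holds keys in ascending order (made-up values).
blocks = [[3, 8], [12, 15], [21, 27], [30, 42], [55, 61]]
GROUP = 2  # number of consecutive blocks covered by one index entry

# One entry per group: (largest key of the group, number of its first block).
sparse_index = [
    (blocks[min(i + GROUP, len(blocks)) - 1][-1], i)
    for i in range(0, len(blocks), GROUP)
]

def search(key):
    maxima = [m for m, _ in sparse_index]
    g = bisect_left(maxima, key)          # first group whose largest key >= key
    if g == len(sparse_index):
        return None                       # key is larger than every key in the file
    first = sparse_index[g][1]
    for b in range(first, min(first + GROUP, len(blocks))):
        if key in blocks[b]:              # scan the (at most GROUP) candidate blocks
            return (b, blocks[b].index(key))
    return None

print(search(27))
```

The index holds one entry per group rather than one per record, at the price of reading up to two blocks per search.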

Files with Indexes
The data file can be ordered by the key or not.

2) If the data file is not ordered (by the key attribute)
⇒ Dense index (Non-Clustered Index): it contains all the values of the key attribute.

Files with Indexes : basic operations

Record Search

Search in the index in main memory (MC), then access the data file.

● Exact query (key = value) → binary search for the exact value.
● Interval query (key ∈ [a, b]) → binary search for ‘a’ + sequential search
for the following values up to ‘b’.
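
As a rough illustration of the two query types, the sketch below runs them on a small in-memory index. The index contents and the function names `exact` and `interval` are hypothetical.

```python
from bisect import bisect_left

# Sorted index table: (key, (block number, offset)); the values are made up.
index = [(10, (1, 1)), (20, (3, 2)), (35, (1, 2)), (54, (4, 2)), (70, (2, 1))]
keys = [k for k, _ in index]

def exact(key):
    """Exact query: binary search for the key."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]
    return None

def interval(a, b):
    """Interval query: binary search for a, then sequential scan up to b."""
    pos = bisect_left(keys, a)            # first entry with key >= a
    result = []
    while pos < len(index) and index[pos][0] <= b:
        result.append(index[pos][1])
        pos += 1
    return result

print(exact(54), interval(15, 54))
```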

Insertion / Deletion of Records

Insert or delete the record in the data file and, if necessary, update the index in MC.

Case of an ordered file:
● Interval queries are more efficient.
● Deletion is more costly.
Files with Indexes : basic operations
Example: Insertion in a T~OF file with a dense index and unique key values.

Type Tbloc = Struct
        tab : tableau[ b ] de TypeEnreg
        NB : entier
     Fin
Type Tcouple = Struct
        cle : typeqlq ;
        numBlc , depl : entier
     Fin
Var F : FICHIER de Tbloc BUFFER buf ENTETE ( entier )
    Index : tableau [ MaxIndex ] de Tcouple
    NbE : entier // number of elements in the index table (== number of records in the file F)

Ins( e : TypeEnreg )
  Rech( e.cle , trouv , k ) // binary search in the index table; k = position of e.cle
  SI ( Non trouv )
    // Insertion at the end of the data file ...
    OUVRIR( F , « donnees.dat » , ‘A’ )
    i ← Entete( F , 1 ) // number of the last block
    LireDir( F , i , buf )
    SI ( buf.NB < b )
      // room left in the last block
      buf.NB++ ; j ← buf.NB ; buf.tab[ j ] ← e
      EcrireDir( F , i , buf )
    SINON
      // last block full: start a new block
      i++ ; j ← 1 ;
      buf.NB ← 1 ;
      buf.tab[ j ] ← e
      Aff_entete( F , 1 , i ) ; EcrireDir( F , i , buf )
    FSI
    FERMER( F )
    // Insertion in the index table: shift the entries k..NbE-1 one position to the right ...
    NbE++ ; m ← NbE
    TQ ( m > k )
      Index[ m ] ← Index[ m-1 ] ;
      m--
    FTQ
    Index[ k ] ← < e.cle , i , j > // cle, numBlc, depl
  FSI
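
For comparison, here is a minimal Python sketch of the same insertion, with the data file simulated by a list of blocks. It is not a translation of the course's file API: the names `blocks`, `index` and `insert`, the capacity `B = 2`, the 0-based numbering and the initial empty block are all assumptions made to keep the example self-contained.

```python
from bisect import bisect_left

B = 2            # block capacity (b in the pseudocode)
blocks = [[]]    # data file; the last list plays the role of block Entete(F, 1)
index = []       # dense index: sorted list of (key, block number, offset)

def insert(record):
    """Append the record to the data file and add its entry to the index."""
    key = record[0]
    k = bisect_left([entry[0] for entry in index], key)   # Rech(e.cle, trouv, k)
    if k < len(index) and index[k][0] == key:
        return False                  # unique keys: reject a key already present
    if len(blocks[-1]) == B:          # last block is full: start a new one
        blocks.append([])
    blocks[-1].append(record)
    i, j = len(blocks) - 1, len(blocks[-1]) - 1
    index.insert(k, (key, i, j))      # list.insert does the shifting of the TQ loop
    return True

for rec in [(50, "a"), (10, "b"), (30, "c")]:
    insert(rec)
print(index)
```

The data file stays in arrival order while the index stays sorted, which is why the index has to be dense here.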
Files with Indexes
Same example but with non-unique key values.

Type Tcouple = Struct
        cle : typeqlq ;
        tete : ptr(maillon)
     Fin
Type maillon = Struct
        val : Struct ( numBlc , depl : entier ) ;
        adr : ptr(maillon)
     Fin
Var Index : tableau [ MaxIndex ] de Tcouple

Ins( e : TypeEnreg )
  // Insertion at the end of the data file ...
  OUVRIR( F , « donnees.dat » , ‘A’ )
  i ← Entete( F , 1 )
  LireDir( F , i , buf )
  SI ( buf.NB < b )
    buf.NB++ ; j ← buf.NB ; buf.tab[ j ] ← e
    EcrireDir( F , i , buf )
  SINON
    i++ ; j ← 1 ; buf.NB ← 1 ; buf.tab[ j ] ← e
    Aff_entete( F , 1 , i ) ; EcrireDir( F , i , buf )
  FSI
  FERMER( F )
  // Insertion in the index table ...
  Rech( e.cle , trouv , k )
  SI ( trouv ) // add a link <i, j> at the head of the list Index[k].tete
    Allouer( p ) ;
    Affval( p , < i , j > ) ;
    Affadr( p , Index[ k ].tete ) ;
    Index[ k ].tete ← p
  SINON // insert a new entry <cle, <i, j>> in the index at position k
    NbE++ ;
    m ← NbE ;
    Allouer( p ) ;
    Affval( p , < i , j > ) ;
    Affadr( p , nil )
    TQ ( m > k ) Index[ m ] ← Index[ m-1 ] ; m-- FTQ
    Index[ k ] ← < e.cle , p > // cle = e.cle, tete = p
  FSI
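
The index update for non-unique keys can be sketched in Python as below. A plain Python list stands in for the linked list of `maillon` cells, and the function name `index_insert` and the sample addresses are assumptions for the example.

```python
from bisect import bisect_left

# Sorted list of (key, addresses); `addresses` replaces the chain headed by `tete`.
index = []

def index_insert(key, adr):
    """Record that a record with this key now lives at address adr = (block, offset)."""
    keys = [k for k, _ in index]
    pos = bisect_left(keys, key)                 # Rech(e.cle, trouv, k)
    if pos < len(index) and index[pos][0] == key:
        index[pos][1].insert(0, adr)             # existing key: add at the head of its list
    else:
        index.insert(pos, (key, [adr]))          # new key: new entry with a one-element list

index_insert("DJELFA", (1, 1))
index_insert("ALGER", (1, 2))
index_insert("DJELFA", (2, 1))
print(index)
```

This corresponds to the "one entry per key value" representation: the table grows only when a new key value appears.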
Files with Indexes
Management of an Overflow Area

[Figure: a non-dense index table over the data file, which is divided into a primary area and an overflow area]

Files with Indexes
Example: Index for LOF File
(no inter-block offsets and no overflow area)

Files with Indexes
Example: LOF file / Insertion
The insertion of c5 causes the overflow of block i:

● Add a new block → i’

● Split the content of i into two halves

● Update the index by inserting a new entry for block i’
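
The split can be sketched in Python under some simplifying assumptions that are not in the slides: blocks are held in a dictionary instead of a linked file, the index keeps one entry per block with that block's largest key, and the capacity `B = 4` and the name `insert` are chosen for the example.

```python
from bisect import bisect_left, insort

B = 4  # block capacity (hypothetical)

def insert(blocks, index, key):
    """blocks: block id -> sorted keys; index: sorted list of (largest key, block id)."""
    maxima = [m for m, _ in index]
    # Keys above every maximum go to the last block.
    pos = min(bisect_left(maxima, key), len(index) - 1)
    _, i = index[pos]
    insort(blocks[i], key)
    if len(blocks[i]) > B:                      # overflow of block i
        half = len(blocks[i]) // 2
        new_id = max(blocks) + 1                # add a new block i'
        blocks[new_id] = blocks[i][half:]       # the upper half moves to i'
        blocks[i] = blocks[i][:half]
        index[pos] = (blocks[i][-1], i)         # block i now has a smaller largest key
        index.insert(pos + 1, (blocks[new_id][-1], new_id))  # new entry for i'
    else:
        index[pos] = (blocks[i][-1], i)         # the largest key may have changed

blocks = {0: [1, 3, 5, 7], 1: [10, 12]}
index = [(7, 0), (12, 1)]
insert(blocks, index, 4)
print(blocks, index)
```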

Files with Indexes
Index in main memory in the form of a BST (binary search tree)

Type Tnoeud = Struct
        cle : typeqlq
        numBlc , depl : entier
        fg , fd : ptr(Tnoeud)
     Fin
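
A possible Python counterpart of this node type, with insertion and search, is sketched below. The field names follow the pseudocode; the functions `insert` and `search` and the sample keys are assumptions, and no rebalancing is attempted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    cle: int                       # key
    numBlc: int                    # block number of the record
    depl: int                      # offset of the record inside the block
    fg: Optional["Node"] = None    # left child (smaller keys)
    fd: Optional["Node"] = None    # right child (larger keys)

def insert(root, cle, numBlc, depl):
    """Insert an index entry and return the (possibly new) root."""
    if root is None:
        return Node(cle, numBlc, depl)
    if cle < root.cle:
        root.fg = insert(root.fg, cle, numBlc, depl)
    elif cle > root.cle:
        root.fd = insert(root.fd, cle, numBlc, depl)
    return root                    # equal keys are ignored (unique keys assumed)

def search(root, cle):
    """Return (numBlc, depl) for the key, or None if it is absent."""
    while root is not None and root.cle != cle:
        root = root.fg if cle < root.cle else root.fd
    return None if root is None else (root.numBlc, root.depl)

root = None
for entry in [(31, 4, 1), (12, 7, 1), (54, 4, 2)]:
    root = insert(root, *entry)
print(search(root, 54))
```

Compared with a sorted table, a tree avoids shifting entries on insertion, but an unbalanced tree can degrade to a linear search.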

Files with Indexes : Large Index

[Figure: a large index kept in the form of an ordered file with contiguous blocks; remaining figure labels: central memory, file in main memory (MC)]

Files with Indexes : Large Index
Multilevel index

Files with Indexes : Multi-Key Query

“Find all records where the value of X = vx AND the value of Y = vy AND …”, with X, Y, ... as ‘secondary keys’ (for each secondary key, there is a corresponding secondary index):

● Using the secondary index X, find the list Lx of primary keys associated
with the value vx.

● (Repeat the same action for each secondary key mentioned in the
query…)

● Perform the intersection of the primary key lists Lx, Ly, ... to keep only the primary keys that match every secondary key value mentioned in the query.

● Use the primary index to retrieve the records from the data file (by first
sorting the sequence of block numbers before performing the physical
transfers).
Files with Indexes : Multi-Key Query
If we are searching for all records where A2 = ‘eee’ and A3 = 870, the multi-key query algorithm will proceed as follows:
a. Search for ‘eee’ in the index IndA2 → result: LA2 = [32, 65, 70]
b. Search for 870 in the index IndA3 → result: LA3 = [32]
c. Intersection of LA2 and LA3 → result: final L = [32]
d. Search for 32 in IndA1 → result: block number <2>
e. LireDir(F, 2, buf) and retrieve the record “<32, eee, 870, …>”
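
The worked example can be replayed with the short Python sketch below. Dictionaries stand in for the three index tables; the lists for ‘eee’ and 870 and the block number of key 32 come from the example, while the block numbers for keys 65 and 70 and the function name `multi_key_query` are made up.

```python
# Secondary indexes: attribute value -> list of primary keys.
ind_a2 = {"eee": [32, 65, 70]}
ind_a3 = {870: [32]}
# Primary index: primary key -> block number (values for 65 and 70 are hypothetical).
ind_a1 = {32: 2, 65: 5, 70: 1}

def multi_key_query(criteria):
    """criteria: non-empty list of (secondary index, searched value) pairs."""
    key_lists = [set(idx.get(value, [])) for idx, value in criteria]
    keys = set.intersection(*key_lists)              # keys matching every criterion
    block_numbers = sorted({ind_a1[k] for k in keys})  # sort before the physical reads
    return sorted(keys), block_numbers

print(multi_key_query([(ind_a2, "eee"), (ind_a3, 870)]))
```

Sorting the block numbers lets the data file be read in a single forward pass, with each block transferred once.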

Files with Indexes : Multi-Key Query
Insertion of a record < c, vx, vy, ... >
1) Search for c in the primary index → ip: the position where this key should be inserted (binary search).
2) Insert the record into the data file → adr: the address where the record has been inserted.
3) Insert in the primary index, at position ip, the entry < c, adr > if it is a dense index, or update the entry at position ip if it is a non-dense index.
4) Search for the value vx in the secondary index X.
   ● If vx exists, add c to the list pointed to by vx.
   ● If vx does not exist, insert vx in the secondary index X. In this case, the new entry vx will point to a list formed by a single primary key (c).
5) Repeat step 4) for each remaining secondary key (vy, ...).
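
A compact Python sketch of these steps, restricted to a dense primary index and a single secondary key X, is given below. The containers `data`, `primary` and `secondary_x`, the function name `insert` and the sample records are assumptions for illustration.

```python
from bisect import bisect_left

data = []            # data file as a flat list; the address is the position in it
primary = []         # dense primary index: sorted list of (primary key, address)
secondary_x = {}     # secondary index on X: value -> list of primary keys

def insert(c, vx):
    keys = [k for k, _ in primary]
    ip = bisect_left(keys, c)                    # position of c in the primary index
    if ip < len(primary) and primary[ip][0] == c:
        raise KeyError(f"primary key {c} already exists")
    data.append((c, vx))                         # insert the record in the data file
    adr = len(data) - 1
    primary.insert(ip, (c, adr))                 # new entry < c, adr > at position ip
    secondary_x.setdefault(vx, []).append(c)     # create or extend the list of vx

insert(32, "eee")
insert(10, "bbb")
insert(65, "eee")
print(primary, secondary_x)
```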

Files with Indexes : Multi-Key Query
Deletion of a record < c, vx, vy, ... >

● To logically delete a record with primary key c, it is sufficient to set a deletion bit (or character) in the data file or in the primary index table for the entry c.

● To physically delete a record with primary key c, you must first physically remove the record from the data file, and then update the primary index table, either by deleting the entry related to c (in the case of a dense index) or by modifying the key and/or address of the representative of the group to which the deleted record belongs (in the case of a non-dense index).

In both types of deletion (logical or physical), it is not necessary to update the secondary indexes: they refer to records through their primary keys, so a stale primary key is detected when the lookup in the primary index fails or finds the deletion mark.

Files with Indexes : Bitmap Index
A bitmap index on an attribute A (taking m distinct values: v1, v2, … vm) consists of m binary strings (IndA_v1, IndA_v2, ... IndA_vm), each with N bits, where N is the number of records in the data file.
Each string IndA_vj is associated with the value vj of attribute A:
● If (IndA_vj[k] = 1), then in record number k, attribute A equals vj.
● If (IndA_vj[k] = 0), then in record number k, attribute A is different from vj.

[Figure: one row of N bits per value (the bit strings associated with v1, v2, …, vm), with columns indexed by record number]

Examples:
A = v2 in record number 2 and record number i of the data file.
A = v1 in records number 1, 5, 6, 8, … N-2 and N-1.
Files with Indexes : Bitmap Index
Bitmap indexes can be useful for attributes with low cardinality (e.g., < 20
distinct values).
The different bit strings can be loaded into main memory (MC) independently
of each other.
They are primarily used for multi-key queries on attributes with low cardinality.
Example: “Find records where A = v2 and B = w4.”

Cardinality of A = 3

Cardinality of B = 4

The result of the query is given by the binary operation: (IndA_v2 AND
IndB_w4)
→ Records number 7 and number i.
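
A small Python sketch of a bitmap index and of the AND query is given below. The two columns are made-up data (not the values of the slide's figure), and `build_bitmap` and `and_query` are hypothetical helper names; bit strings are represented as lists of 0/1 for readability.

```python
def build_bitmap(column):
    """Return one bit string (list of 0/1, one bit per record) per distinct value."""
    return {v: [1 if x == v else 0 for x in column] for v in set(column)}

# Values of attributes A and B for records number 1..6 (made-up data).
col_a = ["v1", "v2", "v3", "v1", "v2", "v1"]
col_b = ["w4", "w4", "w1", "w2", "w3", "w4"]

ind_a = build_bitmap(col_a)   # cardinality of A = 3 -> 3 bit strings
ind_b = build_bitmap(col_b)   # cardinality of B = 4 -> 4 bit strings

def and_query(bits1, bits2):
    """Record numbers (1-based) whose bit is set in both strings."""
    return [k for k, (x, y) in enumerate(zip(bits1, bits2), start=1) if x & y]

print(and_query(ind_a["v2"], ind_b["w4"]))
```

Only the two bit strings involved in the query need to be in main memory, and the intersection is a bitwise AND rather than a merge of key lists.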