G3 - R-Tree, R+-Tree
R-Tree
Group 3
Dương | Hiếu | Nam
Content
A. Introduction
Index | B-Tree | B+-Tree
● A clustering index is defined on an ordered data file whose records are
ordered on a non-key field.
A node in a B-tree
B-Tree
● In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer.
● In a B+-tree, data pointers are stored only at the leaf nodes of the tree,
so the structure of leaf nodes differs from the structure of internal nodes
B+-Tree
The Pnext pointer links all the leaf nodes together like a linked list, which
gives ordered access to the records stored on disk.
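To make the role of Pnext concrete, here is a minimal sketch of a range scan over chained leaves. The `Leaf` class and `range_scan` function are hypothetical names chosen for illustration, and the sketch assumes the scan already starts at the leftmost relevant leaf (a real B+-tree would first descend from the root to find it).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    """Hypothetical B+-tree leaf: sorted keys, matching data pointers, Pnext link."""
    keys: List[int]
    pointers: List[str]
    p_next: Optional["Leaf"] = None


def range_scan(start: Leaf, low: int, high: int) -> List[Tuple[int, str]]:
    """Collect (key, pointer) pairs with low <= key <= high by following Pnext."""
    result = []
    leaf = start
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.pointers):
            if key > high:
                return result  # keys are ordered across leaves, so stop early
            if key >= low:
                result.append((key, ptr))
        leaf = leaf.p_next
    return result


# Three leaves chained left to right.
leaf3 = Leaf([50, 60], ["r50", "r60"])
leaf2 = Leaf([30, 40], ["r30", "r40"], leaf3)
leaf1 = Leaf([10, 20], ["r10", "r20"], leaf2)
scan = range_scan(leaf1, 20, 50)
```

The scan never revisits internal nodes: once the first leaf is known, ordered access only needs the leaf chain.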
B+-Tree
An example of B+ Tree
B-Tree vs B+-Tree
B-Tree | B+-Tree
Each search key is stored only once | Search keys can be stored more than once
Data is stored in leaf and internal nodes | Data is stored only in leaf nodes
Leaf nodes are not linked together | Leaf nodes are linked together
B. Spatial Data Indexing
Introduction to Spatial Databases (1/4)
Spatial databases are optimized for storing and querying spatial data that
represents objects defined in a geometric space (k-dimensional).
E.g. Find all points in a query region R
Introduction to Spatial Databases (2b/4)
Spatial queries:
● Nearest neighbor queries: Find k-closest objects to a given location
E.g. Find the police car that is closest to the location of a crime
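As a baseline for what a nearest neighbor query returns, the following sketch answers it by brute force, without any index. The function name `k_nearest` and the sample car positions are made up for illustration; a spatial index exists precisely to avoid scanning every object like this.

```python
import heapq
import math


def k_nearest(objects, location, k):
    """Return the names of the k objects closest to location (Euclidean distance)."""
    return heapq.nsmallest(
        k, objects, key=lambda name: math.dist(objects[name], location)
    )


# Hypothetical police car positions and a crime location.
cars = {"car_a": (0, 0), "car_b": (3, 4), "car_c": (1, 1)}
crime = (1, 2)
closest = k_nearest(cars, crime, 1)
two_closest = k_nearest(cars, crime, 2)
```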
● Indexing structures for point data include Grid files, hB trees, KD trees,
Point Quad trees, and SR trees.
● Indexing structures that handle regions as well as point data include Region
Quad trees, SKD trees, and R-Trees.
R-Tree
The R-tree is a height-balanced tree that extends the B+-tree to k dimensions,
where k > 1.
if N is non-leaf
    foreach entry e in N    // e = <mbr, ptr>
        if e.mbr overlaps Q, search subtree identified by e.ptr
else    // N is leaf
    foreach entry e in N
        if e.mbr overlaps Q, add e.ptr to the answer list
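The search pseudocode above can be sketched in runnable form as follows. This is a minimal two-dimensional illustration, not a full R-tree: the `Node` class, the tuple layout `(xmin, ymin, xmax, ymax)` for rectangles, and the sample tree are all assumptions made for the example.

```python
from dataclasses import dataclass, field


def overlaps(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (mbr, ptr): ptr is a child Node in a non-leaf node
    # and a record identifier in a leaf node.
    entries: list = field(default_factory=list)


def search(node, query, answer=None):
    """Collect the pointers of all leaf entries whose MBR overlaps query."""
    if answer is None:
        answer = []
    for mbr, ptr in node.entries:
        if overlaps(mbr, query):
            if node.is_leaf:
                answer.append(ptr)
            else:
                search(ptr, query, answer)  # descend into the child subtree
    return answer


leaf_a = Node(True, [((0, 0, 2, 2), "A"), ((3, 3, 4, 4), "B")])
leaf_b = Node(True, [((6, 6, 8, 8), "C"), ((9, 0, 10, 1), "D")])
root = Node(False, [((0, 0, 4, 4), leaf_a), ((6, 0, 10, 8), leaf_b)])
hits = search(root, (1, 1, 7, 7))
```

Because sibling MBRs may overlap, a single query can descend into several subtrees, which is why search cost depends on how much the rectangles overlap.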
R-Tree \ Search (2/3)
[Figure: example R-tree over data rectangles A, B, C, D, E, F, G, H, M, N with bounding rectangles G1 to G6]
R-Tree \ Search (3/3)
Main points:
● A child MBR may be covered by more than one parent, but it is stored
under only one of them.
I2. [Add record to leaf node] If L has room for another entry, install E.
Otherwise invoke SplitNode to obtain L and LL containing E and all the old entries
of L.
I3. [Propagate changes upward] Invoke AdjustTree on L, also passing LL if a split was
performed.
I4. [Grow tree taller] If node split propagation caused the root to split, create a new
root whose children are the two resulting nodes.
R-Tree \ Insertion (2/3)
Algorithm ChooseLeaf: Select a leaf node in which to place a new index entry E.
CL3 [Choose subtree] If N is not a leaf, let F be the entry in N whose rectangle F.I needs
least enlargement to include E.I. Resolve ties by choosing the entry with the
rectangle of smallest area.
CL4 [Descend until a leaf is reached] Set N to be the child node pointed to by F.p and
repeat from CL2.
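The subtree choice in CL3 and the descent in CL4 can be sketched as below. This is an illustrative two-dimensional version under assumed names (`Node`, `enlargement`, `choose_leaf`) and an assumed rectangle layout `(xmin, ymin, xmax, ymax)`; it covers only the two steps listed above.

```python
from dataclasses import dataclass, field


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    """Extra area rect needs in order to include new_rect."""
    return area(union(rect, new_rect)) - area(rect)


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (rectangle, child or record id)


def choose_leaf(root, e_rect):
    node = root
    while not node.is_leaf:
        # Least enlargement first; ties broken by the smaller rectangle area.
        _, node = min(
            node.entries,
            key=lambda f: (enlargement(f[0], e_rect), area(f[0])),
        )
    return node


left, right = Node(True), Node(True)
root = Node(False, [((0, 0, 4, 4), left), ((6, 0, 10, 4), right)])
chosen = choose_leaf(root, (4.5, 1, 5, 2))

big, small = Node(True), Node(True)
tie_root = Node(False, [((0, 0, 10, 10), big), ((2, 2, 5, 5), small)])
tie_choice = choose_leaf(tie_root, (3, 3, 4, 4))
```

In the first tree the left entry needs 4 units of extra area and the right one 6, so the left leaf is chosen. In the second tree both entries already contain the new rectangle, so the smaller one wins the tie.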
R-Tree \ Insertion (3a/3)
Algorithm AdjustTree: Ascend from a leaf node L to the root, adjusting covering
rectangles and propagating node splits as necessary.
AT1 [Initialize] Set N=L. If L was split previously, set NN to be the resulting second node.
AT3 [Adjust covering rectangle in parent entry] Let P be the parent node of N,
and let E_N be N's entry in P.
Adjust E_N.I so that it tightly encloses all entry rectangles in N.
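The "tightly encloses" requirement in AT3 amounts to taking the minimum and maximum coordinate over all entry rectangles in N. A small sketch, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples and using the hypothetical helper name `tight_mbr`:

```python
def tight_mbr(rects):
    """Smallest rectangle that encloses every rectangle in rects."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


# Entry rectangles of a node N; the result is the new covering
# rectangle to store in N's entry in the parent P.
entries_in_n = [(1, 2, 3, 4), (0, 3, 2, 6)]
covering = tight_mbr(entries_in_n)
```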
R-Tree \ Insertion (3b/3)
AT4 [Propagate node split upward] If N has a partner NN resulting from an earlier split,
create a new entry E_NN with E_NN.p pointing to NN and E_NN.I enclosing all rectangles
in NN. Add E_NN to P if there is room. Otherwise, invoke SplitNode to produce P and
PP containing E_NN and all of P's old entries.
AT5 [Move up to next level] Set N=P and set NN=PP if a split occurred.
Repeat from AT2.
R-Tree \ Deletion (1/3)
Algorithm Delete: Remove index record E from an R-tree
D1 [Find node containing record] Invoke FindLeaf to locate the leaf L containing E.
Stop if the record was not found.
D4 [Shorten tree] If the root node has only one child after the tree has been adjusted,
make the child the new root.
R-Tree \ Deletion (2/3)
Algorithm FindLeaf: Given an R-tree whose root node is T, find the leaf node containing
the index entry E.
FL1. [Search subtrees] If T is not a leaf, check each entry F in T to determine if F.I overlaps
E.I.
For each such entry invoke FindLeaf on the tree whose root is pointed to by F.p
until E is found or all entries have been checked.
FL2. [Search leaf node for record] If T is a leaf, check each entry to see if it matches E.
If E is found return T.
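FL1 and FL2 can be sketched recursively as follows. The `Node` class, the rectangle layout `(xmin, ymin, xmax, ymax)`, and the choice to identify a record by its rectangle plus an id are assumptions made for this example.

```python
from dataclasses import dataclass, field


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    is_leaf: bool
    entries: list = field(default_factory=list)  # (rectangle, child or record id)


def find_leaf(t, e_rect, e_id):
    """Return the leaf that holds entry (e_rect, e_id), or None if absent."""
    if t.is_leaf:
        # FL2: look for a matching entry in this leaf.
        for rect, rid in t.entries:
            if rect == e_rect and rid == e_id:
                return t
        return None
    # FL1: try every subtree whose rectangle overlaps E's rectangle.
    for rect, child in t.entries:
        if overlaps(rect, e_rect):
            found = find_leaf(child, e_rect, e_id)
            if found is not None:
                return found
    return None


leaf1 = Node(True, [((0, 0, 1, 1), "A"), ((4, 4, 5, 5), "B")])
leaf2 = Node(True, [((3, 3, 4, 4), "C"), ((8, 8, 9, 9), "D")])
root = Node(False, [((0, 0, 5, 5), leaf1), ((3, 3, 9, 9), leaf2)])
located = find_leaf(root, (3, 3, 4, 4), "C")
```

Here both root entries overlap the rectangle of C, so the first subtree is searched without success before the second one returns the leaf.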
R-Tree \ Deletion (3a/3)
Algorithm CondenseTree: Given a leaf node L from which an entry has been deleted,
eliminate the node if it has too few entries and relocate its entries. Propagate node
elimination upward as necessary. Adjust all covering rectangles on the path to the root,
making them smaller if possible.
CT1 [Initialize] Set N=L. Set Q, the set of eliminated nodes, to be empty.
CT4 [Adjust covering rectangle] If N has not been eliminated, adjust E_N.I to tightly contain
all entries in N.
CT5 [Move up one level in tree] Set N=P and repeat from CT2.
Solutions:
- Exhaustive Algorithm
- A Quadratic-Cost Algorithm
- A Linear-Cost Algorithm
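Assuming the solutions listed above refer to node-splitting strategies as in Guttman's original R-tree paper, the quadratic-cost variant starts by picking the two entries that would waste the most area if placed in the same group. The sketch below shows only that seed-picking step; the function names and the rectangle layout `(xmin, ymin, xmax, ymax)` are choices made for this example.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Return the index pair whose joint bounding box wastes the most area."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    # Examining every pair is what makes this step quadratic in the node size.
    return max(combinations(range(len(rects)), 2), key=waste)


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (9, 9, 10, 10), (0, 1, 1, 2)]
seeds = pick_seeds(rects)
```

The remaining entries would then be assigned one at a time to whichever seed group they enlarge least, which is not shown here.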
Norbert Beckmann et al. 1990. The R*-tree: an efficient and robust access method
for points and rectangles. In Proceedings of the 1990 ACM SIGMOD international
conference on Management of data (SIGMOD '90). ACM, New York, NY, USA, 322-331.
R+-Tree \ Introduction (1/2)
Considering the performance of R-tree searching, the concepts of coverage and overlap
are important.
- Coverage of a level of an R-tree is defined as the total area of all the rectangles
associated with the nodes of that level.
- Overlap of a level of an R-tree is defined as the total area contained within two or
more nodes.
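Both measures can be computed directly for the rectangles of one level. The sketch below takes coverage as the sum of the rectangle areas and overlap as the area covered by at least two rectangles, following the definitions above; it uses a simple coordinate grid that is adequate for a handful of rectangles, and the function names and `(xmin, ymin, xmax, ymax)` layout are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def coverage(rects):
    """Total area of all rectangles on the level."""
    return sum(area(r) for r in rects)


def overlap(rects):
    """Area contained within two or more rectangles."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0.0
    # Every grid cell between consecutive coordinates is either fully
    # inside or fully outside each rectangle, so counting is exact.
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            count = sum(
                1 for r in rects
                if r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
            )
            if count >= 2:
                total += (x1 - x0) * (y1 - y0)
    return total


level = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 3, 5, 5)]
level_coverage = coverage(level)
level_overlap = overlap(level)
```

For this level the coverage is 36 and the overlap is 7: the region shared by all three rectangles is counted once, not once per pair.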
R+-Tree \ Introduction (2/2)
Obviously, efficient R-tree searching demands
that both overlap and coverage be minimized.
[5] Timos K. Sellis et al. 1987. The R+-Tree: A Dynamic Index for Multi-Dimensional
Objects. In Proceedings of the 13th International Conference on Very Large Data Bases
(VLDB '87), San Francisco, CA, USA, 507-518.
[6] Norbert Beckmann et al. 1990. The R*-tree: an efficient and robust access method
for points and rectangles. In Proceedings of the 1990 ACM SIGMOD international
conference on Management of data (SIGMOD '90). ACM, New York, NY, USA, 322-331.
Q&A