The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger
Presented By : Alok Kumar Yadav 2009cs10177
Overview
Introduction
R-tree and Its Optimization
R-tree Variants
R*-tree
Experiments
Conclusions
Introduction
Spatial Access Methods (SAMs): approximate a complex spatial object by its minimum bounding rectangle (MBR)
Pros:
A complex object can be represented by a limited number of bytes
Preserves the most essential geometric properties, i.e. location and extension
Cons:
A lot of information about the object's exact shape is lost
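To make the trade-off concrete, here is a minimal Python sketch of the MBR approximation. The L-shaped outline and the helper names `mbr` and `polygon_area` are illustrative assumptions, not part of the paper. Four numbers replace the whole outline, and the "dead space" is the information that is lost.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of 2D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical L-shaped building outline (made-up coordinates).
building = [(1, 1), (5, 1), (5, 3), (3, 3), (3, 6), (1, 6)]
box = mbr(building)
box_area = (box[2] - box[0]) * (box[3] - box[1])
# Area covered by the MBR but not by the object itself.
dead_space = box_area - polygon_area(building)
```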
Introduction (Cont..)
R-tree:
B+-tree like structure
Popular access method for rectangles
Based on the heuristic optimization of the area of enclosing rectangles in each inner node
R*-tree:
Combined optimization: area, margin & overlap
Outperforms existing R-tree variants
Efficiently supports point and spatial data
Introduction (Cont..)
Given a city map, index all university buildings in an efficient structure for quick topological search.
Introduction (Cont..)
[Figure: MBRs of the city neighbourhoods, and the MBR of the city defining the overall search region]
R-tree
B+-tree like structure
Nodes: entries E = (cp, rectangle)
cp: child pointer; for leaves it points to a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
Max no. of entries in a node is M, min is m
R-tree (Cont..)
Structure
[Figure: R-tree structure with directory entries I(A), I(B), I(M) and leaf entries I(a), I(b), I(c), I(d) for the data rectangles a, b, c, d]
R-tree (Cont..)
B+-tree like structure
Nodes: entries E = (cp, rectangle)
cp: child pointer; for leaves it points to a record in the database
rectangle: MBR of all rectangles in the child node; for leaves it is the enclosing rectangle of the spatial object
Optimization criterion: minimization of the area of enclosing rectangles in each inner node
Allows overlapping of directory rectangles, hence cannot guarantee a single search path
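The following Python sketch illustrates why overlapping directory rectangles rule out a single search path. The `Node` class, the tuple layout `(xmin, ymin, xmax, ymax)` and the tiny hand-built tree are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    is_leaf: bool
    # Each entry is (rectangle, child): child is a Node in directory nodes
    # and a record id in leaves.
    entries: list = field(default_factory=list)


def intersects(a, b):
    """True if the closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query, hits, visited):
    """Collect record ids whose rectangles intersect query; log every node read."""
    visited.append(node)
    for rect, child in node.entries:
        if intersects(rect, query):
            if node.is_leaf:
                hits.append(child)
            else:
                search(child, query, hits, visited)


leaf_a = Node(True, [((0, 0, 2, 2), "a"), ((3, 3, 5, 5), "b")])
leaf_b = Node(True, [((4, 4, 6, 6), "c"), ((8, 8, 9, 9), "d")])
# The two directory rectangles overlap in the region [4,5] x [4,5].
root = Node(False, [((0, 0, 5, 5), leaf_a), ((4, 4, 9, 9), leaf_b)])

hits, visited = [], []
search(root, (4.5, 4.5, 4.5, 4.5), hits, visited)  # a point query inside the overlap
```

The query point lies in both directory rectangles, so both leaves have to be read even though each leaf contributes only one result.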
R-tree Variants
The R-tree is dynamic, hence all optimizations have to be applied during insertion
Insertion Algorithm
ChooseSubtree Algorithm: finds the most suitable subtree for the new entry
Split Algorithm: if ChooseSubtree ends in a node already filled with M entries, distributes the M+1 entries into two nodes in an appropriate manner
R-tree Variants
Original R-tree : Guttman
Greene's R-tree
Original R-tree: Guttman
Method of optimization: minimize directory rectangle area
Split algorithms: Exponential, Linear, Quadratic
Exponential is best, but its CPU cost is too high
The others are approximations
Quadratic outperforms Linear
Guttman's ChooseSubtree
CS1 Set N to be the root node
CS2 If N is a leaf, return N
else choose the entry in N whose rectangle needs least area enlargement to include the new data. Resolve ties by choosing the entry with the rectangle of smallest area
end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
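Steps CS1 to CS3 can be sketched in Python as follows. Nodes are plain dictionaries here and rectangles are `(xmin, ymin, xmax, ymax)` tuples; both are illustrative assumptions, and updating the directory rectangles on the way down is left out.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    """Area by which rect must grow to include new_rect."""
    return area(union(rect, new_rect)) - area(rect)


def choose_subtree(root, new_rect):
    node = root                                    # CS1
    while not node["leaf"]:                        # CS2
        # Least area enlargement first, smallest area as tie-breaker.
        _, child = min(
            node["entries"],
            key=lambda e: (enlargement(e[0], new_rect), area(e[0])),
        )
        node = child                               # CS3
    return node


leaf1 = {"leaf": True, "name": "leaf1", "entries": []}
leaf2 = {"leaf": True, "name": "leaf2", "entries": []}
leaf3 = {"leaf": True, "name": "leaf3", "entries": []}
root = {
    "leaf": False,
    "entries": [((0, 0, 4, 4), leaf1), ((5, 5, 6, 6), leaf2), ((0, 0, 3, 3), leaf3)],
}
chosen = choose_subtree(root, (1, 1, 2, 2))
```

Both `leaf1` and `leaf3` need no enlargement for the new rectangle, so the tie is resolved in favour of the smaller rectangle, `leaf3`.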
Guttman's Split Algorithm
Quadratic Split
[Divide a set of M+1 index entries into two groups]
QS1 Invoke PickSeeds to choose two entries to be the first entries of the two groups
QS2 Repeat DistributeEntry until all entries are distributed or one of the two groups has M-m+1 entries (so that the other group has m entries)
QS3 If entries remain, assign them to the other group so that it has the minimum number m required
Guttman's Split Algorithm (Cont..)
PickSeeds
PS1 For each pair of entries E1 and E2, compose a rectangle R including the E1 rectangle and the E2 rectangle. Calculate d = area(R) - area(E1 rectangle) - area(E2 rectangle)
PS2 Choose the pair with the largest d
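A short Python sketch of PickSeeds, assuming rectangles are `(xmin, ymin, xmax, ymax)` tuples and that the function returns the indices of the two seeds (both are choices made for this example):

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Return the index pair whose common bounding rectangle wastes the most area."""
    def waste(pair):
        i, j = pair
        # PS1: d = area(R) - area(E1 rectangle) - area(E2 rectangle)
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])

    # PS2: the pair with the largest d; all pairs are examined, hence "quadratic".
    return max(combinations(range(len(rects)), 2), key=waste)


rects = [(0, 0, 1, 1), (1, 0, 2, 1), (9, 9, 10, 10)]
seeds = pick_seeds(rects)
```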
DistributeEntry
DE1 Invoke PickNext to choose the next entry to be assigned
DE2 Add it to the group whose covering rectangle will have to be enlarged least to accommodate it. Resolve ties by adding the entry to the group with the smallest area, then to the one with the fewer entries, then to either
Guttman's Split Algorithm (Cont..)
PickNext
PN1 For each entry E not yet in a group, calculate d1 = the area increase required in the covering rectangle of Group 1 to include the E rectangle. Calculate d2 analogously for Group 2
PN2 Choose the entry with the maximum difference between d1 and d2
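Putting PickSeeds, PickNext and DistributeEntry together, a quadratic split might look like the sketch below. It works on indices into a list of `(xmin, ymin, xmax, ymax)` tuples and takes the minimum fill `m` as a parameter. The function names and data layout are assumptions for illustration, not a reference implementation.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)


def quadratic_split(rects, m):
    """Split M+1 rectangles into two index groups, each with at least m entries."""
    s1, s2 = pick_seeds(rects)                                  # QS1
    groups = [[s1], [s2]]
    boxes = [rects[s1], rects[s2]]
    remaining = [i for i in range(len(rects)) if i not in (s1, s2)]
    limit = len(rects) - m                                      # M - m + 1

    def growth(i, g):
        return area(union(boxes[g], rects[i])) - area(boxes[g])

    while remaining:                                            # QS2
        full = [g for g in (0, 1) if len(groups[g]) == limit]
        if full:                                                # QS3
            groups[1 - full[0]].extend(remaining)
            break
        # PickNext: the entry with the strongest preference for one group.
        nxt = max(remaining, key=lambda i: abs(growth(i, 0) - growth(i, 1)))
        # DistributeEntry: least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (growth(nxt, g), area(boxes[g]), len(groups[g])))
        groups[g].append(nxt)
        boxes[g] = union(boxes[g], rects[nxt])
        remaining.remove(nxt)
    return groups


rects = [(0, 0, 1, 1), (1, 1, 2, 2), (0, 1, 1, 2), (9, 9, 10, 10), (8, 8, 9, 9)]
group1, group2 = quadratic_split(rects, m=2)
```

In this run the first group reaches M-m+1 = 3 entries, so the last entry goes to the second group through QS3 without any geometric check.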
Problems
Small seeds: if d-1 of the d axes of a faraway rectangle are the same as those of one seed, a needle-like bounding rectangle may be formed
This may initiate a bad split
[Figure: rectangles R1, R2, R3]
Problems (Cont..)
Preference for the enlarged bounding rectangle: the algorithm prefers the MBR created by the previous assignments; since it was already enlarged, it requires less area enlargement to include the next entry
[Figure labels: G1, G2, X, Z]
If a group has reached M-m+1 entries, the rest are assigned to the other group without considering geometric properties
Greene's R-tree
ChooseSubtree is the same as Guttman's
Alternative split algorithm:
Invokes PickSeeds to find the two most distant rectangles
Picks an axis using these two rectangles, depending upon their separation distance
Sorts the remaining rectangles along the chosen axis
Distributes half of the entries to one group and the remaining to the other
Greene's R-tree (Cont..)
In some situations it cannot find the right axis, and a bad split may occur
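A rough Python sketch of this split follows. Several details are assumptions that the slides do not spell out: the seed separation is normalized by the node's extent along each axis, entries are sorted by their lower coordinate, and with an odd number of entries the middle one goes to the group that grows least.

```python
from functools import reduce
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)


def greene_split(rects):
    """Return (axis, group1, group2) with groups given as index lists."""
    s1, s2 = pick_seeds(rects)
    a, b = rects[s1], rects[s2]
    node_box = reduce(union, rects)

    def normalized_separation(axis):
        lo, hi = axis, axis + 2
        gap = max(a[lo], b[lo]) - min(a[hi], b[hi])   # negative if the seeds overlap
        extent = node_box[hi] - node_box[lo]
        return gap / extent if extent > 0 else 0.0

    axis = max((0, 1), key=normalized_separation)
    order = sorted(range(len(rects)), key=lambda i: rects[i][axis])
    half = len(order) // 2
    g1, g2 = order[:half], order[len(order) - half:]
    if len(order) % 2:                                # odd count: place the middle entry
        mid = order[half]
        b1 = reduce(union, [rects[i] for i in g1])
        b2 = reduce(union, [rects[i] for i in g2])
        grow1 = area(union(b1, rects[mid])) - area(b1)
        grow2 = area(union(b2, rects[mid])) - area(b2)
        (g1 if grow1 <= grow2 else g2).append(mid)
    return axis, g1, g2


rects = [(0, 0, 1, 1), (0, 2, 1, 3), (0, 3, 1, 4), (0, 6, 1, 7), (0, 8, 1, 9)]
axis, g1, g2 = greene_split(rects)
```

Because the axis is derived from only two rectangles, an unlucky pair of seeds fixes the wrong axis for the whole node.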
Inspiration For R*-tree
Minimize overlap between directory rectangles: decreases the number of paths to be traversed
Minimize margin of a directory rectangle: the rectangle will be shaped more quadratic
Optimize space utilization: the height of the tree will be low
R*-tree
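Structure is the same as the R-tree
For insertion, the earlier R-tree versions only consider area
The R*-tree considers area, margin & overlap in different combinations
Overlap of an entry $E_k$ in a node with entries $E_1, \dots, E_p$ is defined as
$\mathrm{overlap}(E_k) = \sum_{i=1,\, i \neq k}^{p} \mathrm{area}(E_k.\mathrm{rectangle} \cap E_i.\mathrm{rectangle})$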
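In the R*-tree paper, the overlap of an entry is the sum of the areas of its rectangle's intersections with the rectangles of all other entries in the same node. A minimal Python sketch of that definition, with rectangles as `(xmin, ymin, xmax, ymax)` tuples (an assumed layout):

```python
def intersection_area(a, b):
    """Area of the intersection of a and b, or 0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def overlap(k, rects):
    """Overlap of entry k with all other entries of the same node."""
    return sum(intersection_area(rects[k], r) for i, r in enumerate(rects) if i != k)


node = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 0, 5, 1)]
values = [overlap(k, node) for k in range(len(node))]
```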
R*-tree: ChooseSubtree
Similar to the original one; the only difference is that it minimizes overlap enlargement when the child pointers in N point to leaves
CS1 Set N to be the root node
CS2 If N is a leaf, return N
else
if the child pointers in N point to leaves [determine the minimum overlap cost], choose the entry in N whose rectangle needs least overlap enlargement to include the new data rectangle. Resolve ties by choosing the entry whose rectangle needs least area enlargement, then the entry with the rectangle of smallest area
if the child pointers in N do not point to leaves [determine the minimum area cost], choose the entry in N whose rectangle needs least area enlargement to include the new data rectangle. Resolve ties by choosing the entry with the rectangle of smallest area
end
CS3 Set N to be the child node pointed to by the child pointer of the chosen entry. Repeat from CS2
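The choice made inside one node in step CS2 can be sketched as below. The function picks an entry index from a list of `(xmin, ymin, xmax, ymax)` rectangles; the flag `children_are_leaves` and all names are assumptions made for this example.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def choose_entry(rects, new_rect, children_are_leaves):
    """Index of the entry to descend into, following the R*-tree criteria."""
    def area_enlargement(k):
        return area(union(rects[k], new_rect)) - area(rects[k])

    def overlap_enlargement(k):
        grown = union(rects[k], new_rect)
        others = [r for i, r in enumerate(rects) if i != k]
        before = sum(intersection_area(rects[k], r) for r in others)
        after = sum(intersection_area(grown, r) for r in others)
        return after - before

    if children_are_leaves:
        # Minimum overlap cost, then area enlargement, then area.
        key = lambda k: (overlap_enlargement(k), area_enlargement(k), area(rects[k]))
    else:
        # Minimum area cost, as in Guttman's ChooseSubtree.
        key = lambda k: (area_enlargement(k), area(rects[k]))
    return min(range(len(rects)), key=key)


entries = [(0, 0, 4, 1), (4, 0, 5, 10), (7, 0, 8, 4)]
new_rect = (5, 0, 6, 1)
leaf_level_choice = choose_entry(entries, new_rect, children_are_leaves=True)
directory_choice = choose_entry(entries, new_rect, children_are_leaves=False)
```

In this made-up node the two criteria disagree: entry 0 needs the least area enlargement but would grow into entry 1, while entry 2 can absorb the new rectangle without creating any overlap.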
R*-tree: Split Algorithm
Three goodness values:
area-value: area[bb(first group)] + area[bb(second group)]
margin-value: margin[bb(first group)] + margin[bb(second group)]
overlap-value: area[bb(first group) ∩ bb(second group)]
Depending on these values the final distribution is determined
*bb: bounding box
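The three goodness values can be computed as in the following Python sketch. It assumes 2D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples and takes the margin to be the sum of the edge lengths of a rectangle, i.e. its perimeter in 2D.

```python
from functools import reduce


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Sum of the edge lengths of a 2D rectangle (its perimeter)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def bb(group):
    """Bounding box of a non-empty group of rectangles."""
    return reduce(union, group)


def goodness(group1, group2):
    b1, b2 = bb(group1), bb(group2)
    return {
        "area": area(b1) + area(b2),
        "margin": margin(b1) + margin(b2),
        "overlap": intersection_area(b1, b2),
    }


values = goodness([(0, 0, 2, 2), (1, 1, 3, 3)], [(2, 2, 5, 4)])
```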
R*-tree: Split Algorithm (Cont..)
Method for a good split:
Along each axis, the entries are sorted first by the lower and then by the upper value of their rectangles
For each sort, M-2m+2 distributions of the M+1 entries into two groups are determined
In the k-th distribution (k = 1, ..., M-2m+2) the first group contains the first (m-1)+k entries and the second group contains the remaining entries
For each distribution the goodness values are measured
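As a small illustration of how the M-2m+2 distributions arise from one sorted sequence, consider this Python sketch (the generator name and the letter placeholders are made up for the example):

```python
def distributions(sorted_entries, m):
    """Yield every (first group, second group) pair for one sort of M+1 entries."""
    M = len(sorted_entries) - 1
    for k in range(1, M - 2 * m + 3):       # k = 1, ..., M-2m+2
        cut = (m - 1) + k                   # size of the first group
        yield sorted_entries[:cut], sorted_entries[cut:]


# M = 4, m = 2: five entries already sorted along one axis.
result = list(distributions(list("abcde"), m=2))
```

With M = 4 and m = 2 there are M-2m+2 = 2 distributions, and both groups always keep at least m entries.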
R*-tree: Split Algorithm (Cont..)
Split
S1 Invoke ChooseSplitAxis to determine the axis, perpendicular to which the split is performed
S2 Invoke ChooseSplitIndex to determine the best distribution into two groups along that axis
S3 Distribute the entries into two groups
ChooseSplitAxis
CSA1 For each axis
sort the entries by the lower then by the upper value of their rectangles and determine all distributions as described above. Compute S, the sum of all margin-values of the different distributions
end
CSA2 Choose the axis with the minimum S as split axis
ChooseSplitIndex
CSI1 Along the chosen split axis, choose the distribution with the minimum overlap-value. Resolve ties by choosing the distribution with minimum area-value
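The split procedure can be sketched in Python as below, for 2D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples. The sketch reads "sorted by the lower then by the upper value" as two separate sorts per axis, which is an interpretation rather than something the slides state explicitly.

```python
from functools import reduce


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def bb(group):
    return reduce(union, group)


def rstar_split(rects, m):
    """Return (split axis, first group, second group) for M+1 rectangles."""
    M = len(rects) - 1

    def sorts(axis):
        lo, hi = axis, axis + 2
        return (sorted(rects, key=lambda r: (r[lo], r[hi])),
                sorted(rects, key=lambda r: (r[hi], r[lo])))

    def candidates(axis):
        for order in sorts(axis):
            for k in range(1, M - 2 * m + 3):
                cut = (m - 1) + k
                yield order[:cut], order[cut:]

    # ChooseSplitAxis: minimum sum S of margin-values over all distributions.
    axis = min((0, 1), key=lambda ax: sum(
        margin(bb(g1)) + margin(bb(g2)) for g1, g2 in candidates(ax)))
    # ChooseSplitIndex: minimum overlap-value, ties broken by area-value.
    g1, g2 = min(candidates(axis), key=lambda d: (
        intersection_area(bb(d[0]), bb(d[1])), area(bb(d[0])) + area(bb(d[1]))))
    return axis, g1, g2


rects = [(0, 0, 1, 1), (1, 3, 2, 4), (2, 1, 3, 2), (6, 4, 7, 5), (7, 2, 8, 3)]
axis, group1, group2 = rstar_split(rects, m=2)
```

Unlike the quadratic split, every entry takes part in the decision: the axis is chosen from the margin sums of all distributions rather than from two seed rectangles.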
Reinsert
Dealing with under-filled nodes in the R-tree: remove the node's entries and reinsert them
This improves retrieval performance
Deletion and reinsertion tune the R-tree, but this tuning is very static
To achieve dynamic reorganization, the R*-tree uses forced reinsertion during the insertion routine
Forced Reinsert
If a node is overfilled, the R*-tree selects p entries based on the distance of their centers from the center of the node's MBR
Removes these p entries and adjusts the MBR
Reinserts them, which can prevent a split
If they are reinserted in the same node again, Split is called
The CPU cost of an insert is higher, but averaged over a large number of inserts the number of disk accesses increases only by about 4%, because fewer splits are needed and the structure is better
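The selection step of forced reinsertion can be sketched as follows. The sketch assumes 2D rectangles as `(xmin, ymin, xmax, ymax)` tuples, treats `p` as a free parameter, and picks the entries farthest from the center first. It only selects and removes entries; the actual reinsertion through the normal insert routine is omitted.

```python
import math
from functools import reduce


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def center(r):
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def pick_reinsert_entries(rects, p):
    """Return (entries to reinsert, entries kept, adjusted MBR of the node)."""
    node_center = center(reduce(union, rects))
    # Sort by decreasing distance between entry center and node center.
    by_distance = sorted(
        rects, key=lambda r: math.dist(center(r), node_center), reverse=True)
    removed, kept = by_distance[:p], by_distance[p:]
    return removed, kept, reduce(union, kept)


overfull = [(0, 0, 1, 1), (4, 4, 5, 5), (5, 5, 6, 6), (4, 5, 5, 6), (12, 4, 13, 5)]
removed, kept, new_mbr = pick_reinsert_entries(overfull, p=2)
```

Removing the two outliers shrinks the node's MBR from (0, 0, 13, 6) to (4, 4, 6, 6), which is the kind of more compact, more quadratic directory rectangle the method aims for.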
Forced Reinsert: Advantages
More restructuring, fewer splits
Storage utilization is improved
Outer rectangles are reinserted, so directory rectangles become more quadratic, which is a desired property
Experiments
Comparison between four R-tree variants:
R-tree with quadratic split algorithm (qua.Gut)
R-tree with linear split algorithm (lin.Gut)
Greene's variant of the R-tree (Greene)
R*-tree
Six data files, each containing about 100,000 2D rectangles
All experiments were measured in number of disk accesses
Experiments (Cont..)
Types of queries:
Rectangle intersection query: given a rectangle S, find all rectangles R in the file with R ∩ S ≠ ∅
Point query: given a point P, find all rectangles R in the file with P ∈ R
Rectangle enclosure query: given a rectangle S, find all rectangles R in the file with R ⊇ S
Spatial join: over two files f1 and f2, the set of all pairs of rectangles where the rectangle from f1 intersects the rectangle from f2
Insertion cost and storage utilization were also measured
Seven query files were created
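The semantics of the query types can be pinned down with simple predicates. The Python sketch below evaluates them by a linear scan over a small made-up file, without any index; treating rectangles as closed (boundaries included) is an assumption of this example.

```python
def intersects(r, s):
    """R and S share at least one point (closed rectangles)."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]


def contains_point(r, p):
    """The point P lies inside R."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def encloses(r, s):
    """R completely encloses S."""
    return r[0] <= s[0] and r[1] <= s[1] and s[2] <= r[2] and s[3] <= r[3]


def spatial_join(f1, f2):
    """All pairs (a, b) with a from f1 and b from f2 that intersect."""
    return [(a, b) for a in f1 for b in f2 if intersects(a, b)]


data = [(0, 0, 2, 2), (1, 1, 4, 4), (5, 5, 6, 6)]

intersection_hits = [r for r in data if intersects(r, (1.5, 1.5, 3, 3))]
point_hits = [r for r in data if contains_point(r, (1.5, 1.5))]
enclosure_hits = [r for r in data if encloses(r, (2, 2, 3, 3))]
join_pairs = spatial_join(data, [(3, 3, 5, 5)])
```

An access method such as the R*-tree answers the same queries while reading only a fraction of the pages, which is what the disk-access counts in the experiments measure.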
Results
The number of page accesses for queries on the R*-tree is normalized to 100%
Here is the relative performance of all four R-tree variants
Results (Cont..)
Unweighted average results over all distributions
Results (Cont..)
The R*-tree is very efficient as a point access method (PAM)
It even outperforms the very popular 2-level grid file
Conclusions
Since area, margin and overlap are all reduced, the R*-tree is very robust against ugly data distributions
Storage utilization is higher, insertion cost is low
Outperforms all existing R-tree variants
The R*-tree can efficiently be used as an access method in database systems organizing both multidimensional points and spatial data
References
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger. SIGMOD 1990
http://en.wikipedia.org/wiki/R-tree
Image Sources:
Maps & R-tree: http://electures.informatik.uni-freiburg.de/portal/download/3/8534/thm03%20-%20rTtee%20p1.ppt
Thumbs Up: http://www.ideachampions.com/weblogs/Peer%20to%20Peer%20Recognition.png
Gears: http://www.yesup.net/wordpress/wp-content/themes/yesupnet2/images/icon5.png
Tree: http://a01421.deviantart.com/art/tree-variants-304634600
Choose: http://www.transforming-technologies.com/blog/index.php/2011/06/16/how-to-choose-an-esd-mat/
Original: http://www.pixmac.com/picture/original+ink+stamp/000045168969
Split: http://www.clipartguide.com/_pages/0511-1001-2605-2460.html
Forced: http://thepoliticalcarnival.net/2011/05/
Advantages: http://www.webgraffiti.ca/advantages.html
Experiments: http://nilssmith.com/becoming-a-social-media-pastor-part-4-the-experiment/
Results: http://www.iconshock.com/icons/sigma/project_managment/results-icon.html
Conclusions: http://herbertjlkld.portrelay.com/conclusions-clip-art.html
Introduction: http://www.eng.fju.edu.tw/iacd_2010S/computer/introduction1.htm
Thanks
Q/A