Column-vs-Row databases
Contents
Column-store introduction
Column-store data model
Emulation of Column Store in Row store
Column store optimization
Experiment and Results
Conclusion
11/30/2024
Row store: (+) Easy to add/modify a record; (-) Might read in unnecessary data
Column store: (+) Only need to read in relevant data; (-) Tuple writes require multiple accesses
Join Indexes
T1 and T2 are projections on T
M segments in T1 and N segments in T2
Join Index from T1 to T2 is a table of the form:
(s: Segment ID in T2, k: Storage key in Segment s)
Each row in the join index corresponds to the row at the same position in T1
Join indexes are built so that T can be efficiently
reconstructed from T1 and T2
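
The join index described above can be illustrated with a minimal sketch. The segment layout (lists of lists), the zero-based storage keys, the sample values, and the function name `reconstruct` are all assumptions made for illustration, not the layout of any particular system.

```python
# Hypothetical projections of T, each stored as a list of segments.
# T1 holds names in one sort order, T2 holds prices in a different order.
t1_segments = [["alice", "bob"], ["carol"]]   # M = 2 segments
t2_segments = [[30], [10, 20]]                # N = 2 segments

# Join index from T1 to T2: one (s, k) pair per T1 row, in T1 order.
# s = segment ID in T2, k = storage key (here: zero-based offset) in segment s.
join_index = [(1, 0), (1, 1), (0, 0)]


def reconstruct(t1_segments, t2_segments, join_index):
    """Rebuild the rows of T by following the join index from T1 into T2."""
    t1_rows = [value for segment in t1_segments for value in segment]
    if len(t1_rows) != len(join_index):
        raise ValueError("join index must have one entry per T1 row")
    return [(value, t2_segments[s][k]) for value, (s, k) in zip(t1_rows, join_index)]


rows = reconstruct(t1_segments, t2_segments, join_index)
print(rows)
```

Each T1 row needs only one lookup into T2, which is why reconstruction through a join index can be cheap.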
Compression
Trades I/O for CPU
Increased column-store opportunities:
Higher data value locality in column stores
Techniques such as run length encoding are far more useful
Schemes
Null Suppression
Dictionary encoding
Run Length encoding
Bit-Vector encoding
Heavyweight schemes
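
Two of the schemes listed above, run length encoding and dictionary encoding, can be sketched in a few lines. This is an illustrative sketch only; the function names and the sample column are assumptions, and real column stores use more compact physical formats.

```python
from itertools import groupby


def rle_encode(column):
    """Run length encoding: collapse each run of equal values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


def dictionary_encode(column):
    """Dictionary encoding: replace each value with a small integer code."""
    dictionary = {}
    codes = []
    for value in column:
        # Assign the next free code the first time a value is seen.
        codes.append(dictionary.setdefault(value, len(dictionary)))
    return dictionary, codes


# A sorted column has long runs, which is where RLE pays off.
country = ["DE", "DE", "DE", "FR", "FR", "US"]
runs = rle_encode(country)
dictionary, codes = dictionary_encode(country)
print(runs)
print(dictionary, codes)
```

The sorted sample column shows the value locality the slide refers to: six values shrink to three runs.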
Vertical Partitioning
Process:
Full Vertical partitioning of each relation
Each column = 1 physical table
This can be achieved by adding an integer position column to every table
Adding an integer position is better than adding the primary key
Join on position for multi-column fetches
Problems:
“Position” column costs extra space and disk bandwidth
Header for every tuple – further space wastage
e.g. 24 byte overhead in PostgreSQL
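
The vertical partitioning approach above can be sketched with Python's built-in `sqlite3` module standing in for the row store. The table names `facts_custid` and `facts_price`, the `pos` column name, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table per column, each carrying an integer position column.
conn.executescript("""
CREATE TABLE facts_custid (pos INTEGER PRIMARY KEY, custID INTEGER);
CREATE TABLE facts_price  (pos INTEGER PRIMARY KEY, price  INTEGER);
""")

# Hypothetical (custID, price) rows of the original relation.
source_rows = [(101, 15), (102, 25), (103, 40)]
for pos, (cust_id, price) in enumerate(source_rows):
    conn.execute("INSERT INTO facts_custid VALUES (?, ?)", (pos, cust_id))
    conn.execute("INSERT INTO facts_price VALUES (?, ?)", (pos, price))

# A multi-column fetch becomes a join on position.
result = conn.execute("""
    SELECT c.custID
    FROM facts_price AS p
    JOIN facts_custid AS c ON c.pos = p.pos
    WHERE p.price > 20
    ORDER BY p.pos
""").fetchall()
print(result)
```

Every single-column table repeats the position value and the per-tuple header, which is the space overhead listed under Problems.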
Materialized Views
Process:
Create an 'optimal' set of MVs for the given query workload
Objective:
Provide just the required data
Avoid overheads
Performs better
Expected to perform better than the other two approaches
Problems:
Practical only in limited situations
Requires knowledge of query workloads in advance
Select F.custID
from Facts as F
where F.price>20
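
For the query above, the materialized view approach would store only the two columns the query touches. The sketch below uses Python's built-in `sqlite3` module. SQLite has no materialized views, so `CREATE TABLE ... AS SELECT` stands in for one. The view name `mv_cust_price`, the extra columns `prodID` and `qty`, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide fact table; prodID and qty are made-up extra columns.
conn.execute("CREATE TABLE Facts (custID INTEGER, prodID INTEGER, price INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO Facts VALUES (?, ?, ?, ?)",
    [(101, 1, 15, 2), (102, 2, 25, 1), (103, 3, 40, 5)],
)

# Stand-in for a materialized view: keep only the columns this query needs.
conn.execute("CREATE TABLE mv_cust_price AS SELECT custID, price FROM Facts")

# The same query, answered from the narrow view instead of the wide table.
mv_result = conn.execute("""
    SELECT F.custID
    FROM mv_cust_price AS F
    WHERE F.price > 20
    ORDER BY F.custID
""").fetchall()
print(mv_result)
```

The view helps only queries that need exactly these columns, which is why the workload has to be known in advance.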
Compression
Late Materialization
Advantages
Unnecessary tuple construction is avoided
Direct operation on compressed data
Cache performance is improved (PAX)
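
The difference between early and late materialization can be sketched on two in-memory columns. This is an illustrative sketch only; the function names, the column contents, and the threshold are assumptions, and a real engine would work on blocks and position lists or bitmaps instead of Python lists.

```python
# Two hypothetical columns of the same table, aligned by position.
cust_id_column = [101, 102, 103, 104]
price_column = [15, 25, 40, 5]


def early_materialization(cust_ids, prices, threshold):
    """Stitch every row into a tuple first, then filter the tuples."""
    tuples = list(zip(cust_ids, prices))  # constructs all tuples, needed or not
    return [cust_id for cust_id, price in tuples if price > threshold]


def late_materialization(cust_ids, prices, threshold):
    """Filter on the price column alone, then fetch custID only at matching positions."""
    positions = [pos for pos, price in enumerate(prices) if price > threshold]
    return [cust_ids[pos] for pos in positions]


early = early_materialization(cust_id_column, price_column, 20)
late = late_materialization(cust_id_column, price_column, 20)
print(early, late)
```

Both functions return the same answer. The late variant never builds tuples for rows that fail the predicate.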
Thank You!