
Re-use permitted when acknowledging the original © Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009)

VLDB 2009 Tutorial
Column-Oriented Database Systems
Part 1: Stavros Harizopoulos (HP Labs)
Part 2: Daniel Abadi (Yale)
Part 3: Peter Boncz (CWI)


What is a column-store?

row-store:    Date | Store | Product | Customer | Price   (whole records stored together)
column-store: Date | Store | Product | Customer | Price   (each column stored separately)

row-store
+ easy to add/modify a record
- might read in unnecessary data

column-store
+ only need to read in relevant data
- tuple writes require multiple accesses

=> column-stores are suitable for read-mostly, read-intensive, large data repositories
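
The trade-off between the two layouts can be sketched in a few lines of Python. This is only an illustration, not code from the tutorial: the three sample records are borrowed from the example table used later in this part, and the helper names (`to_columns`, `total_price_rows`, `total_price_columns`, `append_record`) are made up for this sketch.

```python
# Sample records with the five attributes from the slide (prices as plain integers).
ROWS = [
    ("01/01", "BOS", "Table", "Mesa", 20),
    ("01/01", "NYC", "Chair", "Lutz", 13),
    ("01/01", "BOS", "Bed", "Mudd", 79),
]
ATTRS = ("date", "store", "product", "customer", "price")


def to_columns(rows):
    """Turn a list of row tuples into one list per attribute (column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(ATTRS)}


def total_price_rows(rows):
    """Row layout: every record is visited in full, although only price is needed."""
    return sum(row[4] for row in rows)


def total_price_columns(columns):
    """Column layout: only the price column is touched."""
    return sum(columns["price"])


def append_record(columns, record):
    """Column layout: adding one record writes to every column."""
    for name, value in zip(ATTRS, record):
        columns[name].append(value)
```

Summing prices reads a single list in the column layout, while appending one record has to touch all five lists, which mirrors the "+" and "-" entries above.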

Are these two fundamentally different?

• The only fundamental difference is the storage layout
• However: we need to look at the big picture

[Figure: timeline from the ‘70s through the ‘80s, ‘90s and ‘00s to today. Different storage layouts are proposed; row-stores evolve into row-stores++; column-stores emerge with new applications and a new bottleneck in hardware; do the two converge?]

• How did we get here, and where are we heading? (Part 1)
• What are the column-specific optimizations? (Part 2)
• How do we improve CPU efficiency when operating on columns? (Part 3)

Outline
• Part 1: Basic concepts (Stavros)
  • Introduction to key features
  • From DSM to column-stores and performance tradeoffs
  • Column-store architecture overview
  • Will rows and columns ever converge?
• Part 2: Column-oriented execution (Daniel)
• Part 3: MonetDB/X100 and CPU efficiency (Peter)

Telco Data Warehousing example

• Typical DW installation: star schema with the usage fact table and the account, toll and source dimension tables
• Real-world example: “One Size Fits All? - Part 2: Benchmarking Results” Stonebraker et al. CIDR 2007

QUERY 2
SELECT account.account_number,
       sum (usage.toll_airtime),
       sum (usage.toll_price)
FROM usage, toll, source, account
WHERE usage.toll_id = toll.toll_id
  AND usage.source_id = source.source_id
  AND usage.account_id = account.account_id
  AND toll.type_ind in (‘AE’, ‘AA’)
  AND usage.toll_price > 0
  AND source.type != ‘CIBER’
  AND toll.rating_method = ‘IS’
  AND usage.invoice_date = 20051013
GROUP BY account.account_number

          Column-store   Row-store
Query 1   2.06           300
Query 2   2.20           300
Query 3   0.09           300
Query 4   5.24           300
Query 5   2.88           300

Why? Three main factors (next slides)

Telco example explained (1/3): read efficiency

row store: read pages containing entire rows
• one row = 212 columns!
• is this typical? (it depends)
• What about vertical partitioning? (it does not work with ad-hoc queries)

column store: read only columns needed
• in this example: 7 columns
• caveats:
  • “select * ” not any faster
  • clever disk prefetching
  • clever tuple reconstruction

Telco example explained (2/3): compression efficiency

• Columns compress better than rows
  • Typical row-store compression ratio 1 : 3
  • Column-store 1 : 10
• Why?
  • Rows contain values from different domains => more entropy, difficult to dense-pack
  • Columns exhibit significantly less entropy
  • Examples: Male, Female, Female, Female, Male
              1998, 1998, 1999, 1999, 1999, 2000
• Caveat: CPU cost (use lightweight compression)
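
To make the low-entropy argument concrete, here is a minimal sketch of two lightweight schemes applied to the example values above. Run-length and dictionary encoding are chosen here only as familiar examples; the slide does not say which schemes the measured systems used, and the function names are made up for this sketch.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, length in runs for _ in range(length)]


def dict_encode(values):
    """Dictionary-encode: map each distinct value to a small integer code."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in values]


# The two example columns from the slide.
YEARS = [1998, 1998, 1999, 1999, 1999, 2000]
GENDERS = ["Male", "Female", "Female", "Female", "Male"]
```

On the sorted `YEARS` column, six values collapse into three runs; the `GENDERS` column needs only a two-entry dictionary plus one small code per value. A row mixing dates, names and prices offers neither kind of regularity.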

Telco example explained (3/3): sorting & indexing efficiency

• Compression and dense-packing free up space
  • Use multiple overlapping column collections
  • Sorted columns compress better
  • Range queries are faster
  • Use sparse clustered indexes

What about heavily-indexed row-stores?
(works well for single column access, cross-column joins become increasingly expensive)
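
A sparse index over a sorted column can be sketched as follows. This is a hedged illustration rather than any particular system's design: the block size, the choice to index the first value of each block, and the function names are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # hypothetical number of values per storage block


def build_sparse_index(sorted_column, block_size=BLOCK_SIZE):
    """Keep only the first value of every block (one index entry per block)."""
    return [sorted_column[i] for i in range(0, len(sorted_column), block_size)]


def range_query(sorted_column, sparse_index, low, high, block_size=BLOCK_SIZE):
    """Return the positions whose value v satisfies low <= v <= high.

    The sparse index narrows the search to candidate blocks; only those
    blocks are scanned.
    """
    # The block just before the first block starting at >= low may still
    # contain values >= low, so the scan starts there.
    first_block = max(bisect_left(sparse_index, low) - 1, 0)
    # Blocks whose first value is > high cannot contain a match.
    last_block = bisect_right(sparse_index, high)  # exclusive
    start = first_block * block_size
    stop = min(last_block * block_size, len(sorted_column))
    return [pos for pos in range(start, stop) if low <= sorted_column[pos] <= high]
```

Because the column is sorted, the index needs only one entry per block, and a range predicate maps to one contiguous run of blocks.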

Additional opportunities for column-stores

• Block-tuple / vectorized processing
  • Easier to build block-tuple operators
    • Amortizes function-call cost, improves CPU cache performance
  • Easier to apply vectorized primitives
    • Software-based: bitwise operations
    • Hardware-based: SIMD (Part 3)
• Opportunities with compressed columns
  • Avoid decompression: operate directly on compressed data
  • Delay decompression (and tuple reconstruction)
    • Also known as: late materialization (more in Part 2)
• Exploit columnar storage in other DBMS components
  • Physical design (both static and dynamic). See: Database Cracking, from CWI
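
Operating directly on compressed data can be illustrated with a run-length encoded column. The sketch below is an assumption-laden toy (the `(value, run_length)` representation and all function names are invented for this example); it shows an aggregate, a filtered count, and a position list computed without expanding the runs.

```python
def rle_sum(runs):
    """Sum of the decoded column, computed as value * run_length per run."""
    return sum(value * length for value, length in runs)


def rle_count_where(runs, predicate):
    """Count decoded values satisfying predicate, testing each run only once."""
    return sum(length for value, length in runs if predicate(value))


def rle_positions_where(runs, predicate):
    """Return (start, stop) position ranges of qualifying runs.

    Handing positions rather than values to later operators is the idea
    behind delaying tuple reconstruction (late materialization).
    """
    ranges, start = [], 0
    for value, length in runs:
        if predicate(value):
            ranges.append((start, start + length))
        start += length
    return ranges


# Hypothetical RLE column: the years 1998, 1998, 1999, 1999, 1999, 2000.
YEAR_RUNS = [(1998, 2), (1999, 3), (2000, 1)]
```

Each function does work proportional to the number of runs, not to the number of original values.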

Effect on C-Store performance

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008.

[Figure: average time (sec) for SSBM queries on C-Store. Bars labeled: original C-store; column-oriented join algorithm; enable late materialization; enable compression & operate on compressed.]

Summary of column-store key features

• Storage layout
  • columnar storage, header/ID elimination (Part 1)
  • compression (Part 2, Part 3)
  • multiple sort orders
• Execution engine
  • column operators (Part 1, Part 2)
  • avoid decompression, late materialization (Part 2)
  • vectorized operations (Part 3)
• Design tools, optimizer


From DSM to Column-stores

70s - 1985:
• TOD: Time Oriented Database – Wiederhold et al. "A Modular, Self-Describing Clinical Databank System," Computers and Biomedical Research, 1975
• More 1970s: Transposed files, Lorie, Batory, Svensson.
• “An overview of cantor: a new system for data analysis” Karasalo, Svensson, SSDBM 1983

1985: DSM paper “A decomposition storage model” Copeland and Khoshafian. SIGMOD 1985.

1990s: Commercialization through SybaseIQ

Late 90s – 2000s: Focus on main-memory performance
• DSM “on steroids” [1997 – now]: CWI: MonetDB
• Hybrid DSM/NSM [2001 – 2004]: Wisconsin: PAX, Fractured Mirrors; Michigan: Data Morphing; CMU: Clotho

2005 – : Re-birth of read-optimized DSM as “column-store”
• MIT: C-Store; CWI: MonetDB/X100; 10+ startups

The original DSM paper

“A decomposition storage model” Copeland and Khoshafian. SIGMOD 1985.

• Proposed as an alternative to NSM
• 2 indexes: clustered on ID, non-clustered on value
• Speeds up queries projecting few columns
• Requires more storage

[Figure: one DSM column stored as (ID, value) pairs: (1, 0100), (2, 0962), (3, 1000), (4, ..), ..]
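
The DSM idea can be sketched as follows, under loud assumptions: sorted Python lists stand in for the clustered and non-clustered indexes, IDs start at 1, and every function name is invented for this illustration rather than taken from the paper.

```python
from bisect import bisect_left


def decompose(rows, attrs):
    """Split NSM rows into one binary relation of (ID, value) pairs per attribute.

    IDs start at 1 and each relation is kept in ID order (the clustered copy).
    """
    return {
        name: [(tid, row[i]) for tid, row in enumerate(rows, start=1)]
        for i, name in enumerate(attrs)
    }


def value_index(relation):
    """Second copy ordered by value: a stand-in for the non-clustered index."""
    return sorted((value, tid) for tid, value in relation)


def ids_with_value(index, value):
    """Look up all IDs holding value using binary search on the value-ordered copy."""
    pos = bisect_left(index, (value,))
    ids = []
    while pos < len(index) and index[pos][0] == value:
        ids.append(index[pos][1])
        pos += 1
    return ids


def project(relations, names, tid):
    """Reconstruct the requested attributes of one tuple by ID (IDs are 1-based)."""
    return tuple(relations[name][tid - 1][1] for name in names)
```

A query projecting one attribute reads one relation only, while the explicit IDs and the second, value-ordered copy account for the extra storage noted above.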

Memory wall and PAX

• 90s: Cache-conscious research
  from: “Cache Conscious Algorithms for Relational Query Processing.” Shatdal, Kant, Naughton. VLDB 1994.
  to: “Database Architecture Optimized for the New Bottleneck: Memory Access.” Boncz, Manegold, Kersten. VLDB 1999.
  and: “DBMSs on a modern processor: Where does time go?” Ailamaki, DeWitt, Hill, Wood. VLDB 1999.
• PAX: Partition Attributes Across
  • Retains NSM I/O pattern
  • Optimizes cache-to-RAM communication
  “Weaving Relations for Cache Performance.” Ailamaki, DeWitt, Hill, Skounakis, VLDB 2001.
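
A toy model of the two page layouts may help here. It is a sketch only: real pages also carry headers, offsets and variable-length data, all of which are ignored, and the function names are made up for this example.

```python
def nsm_page(records):
    """NSM page: records stored one after another, all attributes together."""
    return [value for record in records for value in record]


def pax_page(records, num_attrs):
    """PAX page: the same records, regrouped into one mini-page per attribute."""
    return [[record[i] for record in records] for i in range(num_attrs)]


def scan_attribute_nsm(page, num_attrs, attr):
    """Reading one attribute from NSM strides across every record."""
    return page[attr::num_attrs]


def scan_attribute_pax(page, attr):
    """Reading one attribute from PAX touches a single contiguous mini-page."""
    return page[attr]
```

Both pages hold exactly the same records, so the disk I/O pattern matches NSM; only the arrangement inside the page changes, which is where the cache benefit comes from.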

More hybrid NSM/DSM schemes

• Dynamic PAX: Data Morphing
  “Data morphing: an adaptive, cache-conscious storage technique.” Hankins, Patel, VLDB 2003.
• Clotho: custom layout using scatter-gather I/O
  “Clotho: Decoupling Memory Page Layout from Storage Organization.” Shao, Schindler, Schlosser, Ailamaki, and Ganger. VLDB 2004.
• Fractured mirrors
  • Smart mirroring with both NSM/DSM copies
  “A Case For Fractured Mirrors.” Ramamurthy, DeWitt, Su, VLDB 2002.

MonetDB (more in Part 3)

• Late 1990s, CWI: Boncz, Manegold, and Kersten
• Motivation:
  • Main-memory
  • Improve computational efficiency by avoiding expression interpreter
  • DSM with virtual IDs natural choice
  • Developed new query execution algebra
• Initial contributions:
  • Pointed out memory-wall in DBMSs
  • Cache-conscious projections and joins
  • …

2005: the (re)birth of column-stores

• New hardware and application realities
  • Faster CPUs, larger memories, disk bandwidth limit
  • Multi-terabyte Data Warehouses
• New approach: combine several techniques
  • Read-optimized, fast multi-column access, disk/CPU efficiency, light-weight compression
• C-store paper:
  • First comprehensive design description of a column-store
• MonetDB/X100
  • “proper” disk-based column store
• Explosion of new products

Performance tradeoffs: columns vs. rows

DSM traditionally was not favored by technology trends. How has this changed?

• Optimized DSM in “Fractured Mirrors,” 2002
• “Apples-to-apples” comparison: “Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06
• Follow-up study: “Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08
• Main-memory DSM vs. NSM: “DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08
• Flash-disks: a come-back for PAX?
  “Fast Scans and Joins Using Flash Drives” Shah, Harizopoulos, Wiener, Graefe. DaMoN’08
  “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09

Fractured mirrors: a closer look

“A Case For Fractured Mirrors” Ramamurthy, DeWitt, Su, VLDB 2002.

• Store DSM relations inside a B-tree
  • Leaf nodes contain values
  • Eliminate IDs, amortize header overhead
  • Custom implementation on Shore

[Figure: regular DSM rows (tuple header, TID, column data: 1 a1, 2 a2, 3 a3, 4 a4, 5 a5) next to a sparse B-tree on ID whose leaf nodes store the values a1 .. a5 without per-value IDs.]

Similar: “Efficient columnar storage in B-trees” Graefe. Sigmod Record 03/2007. (storage density comparable to column stores)

Fractured mirrors: performance

[Figure: time vs. columns projected (1 to 5) for row, regular DSM, and optimized DSM. Annotation: “From PAX paper: column?”]

• Chunk-based tuple merging
  • Read in segments of M pages
  • Merge segments in memory
  • Becomes CPU-bound after 5 pages
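
Chunk-based tuple merging can be sketched as below. The sketch assumes every column uses the same number of values per page and models pages as Python lists; `read_segment` and `chunked_merge` are hypothetical names, not functions from the Fractured Mirrors implementation.

```python
def read_segment(column_pages, first_page, m):
    """Return the values stored in M consecutive pages of one column."""
    values = []
    for page in column_pages[first_page:first_page + m]:
        values.extend(page)
    return values


def chunked_merge(columns, m):
    """Rebuild tuples from several paged columns, M pages at a time.

    columns: list of columns, each a list of pages, each page a list of values.
    All columns are assumed to use the same number of values per page.
    """
    num_pages = len(columns[0])
    for first_page in range(0, num_pages, m):
        # One larger sequential read per column instead of page-at-a-time access.
        segments = [read_segment(col, first_page, m) for col in columns]
        # Merge the in-memory segments position by position.
        yield from zip(*segments)
```

Larger values of `m` mean fewer switches between columns on disk, at the cost of more memory for the buffered segments.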

Column-scanner implementation

“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06

Example query: SELECT name, age WHERE age > 40

[Figure: row scanner vs. column scanner. The row scanner reads whole tuples (1 Joe 45, 2 Sue 37, …) with Direct I/O, prefetching ~100ms worth of data, applies the predicate(s) in a scan node S, and outputs (Joe, 45). The column scanner applies predicate #1 to the age column (45, 37, …) in a first scan node, passes positions (#POS) and values to a second scan node that reads the name column (Joe, Sue, …), and outputs (Joe, 45).]
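
The two scanners can be sketched as Python generators. This is a simplified reading of the figure, not the paper's code: positions are list indexes, I/O and prefetching are left out, and the function names are invented for this example.

```python
def scan_with_predicate(column, predicate):
    """First scan node: apply the predicate, emit (position, value) pairs."""
    for pos, value in enumerate(column):
        if predicate(value):
            yield pos, (value,)


def scan_at_positions(column, inner):
    """Later scan node: fetch this column's value at each incoming position."""
    for pos, partial in inner:
        yield pos, (column[pos],) + partial


def row_scan(rows, predicate_index, predicate, projection):
    """Row scanner: read whole tuples, filter, then project."""
    for row in rows:
        if predicate(row[predicate_index]):
            yield tuple(row[i] for i in projection)
```

For the example query, `scan_with_predicate` runs over the age column and `scan_at_positions` over the name column, so the name column is only read at positions that already passed the predicate.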

Scan performance
• Large prefetch hides disk seeks in columns
• Column-CPU efficiency with lower selectivity
• Row-CPU suffers from memory stalls
• Memory stalls disappear in narrow tuples
• Compression: similar to narrow
(not shown, details in the paper)

Even more results

“Read-Optimized Databases, In-Depth” Holloway, DeWitt, VLDB’08

• Same engine as before
• Additional findings

[Figure: time (s) vs. columns returned (1 to 10) for C-25%, C-10% and R-50%. Annotations: “narrow & compressed tuple: CPU-bound!” and “wide attributes: same as before”.]

• Non-selective queries, narrow tuples, favor well-compressed rows
• Materialized views are a win
• Scan times determine early materialized joins
(Column-joins are covered in part 2!)

Speedup of columns over rows

“Performance Tradeoffs in Read-Optimized Databases” Harizopoulos, Liang, Abadi, Madden, VLDB’06

[Figure: speedup of columns over rows as a function of cycles per disk byte (cpdb: 9, 18, 36, 144) and tuple width (8 to 36), with regions marked _, =, +, ++, +++.]

• Rows favored by narrow tuples and low cpdb
• Disk-bound workloads have higher cpdb

Varying prefetch size

[Figure: no competing disk traffic. Time (sec) vs. selected bytes per tuple (4 to 32) for Column 2, Column 8, Column 16, Column 48 (x 128KB) and Row (any prefetch size).]

• No prefetching hurts columns in single scans

Varying prefetch size

[Figure: with competing disk traffic. Time (sec) vs. selected bytes per tuple (4 to 28), two panels, for Column, 48; Row, 48; Column, 8; Row, 8.]

• No prefetching hurts columns in single scans
• Under competing traffic, columns outperform rows for any prefetch size

CPU Performance

“DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing” Boncz, Zukowski, Nes, DaMoN’08

• Benefit in on-the-fly conversion between NSM and DSM
• DSM: sequential access (block fits in L2), random in L1
• NSM: random access, SIMD for grouped Aggregation

New storage technology: Flash SSDs

• Performance characteristics
  • very fast random reads, slow random writes
  • fast sequential reads and writes
• Price per bit (capacity follows)
  • cheaper than RAM, order of magnitude more expensive than Disk
• Flash Translation Layer introduces unpredictability
  • avoid random writes!
• Form factors not ideal yet
  • SSD (=> small reads still suffer from SATA overhead/OS limitations)
  • PCI card (=> high price, limited expandability)
• Boost Sequential I/O in a simple package
  • Flash RAID: very tight bandwidth/cm3 packing (4GB/sec inside the box)
• Column Store Updates
  • useful for delta structures and logs
• Random I/O on flash fixes unclustered index access
  • still suboptimal if I/O block size > record size
  • therefore column stores profit much less than horizontal stores
• Random I/O useful to exploit secondary, tertiary table orderings
  • the larger the data, the deeper clustering one can exploit

Even faster column scans on flash SSDs

• New-generation SSDs (30K Read IOps, 3K Write IOps; 250MB/s Read BW, 200MB/s Write)
  • Very fast random reads, slower random writes
  • Fast sequential RW, comparable to HDD arrays
• No expensive seeks across columns
• FlashScan and FlashJoin: PAX on SSDs, inside Postgres
  “Query Processing Techniques for Solid State Drives” Tsirogiannis, Harizopoulos, Shah, Wiener, Graefe, SIGMOD’09

mini-pages with no qualified attributes are not accessed

Column-scan performance over time

[Figure: four scan-time plots: regular DSM (2001), optimized DSM (2002), column-store (2006), SSD Postgres/PAX (2009). Annotations: from 7x slower ..to 2x slower ..to 1.2x slower ..to same and 3x faster!]


Architecture of a column-store
storage layout
• read-optimized: dense-packed, compressed
• organize in extents, batch updates
• multiple sort orders
• sparse indexes

engine
• block-tuple operators
• new access methods
• optimized relational operators

system-level
• system-wide column support
• loading / updates
• scaling through multiple nodes
• transactions / redundancy

C-Store

“C-Store: A Column-Oriented DBMS.” Stonebraker et al. VLDB 2005.

• Compress columns
• No alignment
• Big disk blocks
• Only materialized views (perhaps many)
• Focus on Sorting not indexing
• Data ordered on anything, not just time
• Automatic physical DBMS design
• Optimize for grid computing
• Innovative redundancy
• Xacts – but no need for Mohan
• Column optimizer and executor

C-Store: only materialized views (MVs)

• Projection (MV) is some number of columns from a fact table
• Plus columns in a dimension table – with a 1-n join between Fact and Dimension table
• Stored in order of a storage key(s)
• Several may be stored!
• With a permutation, if necessary, to map between them
• Table (as the user specified it and sees it) is not stored!
• No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)

User view:
  EMP (name, age, salary, dept)
  Dept (dname, floor)

Possible set of MVs:
  MV-1 (name, dept, floor) in floor order
  MV-2 (salary, age) in age order
  MV-3 (dname, salary, name) in salary order
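
The projection idea can be sketched for the EMP/Dept example. The table contents below are invented for illustration, `build_projection` is a hypothetical helper, and the sketch leaves out compression and the permutations that map between projections.

```python
# Hypothetical contents for the EMP and Dept tables from the slide.
EMP = [
    # (name, age, salary, dept)
    ("Joe", 45, 70, "Sales"),
    ("Sue", 37, 90, "R&D"),
    ("Ann", 29, 60, "Sales"),
]
DEPT = {"Sales": 2, "R&D": 1}  # dname -> floor


def build_projection(rows, sort_key):
    """Sort the rows by the storage key, then store each attribute as its own column."""
    ordered = sorted(rows, key=sort_key)
    return [list(column) for column in zip(*ordered)]


# MV-1 (name, dept, floor) in floor order: joins EMP with Dept.
MV1 = build_projection(
    [(name, dept, DEPT[dept]) for name, _age, _salary, dept in EMP],
    sort_key=lambda row: row[2],
)

# MV-2 (salary, age) in age order.
MV2 = build_projection(
    [(salary, age) for _name, age, salary, _dept in EMP],
    sort_key=lambda row: row[1],
)
```

Note that the EMP table itself is never stored in this sketch: only the sorted projections are kept, each in its own order.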

Continuous Load and Query (Vertica)

Hybrid Storage Architecture

> Write Optimized Store (WOS), fed by Trickle Load
• Memory based
• Unsorted / Uncompressed
• Segmented
• Low latency / Small quick inserts

> TUPLE MOVER: Asynchronous Data Transfer (WOS to ROS)

> Read Optimized Store (ROS)
• On disk
• Sorted / Compressed
• Segmented
• Large data loaded direct
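
A toy model of the hybrid architecture, under stated assumptions: a single column, plain Python lists for both stores, no compression or segmentation, and a tuple mover that is called explicitly rather than running asynchronously. The class and method names are made up and are not Vertica's API.

```python
class HybridStore:
    """Toy single-column store with a write-optimized and a read-optimized part."""

    def __init__(self):
        self.wos = []  # in memory, unsorted, in arrival order
        self.ros = []  # stands in for the sorted on-disk store

    def insert(self, value):
        """Trickle load: a cheap append to the WOS."""
        self.wos.append(value)

    def move_tuples(self):
        """Tuple mover: merge the WOS contents into the sorted ROS, then empty the WOS."""
        self.ros = sorted(self.ros + self.wos)
        self.wos = []

    def scan(self):
        """Queries see the union of both stores."""
        return sorted(self.ros + self.wos)
```

Inserts stay cheap because they never reorder the sorted store; the sorting cost is paid in batches when the tuple mover runs.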

Loading Data (Vertica)

> INSERT, UPDATE, DELETE
> Bulk and Trickle Loads
• COPY
• COPY DIRECT
> User loads data into logical Tables
> Vertica loads atomically into storage

[Figure: Write-Optimized Store (WOS), in-memory => Automatic Tuple Mover => Read-Optimized Store (ROS), on-disk]

Applications for column-stores

• Data Warehousing
  • High end (clustering)
  • Mid end/Mass Market
  • Personal Analytics
• Data Mining
  • E.g. Proximity
• Google BigTable
• RDF
  • Semantic web data management
• Information retrieval
  • Terabyte TREC
• Scientific datasets
  • SciDB initiative
  • SLOAN Digital Sky Survey on MonetDB

List of column-store systems

• Cantor (history)
• Sybase IQ
• SenSage (former Addamark Technologies)
• Kdb
• 1010data
• MonetDB
• C-Store/Vertica
• X100/VectorWise
• KickFire
• SAP Business Accelerator
• Infobright
• ParAccel
• Exasol


Simulate a Column-Store inside a Row-Store

Date   Store  Product  Customer  Price
01/01  BOS    Table    Mesa      $20
01/01  NYC    Chair    Lutz      $13
01/01  BOS    Bed      Mudd      $79

Option A: Vertical Partitioning

Date         Store        Product      Customer     Price
TID  Value   TID  Value   TID  Value   TID  Value   TID  Value
1    01/01   1    BOS     1    Table   1    Mesa    1    $20
2    01/01   2    NYC     2    Chair   2    Lutz    2    $13
3    01/01   3    BOS     3    Bed     3    Mudd    3    $79

Option B: Index Every Column (Date Index, Store Index, …)
Simulate a Column-Store inside a Row-Store

Same table and options as on the previous slide, with one change to Option A (Vertical Partitioning):

Can explicitly run-length encode date

Date
Value  StartPos  Length
01/01  1         3

The Store, Product, Customer and Price partitions keep their (TID, Value) rows.

“Teaching an Old Elephant New Tricks.” Bruno, CIDR 2009.

Experiments
• Star Schema Benchmark (SSBM): “Adjoined Dimension Column Index (ADC Index) to Improve Star Schema Query Performance”. O’Neil et al. ICDE 2008.
• Implemented by professional DBA
• Original row-store plus 2 column-store simulations on same row-store product

“Column-Stores vs Row-Stores: How Different are They Really?” Abadi, Hachem, and Madden. SIGMOD 2008.

Average time (seconds):
Normal Row-Store                   25.7
Vertically Partitioned Row-Store   79.9
Row-Store With All Indexes         221.2

What’s Going On? Vertical Partitions

• Vertical partitions in row-stores:
  • Work well when workload is known
  • ..and queries access disjoint sets of columns
  • See automated physical design
• Do not work well as full-columns
  • TupleID overhead significant
  • Excessive joins

[Figure: each vertical-partition row stores a tuple header, a TID, and the column data (rows 1, 2, 3).]

Queries touch 3-4 foreign keys in fact table, 1-2 numeric columns
Complete fact table takes up ~4 GB (compressed)
Vertically partitioned tables take up 0.7-1.1 GB (compressed)

“Column-Stores vs. Row-Stores: How Different Are They Really?” Abadi, Madden, and Hachem. SIGMOD 2008.

What’s Going On? All Indexes Case

• Tuple construction
  • Common type of query:
      SELECT store_name, SUM(revenue)
      FROM facts, stores
      WHERE facts.store_id = stores.store_id
        AND stores.country = 'Canada'
      GROUP BY store_name
  • Result of lower part of query plan is a set of TIDs that passed all predicates
  • Need to extract SELECT attributes at these TIDs
    • BUT: index maps value to TID
    • You really want to map TID to value (i.e., a vertical partition)
=> Tuple construction is SLOW
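
The direction of the mapping is what matters here, and a small sketch can show why. It is a deliberate simplification: the data is invented, a Python dict stands in for the index, and a real row-store would use index scans and joins rather than the loop below.

```python
# Hypothetical column: store_name for TIDs 0..4.
STORE_NAME = ["Ottawa", "Boston", "Toronto", "Ottawa", "Boston"]

# "All indexes" simulation: the index maps value -> list of TIDs.
VALUE_TO_TIDS = {}
for tid, value in enumerate(STORE_NAME):
    VALUE_TO_TIDS.setdefault(value, []).append(tid)


def fetch_via_index(index, wanted_tids):
    """Recover values for the TIDs by walking the whole value -> TID index."""
    wanted = set(wanted_tids)
    found = {}
    for value, tids in index.items():
        for tid in tids:
            if tid in wanted:
                found[tid] = value
    return [found[tid] for tid in wanted_tids]


def fetch_via_partition(column, wanted_tids):
    """Recover values with one positional lookup per TID (TID -> value)."""
    return [column[tid] for tid in wanted_tids]
```

`fetch_via_index` has to visit every index entry to answer a question about a few TIDs, whereas `fetch_via_partition` does one lookup per requested TID.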

So….
• All indexes approach is a poor way to simulate a column-store
• Problems with vertical partitioning are NOT fundamental
  • Store tuple header in a separate partition
  • Allow virtual TIDs
  • Combine clustered indexes, vertical partitioning
• So can row-stores simulate column-stores?
  • Might be possible, BUT:
    • Need better support for vertical partitioning at the storage layer
    • Need support for column-specific optimizations at the executor level
    • Full integration: buffer pool, transaction manager, ..
• When will this happen?
  • Most promising features = soon (see Part 2, Part 3 for most promising features)
  • ..unless new technology / new objectives change the game (SSDs, Massively Parallel Platforms, Energy-efficiency)

End of Part 1
• Basic concepts (Stavros)
  • Introduction to key features
  • From DSM to column-stores and performance tradeoffs
  • Column-store architecture overview
  • Will rows and columns ever converge?
• Part 2: Column-oriented execution (Daniel)
• Part 3: MonetDB/X100 and CPU efficiency (Peter)
