A Course in In-Memory Data Management: Prof. Hasso Plattner
Hasso Plattner
A Course in
In-Memory Data Management
The Inner Mechanics
of In-Memory Databases
September 1, 2013
13.1 Introduction
13 Tuple Reconstruction
First, let us consider an example using the row layout. Let us assume that we
need to reconstruct a tuple whose position is known. As a first example, we
take into account the following properties of the tuple:
• the size of one tuple is 200 bytes;
• the number of attributes in the tuple is 6.
To estimate the response time, we also need the following parameters:
• speed of the read operation from main memory: 2 MB/ms/core;
• cache lines are 64 bytes long;
• all calculations are done for one core per CPU. If more cores are used,
the performance increases accordingly.
Let us calculate how long the read operation for the tuple reconstruction
takes when the data is organized in a row layout. The operation is relatively
fast, because all attributes are stored sequentially. With a size of 200 bytes
per tuple, we need 4 cache accesses ($\lceil 200/64 \rceil = 4$) to read the
whole tuple from main memory. In this case, the CPU reads slightly more than
the size of a tuple (200 bytes), because it reads a complete cache line for
every cache access (with a row layout, the CPU also loads some data of the
following tuple into the cache). Thus, we read 4 · 64 byte = 256 bytes from
main memory. Considering a speed of 2 MB/ms/core, we can calculate the time
as follows:
$$\text{Tuple reconstruction response time (row layout)} = \frac{256\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 0.128\ \text{microseconds}$$
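The row-layout estimate can be expressed as a small cost function. This is a minimal Python sketch, not the book's code: the function names are hypothetical, it assumes the tuple starts at a cache-line boundary (so it occupies exactly $\lceil S/64 \rceil$ lines), and it converts 2 MB/ms/core into 2,000 byte per microsecond using 1 MB = 1,000,000 byte, as in the calculation above.

```python
import math

CACHE_LINE_BYTES = 64
# 2 MB/ms/core with 1 MB = 1,000,000 byte, i.e. 2,000 byte per microsecond.
BANDWIDTH_BYTES_PER_US = 2_000

def row_layout_bytes_read(tuple_bytes: int) -> int:
    """Bytes loaded from main memory to read one tuple stored row-wise.

    Assumes the tuple starts at a cache-line boundary, so it occupies
    ceil(tuple_bytes / 64) cache lines, and every line is loaded in full.
    """
    return math.ceil(tuple_bytes / CACHE_LINE_BYTES) * CACHE_LINE_BYTES

def row_layout_read_time_us(tuple_bytes: int) -> float:
    """Estimated single-core read time in microseconds."""
    return row_layout_bytes_read(tuple_bytes) / BANDWIDTH_BYTES_PER_US

print(row_layout_bytes_read(200))    # 256
print(row_layout_read_time_us(200))  # 0.128
```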
Now let us estimate the processing time for the same operation on tuples
with the same characteristics, but with the data organized in a columnar
layout. In this case, the data is stored attribute-wise. To reconstruct the
tuple, the CPU cannot simply read the data sequentially from memory. Instead,
it needs a separate cache access for every attribute required for the tuple
reconstruction: knowing the implicit recordID of the tuple to be
reconstructed, it “jumps” between the columns to collect the attribute
values. Let us calculate how long the read operation for the tuple
reconstruction takes in this case. The reconstructed tuple has 6 attributes,
and reading each attribute value requires one cache access, so we need 6 cache
accesses to read all attributes of the tuple from main memory. With a cache
line size of 64 bytes, the CPU needs to read 6 · 64 byte = 384 bytes from main
memory. Again, the CPU reads more than the size of a tuple (200 bytes),
because it reads a complete cache line for every memory access (with a
columnar layout, each loaded cache line also contains values of the same
attribute belonging to the following tuples). Considering a speed of
2 MB/ms/core, we can calculate the time as follows:
$$\text{Tuple reconstruction response time (column layout)} = \frac{384\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 0.192\ \text{microseconds}$$
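To make the “jumping” between columns concrete, the following minimal Python sketch models a columnar table as one list per attribute, so a tuple's implicit recordID is simply its index in every list. The table contents and the names `columns`, `reconstruct`, and `column_layout_read_time_us` are hypothetical. The sketch illustrates the access pattern, not the physical memory layout, and its cost function keeps the assumptions of the calculation above: one full 64-byte cache line per attribute and 2 MB/ms/core.

```python
CACHE_LINE_BYTES = 64
BANDWIDTH_BYTES_PER_US = 2_000  # 2 MB/ms/core = 2,000 byte per microsecond

# Hypothetical toy table in a columnar layout: one list per attribute.
# The implicit recordID of a tuple is its position in every list.
columns = {
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cem"],
    "city": ["Berlin", "Potsdam", "Paris"],
}

def reconstruct(columns: dict, record_id: int) -> dict:
    """Collect the value at position record_id from each column in turn."""
    return {attr: values[record_id] for attr, values in columns.items()}

def column_layout_read_time_us(num_attributes: int) -> float:
    """Estimated read time: one full cache line per requested attribute.

    Assumes every attribute value fits into a single cache line.
    """
    return num_attributes * CACHE_LINE_BYTES / BANDWIDTH_BYTES_PER_US

print(reconstruct(columns, 1))        # {'id': 2, 'name': 'Bob', 'city': 'Potsdam'}
print(column_layout_read_time_us(6))  # 0.192
```

Each attribute lives in a different list, so each value of the reconstructed tuple comes from a different memory region; that is why the model charges one cache line per attribute.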
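As a second example, consider a wider tuple with 100 attributes whose
row-wise representation spans 50 cache lines.
Row layout:
50 cache accesses are required in case of the row layout to read the whole
tuple: 50 · 64 byte = 3,200 bytes.
$$\text{Tuple reconstruction response time (row layout)} = \frac{3{,}200\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 1.6\ \text{microseconds}$$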
Columnar layout:
In the columnar layout, 100 cache accesses, one per attribute, are required
to read the attributes of the tuple: 100 · 64 byte = 6,400 bytes.
$$\text{Tuple reconstruction response time (column layout)} = \frac{6{,}400\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 3.2\ \text{microseconds}$$
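Both examples can be recomputed side by side with the same simple model. In this minimal Python sketch, the function names are hypothetical, and the size of 3,200 bytes for the second tuple is an assumption: the text only implies that the tuple spans 50 cache lines, and any size from 3,137 to 3,200 bytes gives the same result.

```python
import math

CACHE_LINE_BYTES = 64
BANDWIDTH_BYTES_PER_US = 2_000  # 2 MB/ms/core = 2,000 byte per microsecond

def row_time_us(tuple_bytes: int) -> float:
    # Whole tuple read sequentially as full cache lines (tuple assumed line-aligned).
    lines = math.ceil(tuple_bytes / CACHE_LINE_BYTES)
    return lines * CACHE_LINE_BYTES / BANDWIDTH_BYTES_PER_US

def column_time_us(num_attributes: int) -> float:
    # One full cache line per attribute (each value assumed to fit in one line).
    return num_attributes * CACHE_LINE_BYTES / BANDWIDTH_BYTES_PER_US

# (tuple size in bytes, number of attributes); 3,200 bytes is an assumed size.
examples = [(200, 6), (3_200, 100)]

for size, attrs in examples:
    row, col = row_time_us(size), column_time_us(attrs)
    print(f"{attrs:>3} attributes: row {row:.3f} us, column {col:.3f} us, "
          f"ratio {col / row:.1f}")
```

The loop reports a column-to-row ratio of 1.5 for the 6-attribute tuple and 2.0 for the 100-attribute tuple, matching the response times calculated above.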
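This example shows how the number of attributes of a tuple influences the
response time for both layouts. When all attributes are requested, the tuple
reconstruction performance of the columnar layout becomes progressively worse
compared to the row layout as the number of attributes grows: every additional
attribute costs the columnar layout a full cache line, whereas the row layout
needs only one additional cache line per 64 bytes of tuple data. In the two
examples, the column-to-row ratio of response times grows from 1.5
(0.192 vs. 0.128 microseconds) to 2 (3.2 vs. 1.6 microseconds).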
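The two examples follow a simple cost model that can be written in general form. The notation is introduced here for illustration and is not taken from the text: $S$ is the tuple size in bytes, $n$ the number of requested attributes, $L = 64$ byte the cache line size, and $B = 2{,}000{,}000$ byte/ms/core the read speed. As in the examples, the model assumes that each attribute value fits into one cache line and that the row layout reads the tuple as whole cache lines.

$$T_{\text{row}} = \frac{\lceil S/L \rceil \cdot L}{B}, \qquad T_{\text{column}} = \frac{n \cdot L}{B}, \qquad \frac{T_{\text{column}}}{T_{\text{row}}} = \frac{n}{\lceil S/L \rceil}$$

For the first example this ratio is $6/4 = 1.5$, and for the second it is $100/50 = 2$. The ratio grows with $n$ unless the tuple size grows by at least one cache line per added attribute.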
We can conclude that it is important to select only the attributes of a tuple
that are actually needed. This way, the potential disadvantage of tuple
reconstruction in a columnar layout can be kept to a minimum.
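The effect of selecting fewer attributes can be sketched with the same model. This minimal Python sketch uses hypothetical function names and the assumed 3,200-byte, 100-attribute tuple from the second example. It keeps the simplification that the row layout always reads the whole tuple. If the requested attributes are adjacent within a row, the row layout may touch fewer cache lines, so the break-even point computed here is an upper bound.

```python
import math

CACHE_LINE_BYTES = 64

def row_bytes_read(tuple_bytes: int) -> int:
    # Simplification: the row layout loads the whole (line-aligned) tuple
    # as full cache lines, however many attributes are requested.
    return math.ceil(tuple_bytes / CACHE_LINE_BYTES) * CACHE_LINE_BYTES

def column_bytes_read(requested_attributes: int) -> int:
    # One cache line per requested attribute; other columns are not touched.
    return requested_attributes * CACHE_LINE_BYTES

def break_even_attributes(tuple_bytes: int) -> int:
    """Largest number of requested attributes for which the columnar layout
    reads no more bytes than the row layout (under the simplification above)."""
    return row_bytes_read(tuple_bytes) // CACHE_LINE_BYTES

# Assumed tuple from the second example: 3,200 bytes, 100 attributes.
for k in (6, 50, 100):
    print(k, column_bytes_read(k), row_bytes_read(3_200))
print(break_even_attributes(3_200))  # 50
```

Requesting 6 of the 100 attributes reads 384 bytes in the columnar layout instead of 3,200. Only requests for more than 50 attributes make the columnar layout read more than the row layout in this model.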