A Course in In-Memory Data Management: Prof. Hasso Plattner

This chapter discusses tuple reconstruction in row-oriented and column-oriented databases stored in main memory. Tuple reconstruction is needed when a user requests multiple columns of a tuple, such as when viewing or editing a record. In a row-oriented layout, all attributes of a tuple are stored sequentially, allowing the tuple to be reconstructed with few cache accesses. In contrast, a column-oriented layout stores attributes of different tuples together, requiring a cache access for each attribute needed, making tuple reconstruction less efficient than in a row-oriented layout. The performance difference increases as the number of attributes in each tuple grows. Selecting only necessary fields can reduce the disadvantages of column-oriented layouts for tuple reconstruction.

Prof. Hasso Plattner

A Course in
In-Memory Data Management
The Inner Mechanics
of In-Memory Databases

September 1, 2013

This learning material is part of the reading material for Prof. Plattner's
online lecture "In-Memory Data Management" taking place at
www.openHPI.de. If you have any questions or remarks regarding the
online lecture or the reading material, please give us a note at openhpi-
[email protected]. We are glad to further improve the material.
Chapter 13
Tuple Reconstruction

13.1 Introduction

As mentioned earlier, data resembling a table can be stored in linear memory
either column by column (columnar layout) or row by row (row layout). The
impact of this choice was already discussed in more detail in Chapter 8. The
columnar layout is optimized for analytical, set-based operations that work
on many rows but only on a notably smaller subset of all columns. The row
layout performs better for select operations on a few single tuples. In
this chapter, we discuss the operations needed for tuple reconstruction in
detail and explain the influence of the different layouts on the performance
of these operations. Tuple reconstruction is a typical functionality in OLTP
applications. It is executed whenever more than one column of a tuple is
requested from the database, for example when the user of an ERP system calls
the "show" or "edit" transaction for a master data object or for a document.
To explain the influence of the main memory layout on the performance of the
tuple reconstruction operation, we have to consider the notion of a cache
access and the size of a cache line. A CPU cache is a smaller, faster memory
used by the central processing unit of a computer to reduce the average time
to access memory. It stores copies of the data from the most frequently used
main memory locations. The cache is organized in cache lines that are 32 or
64 byte long. Even when reading just one byte from memory, the CPU reads a
complete cache line and places it into the cache. This characteristic of a
cache will help us to estimate the response time of the tuple reconstruction
operation for both layouts.
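
The cache line granularity described above can be sketched in a few lines of Python. This is only an illustrative model, not part of the lecture material: the function names `cache_lines_touched` and `bytes_transferred` and the `offset` parameter are assumptions introduced here, and the model ignores prefetching and data that is already cached.

```python
def cache_lines_touched(offset, length, line_size=64):
    """Number of cache lines covered by `length` bytes starting at byte `offset`."""
    if length <= 0:
        return 0
    first_line = offset // line_size
    last_line = (offset + length - 1) // line_size
    return last_line - first_line + 1


def bytes_transferred(offset, length, line_size=64):
    """Bytes moved from main memory: always a whole number of cache lines."""
    return cache_lines_touched(offset, length, line_size) * line_size


if __name__ == "__main__":
    print(bytes_transferred(0, 1))    # one byte still costs a full cache line
    print(bytes_transferred(0, 100))  # 100 byte starting on a line boundary
    print(bytes_transferred(60, 8))   # 8 byte that straddle two cache lines
```

The last call shows that the start position matters as well: a small value that crosses a cache line boundary costs two cache accesses in this model.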

13.2 Tuple Reconstruction in Row-Oriented Databases

First, let us consider an example using the row layout. Assume that we need
to reconstruct a tuple whose position is known. As a first example, we take
into account the following properties of the tuple:
• the size of one tuple is 200 byte;
• the number of attributes in the tuple is 6.
To estimate the result, we also need the following parameters:
• speed of the read operation from main memory: 2 MB/ms/core;
• we consider 64 byte long cache lines;
• all calculations will be done for one core per CPU. If we consider more
cores, the performance will increase accordingly.
Let us calculate how much time the read operation for the tuple
reconstruction will take in this case, considering that the data is organized
in a row layout. The operation is executed relatively fast, as all attributes
are stored sequentially. With a size of 200 byte per tuple, we need 4 cache
accesses ($\lceil 200/64 \rceil = 4$) to read the whole tuple from main
memory, assuming that the tuple starts at a cache line boundary. The CPU
reads a bit more than the size of a tuple (200 byte) in this case, because it
reads a complete cache line for every cache access (in case of a row layout,
the CPU will load some data of the following tuple into the cache). Thus, we
read $4 \cdot 64 = 256$ byte from main memory. Considering the speed of
2 MB/ms/core, we can calculate the time as described below:
$$\text{Tuple reconstruction response time (row layout)} = \frac{256\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 0.128\ \text{microseconds}$$
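
The row layout estimate can be reproduced with a short Python sketch. It is a simplified model under the assumptions stated in this section (64 byte cache lines, 2,000,000 byte/ms/core, a tuple that starts at a cache line boundary); the function name `row_layout_response_time_us` is introduced here for illustration only.

```python
def row_layout_response_time_us(tuple_size,
                                line_size=64,
                                bandwidth_bytes_per_ms=2_000_000):
    """Estimated time in microseconds to read one tuple stored row-wise."""
    # Round up to whole cache lines (assumes the tuple is line-aligned).
    cache_accesses = -(-tuple_size // line_size)
    bytes_read = cache_accesses * line_size
    # byte / (byte per ms) gives ms; the factor 1000 converts to microseconds.
    return bytes_read * 1000 / bandwidth_bytes_per_ms


if __name__ == "__main__":
    print(row_layout_response_time_us(200))  # the 200 byte tuple from the text
```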

13.3 Tuple Reconstruction in Column-Oriented Databases

Now let us estimate the processing time for the same operation and tuples
with the same characteristics, but with the data organized in a columnar
layout. The data is stored attribute-wise in this case. To reconstruct the
tuple, the CPU cannot simply read data sequentially from memory. It needs a
cache access for every attribute of the tuple required for the tuple
reconstruction. Therefore, knowing the implicit recordID of the tuple to be
reconstructed, it will “jump” between the attributes of the tuple to collect
the values. Let us calculate how much time the read operation for the tuple
reconstruction will take in this case. Considering that the reconstructed
tuple has 6 attributes and that one cache access is required for a complete
read of every attribute, we need 6 cache accesses to read all attributes of
the tuple from main memory. Taking into account a cache
line size of 64 byte, the CPU needs to read: 64 byte · 6 = 384 byte from main
memory. The CPU reads more than the size of a tuple (200 byte) in this case,
because it will read a complete cache line for every memory access (using
a columnar layout, the CPU will load some additional attributes’ values of
the following tuples). Considering the speed 2 MB/ms/core, we can calculate
the time as described below:
$$\text{Tuple reconstruction response time (column layout)} = \frac{384\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 0.192\ \text{microseconds}$$
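
To make the "jumping" between attributes more concrete, the following Python sketch reconstructs a tuple from a toy columnar table by reading position `record_id` in every column. The table contents, the column names, and the function `reconstruct_tuple` are hypothetical and only serve as an illustration; the cost model is the one used in this section (one cache access per attribute).

```python
# Hypothetical toy table in columnar layout: one list per attribute.
columns = {
    "id":      [1, 2, 3],
    "name":    ["Ann", "Bob", "Cy"],
    "city":    ["Berlin", "Potsdam", "Walldorf"],
    "country": ["DE", "DE", "DE"],
    "status":  ["A", "B", "A"],
    "credit":  [100, 250, 75],
}


def reconstruct_tuple(columns, record_id, fields=None):
    """Collect the value at `record_id` from each requested column.

    Returns the tuple as a dict plus the number of cache accesses under the
    simplified model of one cache access per attribute.
    """
    names = list(columns) if fields is None else list(fields)
    row = {name: columns[name][record_id] for name in names}
    return row, len(names)


if __name__ == "__main__":
    row, accesses = reconstruct_tuple(columns, 1)
    bytes_read = accesses * 64                # one 64 byte cache line each
    time_us = bytes_read * 1000 / 2_000_000   # 2,000,000 byte/ms/core
    print(row, accesses, bytes_read, time_us)
```

The optional `fields` argument restricts the reconstruction to a subset of the columns, which lowers the number of cache accesses in this model.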

In this simple example, the performance of the tuple reconstruction operation
using a columnar layout is 50% worse in comparison with the row layout.
The difference in the response time can be even higher if we consider an
example for a tuple with a larger number of attributes.

13.4 Further Examples and Discussion

In reality, the number of attributes in the tables of business applications is
much larger. As an example, let us calculate the response time for tuple
reconstruction with the following characteristics:
• The size of one tuple is 3,200 byte. For the column layout calculation,
we also assume that one cache access is enough to read each attribute of
the tuple completely.
• The number of attributes in the tuple is 100.
Let us calculate the response times for the tuple reconstruction operation
for both layouts, considering the same CPU characteristics that were described
in the example above.
Row layout:
50 cache accesses are required for a CPU to read the whole tuple: 50 · 64 byte
= 3,200 byte.

$$\text{Tuple reconstruction response time (row layout)} = \frac{3{,}200\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 1.6\ \text{microseconds}$$

Columnar layout:
100 cache accesses are required in case of the columnar layout to read the
attributes of the tuple: 100 · 64 byte = 6,400 byte.

$$\text{Tuple reconstruction response time (column layout)} = \frac{6{,}400\ \text{byte}}{2{,}000{,}000\ \text{byte/ms/core}} = 3.2\ \text{microseconds}$$
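
Both examples of this chapter can be checked side by side with a small Python sketch. It uses the same simplified model as the text (64 byte cache lines, 2,000,000 byte/ms/core, one cache access per attribute in the columnar layout, line-aligned tuples in the row layout); the function name `compare_layouts` is an assumption made for this illustration.

```python
def compare_layouts(tuple_size, num_attributes,
                    line_size=64, bandwidth_bytes_per_ms=2_000_000):
    """Return (row time in us, column time in us, column/row ratio)."""
    # Row layout: read the tuple as consecutive, whole cache lines.
    row_bytes = -(-tuple_size // line_size) * line_size
    # Columnar layout: one full cache line per attribute.
    col_bytes = num_attributes * line_size

    def to_us(num_bytes):
        return num_bytes * 1000 / bandwidth_bytes_per_ms

    return to_us(row_bytes), to_us(col_bytes), col_bytes / row_bytes


if __name__ == "__main__":
    print(compare_layouts(200, 6))       # first example of this chapter
    print(compare_layouts(3_200, 100))   # second example of this chapter
```

In this model the ratio depends on how many attributes fit into one cache line on average: the smaller the individual attributes, the larger the relative penalty of the columnar layout when all attributes are requested.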

This example shows how the number of attributes of the tuple can influence
the response time for both layouts. In the first example, the columnar layout
takes 1.5 times as long as the row layout; in the second example, it takes
twice as long. The performance for tuple reconstruction of the columnar
layout will become progressively worse in comparison to the row store when we
increase the number of a tuple’s attributes and request all attributes.
We can conclude that it is important to select only the necessary fields of
a tuple. This way, the potential disadvantage of tuple reconstruction using a
columnar layout can be reduced to a minimum.
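
The effect of selecting only the necessary fields can be estimated with the same simplified model. The sketch below is an illustration under additional assumptions that are not stated in the text: the row store is assumed to read the whole 3,200 byte tuple regardless of how many fields are requested, and each requested attribute in the columnar layout costs exactly one 64 byte cache line. The names `column_time_us`, `row_time_us`, and `break_even_attributes` are hypothetical.

```python
LINE_SIZE = 64          # byte per cache line
BANDWIDTH = 2_000_000   # byte per ms per core


def column_time_us(requested_attributes):
    """Columnar layout: one cache line per requested attribute."""
    return requested_attributes * LINE_SIZE * 1000 / BANDWIDTH


def row_time_us(tuple_size):
    """Row layout: assumed to read the whole tuple in full cache lines."""
    lines = -(-tuple_size // LINE_SIZE)
    return lines * LINE_SIZE * 1000 / BANDWIDTH


def break_even_attributes(tuple_size):
    """Number of requested attributes at which both layouts cost the same."""
    return -(-tuple_size // LINE_SIZE)


if __name__ == "__main__":
    for k in (5, 10, 50, 100):
        print(k, column_time_us(k), row_time_us(3_200))
    print(break_even_attributes(3_200))
```

Under these assumptions, requesting fewer attributes than the break-even count makes the columnar layout cheaper than reading the complete row, which is why projecting only the required fields matters.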
