
second most significant bit indicates the bit value of the homogeneous word sequence, while the
remaining w − 2 bits store the run length of the homogeneous word sequence.
When compressing a sparse bitmap, e.g., corresponding to the set {0, 2(w − 1), 4(w − 1), . . .},
WAH can use 2w bits per set bit. Concise reduces this memory usage by half [1]. It uses a similar
format except for coded fill words. Instead of storing the run length r using w − 2 bits, Concise uses
only w − 2 − ⌈log₂(w)⌉ bits, setting aside ⌈log₂(w)⌉ bits as position bits. These ⌈log₂(w)⌉ position
bits encode a number p ∈ [0, w). When p = 0, we decode r + 1 fill words. When it is non-zero, we
decode r fill words preceded by a word that has its (p − 1)th bit flipped compared to the following
fill words. Consider the case where w = 32. Concise can code the set {0, 62, 124, . . .} using only
32 bits/integer, in contrast to WAH, which requires 64 bits/integer.
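To make the layout concrete, the following Java sketch decodes a w = 32 Concise fill word along the lines described above. Only the field widths (one type bit, one value bit, ⌈log₂(32)⌉ = 5 position bits and 25 run-length bits) follow from the description; the exact placement of the fields within the word is an assumption made for illustration.

    // Sketch: decoding a w = 32 Concise fill word as described above.
    // The bit placement of the fields is an illustrative assumption; only the
    // field widths (1 + 1 + 5 + 25 = 32 bits) follow from the text.
    final class ConciseFillWordSketch {
        static void decode(int word) {
            int fillBit = (word >>> 30) & 1;   // value of the homogeneous words
            int p = (word >>> 25) & 0x1F;      // 5 = ceil(log2(32)) position bits, p in [0, 32)
            int r = word & 0x01FFFFFF;         // remaining 25 bits store the run length r
            if (p == 0) {
                System.out.printf("%d fill words of value %d%n", r + 1, fillBit);
            } else {
                System.out.printf("1 word with bit %d flipped, then %d fill words of value %d%n",
                        p - 1, r, fillBit);
            }
        }
    }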
Though they reduce memory usage, these formats derived from BBC have slow random access
compared to an uncompressed bitmap. That is, checking or changing the ith bit value is an O(n)-time
operation. Thus, though they represent an integer set, we cannot quickly check whether an
integer is in the set. This makes them unsuitable for some applications [8]. Moreover, RLE formats
have a limited ability to quickly skip data. For example, suppose that we are computing the bitwise
AND between two compressed bitmaps. If one bitmap has long runs of zeros, we might wish to
skip over the corresponding words in the other bitmap. Without an auxiliary index, this might be
impossible with formats like WAH and Concise.
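To see why random access is linear, consider the following Java sketch, which tests the ith bit of a bitmap compressed with a simplified 32-bit WAH-style format (the word layout is an assumption made for illustration). The loop must walk the compressed words from the beginning, adding up how many uncompressed bits each one covers; in the worst case it visits every word.

    // Sketch: testing bit i of a WAH-like compressed bitmap (w = 32).
    // Assumed layout: the most significant bit distinguishes fill words from
    // literal words, a fill word covers 31 * (run length) bits, and a literal
    // word covers 31 bits.
    final class RleRandomAccessSketch {
        static boolean getBit(int[] words, long i) {
            long pos = 0;  // first uncompressed bit covered by the current word
            for (int w : words) {
                boolean fill = (w >>> 31) != 0;
                long span = fill ? 31L * (w & 0x3FFFFFFF) : 31L;
                if (i < pos + span) {
                    if (fill) return ((w >>> 30) & 1) != 0;     // homogeneous run
                    return ((w >>> (int) (i - pos)) & 1) != 0;  // bit inside a literal word
                }
                pos += span;
            }
            return false;  // i lies beyond the encoded range
        }
    }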
Instead of using RLE and sacrificing random access, we propose to partition the space [0, n)
into chunks and to store dense and sparse chunks differently [9]. On this basis, we introduce a new
bitmap compression scheme called Roaring. Roaring bitmaps store 32-bit integers in a compact
and efficient two-level indexing data structure. Dense chunks are stored using bitmaps; sparse
chunks use packed arrays of 16-bit integers. In our example ({0, 62, 124, . . .}), it would use only
≈ 16 bits/integer, half of Concise’s memory usage. Moreover, on the synthetic-data test proposed
by Colantonio and Di Pietro [1], it is at least four times faster than WAH and Concise. In some
instances, it can be hundreds of times faster.
Our approach is reminiscent of O’Neil and O’Neil’s RIDBit external-memory system. RIDBit
is a B-tree of bitmaps, where a list is used instead when a chunk’s density is too small. However,
RIDBit fared poorly compared to FastBit, a WAH-based system [10]: FastBit was up to 10× faster.
In contrast to the negative results of O’Neil et al., we find that Roaring bitmaps can be several
times faster than WAH bitmaps for in-memory processing. Thus one of our main contributions is
to challenge the belief, expressed by authors such as Colantonio and Di Pietro [1], that WAH
bitmap compression is the most efficient alternative.
A key ingredient in the performance of Roaring bitmaps is the new bit-count processor
instructions (such as popcnt) that became available on desktop processors relatively recently (2008).
Previously, table lookups were often used instead in systems like RIDBit [11], but they can be
several times slower. These new instructions allow us to quickly compute the density of new chunks,
and to efficiently extract the location of the set bits from a bitmap.
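As an illustration, assuming a bitmap container stored as an array of 64-bit words (as described in § 2), the following sketch uses Long.bitCount and Long.numberOfTrailingZeros, which the Java runtime compiles to the corresponding processor instructions on recent x64 hardware.

    // Sketch: computing the density (cardinality) of a bitmap container and
    // extracting the positions of its set bits. The container is assumed to
    // be stored as an array of 64-bit words; Long.bitCount maps to popcnt on
    // processors that support it.
    final class BitCountSketch {
        static int cardinality(long[] words) {
            int sum = 0;
            for (long w : words) sum += Long.bitCount(w);
            return sum;
        }

        static void appendSetBits(long[] words, java.util.List<Integer> out) {
            for (int k = 0; k < words.length; k++) {
                long w = words[k];
                while (w != 0) {
                    out.add(64 * k + Long.numberOfTrailingZeros(w));  // lowest set bit
                    w &= w - 1;                                       // clear that bit
                }
            }
        }
    }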
To surpass RLE-based formats such as WAH and Concise, we also rely on several algorithmic
strategies (see § 4). For example, when intersecting two sparse chunks, we may use an approach
based on binary search instead of the linear-time merge used by RIDBit. Also, when merging two chunks,
we predict whether the result is dense or sparse to minimize wasteful conversions. In contrast,
O’Neil et al. report that RIDBit converts chunks after computing them [11].
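To illustrate the first of these strategies, the following sketch intersects two sorted array containers when one is much smaller than the other: each value of the small container is looked up by binary search in the large one instead of merging the two arrays linearly. The representation (char as an unsigned 16-bit value type) and the names are illustrative, not the actual implementation.

    // Sketch: intersecting two sorted array containers when one is much
    // smaller than the other, using binary search instead of a linear merge.
    // Runs in O(|small| log |large|) time.
    final class ArrayIntersectionSketch {
        static int intersect(char[] small, int smallLen, char[] large, int largeLen, char[] out) {
            int count = 0;
            for (int i = 0; i < smallLen; i++) {
                if (java.util.Arrays.binarySearch(large, 0, largeLen, small[i]) >= 0) {
                    out[count++] = small[i];  // keep the common value
                }
            }
            return count;  // cardinality of the intersection
        }
    }

Since the intersection can never be larger than the smaller input, the output array can be allocated with min(smallLen, largeLen) entries.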
2. ROARING BITMAP
We partition the range of 32-bit indexes ([0, n)) into chunks of 2¹⁶ integers sharing the same 16 most
significant bits. We use specialized containers to store their 16 least significant bits.
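In code, this decomposition amounts to taking the high and low 16-bit halves of a 32-bit value; the following minimal sketch (names are illustrative) shows the split.

    // Sketch of the two-level decomposition: the high 16 bits of x select the
    // chunk (the container key) and the low 16 bits are the value stored
    // inside that chunk's container. char serves as an unsigned 16-bit type.
    final class ChunkSplitSketch {
        static char highBits(int x) { return (char) (x >>> 16); }   // chunk key
        static char lowBits(int x)  { return (char) (x & 0xFFFF); } // value within the chunk
    }

For example, x = 62 belongs to chunk 0 and is stored as the 16-bit value 62, whereas x = 2¹⁶ + 5 belongs to chunk 1 and is stored as 5.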
When a chunk contains no more than 4096 integers, we use a sorted array of packed 16-bit
integers. When there are more than 4096 integers, we use a 2¹⁶-bit bitmap. Thus, we have two types
of containers: an array container for sparse chunks and a bitmap container for dense chunks. The
4096 threshold ensures that at the level of the containers, each integer uses no more than 16 bits: we