
Apache Druid

http://static.druid.io/docs/druid.pdf

Sudhindra Tirupati Nagaraj


History and Trivia

● Started in 2011 within an ad-tech company called Metamarkets

● Initially considered HBase, which was too slow for aggregate queries

● A real-time and batch analytics data store

● Open sourced in 2012

● Apache Druid project in 2015

● Used by thousands of companies, including Netflix, Lyft, Twitter, and Cisco


Architecture
Caching

● Brokers keep per-segment result caches (in local heap memory or memcached)

● Historicals cache segments loaded from deep storage

Load Balancing

● Coordinators periodically read the set of available segments and assign them to historicals (discovered via ZooKeeper)

● Use query patterns to drive cost-based optimization that decides whether to spread or co-locate segments from different data sources

Availability

● Real-time nodes periodically persist their in-memory data to disk, and the persisted data is backed up

● A ZooKeeper or MySQL failure does not affect data availability for querying

Rules

● Hot tier vs. cold tier among historicals; rules configure query SLAs (an illustrative rule set is sketched below)
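
As a concrete illustration of tiering, here is a minimal sketch in Python, assuming the retention-rule shape commonly documented for Druid (loadByPeriod/loadForever with tieredReplicants); the tier names and replica counts are hypothetical and not taken from the slides.

# Keep the most recent month of segments replicated twice on "hot" historicals
# (faster hardware, tighter SLA) and everything older on the default/cold tier.
# Rule types and field names are assumptions based on typical Druid configs.
hypothetical_rules = [
    {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"hot": 2}},
    {"type": "loadForever", "tieredReplicants": {"_default_tier": 1}},
]

The coordinator would evaluate such rules in order for each segment, which is how recent data can be held to a stricter query SLA than older data.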


Why Druid does not do joins

● Scaling joins is hard

● The gains from supporting joins are offset by the problems of sustaining high-throughput, join-heavy workloads

● It is possible to materialize columns into streams and perform a hash-based or sort-merge join, but this requires a lot of computation (see the sketch below)
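
To make that cost concrete, here is a minimal sketch (Python, purely illustrative and not Druid code) of a hash join over two materialized streams; the per-row build and probe work, plus holding the build side in memory, is what Druid sidesteps by denormalizing data at ingest time.

def hash_join(build_rows, probe_rows, key):
    # Build phase: hash every row of the (ideally smaller) build side in memory.
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    # Probe phase: every probe row requires a hash lookup and a merge per match.
    for row in probe_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}

users = [{"user": "u1", "city": "SF"}, {"user": "u2", "city": "NY"}]
events = [{"user": "u1", "expenses": 1000}, {"user": "u1", "expenses": 500}]
print(list(hash_join(users, events, "user")))

Under a high-throughput, join-heavy workload this per-row work and the memory for the build table multiply quickly, which is the trade-off the slide refers to.
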
Storage Format

● Columnar storage (similar to HBase)

● Data tables are called data sources (similar to Rockset collections). A table is partitioned into “segments”; a segment typically holds 5-10 million rows spanning a period of time. Segments are immutable.

● Multiple column types within a segment => different encoding/compression techniques per type


Example: string columns use dictionary encoding, while numeric columns store raw values. After encoding, the columns are compressed with LZF.

John Smith -> 0
Jane Doe -> 1

Name column: [0, 1, 0]


Timestamp  Name        City           Expenses
T0         John Smith  San Francisco  1000
T1         Jane Doe    San Francisco  2000
T2         John Smith  San Francisco  500

Filtering

● Filtering combined with aggregation (e.g., the sum of all expenses for San Francisco)

● Binary bitmaps serve as indexes: for each city value there is a bitmap marking the rows that contain that city. Example: San Francisco -> rows [0, 1, 2] -> [1, 1, 1, 0]; New York -> row [3] -> [0, 0, 0, 1]

The bitmaps can be compressed further using bitmap compression algorithms (Druid uses Concise). Two bitmaps can also be combined: for example, the sum of all expenses in San Francisco and New York is obtained from the rows selected by [1, 1, 1, 0] OR [0, 0, 0, 1] -> [1, 1, 1, 1] (see the sketch after the table below).

Timestamp  Name        City           Expenses
T0         John Smith  San Francisco  1000
T1         Jane Doe    San Francisco  2000
T2         John Smith  San Francisco  500
T3         John Smith  New York       200

Query Language

● No SQL.

● HTTP POST. Request:

{
  "queryType": "timeseries",
  "dataSource": "wikipedia",
  "intervals": "2013-01-01/2013-01-08",
  "filter": {
    "type": "selector",
    "dimension": "page",
    "value": "Ke$ha"
  },
  "granularity": "day",
  "aggregations": [
    {
      "type": "count",
      "name": "rows"
    }
  ]
}

Response:

[
  {
    "timestamp": "2012-01-01T00:00:00.000Z",
    "result": { "rows": 393298 }
  },
  {
    "timestamp": "2012-01-02T00:00:00.000Z",
    "result": { "rows": 382932 }
  },
  ...
  {
    "timestamp": "2012-01-07T00:00:00.000Z",
    "result": { "rows": 1337 }
  }
]
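
A query like this can be submitted with any HTTP client; the sketch below uses Python's standard library, and the broker address (localhost:8082) and /druid/v2/ endpoint are assumptions about a typical deployment rather than something stated in the slides.

import json
import urllib.request

query = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",
    "intervals": "2013-01-01/2013-01-08",
    "filter": {"type": "selector", "dimension": "page", "value": "Ke$ha"},
    "granularity": "day",
    "aggregations": [{"type": "count", "name": "rows"}],
}

req = urllib.request.Request(
    "http://localhost:8082/druid/v2/",          # assumed broker endpoint
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    rows = json.loads(resp.read())              # list of {"timestamp", "result"} objects
    print(rows)
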
Query Performance

Test query mix:

● 30% standard aggregate queries
● 60% group-by over aggregates
● 10% search queries

Results: average ~550 ms, 95th percentile 2 seconds, 99th percentile 10 seconds

Cardinality of a dimension matters a lot!


Query Performance with scaling

Linear scaling mostly helped simple aggregate queries.
Ingest Performance

● Ingest throughput depends more on the data source than on the number of dimensions/metrics

● Achieves a peak rate of 800k events/s/core with timestamp-only data

● Once the source is discounted, the cost is mainly deserialization

Thank You
