ClickHouse_grokking
Hung Vo
Introduction
●
An open source column-oriented database
management system capable of real time
generation of analytical data reports using SQL
queries.
Introduction
●
Blazing Fast
●
Linearly Scalable
●
Hardware Efficient
●
Fault Tolerant
●
Feature Rich
●
Highly Reliable
●
Simple and Handy
Key Features
●
True column-oriented storage
●
Vectorized query execution
●
Data compression
●
Parallel and distributed query execution
●
Real time query processing
●
Real time data ingestion
●
On-disk locality of reference
●
Cross-datacenter replication
●
High availability
●
SQL support
Key Features
●
Local and distributed joins
●
Pluggable external dimension tables
●
Arrays and nested data types
●
Approximate query processing
●
Probabilistic data structures
●
Full support of IPv6
●
Features for web analytics
●
State-of-the-art algorithms
●
Detailed documentation
●
Clean documented code
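To make "column-oriented storage" concrete, here is a minimal Python sketch (not ClickHouse code) that pivots row-shaped events into one list per column. The field names `user_id` and `duration_ms` are hypothetical, and every row is assumed to have the same keys. An aggregate over one field then reads a single contiguous list instead of touching every row.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys (illustrative only).
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Hypothetical events; the field names are made up for illustration.
rows = [
    {"user_id": 1, "duration_ms": 120},
    {"user_id": 2, "duration_ms": 80},
    {"user_id": 1, "duration_ms": 200},
]

columns = to_columns(rows)

# The aggregate only reads the one column it needs.
total_duration = sum(columns["duration_ms"])
```

A real column store also compresses each column and processes it in batches, which this sketch does not attempt.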
Feature Rich
●
ClickHouse features a user-friendly SQL query dialect with a number of built-in
analytics capabilities. For example, it includes probabilistic data structures for fast
and memory-efficient calculation of cardinalities and quantiles. There are
functions for working with dates, times and time zones, as well as some specialized ones
like addressing URLs and IPs (both IPv4 and IPv6) and many more.
●
Data organizing options available in ClickHouse, such as arrays, array joins, tuples
and nested data structures, are extremely efficient for managing denormalized
data.
●
Using ClickHouse allows joining both distributed data and co-located data, as the
system supports local joins and distributed joins. It also offers an opportunity to use
external dictionaries, dimension tables loaded from an external source, for seamless
joins with simple syntax.
●
ClickHouse supports approximate query processing: you can trade some accuracy for
faster results, which is indispensable when dealing with terabytes and petabytes of data.
●
The system's conditional aggregate functions, together with its calculation of totals
and extremes, let you get results with a single query instead of running several.
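As an illustration of how a probabilistic data structure estimates cardinality in bounded memory, here is a minimal Python sketch of a K-minimum-values (KMV) estimator. This is a generic textbook technique chosen for brevity, not the algorithm ClickHouse itself uses; the function names and the default `k` are assumptions made for this example.

```python
import hashlib
import heapq


def _hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_distinct(values, k=1024):
    """Estimate the number of distinct values with a KMV sketch.

    Only the k smallest hashes are kept, so memory stays bounded
    no matter how many values are streamed through.
    """
    heap = []        # negated hashes: heap[0] is the largest kept hash
    members = set()  # hashes currently kept, used to skip duplicates
    for value in values:
        h = _hash01(value)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: count is exact
    # The k-th smallest hash sits at about k / n, so n is about (k - 1) / hash.
    return int((k - 1) / -heap[0])


# 50,000 events drawn from 10,000 distinct hypothetical user ids.
events = [i % 10000 for i in range(50000)]
estimate = approx_distinct(events)
```

The estimate carries a relative error of roughly 1/sqrt(k); a larger `k` buys accuracy at the cost of memory, which is the same kind of trade-off approximate query processing makes.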
When to use ClickHouse
For analytics over a stream of clean, well-structured and immutable events or
logs. It is recommended to put each such stream into a single wide fact table
with pre-joined dimensions.
●
Web and App analytics
●
Advertising networks and RTB
●
Telecommunications
●
E-commerce and finance
●
Information security
●
Monitoring and telemetry
●
Time series
●
Business intelligence
●
Online games
●
Internet of Things
When NOT to use
●
Transactional workloads (OLTP): ClickHouse does not have an
UPDATE statement or full-featured transactions.
●
Key-value access with a high request rate: If you need a high rate
of small single-row queries, use another system.
●
Blob store, document-oriented: ClickHouse is intended for
vast amounts of fine-grained data.
●
Over-normalized data: It is better to build a single wide fact
table with pre-joined dimensions.
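The "single wide fact table with pre-joined dimensions" advice can be sketched in a few lines of Python: dimension attributes are copied into each event at ingestion time, so later queries need no join. All names here (`denormalize`, `user_id`, `country`, `device`) are hypothetical and stand in for whatever events and dimensions a real pipeline has.

```python
def denormalize(events, users):
    """Attach user dimension attributes to each event row.

    events: list of dicts, each with a "user_id" key.
    users:  dict mapping user_id -> dict of dimension attributes.
    Events whose user is unknown keep their own fields only.
    """
    wide_rows = []
    for event in events:
        row = dict(event)                           # copy the fact columns
        row.update(users.get(event["user_id"], {}))  # pre-join the dimension
        wide_rows.append(row)
    return wide_rows


# Hypothetical dimension table and event stream.
users = {
    1: {"country": "VN", "device": "mobile"},
    2: {"country": "DE", "device": "desktop"},
}
events = [
    {"user_id": 1, "action": "click"},
    {"user_id": 2, "action": "view"},
    {"user_id": 3, "action": "click"},
]

wide_rows = denormalize(events, users)
```

Each resulting row is ready to be inserted into one wide table; the cost is repeated dimension values, which columnar compression tends to absorb well.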
Interfaces
●
HTTP REST
●
clickhouse-client
●
JDBC (production), ODBC (beta)
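For the HTTP interface, a query can be passed in the `query` URL parameter. The sketch below only builds such a URL with the Python standard library and does not contact a server. The host name, the table name `hits`, and the helper `build_query_url` are assumptions for illustration; 8123 is the default HTTP port.

```python
from urllib.parse import quote, urlencode


def build_query_url(host, sql, port=8123, database=None):
    """Build a URL that carries a SQL query for the HTTP interface.

    Only constructs the string; sending it is left to the caller.
    """
    params = {"query": sql}
    if database is not None:
        params["database"] = database
    # quote_via=quote encodes spaces as %20 rather than "+".
    return f"http://{host}:{port}/?{urlencode(params, quote_via=quote)}"


url = build_query_url("localhost", "SELECT 1")

# Hypothetical table name, shown only to illustrate the database parameter.
count_url = build_query_url("localhost", "SELECT count() FROM hits", database="default")
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`, once a server is reachable at that address.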
Languages
●
Python, PHP, Perl, Go,
●
Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust