ClickHouse_grokking

Uploaded by

vanvan99

ClickHouse

DBMS for data analytics

Hung Vo
[email protected]
Introduction
● An open source, column-oriented database management system capable of real-time generation of analytical data reports using SQL queries.
Introduction
● Blazing Fast
● Linearly Scalable
● Hardware Efficient
● Fault Tolerant
● Feature Rich
● Highly Reliable
● Simple and Handy
Key Features
● True column-oriented storage
● Vectorized query execution
● Data compression
● Parallel and distributed query execution
● Real time query processing
● Real time data ingestion
● On-disk locality of reference
● Cross-datacenter replication
● High availability
● SQL support
Key Features
● Local and distributed joins
● Pluggable external dimension tables
● Arrays and nested data types
● Approximate query processing
● Probabilistic data structures
● Full support of IPv6
● Features for web analytics
● State-of-the-art algorithms
● Detailed documentation
● Clean documented code
Feature Rich
● ClickHouse features a user-friendly SQL query dialect with a number of built-in analytics capabilities. For example, it includes probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles. There are functions for working with dates, times and time zones, as well as some specialized ones for handling URLs and IP addresses (both IPv4 and IPv6), and many more.
● Data organizing options available in ClickHouse, such as arrays, array joins, tuples and nested data structures, are extremely efficient for managing denormalized data.
● ClickHouse can join both distributed data and co-located data, as the system supports local joins and distributed joins. It also offers external dictionaries (dimension tables loaded from an external source) for seamless joins with simple syntax.
● ClickHouse supports approximate query processing: you can get results as fast as you want, which is indispensable when dealing with terabytes and petabytes of data.
● Conditional aggregate functions and the calculation of totals and extremes let you get results with a single query instead of running several.
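As a rough illustration of the approximate query processing mentioned above, the following Python sketch estimates a count from a random sample of rows and scales it up. It is a toy model and not how ClickHouse implements sampling; the function name `approx_count`, the sample size and the seed are all assumptions made for the example.

```python
import random

def approx_count(rows, predicate, sample_size, seed=0):
    """Estimate how many rows satisfy `predicate` by scanning only a sample."""
    rng = random.Random(seed)               # fixed seed so the sketch is repeatable
    sample = rng.sample(rows, sample_size)  # rows chosen without replacement
    hits = sum(1 for row in sample if predicate(row))
    return hits * len(rows) / sample_size   # scale the sample count up

def is_even(row):
    return row % 2 == 0

rows = list(range(100_000))
exact = sum(1 for r in rows if is_even(r))                 # full scan
estimate = approx_count(rows, is_even, sample_size=10_000)  # scans 10% of the rows
relative_error = abs(estimate - exact) / exact
```

The estimate touches a tenth of the data, and the price is a small error that shrinks as the sample grows.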
When to use ClickHouse
For analytics over a stream of clean, well-structured and immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.
● Web and App analytics
● Advertising networks and RTB
● Telecommunications
● E-commerce and finance
● Information security
● Monitoring and telemetry
● Time series
● Business intelligence
● Online games
● Internet of Things
When NOT to use
● Transactional workloads (OLTP): ClickHouse doesn't have an UPDATE statement or full-featured transactions.
● Key-value access with high request rate: If you need a high load of small single-row queries, please use another system.
● Blob-store, document oriented: ClickHouse is intended for a vast amount of fine-grained data.
● Over-normalized data: It is better to build a single wide fact table with pre-joined dimensions.
Interfaces
● HTTP REST
● clickhouse-client
● JDBC (production), ODBC (beta)

Languages
● Python, PHP, Perl, Go, Node.js, Ruby, C++, .NET, Scala, R, Julia, Rust

Input/Output data formats
● CSV, TSV, JSON, CapnProto, XML
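The HTTP interface listed under Interfaces can be reached from any language with an HTTP client. The Python sketch below only builds the request URL, passing the SQL text in the `query` parameter. The helper name `build_query_url` and the host are assumptions; port 8123 is the server's default HTTP port, and a real deployment may use a different port and require authentication.

```python
from urllib.parse import quote, urlencode

def build_query_url(host: str, sql: str, port: int = 8123) -> str:
    """Build a GET URL that carries `sql` in the `query` parameter."""
    params = urlencode({"query": sql}, quote_via=quote)  # spaces become %20
    return f"http://{host}:{port}/?{params}"

url = build_query_url("localhost", "SELECT 1")

# To run the query against a live server (not executed in this sketch):
# from urllib.request import urlopen
# print(urlopen(url).read().decode())
```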
Data types
● UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
● Float32, Float64
● Boolean: stored as UInt8, restricted to the values 0 and 1
● String, FixedString(N)
● Date/DateTime
● Enum
● Array
● AggregateFunction
● Tuple
● Nested data structure
SQL
● SELECT
● INSERT INTO
● CREATE DATABASE/TABLE/[MATERIALIZED] VIEW
● ALTER: Column Manipulations, Partitions/Parts
● ALTER TABLE DELETE WHERE…
● ATTACH, DROP, DETACH, RENAME, USE, SET, OPTIMIZE, KILL QUERY
SQL SELECT
SQL Functions
● Arithmetic, Rounding, Mathematical
● Comparison, Logical, Conditional
● Type conversion
● Dates/Times
● String
● Bit, Hash, Array
● URLs, IP, JSON
● Geographical coordinates
● Higher-order: lambda, arrayMap/Filter...
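The higher-order functions in the last bullet take a lambda and an array, as in the ClickHouse expression `arrayMap(x -> x * 2, [1, 2, 3])`. A minimal Python analogue of that behaviour is sketched below; `array_map` and `array_filter` are hypothetical helper names chosen to mirror `arrayMap` and `arrayFilter`, not part of any ClickHouse client library.

```python
def array_map(func, arr):
    """Apply `func` to every element, like arrayMap."""
    return [func(x) for x in arr]

def array_filter(func, arr):
    """Keep the elements for which `func` is truthy, like arrayFilter."""
    return [x for x in arr if func(x)]

doubled = array_map(lambda x: x * 2, [1, 2, 3])
evens = array_filter(lambda x: x % 2 == 0, [1, 2, 3, 4])
```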
Aggregate Functions
● Normal aggregate functions:
– count, min, max, any*, sum*, avg, median
– stddev*, var*, covar*, corr
– uniq*, quantile*, topK
● Aggregate function combinators: -If, -Array, -State, -Merge, -MergeState, -ForEach
● Parametric aggregate functions: sequenceMatch, sequenceCount, windowFunnel, retention, uniqUpTo.
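The combinators are easier to picture with a small model. The Python sketch below imitates the idea behind -State and -Merge for an average (keep a partial state, combine states later, finalize at the end) and the idea behind -If (aggregate only the rows that match a condition). It is an illustration under assumed names (`AvgState`, `avg_state`, `count_if`) and says nothing about ClickHouse's internal state formats.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Partial state of an average: enough to merge, not yet a final value."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "AvgState") -> "AvgState":
        # Combining two partial states, in the spirit of the -Merge combinator
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        return self.total / self.count

def avg_state(values) -> AvgState:
    """Build a partial state from raw values, in the spirit of -State."""
    state = AvgState()
    for v in values:
        state.total += v
        state.count += 1
    return state

def count_if(values, condition) -> int:
    """Count only the values matching `condition`, in the spirit of -If."""
    return sum(1 for v in values if condition(v))

part_a = avg_state([1, 2, 3])    # e.g. computed on one data part or server
part_b = avg_state([10, 20])     # e.g. computed on another
result = part_a.merge(part_b).finalize()
large = count_if([1, 2, 3, 10, 20], lambda v: v >= 10)
```

Because states merge associatively, partial results can be stored (the AggregateFunction data type) and combined later without rereading the raw rows.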
Table engines
● MergeTree family
● TinyLog, Log, Memory, Buffer, External data
● Distributed, Dictionary, Merge, File, URL
● View, MaterializedView
● Integrations: Kafka, MySQL
Table engines - MergeTree
● Stores data sorted by primary key: This allows you to create a small sparse index that helps find data faster.
● Lets you use partitions if the partitioning key is specified: ClickHouse supports certain operations with partitions that are more effective than general operations on the same data with the same result. ClickHouse also automatically cuts off the partition data where the partitioning key is specified in the query. This also increases the query performance.
● Data replication support: The family of ReplicatedMergeTree tables is used for this.
● Data sampling support.
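To see why sorted storage makes a small sparse index possible, consider the toy Python model below. It records the first key of every block of rows (a "granule") and uses binary search to decide which blocks a range query has to read. The granule size of 4 and all function names are assumptions for illustration; this is a sketch of the idea, not of ClickHouse's on-disk format.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per granule; tiny here so the example is easy to follow

def build_sparse_index(sorted_keys, granule=GRANULE):
    """Keep only the first key of each granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule)]

def granules_for_range(index, lo, hi):
    """Return the granule numbers that may contain keys in [lo, hi]."""
    first = max(bisect_left(index, lo) - 1, 0)  # granule that may hold `lo`
    last = bisect_right(index, hi) - 1          # last granule starting at or before `hi`
    return list(range(first, last + 1))

keys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]  # stored sorted by primary key
index = build_sparse_index(keys)                     # one entry per 4 rows
to_scan = granules_for_range(index, 10, 12)

# Read only the selected granules, then filter their rows exactly
rows = [k
        for g in to_scan
        for k in keys[g * GRANULE:(g + 1) * GRANULE]
        if 10 <= k <= 12]
```

The index holds three entries for twelve rows, and the query reads one granule out of three. The same principle, applied to whole partitions, is what lets a query skip partitions that the partitioning key rules out.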
MergeTree family
● MergeTree
● ReplacingMergeTree: removes duplicate entries with the same primary key value
● SummingMergeTree: totals data while merging
● AggregatingMergeTree: the merge combines the states of aggregate functions stored in the table for rows with the same primary key value
● CollapsingMergeTree: allows automatic deletion, or "collapsing", of certain pairs of rows when merging.
● GraphiteMergeTree: designed for rollup (thinning and aggregating/averaging) of Graphite data
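The difference between the engines lies in what happens to rows with the same key when data parts are merged. The Python sketch below models two of them on `(key, value)` pairs: a replacing merge that keeps the most recently inserted row per key, and a summing merge that adds values per key. The function names are hypothetical and the model ignores details such as version columns and the fact that real merges run in the background at unspecified times.

```python
def replacing_merge(rows):
    """Keep the last inserted row for each key (ReplacingMergeTree-like)."""
    latest = {}
    for key, value in rows:
        latest[key] = value          # later rows overwrite earlier ones
    return sorted(latest.items())    # merged data stays sorted by key

def summing_merge(rows):
    """Sum the values of rows that share a key (SummingMergeTree-like)."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())

inserted = [("a", 1), ("b", 5), ("a", 3)]   # rows in insertion order
replaced = replacing_merge(inserted)
summed = summing_merge(inserted)
```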
THANK YOU!!
