Trino (Presto) DB: Zero Copy Lakehouse: Artem Aliev Huawei

TrinoDB is a zero-copy lakehouse solution that provides interactive queries across data sources through its SQL query engine and advanced optimization capabilities. It requires no ETL processes and can integrate various data sources and microservices through custom connectors. TrinoDB's query planner leverages techniques like predicate pushdown, dynamic filtering, and index joins to optimize queries and achieve interactive response times for analytical workloads across large datasets.

Trino(Presto)DB:

Zero Copy Lakehouse


Artem Aliev
Huawei
Artem Aliev
• Huawei Cloud Hybrid Integration Platform
• Expert and solution architect
• 20+ years in Software Development
• Big data platforms integrations
• Apache Hadoop, Spark, Cassandra, TinkerPop
• Storage optimizations
• JVM development
• SPbU teacher

[email protected]
Application scenarios
• Data enrichment and composition services
• Multi-datasource, multi-cloud, microservice environments
• Exploratory analytics
• What else do we have to analyze?
• Fraud/Security breach detection and prevention
• ML model inference
Requirements
• Interactive queries (join queries)
• Seconds for analytics
• Sub-seconds for user services
• Different Data Sources
• SQL/NoSQL databases
• S3 files and Hadoop Systems
• REST Services
• Consistent up-to-date results
• Open Source
Example (TPC-C)

Show user history for the given warehouse.

[Diagram: TPC-C schema with row counts: Warehouse 100, District 1,000, History 3,000,000, Customer 3,000,000, Stock 10,000,000, New-Order 900,000, Order 3,000,000, Order-Line 30,000,000, Item 100,000]

select distinct i_name, i_price
from warehouse
join district on (w_id = d_w_id)
join customer on (d_w_id = c_w_id and d_id = c_d_id)
join orders on (o_w_id = w_id and o_d_id = d_id and o_c_id = c_id)
join order_line on (o_w_id = ol_w_id and o_d_id = ol_d_id and o_id = ol_o_id)
join stock on (ol_supply_w_id = s_w_id and ol_i_id = s_i_id)
join item on (s_i_id = i_id)
where w_id = 50 and c_id = 101;

Query time (seconds):
MPP DB: 20-80
Tuned Trino: 4
Postgres: 0.7
Traditional Stack
• Data Lake
• Hive, Spark, Impala, Trino, Drill, Dremio*
• Data warehouse
• ClickHouse, Greenplum, Vertica*
• Data marts
• Postgres, MySQL, ClickHouse
ETL/ELT from sources to data marts
• Nightly batches
• Streaming
• Fast
• Needs a special database to enrich and join data in the stream
• Redis, Cassandra, etc.
• Eager enrichments
• Both approaches struggle with:
• Data source model changes
• Loading failures
• Inconsistent loading
Databricks Solution: Lakehouse
NO ETL!
• Big data queried like a regular database
• Direct requests to data sources
Microservice architecture support
• A lot of small exotic databases
• “Agile” development with a lot of schema changes
• REST API data access only
• Pay per request
• Google APIs, etc.
Feature requirements summary
• Schema changes tolerance
• Advanced pushdowns to data sources and optimizations
• Legacy databases are still better at indexing
• No ETL
• Extreme: No caches, local materialized views, reflections, etc.
• Avoid full scans
• REST endpoint support
• Open Source
Candidates tested
• Postgres with FDW
• Very old and unsupported plugins
• Pushdowns work only against other Postgres instances
• Drill – schema-free for Hadoop
• Not in active development
• Optimizer is not good
• TrinoDB
• Very easy REST connector development
• Dremio -- not really Open Source
• Hive, Spark – files and manual JDBC only
The winner is: Presto
• Facebook developed Presto in 2012 and open-sourced it in 2013
• 2019
• PrestoDB supported by Facebook in Linux Foundation
• https://github.com/prestodb/presto
• PrestoSQL supported by Starburst
• 2020 Renamed to TrinoDB
• https://github.com/trinodb/trino

• 2020 OpenLooKeng from Huawei


• https://gitee.com/openlookeng/hetu-core
• Cloud Services
TrinoDB/PrestoDB
• SQL
• 30+ connectors
• Easy to develop new connectors
• Dynamic Catalog
• Represents data as tables
• Organized into schemas and catalogs
• Common type system
• Type conversions for columns
• The query planner is type-aware
Classical Distributed Architecture
Adding a Data Source
• Just drop a properties file into the etc/catalog directory
• The file name becomes the catalog name
• Schemas and tables will be loaded from the connector

connector.name=postgresql
connection-url=jdbc:postgresql://localhost:5432/tpcc
connection-user=postgres
connection-password=password
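
Once the catalog is loaded, the source can be queried directly. A minimal sketch, assuming the TPC-C tables sit in the public schema of the database configured above:

SHOW CATALOGS;
SHOW TABLES FROM postgresql.public;
SELECT count(*) FROM postgresql.public.warehouse;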
Great Optimization Engine
• Cost based optimizations (CBO)
• Hive connector only 
• Pushdowns
• Predicate
• Optimizer propagates constants through joins
• Dynamic filtering support for joins (based on CBO)
• Projection
• Aggregation!
• JOIN*
• TOP-N and LIMITs
• ORDER BY ... LIMIT N or ORDER BY ... FETCH FIRST N ROWS
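
EXPLAIN shows which of these optimizations actually reach the data source. A minimal sketch against the postgresql catalog above, using TPC-C column names:

EXPLAIN
SELECT c_last, c_balance
FROM postgresql.public.customer
WHERE c_w_id = 50
ORDER BY c_balance DESC
LIMIT 10;

The scan node in the resulting plan reflects which predicates and how much of the Top-N were pushed into the source.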
Highly-Selective Join

Show user history for the given warehouse.

[Diagram: TPC-C schema with row counts: Warehouse 100, District 1,000, History 3,000,000, Customer 3,000,000, Stock 10,000,000, New-Order 900,000, Order 3,000,000, Order-Line 30,000,000, Item 100,000]

select distinct i_name, i_price
from warehouse
join district on (w_id = d_w_id)
join customer on (d_w_id = c_w_id and d_id = c_d_id)
join orders on (o_w_id = w_id and o_d_id = d_id and o_c_id = c_id)
join order_line on (o_w_id = ol_w_id and o_d_id = ol_d_id and o_id = ol_o_id)
join stock on (ol_supply_w_id = s_w_id and ol_i_id = s_i_id)
join item on (s_i_id = i_id)
where w_id = 50 and c_id = 101;

Query time (seconds):
MPP DB: 20-80
Tuned Trino: 4
Postgres: 0.7
Nested Loop Join
First Attempt: Dynamic Filtering
• Collect ids from the right side
• Push the ids into the left side of the join
• CBO is recommended
• Supported for the Hive and Memory connectors
• JDBC PR #7968
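
Dynamic filtering can be toggled per session. A minimal sketch; the property name follows the Trino documentation and may differ between versions:

SET SESSION enable_dynamic_filtering = true;
-- then re-run the highly-selective join from the earlier slide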
Secret Index Joins for Thrift Connector
• Used to integrate an external storage system without writing a custom connector
• Just wrap your service with a Thrift server
• Works for REST API!
• Wrapping JDBC is inconvenient
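
Hooking such a Thrift wrapper into Trino takes only a catalog file. A rough sketch with placeholder hosts; verify the property names against the Thrift connector documentation for your version:

connector.name=trino_thrift
trino.thrift.client.addresses=host1:7777,host2:7777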
Apache Thrift overview
• Thrift is a framework for developing Remote Procedure Call (RPC) servers
• Development:
• Describe the interface in a .thrift file
• Generate the service interface and client code:
thrift --gen java TrinoThriftService.thrift
• Implement the generated interfaces on the server
• Trino example ThriftTpchServer
Adding an Index to the JDBC connector
• Just add ;)

• Not in open source yet


Fixed:
• From 80 sec to 4
REST API and microservices
• Facebook use(d) the Thrift service
• Create a Thrift server for your microservices
• trino-example-http connector
• Modify it for your needs
• Don't forget about the Index Provider
• We developed a simple configurable connector for our internal services
Zero Copy Done!
• No need to build a huge data lake with a lot of servers ahead of time
• A single-node TrinoDB can do data exploration

Let's look at the other features:


Security
• HTTPS with TLS 1.2, 1.3
• User auth: password, LDAP, OAuth, Kerberos, JWT, certificates
• Access Control
• Down to table-level operations
• System operations
Administration
• Web UI for monitoring
• JMX monitoring
• Resource groups
• Memory, CPU limits
• Queues
• Spill to disk support
Dynamic datasource reconfiguration
• Static property files by default
• PR: #12605
• OpenLooKeng fork
Caching
• Alluxio FS cache for Hive
• Memory connector
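
The Memory connector can also act as a hand-rolled cache: materialize a small hot table once and reuse it in later joins. A minimal sketch, assuming a memory catalog is configured next to the postgresql catalog from earlier:

CREATE TABLE memory.default.hot_items AS
SELECT i_id, i_name, i_price FROM postgresql.public.item;

SELECT count(*) FROM memory.default.hot_items;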
Indexing for Hive
• OpenLooKeng exclusive feature
• Bloom, BTree, MinMax, Bitmap indexes
High Availability
• OpenLooKeng
• Active-active, based on a distributed cache
• Use standard approaches for microservices
• K8s
Try it: Lakehouse microservice
#> docker run -p 8080:8080 --name trino trinodb/trino
Connect cli:
#> docker exec -ti trino trino

For “production” usage, just store the catalog in git and mount it into the container:
#> docker run --rm -p 8080:8080 \
-v /opt/trino_catalog_git:/etc/trino/catalog \
--name trino trinodb/trino
Run some commands
Sample data the right way
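
The stock image ships with sample catalogs (tpch among them), so there is data to query right away. A minimal sketch from the CLI started above:

SHOW CATALOGS;
SELECT * FROM tpch.tiny.nation LIMIT 5;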
Web UI
System catalog
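
The built-in system catalog exposes cluster state and running queries as ordinary tables, for example:

SELECT * FROM system.runtime.nodes;
SELECT query_id, state, query FROM system.runtime.queries LIMIT 10;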
JMX support
• A lot of system MBeans
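
With the jmx catalog enabled, MBeans can be read with plain SQL; a minimal sketch following the JMX connector documentation:

SELECT node, vmname, vmversion
FROM jmx.current."java.lang:type=runtime";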
And so on and so forth
