Hail Hydrate! From Stream to Lake Using Open Source
Tim Spann / Dev Advocate
#ossummit @PaasDev @StreamNative
https://github.com/tspannhw https://www.datainmotion.dev/
Tim Spann, Developer Advocate
DZone Zone Leader and Big Data MVB
@PaasDev
https://github.com/tspannhw
https://www.datainmotion.dev/
https://github.com/tspannhw/SpeakerProfile
https://dev.to/tspannhw
https://sessionize.com/tspann/
https://www.slideshare.net/bunkertor
Agenda
Use Case - Populate the Data Lake
Key Challenges
▪ Their Impact
▪ A Solution
▪ Outcome
Why Apache NiFi and Apache Pulsar?
Successful Architecture
Demo
USE CASE
IoT Ingestion: High-volume streaming sources, multiple message formats,
diverse protocols and multi-vendor devices create data ingestion challenges.
Key Challenges
Visibility: Lack of visibility into end-to-end streaming data flows;
inability to troubleshoot bottlenecks, consumption patterns, etc.
Data Ingestion: High-volume streaming sources,
multiple message formats, diverse protocols and multi-vendor
devices create data ingestion challenges.
Real-time Insights: Analyzing a continuous, rapid inflow
(velocity) of streaming data at high volumes makes it hard
to gain real-time insights.
Impact
Delays: Decreased user satisfaction and delayed project delivery.
Missed revenue and opportunities.
Code Sprawl: Custom scripts of varying quality proliferate
across environments to cope with the complexity.
Costs: Rising development and maintenance costs. Too
many tools, not enough experts, and delays while waiting for
contractors or while developers learn yet another tool, package or language.
Solution
Visibility: Apache NiFi provenance provides insights, metrics and
control over the entire end-to-end stream across clouds.
Data Ingestion: Apache NiFi is the one tool to handle high-volume
streaming sources, multiple message formats, diverse protocols
and multi-vendor devices.
Variety of Data: Apache NiFi offers hundreds of out-of-the-box (OOTB)
connectors and a GUI that accelerates flow development, with Record
Processors that convert types in a single fast step.
Outcome
Agility: New data source onboarding time reduced from
weeks to days. More data in your data warehouse now.
New Applications: New, innovative use cases enabled in a
compressed timeframe. No more waiting for data to arrive; Data
Analysts and Data Scientists focus on innovation.
Savings: Cost reduction through technology offload, reduced
consultant costs and simpler ingest processes.
Multiple users, frameworks, languages, clouds, data sources & clusters
CLOUD DATA ENGINEER
• Experience in ETL/ELT
• Coding skills in Python or Java
• Knowledge of database query languages such as SQL
• Experience with Streaming
• Knowledge of Cloud Tools
CAT
● Typical User
● No Coding Skills
● Can use NiFi
● Questions your cloud spend
● Expert in ETL (Eating, Ties and Laziness)
● Edge Camera Interaction
AI / Deep Learning / ML / DS
• Can run in Apache NiFi
• Can run in Apache Pulsar Functions
• Can run in Apache Flink
• Can run in Apache Flink SQL
• Can run in Apache Pulsar Clients
• Can run in Apache Pulsar Microservices
• Can run in Function Mesh
FLiP Stack for Cloud Data Engineers - ML
https://functionmesh.io/
StreamNative Solution
[Diagram: StreamNative Platform]
Layers: APP Layer, Computing Layer, Storage Layer, IaaS Layer
Labels: Application Messaging, Data Pipelines, Real-time Contextual Analytics, Tiered Storage, Microservice, Notification, Dashboard, Risk Control, Auditing, Payment, ETL
FLiP Stack (FLink -integrate- Pulsar)
https://hub.streamnative.io/data-processing/pulsar-flink/2.7.0/
What is Apache NiFi?
Apache NiFi is a scalable, real-time streaming data
platform that collects, curates, and analyzes data so
customers gain key insights for immediate
actionable intelligence.
Apache NiFi
ACQUIRE PROCESS DELIVER
• Over 300 Prebuilt Processors
• Easy to build your own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full data provenance from acquisition to delivery
• Diverse, Non-Traditional Sources
• Ecosystem integration
Advanced tooling to industrialize flow development
(Flow Development Life Cycle)
ACQUIRE / DELIVER: FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG
PROCESS: HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE, EXECUTE
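The "Throttle & Backpressure" bullet on the NiFi slide can be illustrated with a toy model. The sketch below is not NiFi code: it uses a bounded in-memory queue as a stand-in for a connection between two processors, and the names `Connection`, `offer`, `poll`, and `threshold` are hypothetical. When the queue is full, the upstream side is told to hold off instead of data being dropped.

```python
from collections import deque


class Connection:
    """Toy stand-in for a queue between two processors (hypothetical, not NiFi's API)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # max queued items before back pressure applies
        self._items = deque()

    def offer(self, item) -> bool:
        """Queue the item, or return False to signal back pressure upstream."""
        if len(self._items) >= self.threshold:
            return False
        self._items.append(item)
        return True

    def poll(self):
        """Hand the oldest queued item downstream, or None if the queue is empty."""
        return self._items.popleft() if self._items else None


def run_demo():
    """Push five readings through a queue of size three, then retry the deferred ones."""
    conn = Connection(threshold=3)
    accepted, deferred = [], []
    for reading in range(5):  # a fast upstream source
        (accepted if conn.offer(reading) else deferred).append(reading)
    conn.poll()  # downstream consumes one item, freeing one slot
    retried = [r for r in deferred if conn.offer(r)]
    return accepted, retried
```

In this model the upstream source keeps the deferred readings and retries them once the downstream side has drained the queue, which is the behavior the bullet describes at a high level.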
What is Apache Pulsar?
Apache Pulsar is an open source, cloud-native
distributed messaging and streaming platform.
EVENTS
MESSAGES
Apache Pulsar
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent Storage
● Pulsar Connectors
● REST API
● CLI
● Many clients available
● Four Different Subscription Types
● Multi-Protocol Support
○ MQTT
○ AMQP
○ JMS
○ Kafka
○ ...
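As a rough illustration of the Pub-Sub and subscription-type bullets, here is a minimal sketch using the `pulsar-client` Python package. The service URL `pulsar://localhost:6650`, the topic name, and the subscription name are assumptions for a local standalone broker, and the helper function names are hypothetical. The client import is deferred so the encoding helper can be exercised without a broker or the client library installed.

```python
import json

SERVICE_URL = "pulsar://localhost:6650"        # assumption: local standalone broker
TOPIC = "persistent://public/default/sensors"  # assumption: illustrative topic name


def encode_reading(reading: dict) -> bytes:
    """Serialize a reading to UTF-8 JSON bytes for use as a message payload."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def publish_and_consume(reading: dict) -> dict:
    """Send one reading, then read it back through a Shared subscription."""
    import pulsar  # deferred import: requires `pip install pulsar-client`

    client = pulsar.Client(SERVICE_URL)
    try:
        # Subscribe first so the subscription exists before the message is sent.
        consumer = client.subscribe(
            TOPIC,
            subscription_name="demo-sub",  # hypothetical subscription name
            consumer_type=pulsar.ConsumerType.Shared,
        )
        producer = client.create_producer(TOPIC)
        producer.send(encode_reading(reading))
        msg = consumer.receive(timeout_millis=5000)
        consumer.acknowledge(msg)
        return json.loads(msg.data())
    finally:
        client.close()
```

Swapping `ConsumerType.Shared` for `Exclusive`, `Failover`, or `KeyShared` selects one of the other subscription types.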
ALL DATA - ANYTIME - ANYWHERE - ANY CLOUD
Diagram labels: Multi-ingest (×3), Merge, Priority
End to End Streaming Codeless Pipeline
Diagram labels: Enterprise sources, Sensors, Errors, Aggregates, Alerts, IoT, ETL, Analytics, Streaming SQL, Clickstream, Market data, Machine logs, Social
Show Me Some Data
{"uuid": "rpi4_uuid_jfx_20200826203733", "amplitude100": 1.2, "amplitude500": 0.6, "amplitude1000": 0.3, "lownoise": 0.6,
"midnoise": 0.2, "highnoise": 0.2, "amps": 0.3, "ipaddress": "192.168.1.76", "host": "rp4", "host_name": "rp4", "macaddress":
"6e:37:12:08:63:e1", "systemtime": "08/26/2020 16:37:34", "endtime": "1598474254.75", "runtime": "28179.03", "starttime":
"08/26/2020 08:47:54", "cpu": 48.3, "cpu_temp": "72.0", "diskusage": "40219.3 MB", "memory": 24.3, "id":
"20200826203733_28ce9520-6832-4f80-b17d-f36c21fd8fc9", "temperature": "47.2", "adjtemp": "35.8", "adjtempf": "76.4",
"temperaturef": "97.0", "pressure": 1010.0, "humidity": 8.3, "lux": 67.4, "proximity": 0, "oxidising": 77.9, "reducing": 184.6, "nh3":
144.7, "gasKO": "Oxidising: 77913.04 Ohms\nReducing: 184625.00 Ohms\nNH3: 144651.47 Ohms"}
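Several numeric fields in this record arrive as strings (for example `cpu_temp` and `temperature`). The sketch below shows one way to normalize such a record in plain Python, standing in for the kind of type conversion a record-oriented flow would apply. The field list `NUMERIC_STRING_FIELDS` and the function name `normalize` are assumptions chosen for illustration, and only a subset of the fields is reproduced.

```python
import json

# Subset of the record shown above; values copied verbatim from the slide.
RAW = (
    '{"uuid": "rpi4_uuid_jfx_20200826203733", "host": "rp4", '
    '"cpu": 48.3, "cpu_temp": "72.0", "temperature": "47.2", '
    '"temperaturef": "97.0", "humidity": 8.3, "endtime": "1598474254.75"}'
)

# Assumption: these fields hold numbers encoded as strings and should become floats.
NUMERIC_STRING_FIELDS = ("cpu_temp", "temperature", "temperaturef", "endtime")


def normalize(raw: str) -> dict:
    """Parse a JSON reading and convert string-encoded numbers to floats."""
    record = json.loads(raw)
    for field in NUMERIC_STRING_FIELDS:
        if field in record:
            record[field] = float(record[field])
    return record


record = normalize(RAW)
print(record["cpu_temp"], record["temperature"])  # numeric values, no longer strings
```

Fields that are already numeric in the source, such as `cpu` and `humidity`, pass through unchanged.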
Weather Streaming Pipeline
Connect with the Community & Stay Up-To-Date
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to the monthly Pulsar Newsletter for
major news, events, project updates, and
resources in the Pulsar community
Deeper Content
● https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
● https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html
● https://github.com/tspannhw/EverythingApacheNiFi
● https://github.com/tspannhw/CloudDemo2021
● https://github.com/tspannhw/StreamingSQLExamples
● https://www.linkedin.com/pulse/2021-schedule-tim-spann/
● https://github.com/tspannhw/StreamingSQLExamples/blob/8d02e62260e82b027b43abb911b5c366a3081927/README.md
Deeper Content
● https://github.com/tspannhw/FLiP-SQL
● https://github.com/tspannhw/StreamingSQLExamples
● https://github.com/streamnative/pulsar-flink
● https://www.linkedin.com/pulse/2021-schedule-tim-spann/
● https://github.com/tspannhw/SpeakerProfile/blob/main/2021/talks/20210729_HailHydrate!FromStreamtoLake_TimSpann.pdf
● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud
● https://docs.streamnative.io/cloud/stable/compute/flink-sql
@PaasDev
https://www.pulsardeveloper.com/
timothyspann
streamnative.io