ExxonMobil’s journey to
unleash time-series data
with open source
technology
June, 2018
Kevin Brown
Big Data Platform Engineer
Introduction
3
Kevin Brown
Big Data Platform Engineer
ExxonMobil
Introduction to time-series data at
ExxonMobil
Global data collection and ingestion with
Apache NiFi™
Apache Spark™: normalization,
validation, aggregations, and interpolation
Apache HBase® & Apache Hive™
partitioning, performance
Consumption APIs
Today’s Objectives
4
A series of data points
indexed in time order.
What is time-series data?
5
• Global refineries and chemical plants
• Millions of sensors/tags
• Decades' worth of time-series data
Time-series data at ExxonMobil
6
Collection and
Ingestion with
Apache NiFi™
• Interoperability
• Ease of use
• Fine control of flow
Why Apache NiFi™
Single node design
9
Simple Regional Design - NiFi
10
Redundancy Considerations
11
• Repository sizing
• Run Schedule
• Back Pressure
• Monitoring
• NiFi Expression Language
Apache NiFi - Flow Design
12
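As a rough illustration of the back-pressure item above: a NiFi connection can be given an object-count threshold and a data-size threshold, and once either is reached the upstream processor stops being scheduled until the queue drains. The toy model below only mimics that idea in plain Python. The `Connection` class and its method names are hypothetical and are not NiFi's API or implementation.

```python
from collections import deque
from typing import Deque, Optional


class Connection:
    """Toy queue with back pressure (hypothetical, not NiFi code)."""

    def __init__(self, max_objects: int, max_bytes: int) -> None:
        self.max_objects = max_objects  # object-count threshold
        self.max_bytes = max_bytes      # data-size threshold
        self._queue: Deque[bytes] = deque()
        self._bytes = 0

    def is_full(self) -> bool:
        # Back pressure applies once either threshold has been reached.
        return len(self._queue) >= self.max_objects or self._bytes >= self.max_bytes

    def offer(self, item: bytes) -> bool:
        """Enqueue item unless back pressure is active; report whether it was accepted."""
        if self.is_full():
            return False  # the upstream producer should pause and retry later
        self._queue.append(item)
        self._bytes += len(item)
        return True

    def poll(self) -> Optional[bytes]:
        """Dequeue the oldest item, relieving back pressure; None if empty."""
        if not self._queue:
            return None
        item = self._queue.popleft()
        self._bytes -= len(item)
        return item
```

The thresholds are soft in this sketch: an item is accepted whenever the queue is below both limits, even if that item then pushes the byte count past `max_bytes`.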
Normalization,
Contextualization,
Validation…
13
“Your metadata is key, but it’s probably not consistent.”
Data Contextualization
• Global data challenges
• Language and abbreviation variation
• Diversity of vendors and tools
• Naming standard
• Variance in functionality (confidence levels, frequency, resolution)
• Synchronization and mutable data
• Calculated tags, faulty sensors
• Delayed lab test results
Getting your data ready
14
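One way to picture the naming-standard and abbreviation-variation items above is a small normalization step that maps site-specific tag names onto a single convention. This is a minimal sketch under stated assumptions: the alias table, the dot-separated output format, and the `normalize_tag` function are all hypothetical and do not come from the talk.

```python
import re
from typing import Dict

# Hypothetical alias table; a real one would come from curated site metadata.
ALIASES: Dict[str, str] = {
    "temp": "temperature",
    "temperatura": "temperature",
    "press": "pressure",
    "druck": "pressure",
}


def normalize_tag(raw: str, aliases: Dict[str, str] = ALIASES) -> str:
    """Lower-case a raw tag name, split it on non-alphanumeric characters,
    expand known abbreviations, and join the tokens with dots.

    Only ASCII letters and digits are kept as token characters in this sketch.
    """
    tokens = [t for t in re.split(r"[^0-9a-z]+", raw.strip().lower()) if t]
    return ".".join(aliases.get(t, t) for t in tokens)
```

For example, `normalize_tag("Unit1_TEMP-01")` and `normalize_tag("unit1.temperatura.01")` both produce `unit1.temperature.01`, so readings from differently named sources can be joined on one key.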
Storage,
Interpolation,
Aggregation
15
• Archival and compression of data
• Interpolation
• Aggregation
• Partitioning
“Sequence Matters”
Storage, interpolation, aggregation …
16
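The aggregation and partitioning items above can be sketched in a few lines. The slides do not state the window size, the aggregate function, or the partition scheme, so the hourly window, the mean, and the per-day partition key below are assumptions chosen only for illustration, as are the names `hourly_means` and `day_partition`.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

# (tag name, epoch seconds in UTC, value); this record shape is an assumption.
Reading = Tuple[str, int, float]


def hourly_means(readings: Iterable[Reading]) -> Dict[Tuple[str, int], float]:
    """Average the readings per tag and per hour; keys are (tag, hour start)."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for tag, ts, value in readings:
        hour_start = ts - ts % 3600  # floor the timestamp to the hour
        buckets[(tag, hour_start)].append(value)
    return {key: fmean(values) for key, values in buckets.items()}


def day_partition(ts: int) -> str:
    """Derive a per-day partition key (UTC) from an epoch timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
```

Keying the partition on the calendar day means a query over a date range only needs to touch the partitions for those days, which is the usual motivation for time-based partitioning.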
Consumption
17
• Who are your users?
• Apache HBase® & Apache Hive™?
• Off cluster?
• APIs
• Serialization
Consumption
18
Questions
19
Apache HBase®, Apache Hive™ and Apache NiFi™ are trademarks of the Apache
Software Foundation.

Editor's Notes

  • #4: Personal introduction. Education: BS in Information Technology, BYU, 2012. ExxonMobil: 6 years at ExxonMobil; Linux systems administration; pioneered our initial journey into Hadoop/Big Data; responsible for architecting, maintaining, and supporting our current Big Data platform.
  • #11: From left to right: data originates at the site; a regional NiFi instance pulls from the sites; data is sent securely to a clustered central NiFi.
  • #17: Archival and compression of data. Interpolation: we rely on both step and linear, usually depending on data type: linear for a float with no manual input, step otherwise (an integer, or a float with manual input). Aggregation. Partitioning.
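
The interpolation rule in note #17 can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function names, the `is_float` and `manual_input` flags, and the choice to hold the last known value for step interpolation are assumptions.

```python
from bisect import bisect_right
from typing import Sequence


def choose_method(is_float: bool, manual_input: bool) -> str:
    """Rule from the note: linear for floats with no manual input, step otherwise."""
    return "linear" if is_float and not manual_input else "step"


def interpolate(times: Sequence[float], values: Sequence[float],
                t: float, method: str) -> float:
    """Estimate the value at time t from samples sorted by ascending time."""
    if method not in ("step", "linear"):
        raise ValueError("method must be 'step' or 'linear'")
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and of equal length")
    if t < times[0] or t > times[-1]:
        raise ValueError("t is outside the sampled range")
    i = bisect_right(times, t) - 1  # index of the last sample at or before t
    if method == "step" or times[i] == t or i == len(times) - 1:
        return values[i]  # hold the last known value
    # Linear: weight the two neighbouring samples by their distance from t.
    t0, t1 = times[i], times[i + 1]
    v0, v1 = values[i], values[i + 1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

With samples `(0, 1.0)`, `(10, 3.0)`, `(20, 2.0)`, a query at `t = 5` gives `2.0` under linear interpolation and `1.0` under step interpolation, which shows why the method should follow the data type: a discrete or manually entered value should not be given intermediate values it never had.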