Modern Data Stack

The modern data stack (MDS) is a suite of cloud-based tools designed for efficient data collection, processing, and storage, transitioning from traditional ETL to ELT workflows for improved flexibility and scalability. Key components include data storage solutions like Snowflake and BigQuery, transformation tools such as dbt, and orchestration platforms like Airflow. The MDS offers benefits like higher data volumes at lower costs, increased accessibility, and the ability to leverage advanced analytics and machine learning.

Presented by: Zine El Abidine ACHAGHOUR, Hamid Elaaly
Supervised by: Pr. Lotfi Najdi

TABLE OF CONTENTS

What is the Modern Data Stack
The History of the Modern Data Stack
The Shift From ETL to ELT
Modern vs Legacy Data Stack
Benefits of the Modern Data Stack
The MDS Visualized
The Future of MDS
Conclusion

INTRODUCTION
The modern data stack is a suite of cloud-based software tools designed for efficient data collection,
processing, and storage. Known for their robustness and speed, these tools offer simplified
deployment and are highly scalable.

Components address specific data challenges, including storage (Snowflake, BigQuery),
transformation (dbt, Dataform), job orchestration (Airflow, Prefect), streaming (Kafka), and monitoring
(Monte Carlo, Bigeye), among others.
HISTORY OF MDS
Cloud computing and data warehousing have driven the emergence of the modern data stack,
shifting from ETL to ELT workflows for greater connectivity and flexibility. This transition, rooted in
the early 2010s, addresses the need for agile analytics and vendor-agnostic solutions.

Powered by cloud technology advancements, key players include BigQuery, Redshift, and
Snowflake, alongside BI tools such as Looker and Tableau. Data ingestion tools like Stitch and
Fivetran ensure seamless integration, while MongoDB, Cassandra, and Elasticsearch manage big
data effectively.
THE SHIFT FROM ETL TO ELT
The shift from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform) represents a
fundamental change in the workflow of data processing.

ETL (Extract, Transform, Load): In the traditional ETL approach, data is first extracted from various
sources, then transformed to fit the target data model or schema, and finally loaded into the data
warehouse or destination system. Transformation typically involves cleansing, aggregating, and
formatting the data to make it suitable for analysis.

ELT (Extract, Load, Transform): The ELT approach has gained prominence with the rise of cloud
computing. In ELT, data is extracted and loaded into the target system without significant
transformation upfront. Transformation occurs within the data warehouse or data lake, leveraging the
scalability and processing power of modern cloud platforms for faster loading times and efficient
resource use.
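
The difference in ordering between the two workflows can be sketched in a few lines. This is only an illustrative sketch, not a production pipeline: it uses Python's built-in sqlite3 module as a stand-in for a cloud data warehouse, and the table names, the sample records, and the helper functions (clean, run_etl, run_elt, totals) are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("A-1", " alice ", "20.50"),
    ("A-2", "BOB", "5.00"),
    ("A-3", "alice", "10.25"),
]


def clean(row):
    """Normalise one raw record: trim and lowercase the name, parse the amount."""
    order_id, customer, amount = row
    return order_id, customer.strip().lower(), float(amount)


def run_etl(conn):
    """ETL: transform in application code first, then load the finished rows."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)",
        [clean(row) for row in RAW_ORDERS],
    )


def run_elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL inside the store."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM raw_orders
        """
    )


def totals(conn, table):
    """Total spend per customer; `table` is a fixed name chosen in this script."""
    query = f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals(conn, "orders_etl")
elt_totals = totals(conn, "orders_elt")
```

Both paths end with the same cleaned table. The difference is that the ELT path keeps raw_orders in the store, so a new transformation can be written later without extracting the data again.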

The shift to ELT offers several advantages:

Scalability: Cloud-based data warehouses can handle large volumes of data and scale resources
dynamically to accommodate varying workloads.

Flexibility: By deferring transformation until after loading, organizations can perform ad-hoc analysis
on raw data and apply transformations as needed, without having to repeat the extraction process.

Cost-effectiveness: ELT can be more cost-effective, as it minimizes the need for extensive data
preprocessing before loading into the data warehouse.
LEGACY VS MODERN DATA STACK
A modern data stack is typically more scalable, flexible, and efficient than a legacy data stack.

A modern data stack relies on cloud computing, whereas a legacy data stack stores data on
on-premises servers instead of in the cloud. A modern data stack also gives more data
professionals access to the data than a legacy data stack does.

A legacy data stack usually refers to the traditional relational database management system
(RDBMS), which uses Structured Query Language (SQL) to store and process data.

While a traditional RDBMS can still be used in a modern data stack, it is less common because it is
not as well suited to managing big data. SQL, however, remains a popular query language for both
legacy and modern data stacks.
THE BENEFITS OF MDS
Higher Volumes + Lower Costs
Compute and store higher volumes of data at a significantly faster rate and reduced cost.

Data Accessibility
Self-service analytics programs increase your organization's data literacy.

Built to Scale
Pricing and products support scale in data volumes, number of users, and use cases.

Best-of-Breed Technologies
Specialization drives innovation, and modularity gives your team flexibility.
MDS VISUALIZED

[Figure: diagram of the modern data stack. Source: Fivetran]
MDS - DATA SOURCES

The first component of the MDS architecture is the place or, in most cases, the places where your
data originates. Data may come from hundreds (or sometimes thousands) of different sources,
including computers, smartphones, websites, social media networks, eCommerce platforms, and
IoT devices.

Common classifications include:

Databases
Files
Applications (categorized by use case)
Event collectors
MDS - INGESTION

A data pipeline is a method of ingesting data from a variety of sources and moving it to data
lakes or data warehouses. In other words, it is a sequence of steps that moves data from one
system to another.

A modern data pipeline needs to be:

Low code / no code.
Maintained for data integrity.

Data pipelines are categorized based on how they are used. Batch processing and real-time
processing are the two most common types of pipelines.

Batch processing:

This type of pipeline is designed to process large volumes of data in batches at scheduled
intervals. It excels at handling large datasets that do not require real-time analysis. By moving
data in batches, it optimizes efficiency and resource utilization.

Streaming data:

As the name suggests, this type of pipeline is designed to handle streaming data in real time. It is
particularly useful for applications that require immediate analysis and response, such as fraud
detection and monitoring system performance. Processing data on arrival enables fast
decision-making and proactive action.

Example tools: Azure Event Hubs, Kafka, Fivetran, Airbyte
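
The contrast between the two pipeline types can be illustrated with a small sketch. It is a toy model under stated assumptions, not how any of the tools above work internally: the EVENTS list, the 500 threshold, and the function names run_batch and run_streaming are all invented for the example, and a plain Python iterator stands in for a real event stream.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical payment events; in practice these would come from a source system.
EVENTS: List[Dict[str, int]] = [
    {"id": 1, "amount": 40},
    {"id": 2, "amount": 900},
    {"id": 3, "amount": 25},
    {"id": 4, "amount": 1500},
]


def run_batch(events: List[Dict[str, int]]) -> Dict[str, int]:
    """Batch: wait until the whole interval's data is available, then process it at once."""
    return {"count": len(events), "total": sum(e["amount"] for e in events)}


def run_streaming(events: Iterable[Dict[str, int]], threshold: int) -> Iterator[int]:
    """Streaming: inspect each event as it arrives and flag large ones immediately."""
    for event in events:
        if event["amount"] > threshold:
            yield event["id"]


# Scheduled run over the accumulated data set.
batch_summary = run_batch(EVENTS)

# Event-by-event handling; iter(EVENTS) stands in for an unbounded stream.
flagged = list(run_streaming(iter(EVENTS), threshold=500))
```

The batch function needs the complete list before it can answer, while the streaming generator can raise a flag for event 2 before event 3 has even arrived, which is the property fraud detection relies on.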

MDS - DESTINATIONS
The tools mentioned in the previous section are instrumental in moving data to a centralized
location for storage, usually a cloud data warehouse, although data lakes are also a popular
option.

Data warehouses are relational databases designed to store and transform data. They are best
suited to quickly analyzing historical data.

Data lakes hold high volumes of data in a raw format, supporting structured, semi-structured,
and unstructured data.

Example tools: Snowflake, Databricks, Delta Lake

MDS - TRANSFORMATIONS
Data transformation is the process of revising, computing, separating, and combining raw data
into analysis-ready data models.

Modern data transformation tools need to:

Ensure data integrity.

Support SQL, a common language for analysts and engineers.

Example tools: EasyMorph, Airflow, dbt
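
The two requirements above, data integrity and SQL support, often meet in the form of integrity checks written as SQL queries that return the offending rows. The sketch below illustrates that idea only; it is loosely similar in spirit to the not-null and uniqueness tests that tools such as dbt provide, but it does not use any tool's API. The customers table, its sample rows, and the failing_rows helper are assumptions made for the example, and sqlite3 stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, None),             # missing email: should fail the not-null check
        (2, "b@example.com"),  # repeated id: should fail the uniqueness check
    ],
)

# Each check is a query that selects the rows violating the rule.
NOT_NULL_EMAIL = "SELECT customer_id FROM customers WHERE email IS NULL"
UNIQUE_CUSTOMER_ID = """
    SELECT customer_id, COUNT(*) AS occurrences
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
"""


def failing_rows(connection, check_sql):
    """Run one check; an empty result means the rule holds."""
    return connection.execute(check_sql).fetchall()


null_failures = failing_rows(conn, NOT_NULL_EMAIL)
duplicate_failures = failing_rows(conn, UNIQUE_CUSTOMER_ID)
```

Because each check is ordinary SQL, analysts and engineers can read and extend the rules without learning a separate language.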

MDS - OUTPUT
Now the data is extracted, stored, cleaned, and ready to be put to use.

This is where data goes to be analyzed. It needs to be accessible and easy to understand.

Business intelligence
Embedded analytics
Ad hoc reporting

Newer to the modern data stack:

Reverse ETL: While ETL and ELT transfer data from third-party sources into a data warehouse,
reverse ETL does the opposite. It transfers data from the data warehouse to a third-party system
and makes sure that it meets the formatting requirements of that platform.

Data science & AI/ML: Enables machines to make decisions based on learned data models.

Example tools: Power BI

MDS - REVERSE ETL
Reverse ETL, a relatively new concept in data management, involves moving data from a
centralized repository, like a data warehouse or data lake, back to operational systems or
external applications.

Unlike traditional ETL processes that move data from source systems to a centralized repository
for analysis, reverse ETL flips this flow, so that insights gained from centralized analysis can be
put to work where day-to-day operations happen.

Advantages: Enables organizations to leverage insights from centralized data analysis to drive
actions or updates in operational systems in real time.

Use Cases: Personalizing marketing campaigns, updating customer records, and enhancing
product recommendations.

Example tools: Hightouch, Census, Grouparoo
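
A reverse ETL job can be sketched as three steps: query the warehouse, reshape each row into the format the destination expects, and send it. The example below is a minimal sketch under stated assumptions and does not reflect the API of any tool listed above. The customer_scores table, the payload layout, the "high_value" segment name, and the send callback (standing in for a call to a CRM or marketing platform) are all hypothetical, and sqlite3 stands in for the warehouse.

```python
import json
import sqlite3

# Stand-in warehouse table holding the result of centralized analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_scores (email TEXT, lifetime_value REAL)")
conn.executemany(
    "INSERT INTO customer_scores VALUES (?, ?)",
    [("a@example.com", 1200.0), ("b@example.com", 80.0)],
)


def build_payloads(connection, min_value):
    """Read from the warehouse and reshape rows into the destination's (assumed) format."""
    rows = connection.execute(
        "SELECT email, lifetime_value FROM customer_scores "
        "WHERE lifetime_value >= ? ORDER BY email",
        (min_value,),
    )
    return [
        {"email": email, "properties": {"segment": "high_value", "ltv": ltv}}
        for email, ltv in rows
    ]


def sync(payloads, send):
    """Push each payload to the operational system through the supplied callback."""
    for payload in payloads:
        send(json.dumps(payload))
    return len(payloads)


# In a real job `send` would call the destination system; here it collects the messages.
sent_messages = []
synced_count = sync(build_payloads(conn, min_value=1000), sent_messages.append)
```

Keeping the reshaping step separate from the sending step mirrors the point made earlier in the deck: the data has to meet the formatting requirements of the destination platform before it is delivered.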

THE FUTURE OF MDS
Analytics Integrity or Holistic Data Analysis: Continued advancement in data analytics,
including holistic approaches and predictive modeling, refines data strategies for more
effective business intelligence.

Open Source Strategy: Many modern data companies start with open source and then move
to the cloud for a hybrid approach, which is crucial for user engagement. Venture capital
investors increasingly prefer startups with open-source strategies.

More SQL in Data Engineering: SQL is crucial in data management because it is simple, widely
understood, and based on common standards. It is the backbone of the data stack and
will likely support predictive analytics in the future.
CONCLUSION

The modern data stack is a powerful set of tools that can help companies make better
data-driven decisions. If you're not already using one, now is the time to start putting together
a modern data stack that works for you.

If you're still using a legacy data stack, consider adopting a modern data stack. It is not
merely a rising trend: there are multiple benefits to using it.

In the future, we can expect to see even more innovation in the modern data stack. This will
help companies to better scale, manage, and analyze their data.
SOME RESOURCES

https://www.moderndatastack.xyz

https://www.thoughtspot.com/data-trends/best-practices/modern-data-stack
THANK YOU
FOR WATCHING
