Modern Data Stack

The modern data stack (MDS) is a suite of cloud-based tools designed for efficient data collection, processing, and storage, transitioning from traditional ETL to ELT workflows for improved flexibility and scalability. Key components include data storage solutions like Snowflake and BigQuery, transformation tools such as dbt, and orchestration platforms like Airflow. The MDS offers benefits like higher data volumes at lower costs, increased accessibility, and the ability to leverage advanced analytics and machine learning.

Uploaded by

ajiitfhem

Modern Data Stack

Presented by: Zine El Abidine ACHAGHOUR, Hamid Elaaly
Supervised by: Pr. Lotfi Najdi
TABLE OF CONTENTS

What is The Modern Data Stack
The History of Modern Data Stack
The Shift From ETL to ELT
Modern vs Legacy Data Stack
Benefits of Modern Data Stack
The MDS Visualized
The Future of MDS
Conclusion


INTRODUCTION
The modern data stack is a suite of cloud-based software tools designed for efficient data collection,
processing, and storage. Known for their robustness and speed, these tools are simple to deploy and
highly scalable.

Each component addresses a specific data challenge: storage (Snowflake, BigQuery),
transformation (dbt, Dataform), job orchestration (Airflow, Prefect), streaming (Kafka), and
monitoring (Monte Carlo, Bigeye), among others.
HISTORY OF MDS
Cloud computing and data warehousing have driven the emergence of the modern data stack,
shifting from ETL to ELT workflows for greater connectivity and flexibility. This transition, rooted in
the early 2010s, addresses the need for agile analytics and vendor-agnostic solutions.

Powered by cloud technology advancements, key players include BigQuery, Redshift, and
Snowflake, alongside BI tools such as Looker and Tableau. Data ingestion tools like Stitch and
Fivetran ensure seamless integration, while MongoDB, Cassandra, and Elasticsearch manage big
data effectively.
THE SHIFT FROM ETL TO ELT
The shift from ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform) represents a
fundamental change in the workflow of data processing.

ETL (Extract, Transform, Load): In the traditional ETL approach, data is first extracted from various
sources, then transformed to fit the target data model or schema, and finally loaded into the data
warehouse or destination system. Transformation typically involves cleansing, aggregating, and
formatting the data to make it suitable for analysis.

ELT (Extract, Load, Transform): The ELT approach has gained prominence with the rise of cloud
computing. In ELT, data is extracted and loaded into the target system without significant
transformation upfront. Transformation occurs within the data warehouse or data lake, leveraging the
scalability and processing power of modern cloud platforms for faster loading times and efficient
resource use.

The shift to ELT offers several advantages:

Scalability: Cloud-based data warehouses can handle large volumes of data and scale resources
dynamically to accommodate varying workloads.

Flexibility: By deferring transformation until after loading, organizations can perform ad-hoc analysis
on raw data and apply transformations as needed, without having to repeat the extraction process.

Cost-effectiveness: ELT can be more cost-effective, as it minimizes the need for extensive data
preprocessing before loading into the data warehouse.
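
The ELT flow described above can be sketched in Python, with the standard-library sqlite3 module standing in for a cloud data warehouse. The table names (raw_orders, orders_clean) and the sample rows are hypothetical, chosen only for illustration. This is a minimal sketch under those assumptions, not a description of how any particular warehouse behaves.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1", " alice ", "19.50"),
    ("2", "BOB", "5.00"),
    ("3", "alice", "not-a-number"),  # bad value, still kept in the raw layer
]


def run_elt(rows):
    """Load rows untransformed, then transform them inside the database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

    # Load: store everything as text, with no cleaning up front.
    conn.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: done in SQL, inside the warehouse, after loading.
    # The GLOB filter is a crude "starts with a digit" check for this sketch.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(id AS INTEGER)   AS id,
               LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*'
        """
    )
    return conn
```

Because the raw table is kept, the transformation can be changed and rerun later without extracting the data again, which is the flexibility advantage listed above.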
LEGACY VS MODERN DATA STACK
A modern data stack is typically more scalable, flexible, and efficient than a legacy data stack.

A modern data stack relies on cloud computing, whereas a legacy data stack stores data on
on-premises servers. Modern data stacks also give more data professionals access to data
than a legacy data stack does.

A legacy data stack usually refers to a traditional relational database management system
(RDBMS), which uses Structured Query Language (SQL) to store and process data.

While an RDBMS can still be used in a modern data stack, it is not as common because it is not as
well-suited for managing big data. SQL, however, remains a popular query language for both
legacy and modern data stacks.
THE BENEFITS OF MDS
Higher Volumes + Lower Costs
Compute & store higher volumes of data at a
significantly faster rate & reduced cost.

Data Accessibility
Self-service analytics programs increase your
organization's data literacy.

Built to Scale
Pricing & products support scale across data volumes, number
of users, and use cases.

Best-of-Breed Technologies
Specialization drives innovation and modularity gives
your team flexibility.
MDS VISUALIZED

[Figure: diagram of the modern data stack. Source: Fivetran]
MDS - DATA SOURCES

The first component of the MDS architecture is the place (or, in most cases, the places) where
your data originates. Data may come from hundreds (or sometimes thousands) of different
sources, including computers, smartphones, websites, social media networks, eCommerce
platforms, and IoT devices.

Common classifications include:

Databases
Files
Applications (categorized by use case)
Event collectors
MDS - INGESTION

A data pipeline is a method of ingesting data from a variety of sources and moving it into data
lakes or data warehouses. In other words, it is a sequence of steps that moves data from
one system to another.

A modern data pipeline needs to be:

Low code / no code.
Maintained for data integrity.

Data pipelines are categorized by how they are used. Batch processing and real-time
processing are the two most common types.

Batch processing:

This type of pipeline processes large volumes of data in batches at scheduled intervals. It
excels at handling large datasets that do not require real-time analysis. Moving data in
batches makes efficient use of resources.
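
As a rough illustration of batch processing, the Python sketch below groups records into fixed-size batches and hands each batch to a destination in one operation. The function names and the list used as a sink are hypothetical. Scheduling (for example, a nightly run) is assumed to be handled by an orchestrator and is not shown.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def load_in_batches(records, batch_size, sink):
    """Write each batch to sink in one operation; return the number of batches."""
    count = 0
    for chunk in batches(records, batch_size):
        sink.append(chunk)  # hypothetical stand-in for one bulk write
        count += 1
    return count
```

For example, loading seven records with a batch size of three results in three writes instead of seven.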

Streaming data:

As the name suggests, this type of pipeline handles streaming data in real time. It is
particularly useful for applications that require immediate analysis and response, such as
fraud detection and monitoring system performance. Processing data on arrival enables fast
decision-making and proactive actions.

Example tools: Event Hub, Kafka, Fivetran, Airbyte
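
To make the contrast with batch processing concrete, here is a small Python sketch that handles events one at a time as they arrive, using the fraud-detection use case mentioned above. The rule (flag an amount above three times the account's running average) and all names are assumptions made for illustration, not a real fraud model or the API of any streaming tool.

```python
from collections import defaultdict


def detect_suspicious(events, threshold=3.0):
    """Yield (account, amount) events that exceed threshold times the running mean.

    events can be any iterable, including an endless stream; each event is
    checked as soon as it arrives, using only what has been seen so far.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for account, amount in events:
        seen = counts[account]
        if seen and amount > threshold * totals[account] / seen:
            yield (account, amount)
        totals[account] += amount
        counts[account] += 1
```

Because the function is a generator, a flagged event is available to the caller immediately, without waiting for the rest of the stream.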


MDS - DESTINATIONS
The tools mentioned in the previous section move data to a centralized location for storage,
usually a cloud data warehouse, although data lakes are also a popular option.

Data warehouses are relational databases designed to store and transform data. They are best
suited to quickly analyzing historical data.

Data lakes hold high volumes of data in a raw format, supporting structured, semi-structured,
and unstructured data.

Example tools: Snowflake, Databricks, Delta Lake


MDS - TRANSFORMATIONS
Data transformation is the process of revising, computing, separating, and combining raw data
into analysis-ready data models.

Modern data transformation tools need to:

Ensure data integrity.
Support SQL, a common language for analysts and engineers.

Example tools: EasyMorph, Airflow, dbt
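
The two requirements above (SQL support and data integrity) can be sketched together in Python. The example builds a small analysis-ready table with SQL and then runs two integrity checks on it. It uses the standard-library sqlite3 module as a stand-in warehouse, and every table, column, and check name is hypothetical. It does not reproduce the interface of any of the tools listed.

```python
import sqlite3


def sample_warehouse():
    """Create an in-memory database holding two hypothetical raw tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
        """
    )
    return conn


def build_model(conn):
    """Combine the raw tables into one analysis-ready table using SQL."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT c.id AS customer_id, c.name, SUM(o.amount) AS revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )


def integrity_failures(conn):
    """Return the names of failed checks; an empty list means all passed."""
    checks = {
        "customer_id_not_null":
            "SELECT COUNT(*) FROM customer_revenue WHERE customer_id IS NULL",
        "customer_id_unique":
            "SELECT COUNT(*) FROM (SELECT customer_id FROM customer_revenue "
            "GROUP BY customer_id HAVING COUNT(*) > 1)",
    }
    # Each query counts offending rows, so any count above zero is a failure.
    return [name for name, sql in checks.items()
            if conn.execute(sql).fetchone()[0] > 0]
```

Running the checks right after the model is built means a bad transformation is caught before the table reaches reports or downstream tools.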


MDS - OUTPUT
Now the data is extracted, stored, cleaned, and ready to be put to use.

The output layer is where data goes to be analyzed. It needs to be accessible and easy to
understand. Common uses include:

Business intelligence
Embedded analytics
Ad hoc reporting

Newer to the modern data stack:

Reverse ETL: While ETL and ELT transfer data from third-party sources into a data warehouse,
reverse ETL does the opposite. It transfers data from the data warehouse to a third-party
system and makes sure that it meets the formatting requirements of that platform.

Data science & AI/ML: Enables machines to make decisions based on learned data models.

Example tools: Power BI


MDS - REVERSE ETL
Reverse ETL, a relatively new concept in data management, involves
moving data from a centralized repository, like a data warehouse or
data lake, back to operational systems or external applications for
various purposes.

Unlike traditional ETL processes that move data from source systems
to a centralized repository for analysis, reverse ETL flips this flow,
enabling organizations to leverage insights gained from centralized
data analysis to drive actions or updates in operational systems.

Advantages: Enables organizations to leverage insights from centralized data analysis to drive
actions or updates in operational systems in real time.

Use cases: Personalizing marketing campaigns, updating customer records, and enhancing product
recommendations.

Example tools: Hightouch, Census, Grouparoo
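
The reverse ETL flow can be sketched in a few lines of Python: read modelled rows from the warehouse, reshape each row to the format the destination expects, and push it out. The customer_metrics table, the payload layout, and the send callable are all hypothetical. A real destination defines its own format and API, and the tools listed above are not modelled here.

```python
import sqlite3


def to_crm_payload(row):
    """Reshape one warehouse row into the (hypothetical) destination format."""
    customer_id, email, lifetime_value = row
    return {
        "external_id": str(customer_id),
        "email": email.strip().lower(),
        "properties": {"lifetime_value": round(lifetime_value, 2)},
    }


def reverse_etl(conn, send):
    """Push every row of customer_metrics to a destination; return the count.

    send is a caller-supplied callable that delivers one payload, for
    example a wrapper around an HTTP client.
    """
    rows = conn.execute(
        "SELECT customer_id, email, lifetime_value "
        "FROM customer_metrics ORDER BY customer_id"
    )
    sent = 0
    for row in rows:
        send(to_crm_payload(row))
        sent += 1
    return sent
```

Passing the delivery step in as a function keeps the sketch independent of any particular operational system, and makes it easy to try out with a plain list's append method.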


THE FUTURE OF MDS
Analytics Integrity or Holistic Data Analysis: Continued advancement in data analytics,
including holistic approaches and predictive modeling, refines data strategies for more
effective business intelligence.

Open Source Strategy: Modern data companies start with open source, and then move
to the cloud for a hybrid approach, crucial for user engagement. Venture capital
investors increasingly prefer startups with open-source strategies.

More SQL in Data Engineering: SQL is crucial in data management—it's simple, widely
understood, and based on common standards. It's the backbone of the data stack and
will likely support predictive analytics in the future.
CONCLUSION

The modern data stack is a powerful set of tools that can help companies make better data-driven
decisions. If you’re not already using one, now is the time to start putting together a modern
data stack that works for you.

If you’re still using a legacy data stack, consider adopting a modern data stack. It is not
merely a rising trend: there are multiple benefits to using it.

In the future, we can expect to see even more innovation in the modern data stack. This will
help companies to better scale, manage, and analyze their data.
SOME RESOURCES

https://www.moderndatastack.xyz

https://www.thoughtspot.com/data-trends/best-practices/modern-
data-stack
THANK YOU
FOR WATCHING
