
Big Data and Data Warehousing 1


Business Intelligence

Big Data
and
Data Warehousing
Database vs Data Warehousing
[Comparison table with two columns: Database | Data Warehouse]

12/3/2023 Annual Review 2


What is Data Warehousing?
Data warehousing is a process of collecting, storing,
and managing large volumes of data from various
sources in a central repository, typically a
specialized database system. The primary goal of
data warehousing is to provide a unified and
structured view of data that can be easily accessed
and analyzed by business analysts, data scientists,
and decision-makers to support business
intelligence and reporting activities.



Big Data
Big data primarily refers to data sets that are too large or complex to be handled by traditional
data-processing application software. Data with many entries (rows) offer greater statistical power,
while data with higher complexity (more attributes or columns) may lead to a higher false discovery
rate. Because the term has no formal definition, a workable interpretation is that big data is a body
of information so large that it cannot be comprehended when examined only in small amounts.



Six V’s of Big Data
Variety
The types of data: structured, semi-structured and unstructured.

Volume
The amount of data from myriad sources.

Value
The business value of the data collected.

Veracity
The degree to which big data can be trusted.

Variability
The ways in which the big data can be used and formatted.

Velocity
The speed at which big data is generated.



How Big is Big Data?
97 ZB
or
97,000,000,000,000 GB
The estimated volume of data created
worldwide in 2022
Source : Statista
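
The conversion between the two figures above can be checked with a few lines of Python. This is a minimal sketch that assumes decimal (SI) prefixes, where 1 ZB is 10^21 bytes and 1 GB is 10^9 bytes; the function name is made up for illustration.

```python
# Assumption: decimal (SI) units, 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
BYTES_PER_ZB = 10**21
BYTES_PER_GB = 10**9


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes (hypothetical helper)."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


volume_gb = zettabytes_to_gigabytes(97)
print(f"{volume_gb:,} GB")  # prints "97,000,000,000,000 GB"
```

With binary prefixes (ZiB and GiB) the result would differ, so the slide's figure is consistent only with the decimal convention.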



Innovation of Data Warehouse



Databricks Summit
• Data Warehouse is a good choice for
companies seeking a mature, structured data
solution that focuses on business intelligence
and data analytics use cases.

• Data Lakes are suitable for organizations
seeking a flexible, low-cost, big-data solution
to drive machine learning and data science
workloads on unstructured data.

• Data Lakehouse is a reasonable choice if
you’re looking for ways to implement both
advanced analytics and machine learning
workloads on your data.
Architecture in Action



Data Integration
Data integration is the process of consolidating data from
multiple applications and creating a unified view of it.
Businesses use different data integration tools with a variety
of applications, technologies, and techniques to integrate
data from disparate sources and create a single source of
truth (SSOT).
Data integration techniques, also called data integration
technologies, are simply the different strategies, approaches,
and tools used for combining data from multiple sources in a
single destination. Data integration technologies have evolved
at a rapid pace over the last decade. Initially, Extract,
Transform, Load (ETL) was the only available data integration
technique, and it was used for batch processing. However, as
businesses continued to add more sources to their data
ecosystems, the need for real-time data integration techniques
arose, and new technologies were introduced to meet it.
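
The batch ETL flow described above can be sketched in Python. This is an illustrative sketch only: the two source systems, their field names, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source systems with inconsistent field names and units.
crm_rows = [{"customer": "Ada", "amount_usd": 120.0}]
shop_rows = [{"cust_name": "Grace", "amount_cents": 4550}]


def extract():
    """Extract: pull raw records from each (hypothetical) source."""
    return crm_rows, shop_rows


def transform(crm, shop):
    """Transform: harmonize both sources into (customer, amount_usd) tuples."""
    unified = [(r["customer"], r["amount_usd"]) for r in crm]
    unified += [(r["cust_name"], r["amount_cents"] / 100) for r in shop]
    return unified


def load(conn, rows):
    """Load: write the harmonized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run one batch: extract, then transform, then load."""
    load(conn, transform(*extract()))


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_etl(warehouse)
row_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = warehouse.execute("SELECT SUM(amount_usd) FROM sales").fetchone()[0]
print(row_count, total)
```

The key property of ETL shown here is ordering: records are harmonized before they reach the destination, so the warehouse only ever holds data in the unified schema.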



Data Integration Methods
Extract, Transform and Load
Copies of data sets from different source systems are pulled together, harmonized
and loaded into a database or data warehouse.

Extract, Load and Transform
A variation of ETL: data is loaded as is into a data lake or other big data system
and transformed later for specific analytics uses.

Change Data Capture
A form of real-time integration. CDC identifies data changes in databases and
applies them to a data warehouse or other repository.

Data Replication
The data in one database is replicated to others to keep the information they
contain synchronized for backup and operational uses.

Data Virtualization
Instead of being loaded into a new repository, data from different systems is
combined virtually to create a unified view for end users.

Streaming Data Integration
Different streams of real-time data are integrated on the fly and fed into data
stores and analytics systems on a continuous basis.
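
As an illustration of the change data capture idea above, the following Python sketch detects changes by comparing two snapshots of a source table and applies them to a target. This snapshot-diff approach is a simplified stand-in chosen for brevity; production CDC tools typically read the database's transaction log instead. All names and sample rows are hypothetical.

```python
def capture_changes(previous, current):
    """Return (op, key, row) events describing how `current` differs from `previous`.

    Both arguments are dicts keyed by primary key (snapshot-diff CDC).
    """
    events = []
    for key, row in current.items():
        if key not in previous:
            events.append(("insert", key, row))
        elif previous[key] != row:
            events.append(("update", key, row))
    for key in previous:
        if key not in current:
            events.append(("delete", key, None))
    return events


def apply_changes(target, events):
    """Apply captured events to the target store (a dict standing in for a warehouse)."""
    for op, key, row in events:
        if op == "delete":
            target.pop(key, None)
        else:  # insert and update are both upserts on the target
            target[key] = row


# Hypothetical source table at two points in time.
before = {1: {"name": "Ada", "city": "London"}, 2: {"name": "Grace", "city": "NYC"}}
after = {1: {"name": "Ada", "city": "Paris"}, 3: {"name": "Linus", "city": "Helsinki"}}

warehouse = dict(before)  # target starts in sync with the old snapshot
events = capture_changes(before, after)
apply_changes(warehouse, events)
print([op for op, _, _ in events])  # prints ['update', 'insert', 'delete']
```

Only the three changed rows are shipped to the target, which is the main advantage of CDC over reloading the full table on every batch.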
