
Unit 5

The document discusses concepts of big data and data lakes. It defines big data, data sources, and benefits over traditional databases. It also defines data warehouses, OLTP, and OLAP. Additionally, it defines data lakes, their architecture and significance, and compares them to data warehouses.

Uploaded by

userdemo12334


Unit-5: Concepts of Big Data and Data Lake

5.1 Concepts of Big Data

5.1.1 Sources of Big Data
5.1.2 Big Data Benefits over Traditional Database
5.1.3 Concepts of Data Warehouse
5.1.3.1 Concepts of Data Processing Techniques
5.1.3.1.1 OLTP (Online Transaction Processing)
5.1.3.1.2 OLAP (Online Analytical Processing)
5.2 Concepts of Data Lake
5.2.1 Data Lake Concepts and Its Architecture
5.2.2 Significance of Data Lake
5.2.3 Comparison of Data Lake and Data Warehousing

5.1 Concepts of Big Data

- Big Data refers to massive volumes of structured, semi-structured, and unstructured data generated by many sources. It is characterized by its size, its complexity, and the speed at which it is generated and must be processed.
- The concept revolves around managing these vast datasets, which traditional data processing tools struggle to handle, and deriving valuable insights from them. The key characteristics of Big Data are often described using the 3Vs: Volume, Velocity, and Variety.

5.1.1 Sources of Big Data

Big Data originates from many sources, including social media platforms, sensors, application and server logs, and real-time streaming applications. The data these sources produce is characterized by the 3Vs: Volume, Velocity, and Variety.

Volume:
- Refers to the massive amounts of data generated daily. Big Data involves datasets that are too large to be stored, processed, and analyzed using traditional databases and tools.
- Examples include social media posts, sensor data, and log files.

Velocity:
- Describes the speed at which data is generated, processed, and analyzed. Real-time applications and streaming data contribute to high data velocity.
- Big Data often requires real-time or near-real-time processing to keep up with the constant influx of data from various sources.

Variety:
- Encompasses the diverse forms of data: structured (for example, relational tables), semi-structured (for example, JSON or XML), and unstructured (for example, free text, images, and videos), coming from sources such as social media posts and sensors.
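To make velocity more concrete, here is one common near-real-time technique: counting events in a sliding time window as they stream in. This is a minimal, single-process Python sketch under assumed inputs (timestamps in seconds, arriving in non-decreasing order); the class name and window length are hypothetical, and real streaming platforms spread this work across many machines.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamps fall within the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of events still inside the window

    def add(self, timestamp: float) -> int:
        """Record one event (timestamps assumed non-decreasing) and return the window count."""
        self._events.append(timestamp)
        # Evict events at or before the lower edge of the window
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return len(self._events)
```

Each timestamp is appended once and evicted at most once, so the cost per event stays constant on average no matter how fast data arrives. A monitoring job could, for example, keep one counter per sensor or card and raise an alert when `add` returns more than some threshold; the threshold and keying scheme are assumptions left to the application.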

5.1.2 Big Data Benefits over Traditional Database

Big Data technologies offer advantages over traditional databases because they can handle large volumes, diverse data types, and high velocities. Benefits include:

Scalability: Big Data technologies scale horizontally, adding more machines rather than upgrading a single server, so massive amounts of data can be spread across distributed systems.

Flexibility: Big Data systems can accommodate various data types and formats, allowing for flexible data storage and processing.

Real-time Processing: Big Data platforms enable real-time data processing, which is critical for applications such as fraud detection and monitoring.

Cost-Effectiveness: Distributed computing on inexpensive commodity hardware, combined with open-source software, makes Big Data processing cost-effective compared to scaling up a traditional database.
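Horizontal scaling depends on deciding which machine stores each record. The Python sketch below shows one simple approach, hash partitioning: each record key is hashed and the result picks a node. The function names, the 4-node cluster, and the sample keys are hypothetical, and this is a sketch of the idea rather than how any particular Big Data system assigns data.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index with a hash that is stable across processes."""
    # Python's built-in hash() of str is randomized per process, so use MD5 instead
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(records, num_nodes: int) -> dict:
    """Group (key, value) records by the node index that would store them."""
    nodes = defaultdict(list)
    for key, value in records:
        nodes[partition_for(key, num_nodes)].append((key, value))
    return dict(nodes)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 user records spread over a 4-node cluster
    records = [(f"user-{i}", i) for i in range(1000)]
    placement = distribute(records, num_nodes=4)
    for node, items in sorted(placement.items()):
        print(f"node {node}: {len(items)} records")
```

Adding nodes lets the cluster hold more data without any single machine growing. One caveat of plain modulo placement is that changing `num_nodes` moves most keys to new nodes; production systems typically use consistent hashing or range partitioning to limit that reshuffling.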

5.1.3 Concepts of Data Warehouse

- A Data Warehouse is a central repository that stores and manages large volumes of structured data from various sources, making it available for complex analysis and reporting.
- It is designed to support decision-making by providing a consolidated, organized view of an organization's historical and current data. Understanding Data Warehouses also requires distinguishing between two data processing techniques, OLTP and OLAP, described in the following subsections.
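As a rough illustration of the "consolidated and organized view", the Python sketch below loads records from two hypothetical source systems, which name fields differently and use different units and date formats, into one SQLite table with a common schema, then runs a simple monthly report. SQLite, the table layout, and the sample records are assumptions chosen to keep the example self-contained; a real warehouse would use a dedicated analytical database and a proper loading pipeline.

```python
import sqlite3

# Hypothetical source 1: an online store exporting dicts with amounts in dollars
web_orders = [
    {"order_id": "W1", "amount_usd": 25.0, "date": "2024-01-05"},
    {"order_id": "W2", "amount_usd": 40.0, "date": "2024-02-10"},
]
# Hypothetical source 2: a point-of-sale system exporting tuples (id, cents, DD/MM/YYYY)
store_sales = [("S1", 1500, "05/01/2024"), ("S2", 2000, "10/02/2024")]


def to_iso(ddmmyyyy: str) -> str:
    """Convert DD/MM/YYYY to ISO YYYY-MM-DD so both sources share one date format."""
    day, month, year = ddmmyyyy.split("/")
    return f"{year}-{month}-{day}"


def build_warehouse() -> sqlite3.Connection:
    """Load both sources into one table with a single, consistent schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (source TEXT, sale_id TEXT, amount_usd REAL, sale_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES ('web', ?, ?, ?)",
        [(o["order_id"], o["amount_usd"], o["date"]) for o in web_orders],
    )
    conn.executemany(
        "INSERT INTO sales VALUES ('store', ?, ?, ?)",
        [(sid, cents / 100, to_iso(d)) for sid, cents, d in store_sales],
    )
    conn.commit()
    return conn


def monthly_totals(conn: sqlite3.Connection) -> list:
    """A simple report over the consolidated data: total sales per month."""
    return conn.execute(
        "SELECT substr(sale_date, 1, 7) AS month, SUM(amount_usd) "
        "FROM sales GROUP BY month ORDER BY month"
    ).fetchall()
```

Because both sources are mapped onto the same columns, units, and date format when they are loaded, the report query does not need to know where each sale came from.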

5.1.3.1 Concepts of Data Processing Techniques

5.1.3.1.1 OLTP (Online Transaction Processing)

- OLTP is a type of data processing that supports transaction-oriented applications, such as order entry or banking. It involves short, simple queries, typically inserting, updating, and deleting individual records.
- OLTP systems are designed to keep data consistent, typically by executing each transaction atomically, while handling a large number of concurrent transactions.
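The consistency guarantee can be illustrated with Python's built-in `sqlite3` module, used here only as a stand-in for an OLTP database. The sketch assumes a hypothetical `accounts` table: a transfer is two short `UPDATE` statements executed as one transaction, so either both take effect or neither does.

```python
import sqlite3


def create_bank() -> sqlite3.Connection:
    """Create a tiny in-memory accounts table; the CHECK constraint forbids overdrafts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failed debit shows that the credit is undone as well
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; neither update is kept
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

For example, transferring 30 from account 1 to account 2 succeeds, while a later transfer of 500 out of account 2 fails and leaves both balances exactly as they were.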

5.1.3.1.2 OLAP (Online Analytical Processing)

- OLAP is geared towards complex queries and analytical processing.
- It involves aggregations and calculations over large datasets.
- OLAP systems are optimized for read-heavy operations and are crucial for business intelligence and decision support systems.
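A typical OLAP-style question aggregates many rows rather than touching a single record. The sketch below runs two such read-only aggregations over a small hypothetical `sales` fact table using SQLite. SQLite is not an OLAP engine; the example only shows the shape of the queries that OLAP systems are optimized to answer quickly over much larger datasets.

```python
import sqlite3

# Hypothetical fact table: one row per sale, with region, product, and year dimensions
SALES = [
    ("North", "Laptop", 2023, 1200.0),
    ("North", "Phone", 2023, 800.0),
    ("North", "Phone", 2024, 700.0),
    ("South", "Laptop", 2023, 900.0),
    ("South", "Laptop", 2024, 1100.0),
]


def load_sales() -> sqlite3.Connection:
    """Create an in-memory sales table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    return conn


def totals_by_region(conn: sqlite3.Connection) -> list:
    """Aggregate every row up to the region level (a read-only scan)."""
    return conn.execute(
        "SELECT region, SUM(amount), COUNT(*) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


def totals_by_product_year(conn: sqlite3.Connection, region: str) -> list:
    """Restrict to one region, then aggregate by two dimensions at once."""
    return conn.execute(
        "SELECT product, year, SUM(amount) FROM sales WHERE region = ? "
        "GROUP BY product, year ORDER BY product, year",
        (region,),
    ).fetchall()
```

Neither function modifies data; both read and summarize many rows, which is the read-heavy workload described above.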

5.2 Concepts of Data Lake

- A Data Lake is a centralized repository that allows organizations to store vast amounts of structured, semi-structured, and unstructured data at any scale.
- Unlike traditional databases or data warehouses, a Data Lake does not require predefined schemas before storing the data, making it a highly flexible and scalable solution.

5.2.1 Data Lake Concepts and Its Architecture

Key concepts of a Data Lake include:

Storage: Data Lakes store data in its raw, unprocessed form, without the need for extensive upfront structuring. This allows diverse data types to be stored side by side.

Scalability: Data Lakes can scale horizontally, handling vast amounts of data by distributing it across
clusters of inexpensive hardware.

Schema-on-Read: Unlike traditional databases, Data Lakes follow a schema-on-read approach. The
structure is imposed on the data only when it's read, enabling flexibility.
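Schema-on-read can be sketched in Python as follows. The "raw zone" here is a hypothetical list of JSON strings standing in for files in lake storage; nothing is validated when data is written, and each reader applies its own schema when it reads. The record fields and function names are assumptions for illustration.

```python
import json

# Hypothetical raw zone: records kept exactly as they arrived, including a malformed one
RAW_ZONE = [
    '{"user": "alice", "event": "click", "ts": 1700000000}',
    '{"user": "bob", "event": "purchase", "amount": "19.99", "ts": 1700000050}',
    '{"user": "carol", "ts": 1700000100, "device": {"os": "android"}}',
    "not valid json at all",
]


def parse_raw(lines):
    """Yield the lines that parse as JSON objects; others are skipped, not deleted."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def read_purchases(lines):
    """Reader 1 imposes a purchase schema: user name plus a numeric amount."""
    return [
        {"user": r["user"], "amount": float(r["amount"])}
        for r in parse_raw(lines)
        if r.get("event") == "purchase"
    ]


def read_device_os(lines):
    """Reader 2 imposes a different schema on the same raw data: (user, device OS)."""
    return [
        (r["user"], r["device"]["os"])
        for r in parse_raw(lines)
        if isinstance(r.get("device"), dict) and "os" in r["device"]
    ]
```

The malformed line and the records that are not purchases are not rejected when written: they remain in the lake, and a future reader with a different schema could still make use of them.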
[Figure: Data Lake Architecture]

5.2.2 Significance of Data Lake

The significance of Data Lakes lies in their ability to store and process large volumes of raw
data efficiently. Key points include:

Advanced Analytics: Data Lakes support advanced analytics, machine learning, and other data-
intensive applications by providing a flexible and scalable storage solution.

Cost-Efficiency: Because they typically run on clusters of inexpensive hardware, Data Lakes offer a cost-effective way to store large volumes of data compared to traditional storage solutions.

Flexibility: Data Lakes allow organizations to store and analyze diverse data types without the need
for extensive upfront structuring.

5.2.3 Comparison of Data Lake and Data Warehousing

Data Lakes and Data Warehouses serve different purposes. The main differences are:

Data Types: Data Lakes store raw data in any format (structured, semi-structured, or unstructured), while Data Warehouses store structured, processed data.

Schema: Data Lakes use a schema-on-read approach, providing flexibility, whereas Data Warehouses use a schema-on-write approach, in which data must conform to a predefined schema before it is loaded.

Processing Time: Data Lakes are suitable for real-time and batch processing, while Data Warehouses
are optimized for batch processing and complex queries.
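The schema difference can be sketched in a few lines of Python. This is a hypothetical illustration, not the behavior of any specific product: `SALES_SCHEMA`, `SchemaError`, and both loader functions are invented for the example. The warehouse loader validates and converts each record before storing it (schema-on-write), while the lake loader stores the raw payload untouched and leaves interpretation to whoever reads it (schema-on-read).

```python
# Hypothetical warehouse schema: column name -> type each value must convert to
SALES_SCHEMA = {"sale_id": str, "amount": float, "region": str}


class SchemaError(ValueError):
    """Raised when a record does not fit the warehouse schema."""


def load_into_warehouse(table: list, record: dict) -> None:
    """Schema-on-write: validate and convert before storing, or reject the record."""
    if set(record) != set(SALES_SCHEMA):
        mismatch = sorted(set(record) ^ set(SALES_SCHEMA))
        raise SchemaError(f"column mismatch: {mismatch}")
    row = {}
    for column, expected_type in SALES_SCHEMA.items():
        try:
            row[column] = expected_type(record[column])
        except (TypeError, ValueError) as exc:
            raise SchemaError(f"bad value for {column!r}: {record[column]!r}") from exc
    table.append(row)


def load_into_lake(zone: list, raw_payload: str) -> None:
    """Schema-on-read: store the raw payload as-is; interpretation happens later."""
    zone.append(raw_payload)
```

A record whose `amount` is `"n/a"` is rejected by `load_into_warehouse` with a `SchemaError`, while `load_into_lake` accepts the same payload; whether it is usable is decided later, by the query that reads it.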
