Advanced Database, Chapter One
http://www.ksi.mff.cuni.cz/~svoboda/courses/221-NIE-PDB/
Lecture 1
Introduction
Martin Svoboda
20. 9. 2022
Dan Ariely:
Big Data is like teenage sex: everyone talks about it, nobody
really knows how to do it, everyone thinks everyone else is
doing it, so everyone claims they are doing it.
Source: http://www.ibmbigdatahub.com/
• Volume (Scale)
Data volume is increasing exponentially, not linearly
Even large amounts of small data can result in Big Data
• Variety (Complexity)
Various formats, types, and structures
(from semi‐structured XML to unstructured multimedia)
• Velocity (Speed)
Data is being generated fast and needs to be processed fast
• Veracity (Uncertainty)
Uncertainty due to inconsistency, incompleteness, latency,
ambiguities, or approximations
• Value
Business value of the data (needs to be revealed)
• Validity
Data correctness and accuracy with respect to the intended use
• Volatility
Period of time the data is valid and should be maintained
• Cardinality
• Continuity
• Complexity
Source: https://www.xenonstack.com/blog/big-data-engineering/ingestion-processing-big-data-iot-stream/
Model
• Functional dependencies
• 1NF, 2NF, 3NF, BCNF (Boyce‐Codd normal form)
Objective
• Normalization of database schema to BCNF or 3NF
• Algorithms: decomposition or synthesis
Motivation
• Diminish data redundancy, prevent update anomalies
• However:
Data is scattered into small pieces (high granularity), and so
these pieces have to be joined back together when querying!
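The role of functional dependencies in normalization can be illustrated with a small sketch. It computes attribute closures, lists the given dependencies that violate BCNF, and performs a single decomposition step. The relation `ORDERS`, its attributes, and the dependencies in `FDS` are hypothetical examples, not taken from the lecture. The check is also simplified: it inspects only the listed dependencies rather than everything they imply.

```python
def closure(attrs, fds):
    """Return the closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already derivable, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation, fds):
    """List the non-trivial given FDs whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]


def decompose_once(relation, fds):
    """Split the relation on its first violating FD (one decomposition step)."""
    violations = bcnf_violations(relation, fds)
    if not violations:
        return [relation]
    lhs, _ = violations[0]
    lhs_closure = closure(lhs, fds) & relation
    # The two parts share lhs, so a join on lhs restores the original rows.
    return [lhs_closure, (relation - lhs_closure) | lhs]


# Hypothetical relation: each order has one customer, each customer one city.
ORDERS = frozenset({"order_id", "customer_id", "customer_city"})
FDS = [
    (frozenset({"order_id"}), frozenset({"customer_id"})),
    (frozenset({"customer_id"}), frozenset({"customer_city"})),
]
```

Calling `decompose_once(ORDERS, FDS)` splits the relation into a customer part and an order part. The city is then stored once per customer, but any query that needs the city of an order has to join the two parts on `customer_id`, which is the cost noted above.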
Model
• Transaction = flat sequence of database operations
(READ, WRITE, COMMIT, ABORT)
Objectives
• Enforcement of ACID properties
• Efficient parallel / concurrent execution (slow hard drives, …)
ACID properties
• Atomicity – partial execution is not allowed (all or nothing)
• Consistency – transactions turn one valid database state into another
• Isolation – uncommitted effects of one transaction are hidden from other transactions
• Durability – effects of committed transactions are permanent
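Atomicity and consistency can be observed with a minimal sketch using Python's built-in `sqlite3` module. The `account` table, the `transfer` and `balances` helpers, and the sample balances are assumptions made for illustration. The transfer deliberately credits the receiver first, so a failing debit leaves a partial write that the rollback has to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst in one transaction; return True on commit."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception leaves the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A call such as `transfer(conn, "alice", "bob", 500)` violates the `CHECK` constraint on the debit, so the whole transaction is aborted and `balances(conn)` shows both accounts unchanged: all or nothing.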
• Scaling
Horizontal distribution of data among hosts
• Volume
High volumes of data that cannot be handled by RDBMS
• Administrators
No longer needed because of the automated maintenance
• Economics
Usage of cheap commodity servers, lower overall costs
• Flexibility
Relaxed or missing data schema, easier design changes
• Maturity
Often still in pre‐production phase with key features missing
• Support
Mostly open source, limited sources of credibility
• Administration
Sometimes relatively difficult to install and maintain
• Analytics
Missing support for business intelligence and ad‐hoc querying
• Expertise
Still a low number of NoSQL experts available on the market
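The horizontal distribution of data among hosts listed under Scaling can be sketched as simple hash-based sharding. This is only one possible placement scheme, chosen here for brevity. The host names, the record keys, and the functions `shard_for` and `distribute` are hypothetical.

```python
import hashlib


def shard_for(key, hosts):
    """Pick the host responsible for key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and map it onto the host list.
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]


def distribute(records, hosts):
    """Group key/value records by the host each key is assigned to."""
    placement = {host: {} for host in hosts}
    for key, value in records.items():
        placement[shard_for(key, hosts)][key] = value
    return placement


# Hypothetical cluster and data set.
HOSTS = ["node-a", "node-b", "node-c"]
RECORDS = {f"user:{i}": {"visits": i} for i in range(10)}
PLACEMENT = distribute(RECORDS, HOSTS)
```

Each record lands on exactly one host, and any client can locate it by recomputing the hash. A drawback of plain modulo placement is that changing the number of hosts reassigns most keys, which is why distributed stores often rely on consistent hashing or range partitioning instead.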