Lecture04 Data Models
Management Systems
Lecture 4: Data Models
• Conceptual design
Early Proposal 1: IMS*
* IBM Information Management System
• Hierarchical data model
• Record
– Type: collection of named fields with data types
– Instance: must match type definition
– Each instance has a key
– Record types arranged in a tree
What does this mean?
File on disk:
Supp Part Part … Supp Part Part … …
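A minimal sketch of what the file layout means, assuming a two-level tree with Supp as the root record type and Part as its child type. The field names (sno, sname, pno, pname) and all data are hypothetical. Each instance is checked against its type, and the forest is flattened in preorder (parent first, then its children), which reproduces the "Supp Part Part … Supp Part Part …" sequence.

```python
from dataclasses import dataclass, field


@dataclass
class RecordType:
    """A record type: named, typed fields, a key field, and child types (the tree)."""
    name: str
    fields: dict                      # field name -> Python type
    key: str                          # field whose value identifies an instance
    children: list = field(default_factory=list)


@dataclass
class Record:
    """A record instance; it must match its type definition."""
    rtype: RecordType
    values: dict
    children: list = field(default_factory=list)

    def __post_init__(self):
        if set(self.values) != set(self.rtype.fields):
            raise ValueError(f"{self.rtype.name}: fields do not match the type")
        for fname, ftype in self.rtype.fields.items():
            if not isinstance(self.values[fname], ftype):
                raise TypeError(f"{self.rtype.name}.{fname} must be {ftype.__name__}")

    def add_child(self, child):
        # Only record types declared as children in the type tree may appear here.
        if not any(child.rtype is t for t in self.rtype.children):
            raise ValueError(f"{child.rtype.name} is not a child of {self.rtype.name}")
        self.children.append(child)


def hierarchical_sequence(roots):
    """Flatten the forest in preorder: each parent, then its children, like the file on disk."""
    out = []

    def visit(rec):
        out.append(f"{rec.rtype.name}:{rec.values[rec.rtype.key]}")
        for child in rec.children:
            visit(child)

    for root in roots:
        visit(root)
    return out


# Hypothetical schema and data: Supp is the root type, Part its child type.
part = RecordType("Part", {"pno": str, "pname": str}, key="pno")
supp = RecordType("Supp", {"sno": str, "sname": str}, key="sno", children=[part])

s1 = Record(supp, {"sno": "S1", "sname": "Acme"})
s1.add_child(Record(part, {"pno": "P1", "pname": "bolt"}))
s1.add_child(Record(part, {"pno": "P2", "pname": "nut"}))
s2 = Record(supp, {"sno": "S2", "sname": "Zeta"})
s2.add_child(Record(part, {"pno": "P1", "pname": "bolt"}))

print(hierarchical_sequence([s1, s2]))
# ['Supp:S1', 'Part:P1', 'Part:P2', 'Supp:S2', 'Part:P1']
```

Part P1 appears under both suppliers. A tree can only represent a part shared by two suppliers by storing that part twice.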
Data storage
How is data physically stored in IMS?
• Root records
– Stored sequentially (sorted on key)
– Indexed in a B-tree using the key of the record
– Hashed using the key of the record
• Dependent records
– Physically sequential
– Various forms of pointers
• Selected organizations restrict DL/1 commands
– No updates allowed due to sequential organization
– No “get-next” for hashed organization
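The sketch below contrasts two of the root-record organizations listed above and shows why each one restricts certain commands. The method names get_unique and get_next are Python stand-ins loosely modeled on DL/1's "get unique" and "get next". They are not the IMS API. The B-tree organization and dependent-record pointers are not modeled.

```python
import bisect


class SequentialRoots:
    """Root records stored sequentially, sorted on key (get-next works, updates do not)."""

    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r["key"])
        self.keys = [r["key"] for r in self.records]

    def get_unique(self, key):
        # Binary search stands in for searching the sorted file.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.records[i]
        return None

    def get_next(self, key):
        # The next root in key order is simply the next record in the file.
        i = bisect.bisect_right(self.keys, key)
        return self.records[i] if i < len(self.records) else None

    def insert(self, record):
        # Inserting into a sequential file would mean rewriting it.
        raise NotImplementedError("no updates on a sequential organization")


class HashedRoots:
    """Root records hashed on key (fast exact lookup, no key order)."""

    def __init__(self, records):
        self.table = {r["key"]: r for r in records}

    def get_unique(self, key):
        return self.table.get(key)

    def insert(self, record):
        self.table[record["key"]] = record

    def get_next(self, key):
        # Hashing scatters keys, so "the next record in key order" is undefined.
        raise NotImplementedError("no get-next on a hashed organization")


roots = [{"key": "S3"}, {"key": "S1"}, {"key": "S2"}]
seq, hashed = SequentialRoots(roots), HashedRoots(roots)
print(seq.get_next("S1"))        # {'key': 'S2'}
print(hashed.get_unique("S3"))   # {'key': 'S3'}
```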
Data Independence
What is it?
• Conceptual design
• Against relational
– COBOL programmers cannot understand relational languages
– Impossible to implement efficiently
We say what we want (SQL):

Product(pid, name, price)
Purchase(pid, cid, store)

SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = z.cid and
      x.price > 100 and z.city = 'Seattle'

The plan says how to get it (relational algebra tree, annotated with physical operators):

δ [hash-based]
  Π x.name, z.name [on-the-fly]
    σ price>100 and city='Seattle' [on-the-fly]
      ⋈ cid=cid [hash-join]
        ⋈ pid=pid [index-join]
          Product
          Purchase
        Customer
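A minimal sketch of executing the annotated plan by hand. The Customer(cid, name, city) schema is inferred from the query and is an assumption, and all data is made up. A dict stands in for an index on Product.pid. Generators give the "on-the-fly" σ and Π, and a hash set gives hash-based δ. For comparison, the same SQL is run through sqlite3.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data; Customer(cid, name, city) is an assumed schema.
PRODUCT = [(1, "Gizmo", 99), (2, "Camera", 149), (3, "Tripod", 120)]
PURCHASE = [(1, 10, "A"), (2, 10, "B"), (2, 10, "C"), (2, 11, "A"), (2, 12, "C"), (3, 12, "A")]
CUSTOMER = [(10, "Ann", "Seattle"), (11, "Bo", "Portland"), (12, "Cy", "Seattle")]


def index_join_pid(purchases, products):
    """Index-join on pid: scan Purchase, look each pid up in an index on Product."""
    index = {p[0]: p for p in products}   # dict stands in for an index on Product.pid
    for pu in purchases:
        p = index.get(pu[0])
        if p is not None:
            yield p, pu


def hash_join_cid(left, customers):
    """Hash-join on cid: build a hash table on Customer, probe it with the left input."""
    table = defaultdict(list)
    for c in customers:
        table[c[0]].append(c)
    for p, pu in left:
        for c in table.get(pu[1], []):
            yield p, pu, c


def run_plan():
    joined = hash_join_cid(index_join_pid(PURCHASE, PRODUCT), CUSTOMER)
    # σ and Π run on the fly: generators filter and project each tuple as it streams by.
    selected = ((p, c) for p, pu, c in joined if p[2] > 100 and c[2] == "Seattle")
    projected = ((p[1], c[1]) for p, c in selected)
    # δ, hash-based: remember what has already been emitted in a hash set.
    seen, result = set(), []
    for t in projected:
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


def run_sql():
    """Run the declarative query on the same data for comparison."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Product(pid, name, price)")
    db.execute("CREATE TABLE Purchase(pid, cid, store)")
    db.execute("CREATE TABLE Customer(cid, name, city)")
    db.executemany("INSERT INTO Product VALUES (?, ?, ?)", PRODUCT)
    db.executemany("INSERT INTO Purchase VALUES (?, ?, ?)", PURCHASE)
    db.executemany("INSERT INTO Customer VALUES (?, ?, ?)", CUSTOMER)
    return db.execute(
        """SELECT DISTINCT x.name, z.name
           FROM Product x, Purchase y, Customer z
           WHERE x.pid = y.pid and y.cid = z.cid and
                 x.price > 100 and z.city = 'Seattle'"""
    ).fetchall()


print(run_plan())  # [('Camera', 'Ann'), ('Camera', 'Cy'), ('Tripod', 'Cy')]
```

Ann bought the Camera at two stores, so the joins emit ('Camera', 'Ann') twice and δ removes the duplicate.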
Query Optimizer
• Rewrite one relational algebra
expression to a better one
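One common rewrite is selection pushdown: filter a relation before joining it instead of after. The slide does not say which rewrites the optimizer applies, so this is only an illustration. The sketch below uses hypothetical data and checks two things: both expressions return the same tuples, and the rewritten expression compares fewer tuple pairs in a naive nested-loop join.

```python
def nested_loop_join(r, s, pred):
    """Naive join; also reports how many tuple pairs it had to compare."""
    out = [(a, b) for a in r for b in s if pred(a, b)]
    return out, len(r) * len(s)


def same_pid(p, pu):
    return p[0] == pu[0]


# Hypothetical data: Product(pid, name, price), Purchase(pid, cid).
product = [(1, "Gizmo", 99), (2, "Camera", 149), (3, "Toy", 39)]
purchase = [(1, 10), (2, 10), (2, 11), (3, 12)]

# Original expression: σ_{price>100}(Product ⋈_{pid=pid} Purchase)
joined, cost_original = nested_loop_join(product, purchase, same_pid)
original = [(p, pu) for p, pu in joined if p[2] > 100]

# Rewritten expression: σ_{price>100}(Product) ⋈_{pid=pid} Purchase
cheap = [p for p in product if p[2] > 100]
rewritten, cost_rewritten = nested_loop_join(cheap, purchase, same_pid)

print(original == rewritten, cost_original, cost_rewritten)  # True 12 4
```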
Querying the view:
SELECT *
FROM Big_Parts
WHERE pcolor='blue';
• Materialized views:
– Some SQL engines support them
– CREATE MATERIALIZED VIEW xyz AS
– Computed at definition time
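A sketch of the difference between an ordinary view and a materialized one, using SQLite. SQLite has no CREATE MATERIALIZED VIEW, so a table built with CREATE TABLE ... AS SELECT stands in for one here. The Parts schema and the Big_Parts definition (psize > 10) are hypothetical, because the original view definition is not shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Parts(pno INTEGER PRIMARY KEY, pname TEXT, psize INTEGER, pcolor TEXT)")
db.executemany(
    "INSERT INTO Parts VALUES (?, ?, ?, ?)",
    [(1, "bolt", 20, "blue"), (2, "nut", 5, "blue"), (3, "gear", 30, "red")],
)

# Ordinary (virtual) view: its query is re-run each time the view is queried.
db.execute("CREATE VIEW Big_Parts AS SELECT * FROM Parts WHERE psize > 10")
# Stand-in for a materialized view: the result is computed once, at definition time.
db.execute("CREATE TABLE Big_Parts_mat AS SELECT * FROM Parts WHERE psize > 10")

# A later change to the base table...
db.execute("INSERT INTO Parts VALUES (4, 'axle', 50, 'blue')")


def blue_count(name):
    return db.execute(f"SELECT COUNT(*) FROM {name} WHERE pcolor = 'blue'").fetchone()[0]


print(blue_count("Big_Parts"))      # 2: the view sees the new row
print(blue_count("Big_Parts_mat"))  # 1: the stored result is stale until recomputed
```

Materializing the view trades freshness for speed. A query reads the precomputed rows, so the stored result has to be refreshed whenever the base tables change.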
Outline
• Early data models
• Conceptual design
Relational Model: plus FD’s (FD = functional dependency)
Normalization: eliminates anomalies
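A functional dependency X → Y holds in a relation if any two rows that agree on X also agree on Y. The sketch below checks an FD on a hypothetical unnormalized table in which each doctor's facts are repeated once per patient. The table then suffers an update anomaly: the specialty is changed in only one copy, which breaks dno → spec. Normalization would decompose the table so that each doctor fact is stored once.

```python
def fd_holds(rows, lhs, rhs):
    """True iff the functional dependency lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same lhs values, different rhs values
            return False
    return True


# Hypothetical unnormalized table: doctor facts are repeated once per patient.
rows = [
    {"pno": "P311", "pname": "Alice", "dno": "D007", "dname": "Bob", "spec": "cardio"},
    {"pno": "P312", "pname": "Carol", "dno": "D007", "dname": "Bob", "spec": "cardio"},
]
print(fd_holds(rows, ["dno"], ["dname", "spec"]))   # True

# Update anomaly: changing the specialty in only one copy violates dno -> spec.
rows[1]["spec"] = "neuro"
print(fd_holds(rows, ["dno"], ["dname", "spec"]))   # False
```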
CSEP 544 - Spring 2021 44
Entity-Relationship Diagram
[Diagram: entity sets Patient and Doctor]
• And more...
E/R To Relations
[Diagram: Patient and Doctor entity sets; attribute labels name, since, dno, pno]

Patient
pno   name   zip    dno   since
P311  Alice  98765  D007  2001
…

Doctor
dno   name  spec
D007  Bob   cardio
…
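The two tables above can be written as SQLite DDL. This sketch assumes what the Patient table suggests: a many-one relationship from Patient to Doctor, folded into Patient as a foreign key dno, with the relationship's attribute since stored next to it. The column types are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Doctor(
  dno  TEXT PRIMARY KEY,
  name TEXT,
  spec TEXT
);
CREATE TABLE Patient(
  pno   TEXT PRIMARY KEY,
  name  TEXT,
  zip   TEXT,
  dno   TEXT REFERENCES Doctor(dno),  -- many-one relationship folded into Patient
  since INTEGER                       -- attribute of the relationship
);
INSERT INTO Doctor VALUES ('D007', 'Bob', 'cardio');
INSERT INTO Patient VALUES ('P311', 'Alice', '98765', 'D007', 2001);
""")

rows = db.execute("""
    SELECT p.name, d.name, p.since
    FROM Patient p JOIN Doctor d ON p.dno = d.dno
""").fetchall()
print(rows)  # [('Alice', 'Bob', 2001)]
```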
Subclasses to Relations
[Diagram: entity set Product (attributes name, category, price) with two isa subclasses]

Product
Name    Price  Category
Gizmo   99     gadget
Camera  49     photo
Toy     39     gadget
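The subclass names and attributes did not survive in these notes. The sketch below therefore uses two hypothetical subclasses, SoftwareProduct(platforms) and EducationalProduct(ageGroup), invented for illustration. It shows one common translation: one table per subclass, keyed by the superclass key name, holding only the subclass's own attributes. A full subclass tuple is then rebuilt with a join. Other translations exist, such as a single table with NULLs for inapplicable attributes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Product(name TEXT PRIMARY KEY, price INTEGER, category TEXT);

-- Hypothetical subclasses: each stores only the key plus its own attributes.
CREATE TABLE SoftwareProduct(
  name      TEXT PRIMARY KEY REFERENCES Product(name),
  platforms TEXT
);
CREATE TABLE EducationalProduct(
  name     TEXT PRIMARY KEY REFERENCES Product(name),
  ageGroup TEXT
);

INSERT INTO Product VALUES ('Gizmo', 99, 'gadget'), ('Camera', 49, 'photo'), ('Toy', 39, 'gadget');
INSERT INTO SoftwareProduct VALUES ('Gizmo', 'unix');
INSERT INTO EducationalProduct VALUES ('Gizmo', 'teen'), ('Toy', 'kids');
""")

# Reassembling a full educational product needs a join back to Product.
edu = db.execute("""
    SELECT p.name, p.price, p.category, e.ageGroup
    FROM Product p JOIN EducationalProduct e ON p.name = e.name
    ORDER BY p.name
""").fetchall()
print(edu)  # [('Gizmo', 99, 'gadget', 'teen'), ('Toy', 39, 'gadget', 'kids')]
```

In this translation a product can belong to both subclasses: Gizmo has a row in each.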
• Conceptual design
Discussion
• Stonebraker (circa 1998)
– “schema last” is a niche market
• Today (circa 2020)
– Major vendors scramble to offer efficient
schema discovery while ingesting JSON
• Why? What changed?
– Today datasets are available in text format,
often in JSON; ingest first, process later
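A minimal sketch of "ingest first, process later": load JSON documents without a declared schema, then derive one from the data. This toy version looks only at top-level fields. It records which JSON value types were seen for each field and whether some documents lack the field. Production schema discovery also handles nesting and type merging, and this sketch does not.

```python
import json
from collections import defaultdict


def infer_schema(lines):
    """Ingest JSON documents first, then derive a schema from what was actually seen."""
    types = defaultdict(set)     # field -> names of value types observed
    present = defaultdict(int)   # field -> number of documents containing it
    n_docs = 0
    for line in lines:
        doc = json.loads(line)
        n_docs += 1
        for key, value in doc.items():
            types[key].add(type(value).__name__)
            present[key] += 1
    return {
        key: {"types": sorted(types[key]), "optional": present[key] < n_docs}
        for key in types
    }


data = [
    '{"name": "Gizmo", "price": 99}',
    '{"name": "Camera", "price": 49.5, "tags": ["photo"]}',
]
print(infer_schema(data))
```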
NoSQL Data Model(s)
• Web boom in the 2000s created a
scalability crisis
– DBMSs were single-server and did not scale,
e.g. MySQL
• NoSQL answer:
– “Shard” data, i.e. distribute it on a cluster
– Simple data model: key/value pairs
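A toy sketch of the NoSQL answer: a key/value store sharded by hashing the key. Each dict stands in for one server in the cluster. The class and method names are hypothetical. The code uses a stable hash from hashlib because Python's built-in hash() of a string varies between processes.

```python
import hashlib


class ShardedKV:
    """Toy key/value store whose keys are hash-partitioned across n shards."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]   # each dict stands in for one server

    def shard_of(self, key: str) -> int:
        # Stable hash of the key, reduced modulo the number of shards.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self.shard_of(key)][key] = value

    def get(self, key: str):
        # Each lookup touches exactly one shard.
        return self.shards[self.shard_of(key)].get(key)


kv = ShardedKV(4)
for i in range(100):
    kv.put(f"user{i}", {"visits": i})
print(kv.get("user42"), [len(s) for s in kv.shards])
```

With plain modulo placement, changing the number of shards relocates most keys. Consistent hashing is the usual way to limit that movement.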