Physical Storage Organization

Outline
• Where and how is data stored?
– physical level
– logical level

2
Building a Database: High-Level
• Design conceptual schema using a data model, e.g. ER, UML,
etc.

[ER diagram: entity student (attributes stid, name) connected to entity course via relationship takes, with cardinalities 1:N and 0:N]
3
Building a Database: Logical-Level
• Design logical schema, e.g. a relational, network, hierarchical, object-relational, or XML schema
• Data Definition Language (DDL)

CREATE TABLE student
(cid char(8) primary key, name varchar(32))

student
cid name

4
Populating a Database
• Data Manipulation Language (DML)

INSERT INTO student VALUES ('00112233', 'Paul')


student
cid name
00112233 Paul

5
Transaction operations
• Transaction: a collection of operations performing a single
logical function
BEGIN TRANSACTION transfer
UPDATE bank-account SET balance = balance - 100 WHERE account=1
UPDATE bank-account SET balance = balance + 100 WHERE account=2
COMMIT TRANSACTION transfer

• A failure during a transaction can leave the system in an inconsistent state, e.g. a partially completed transfer between bank accounts.

6
Where and how is all this information stored?
• Metadata: tables, attributes, data types, constraints, etc.
• Data: records
• Transaction logs, indices, etc.

7
Where: In Main Memory?
• Fast!
• But:
– Too small
– Too expensive
– Volatile

8
Physical Storage Media
• Primary Storage
– Cache
– Main memory
• Secondary Storage
– Flash memory
– Magnetic disk
• Offline Storage
– Optical disk
– Magnetic tape

9
Magnetic Disks
• Random Access
• Inexpensive
• Non-volatile

10
How do disks work?
• Platter: covered with magnetic recording material
• Track: logical division of platter surface
• Sector: hardware division of tracks
• Block: OS division of tracks
– Typical block sizes:
512 B, 2KB, 4KB
• Read/write head

11
Disk I/O
• Disk I/O := block I/O
– Hardware address is converted to Cylinder, Surface and Sector
number
– Modern disks: Logical Sector Address 0…n
• Access time: time from read/write request to when data transfer
begins
– Seek time: time for the head to reach the correct track
• Average seek time 5-10 msec
– Rotational latency: time until the correct block rotates under the head
• 5400 RPM, 15K RPM
• On average 4-11 msec
• Block Transfer Time
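
For example (illustrative figures taken from the ranges above): with an 8 msec average seek and a 5400 RPM disk, half a revolution takes 60,000 / 5400 / 2 ≈ 5.6 msec, so a random block access costs roughly 8 + 5.6 ≈ 14 msec before the block transfer even begins, which is why reducing the number of block I/Os matters so much.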

12
Optimize I/O
• Database system performance is typically I/O bound
• Improve the speed of access to disk:
– Scheduling algorithms (elevator algorithm)
– File Organization (heap, index, hash)
• Introduce disk redundancy
– Redundant Array of Independent Disks (RAID)
• Reduce number of I/Os
– Query optimization, indexes

13
Where and how is all this information stored?
• Metadata: tables, attributes, data types, constraints, etc.
• Data: records
• Transaction logs, indices, etc.

• A collection of files (or tables)


– Physically partitioned into pages or data blocks
– Logically partitioned into records

14
Storage Access
• A collection of files
– Physically partitioned into pages
– Typical database page sizes: 2KB, 4KB, 8KB
– Reduce number of block I/Os := reduce number of page I/Os
– How?

• Buffer Manager

15
Buffer Management (1/2)
• Buffer: a main-memory frame holding a copy of a disk page
• Buffer manager: manages a pool of buffers
– Requested page in pool: hit!
– Requested page in disk:
• Allocate page frame
• Read page and pin
• Problems?

[Figure: page requests are served from the buffer pool; on a miss, pages are read from disk into the pool]
16
Buffer Management (2/2)
• What if no empty page frame exists?
– Select victim page
– Each page associated with dirty flag
– If the selected page is dirty, write it back to disk first
• Which page to select?
– Replacement policies (LRU, MRU)

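A minimal sketch of the buffer-manager logic described on these two slides, assuming a hypothetical disk object offering read_page(pid) / write_page(pid, data), and using an LRU replacement policy (other policies such as MRU plug in the same way):

from collections import OrderedDict

class BufferPool:
    def __init__(self, disk, capacity):
        self.disk = disk              # assumed interface: read_page(pid) / write_page(pid, data)
        self.capacity = capacity      # number of page frames in the pool
        self.frames = OrderedDict()   # pid -> {"data": ..., "dirty": bool, "pins": int}

    def get_page(self, pid):
        if pid in self.frames:                      # hit: page already in the pool
            self.frames.move_to_end(pid)            # mark as most recently used
        else:                                       # miss: bring the page in from disk
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[pid] = {"data": self.disk.read_page(pid),
                                "dirty": False, "pins": 0}
        frame = self.frames[pid]
        frame["pins"] += 1                          # pin: cannot be evicted while in use
        return frame

    def unpin(self, pid, dirty=False):
        frame = self.frames[pid]
        frame["pins"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # LRU: scan from least recently used, skipping pinned frames
        for pid, frame in self.frames.items():
            if frame["pins"] == 0:
                if frame["dirty"]:                  # write back before dropping the frame
                    self.disk.write_page(pid, frame["data"])
                del self.frames[pid]
                return
        raise RuntimeError("all frames pinned")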
17
Disk Arrays
• A single disk becomes a bottleneck
• Disk arrays
– instead of single large disk
– many small parallel disks
• read N blocks in a single access time
• concurrent queries
• tables spanning multiple disks
• Redundant Arrays of Independent Disks (RAID)
– 7 levels (0-6)
– reliability
– redundancy
– parallelism

18
RAID Technology
• A natural solution is a large array of small independent
disks acting as a single higher-performance logical disk.
• A concept called data striping is used, which utilizes
parallelism to improve disk performance.
• Data striping distributes data transparently over multiple
disks to make them appear as a single large, fast disk.
RAID Technology (cont.)
• Different RAID organizations are defined by combining two factors: the granularity of data interleaving (striping) and the pattern used to compute redundant information.
– RAID level 0 (striping) has no redundant data and hence has
the best write performance at the risk of data loss
– RAID level 1 uses mirrored disks.
– RAID level 2 uses memory-style redundancy by using
Hamming codes, which contain parity bits for distinct
overlapping subsets of components. Level 2 includes both
error detection and correction.
– RAID level 3 uses a single parity disk, relying on the disk
controller to figure out which disk has failed.
– RAID levels 4 and 5 use block-level data striping, with level
5 distributing data and parity information across all disks.
– RAID level 6 applies the so-called P + Q (two parity)
redundancy scheme using Reed-Solomon codes to protect
against up to two disk failures by using just two redundant
disks.
RAID level 0
• Block level striping
• No redundancy
• maximum bandwidth
• automatic load balancing
• best write performance
• but, no reliability

[Layout: blocks 0, 1, 2, 3 striped across disks 1-4; blocks 4, 5, ... wrap around to disks 1, 2, ...]

21
RAID level 1
• Mirroring
– Two identical copies stored in two different disks
• Parallel reads
• Sequential writes
• transfer rate comparable to single disk rate
• most expensive solution

[Layout: disk 2 mirrors disk 1 (blocks 0, 1), disk 4 mirrors disk 3 (block 2)]
22
RAID levels 2 and 3
• bit level striping (next bit on a separate disk)
• error detection and correction
• RAID 2
– ECC error correction codes (Hamming code)
– Bit level striping, several parity bits
• RAID 3
– Byte level striping, dedicated parity disk
– error detection by disk controllers (hardware)
• RAID 4
- Block level striping, dedicated parity disk

23
RAID level 4
• block level striping
• one parity block per stripe of blocks on the data disks
– P1 = B0 XOR B1 XOR B2
– B2 = B0 XOR B1 XOR P1
• an update:
– P1' = B0' XOR B0 XOR P1 (every update must also write the parity disk)

[Layout: B0 on disk 1, B1 on disk 2, B2 on disk 3, parity P1 on disk 4]
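
A small sketch of the parity arithmetic on this slide, in plain Python with byte-wise XOR (the 4-byte blocks and three data disks are illustrative choices):

from functools import reduce

def xor_blocks(*blocks):
    # byte-wise XOR of equal-length blocks
    return bytes(reduce(lambda x, y: x ^ y, vals) for vals in zip(*blocks))

# Three data blocks and their parity block
B0, B1, B2 = b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xAA\xBB\xCC\xDD"
P1 = xor_blocks(B0, B1, B2)                 # P1 = B0 XOR B1 XOR B2

# Recovery: if disk 3 fails, B2 can be rebuilt from the survivors
assert xor_blocks(B0, B1, P1) == B2         # B2 = B0 XOR B1 XOR P1

# Update: writing B0' only needs the old B0, the old P1 and the new data
B0_new = b"\x11\x22\x33\x44"
P1_new = xor_blocks(B0_new, B0, P1)         # P1' = B0' XOR B0 XOR P1
assert P1_new == xor_blocks(B0_new, B1, B2)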

24
RAID level 5 and 6
• subsumes RAID 4
• parity disk not a bottleneck
– parity blocks distributed on all disks
• RAID 6
– tolerates two disk failures
– P+Q redundancy scheme
• 2 bits of redundant data for each 4 bits of data
– more expensive writes

[Figure: RAID 5 layout with data blocks and parity blocks distributed across disks 1-4, so no single parity disk becomes a bottleneck]


25
What do pages contain logically?
• Files:
– Physically partitioned into pages (or blocks)
– Logically partitioned into records
• Each file is a sequence of records
• Each record is a sequence of fields

student
cid name
00112233 Paul

student record: 00112233 Paul


8 + 4 = 12 Bytes
26
Page Organization
• Student record size: 12 Bytes
• Typical page size: 2 KB
• Record identifiers: <Page identifier, offset>
• How records are distributed into pages:
– Unspanned organization
• Blocking factor = ⌊ page size / record size ⌋ (worked example below)

– Spanned organization
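
For example, with the 12-byte student record and a 2 KB page from the earlier slides: bf = ⌊2048 / 12⌋ = 170 records per page (unspanned), leaving 2048 - 170 · 12 = 8 bytes of each page unused.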



[Figure: unspanned organization, where each record lies entirely within page i or page i+1, versus spanned organization, where a record may continue from page i into page i+1]
27
Blocking
• Blocking:
– Refers to storing a number of records in one block on the disk.
• Blocking factor (bf) refers to the number of records per block.
• There may be empty space in a block if an integral number of
records do not fit in one block.
• File records can be unspanned or spanned
– Unspanned: no record can span two blocks
– Spanned: a record can be stored in more than one block

The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked, or indexed.
What if a record is deleted?
• Depending on the type of records:
– Fixed-length records
– Variable-length records

29
Fixed-length record files
• Upon record deletion:
– Packed page scheme
– Bitmap

[Figure: two page layouts for fixed-length records. Packed: slots 1..N stored contiguously, the page header records N, and free space follows the last slot. Bitmap: slots 1..M, with a bitmap in the page header (e.g. 1 0 ... 0 1 1) marking which slots are occupied]
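
A minimal sketch of the bitmap scheme, assuming fixed-length records and a page header that keeps one bit per slot (1 = occupied, 0 = free):

class FixedLengthPage:
    def __init__(self, num_slots, record_size):
        self.bitmap = [0] * num_slots           # page-header bitmap: 1 = slot occupied
        self.slots = [None] * num_slots
        self.record_size = record_size

    def insert(self, record):
        for i, used in enumerate(self.bitmap):  # first free slot
            if not used:
                self.slots[i] = record
                self.bitmap[i] = 1
                return i                        # slot number, used in the record id
        return None                             # page full

    def delete(self, slot):
        self.bitmap[slot] = 0                   # no data is moved: just clear the bit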
30
Variable-length record files
• When do we have a file with variable-length records?
– Column datatype: variable length
– create table t (field1 int, field2 varchar2(n))
• Problems:
– Holes created upon deletion have variable size
– Find large enough free space for new record
• Could use previous approaches: maximum record size
– a lot of space wasted
• Use slotted page structure
– Slot directory
– Each slot storing offset, size of record
– Record IDs: page number, slot number

[Figure: slotted page with records stored at offsets such as 38, 16, 32 and a slot directory at the end of the page holding one (offset, size) entry per slot N, ..., 2, 1]
31
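
A minimal sketch of the slotted-page idea from the previous slide: records fill the page from one end while the slot directory records an (offset, length) entry per slot (an in-memory bytearray stands in for the disk page, an assumption for illustration):

class SlottedPage:
    def __init__(self, size=2048):
        self.data = bytearray(size)
        self.free_start = 0     # records are appended from the front of the page
        self.slots = []         # slot directory: one (offset, length) entry per record

    def insert(self, record: bytes):
        if self.free_start + len(record) > len(self.data):
            return None                               # not enough free space
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1                    # slot number; record id = (page no, slot no)

    def read(self, slot_no):
        offset, length = self.slots[slot_no]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot_no):
        offset, _ = self.slots[slot_no]
        self.slots[slot_no] = (offset, 0)   # length 0 marks a hole, reclaimed by later compaction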
Record Organization
• Fixed-length record formats
– Fields stored consecutively
• Variable-length record formats
– Array of offsets
– NULL values when start offset = end offset

[Figure: fixed-length format, fields f1 f2 f3 f4 of lengths L1..L4 stored consecutively from base address B, so f3 starts at address B + L1 + L2; variable-length format, fields f1..f4 preceded by an array of field offsets]

32
Operations on Files
• Typical file operations include:
– OPEN: Prepares the file for access and associates a pointer that will refer
to a current file record at each point in time.
– FIND: Searches for the first file record that satisfies a certain condition
and makes it the current file record.
– FINDNEXT: Searches for the next file record (from the current record) that
satisfies a certain condition and makes it the current file record.
– READ: Reads the current file record into a program variable.
– INSERT: Inserts a new record into the file & makes it the current file
record.
– DELETE: Removes the current file record from the file, usually by marking
the record to indicate that it is no longer valid.
– MODIFY: Changes the values of some fields of the current file record.
– CLOSE: Terminates access to the file.
– REORGANIZE: Reorganizes the file records.
• For example, the records marked deleted are physically removed from
the file or a new organization of the file records is created.
– READ_ORDERED: Read the file blocks in order of a specific field of the
file.
File Organization
(studied in more detail later)

• Heap files: unordered records


• Sorted files: ordered records
• Hashed files: records partitioned into buckets

34
Heap Files
• Simplest file structure
• Efficient insert
• Slow search and delete
– Equality search: half the pages fetched on average
– Range search: all pages must be fetched

[Figure: heap file with a file header pointing to its unordered pages]

35
Sorted (Ordered) files
• Sorted records based on ordering field (e.g. Ename)
– If the ordering field is also a key field, it is called the ordering key field (e.g. Empno)
• Slow inserts and deletes
• Fast logarithmic search

[Figure: ordered file spread over Page 1, Page 2, ...; inserting a record in the middle forces later records to shift]
36
Sorted (Ordered) Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering field.
• Insertion is expensive: records must be inserted in the correct
order.
– It is common to keep a separate unordered overflow (or
transaction) file for new records to improve insertion
efficiency; this is periodically merged with the main ordered
file.
• A binary search can be used to search for a record on its
ordering field value.
– This requires reading and searching about log2(b) of the b file blocks
on average, an improvement over linear search.
• Reading the records in order of the ordering field is quite
efficient.
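
For example, a file of b = 1024 blocks needs at most ⌈log2 1024⌉ = 10 block reads with binary search, versus b/2 = 512 block reads on average for a linear search.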
Hashed Files
• Hash function h on hash field distributes pages into buckets
• Efficient equality searches, inserts and deletes
• No support for range searches

[Figure: hash function h applied to the hash field maps each record to a bucket; a full bucket chains to an overflow page; unused overflow pointers are null]


38
Hashed Files
• Hashing for disk files is called External Hashing
• The file blocks are divided into M equal-sized buckets,
numbered bucket0, bucket1, ..., bucketM-1.
– Typically, a bucket corresponds to one (or a fixed number of)
disk block.
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where
i=h(K), and h is the hashing function.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket that is
already full.
– An overflow file is kept for storing such records.
– Overflow records that hash to each bucket can be linked
together.
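
A minimal sketch of external hashing with overflow chaining as described above; Python's built-in hash() stands in for h, small lists stand in for disk blocks, and the bucket capacity of 2 records is purely illustrative:

M = 8            # number of buckets
CAPACITY = 2     # records per bucket "block" (illustrative)

buckets = [[] for _ in range(M)]      # main buckets
overflow = [[] for _ in range(M)]     # overflow records chained per bucket

def h(key):
    return hash(key) % M              # record with hash key K goes to bucket h(K)

def insert(key, record):
    i = h(key)
    # collision: bucket already full, so the record goes to the overflow chain
    target = buckets[i] if len(buckets[i]) < CAPACITY else overflow[i]
    target.append((key, record))

def search(key):
    i = h(key)
    for k, rec in buckets[i] + overflow[i]:   # only one bucket (plus its chain) is examined
        if k == key:
            return rec
    return None

insert("00112233", {"name": "Paul"})
print(search("00112233"))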
Summary (1/2)
• Why Physical Storage Organization?
– understanding low-level details which affect data access
– make data access more efficient
• Primary Storage (memory), Secondary Storage (disk)
– memory fast
– disk slow but non-volatile
• Data stored in files
– partitioned into pages physically
– partitioned into records logically
• Optimize I/Os
– scheduling algorithms
– RAID
– page replacement strategies

41
Summary (2/2)
• File Organization
– how each file type performs
• Page Organization
– strategies for record deletion
• Record Organization

42
Topics for today
• How to lay out data on disk
• How to move it to memory

43
What are the data items we want to store?
• a salary
• a name
• a date
• a picture

What we have available: bytes (8 bits each)

44
To represent:
• Integer (short): 2 bytes

e.g., 35 is

00000000 00100011

• Real, floating point: n bits for mantissa, m bits for exponent ...

45
To represent:
• Characters

 Various coding schemes have been suggested; the most popular is ASCII

Example:
A: 1000001
a: 1100001
5: 0110101
LF: 0001010

46
To represent:
• Boolean
e.g., TRUE: 1111 1111, FALSE: 0000 0000
• Application specific
e.g., RED  1, BLUE  2, GREEN  3, YELLOW  4, …
Can we use less than 1 byte per code?
Yes, but only if desperate...

47
To represent:
• Dates
e.g.: - Integer, # days since Jan 1, 1900
- 8 characters, YYYYMMDD
- 7 characters, YYYYDDD

• Time
e.g. - Integer, seconds since midnight
- characters, HHMMSSFF
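
A small illustration of the date encodings above (Python's datetime is used only for the arithmetic; the conventions shown are those from the slide):

from datetime import date

d = date(2024, 1, 1)
print((d - date(1900, 1, 1)).days)     # integer: # days since Jan 1, 1900
print(d.strftime("%Y%m%d"))            # 8 characters, YYYYMMDD
print(d.strftime("%Y%j"))              # 7 characters, YYYYDDD (day of year)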

48
To represent:
• String of characters
– Null terminated, e.g., c a t \0
– Length given at the start, e.g., 3 c a t
– Fixed length (padded to the declared size)

49
To represent:
• Bag of bits

[Layout: length field, followed by the bits]

50
Key Point

• Fixed length items

• Variable length items


- usually length given at beginning

51
Also
• Type of an item: tells us how to interpret it (plus size, if fixed)

52
Overview Data Items

Records

Blocks

Files

Memory

53
Record - Collection of related data
items (called FIELDS)

E.g.: Employee record:


name field,
salary field,
date-of-hire field, ...

54
Types of records:
• Main choices:
– FIXED vs VARIABLE FORMAT
– FIXED vs VARIABLE LENGTH

55
Fixed format
A SCHEMA (of a table record) contains the following information:
- # fields
- type of each field
- order in record
- meaning of each field

56
Example: fixed format and length
Employee record schema:
(1) E#, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code

Records:
55 | s m i t h | 02
83 | j o n e s | 01
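
A small sketch of this fixed schema using Python's struct module (field order and sizes as in the schema above; the big-endian '>' byte order and treating Dept as a 2-character code are assumptions for illustration):

import struct

EMPLOYEE = struct.Struct(">H10s2s")   # E#: 2-byte int, E.name: 10 chars, Dept: 2-byte code

rec = EMPLOYEE.pack(55, b"smith".ljust(10), b"02")   # fixed length: always 14 bytes
e_no, name, dept = EMPLOYEE.unpack(rec)
print(e_no, name.rstrip(), dept)                     # 55 b'smith' b'02'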

57
Variable format
• Record itself contains format
“Self Describing”

58
Example: variable format and length

2 | 5 I 46 | 4 S 4 F O R D

Reading left to right: 2 = # fields; 5 = code identifying the field as E#, I = integer type, 46 = the value; 4 = code for Ename, S = string type, 4 = length of string, F O R D = the value.

Field name codes could also be strings, i.e. TAGS

59
Variable format useful for:
• “sparse” records
• repeating fields
• evolving formats

But may waste space...

60
• EXAMPLE: var format record with
repeating fields
Employee has one or more children (a repeating Child field)

3 E_name: Fred Child: Sally Child: Tom

61
Note: Repeating fields does not imply
- variable format, nor
- variable size

John Sailing Chess --

• Key is to allocate the maximum number of repeating fields (if not used  null)
62
Many variants between
fixed - variable format:
Example: Include record type in record

record type | record length | . . .
5           | 27            | . . .
The record type tells me what to expect (i.e. it points to the schema).

63
Record header - data at beginning
that describes record
May contain:
- record type
- record length
- time stamp
- other stuff ...

64
Next: placing records into blocks

[Figure: a file as a sequence of blocks]
Assume fixed-length blocks and a single file (for now).

65
Options for storing records in blocks:
(1) separating records
(2) spanned vs. unspanned
(3) sequencing
(4) indirection

66
(1) Separating records
Block
R1 R2 R3
(a) no need to separate - fixed size recs.
(b) special marker
(c) give record lengths (or offsets)
- within each record
- in block header

67
(2) Spanned vs. Unspanned

• Unspanned: each record must fit within one block
[Figure: block 1 holds R1 R2, block 2 holds R3 R4, ...]
• Spanned: a record may continue into the next block
[Figure: block 1 holds R1 R2 R3(a), block 2 holds R3(b) R4 R5, ...]

68
With spanned records:

[Figure as on the previous slide, with R3 split into parts (a) and (b) across two blocks]
– need indication of a partial record and indication of its continuation
– "pointer" to the rest (and from where?)

69
Spanned vs. unspanned:
• Unspanned is much simpler, but may waste space…
• Spanned essential if
record size > block size

70
(3) Sequencing
• Ordering records in file (and block) by some key value

A file with records sequenced this way is called a sequential file.

71
Why sequencing?
Typically to make it possible to efficiently read records in order
(e.g., to do a merge-join — discussed later)

72
Sequencing Options
(a) Next record physically contiguous: R1 followed immediately by Next(R1), ...
(b) Linked: each record R1 stores a pointer to Next(R1)

73
Sequencing Options
(c) Overflow area
[Figure: records R1, R2, R3, R4, R5 kept in sequence in the main area; the header points to an overflow area holding out-of-sequence inserts R2.1, R1.3, R4.7]

74
(4) Indirection

• How does one refer to records?

Rx

75
(4) Indirection

• How does one refer to records?

Rx

Many options, from purely physical to fully indirect

76
Purely Physical
E.g., Record Address (or ID) =
Device ID, Cylinder #, Track #, Block # (together forming the Block ID), plus Offset in block

77
Fully Indirect

E.g., Record ID is arbitrary bit string

[Figure: a map translates Rec ID r to physical address a]

78
Tradeoff

Flexibility to move records (for deletions, insertions) vs. cost of indirection (managing the map)

79
Physical vs. Indirect

Many options in between …

80
Example: Indirection in block

[Figure: a block whose header holds pointers to records R1, R2, R3, R4 stored within the block, with free space between the header area and the records]

81
Block header - data at beginning that
describes block
May contain:
- File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g. contains recs type 4;
is overflow, …)
- Pointer to other blocks “like it”
- Timestamp ...

82
Other Topics
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes

83
Deletion

[Figure: deleting record Rx from a block]

84
Options:
(a) Immediately reclaim space
(b) Mark deleted

– May need chain of deleted records (for re-use)
– Need a way to mark:
• special characters
• delete field
• in map

85
As usual, many tradeoffs...
• How expensive is it to move a valid record into freed space for immediate reclaim?
• How much space is wasted?
– e.g., deleted records, delete fields, free space chains,...

86
Concern with deletions
Dangling pointers

R1 ?

87
Solution #1: Do not worry

We can never reuse the space of the deleted record.

88
Solution #2: Tombstones
E.g., Leave “MARK” in map or old location

• Physical IDs

[Figure: a block after deletion with physical IDs; the space holding the tombstone mark is never re-used, while the rest of the deleted record's space can be re-used]

89
Solution #2: Tombstones
E.g., Leave “MARK” in map or old
location
• Logical IDs
[Figure: map of ID to LOC; the entry for ID 7788 is marked deleted, and neither ID 7788 nor its slot in the map is ever reused...]

90
Insert
Easy case: records not in sequence
 Insert new record at end of file or in
deleted slot
 If records are variable size, not
as easy...

91
Insert
Hard case: records in sequence
 If free space “close by”, not too bad...
 Or use overflow idea...

92
Interesting problems:

• How much free space to leave in each block, track, cylinder?


• How often do I reorganize file + overflow?

93
[Figure: blocks laid out with free space reserved for future inserts]

94
Buffer Management
• DB features needed
• Why LRU may be bad
• Pinned blocks
• Forced output
• Double buffering (prefetch)

95
Row vs Column Store
• So far, we assumed that fields of a record are stored
contiguously (row store)...
• Another option is to store like fields together (column store)

96
Row Store
• Example: Order consists of
– id, cust, prod, store, price, date, qty

id1 cust1 prod1 store1 price1 date1 qty1

id2 cust2 prod2 store2 price2 date2 qty2

id3 cust3 prod3 store3 price3 date3 qty3

97
Column Store
• Example: Order consists of
– id, cust, prod, store, price, date, qty

id1 cust1 id1 prod1 id1 price1 qty1


id2 cust2 id2 prod2 id2 price2 qty2
id3 cust3 id3 prod3 id3 price3 qty3
id4 cust4 id4 prod4 id4 price4 qty4
... ... ... ... ... ... ...

ids may or may not be stored explicitly

98
Row vs Column Store
• Advantages of Column Store
– more compact storage (fields need not start at byte boundaries)
– efficient reads on data mining operations
• Advantages of Row Store
– writes (multiple fields of one record) more efficient
– efficient reads for record access (OLTP)
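
A toy illustration of the two layouts, with in-memory Python lists standing in for disk pages and the Order fields taken from the slides:

orders = [
    (1, "custA", "prodX", "store1", 9.99, "2024-01-01", 2),
    (2, "custB", "prodY", "store2", 5.00, "2024-01-02", 1),
]

# Row store: one tuple per record, all fields of an order stored contiguously
row_store = list(orders)

# Column store: one sequence per field, like fields stored together
fields = ["id", "cust", "prod", "store", "price", "date", "qty"]
column_store = {f: [rec[i] for rec in orders] for i, f in enumerate(fields)}

# OLTP-style access (whole record) touches one row; an aggregate over one field
# (data-mining style) touches only that column
print(row_store[0])                         # fetch order 1
print(sum(column_store["price"]))           # scan just the price column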

99
Comparison
• There are 10,000,000 ways to organize my data on disk…

Which is right for me?

100
Issues:
Flexibility, Space Utilization, Complexity, Performance

101
To evaluate a given strategy, compute following parameters:
-> space used for expected data
-> expected time to
- fetch record given key
- fetch record with next key
- insert record
- append record
- delete record
- update record
- read all file
- reorganize file

102
Summary
• How to lay out data on disk

Data Items
Records
Blocks
Files
Memory
DBMS

103
Next
How to find a record quickly,
given a key

104
