
6.824 2006 Lecture 7: Logging

What's the overall topic?


Atomic updates of complex data w.r.t. failures.
Today we consider just a single system; we'll see distributed versions
later.

Why aren't synchronous meta-data updates enough?


(from last lecture on file system crash recovery)
They're slow
Recovery may require scanning the whole disk
Some operations don't have an obvious single committing write

Example: FFS rename


an editor could use rename from a temp file for a careful update
echo a > d1/f1
echo b > d2/f2
mv d2/f2 d1/f1
need to update two directories, stored in two blocks on disk.
remove then add? add then remove?
probably want add then remove
what if a crash?
what does fsck do?
it knows something is wrong, since the i-node's link count is 1 but there are two directory entries.
can't roll back -- which one to delete?
has to just increase the link count.
this is *not* a legal result of rename!
but at least we haven't lost the file.
so FFS is slow *and* it doesn't get semantics right.

You can push tree update one step farther.


Prepare a new copy of the entire affected sub-tree.
Replace old subtree in one final write.
Very expensive if done in the obvious way.
But you can share structure between old and new tree.
Only need new storage between change points and sub-tree root.
(NetApp WAFL does this and more.)
This approach only works for tree data structures.
and doesn't support concurrent operations very well
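As a rough Python sketch of the shared-structure idea (the Node type and
cow_update are invented for illustration; WAFL's real on-disk layout differs):

class Node:
    def __init__(self, children=None, value=None):
        self.children = dict(children or {})   # name -> Node, may be shared with the old tree
        self.value = value

def cow_update(node, path, new_value):
    # Return a new root with the update applied; the old tree is untouched.
    # Only nodes on the path from the change point up to the root are copied.
    if not path:
        return Node(children=node.children, value=new_value)
    name, rest = path[0], path[1:]
    old_child = node.children.get(name, Node())
    new_node = Node(children=node.children, value=node.value)  # shares unchanged children
    new_node.children[name] = cow_update(old_child, rest, new_value)
    return new_node

# the "one final write" is installing the new root pointer; a crash
# before that write leaves the old tree fully intact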

What are the reasons to use logging?


atomic commit of compound operations. w.r.t. crashes.
fast recovery (unlike fsck).
well-defined post-recovery state: serial prefix of operations.
as if writes were synchronous and the crash had occurred a bit earlier
can be applied to almost any existing data structure
e.g. database tables, free lists
representation is compact on disk, so appends are very fast
useful to coordinate updates to distributed data structures
let's all do this operation
oops, someone didn't say "yes"
how to back out or complete?

Transactions
The main point of a log is to make complex operations atomic.
I.e. operations that involve many individual writes.
You want all writes or none, even if a crash in the middle.
A "transaction" is a multi-write operation that should be atomic.
The logging system needs to know which sets of writes form a
transaction.
re-organize code to mark start/end of group of atomic operations
create()
begin_transaction
update free list
update i-node
update directory entry
end_transaction
app sends writes to the logging system
there may be multiple concurrent transactions
e.g. if two processes are making system calls
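A minimal Python sketch of this interface (Logger, begin, write, and commit
are invented names, not any real system's API):

class Logger:
    def __init__(self):
        self.log = []                  # in-memory log; appended to the log disk elsewhere
        self.next_tid = 1

    def begin(self):
        tid = self.next_tid
        self.next_tid += 1
        self.log.append(("B", tid))    # begin record
        return tid

    def write(self, tid, blk, new_data):
        self.log.append(("W", tid, blk, new_data))   # write record

    def commit(self, tid):
        self.log.append(("E", tid))    # end == commit record

# create() from above, expressed against this hypothetical interface:
def create(logger, free_list_blk, inode_blk, dir_blk):
    tid = logger.begin()
    logger.write(tid, free_list_blk, "updated free list")
    logger.write(tid, inode_blk, "new i-node")
    logger.write(tid, dir_blk, "new directory entry")
    logger.commit(tid)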

Terminology
in-memory data cache
on-disk data
in-memory log
on-disk log
dirty vs clean
sync write vs async

naive re-do log


keep a "log" of updates
B TID [begin]
W TID B# new-data [write]
E TID [end == commit]
Example:
B T1
W T1 B1 25
E T1
B T2
W T2 B1 30
B T3
W T3 B2 99
W T3 B3 50
E T3
for now, log lives on its own infinite disk
note we include records from uncommitted xactions in the log
records from concurrent xactions may be intermingled
we can write dirty in-memory data blocks to disk any time we want
recovery
1. discard all on-disk data
2. scan whole log and remember all Committed TIDs
3. scan whole log, ignore non-committed TIDs, replay the writes
why can't we use any of on-disk data's contents during recovery?
don't know if a block is from an uncommitted xaction
i.e. was written to disk before commit
the *real* data is in the log!
the on-disk data structure is just a cache for speed
since it's hard to *find* things in a log
so what have we achieved?
atomic update of complex data structures: gets rename() right
recoverable
operations are fast
problems:
we have to store the whole log forever
recovery has to replay from the beginning of time
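A sketch of the three-step naive recovery in Python, assuming the log is a
list of ("B", tid) / ("W", tid, blk, data) / ("E", tid) tuples in append order
and the disk is a dict (both structures are illustrative):

def naive_redo_recover(log, disk):
    disk.clear()                                  # 1. discard all on-disk data

    committed = set()                             # 2. scan log, remember committed TIDs
    for rec in log:
        if rec[0] == "E":
            committed.add(rec[1])

    for rec in log:                               # 3. replay committed writes in order
        if rec[0] == "W" and rec[1] in committed:
            _, tid, blk, data = rec
            disk[blk] = data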

re-do with checkpoint


most logs work like this, e.g. FSD
allows much faster recovery: can use on-disk data
write-ahead rule
delay flushing dirty blocks from in-memory data cache
until corresponding commit record is on disk
so keep updates of uncommitted xactions in in-memory data cache (not disk)
so no un-committed data on disk.
but disk may be missing some committed data
recovery needs to replay committed data from the log
how can we avoid re-playing the whole log on recovery?
recovery needs to know a point in log at which it can start
a "checkpoint", pointer into log, stored on disk
how to ensure recovery can ignore everything before the checkpoint?
checkpoint rule:
all data writes before the checkpoint must be stable on disk
checkpoint may not advance beyond first uncommitted Begin
in background, flush a bunch of early writes, update checkpoint ptr
three log regions:
data guaranteed on disk
(checkpoint)
data might be on disk
(log write point)
data cannot be on disk
(end of in-memory log)
on recovery, re-play committed updates from the checkpoint onward
it's ok if we flush but crash before updating checkpoint pointer
we will re-write exactly the same data during recovery
can free log space before checkpoint!

problem:
uncommitted transactions use space in in-memory data cache
a problem for long-running transactions
(not a problem for file systems)
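A sketch of re-do recovery with a checkpoint, using the same illustrative
log-as-list and disk-as-dict structures as above; the checkpoint is an index
into the log, read from its well-known disk sector:

def recover_with_checkpoint(log, checkpoint_lsn, disk):
    committed = {rec[1] for rec in log if rec[0] == "E"}

    for lsn, rec in enumerate(log):
        if lsn < checkpoint_lsn:
            continue                  # everything earlier is already on disk
        if rec[0] == "W" and rec[1] in committed:
            _, tid, blk, data = rec
            disk[blk] = data          # re-writing already-flushed data is harmless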

un-do/re-do with checkpoint


suppose we want to write uncommitted data to disk?
need to be able to un-do them in recovery
so include old value in each log record
W TID B# old-data new-data
now we can write data from in-memory data cache to disk
after log entry is on disk
no need to wait for the End to be on disk
so we can free in-memory data cache blocks of uncommitted
transactions
recovery:
for each block mentioned in the log
find the last xaction that wrote that block
if committed: re-do
if not committed: un-do
two pointers stored on disk: checkpoint and tail
checkpoint:
all in-memory data cache entries flushed up to this point
no need to re-do before this point
but may need to un-do before this point
tail:
start of first uncommitted transaction
no need to un-do before this point
so can free before this point
it's ok if we crash just before updating the tail pointer itself
we would have advanced it over committed transaction(s)
so we will re-do them, no problem
what if there's an un-do record for a block never written to disk?
it's ok: un-do will re-write same value that's already there
what if
B T1
W T1 B1 old=10 new=20
B T2
W T2 B1 old=20 new=30
crash
The right answer is B1 = 10, since neither committed
But it looks like we'll un-do to 20
What went wrong? How to fix it?
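For concreteness, a Python sketch of the per-block rule just described
(record formats are illustrative, not FSD's); the B1 example is exactly the
case it gets wrong:

def undo_redo_recover(log, disk):
    # records: ("B", tid) / ("W", tid, blk, old, new) / ("E", tid)
    committed = {rec[1] for rec in log if rec[0] == "E"}

    last_write = {}                        # blk -> last W record that touched it
    for rec in log:
        if rec[0] == "W":
            last_write[rec[2]] = rec

    for blk, (_, tid, _, old, new) in last_write.items():
        if tid in committed:
            disk[blk] = new                # re-do committed write
        else:
            disk[blk] = old                # un-do uncommitted write
            # with stacked uncommitted writes (the B1 example), this
            # restores 20 rather than 10 -- hence the question above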

careful disk writing


log usually stored in a dedicated known area of the disk
so it's easy to find after a reboot
where's the start?
checkpoint, a pointer in a known disk sector
where's the end?
hard if crash interrupted log append
append records in order
include unique ascending sequence # in each record
also a checksum for multi-sector records (maybe in End?)
recovery must scan forward until the sequence #s stop being consecutive
I'm assuming disk sector writes are atomic, and "work correctly"
see FSD paper for better handling of disk failures
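A sketch of locating the log's end after a crash by scanning for consecutive
sequence numbers (read_record and the slot layout are assumptions for
illustration):

def find_log_end(read_record, start_seq):
    # read_record(i): parse the i-th log slot after the checkpoint;
    # returns (seq, payload), or None if it fails its checksum / doesn't parse
    i = 0
    expected = start_seq
    while True:
        rec = read_record(i)
        if rec is None or rec[0] != expected:
            return i                       # torn or stale record: the log ends here
        expected += 1
        i += 1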

why is logging fast?


group commit -- batched log writes.
could delay flushing log -- may lose committed transactions
but at least you have a prefix.
single seek to implement a transaction.
maybe less if no intervening disk activity, or group commit
write-behind of data allows batched / scheduled disk writes.
one data block may reflect many transactions.
i.e. create many files in a directory.
don't have to be so careful since the log is the real information
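A sketch of the batching idea, assuming a flush_to_log_disk function that
writes a whole batch of records with one disk operation (all names invented
for illustration):

class BatchedLog:
    def __init__(self, flush_to_log_disk):
        self.buffer = []                          # records not yet on the log disk
        self.flush_to_log_disk = flush_to_log_disk

    def append(self, record):
        self.buffer.append(record)

    def commit(self, tid):
        self.append(("E", tid))
        # don't flush right away; let more commits accumulate

    def flush(self):
        # one seek/write covers every transaction buffered so far; crashing
        # before this loses recent commits but leaves a serial prefix
        if self.buffer:
            self.flush_to_log_disk(self.buffer)
            self.buffer = []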

Cite as: Robert Morris, course materials for 6.824 Distributed Computer Systems Engineering,
Spring 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of
Technology. Downloaded on [DD Month YYYY].
