Internals of PostgreSQL WAL
Contents
REDO Definition
Redo Implementation in PostgreSQL
Key Structures Used in PostgreSQL
Advantages & Disadvantages of PostgreSQL Implementation
Redo Implementation in Oracle
Advantages & Disadvantages of Oracle Implementation
Improvements in PostgreSQL
Detailed method for one of the improvements
REDO Definition
Jargons
To guarantee that WAL records reach disk before the data pages they describe:
- Each data page (either heap or index) is marked with the
LSN (log sequence number; in practice, a WAL file location) of
the latest XLOG record affecting the page.
- Before the bufmgr can write out a dirty page, it must ensure
that xlog has been flushed to disk at least up to the page's LSN.
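The two rules above can be sketched in a few lines of Python. This is a toy model, not PostgreSQL code: the names `WalBuffer`, `Page`, `modify_page` and `write_dirty_page` are hypothetical, and the LSN is simplified to a record counter rather than a WAL file location.

```python
class WalBuffer:
    """Toy WAL: records are appended in memory and flushed up to an LSN."""

    def __init__(self):
        self.records = []      # in-memory WAL records
        self.flushed_lsn = 0   # everything up to this LSN counts as on disk

    def insert(self, payload):
        self.records.append(payload)
        return len(self.records)  # LSN of the new record (1-based position)

    def flush(self, upto_lsn):
        # Flushing is monotonic: never move the flushed position backwards.
        self.flushed_lsn = max(self.flushed_lsn, upto_lsn)


class Page:
    def __init__(self):
        self.lsn = 0        # LSN of the latest WAL record affecting this page
        self.dirty = False


def modify_page(wal, page, payload):
    """Log the change first, then stamp the page with the record's LSN."""
    page.lsn = wal.insert(payload)
    page.dirty = True


def write_dirty_page(wal, page):
    """Buffer-manager side: flush WAL up to the page's LSN before the page."""
    if wal.flushed_lsn < page.lsn:
        wal.flush(page.lsn)
    page.dirty = False  # only now may the page itself be written out


wal = WalBuffer()
page = Page()
modify_page(wal, page, "update row 1")
modify_page(wal, page, "update row 2")
write_dirty_page(wal, page)
```

After `write_dirty_page` returns, the WAL is flushed at least up to the page's LSN, which is the ordering the second rule requires.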
UPDATE
UPDATE score
SET
runs = 104,
wickets = 5
WHERE team = 'AUS';
DELETE
DELETE FROM score
WHERE team = 'AUS';
Data : tupleid
WALInsertLock
This lock protects the insertion of transaction log records into the
in-memory transaction log buffer. The lock is taken first, and then the
whole record, including full data buffers (if full_page_writes is on), is
copied into the log buffers.
Other places where this lock is used:
- During a flush of the log buffers, to check whether more records
have been added to the log buffer since the flush point was
decided.
- To determine the checkpoint redo location.
- During online backup, to enforce full page writes until the
backup is finished.
- To get the current WAL insert location from the built-in function.
WALWriteLock
This lock protects writing transaction log buffer data to the WAL file.
After taking this lock, all transaction log buffer data up to a predecided point is flushed.
It is used in the following places:
- Flush of the transaction log, which can be triggered by a commit, a
flush of data buffers, truncation of the commit log, and so on.
- During an xlog switch.
- When getting a new xlog buffer, if all buffers are already
occupied and not yet flushed.
- To get the time of the last xlog segment switch.
WAL file layout:
- Block 0: segment header, block header, WAL records 1, 2, 3
- Block 1: block header, WAL records 4, 5
- Block 2: block header, WAL records 5, 6, 7, 8
- ...
- Block 255: block header, WAL records m, n
Each WAL record has its own header. Record 5 appears in both Block 1
and Block 2 because a record can continue across a block boundary.
Async Commit
In this mode, a commit does not wait for its WAL data to reach disk; a
background WAL writer process flushes the WAL data at a predefined
interval.
To protect against partial writes of a data page, the first WAL record
affecting a given page after a checkpoint contains a copy of the
entire page, and replay restores that page copy instead of redoing
the update.
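A minimal Python sketch of this idea follows. It assumes a tiny 16-byte page and hypothetical helpers `log_change` and `replay`; the real record formats are not modelled.

```python
PAGE_SIZE = 16  # tiny page size, purely for illustration


def log_change(wal, logged_since_checkpoint, page_id, page, offset, data):
    """Apply a change to the page and append the matching WAL record."""
    page[offset:offset + len(data)] = data
    if page_id not in logged_since_checkpoint:
        # First change to this page since the checkpoint: log the whole page.
        wal.append(("full_page", page_id, bytes(page)))
        logged_since_checkpoint.add(page_id)
    else:
        # Later changes only need the modified bytes.
        wal.append(("delta", page_id, offset, bytes(data)))


def replay(wal, pages):
    """Recovery: restore full-page copies, redo deltas on top of them."""
    for record in wal:
        if record[0] == "full_page":
            _, page_id, image = record
            pages[page_id] = bytearray(image)  # restore, do not redo
        else:
            _, page_id, offset, data = record
            pages[page_id][offset:offset + len(data)] = data
    return pages


wal = []
logged = set()
page = bytearray(PAGE_SIZE)
log_change(wal, logged, 7, page, 0, b"AAAA")
log_change(wal, logged, 7, page, 4, b"BBBB")

# Simulate a torn write: the on-disk copy of page 7 holds garbage.
on_disk = {7: bytearray(b"\xff" * PAGE_SIZE)}
recovered = replay(wal, on_disk)
```

Because replay starts from the logged page copy, the damaged on-disk contents never influence the recovered page.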
Each WAL record contains a CRC; checking the validity of the
record's CRC detects a partial write.
Each WAL page contains a magic number, which is validated after
each page is read.
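Both checks can be sketched as follows. The layout here is an assumption made for illustration only: a record is a 4-byte length and a 4-byte CRC followed by the payload, `zlib.crc32` stands in for whichever CRC PostgreSQL uses, and `PAGE_MAGIC` is a made-up value.

```python
import struct
import zlib

PAGE_MAGIC = 0xABCD  # hypothetical magic value for this sketch


def make_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload


def record_is_valid(record: bytes) -> bool:
    """A record is valid if it is complete and its CRC matches."""
    if len(record) < 8:
        return False
    length, crc = struct.unpack("<II", record[:8])
    payload = record[8:8 + length]
    return len(payload) == length and zlib.crc32(payload) == crc


def make_page(records) -> bytes:
    """Prefix the records with a 2-byte page magic number."""
    return struct.pack("<H", PAGE_MAGIC) + b"".join(records)


def page_is_valid(page: bytes) -> bool:
    return len(page) >= 2 and struct.unpack("<H", page[:2])[0] == PAGE_MAGIC


good = make_record(b"insert row 42")
torn = good[:-3]                                # partial write: tail missing
corrupt = good[:-1] + bytes([good[-1] ^ 0xFF])  # last byte damaged
page = make_page([good])
```

During recovery, the first record that fails such a check marks the end of the usable log.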
Key Structures Used in PostgreSQL
XLogRecord
Fixed-size log record header that sits at the beginning of each log
record.
XLogRecData
Describes a chunk of data to be included in a record. When the buffer
it refers to is backed up, the data pointed to by this XLogRecData
struct is not inserted into the XLOG record.
BkpBlock
Advantages/Disadvantages Of PG Implementation
Advantages
1. One of the advanced features of PostgreSQL is its ability to perform
transactional DDL through its Write-Ahead Log design.
2. Removing the hole (unused space) from a data page before writing it
to WAL reduces I/O when pages are not full.
3. The WAL data written for Insert and Delete operations is smaller than
in systems that keep UNDO (Oracle).
4. During Async Commit, writing data only in blocks uses less
I/O bandwidth.
5. Keeping the Log Sequence Number on each page means that, during a
dirty page flush, the Buffer Manager waits for a WAL flush only when
necessary.
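Advantage 2 can be illustrated with a short Python sketch. It assumes a toy 64-byte page whose unused space sits between the header area and the tuple data; `strip_hole` and `restore_hole` are hypothetical helper names, not PostgreSQL functions.

```python
PAGE_SIZE = 64  # tiny page size for illustration


def strip_hole(page: bytes, hole_offset: int, hole_length: int) -> bytes:
    """Return the page image without the unused region."""
    return page[:hole_offset] + page[hole_offset + hole_length:]


def restore_hole(image: bytes, hole_offset: int, hole_length: int) -> bytes:
    """Rebuild the full page by re-inserting zero bytes for the hole."""
    return image[:hole_offset] + bytes(hole_length) + image[hole_offset:]


header_and_pointers = b"H" * 10
tuples = b"T" * 14
hole_length = PAGE_SIZE - len(header_and_pointers) - len(tuples)
page = header_and_pointers + bytes(hole_length) + tuples

# Only the non-hole bytes would be written to WAL.
image = strip_hole(page, len(header_and_pointers), hole_length)
rebuilt = restore_hole(image, len(header_and_pointers), hole_length)
```

In this example only 24 of the 64 bytes are logged, and the page is rebuilt exactly from the image plus the hole's offset and length.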
Disadvantages
1. Flushing data pages during Commit can be expensive.
2. An Update operation writes the whole row to WAL even if only
1 or 2 columns are modified. This can increase overall
WAL traffic.
3. During Async Commit, every tuple visibility check has to
consult the CLOG buffer or file, which is costly.
4. Calculating the CRC for each WAL record can be costly,
especially for full data page writes.
Redo Implementation in Oracle
Redo log file layout:
- Block 0: file header
- Block 1: redo header
- Block 2: redo record 1
- Block 3: redo records 2 & 3
- Block 4: redo records 3 & 4
- ...
- Block M: redo record N
Redo Records
Each redo record contains the undo and redo for an atomic change.
Layout: Redo Record Header, Change #1, Change #2, Change #3, ..., Change #N
Redo record header fields include:
- Thread: thread number
- RBA: redo byte address
- LEN: length of record in bytes
- SCN: System Change Number
- Date and time of change
Change Vector
For example, the figure shows a change vector made up of a Change
Header, a Length Vector, and Change Record 1 through Change Record 7,
with the lengths 16, 20, 48, 28, 29 and 10 shown alongside.
Fields include:
- CHANGE: change number
- TYP: change type
- CLS: class
- AFN: absolute file number
- DBA: data block address
- SCN: System Change Number
- SEQ: sequence number
- OP: operation code
Transactions
Statements and the redo log entries they generate:

UPDATE t1
SET c2 = 101
WHERE c1 = 1;
    Undo Header   OP 5.2
    Undo          OP 5.1    Slot 0   c2 = 100
    Redo          OP 11.5   Slot 0   c2 = 101

UPDATE t1
SET c2 = 201
WHERE c1 = 2;
    Undo          OP 5.1    Slot 1   c2 = 200
    Redo          OP 11.5   Slot 1   c2 = 201

COMMIT;
    Commit        OP 5.4

Undo Header: SLOT 0, STATUS 10, 9
Undo Block: Undo Slot 0 c2 = 100; Undo Slot 1 c2 = 200
Data Block (columns SLOT, C1, C2): C2 changes from 100 to 101 and from 200 to 201
Undo and redo generated per statement (Slot 4 in each case)

INSERT
    Undo: OP 5.1 (11.1) Delete Row Piece - DRP, Slot 4
    Redo: OP 11.2 Insert Row Piece - IRP, Slot 4: c0: 'AUS', c1: 100, c2: 4

UPDATE
UPDATE score
SET
runs = 104,
wickets = 5
WHERE team = 'AUS';
    Undo: OP 5.1 (11.1) Update Row Piece - URP, Slot 4: c1: 100, c2: 4
    Redo: OP 11.5 Update Row Piece - URP, Slot 4: c1: 104, c2: 5

DELETE
DELETE FROM score
WHERE team = 'AUS';
    Undo: OP 5.1 (11.1) Insert Row Piece - IRP, Slot 4: c0: 'AUS', c1: 104, c2: 5
    Redo: OP 11.3 Delete Row Piece - DRP, Slot 4
Statements
-- Statement #1
INSERT INTO t1 VALUES (1);
    HEADER    5.2
    UNDO #1   5.1
    REDO #1   11.2
-- Statement #2
INSERT INTO t1 VALUES (2);
    UNDO #2   5.1
    REDO #2   11.2
-- Statement #3
INSERT INTO t1 VALUES (3);
    UNDO #3   5.1
    REDO #3   11.2
COMMIT;
    COMMIT    5.4
Oracle writes the checksum in the header of the block. Oracle uses
the checksum to detect corruption in a redo log block.
LGWR Process
The log writer process (LGWR) writes one contiguous portion of the
buffer to disk. LGWR writes:
- A commit record when a user process commits a transaction
- Redo log buffers:
  - every three seconds
  - when the redo log buffer is one-third full
  - when a DBWn process writes modified buffers to disk, if
    necessary
In times of high activity, LGWR can write to the redo log file using
group commits.
After the first transaction's entries are written to the redo log file,
the entire list of redo entries of waiting transactions (not yet
committed) can be written to disk in one operation, which requires less
I/O than handling each transaction's entries individually.
When the redo log buffer is flushed while a block is only partially
filled, the unused space in that block is wasted in the redo log file.
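The effect of a group commit can be sketched with a toy log writer in Python. The class `ToyLogWriter` and its methods are hypothetical; the sketch only counts how many write operations are needed when several transactions' redo entries are written together.

```python
class ToyLogWriter:
    def __init__(self):
        self.pending = []      # redo entries waiting in the log buffer
        self.log_file = []     # stands in for the redo log file
        self.write_calls = 0   # number of physical write operations

    def add_redo(self, txn_id, entry):
        self.pending.append((txn_id, entry))

    def commit(self, txn_id):
        self.pending.append((txn_id, "COMMIT"))

    def write(self):
        """Write everything that is pending in a single operation."""
        if self.pending:
            self.log_file.extend(self.pending)
            self.pending = []
            self.write_calls += 1


lgwr = ToyLogWriter()
lgwr.add_redo(1, "update t1")
lgwr.commit(1)
# While the writer is busy, more transactions queue up their redo.
lgwr.add_redo(2, "insert t1")
lgwr.commit(2)
lgwr.add_redo(3, "delete t1")  # transaction 3 has not committed yet
lgwr.write()
```

Here five redo entries from three transactions reach the log file in one write, instead of one write per commit.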
Advantages/Disadvantages Of Oracle Implementation
Advantages
Disadvantages
1. A lot of space can be wasted in the redo log
files during high database activity.
2. Redo for Insert and Delete SQL statements is
larger, because each redo record also carries the
undo for the change.
Improvements in PostgreSQL
1. For an Update operation, the amount of WAL can be reduced
by writing only the changed column values and reconstructing the
full row during recovery.
2. Flushing data page contents during Commit in the main user process
is costly; other databases do it in a background process.
3. A concept similar to group commits could be introduced in the WAL
writer, which can improve performance under a high volume of
transactions.
4. Improve the tuple visibility logic for setting the transaction status
in a tuple during Asynchronous Commits.
5. Avoid writing the same redo block again and again when the
transaction data is small.
Detailed method for one of the improvements
Method-1
Send only the changed data to WAL and reconstruct the tuple during
recovery.
Reconstruction needs the old tuple data and the changed data of the
new tuple to rebuild the row at recovery time.
It is better to apply this method when the old and new tuples are on
the same page; otherwise recovery needs extra I/O.
Method-1 Contd..
UPDATE foo SET col2 = 100 WHERE col1 = 1;
will generate diff instructions (assuming 4 byte alignment for now)
COPY 4 (bytes from old to new tuple)
IGNORE 4 (bytes on old tuple)
ADD 4 (bytes from new tuple)
COPY 90 (bytes from old to new tuple)
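How such instructions rebuild the new tuple during recovery can be sketched in Python. The byte layout is an assumption made for illustration: two 4-byte little-endian integers followed by 90 bytes of remaining data, with no tuple header; `apply_diff` is a hypothetical helper.

```python
import struct


def apply_diff(old_tuple: bytes, instructions) -> bytes:
    """Rebuild the new tuple from the old tuple and the diff instructions."""
    new_tuple = bytearray()
    pos = 0  # current read position in the old tuple
    for op, arg in instructions:
        if op == "COPY":      # copy arg bytes from old to new tuple
            new_tuple += old_tuple[pos:pos + arg]
            pos += arg
        elif op == "IGNORE":  # skip arg bytes of the old tuple
            pos += arg
        elif op == "ADD":     # append bytes that come from the new tuple
            new_tuple += arg
        else:
            raise ValueError(f"unknown instruction: {op}")
    return bytes(new_tuple)


rest = b"x" * 90
old_tuple = struct.pack("<ii", 1, 1) + rest  # col1 = 1, col2 = 1
diff = [
    ("COPY", 4),                      # col1 is unchanged
    ("IGNORE", 4),                    # old col2 is dropped
    ("ADD", struct.pack("<i", 100)),  # new col2 = 100
    ("COPY", 90),                     # remaining columns are unchanged
]
new_tuple = apply_diff(old_tuple, diff)
```

Only the instruction list and the 4 added bytes need to be logged; the other 94 bytes come from the old tuple.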
Method-1 Contd..
With a terse instruction set the diff format can encode the diff
instructions in a small number of bytes, considerably reducing the
WAL volume.
Method-2
This method of reducing WAL applies only if the table has fixed-length
columns (int, char, float).
Reconstruction needs the old tuple data and the changed data of the
new tuple to rebuild the row at recovery time.
It is better to apply this method when the old and new tuples are on
the same page; otherwise recovery needs extra I/O.
Method-2 Contd..
Log the changed data in (offset, length, value) format so that the row
can be reconstructed during recovery.
Because the log format covers only fixed-length columns, recovery can
apply it directly at the recorded locations to generate the new
tuple.
This method can also be extended to log in the described format
whenever all changed columns come before any variable-length
column.
Method-2 Contd..
For example:
CREATE TABLE foo (col1 integer, col2 integer, col3 varchar(50),
col4 varchar(50));
INSERT INTO foo VALUES (1, 1, repeat('abc',15), repeat('def',15));
UPDATE foo SET col2 = 100 WHERE col1 = 1;
Offset and length can be stored in 2-3 bytes, given that this will be
applied to tuples shorter than 2000 bytes.
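The (offset, length, value) format can be sketched in Python for the example above. The encoding is an assumption: a 2-byte offset and a 2-byte length per change, applied to a simplified tuple image without a tuple header or varchar length prefixes; `encode_change` and `apply_changes` are hypothetical helpers.

```python
import struct


def encode_change(offset: int, value: bytes) -> bytes:
    """Encode one change as 2-byte offset, 2-byte length, then the value."""
    return struct.pack("<HH", offset, len(value)) + value


def apply_changes(old_tuple: bytes, log: bytes) -> bytes:
    """Recovery side: overwrite the recorded locations in a copy of the tuple."""
    new_tuple = bytearray(old_tuple)
    pos = 0
    while pos < len(log):
        offset, length = struct.unpack_from("<HH", log, pos)
        pos += 4
        new_tuple[offset:offset + length] = log[pos:pos + length]
        pos += length
    return bytes(new_tuple)


# Simplified image of the row (1, 1, 'abc' * 15, 'def' * 15).
old_tuple = struct.pack("<ii", 1, 1) + b"abc" * 15 + b"def" * 15
# UPDATE foo SET col2 = 100: col2 starts at byte offset 4 in this layout.
log = encode_change(4, struct.pack("<i", 100))
new_tuple = apply_changes(old_tuple, log)
```

In this sketch the update is logged in 8 bytes, compared with the 98 bytes of the whole simplified tuple image.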
Thank You