Journaling or write-ahead logging
Last Updated: 12 Jul, 2025
Journaling, or write-ahead logging, is a technique for solving the problem of file system inconsistency in operating systems. Inspired by database management systems, this method first writes a summary of the actions to be performed into a "log" before actually applying them to the disk; hence the name "write-ahead logging". After a crash, the OS can simply check this log and pick up from where it left off. This avoids the multiple disk scans needed to repair inconsistency with a tool such as FSCK. Good examples of systems that implement data journaling include the Linux ext3 and ext4 file systems and Windows NTFS.
Data Journaling: The log is stored in a simple data structure called the journal. The figure below shows its structure, which comprises three components.
- TxB (Transaction Begin Block): This contains the transaction ID, or the TID.
- Inode, Bitmap and Data Blocks (Metadata): These three blocks contain a copy of the contents of the blocks to be updated in the disk.
- TxE (Transaction End Block): This simply marks the end of the transaction identified by the TID.
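The layout above can be sketched in a few lines of Python. This is purely illustrative: the record fields and `make_transaction` helper are assumptions, not a real on-disk format, but the ordering (TxB, three block copies, TxE) follows the structure described.

```python
# Hypothetical sketch of one journal transaction record. TxB, TxE and
# the TID come from the article; everything else is illustrative.

def make_transaction(tid, inode_block, bitmap_block, data_block):
    """Build the on-log layout: TxB, three block copies, then TxE."""
    return [
        {"type": "TxB", "tid": tid},                           # begin: carries the TID
        {"type": "inode",  "tid": tid, "payload": inode_block},   # copy of inode block
        {"type": "bitmap", "tid": tid, "payload": bitmap_block},  # copy of bitmap block
        {"type": "data",   "tid": tid, "payload": data_block},    # copy of data block
        {"type": "TxE", "tid": tid},                           # end: same TID marks commit
    ]

tx = make_transaction(7, b"inode...", b"bitmap...", b"data...")
```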
As soon as an update is requested, it is written to the log, and only afterwards to the file system. Once all of these writes succeed, we say we have reached the checkpoint and the update is complete. What if a crash occurs during journaling? One could argue that journaling itself is not atomic, so how does the system handle an un-checkpointed write? To handle this scenario, journaling happens in two steps: first the TxB and the following three blocks are written (these writes can be issued together), and only then is the TxE written. The process can be summarized as follows.
- Journal Write: Write TxB, inode, bitmap and data block contents to the journal (log).
- Journal Commit: Write TxE to the journal (log).
- Checkpoint: Write the contents of the inode, bitmap and data block onto the disk.
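The three steps above can be sketched as follows. This is a hedged, in-memory model, with `log` and `disk` as simple stand-ins for the journal and the file system; a real file system would also wait for each step's writes to reach the disk before starting the next.

```python
# Sketch of the three-step journal protocol. The ordering is the point:
# TxE is written only after the journal write, and the checkpoint only
# after the commit. `log` (a list) and `disk` (a dict mapping block
# address -> contents) are illustrative stand-ins.

def journal_update(log, disk, tid, updates):
    # Step 1: journal write -- TxB plus copies of all blocks to update.
    log.append(("TxB", tid))
    for addr, content in updates.items():
        log.append(("blk", tid, addr, content))
    # (Wait here for the above writes to complete on a real disk.)

    # Step 2: journal commit -- once TxE lands, the transaction is durable.
    log.append(("TxE", tid))

    # Step 3: checkpoint -- write the blocks to their final locations.
    for addr, content in updates.items():
        disk[addr] = content

log, disk = [], {}
journal_update(log, disk, 1, {10: b"inode", 20: b"bitmap", 30: b"data"})
```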
A crash may occur at different points during journaling. If a crash occurs during step 1 or step 2, i.e. before the TxE reaches the disk, we can simply skip this transaction altogether and the file system stays consistent, because none of its final on-disk locations have been touched. If a crash occurs after the commit but during step 3, the transaction has been fully logged but may not have been fully checkpointed: we cannot be sure which of the three blocks (inode, bitmap and data block) were actually updated before the crash. In this case, the recovery code scans the log for committed transactions and replays them in order. This may repeat disk writes that had already completed, but the writes are idempotent, so consistency is preserved. This process is called redo logging.
Using the Journal as a Circular Buffer: Since many transactions are made, the journal can fill up. To address this, the journal is used as a circular buffer in which newer transactions reuse the space freed by older, checkpointed ones. The figure below shows an overall view of the journal, with tr1 as the oldest transaction and tr5 the newest.
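Redo logging at recovery time can be sketched as below. The record tuples are illustrative, not a real on-disk format; the key idea is that only transactions with a matching TxE are replayed, and uncommitted ones are skipped.

```python
# Hedged sketch of redo logging during crash recovery: replay every
# committed transaction (one whose TxE made it to the log) and skip
# any transaction that was never committed.

def recover(log, disk):
    # A transaction is committed iff its TxE record is in the log.
    committed = {rec[1] for rec in log if rec[0] == "TxE"}
    for rec in log:
        if rec[0] == "blk" and rec[1] in committed:
            _, tid, addr, content = rec
            disk[addr] = content  # redo: idempotent, safe to repeat

# Crash scenario: tx 1 committed but not checkpointed; tx 2 never committed.
log = [("TxB", 1), ("blk", 1, 10, b"inode"), ("TxE", 1),
       ("TxB", 2), ("blk", 2, 20, b"data")]
disk = {}
recover(log, disk)   # only tx 1's block is written back
```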
The super block maintains pointers to the oldest and the newest transactions in the journal. As soon as a transaction is checkpointed, its journal space is marked "free" and the super block's oldest pointer is advanced to the next transaction.
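A minimal sketch of this circular reuse, assuming a fixed number of journal slots and a classic ring-buffer design that keeps one slot empty to distinguish full from empty; the class and method names are hypothetical.

```python
# Sketch of the journal as a circular buffer. The `oldest` and `newest`
# indices model the super block's pointers; freeing a checkpointed
# transaction advances `oldest`, letting its slot be reused.

class CircularJournal:
    def __init__(self, nslots):
        self.slots = [None] * nslots
        self.oldest = 0   # super-block pointer: oldest live transaction
        self.newest = 0   # super-block pointer: next free slot

    def append(self, tx):
        if (self.newest + 1) % len(self.slots) == self.oldest:
            raise RuntimeError("journal full: checkpoint and free first")
        self.slots[self.newest] = tx
        self.newest = (self.newest + 1) % len(self.slots)

    def free_oldest(self):
        # Called once the oldest transaction has been checkpointed.
        self.slots[self.oldest] = None
        self.oldest = (self.oldest + 1) % len(self.slots)

j = CircularJournal(4)
j.append("tr1"); j.append("tr2"); j.append("tr3")
j.free_oldest()   # tr1 checkpointed; its slot can now be reused
j.append("tr4")   # wraps around into the freed space
```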
The benefits of journaling, or write-ahead logging, in file systems are as follows:
- Improved Recovery Time: It ensures quick recovery after a crash, because all actions are logged before being written to disk and are available for examination. This eliminates the need for lengthy disk scans or full consistency checks.
- Enhanced Data Integrity: It ensures data integrity by maintaining the consistency of the file system. By writing the actions to the journal before committing them to the disk, the system can ensure that updates are complete and recoverable. In case of a crash, the system can recover by referring to the journal and redoing any incomplete transactions.
- Reduced Disk Scans: It minimizes the need for full disk scans to fix file system inconsistencies. Instead of scanning the entire disk to identify and repair inconsistencies, the system can rely on the journal to determine the state of the file system and apply the necessary changes. This leads to faster recovery and reduced overhead.