Ext4 Forensics: Journaling

 



Modern filesystems commonly employ journaling to safeguard data integrity. A journal acts as a write-ahead log: it records pending changes to file-system metadata (and, depending on the mode, data) before those changes are applied to their final locations on disk. This design ensures that, after a sudden crash or power loss, the file system can be quickly returned to a consistent state by replaying the journal during the next mount. In ext4, the journal’s primary role is crash recovery. When the file system is found to be “out of sync” on boot, the kernel (via the JBD2 layer) replays every transaction that was committed to the journal but not yet written to its final location, and discards incomplete ones. This prevents corruption that would otherwise require lengthy fsck scans.


Journaling itself is marked in the ext4 superblock as a compatible feature (EXT4_FEATURE_COMPAT_HAS_JOURNAL), so older kernels and tools can still mount a cleanly unmounted volume even if they do not understand the journal. The journal can also be located on another device. In that case, the main file-system superblock sets s_journal_inum to zero and stores the UUID of the external journal device in s_journal_uuid. The external device itself carries its own superblock with the incompatible feature flag EXT4_FEATURE_INCOMPAT_JOURNAL_DEV set. This arrangement keeps the journal data entirely off the main volume, freeing space and allowing the journal to reside on faster storage such as an SSD or RAID array.


The internal journal is a regular (but hidden, system-reserved) file that uses inode number 8 in nearly all ext4 filesystems with journaling enabled. You can see this in tune2fs -l output under "Journal inode: 8". While inode 8 is the default and overwhelmingly common case, the journal inode is whatever the superblock field s_journal_inum names, so a different value is theoretically possible (via special tools or corruption). The superblock holds the journal-related fields s_journal_uuid (offset 0xD0), s_journal_inum (0xE0), s_journal_dev (0xE4), and a backup array s_jnl_blocks[17] (0x10C). The 17 entries (68 bytes) of s_jnl_blocks replicate the journal inode's i_block[] array and its size fields, so that the journal can still be located for bootstrapping and recovery when the actual inode is inaccessible. The journal is not a special on-disk structure; it is just an ordinary inode plus data blocks, marked as reserved (no directory entries point to it, and it is protected by the kernel). The journal is often large enough (default 128 MB on bigger filesystems, but adjustable) to fill or nearly fill one block group. mke2fs (and mkfs.ext4) places it roughly in the middle of the device when possible to reduce average seek time for journal I/O on rotational (HDD) disks; this matters less on SSDs.
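As a minimal sketch of how those superblock fields could be read from a raw image, the following Python function unpacks them from a 1024-byte superblock buffer. The function name and the returned dictionary keys are illustrative choices, not part of any existing tool; the offsets are the ones quoted above, and the main ext4 superblock is little-endian.

```python
import struct
import uuid

EXT4_MAGIC = 0xEF53            # s_magic, stored at offset 0x38
COMPAT_HAS_JOURNAL = 0x0004    # EXT4_FEATURE_COMPAT_HAS_JOURNAL

def read_journal_fields(sb: bytes) -> dict:
    """Extract journal-related fields from a raw ext4 superblock.

    `sb` is assumed to be the superblock itself, i.e. the bytes read
    from offset 1024 of the volume (hypothetical caller responsibility).
    """
    if len(sb) < 0x150:
        raise ValueError("superblock buffer too short")
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (compat,) = struct.unpack_from("<I", sb, 0x5C)
    journal_inum, journal_dev = struct.unpack_from("<II", sb, 0xE0)
    return {
        "has_journal": bool(compat & COMPAT_HAS_JOURNAL),
        "journal_uuid": uuid.UUID(bytes=bytes(sb[0xD0:0xE0])),
        "journal_inum": journal_inum,   # 8 for an internal journal, 0 if external
        "journal_dev": journal_dev,
        # Backup of the journal inode's block map and size fields.
        "jnl_blocks": struct.unpack_from("<17I", sb, 0x10C),
    }
```

To use it on an image, one would seek to byte 1024 of the volume, read 1024 bytes, and pass them in.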


The journaling process follows a strict commit protocol. All pending updates are first written to the journal as descriptor blocks, the journaled data blocks (plus any revocation blocks), and finally a commit block. Only after the commit block is safely on disk does ext4 consider the transaction complete and begin writing the changes to their permanent locations. Ext4’s journal is a fixed-size, circular log, whether stored internally (as the hidden inode 8) or externally. Once the end of the journal is reached, new transactions wrap around and overwrite the oldest entries that have already been written to their final locations (checkpointed). Because journaling operates at the block level, even a tiny change (such as updating a single inode) causes the entire file-system block containing that inode (typically 4 KB) to be copied into the journal. This design simplifies recovery but also creates a rich forensic artifact.
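The wrap-around of the circular log can be illustrated with a small helper. This is a sketch under the assumption that usable log blocks run from a first log block up to (but not including) the total journal length, which correspond to the s_first and s_maxlen fields of the journal superblock; the function name is hypothetical.

```python
def next_journal_block(current: int, first: int, maxlen: int) -> int:
    """Return the journal block that follows `current` in the circular log.

    `first` is the first usable log block and `maxlen` the total number of
    blocks in the journal; block 0 (the journal superblock) is never reused.
    """
    nxt = current + 1
    if nxt >= maxlen:
        # Wrap back to the first log block, skipping the superblock area.
        nxt -= maxlen - first
    return nxt
```

With a 4,096-block journal whose log starts at block 1, the block after 4095 is block 1 again.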


For digital forensic analysts, the ext4 journal is a time machine. Because it retains copies of recently modified metadata blocks (often including inodes, directory entries, and extent trees), investigators can recover deleted files whose inode copies have not yet been overwritten, reconstruct earlier versions of files, and build precise timelines of system activity. Although the journal is relatively small (default sizes range from 128 MB to 1 GB) and older entries are eventually recycled, it frequently holds evidence that no longer exists anywhere else on the live file system.

There are three journaling modes available in ext4:


  • Writeback mode (data=writeback): Only metadata is journaled. File data can be written to disk at any time (before, during, or after the metadata is committed to the journal), with no guarantees about ordering. This is the least reliable mode for data integrity. In a crash:
    • Metadata remains consistent (filesystem structure is protected).
    • File contents can become corrupted or show stale data (e.g., a partially appended file might appear larger than the data actually written, with garbage at the end, or old file contents could unexpectedly reappear after recovery). This mode offers the best performance (no extra data flushes), but it is riskier and not recommended for most use cases.
  • Ordered mode (data=ordered, the default in ext4): Only metadata is journaled, but the filesystem ensures that associated file data blocks are written to disk before the corresponding metadata is committed to the journal. This provides a logical ordering guarantee without journaling the data itself. In a crash:
    • If a file was being appended to, incomplete new data is typically discarded (the file reverts to its old size and content, avoiding garbage at the end).
    • For overwrites (replacing existing content), corruption is possible: the file can end up in a half-updated intermediate state (some old blocks, some new, mixed unpredictably, especially if disk hardware reorders writes). Neither the fully old nor the fully new version may be recoverable, as old data is not preserved anywhere. This mode balances good performance (faster than full journaling) with reasonable protection against many common corruption scenarios: better than writeback, but not as safe as journal mode.
  • Journal mode (data=journal, sometimes called full or data journaling): Both metadata and file data are written to the journal first, then (after commit) to their final locations on the filesystem. In a crash, journal replay ensures files end up with either the old complete version or the new complete version, with no torn or intermediate states and no garbage. This provides the strongest data integrity. Because data is written twice (journal plus final location), it is usually the slowest mode, especially for write-heavy workloads. However, if the journal is on a fast device or has sufficient space, sequential journal writes can sometimes offer better throughput in specific scenarios.


Forensic examination of a system can be affected by journaling. In full data journaling mode (data=journal), both metadata and file data blocks are copied to the journal before being written to their final location. This creates a temporary "backup" of old versions or deleted/overwritten content in the journal, which forensic tools can carve or analyze for recovery—even after deletion or overwrite on the main filesystem. Many forensic papers and tools exploit this for recovering previous file versions or deleted data without scanning the entire disk. In the other two modes, file data is usually not in the journal (so content recovery is limited or impossible from there), but metadata copies (inodes, directory entries, timestamps like mtime/ctime/atime, file names via directory records) often persist in the journal. This provides evidentiary value—e.g., historical MAC times, evidence of deletion timestamps, or prior file existence—even if the actual file contents are gone or unrecoverable.

Journaled filesystems perform journal replay (applying committed transactions to reach a consistent state) during mount if the filesystem appears "dirty" (e.g., after an improper shutdown). This replay can involve writing changes to the main filesystem structures, in some cases even on a supposedly read-only (ro) mount. This is a well-known forensic pitfall, because it modifies the evidence. Forensic best practice is to mount with the options ro and noload together (mount -o ro,noload) so that the journal is neither loaded nor replayed. Without noload, mounting can trigger unwanted writes (journal replay and the resulting superblock updates), altering timestamps, superblocks, or other metadata. If the journal is not replayed (e.g., when noload is used to preserve evidence), the mounted view of the filesystem may show inconsistencies: missing recent changes, incomplete operations, or an "incorrect" state compared to what the live system would have after recovery. This can make files appear missing, truncated, or with outdated metadata until the journal is applied. In forensics, examiners often deliberately avoid replay to preserve the "as-found" state, accepting potential inconsistencies in exchange for no alteration.
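Before mounting anything, an examiner can check from the raw image whether a replay would be attempted at all. The sketch below assumes the standard ext4 layout, in which s_feature_incompat sits at offset 0x60 of the little-endian superblock and the "needs_recovery" state is bit 0x0004 (EXT4_FEATURE_INCOMPAT_RECOVER); the function name is illustrative.

```python
import struct

INCOMPAT_RECOVER = 0x0004  # EXT4_FEATURE_INCOMPAT_RECOVER ("needs_recovery")

def needs_recovery(sb: bytes) -> bool:
    """Return True if the superblock says the journal must be replayed.

    `sb` is the raw ext4 superblock (the 1024 bytes at volume offset 1024).
    """
    (incompat,) = struct.unpack_from("<I", sb, 0x60)
    return bool(incompat & INCOMPAT_RECOVER)
```

If this returns True, a normal mount would replay the journal, so the image should only be mounted with noload (or analyzed without mounting).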


Ext4 Journal Layout

All filesystem metadata updates (and, in data=journal mode, data updates as well) are grouped into atomic transactions. Every administrative journal block carries the transaction ID in its header (h_sequence field), which serves as the sequence number for that transaction; the data blocks that follow a descriptor are raw copies and have no header of their own. Journal blocks are one of the following types:


  • Administrative (descriptor blocks, commit blocks, revocation blocks) — contain control information, tags, checksums, etc.
  • File system update data (data blocks) — contain the actual copies of filesystem blocks (usually metadata) being updated/journaled.


In total, a journal can contain five types of blocks. The first four are administrative: superblock, descriptor, commit, and revoke blocks. Each administrative block type holds information specific to its type, but all four share the same format in their first 12 bytes. The fifth type stores what is actually being journaled, which depends on the journal mode: in ordered and writeback modes these are copies of metadata blocks (such as the blocks holding the inodes being modified), while in journal mode (data=journal) they also include file content blocks.


A transaction typically begins with a descriptor block (block type 1), which has: (a) the common journal header including h_sequence (transaction ID); (b) an array of block tags listing the target on-disk block numbers (final locations) for each subsequent data block, plus flags/checksums/etc. Immediately after the descriptor come the actual metadata/data blocks (verbatim copies of the filesystem blocks being updated, as listed in the descriptor tags). The transaction ends with a commit block (block type 2), which also carries the same h_sequence value. This commit block acts as the "atomic seal" — its presence (and valid checksum) means the entire transaction is durable and can be replayed during recovery. The journal is a continuous circular log: after one transaction's commit block, the next transaction's descriptor block (or revocation block) follows immediately. Multiple transactions are chained in sequence.


Figure 1:  Ext4 Journal transaction overview


The ext4 filesystem (via the JBD2 journaling layer) begins its journal with a superblock that stores essential metadata, including the current transaction sequence number, the starting block of the valid log tail, and pointers to locate active transactions in the circular journal log. Due to the journal's circular nature, new transactions wrap around and overwrite old checkpointed ones, so the first valid descriptor block may appear anywhere after the superblock rather than at the beginning.

Each transaction starts with one or more descriptor blocks (containing tags that point to the on-disk locations of modified filesystem blocks), followed by the actual metadata and/or data blocks being journaled, and concludes with a commit block that includes checksums and a timestamp to confirm completion. If a descriptor block fills up during a large transaction, additional descriptor blocks follow, all sharing the same sequence number.

Revoke blocks are used during normal operation to list filesystem blocks that should not be replayed from earlier transactions, typically because a block was freed, overwritten, or reallocated. This prevents stale data from corrupting the filesystem during recovery. Revokes carry a sequence number, and during replay a block is skipped if it appears in a revoke record from a transaction with an equal or higher sequence number.

On crash or unclean shutdown, recovery scans the journal in multiple passes: first to find the log end, second to collect all revoke records into a table (mapping revoked blocks to transaction IDs), and third to replay only committed transactions (those with a valid commit block and matching checksums) while respecting the revoke table to avoid applying superseded changes. Incomplete transactions (those lacking a commit block or failing checksum validation) are simply discarded and not replayed. This returns the filesystem to a consistent state, and no new revoke blocks are created during recovery for aborted transactions.
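The three recovery passes can be sketched as a small simulation. This is not the kernel's code: transactions are modeled as plain dictionaries (a hypothetical representation with keys seq, committed, blocks, and revokes), checksum validation is folded into the committed flag, and sequence-number wrap-around is ignored.

```python
def replay_plan(transactions):
    """Return the (fs_block, seq) pairs that recovery would write, in order."""
    # Pass 1: find the end of the log, i.e. the leading run of
    # committed transactions. The first incomplete one ends the log.
    committed = []
    for txn in transactions:
        if not txn["committed"]:
            break
        committed.append(txn)

    # Pass 2: build the revoke table (block -> highest revoking sequence).
    revoke_table = {}
    for txn in committed:
        for block in txn.get("revokes", []):
            previous = revoke_table.get(block, txn["seq"])
            revoke_table[block] = max(previous, txn["seq"])

    # Pass 3: replay journaled blocks unless a revoke with an equal or
    # higher sequence number supersedes them.
    plan = []
    for txn in committed:
        for block in txn.get("blocks", []):
            if block in revoke_table and revoke_table[block] >= txn["seq"]:
                continue
            plan.append((block, txn["seq"]))
    return plan
```

For example, if transaction 5 journals blocks 292 and 300, transaction 6 revokes block 300, and transaction 7 journals block 300 again, only the copy from transaction 7 is replayed for block 300, and anything in an uncommitted transaction 8 is dropped.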


The journal data is all written in big-endian ordering. This is quite unusual in ext (and in file systems in general!). Every superblock, commit, revoke and descriptor block begins with the same 12-byte journal header. The structure of this header is shown in the table below.


| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 4 bytes | h_magic | jbd2 magic number, 0xC03B3998. |
| 0x04 | 4 bytes | h_blocktype | Block type of the current block (values listed below). |
| 0x08 | 4 bytes | h_sequence | The transaction ID that goes with this block. |

Values of h_blocktype:

  • 1 → Descriptor. This block precedes a series of data blocks that were written through the journal during a transaction.
  • 2 → Block commit record. This block signifies the completion of a transaction.
  • 3 → Journal superblock, v1. The default action of the first version of the journal is to record all data updates.
  • 4 → Journal superblock, v2. The second version records only metadata updates. A mount option can force a second version journal to record all data updates.
  • 5 → Block revocation records. This speeds up recovery by enabling the journal to skip writing blocks that were subsequently rewritten.
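A minimal sketch of decoding this 12-byte header in Python follows. The function name and the label strings are illustrative; the field layout, magic number, and big-endian byte order are as described above.

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCK_TYPES = {
    1: "descriptor",
    2: "commit",
    3: "superblock v1",
    4: "superblock v2",
    5: "revoke",
}

def parse_journal_header(block: bytes):
    """Return (block_type_name, sequence) or None if this is not an
    administrative block (for example, a journaled data block)."""
    if len(block) < 12:
        return None
    # Three big-endian 32-bit fields: h_magic, h_blocktype, h_sequence.
    magic, blocktype, sequence = struct.unpack_from(">III", block, 0)
    if magic != JBD2_MAGIC:
        return None
    return BLOCK_TYPES.get(blocktype, "unknown"), sequence
```

Scanning every journal block with a function like this is a simple way to locate descriptor, commit, and revoke blocks regardless of where the log currently starts.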


The journal superblock is the first block in the journal log, and it holds important metadata information. It is recorded as struct journal_superblock_s, which is 1024 bytes long. The superblock has the fields given in the following table.


| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 12 bytes | s_header | Common header identifying this as a superblock (see the preceding table). |
| 0x0C | 4 bytes | s_blocksize | Journal device block size. |
| 0x10 | 4 bytes | s_maxlen | Total number of blocks in this journal. |
| 0x14 | 4 bytes | s_first | First block of log information. |
| 0x18 | 4 bytes | s_sequence | First commit ID expected in log. |
| 0x1C | 4 bytes | s_start | Block number of the start of the log. If zero, the journal is clean. |
| 0x20 | 4 bytes | s_errno | Error value, as set by jbd2_journal_abort(). |

The remaining fields are only valid in a version 2 superblock.

| Offset | Size | Name | Description |
|---|---|---|---|
| 0x24 | 4 bytes | s_feature_compat | Compatible feature set. Only one possible value, 0x01, meaning that checksums are enabled. |
| 0x28 | 4 bytes | s_feature_incompat | Incompatible feature set (values listed below). |
| 0x2C | 4 bytes | s_feature_ro_compat | Read-only compatible feature set. There aren't any of these currently. |
| 0x30 | 16 bytes | s_uuid[16] | 128-bit UUID for journal. This is compared against the copy in the ext4 superblock at mount time. |
| 0x40 | 4 bytes | s_nr_users | Number of file systems sharing this journal. |
| 0x44 | 4 bytes | s_dynsuper | Location of dynamic superblock copy. (Not used) |
| 0x48 | 4 bytes | s_max_transaction | Limit of journal blocks per transaction. (Not used) |
| 0x4C | 4 bytes | s_max_trans_data | Limit of data blocks per transaction. (Not used) |
| 0x50 | 1 byte | s_checksum_type | Checksum algorithm type (e.g., CRC32C). |
| 0x51 | 3 bytes | s_padding2[3] | Padding for alignment. |
| 0x54 | 4 bytes | s_num_fc_blocks | Number of fast-commit blocks (newer feature). |
| 0x58 | 4 bytes | s_head | Current head of the journal (when empty). |
| 0x5C | 160 bytes | s_padding[40] | Padding (reserved). |
| 0xFC | 4 bytes | s_checksum | Checksum of the superblock (computed with this field zeroed). |
| 0x100 | 768 bytes | s_users[16*48] | An array of UUIDs for filesystems sharing the journal (up to 48 entries). |

Values of s_feature_incompat:

  • 0x01 - Journal has block revocation records.
  • 0x02 - Journal can deal with 64-bit block numbers.
  • 0x04 - Journal commits asynchronously.
  • 0x08 - Version 2 checksum.
  • 0x10 - Version 3 checksum.
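The most useful superblock fields for an examiner can be pulled out with a few struct calls. The following is a sketch only: the function name and dictionary keys are illustrative, it assumes a version 2 superblock, and it does not verify s_checksum.

```python
import struct
import uuid

JBD2_MAGIC = 0xC03B3998

def parse_journal_superblock(block: bytes) -> dict:
    """Decode selected fields of a JBD2 journal superblock (big-endian)."""
    # Header (3 fields) plus the six static fields, offsets 0x00-0x23.
    (magic, blocktype, _seq, blocksize, maxlen,
     first, sequence, start, errno) = struct.unpack_from(">9I", block, 0)
    if magic != JBD2_MAGIC or blocktype not in (3, 4):
        raise ValueError("not a journal superblock")
    compat, incompat, ro_compat = struct.unpack_from(">3I", block, 0x24)
    (nr_users,) = struct.unpack_from(">I", block, 0x40)
    (checksum,) = struct.unpack_from(">I", block, 0xFC)
    return {
        "version": 1 if blocktype == 3 else 2,
        "blocksize": blocksize,
        "maxlen": maxlen,
        "first": first,
        "sequence": sequence,       # first commit ID expected in the log
        "start": start,             # 0 means the journal is clean
        "clean": start == 0,
        "errno": errno,
        "feature_compat": compat,
        "feature_incompat": incompat,
        "feature_ro_compat": ro_compat,
        "uuid": uuid.UUID(bytes=bytes(block[0x30:0x40])),
        "nr_users": nr_users,
        "checksum": checksum,
    }
```

The start and sequence values tell you where the first valid transaction begins and which transaction ID it should carry.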


A descriptor block begins with journal_header_s, whose magic number and h_blocktype field (set to 1) identify the block as a descriptor block. Following the header is an array of journal block tags that store the final on-disk location of each data block in the transaction.


| Offset | Type | Description |
|---|---|---|
| 0x00 | journal_header_t | Common block header. |
| 0x0C | struct journal_block_tag_s or struct journal_block_tag3_s | Enough tags either to fill up the block or to describe all the data blocks that follow this descriptor block. |

The journal superblock must be consulted to process the descriptor blocks, specifically to determine whether the version 3 checksum and 64-bit block number flags are set. If JBD2_FEATURE_INCOMPAT_CSUM_V3 is set in the incompatible feature set, then journal_block_tag3_s is used; otherwise, journal_block_tag_s is used.

Old journal descriptor block tag structure (journal_block_tag_s)

| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 4 bytes | t_blocknr | Lower 32 bits of the location of where the corresponding data block should end up on disk. |
| 0x04 | 2 bytes | t_checksum | Truncated checksum of the journal UUID, the sequence number, and the data block. |
| 0x06 | 2 bytes | t_flags | Flags that go with the descriptor (values listed below). |
| 0x08 | 4 bytes | t_blocknr_high | Upper 32 bits of the target block number. This field is only present if the journal superblock indicates support for 64-bit block numbers. |

Values of t_flags:

  • 0x01 - On-disk block is escaped. The first four bytes of the data block just happened to match the jbd2 magic number. In this situation, the four bytes are cleared when the block is written to the journal.
  • 0x02 - This block has the same UUID as the previous one, therefore the UUID field is omitted.
  • 0x04 - The data block was deleted by the transaction.
  • 0x08 - This is the last tag in this descriptor block.

Version 3 journal descriptor block tag structure (journal_block_tag3_s)

| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 4 bytes | t_blocknr | Lower 32 bits of the location of where the corresponding data block should end up on disk. |
| 0x04 | 4 bytes | t_flags | Flags that go with the descriptor (values listed below). |
| 0x08 | 4 bytes | t_blocknr_high | Upper 32 bits of the location of where the corresponding data block should end up on disk. |
| 0x0C | 4 bytes | t_checksum | Checksum of the journal UUID, the sequence number, and the data block. |

Values of t_flags:

  • 0x01 - On-disk block is escaped. The first four bytes of the data block just happened to match the jbd2 magic number. In this situation, the four bytes are cleared when the block is written to the journal.
  • 0x02 - This block has the same UUID as the previous one, therefore the UUID field is omitted.
  • 0x04 - The data block was deleted by the transaction.
  • 0x08 - This is the last tag in this descriptor block.
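To show how the tags in a descriptor block might be walked, here is a sketch that handles only the version 3 layout (journal_block_tag3_s, 16 bytes per tag). It rests on several assumptions that should be checked against the journal superblock first: CSUM_V3 is set, a 16-byte UUID follows every tag that lacks the 0x02 flag, and the last 4 bytes of the block hold a checksum tail. The function name and dictionary keys are illustrative.

```python
import struct

FLAG_ESCAPE = 0x01
FLAG_SAME_UUID = 0x02
FLAG_DELETED = 0x04
FLAG_LAST_TAG = 0x08

HEADER_SIZE = 12   # common journal header
TAG3_SIZE = 16     # journal_block_tag3_s
UUID_SIZE = 16     # follows a tag when FLAG_SAME_UUID is not set
TAIL_SIZE = 4      # assumed checksum tail at the end of the block

def parse_tag3_descriptor(block: bytes, has_64bit: bool = True) -> list:
    """Return the tags of a descriptor block that uses version 3 tags.

    The n-th tag describes the n-th data block following the descriptor.
    """
    tags = []
    offset = HEADER_SIZE
    limit = len(block) - TAIL_SIZE
    while offset + TAG3_SIZE <= limit:
        blocknr, flags, blocknr_high, checksum = struct.unpack_from(
            ">IIII", block, offset)
        offset += TAG3_SIZE
        if has_64bit:
            blocknr |= blocknr_high << 32
        if not flags & FLAG_SAME_UUID:
            offset += UUID_SIZE  # skip the UUID stored after this tag
        tags.append({"fs_block": blocknr, "flags": flags,
                     "checksum": checksum})
        if flags & FLAG_LAST_TAG:
            break
    return tags
```

Each returned fs_block is the file-system block number that the corresponding journaled copy would be written to on replay.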


Data blocks are written verbatim to the journal immediately following a descriptor block. To prevent accidental misinterpretation during recovery, if the first four bytes of a data block happen to equal the JBD2 magic number (0xC03B3998), they are replaced with zeros on disk, and the escape flag (0x0001) is enabled in the descriptor block’s tag for that entry. During journal replay, the original magic value is then restored.
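When extracting journaled blocks by hand, the escape has to be undone in the same way. A minimal sketch (the function name is illustrative):

```python
import struct

JBD2_MAGIC_BYTES = struct.pack(">I", 0xC03B3998)
FLAG_ESCAPE = 0x01

def unescape_block(data: bytes, tag_flags: int) -> bytes:
    """Restore the original first four bytes of an escaped data block."""
    if tag_flags & FLAG_ESCAPE:
        return JBD2_MAGIC_BYTES + data[4:]
    return data
```

Blocks whose tag does not carry the escape flag are returned unchanged.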


The commit block serves as a marker confirming that an entire transaction has been fully written into the journal. Only after this commit block is successfully persisted to the journal can the associated data blocks be copied (checkpointed) to their permanent locations on the filesystem. The commit block does not guarantee that the data has already been written to its final locations; it only guarantees that the journal copy is complete and safe. In data=ordered mode (the default), file data is written to its final location before the metadata is journaled and the commit block is issued. In data=writeback mode, file data can be written at any time (no strict ordering). In data=journal mode, both data and metadata go through the journal first. The actual writing of journaled blocks to their home locations is called checkpointing, which can happen later (often lazily) to free up space in the journal. The commit block is described by struct commit_header, whose fields (listed below) occupy 60 bytes, although the commit record uses a full journal block.


| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 12 bytes | journal_header_s | Common header. |
| 0x0C | 1 byte | h_chksum_type | The type of checksum to use to verify the integrity of the data blocks in the transaction (values listed below). |
| 0x0D | 1 byte | h_chksum_size | The number of bytes used by the checksum. Most likely 4. |
| 0x0E | 2 bytes | h_padding[2] | Padding. |
| 0x10 | 32 bytes | h_chksum[JBD2_CHECKSUM_BYTES] | 32 bytes of space to store checksums. |
| 0x30 | 8 bytes | h_commit_sec | The time that the transaction was committed, in seconds since the epoch. |
| 0x38 | 4 bytes | h_commit_nsec | Nanoseconds component of the above timestamp. |

Values of h_chksum_type:

  • 0x01 - CRC32
  • 0x02 - MD5
  • 0x03 - SHA1
  • 0x04 - CRC32C
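Because the commit block carries a timestamp, it is a natural anchor for timeline work. The sketch below decodes the fields in the table; the function name and dictionary keys are illustrative, and the checksum area is returned raw rather than verified.

```python
import struct
from datetime import datetime, timezone

JBD2_MAGIC = 0xC03B3998
COMMIT_BLOCK = 2

def parse_commit_block(block: bytes) -> dict:
    """Decode a JBD2 commit block (big-endian, 60 bytes of fields)."""
    # Header (3 x u32), type, size, 2 pad bytes, 32 checksum bytes,
    # 64-bit seconds, 32-bit nanoseconds.
    (magic, blocktype, sequence, chksum_type, chksum_size,
     chksum, sec, nsec) = struct.unpack_from(">IIIBB2x32sQI", block, 0)
    if magic != JBD2_MAGIC or blocktype != COMMIT_BLOCK:
        raise ValueError("not a commit block")
    return {
        "sequence": sequence,
        "chksum_type": chksum_type,
        "chksum_size": chksum_size,
        "chksum": chksum,
        "commit_time": datetime.fromtimestamp(sec, tz=timezone.utc),
        "commit_nsec": nsec,
    }
```

The commit time is reported in UTC; converting it to local time is left to the examiner.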


A revocation block starts with the standard journal_header_t (magic number, block type 5 for JBD2_REVOKE_BLOCK, and sequence number) followed by a 4-byte r_count field; together these form a 16-byte revoke header. After the 16-byte header comes a variable-length array of revoked block numbers (4 bytes or 8 bytes each, depending on whether 64-bit block numbers are enabled). A revocation block always occupies one full journal block (typically 4 KiB, matching the filesystem block size), and the very end of the block usually contains a tail/checksum structure.


| Offset | Size | Name | Description |
|---|---|---|---|
| 0x00 | 12 bytes | journal_header_t | Common block header. |
| 0x0C | 4 bytes | r_count | Number of bytes used in this block. |
| 0x10 | 4 bytes or 8 bytes | blocks[0] | Blocks to revoke. |
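A revoke block can be decoded along the same lines. In this sketch (function name illustrative), r_count is taken to include the 16-byte header, and the caller is assumed to know from the journal superblock whether 64-bit block numbers are in use.

```python
import struct

JBD2_MAGIC = 0xC03B3998
REVOKE_BLOCK = 5
REVOKE_HEADER_SIZE = 16  # 12-byte common header + 4-byte r_count

def parse_revoke_block(block: bytes, has_64bit: bool):
    """Return (sequence, [revoked fs block numbers]) for a revoke block."""
    magic, blocktype, sequence, r_count = struct.unpack_from(">IIII", block, 0)
    if magic != JBD2_MAGIC or blocktype != REVOKE_BLOCK:
        raise ValueError("not a revoke block")
    entry_fmt = ">Q" if has_64bit else ">I"
    entry_size = struct.calcsize(entry_fmt)
    end = min(r_count, len(block))  # never read past the block itself
    revoked = []
    offset = REVOKE_HEADER_SIZE
    while offset + entry_size <= end:
        revoked.append(struct.unpack_from(entry_fmt, block, offset)[0])
        offset += entry_size
    return sequence, revoked
```

The returned sequence number is what decides which earlier journal copies of those blocks are skipped during replay.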


A revoke applies to transactions whose sequence number is equal to or less than the sequence number of the revoke record.

Let’s look at a journal from a file system. We can view the contents using debugfs as shown below.



We see the signature (0xC03B3998) in bytes 0x00-0x03, and bytes 0x04-0x07 show that this block has a type of 4, which is the version 2 superblock. Bytes 0x08-0x0B show the sequence number is 0, and bytes 0x0C-0x0F show the journal block size is 1,024 bytes (0x0400). Bytes 0x10-0x13 show that there are 4,096 blocks in the journal, and 0x14-0x17 show that the journal entries start in journal block 1. To identify the first transaction in the journal, we refer to bytes 0x18-0x1B (s_sequence) to see that the first sequence number is 5 (0x05), and bytes 0x1C-0x1F (s_start) show that it is in block 1. If s_start were 0, it would mean that the file system was cleanly unmounted and all transactions are complete; hence, there would be no valid transactions in the journal. In bytes 0x28-0x2B, we see that the incompatible features (0x12 = 0x02 + 0x10) are set. This means that the JBD2_FEATURE_INCOMPAT_64BIT and JBD2_FEATURE_INCOMPAT_CSUM_V3 features are set. Thus, the journal_block_tag3_s tag structure will be used in descriptor blocks. At bytes 0x30-0x3F, we see the 128-bit UUID for the journal: 7B4747CF2AEB4961A9A9C58C7652701D. At bytes 0xFC-0xFF, we see the checksum of the superblock: 0x497D.
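The feature decoding in that walkthrough is easy to reproduce. A short sketch, using the bit values from the s_feature_incompat table (the shortened flag names and the function name are illustrative):

```python
INCOMPAT_FLAGS = {
    0x01: "REVOKE",
    0x02: "64BIT",
    0x04: "ASYNC_COMMIT",
    0x08: "CSUM_V2",
    0x10: "CSUM_V3",
}

def decode_incompat(value: int) -> list:
    """Return the names of the incompatible features set in `value`."""
    return [name for bit, name in sorted(INCOMPAT_FLAGS.items())
            if value & bit]

print(decode_incompat(0x12))  # ['64BIT', 'CSUM_V3']
```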


We now examine the contents of journal block 1. Keep in mind that this is not file system block 1; this is the block inside the journal file. We can view it with jcat as shown below.



We see from the type value in bytes 0x04-0x07 that this is a descriptor block, and its sequence number is 5 (0x05). The first descriptor entry (tag) starts at byte 0x0C, and we can observe that the filesystem block being modified, and hence copied to the journal, is 292 (0x124). The flags field in bytes 0x10-0x13 is 0x00, which means that no special flags are set for this entry. We can examine the commit block next, but we first need to locate the commit block that carries sequence number 5, as follows.



We can examine journal block 5 for the commit block as seen in the figure below.



Bytes 0x04-0x07 show us that it is a commit block (0x02), and bytes 0x08-0x0B show us that its sequence number is 5 (0x05). Having the same sequence number 5 in the journal superblock, descriptor, and commit block is normal and healthy. It indicates that transaction 5 is properly recorded as a complete, committed transaction. The jls tool in TSK will display the contents of the journal. Here is the output for our forensic image.




