Ext4 Forensics: Extents

 


Earlier extended file systems (ext2 and ext3) used the traditional Unix-style mapping, in which each file's inode points to individual data blocks through direct pointers and single, double, and triple indirect blocks. This worked well for small-to-medium files on small disks but became inefficient as drive capacities grew into hundreds of gigabytes (and then terabytes) and large files such as videos, VM images, and databases became common. With indirect blocks, locating file data often requires traversing extra layers of indirection. Moreover, because the file system tracks every block separately, even when blocks are physically contiguous, the mapping structures grow very large, which adds metadata overhead and slows reads, writes, and lookups. Per-block tracking also makes it harder to keep allocations contiguous, which increases fragmentation; how severe that fragmentation becomes depends largely on how well the block allocator places new blocks. For these reasons, ext4 uses extents.


Ext4 Extents

Extents address these issues by describing a contiguous range of physical blocks with a single compact entry (start block + length). This dramatically reduces metadata size, minimizes indirection, improves I/O efficiency, and encourages contiguous allocation, which limits fragmentation. Extents are similar to cluster runs in the NTFS file system: each one specifies an initial block address and the number of blocks that make up the run. A fragmented file has multiple extents, but ext4 tries hard to keep files contiguous. The ext4 extent structure is shown in the figure below.


Figure 1: Ext4 extent structure. Originally appeared in Mathur et al. (2007)

In ext4, each extent is represented by a 96-bit (12-byte) structure stored within the inode or in an extent tree block. It contains a 32-bit logical block number (ee_block), which identifies the offset into the file at which the block run begins; a 16-bit length field (ee_len), whose most significant bit is used to flag uninitialized (preallocated) extents; and a 48-bit physical block number (ee_start_hi + ee_start_lo), which indicates the file system block on which the extent begins. ee_start_hi holds the high 16 bits of the physical block number, and ee_start_lo holds the low 32 bits. If ee_len is greater than 32768, the extent is uninitialized and its actual length is ee_len - 32768. If ee_len is 32768 or less, the extent is initialized and ee_len is its length. (The value 32768 itself has the most significant bit set but still denotes an initialized extent of maximum length.) An initialized extent can therefore cover up to 2¹⁵ = 32768 contiguous blocks, equivalent to 128 MiB with a 4 KiB block size, while an uninitialized extent can cover at most 32767 blocks. Uninitialized extents allow space to be reserved without zeroing the blocks immediately; when such a range is read, the file system returns zero-filled data to the application.
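The length rule above is easy to get wrong by one, so here is a minimal sketch of it in Python. The function name `decode_ee_len` and the 4 KiB block size are assumptions made for illustration; only the threshold of 32768 comes from the description above.

```python
EXT_INIT_MAX_LEN = 1 << 15  # 32768: largest length of an initialized extent


def decode_ee_len(ee_len: int) -> tuple[int, bool]:
    """Return (length_in_blocks, is_initialized) for a raw 16-bit ee_len value."""
    if not 0 <= ee_len <= 0xFFFF:
        raise ValueError("ee_len must fit in 16 bits")
    if ee_len <= EXT_INIT_MAX_LEN:
        return ee_len, True
    # Values above 32768 mark an uninitialized (preallocated) extent.
    return ee_len - EXT_INIT_MAX_LEN, False


BLOCK_SIZE = 4096  # assumed block size, for the size calculation only

print(decode_ee_len(0x3800))  # (14336, True)
print(decode_ee_len(0x8000))  # (32768, True)
print(decode_ee_len(0x8005))  # (5, False)
print(EXT_INIT_MAX_LEN * BLOCK_SIZE // 2**20, "MiB")  # 128 MiB
```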


An inode can store up to four extents directly in its i_block area (which repurposes the space used for block pointers in ext2/ext3). This suffices for smaller or mostly contiguous files. When a file becomes more fragmented or grows significantly and needs more than four extents, ext4 builds an extent tree whose root is stored in the inode. This is a high-fanout tree similar to a B+ tree, in which all leaves sit at the same depth: the root resides in the inode and starts with an extent header; intermediate (index) nodes contain pointers to child nodes; leaf nodes hold the actual extent descriptors, each pointing to a range of contiguous physical disk blocks. This organization enables efficient lookups and scales to very large or highly fragmented files while keeping metadata compact. Because ext4 tries to avoid fragmentation whenever possible, it is unusual for the tree to have many levels.


Figure 2: Ext4 extent tree structure. Originally appeared in Mathur et al. (2007).


A comparison of the ext3 and ext4 inode structures reveals that ext4 places its extent data in the same 60-byte region (i_block[EXT4_N_BLOCKS] at offset 0x28) previously occupied by ext3's block pointers. This 60-byte area begins with a 12-byte extent header (ext4_extent_header). The header consists of four 16-bit (2-byte) fields, eh_magic (the magic number 0xF30A), eh_entries (the number of valid entries following the header), eh_max (the maximum number of entries that can follow the header), and eh_depth (the depth of this node in the tree), followed by a 32-bit (4-byte) eh_generation field (the generation of the tree). The magic number identifies the extent format: it is fixed at 0xF30A in the ext4 kernel source code, and a different value could be used to distinguish a future, incompatible extent implementation. The eh_depth field determines which kind of entry follows the header: (i) depth > 0 means an interior (index) node containing ext4_extent_idx entries that point to child blocks; (ii) depth == 0 means a leaf node containing ext4_extent entries that describe contiguous runs of data blocks.


Offset  Size  Name           Description
0x00    0x2   eh_magic       Value 0xF30A.
0x02    0x2   eh_entries     Number of extent entries.
0x04    0x2   eh_max         Maximum number of entries.
0x06    0x2   eh_depth       Depth of this extent tree. If this value is 0, the extents point to data blocks. Any value greater than zero means that the entries point to other extent-tree blocks.
0x08    0x4   eh_generation  Generation of the tree.
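The header layout above maps directly onto a little-endian `struct` format string. The following is a minimal sketch, not a reference parser: the names `ExtentHeader` and `parse_extent_header` are made up for this example, and the sample bytes are synthetic (one entry, a maximum of four, depth 1, generation assumed to be zero).

```python
import struct
from typing import NamedTuple

EXT4_EXT_MAGIC = 0xF30A


class ExtentHeader(NamedTuple):
    magic: int
    entries: int
    max_entries: int
    depth: int
    generation: int


def parse_extent_header(buf: bytes, offset: int = 0) -> ExtentHeader:
    """Parse the 12-byte ext4_extent_header found at `offset` in `buf`."""
    if len(buf) - offset < 12:
        raise ValueError("need at least 12 bytes for an extent header")
    # Four little-endian 16-bit fields followed by one 32-bit field.
    hdr = ExtentHeader(*struct.unpack_from("<HHHHI", buf, offset))
    if hdr.magic != EXT4_EXT_MAGIC:
        raise ValueError(f"bad magic 0x{hdr.magic:04X}")
    return hdr


# Synthetic header bytes; note that 0xF30A appears on disk as 0A F3.
sample = bytes.fromhex("0af3" "0100" "0400" "0100" "00000000")
hdr = parse_extent_header(sample)
print(hdr)
print("index node" if hdr.depth > 0 else "leaf node")
```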


After the extent header (ext4_extent_header), the entries follow. These are:


  • ext4_extent_idx (index entries, 12 bytes each) when eh_depth > 0 — these point to lower-level nodes in the tree (other index or leaf blocks).
  • ext4_extent (actual extent entries, also 12 bytes each) when eh_depth == 0 — these are leaf nodes that point directly to file data blocks.


The extent index (ext4_extent_idx) serves as an intermediate layer in the extent tree hierarchy. Rather than directly referencing file data blocks, it points to another block that holds additional structures, which ultimately lead to the actual data through further pointers. These index structures are also 12 bytes in size, though only 10 bytes are actively used: The 4-byte ei_block field specifies the starting logical block number in the file that the index covers. The combined ei_leaf_lo (4 bytes) and ei_leaf_hi (2 bytes) fields form a 48-bit pointer to the file system block containing the next level of the tree (either another index node or a leaf node with actual extents). Because Ext4 limits physical block addressing to 48 bits, the high-order 16 bits are stored in ei_leaf_hi, with the remaining lower 32 bits in ei_leaf_lo.


Offset  Size  Name        Description
0x00    0x4   ei_block    This extent index covers blocks from this point onwards. A value of zero signifies the start of the file.
0x04    0x4   ei_leaf_lo  Lower 32 bits of the block containing the next extent node in the tree.
0x08    0x2   ei_leaf_hi  Higher 16 bits of the block containing the next extent node in the tree.
0x0A    0x2   ei_unused   Not in use.
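Combining the two halves of the child pointer is a common source of mistakes, so here is a small sketch of decoding an index entry. The function name `parse_extent_idx` is an assumption for illustration, and the sample entries are built synthetically with `struct.pack` rather than taken from a real dump.

```python
import struct


def parse_extent_idx(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (ei_block, child_block) for the 12-byte ext4_extent_idx at `offset`."""
    ei_block, leaf_lo, leaf_hi, _unused = struct.unpack_from("<IIHH", buf, offset)
    # The 48-bit block number is the high 16 bits followed by the low 32 bits.
    return ei_block, (leaf_hi << 32) | leaf_lo


# Synthetic entry: covers the file from logical block 0, child node at block 0x988130.
entry = struct.pack("<IIHH", 0, 0x00988130, 0, 0)
print(parse_extent_idx(entry))  # (0, 9994544)

# Synthetic entry with a non-zero high part, to show the 48-bit combination.
big = struct.pack("<IIHH", 4096, 0x00000010, 0x0001, 0)
print(parse_extent_idx(big))    # (4096, 4294967312)
```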


As stated earlier, a node whose extent header has a depth of 0 is a leaf and contains ext4_extent entries, which describe the blocks used for file content. Each ext4_extent includes (i) the ee_block field (4 bytes), which gives the starting logical block number covered by the extent; (ii) the ee_len field (2 bytes), which gives the number of blocks in the contiguous range (up to 32768 for an initialized extent; higher values indicate an uninitialized/preallocated extent); and (iii) the ee_start_hi and ee_start_lo fields, which combine to form the 48-bit starting physical block number (high 16 bits + low 32 bits). The first block of a file is always logical block 0, whatever its physical location on disk. The logical start block (ee_block) records where each extent belongs within the file, so that the fragments can be put back together in the correct order.


Offset  Size  Name         Description
0x00    0x4   ee_block     First logical file block of this extent
0x04    0x2   ee_len       The length of the extent in blocks
0x06    0x2   ee_start_hi  Higher 16 bits of the extent physical start block
0x08    0x4   ee_start_lo  Lower 32 bits of the extent physical start block
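A leaf entry can be decoded the same way. Note that, unlike the index entry, the high 16 bits come before the low 32 bits here. This is a sketch only: `Extent` and `parse_extent` are names invented for the example, and the input bytes are packed synthetically.

```python
import struct
from typing import NamedTuple


class Extent(NamedTuple):
    logical_start: int   # ee_block
    length: int          # decoded ee_len, in blocks
    physical_start: int  # 48-bit block number
    initialized: bool


def parse_extent(buf: bytes, offset: int = 0) -> Extent:
    """Parse the 12-byte ext4_extent leaf entry found at `offset`."""
    ee_block, ee_len, start_hi, start_lo = struct.unpack_from("<IHHI", buf, offset)
    initialized = ee_len <= 32768
    length = ee_len if initialized else ee_len - 32768
    return Extent(ee_block, length, (start_hi << 32) | start_lo, initialized)


# Synthetic entry: 0x3800 blocks starting at logical block 0, physical block 0xFCA800.
raw = struct.pack("<IHHI", 0, 0x3800, 0, 0x00FCA800)
print(parse_extent(raw))
# Extent(logical_start=0, length=14336, physical_start=16558080, initialized=True)
```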


For an Ext4 extent tree to be considered valid, it must follow strict structural rules:


  • Every node in the tree—including the root stored within the inode—begins with an extent header (ext4_extent_header).
  • When the tree depth exceeds zero, the inode's root contains extent indexes (the starting point of the tree).
  • Interior (index) nodes store one or more ext4_extent_idx entries, each directing to a child node.
  • These child pointers lead to leaf nodes (at depth 0), which hold the concrete ext4_extent entries that map logical file blocks to physical disk locations.
  • The tree's depth value (recorded in the header) decrements by one with each level traversed downward. The inode stores the maximum depth for the entire tree.
  • Within any leaf node, the ee_block fields of extents must appear in strictly ascending order, and the covered block ranges (accounting for each extent's length in ee_len) must not overlap.
  • In index nodes containing multiple entries, the ei_block values must also increase monotonically to enable efficient binary search.
  • The overall block ranges claimed by an index node should align with and be verifiable against the combined ranges present in its descendant leaf nodes.
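The ordering and non-overlap rule for leaf nodes can be checked with a single pass. The sketch below is an illustration rather than a complete validator: `check_leaf_extents` is a hypothetical helper, it takes already-decoded (ee_block, length) pairs, and it deliberately accepts gaps between extents, since sparse files leave holes.

```python
def check_leaf_extents(extents: list[tuple[int, int]]) -> list[str]:
    """Check (ee_block, length) pairs, given in on-disk order, for order and overlap.

    Returns a list of human-readable problems; an empty list means the leaf passes.
    """
    problems = []
    prev_end = None  # first logical block after the previous extent
    for i, (start, length) in enumerate(extents):
        if prev_end is not None and start < prev_end:
            problems.append(
                f"entry {i}: starts at {start}, before the previous extent ends at {prev_end}"
            )
        prev_end = start + length
    return problems


good = [(0, 14336), (14336, 24576), (38912, 20480)]
bad = [(0, 14336), (14000, 24576)]  # second extent overlaps the first
print(check_leaf_extents(good))  # []
print(check_leaf_extents(bad))
```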


In extent blocks outside the inode (i.e., allocated extent tree blocks, not the root stored in the inode), the space reserved for entries is followed by a 4-byte eb_checksum field. It sits after the last possible entry (eh_max entries), which in a 4 KB block places it in the final 4 bytes. This is a CRC32C checksum computed over the file system UUID (from the superblock), the inode number, the inode generation, and the extent block contents up to, but not including, the checksum field itself. The checksum is present only when the metadata_csum feature is enabled in the superblock. The root node stored in the inode's i_block needs no separate checksum because the entire inode is already covered by the inode checksum. The checksum helps detect corruption or tampering in extent tree blocks.
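The position of the checksum follows from simple arithmetic on the 12-byte header and 12-byte entries. The sketch below only computes offsets and does not compute the CRC32C itself; the helper names are made up for this example.

```python
HEADER_SIZE = 12  # sizeof(ext4_extent_header)
ENTRY_SIZE = 12   # sizeof(ext4_extent) == sizeof(ext4_extent_idx)


def max_entries(node_size: int) -> int:
    """How many 12-byte entries fit after the header in a node of `node_size` bytes."""
    return (node_size - HEADER_SIZE) // ENTRY_SIZE


def tail_offset(eh_max: int) -> int:
    """Byte offset just past the last possible entry, where eb_checksum would sit."""
    return HEADER_SIZE + ENTRY_SIZE * eh_max


print(max_entries(4096))               # 340 entries in a 4 KB extent block
print(tail_offset(max_entries(4096)))  # 4092, so the checksum fills bytes 4092..4095
print(max_entries(60))                 # 4 entries in the 60-byte i_block area
print(tail_offset(max_entries(60)))    # 60, so i_block has no room left for a checksum
```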


Let us examine the hex dump of inode 2540873 of our ext4 system to understand its extent information. The first step is to get its inode table location as follows:




Then we dump the block 9964660 with the following command and locate the inode offset 0x0800 within the block.


sudo dd if=/dev/sda3 bs=4096 skip=9964660 count=1 | xxd
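The arithmetic that turns an inode number into a position inside its group's inode table can be sketched as follows. The values of 8192 inodes per group and 256-byte inodes are assumptions (common defaults, not stated in this walkthrough); on a real system they come from the superblock, and the starting block of the inode table comes from the group descriptor. With these assumed values the result matches the 0x0800 offset used here.

```python
def locate_inode(ino: int, inodes_per_group: int, inode_size: int, block_size: int):
    """Return (group, block_index_within_inode_table, byte_offset_within_block)."""
    index = ino - 1  # inode numbers start at 1
    group = index // inodes_per_group
    byte_offset = (index % inodes_per_group) * inode_size
    return group, byte_offset // block_size, byte_offset % block_size


# Assumed geometry: 8192 inodes per group, 256-byte inodes, 4096-byte blocks.
group, block_in_table, offset = locate_inode(2540873, 8192, 256, 4096)
print(group, block_in_table, hex(offset))  # 310 84 0x800
```

If the inode table of that group starts at block T, the inode lives in block T + 84 at byte offset 0x800.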




If we look at the first two bytes of this inode, we can see the value 0x81FF. (Multi-byte fields are stored little-endian, so the bytes appear in the dump as FF 81.) The four most significant bits are 1000, which means it is a regular file (S_IFREG). The extent tree starts at inode byte offset 0x28 with a header. The header begins with the 0xF30A magic value for extents and contains one valid entry (eh_entries), as revealed by the two bytes following the magic value. The maximum number of entries that can follow the header (eh_max) is 0x0004 (= 4). The extent tree has a depth of 1; consequently, the extent header is followed by ext4_extent_idx entries. The generation field is not in use. The figure below shows the highlighted ext4_extent_idx.
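The file-type reading of 0x81FF can be double-checked with Python's standard `stat` module, which uses the same bit layout as i_mode. This is just a quick sanity check on the value quoted above.

```python
import stat
import struct

# The two bytes as they appear on disk (little-endian), decoded to an integer.
(i_mode,) = struct.unpack("<H", bytes.fromhex("ff81"))

print(hex(i_mode))                # 0x81ff
print(stat.S_ISREG(i_mode))       # True: the top four bits are 1000 (S_IFREG)
print(oct(stat.S_IMODE(i_mode)))  # 0o777: the permission bits
print(stat.filemode(i_mode))      # -rwxrwxrwx
```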


The extent index above refers to the start of the file's content (its logical block number is zero). The next node of the tree is found at physical block 0x988130. Let us examine the content of this block.



From the figure above, we observe that the first 12 bytes correspond to the extent header. The magic value is 0xF30A, as expected. The header tells the analyst that this node contains 6 (0x0006) entries out of a maximum of 340 (0x154). The depth of this node is zero, indicating that we have reached a leaf block of the extent tree. As previously mentioned, in a leaf block the entries following the extent header are ext4_extent structures; each describes a specific range of data blocks and holds the address of the first data block in that range. Below is the hexdump of the same block, this time with the first three ext4_extent entries highlighted.



The first extent has a logical block value of 0. It covers 14,336 (0x3800) blocks, and its first block is physical block 16,558,080 (0xFCA800). The 14,336 consecutive blocks starting at block 16,558,080 therefore belong to this extent and hold actual file data.


In the second extent, the logical block value is 14,336 (0x3800), which confirms that the first 14,336 blocks of the file belong to the previous extent. This extent covers 24,576 (0x6000) blocks, and its first block is physical block 16,580,608 (0xFD0000).


The third extent has a logical block value of 38,912 (0x9800), which is the sum of the blocks in the previous two extents (14,336 + 24,576). This extent covers 20,480 (0x5000) blocks, and its first block is physical block 16,650,240 (0xFE1000).


Similarly, the remaining three ext4_extent entries provide information about the extents associated with the inode, allowing for the determination of the corresponding data blocks and their respective ranges.
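With the three extents decoded above, translating a logical file block into a physical block is a lookup plus an offset. The sketch below uses a binary search over the extent start blocks; `logical_to_physical` is a hypothetical helper, and the list holds only the three extents described in this walkthrough, so logical blocks beyond them return None here.

```python
from bisect import bisect_right

# (logical start, length in blocks, physical start) for the three extents above.
EXTENTS = [
    (0, 14336, 16558080),
    (14336, 24576, 16580608),
    (38912, 20480, 16650240),
]


def logical_to_physical(extents, lblock):
    """Map a logical file block to a physical block, or None if no extent covers it."""
    starts = [e[0] for e in extents]
    i = bisect_right(starts, lblock) - 1  # last extent starting at or before lblock
    if i < 0:
        return None
    start, length, phys = extents[i]
    if lblock >= start + length:
        return None  # hole, or past the extents listed here
    return phys + (lblock - start)


print(logical_to_physical(EXTENTS, 0))      # 16558080
print(logical_to_physical(EXTENTS, 20000))  # 16586272 (second extent, offset 5664)
print(logical_to_physical(EXTENTS, 38912))  # 16650240
print(logical_to_physical(EXTENTS, 59392))  # None
```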


In the ext4 file system, the extent tree organizes how data blocks are mapped for a file. For a tree with depth 1, the inode's i_block field (which is 60 bytes) contains an ext4_extent_header followed by one or more ext4_extent_idx structures. Each index points to a leaf block that holds the actual ext4_extent entries, which in turn reference the data blocks. Thus, there is one layer of indirection between the inode and the file's data.


If the tree depth increases to 2, two levels of intermediate blocks (using ext4_extent_idx structures) would exist between the i_block field and the leaf nodes containing the ext4_extent structures that point to the data blocks.


In the special case of depth 0 (a flat or leaf-only tree), no intermediate index blocks are needed. Here, the i_block area directly stores the ext4_extent_header (12 bytes) followed by up to four ext4_extent structures (each 12 bytes, totaling 48 bytes). This fills the entire 60-byte space exactly. These ext4_extent entries point straight to the file's data blocks.


When a file requires a fifth extent, ext4 promotes the tree: it allocates a new intermediate block, increases the depth (typically to 1), moves the existing extents into that block, and updates the root node in i_block to use an ext4_extent_idx pointing to the new intermediate block. This variable-depth design in ext4 enables efficient block mapping and fast access to data, scaling from small files (handled directly in the inode with no extra indirection) to very large files (using deeper trees as needed) while keeping metadata overhead low.
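The depth-dependent layout described above lends itself to a short recursive walk. The following is a sketch under stated assumptions, not a forensic-grade parser: `walk_extent_tree` and the `read_block` callback are hypothetical names, the "disk" is a Python dict holding one synthetic leaf block, and the depth limit of 5 reflects the cap used by the Linux kernel.

```python
import struct

MAGIC = 0xF30A
MAX_DEPTH = 5  # the Linux kernel does not allow deeper extent trees


def walk_extent_tree(node, read_block, expected_depth=None):
    """Yield (ee_block, raw ee_len, physical_start) for every leaf entry under `node`.

    `node` is the raw bytes of a tree node (i_block for the root);
    `read_block(n)` must return the raw bytes of file system block n.
    """
    magic, entries, _max, depth, _gen = struct.unpack_from("<HHHHI", node, 0)
    if magic != MAGIC:
        raise ValueError("bad extent header magic")
    if depth > MAX_DEPTH:
        raise ValueError("implausible tree depth")
    if expected_depth is not None and depth != expected_depth:
        raise ValueError("depth must decrease by one per level")
    for i in range(entries):
        off = 12 + 12 * i
        if depth == 0:  # leaf: ext4_extent entries
            ee_block, ee_len, hi, lo = struct.unpack_from("<IHHI", node, off)
            yield ee_block, ee_len, (hi << 32) | lo
        else:           # index: ext4_extent_idx entries
            _ei_block, lo, hi, _unused = struct.unpack_from("<IIHH", node, off)
            child = read_block((hi << 32) | lo)
            yield from walk_extent_tree(child, read_block, depth - 1)


# Synthetic depth-1 tree: a root with one index entry and a leaf with two extents.
leaf = (struct.pack("<HHHHI", MAGIC, 2, 340, 0, 0)
        + struct.pack("<IHHI", 0, 0x3800, 0, 0xFCA800)
        + struct.pack("<IHHI", 0x3800, 0x6000, 0, 0xFD0000))
root = struct.pack("<HHHHI", MAGIC, 1, 4, 1, 0) + struct.pack("<IIHH", 0, 0x988130, 0, 0)
blocks = {0x988130: leaf}

print(list(walk_extent_tree(root, blocks.__getitem__)))
# [(0, 14336, 16558080), (14336, 24576, 16580608)]
```

A depth-0 root works with the same function: the loop simply reads ext4_extent entries straight out of i_block and never calls `read_block`.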


Forensic tools and researchers use the magic number 0xF30A as a signature to carve for potential ext4 inodes, since ext4 inodes have no magic value of their own. This works because the root extent header (when the extents feature is used and the EXT4_INODE_EXTENTS flag is set) appears at a fixed offset in the inode. Because the field is stored little-endian, the byte pattern to search for on disk is 0A F3. The technique is particularly useful for recovering metadata (timestamps, size, ownership, and extent pointers) even from deleted or partially overwritten inodes, as the magic often persists longer than other fields that are zeroed.

The signature is not unique to inodes, however. In deeper extent trees (for large or fragmented files), index nodes and leaf nodes are stored in regular file system blocks pointed to by higher-level indexes, and these also begin with the same 12-byte header containing 0xF30A. A simple byte-pattern search will therefore hit both (i) valid inode-embedded headers (at inode-table locations + offset 0x28) and (ii) non-inode extent headers in allocated or unallocated blocks. This causes false positives: carved "inodes" may actually be extent index or leaf nodes from files' extent trees, not inode structures themselves. Advanced carving (e.g., the AFEIC method) adds filters such as checking the i_mode file type bits, timestamps, link counts, or extent flags to reduce false positives.

Because ext4 inodes do not store filenames (names live only in directory entries within directory data blocks, which are themselves referenced via extents), carving isolated inodes recovers file metadata and content pointers but loses the path/name association unless additional work is done. Nevertheless, recovered inodes can be linked back to filenames by the following steps:


  • Carve directory inodes separately (using the same magic + filters for i_mode indicating a directory).
  • Parse their extent trees to recover directory data blocks.
  • Extract directory entries (name + inode number pairs).
  • Match recovered inode numbers from file inodes to those in directory entries.


Tools like ext4magic, Sleuth Kit (with custom carving), or research prototypes (e.g., AFEIC integrated into Sleuth Kit) attempt this linkage. Success depends on how much metadata remains unzeroed/unoverwritten (ext4 zeros key inode fields on deletion, but directory entries may linger longer).
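To make the carving idea and its false-positive problem concrete, here is a minimal sketch of a signature scan with a few plausibility filters. It is not how any of the tools above work internally: `carve_inode_candidates` is a hypothetical function, the filters (eh_max equal to 4 for a header inside i_block, a depth of at most 5, a known file type in i_mode) are illustrative heuristics, and the test image is entirely synthetic.

```python
import stat
import struct

MAGIC_BYTES = b"\x0a\xf3"  # 0xF30A as stored on disk (little-endian)
I_BLOCK_OFFSET = 0x28      # offset of i_block within the inode
PLAUSIBLE_TYPES = {stat.S_IFREG, stat.S_IFDIR, stat.S_IFLNK}


def carve_inode_candidates(buf: bytes) -> list[int]:
    """Return buffer offsets where an extent-using inode plausibly starts."""
    hits = []
    pos = buf.find(MAGIC_BYTES)
    while pos != -1:
        start = pos - I_BLOCK_OFFSET
        if start >= 0 and pos + 12 <= len(buf):
            _magic, entries, eh_max, depth, _gen = struct.unpack_from("<HHHHI", buf, pos)
            (i_mode,) = struct.unpack_from("<H", buf, start)
            if (eh_max == 4 and entries <= eh_max and depth <= 5
                    and stat.S_IFMT(i_mode) in PLAUSIBLE_TYPES):
                hits.append(start)
        pos = buf.find(MAGIC_BYTES, pos + 1)
    return hits


# Synthetic image: padding, one fake 256-byte inode, then a fake leaf-block header.
fake_inode = bytearray(256)
struct.pack_into("<H", fake_inode, 0, 0x81FF)                              # i_mode
struct.pack_into("<HHHHI", fake_inode, I_BLOCK_OFFSET, 0xF30A, 1, 4, 1, 0)  # root header
leaf_header = struct.pack("<HHHHI", 0xF30A, 6, 340, 0, 0)                  # not an inode
image = bytes(512) + bytes(fake_inode) + leaf_header + bytes(100)

print(carve_inode_candidates(image))  # [512]
```

The leaf-block header also matches the raw signature, but it is rejected because its eh_max is 340 rather than 4 and the bytes 0x28 before it do not hold a plausible i_mode.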






