Indexes are used to store groups of attributes in a sorted order. One of the most commonly encountered index structures in NTFS is the directory. In this case, several $FILE_NAME attributes are stored in the index. Directories in NTFS are indexed to facilitate faster retrieval of specific entries. They are stored in a B-tree in alphabetical order. A tree is a group of data structures called nodes that are linked together, forming a hierarchy with a head node and branches extending to other nodes.
![]() |
| Figure 1: A tree with ten nodes |
Consider Figure 1(A) in which node A serves as the root of the tree. Node A has two children, B and C, while node B further branches into nodes D and E. In tree terminology, a parent node is any node that references one or more subordinate nodes, whereas a child node is any node referenced by a parent. Accordingly, A is the parent of B and C, which are its children. A leaf node is defined as a node with no children; in this example, nodes F, G, H, I, and J are leaf nodes. Node with same parent nodes, meaning there are two or more nodes that derived from the same nodes, are called “siblings”. The depth of a node is the length of unique path from the root to that node, which is equal to the depth of the deepest leaf. Height of a node is the length of longest downward path to a leaf from that node. All leaves are at height 0, and its parents are at height 1, and so on. The structure shown is a binary tree, as each node is limited to a maximum of two children.
Figure 1(B) depicts the same tree structure, now populated with values at each node. Value lookup begins at the root node and proceeds by comparing the target value with the current node’s value. If the target value is smaller, traversal continues to the left child; if larger, traversal continues to the right child. For example, to locate the value 42, it is first compared with the root value of 50, prompting a move to the left child. The comparison with 30 results in traversal to the right child, followed by a comparison with 35, which again directs traversal to the right. The value 42 is then found after only four comparisons. In contrast, locating the value 98 in an unsorted linear list could require up to ten comparisons, whereas the binary tree enables the same lookup in only three comparisons. If indexing is done linearly (instead of in a tree structure), the process would be slow and cumbersome to the system as it would require more power to process it. This minimizes the number of disk reads necessary to pull in the data during lookups.
B-Tree
A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. It differs from binary trees in that a node can have more than two children. B-trees are specifically designed for storage systems that read and write large blocks of data (such as disks), as they keep the tree shallow by allowing a high number of children per node, which reduces the number of disk accesses.
The order of a B-tree, denoted as m, is defined as the maximum number of children that any node can have. Thus, a B-tree of order m is an m-way search tree with the following properties:
- All leaves are at the same level. This ensures the tree remains perfectly balanced with consistent height and performance.
- The keys in each node are stored in increasing (sorted) order.
- Every node (except the root) has at most m children and at least ⌈m/2⌉ children.
- A node with k children contains exactly k − 1 keys. Therefore:
- The maximum number of keys in a node is m − 1.
- The minimum number of keys in a non-root node is ⌈m/2⌉ − 1.
- The root is a special case: it has at most m children. If the root is not a leaf, it has at least 2 children (or it may have none if the tree consists only of the root).
Note: ⌈x⌉ is the ceiling function — the smallest integer greater than or equal to x. Examples: ⌈3⌉ = 3, ⌈3.35⌉ = 4, ⌈1.98⌉ = 2, ⌈5.01⌉ = 6, ⌈7⌉ = 7.
Figure 2 illustrates a B-tree where file names are stored as values rather than numeric keys. The root node (Node A) contains three values and has four child nodes. To locate the file ggg.txt, we first compare its name with the values in the root node and observe that, alphabetically, it falls between eee.txt and iii.txt. Based on this ordering, we follow the corresponding pointer to Node C. Examining the values in Node C reveals the entry for ggg.txt.
![]() |
| Figure 2: A B-tree with file names as values. |
Now, let’s add some complexity by examining how values are inserted and removed. This is a critical concept because it helps explain why recovering deleted file names in NTFS can be challenging. Assume that each node can hold only three file names and that a new file, jjj.txt, is added. At first glance, this appears straightforward; however, in practice, it results in the removal of two nodes and the creation of five new nodes. When we determine where jjj.txt belongs, we see that it should be placed at the end of Node C, after iii.txt. The top portion of Figure 3 illustrates this state. However, this insertion causes Node C to contain four names, exceeding its capacity of three. To resolve this, Node C is split into two nodes, ggg.txt is promoted to the parent level, and two new nodes, F and G, are created to hold the redistributed values. The final structure is shown at the bottom of Figure 3.
.
![]() |
| Figure 3: The top tree shows ‘jjj.txt’ added to node C, and the bottom tree is the result of removing node C because each node can have only three names. |
Unfortunately, Node A now contains four values, exceeding its capacity. To correct this, the node is split, and ggg.txt is promoted to a new top-level (root) node. The resulting structure is shown in Figure 4. As a result of inserting a single file, Nodes A and C are removed, and Nodes F, G, H, I, and J are created. Any residual data that may have remained in Nodes A and C from previously deleted files is likely to have been overwritten and is no longer recoverable.
![]() |
| Figure 4: The final state from adding the ‘jjj.txt’ file. |
Now consider deleting the file zzz.txt. This operation simply removes the file name from Node E and does not require any additional structural changes. Depending on the specific implementation, metadata associated with zzz.txt may still remain within the node and could potentially be recovered. A more complex scenario occurs when fff.txt is deleted. In this case, Node F becomes empty and must be rebalanced. To restore the tree’s structure, eee.txt is moved from Node I to Node F, and bbb.txt is moved from Node B to Node I. These adjustments maintain a balanced tree, ensuring that all leaf nodes remain at the same depth from Node H. The resulting configuration is shown in Figure 5.
![]() |
| Figure 5: Tree after deleting the ‘zzz.txt’ file and the ‘fff.txt’ file. |
Node B will still contain bbb.txt in its unallocated space because the entry was moved to Node I rather than truly removed. As a result, a forensic analysis tool may report bbb.txt as deleted, even though it was not. The file name was merely relocated as part of the rebalancing operation triggered by the deletion of fff.txt.
We mentioned earlier that the B-tree is designed specifically for storage systems that read and write large blocks of data. In practice, the classic B-tree is seldom implemented exactly as described in textbooks. What is far more common — especially in databases and file systems — is a closely related variant called the B+tree.
NTFS implements its directory indexes (the $I30 index using the $INDEX_ROOT and $INDEX_ALLOCATION attributes) using a structure that closely resembles a B+tree. Understanding this distinction helps with accurate filesystem analysis and digital forensics.
The main difference between a classic B-tree and a B+tree lies in where the actual records are stored. In a traditional B-tree, both internal nodes and leaf nodes can contain keys along with data or pointers to records. As a result, records can appear at any level of the tree. In a B+tree, however, all data records reside exclusively in the leaf nodes. Internal nodes contain only key values (which may be duplicated) and child pointers, serving solely as navigation structures that guide searches down the tree. These internal keys are effectively redundant from a data-storage perspective, as the actual records always reside at the leaf level. Both structures are generalizations of binary search trees: each node can hold multiple keys and more than two child pointers. Each child pointer delimits a key range, directing the search to the appropriate subtree.
NTFS benefits from the B+tree-like design because internal nodes do not store full data entries. This allows more keys to fit in each disk block (index record), resulting in a shallower tree, reduced disk I/O, and faster lookups. From a forensic perspective, this means that directory entries containing evidentiary value (the $FILE_NAME attributes) are always located in the leaf-level index entries. Additionally, in B+tree designs, leaf nodes are logically ordered and often linked together. This enables efficient sequential traversal. In NTFS, this design allows directory listings to be read efficiently by walking the leaf nodes in a single logical pass. If a pure classic B-tree were used instead, records could be scattered across internal and leaf nodes, making full directory enumeration and forensic reconstruction more complex. In summary, while NTFS documentation and tools often refer to the structure simply as a B-tree, its actual implementation for directory indexes exhibits the key advantages and characteristics of a B+tree.
NTFS Directory Indexing
Having introduced the general concept of B-trees, we can now examine how NTFS uses them to implement indexing. In NTFS, indexes are primarily used to organize file and subdirectory names within a directory. This structure enables rapid lookup of directory entries instead of performing slow linear scans. Each node in the index tree consists of a sequence of index entries. For directory indexes (the $I30 index), the key value in each index entry is a filename, represented by the $FILE_NAME attribute structure. These index entries are stored sequentially within the node and are kept in sorted (alphabetical) order. The list of index entries in a node is terminated by a special empty entry (also called the terminator or last entry). This special entry marks the end of the valid index entries in that node and typically contains minimal data with a flag indicating it is the final entry.
![]() |
| Figure 6: NTFS directory structure when indexing in use |
NTFS stores directory index data using the $INDEX_ROOT and $INDEX_ALLOCATION attributes (along with a $BITMAP attribute for larger directories) within the directory’s MFT record. These attributes, which together implement the B-tree structure for the directory, are collectively known as the $I30 index. $I30 is not an actual file on disk; it is simply the name assigned to these index-related attributes. The three attributes involved are:
- $INDEX_ROOT (attribute type 0x90)
- $INDEX_ALLOCATION (attribute type 0xA0)
- $BITMAP (attribute type 0xB0)
The $INDEX_ROOT attribute is always resident (stored directly in the MFT record) and contains the root node of the directory’s B-tree index. It begins with the standard NTFS attribute header, followed by the INDEX_ROOT header and then the index node header, which describes the list of index entries stored in the root. For small directories, all entries fit inside the $INDEX_ROOT. For larger directories, additional nodes are stored as fixed-size index records (also called index buffers, typically 4 KB) within the non-resident $INDEX_ALLOCATION attribute.
![]() |
| Figure 7: Data structure of the $INDEX_ROOT attribute |
The root index node consists of one or more index entry structures. Each normal index entry contains a $FILE_NAME structure that represents a file or subdirectory within the directory. The index entry stores two primary components:
- The filename (from the $FILE_NAME attribute), which serves as the index key.
- The MFT file reference number, which acts as a pointer to the corresponding MFT record of that file or subdirectory.
Index entries are arranged sequentially and kept in sorted (alphabetical) order within the node. The list is terminated by a special empty index entry (also called the terminator or last entry). This empty entry does not contain a $FILE_NAME structure, has no key, and marks the end of the valid entries in the node. The data structure for an index entry is shown below.
![]() |
| Figure 8: Data structure for “index entry” (specifically for directory) |
When a directory grows too large for all its index entries to fit inside the resident $INDEX_ROOT attribute, NTFS creates a non-resident $INDEX_ALLOCATION attribute to store the additional nodes of the B-tree. The $INDEX_ALLOCATION attribute holds the extra index nodes (stored as fixed-size index records, typically 4 KB each) that cannot fit in the root. Unlike $INDEX_ROOT, which is always resident, the $INDEX_ALLOCATION attribute is always non-resident and follows the standard layout of a non-resident NTFS attribute (including data runs that point to the actual index records on disk). For small directories, the $INDEX_ALLOCATION attribute does not exist — all indexing information is contained entirely within the $INDEX_ROOT structure. The data structure of the $INDEX_ALLOCATION attribute is shown below.
![]() |
| Figure 9: Data structure for $INDEX_ALLOCATION attribute |
The $INDEX_ALLOCATION attribute (type 0xA0) is created when a directory index grows beyond the capacity of the resident $INDEX_ROOT attribute. It stores the additional (overflow) nodes of the B-tree on disk in the form of one or more fixed-size Index Records. Each Index Record represents a single node in the NTFS B-tree (or B+tree-like) structure. It contains a sequence of Index Entry structures — the same format used in the $INDEX_ROOT attribute. In directory indexes, these are commonly known as $I30 entries. Each normal entry contains a filename (from the $FILE_NAME attribute) as the sorting key, along with an MFT file reference that points to the corresponding MFT record.
The size of each Index Record is defined in the $INDEX_ROOT attribute header and is typically 4096 bytes (4 KB), which usually matches the cluster size on most NTFS volumes. An Index Record begins with an Index Record Header, followed by an Index Node Header, and then the sequence of Index Entries (terminated by a special empty entry). Together, these records form the non-resident portion of the directory index, allowing NTFS to efficiently scale directory lookups using the B-tree structure.
.
![]() |
| Figure 10: Data structure of Index Record |
From a forensic perspective, the $INDEX_ROOT attribute is highly significant. It contains the root node of the directory index and, in small directories, may hold all directory entries without requiring the $INDEX_ALLOCATION attribute.
Each index entry provides a direct link between a human-readable filename (stored in an embedded $FILE_NAME structure) and the corresponding MFT record. This creates a valuable mapping between visible directory listings and the underlying file system metadata. Empty index entries (the terminator entries that mark the end of the valid entry list) are especially noteworthy. Although they serve as structural markers, the surrounding slack space within the index node often contains remnants of previously allocated index entries. These remnants can survive file deletions and rebalancing of the B-tree.
Consequently, the $INDEX_ROOT attribute — along with index records in $INDEX_ALLOCATION when present — serves as a rich source of evidence for reconstructing directory activity, recovering traces of deleted or overwritten files, identifying file creation and deletion timelines, and detecting potential anti-forensic techniques.
To demonstrate how NTFS implements tree-based indexing, we examine the root directory of an NTFS volume. The figure below presents a hexadecimal view of the index-related attributes stored in MFT entry 5, which represents the root directory of the volume and contains the initial index structures used for directory enumeration.
![]() |
| Figure 11: INDEX attributes in MFT entry 5 (NTFS volume’s root directory) |
In this analysis, we focus on three attributes, beginning with the $INDEX_ROOT attribute. The attribute header is shown in the image below.
| Figure 12: INDEX_ROOT attribute header |
As shown in the figure above, the first four bytes of the attribute header contain the value 0x00000090 (144 decimal), identifying the attribute as an $INDEX_ROOT. The total length of the attribute is 0x00000058 (88 decimal), as indicated by bytes relative offsets 0x04–0x07 of the attribute header. The attribute content begins at byte offset 0x0020 (32 decimal), as specified in relative offset 0x20–0x21 of the header. This offset marks the start of the $INDEX_ROOT structure, also referred to as the $INDEX_ROOT header, which contains the metadata and index node information used to describe the root of the directory index. The entire data structure is interpreted as follows.
|
Relative offsets |
Description |
Values |
|
0x00-0x03 |
Attribute ID |
0x00000090 |
|
0x04-0x07 |
Length of Attribute |
0x00000058 = 88 (i.e., from absolute byte 0x170-0x1C7 in Figure 11) |
|
0x08 |
Resident/Non-Resident Flag |
0x00 = resident |
|
0x09 |
Length of Name of Attribute |
0x04 = 4 |
|
0x0A-0x0B |
Offset to Name of Attribute |
0x0018 = 24 (i.e., absolute offset 0x170 + 0x18 = 0x188 in Figure 11) |
|
0x0C-0x0D |
Flags |
0x0000 = normal |
|
0x0E-0x0F |
Not Yet Known |
0x0006 = possible ID |
|
0x10-0x13 |
Length of Attribute content |
0x00000038 = 56 (From absolute offset 0x190-0x1C7 in Figure 11) |
|
0x14-0x15 |
Offset to the Start of Attribute content |
0x0020 = 32 (i.e., absolute offset 0x170 + 0x20 368 = 0x190 in Figure 11) |
|
0x16 |
Indexed flag |
0x00 = not indexed |
|
0x17 |
Padding to 8-byte boundary |
0x00 |
Relative offsets 0x18-0x1F contain the details of this name, which is “$I30” in Unicode.
To extract the contents of an $INDEX_ROOT attribute, we can use icat and supply the 144 type as shown. Notice that it is the same as the attribute content as contained in Figure 11. Bytes 0x00-0x0F (highlighted red) constitute the INDEX_ROOT content header.
![]() |
| Figure 13: INDEX_ROOT attribute content |
Bytes 0x00-0x03 identify the type of entry as listed in the $AttrDef file. In this case, we note that 0x30000000 is a FILE_NAME type. Bytes 0x04-0x07 are reported to be an indication of the collation sorting rule. This byte sequence identifies the rule that is to be used to sort the following index entries. If the type is File Name, as is the case here, then the rule COLLATION_FILENAME is used. The collation sorting rule can be any of the following:
- 0x00000000 → Binary
- 0x00000001 → File Name
- 0x00000002 → Unicode String
- 0x00000010 → Unsigned Long
- 0x00000011 → SID
- 0x00000012 → Security Hash
- 0x00000013 → Multiple Unsigned Long
Bytes 0x08-0x0B contain the size of the Index Allocation Entry. The value here of 0x00001000 is equal to 4096 decimal. A scan of the sample volume reveals that “INDX” files, which are directory listings held as non-resident data in blocks on the “user” area of the disk, are 4096 bytes long. Any further space that is required by the file is allocated in blocks of the same size, which may or may not be contiguous.
Byte 0x0C is declared to be the Number of Clusters per Index Record. It is known from the BPB at byte offset 0x13 that the cluster size on this disk is eight sectors, and thus this value of 0x01 (1 decimal) equates to 8× 512 = 4096 bytes. The remaining three bytes, offsets 0x0D-0x0F, are padding.
The $INDEX_ROOT attribute contains an Index node, which is the root node of the B-Tree describing the root directory here. The Index Node starts with an Index header (highlighted yellow in Figure 11), immediately following the end of the $INDEX_ROOT Header.
Byte offsets 0x10-0x13 record the offset to the first index entry, which is relative to start of node header, each index entry containing a “$FILE_NAME” structure. In this case the value is 0x00000010 (16 decimal). This value, therefore, points to offset 0x10 + 0x10 = 0x20.
The allocated size of the Index Entries is recorded in byte offsets 0x18-0x1B. The value is again 0x00000028, which is equal to 40 decimal. It means the area containing index entries starts at byte offset 0x20 and ends at byte offset 37 (highlighted orange in Figure 13). Clearly, all the allocated index entries' space is occupied here.
Byte offset 0x1C-0x1F is reported to be a flag that signals whether the Index is a “Small” index, which will fit within the MFT record, or a “Large” index that requires an external Index Allocation Unit. In this case, 0x00000001 signals that this is a “Large” index, whereas 0x00000000 would signal it was a “Small” index. Experiments have shown that part directory listings may be retained within the MFT record both with and without external “INDX” buffer files. Where both types are present (internal and external) this flag will be set to 0x00000001. Where there are no entries in the listing (an empty directory), or entries are resident only, this flag will be set to 0x00000000.
The index entries can be examined to identify stored file names and to determine whether child index nodes exist. Figure 14 shows the index entries located in the $INDEX_ROOT attribute.
| Figure 14: Index Entries in $INDEX_ROOT attribute |
The eight bytes at relative offsets 0x00-0x07 point to the MFT Base File Reference for this item. The two bytes at relative offsets 0x08-0x09 are stated to be the value of the length of the Index Entry. The value here is 0x0018, which is 24 decimal. The two bytes at offsets 0x0A-0x0B contain the length of the $FILE_NAME attribute associated with the index entry. In this case, the value is 0x0000, meaning no file name is stored. This confirms that the entry is an empty index entry, used only for structural purposes.. The byte at offset 0x0C is a flag byte with values as follows:
- 0x00 → Child node.
- 0x01 → Child node in INDEX_ALLOCATION.
- 0x02 → Last entry.
- 0x03 → Last Entry, Child node in INDEX_ALLOCATION.
The flag value 0x03 is present in this case. This value is a combination of two flags 0x00000003 = 0x01 || 0x02:
- 0x01 – Indicates that the index entry has a child node.
- 0x02 – Indicates that this is the last index entry in the node.
The remaining three bytes at offsets 0x0D-0x0F are probably padding. This is the end of the Index Entry Header and the start of the Index Entry content. Because the entry references one child node, the last 8 bytes of the structure contain the Virtual Cluster Number (VCN) of that child. Here, the VCN value is 0, indicating that the child index node is located at VCN 0 within the $INDEX_ALLOCATION attribute.
If the Index Flag ‘Last Entry’ at relative offset 0x0C is not set (in the directory of interest), the Index node entry will have the following structure:
|
Relative byte offset |
Length |
Description |
|
0x00 |
6 BYTES |
MFT record number for file name or unused (index) |
|
0x06 |
WORD |
MFT record sequence number or unused (index) |
|
0x08 |
WORD |
Length of Node Entry |
|
0x0A |
WORD |
Length of content (length of Filename attribute) |
|
0x0C |
BYTE |
Index Flags |
|
0x10 |
6 BYTES |
MFT Record number of the parent directory |
|
0x16 |
WORD |
MFT Record Sequence number of the parent directory |
|
0x18 |
LONGLONG |
File Create time |
|
0x20 |
LONGLONG |
File Modified time |
|
0x28 |
LONGLONG |
MFT Record Modified time |
|
0x30 |
LONGLONG |
File Last Accessed time |
|
0x38 |
LONGLONG |
File Allocated Size |
|
0x40 |
LONGLONG |
File Real Size |
|
0x48 |
DWORD |
File Type Flags |
|
0x4C |
DWORD |
$EA buffer size needed or $Reparse_Point Tag |
|
0x50 |
BYTE |
Filename Length (Number of Unicode characters) |
|
0x51 |
BYTE |
File name type (Namespace) |
|
0x52 |
FileName length * 2 |
File name |
The second attribute is the $INDEX_ALLOCATION attribute, shown in the figure below. Records will only have an $Index_Allocation attribute if the $Index_Root attribute header has its flag set to ‘0x00000001’ (Index points to $Index_Allocation Child Nodes). This attribute is always non-resident, and its content is stored in data runs, which point to the actual storage location for all sub-nodes of the B+tree that implements a directory index.
![]() |
| Figure 15: MFT entry 5’s $INDEX_ALLOCATION attribute |
The start of an $INDEX_ALLOCATION Attribute is signalled by the identifier 0x000000A0 (160 decimal) at relative offsets 0x00-0x03. As with all attributes, this $INDEX_ALLOCATION Attribute starts with an Attribute Header, the layout of which is identical to the attribute header that we encountered and deconstructed in the $INDEX_ROOT attribute. The entire data structure is interpreted as follows.
|
Relative offsets |
Description |
Values |
|
0x00-0x03 |
Attribute ID |
0x000000A0 |
|
0x04-0x07 |
Length of Attribute |
0x00000050 = 80 (i.e., from absolute byte 0x1C8-0x217 in Figure 11) |
|
0x08 |
Resident/Non-Resident Flag |
0x01 =non-resident |
|
0x09 |
Length of Name of Attribute |
0x04 = 4 |
|
0x0A-0x0B |
Offset to Name of Attribute |
0x0040 = 64 (i.e., absolute offset 0x1C8 + 0x40 = 0x208 in Figure 11) |
|
0x0C-0x0D |
Flags |
0x0000 = normal |
|
0x0E-0x0F |
Not Yet Known |
0x0008 = possible ID |
|
0x10-0x17 |
Starting VCN |
0x0000000000000000 = 0 |
|
0x18-0x1F |
Last VCN |
0x0000000000000000 = 0 |
|
0x20-0x21 |
Offset to the Data Runs |
0x0048 = 72 (i.e., absolute offset 0x1C8 + 0x48 = 0x210 in Figure 11) |
|
0x22-0x23 |
Compression unit size |
0x0000 = 0 |
|
0x24-0x27 |
Padding to 8-byte boundary |
0x00000000 |
|
0x28-0x2F |
Allocated Size of Attribute |
0x0000000000001000 = 4096 |
|
0x30-0x37 |
Real Size of Attribute |
0x0000000000001000 = 4096 |
|
0x38-0x3F |
Initialized size of Stream |
0x0000000000001000 = 4096 |
|
0x40-0x47 |
Attribute Name |
$I30 |
|
0x48-0x4F |
Data Run Descriptor |
11 01 2C 00 00 00 00 00 |
It should be noted that bytes 0x36 and 0x37 are shown in Figure 15 as 0x0500, respectively. These are, of course, the Update Sequence Number for this record, and they thus need to be replaced, before analysis, by the appropriate values from the Update Sequence Array. In this case, the values are 0x0000, and this is what is included in the value for the Real Size of the Attribute entry in the table above.
Next, we analyze the extracted data runs to determine which disk clusters store the B+tree structures used by the root directory’s index sub-nodes. The first byte of the data run is 0x11. This byte is split into two 4-bit values:
- Lower nibble (0x1) → size of the run length field.
- Upper nibble (0x1) → size of the starting cluster offset field.
This indicates that one byte is used to describe the run length and one byte is used to store the starting cluster number. The next byte (0x01) represents the run length, meaning the run spans one cluster. This confirms that only a single cluster is allocated to the $INDEX_ALLOCATION attribute, consistent with earlier observations. The following byte (0x2C, decimal 44) specifies the starting cluster number. Therefore, the data for this index allocation begins at Cluster 44. At this point, the first data run has been fully parsed. Since only one cluster is listed, the entire $INDEX_ALLOCATION attribute resides in Cluster 44. The next byte in the sequence is 0x00, which marks the end of the data run list, indicating that no additional runs exist. In summary, the $INDEX_ALLOCATION attribute for this directory occupies a single cluster—Cluster 44—which contains the B+tree index data for the directory.
![]() |
| Figure 16: Data runs in the $INDEX_ALLOCATION attribute |
Next, we obtain and analyze the contents of Cluster 44 which contains one index record or index node. The Figure below shows part of hex dump of Cluster 44.
![]() |
| Figure 17: Hex dumpof the beginning of Cluster 44 containing INDX header, index node header, and first index entry |
As shown in the figure above, the index record begins with the signature 0x494E4458 (“INDX”) at bytes 0-3, identifying it as an NTFS index record. Bytes 4-5 represent the offset to the Update Sequence Array (0x0028 = 40), and bytes 6-7 represent the size of the Update Sequence Array (0x0009 = 9). Bytes 8-15 indicate the $LogFile Sequence Number (0x0000000000103BF2). Bytes 16-23 represent the Virtual Cluster of the index record in the index allocation (0x0000000000000000). Please refer to Figure 10 for the full data structure. The first 24 bytes constitute the INDX header, which is immediately followed by the index node header. The first four bytes of the index node header contain the offset to the first index entry. In this example, the value 0x00000040 indicates an offset of 0x40 (64 bytes) from the start of the index node header. When combined with the 24-byte INDX header, the first index entry therefore begins at byte offset 0x58 (88 decimal) from the start of the index record.
The first eight bytes of the index entry represent the MFT file reference number associated with the filename stored in the entry. In this case, the value 0x0400000000000000 corresponds to MFT entry 0x000000000004, which maps to the NTFS system file $AttrDef. At offsets 0x08–0x09 within the index entry, the entry length is stored. Here, the value 0x68 indicates a total index entry size of 104 bytes. Offsets 0x0A–0x0B specify the size of the embedded $FILE_NAME attribute, which in this case is 0x52 (82 bytes). Finally, bytes 0x0C–0x0F contain the index entry flags. A value of 0x00000000 indicates that this entry does not point to a child node, meaning it is a leaf entry in the B-tree structure.
The $FILE_NAME structure, which contains the name of the file referenced by the first index entry, begins at byte offset 104. This offset is calculated as 88 + 16, where 88 bytes represent the start of the first index entry and 16 bytes correspond to the size of the index entry header. The first 8 bytes of the $FILE_NAME content (05 00 00 00 00 00 00 05 00) represent the MFT file reference number of the parent directory. The lower six bytes contain the actual MFT entry number, which in this case is 0x000000000005 (5). MFT entry 5 corresponds to the NTFS root directory, confirming that the referenced file resides in the root of the volume.
At byte offset 66 within the $FILE_NAME structure, the file name field begins. NTFS stores filenames in Unicode, using two bytes per character. The value at byte offset 64 (0x08) specifies the length of the filename in characters, which in this case is 8. Following this length field, the Unicode characters representing the filename appear (highlighted by the dotted line in Figure 17). Decoding these characters reveals the filename “$AttrDef”.
In this example, a total of 13 index entries are present. For brevity, we omit a detailed discussion of most of these entries, including $BadClus (second entry), $Bitmap (third entry), $Boot (fourth entry), $Extend (fifth entry), $LogFile (sixth entry), $MFT (seventh entry), $MFTMirr (eighth entry), $Secure (ninth entry), $UpCase (tenth entry), $Volume (eleventh entry), and the root directory (.) entry (twelfth entry). Instead, we proceed directly to the final index entry and examine it in detail, as illustrated in the figure below.
![]() |
| Figure 18: Hex dump of the ending of Cluster 44 containing the last two index entries |
- File name → canada.txt
- MFT reference number of the file → MFT entry 35
- MFT reference of parent directory → MFT entry 5
- Has child node? → No.
The B-tree index structures for the root directory in our sample image is shown in the figure below. These entries are organized in alphabetical order by the name of the files included.
![]() |
| Figure 19: B-tree index structures for the root directory |
The third attribute is the $BITMAP attribute. When an index becomes too large to fit in the resident $INDEX_ROOT, NTFS stores additional entries in the $INDEX_ALLOCATION attribute and uses a $BITMAP attribute to track allocation. The $BITMAP attribute maintains a bit-level map in which each bit represents the allocation state of a corresponding index buffer within $INDEX_ALLOCATION, typically 4 KB in size. This structure enables NTFS to efficiently manage index growth and reuse freed index records.
This structure is used by named indexes such as $I30, $O (Object ID), $R (Reparse Points), and the indexes maintained by $Secure. The $BITMAP attribute itself may be resident or non-resident depending on its size, rather than the presence of a stream name. The $BITMAP attribute has a type identifier of 0xBO (176) and is organized by bytes, and
each bit corresponds to an index record.
![]() |
| Figure 20: MFT entry 5's $BITMAP attribute |
As shown in the figure above, the attribute flags at relative offset 0x08 are set to 0x00, indicating that the attribute is resident. The attribute content begins at a relative offset of 0x0020 (32 decimal), as specified by the value in relative offsets 0x14–0x15 (20 00). The total length of the attribute content is 0x08 bytes, as indicated by relative 0x10–0x13 (08 00 00 00). The attribute content (relative offset 0x32-0x390 is extracted as follows.
![]() |
| Figure 21: MFT entry 5's $BITMAP attribute content |
Except for the first bit (bit 0) of the first byte, all other bits in the attribute content are zero. This indicates that only the first index node (Index Record #0) is currently in use. It is important to note that a root node always exists in a B-tree, stored in the $INDEX_ROOT attribute; however, the root node is not represented in the $BITMAP. The $BITMAP attribute tracks the allocation status of sub-nodes within the B-tree index only.
As files are added to the directory, the listings are initially built within the MFT record itself. When no further entries can be added because all space within the MFT record has been used, the MFT directory listing is changed from resident to non-resident. When this occurs, the MFT record is truncated, and some of the Index Entries in the Index Root attribute are overwritten by two new attributes which define the external storage: the $INDEX_ALLOCATION attribute and the $BITMAP attribute. It should be noted that, in some cases, directory listings are present both in the MFTrecord itself and in an external “INDX” file. The data between the end of the logical record and the end of the record space is known as MFTRecord Slack. Such data could be of significant value in a forensic investigation, as it may be the only record of the file or files having been present on the machine.
References
- Brian Carrier → File system forensic analysis.
- Xiaodong Lin → Introductory computer forensics.




















Post a Comment