Indexes are used to store groups of attributes in a sorted order. One of the most commonly encountered index structures in NTFS is the directory. In this case, several $FILE_NAME attributes are stored in the index. Directories in NTFS are indexed to facilitate faster retrieval of specific entries. They are stored in a B-tree in alphabetical order. A tree is a group of data structures called nodes that are linked together, forming a hierarchy with a head node and branches extending to other nodes.

Figure 1: A tree with ten nodes

Consider Figure 1(A) in which node A serves as the root of the tree. Node A has two children, B and C, while node B further branches into nodes D and E. In tree terminology, a parent node is any node that references one or more subordinate nodes, whereas a child node is any node referenced by a parent. Accordingly, A is the parent of B and C, which are its children. A leaf node is defined as a node with no children; in this example, nodes F, G, H, I, and J are leaf nodes. Node with same parent nodes, meaning there are two or more nodes that derived from the same nodes, are called “siblings”. The depth of a node is the length of unique path from the root to that node, which is equal to the depth of the deepest leaf. Height of a node is the length of longest downward path to a leaf from that node. All leaves are at height 0, and its parents are at height 1, and so on. The structure shown is a binary tree, as each node is limited to a maximum of two children.

Figure 1(B) depicts the same tree structure, now populated with values at each node. Value lookup begins at the root node and proceeds by comparing the target value with the current node’s value. If the target value is smaller, traversal continues to the left child; if larger, traversal continues to the right child. For example, to locate the value 42, it is first compared with the root value of 50, prompting a move to the left child. The comparison with 30 results in traversal to the right child, followed by a comparison with 35, which again directs traversal to the right. The value 42 is then found after only four comparisons. In contrast, locating the value 98 in an unsorted linear list could require up to ten comparisons, whereas the binary tree enables the same lookup in only three comparisons. If indexing is done linearly (instead of in a tree structure), the process would be slow and cumbersome to the system as it would require more power to process it. This minimizes the number of disk reads necessary to pull in the data during lookups.

B-Tree

A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time. They differ from binary trees in that there can be more than two children per node. They are designed specifically for storage systems that read and write large blocks of data.

The maximum number of children that a node can have is called the order of the B-Tree. For example, B-Tree of order m means the maximum number of children per node is m. A B-tree of order m is an m-way search tree that has the following properties:

All leaves are on the same level, ensuring balanced tree height and consistent performance.
Key values (data items) in all nodes are in increasing order.
All internal nodes except the root have at most m non-empty children and at least ceil(m/2) non-empty children.
The number of keys (or data values) in each node is one less than the number of its non-empty children. So that m-1 is the maximum number of keys that a node can contain or a minimum of ceil(m/2)-1 keys.
The root has at most m children but may have as few as 2 if it is not a leaf or none if the tree consists of the root alone.

Note that ceil(x) is the so-called ceiling function. It's value is the smallest integer that is greater than or equal to x. Thus ceil(3) = 3, ceil(3.35) = 4, ceil(1.98) = 2, ceil(5.01) = 6, ceil(7) = 7, etc.

Figure 2 illustrates a B-tree where file names are stored as values rather than numeric keys. The root node (Node A) contains three values and has four child nodes. To locate the file ggg.txt, we first compare its name with the values in the root node and observe that, alphabetically, it falls between eee.txt and iii.txt. Based on this ordering, we follow the corresponding pointer to Node C. Examining the values in Node C reveals the entry for ggg.txt.

Figure 2: A B-tree with file names as values.

Now, let’s add some complexity by examining how values are inserted and removed. This is a critical concept because it helps explain why recovering deleted file names in NTFS can be challenging. Assume that each node can hold only three file names and that a new file, jjj.txt, is added. At first glance, this appears straightforward; however, in practice, it results in the removal of two nodes and the creation of five new nodes. When we determine where jjj.txt belongs, we see that it should be placed at the end of Node C, after iii.txt. The top portion of Figure 3 illustrates this state. However, this insertion causes Node C to contain four names, exceeding its capacity of three. To resolve this, Node C is split into two nodes, ggg.txt is promoted to the parent level, and two new nodes, F and G, are created to hold the redistributed values. The final structure is shown at the bottom of Figure 3.

Figure 3: The top tree shows ‘jjj.txt’ added to node C, and the bottom tree is the result of removing node C because each node can have only three names.

Unfortunately, Node A now contains four values, exceeding its capacity. To correct this, the node is split, and ggg.txt is promoted to a new top-level (root) node. The resulting structure is shown in Figure 4. As a result of inserting a single file, Nodes A and C are removed, and Nodes F, G, H, I, and J are created. Any residual data that may have remained in Nodes A and C from previously deleted files is likely to have been overwritten and is no longer recoverable.

Figure 4: The final state from adding the ‘jjj.txt’ file.

Now consider deleting the file zzz.txt. This operation simply removes the file name from Node E and does not require any additional structural changes. Depending on the specific implementation, metadata associated with zzz.txt may still remain within the node and could potentially be recovered. A more complex scenario occurs when fff.txt is deleted. In this case, Node F becomes empty and must be rebalanced. To restore the tree’s structure, eee.txt is moved from Node I to Node F, and bbb.txt is moved from Node B to Node I. These adjustments maintain a balanced tree, ensuring that all leaf nodes remain at the same depth from Node H. The resulting configuration is shown in Figure 5.

Figure 5: Tree after deleting the ‘zzz.txt’ file and the ‘fff.txt’ file.

Node B will still contain bbb.txt in its unallocated space because the entry was moved to Node I rather than truly removed. As a result, a forensic analysis tool may report bbb.txt as deleted, even though it was not. The file name was merely relocated as part of the rebalancing operation triggered by the deletion of fff.txt.

We mentioned earlier that the B-tree is designed specifically for storage systems that read and write large blocks of data. Actually, the B-tree, which entails more than what was earlier described, is seldom implemented. What is most commonly implemented is a variant of the B-tree, called the B+tree.

NTFS uses a structure that closely resembles a B+tree to implement directory indexes (e.g., $INDEX_ROOT and $INDEX_ALLOCATION). Understanding this distinction is important for accurate filesystem and forensic analysis.

The key difference between a B-tree and a B+tree lies in how records are stored. In a traditional B-tree, both internal nodes and leaf nodes may store keys along with pointers to actual records. Consequently, file or directory records can exist at multiple levels of the tree. A B+tree, however, stores records exclusively in leaf nodes. Internal nodes contain only key values and child pointers, serving solely as navigation structures that guide searches down the tree. These internal keys are effectively redundant from a data-storage perspective, as the actual records always reside at the leaf level. Both structures are generalizations of a binary search tree, allowing a node to contain multiple keys and more than two child pointers. Each child pointer represents a key range, directing the search to the subtree that contains values less than, equal to, or greater than a given key.

NTFS benefits from the B+tree design because internal nodes do not store data, allowing more keys to fit within a single disk cluster. This results in a shallower index tree, which reduces disk I/O and improves lookup performance. From a forensic standpoint, this also means that directory entries of evidentiary value are always located in leaf nodes, not in internal index nodes. Another important characteristic is that leaf nodes in a B+tree are logically ordered and often linked, enabling efficient sequential traversal. In NTFS, this design allows directory listings to be read efficiently and enables investigators to perform linear scans of directory entries by walking the leaf nodes. A complete directory enumeration, therefore, requires only a single pass through the leaf level. In contrast, if NTFS used a traditional B-tree, directory records would be scattered across internal and leaf nodes, making sequential scans less efficient and increasing the complexity of both normal operation and forensic reconstruction. Regardless, you will see the two different B-tree types in NTFS, where the B+tree structure is more often used than the B-tree.

NTFS Directory Indexing

Having introduced the general concept of B-trees, we can now examine how NTFS uses them to implement indexing. In NTFS, indexes are primarily used to organize file and subdirectory names within a directory, allowing rapid lookup of directory entries.

Each node in the tree is composed of index entries, which are the data structures used to store values within the node. For directory indexes, the key values are file names, represented by $FILE_NAME structures. Index entries are arranged sequentially within a node and terminated by a special empty entry that marks the end of the list.

Figure 6: NTFS directory structure when indexing in use

NTFS stores directory index data in the $INDEX_ROOT and $INDEX_ALLOCATION attributes of the directory’s MFT entry. These structures, which hold the B-tree nodes, are commonly referred to as index buffers or index records and are collectively identified by the name $I30. It is important to note that $I30 is not an actual file; rather, it is an identifier that denotes the directory entry index and is implemented using multiple NTFS structures, including the index root, index allocation, and bitmap.

The $INDEX_ROOT attribute begins with a 16-byte attribute header (whose data structure is shown in the Figure below), followed immediately by the $INDEX_ROOT header, which marks the start of the attribute’s content. The $INDEX_ROOT attribute is always resident and has a type identifier of 0x90 (144 decimal). The attribute contains a single index node that functions as the root node of the directory’s B-tree structure.

Figure 7: Data structure of the $INDEX_ROOT attribute

This root index node consists of one or more index entry structures, as shown in the Figure above. Each index entry may contain a $FILE_NAME structure that represents a file or subdirectory within the directory. The index entry stores two primary components: a file name, which serves as the index key, and an MFT file reference number, which acts as a pointer to the corresponding MFT record. An index entry is not required to contain a $FILE_NAME structure. When the structure is absent, the index entry is considered an empty index entry, indicating the absence of a key and typically marking a boundary or termination condition within the index node. The data structure for an index entry is shown below.

Figure 8: Data structure for “index entry” (specifically for directory)

When a directory grows too large to store all of its index entries within the resident $INDEX_ROOT attribute, NTFS uses a non-resident $INDEX_ALLOCATION attribute instead. The $INDEX_ALLOCATION attribute stores the additional nodes of the B-tree that cannot fit in the index root. Unlike $INDEX_ROOT, the $INDEX_ALLOCATION attribute is always non-resident and follows the standard layout of a non-resident NTFS attribute. For small directories, this attribute does not exist, as all indexing information can be fully contained within the $INDEX_ROOT structure. The data structure of the $INDEX_ALLOCATION attribute is shown below.

Figure 9: Data structure for $INDEX_ALLOCATION attribute

The $INDEX_ALLOCATION attribute (type 0xA0) is used when an index grows beyond the space available in the $INDEX_ROOT attribute. It stores the overflow index data on disk in the form of one or more Index Records. Each Index Record represents a single node in the NTFS B-tree structure and contains a sequence of Index Entry structures, which are the same ones found in the $INDEX_ROOT attribute. In directory indexes, these entries are commonly referred to as $I30 entries and contain filename keys along with references to the corresponding Master File Table (MFT) records.

The size of each Index Record is defined in the $INDEX_ROOT attribute header and is typically 4096 bytes, matching the system cluster size in most NTFS volumes. An Index Record begins with an Index Record Header, followed by an Index Node Header, and then a sequence of Index Entries. Together, these records form the non-resident portion of the directory index, enabling NTFS to scale directory lookups using a B-tree structure efficiently.

Figure 10: Data structure of Index Record

From a forensic perspective, the $INDEX_ROOT attribute is significant because it contains the root of the directory index and, in small directories, may hold all directory entries without the use of $INDEX_ALLOCATION. Each index entry links a human-readable file name to its corresponding MFT record, providing a direct mapping between directory listings and MFT metadata.

Empty index entries are particularly noteworthy during analysis. While they normally function as structural markers within the index, their presence and surrounding slack space may contain remnants of previously allocated index entries. As a result, the $INDEX_ROOT attribute can be a valuable source of evidence when examining directory activity, file creation, deletion, and potential anti-forensic behavior.

To demonstrate how NTFS implements tree-based indexing, we examine the root directory of an NTFS volume. The figure below presents a hexadecimal view of the index-related attributes stored in MFT entry 5, which represents the root directory of the volume and contains the initial index structures used for directory enumeration.

Figure 11: INDEX attributes in MFT entry 5 (NTFS volume’s root directory)

In this analysis, we focus on three attributes, beginning with the $INDEX_ROOT attribute. The attribute header is shown in the image below.

Figure 12: INDEX_ROOT attribute header

As shown in the figure above, the first four bytes of the attribute header contain the value 0x00000090 (144 decimal), identifying the attribute as an $INDEX_ROOT. The total length of the attribute is 0x00000058 (88 decimal), as indicated by bytes relative offsets 0x04–0x07 of the attribute header. The attribute content begins at byte offset 0x0020 (32 decimal), as specified in relative offset 0x20–0x21 of the header. This offset marks the start of the $INDEX_ROOT structure, also referred to as the $INDEX_ROOT header, which contains the metadata and index node information used to describe the root of the directory index. The entire data structure is interpreted as follows.

Relative offsets	Description	Values
0x00-0x03	Attribute ID	0x00000090
0x04-0x07	Length of Attribute	0x00000058 = 88 (i.e., from absolute byte 0x170-0x1C7 in Figure 11)
0x08	Resident/Non-Resident Flag	0x00 = resident
0x09	Length of Name of Attribute	0x04 = 4
0x0A-0x0B	Offset to Name of Attribute	0x0018 = 24 (i.e., absolute offset 0x170 + 0x18 = 0x188 in Figure 11)
0x0C-0x0D	Flags	0x0000 = normal
0x0E-0x0F	Not Yet Known	0x0006 = possible ID
0x10-0x13	Length of Attribute content	0x00000038 = 56 (From absolute offset 0x190-0x1C7 in Figure 11)
0x14-0x15	Offset to the Start of Attribute content	0x0020 = 32 (i.e., absolute offset 0x170 + 0x20 368 = 0x190 in Figure 11)
0x16	Indexed flag	0x00 = not indexed
0x17	Padding to 8-byte boundary	0x00

Relative offsets 0x18-0x1F contain the details of this name, which is “$I30” in Unicode.

To extract the contents of an $INDEX_ROOT attribute, we can use icat and supply the 144 type as shown. Notice that it is the same as the attribute content as contained in Figure 11. Bytes 0x00-0x0F (highlighted red) constitute the INDEX_ROOT content header.

Figure 13: INDEX_ROOT attribute content

Bytes 0x00-0x03 identify the type of entry as listed in the $AttrDef file. In this case, we note that 0x30000000 is a FILE_NAME type. Bytes 0x04-0x07 are reported to be an indication of the collation sorting rule. This byte sequence identifies the rule that is to be used to sort the following index entries. If the type is File Name, as is the case here, then the rule COLLATION_FILENAME is used. The collation sorting rule can be any of the following:

0x00000000 → Binary
0x00000001 → File Name
0x00000002 → Unicode String
0x00000010 → Unsigned Long
0x00000011 → SID
0x00000012 → Security Hash
0x00000013 → Multiple Unsigned Long

Bytes 0x08-0x0B contain the size of the Index Allocation Entry. The value here of 0x00001000 is equal to 4096 decimal. A scan of the sample volume reveals that “INDX” files, which are directory listings held as non-resident data in blocks on the “user” area of the disk, are 4096 bytes long. Any further space that is required by the file is allocated in blocks of the same size, which may or may not be contiguous.

Byte 0x0C is declared to be the Number of Clusters per Index Record. It is known from the BPB at byte offset 0x13 that the cluster size on this disk is eight sectors, and thus this value of 0x01 (1 decimal) equates to 8× 512 = 4096 bytes. The remaining three bytes, offsets 0x0D-0x0F, are padding.

The $INDEX_ROOT attribute contains an Index node, which is the root node of the B-Tree describing the root directory here. The Index Node starts with an Index header (highlighted yellow in Figure 11), immediately following the end of the $INDEX_ROOT Header.

Byte offsets 0x10-0x13 record the offset to the first index entry, which is relative to start of node header, each index entry containing a “$FILE_NAME” structure. In this case the value is 0x00000010 (16 decimal). This value, therefore, points to offset 0x10 + 0x10 = 0x20.

Byte offsets 0x14-0x17 record the total size of the Index Entries. The value 0x00000028 is equal to 40 decimal. Applied from the start of this Index Header, the last byte in the index should be at byte offset 0x10 + 0x28 – 0x1 = 0x37. This coincides with the declaration in the Attribute Header.

The allocated size of the Index Entries is recorded in byte offsets 0x18-0x1B. The value is again 0x00000028, which is equal to 40 decimal. It means the area containing index entries starts at byte offset 0x20 and ends at byte offset 37 (highlighted orange in Figure 13). Clearly, all the allocated index entries' space is occupied here.

Byte offset 0x1C-0x1F is reported to be a flag that signals whether the Index is a “Small” index, which will fit within the MFT record, or a “Large” index that requires an external Index Allocation Unit. In this case, 0x00000001 signals that this is a “Large” index, whereas 0x00000000 would signal it was a “Small” index. Experiments have shown that part directory listings may be retained within the MFT record both with and without external “INDX” buffer files. Where both types are present (internal and external) this flag will be set to 0x00000001. Where there are no entries in the listing (an empty directory), or entries are resident only, this flag will be set to 0x00000000.

The index entries can be examined to identify stored file names and to determine whether child index nodes exist. Figure 14 shows the index entries located in the $INDEX_ROOT attribute.

Figure 14: Index Entries in $INDEX_ROOT attribute

The eight bytes at relative offsets 0x00-0x07 point to the MFT Base File Reference for this item. The two bytes at relative offsets 0x08-0x09 are stated to be the value of the length of the Index Entry. The value here is 0x0018, which is 24 decimal. The two bytes at offsets 0x0A-0x0B contain the length of the $FILE_NAME attribute associated with the index entry. In this case, the value is 0x0000, meaning no file name is stored. This confirms that the entry is an empty index entry, used only for structural purposes.. The byte at offset 0x0C is a flag byte with values as follows:

0x00 → Child node.
0x01 → Child node in INDEX_ALLOCATION.
0x02 → Last entry.
0x03 → Last Entry, Child node in INDEX_ALLOCATION.

The flag value 0x03 is present in this case. This value is a combination of two flags 0x00000003 = 0x01 || 0x02:

0x01 – Indicates that the index entry has a child node.
0x02 – Indicates that this is the last index entry in the node.

The remaining three bytes at offsets 0x0D-0x0F are probably padding. This is the end of the Index Entry Header and the start of the Index Entry content. Because the entry references one child node, the last 8 bytes of the structure contain the Virtual Cluster Number (VCN) of that child. Here, the VCN value is 0, indicating that the child index node is located at VCN 0 within the $INDEX_ALLOCATION attribute.

If the Index Flag ‘Last Entry’ at relative offset 0x0C is not set (in the directory of interest), the Index node entry will have the following structure:

Relative byte offset	Length	Description
0x00	6 BYTES	MFT record number for file name or unused (index)
0x06	WORD	MFT record sequence number or unused (index)
0x08	WORD	Length of Node Entry
0x0A	WORD	Length of content (length of Filename attribute)
0x0C	BYTE	Index Flags
0x10	6 BYTES	MFT Record number of the parent directory
0x16	WORD	MFT Record Sequence number of the parent directory
0x18	LONGLONG	File Create time
0x20	LONGLONG	File Modified time
0x28	LONGLONG	MFT Record Modified time
0x30	LONGLONG	File Last Accessed time
0x38	LONGLONG	File Allocated Size
0x40	LONGLONG	File Real Size
0x48	DWORD	File Type Flags
0x4C	DWORD	$EA buffer size needed or $Reparse_Point Tag
0x50	BYTE	Filename Length (Number of Unicode characters)
0x51	BYTE	File name type (Namespace)
0x52	FileName length * 2	File name

The second attribute is the $INDEX_ALLOCATION attribute, shown in the figure below. Records will only have an $Index_Allocation attribute if the $Index_Root attribute header has its flag set to ‘0x00000001’ (Index points to $Index_Allocation Child Nodes). This attribute is always non-resident, and its content is stored in data runs, which point to the actual storage location for all sub-nodes of the B+tree that implements a directory index.

Figure 15: MFT entry 5’s $INDEX_ALLOCATION attribute

The start of an $INDEX_ALLOCATION Attribute is signalled by the identifier 0x000000A0 (160 decimal) at relative offsets 0x00-0x03. As with all attributes, this $INDEX_ALLOCATION Attribute starts with an Attribute Header, the layout of which is identical to the attribute header that we encountered and deconstructed in the $INDEX_ROOT attribute. The entire data structure is interpreted as follows.

Relative offsets	Description	Values
0x00-0x03	Attribute ID	0x000000A0
0x04-0x07	Length of Attribute	0x00000050 = 80 (i.e., from absolute byte 0x1C8-0x217 in Figure 11)
0x08	Resident/Non-Resident Flag	0x01 =non-resident
0x09	Length of Name of Attribute	0x04 = 4
0x0A-0x0B	Offset to Name of Attribute	0x0040 = 64 (i.e., absolute offset 0x1C8 + 0x40 = 0x208 in Figure 11)
0x0C-0x0D	Flags	0x0000 = normal
0x0E-0x0F	Not Yet Known	0x0008 = possible ID
0x10-0x17	Starting VCN	0x0000000000000000 = 0
0x18-0x1F	Last VCN	0x0000000000000000 = 0
0x20-0x21	Offset to the Data Runs	0x0048 = 72 (i.e., absolute offset 0x1C8 + 0x48 = 0x210 in Figure 11)
0x22-0x23	Compression unit size	0x0000 = 0
0x24-0x27	Padding to 8-byte boundary	0x00000000
0x28-0x2F	Allocated Size of Attribute	0x0000000000001000 = 4096
0x30-0x37	Real Size of Attribute	0x0000000000001000 = 4096
0x38-0x3F	Initialized size of Stream	0x0000000000001000 = 4096
0x40-0x47	Attribute Name	$I30
0x48-0x4F	Data Run Descriptor	11 01 2C 00 00 00 00 00

It should be noted that bytes 0x36 and 0x37 are shown in Figure 15 as 0x0500, respectively. These are, of course, the Update Sequence Number for this record, and they thus need to be replaced, before analysis, by the appropriate values from the Update Sequence Array. In this case, the values are 0x0000, and this is what is included in the value for the Real Size of the Attribute entry in the table above.

Next, we analyze the extracted data runs to determine which disk clusters store the B+tree structures used by the root directory’s index sub-nodes. The first byte of the data run is 0x11. This byte is split into two 4-bit values:

Lower nibble (0x1) → size of the run length field.
Upper nibble (0x1) → size of the starting cluster offset field.

This indicates that one byte is used to describe the run length and one byte is used to store the starting cluster number. The next byte (0x01) represents the run length, meaning the run spans one cluster. This confirms that only a single cluster is allocated to the $INDEX_ALLOCATION attribute, consistent with earlier observations. The following byte (0x2C, decimal 44) specifies the starting cluster number. Therefore, the data for this index allocation begins at Cluster 44. At this point, the first data run has been fully parsed. Since only one cluster is listed, the entire $INDEX_ALLOCATION attribute resides in Cluster 44. The next byte in the sequence is 0x00, which marks the end of the data run list, indicating that no additional runs exist. In summary, the $INDEX_ALLOCATION attribute for this directory occupies a single cluster—Cluster 44—which contains the B+tree index data for the directory.

Figure 16: Data runs in the $INDEX_ALLOCATION attribute

Next, we obtain and analyze the contents of Cluster 44 which contains one index record or index node. The Figure below shows part of hex dump of Cluster 44.

Figure 17: Hex dumpof the beginning of Cluster 44 containing INDX header, index node header, and first index entry

As shown in the figure above, the index record begins with the signature 0x494E4458 (“INDX”) at bytes 0-3, identifying it as an NTFS index record. Bytes 4-5 represent the offset to the Update Sequence Array (0x0028 = 40), and bytes 6-7 represent the size of the Update Sequence Array (0x0009 = 9). Bytes 8-15 indicate the $LogFile Sequence Number (0x0000000000103BF2). Bytes 16-23 represent the Virtual Cluster of the index record in the index allocation (0x0000000000000000). Please refer to Figure 10 for the full data structure. The first 24 bytes constitute the INDX header, which is immediately followed by the index node header. The first four bytes of the index node header contain the offset to the first index entry. In this example, the value 0x00000040 indicates an offset of 0x40 (64 bytes) from the start of the index node header. When combined with the 24-byte INDX header, the first index entry therefore begins at byte offset 0x58 (88 decimal) from the start of the index record.

The first eight bytes of the index entry represent the MFT file reference number associated with the filename stored in the entry. In this case, the value 0x0400000000000000 corresponds to MFT entry 0x000000000004, which maps to the NTFS system file $AttrDef. At offsets 0x08–0x09 within the index entry, the entry length is stored. Here, the value 0x68 indicates a total index entry size of 104 bytes. Offsets 0x0A–0x0B specify the size of the embedded $FILE_NAME attribute, which in this case is 0x52 (82 bytes). Finally, bytes 0x0C–0x0F contain the index entry flags. A value of 0x00000000 indicates that this entry does not point to a child node, meaning it is a leaf entry in the B-tree structure.

The $FILE_NAME structure, which contains the name of the file referenced by the first index entry, begins at byte offset 104. This offset is calculated as 88 + 16, where 88 bytes represent the start of the first index entry and 16 bytes correspond to the size of the index entry header. The first 8 bytes of the $FILE_NAME content (05 00 00 00 00 00 00 05 00) represent the MFT file reference number of the parent directory. The lower six bytes contain the actual MFT entry number, which in this case is 0x000000000005 (5). MFT entry 5 corresponds to the NTFS root directory, confirming that the referenced file resides in the root of the volume.

At byte offset 66 within the $FILE_NAME structure, the file name field begins. NTFS stores filenames in Unicode, using two bytes per character. The value at byte offset 64 (0x08) specifies the length of the filename in characters, which in this case is 8. Following this length field, the Unicode characters representing the filename appear (highlighted by the dotted line in Figure 17). Decoding these characters reveals the filename “$AttrDef”.

In this example, a total of 13 index entries are present. For brevity, we omit a detailed discussion of most of these entries, including $BadClus (second entry), $Bitmap (third entry), $Boot (fourth entry), $Extend (fifth entry), $LogFile (sixth entry), $MFT (seventh entry), $MFTMirr (eighth entry), $Secure (ninth entry), $UpCase (tenth entry), $Volume (eleventh entry), and the root directory (.) entry (twelfth entry). Instead, we proceed directly to the final index entry and examine it in detail, as illustrated in the figure below.

Figure 18: Hex dump of the ending of Cluster 44 containing the last two index entries

File name → canada.txt
MFT reference number of the file → MFT entry 35
MFT reference of parent directory → MFT entry 5
Has child node? → No.

The B-tree index structures for the root directory in our sample image is shown in the figure below. These entries are organized in alphabetical order by the name of the files included.

Figure 19: B-tree index structures for the root directory

The third attribute is the $BITMAP attribute. When an index becomes too large to fit in the resident $INDEX_ROOT, NTFS stores additional entries in the $INDEX_ALLOCATION attribute and uses a $BITMAP attribute to track allocation. The $BITMAP attribute maintains a bit-level map in which each bit represents the allocation state of a corresponding index buffer within $INDEX_ALLOCATION, typically 4 KB in size. This structure enables NTFS to efficiently manage index growth and reuse freed index records.

This structure is used by named indexes such as $I30, $O (Object ID), $R (Reparse Points), and the indexes maintained by $Secure. The $BITMAP attribute itself may be resident or non-resident depending on its size, rather than the presence of a stream name. The $BITMAP attribute has a type identifier of 0xBO (176) and is organized by bytes, and each bit corresponds to an index record.

Figure 20: MFT entry 5's $BITMAP attribute

As shown in the figure above, the attribute flags at relative offset 0x08 are set to 0x00, indicating that the attribute is resident. The attribute content begins at a relative offset of 0x0020 (32 decimal), as specified by the value in relative offsets 0x14–0x15 (20 00). The total length of the attribute content is 0x08 bytes, as indicated by relative 0x10–0x13 (08 00 00 00). The attribute content (relative offset 0x32-0x390 is extracted as follows.

Figure 21: MFT entry 5's $BITMAP attribute content

Except for the first bit (bit 0) of the first byte, all other bits in the attribute content are zero. This indicates that only the first index node (Index Record #0) is currently in use. It is important to note that a root node always exists in a B-tree, stored in the $INDEX_ROOT attribute; however, the root node is not represented in the $BITMAP. The $BITMAP attribute tracks the allocation status of sub-nodes within the B-tree index only.

As files are added to the directory, the listings are initially built within the MFT record itself. When no further entries can be added because all space within the MFT record has been used, the MFT directory listing is changed from resident to non-resident. When this occurs, the MFT record is truncated, and some of the Index Entries in the Index Root attribute are overwritten by two new attributes which define the external storage: the $INDEX_ALLOCATION attribute and the $BITMAP attribute. It should be noted that, in some cases, directory listings are present both in the MFTrecord itself and in an external “INDX” file. The data between the end of the logical record and the end of the record space is known as MFTRecord Slack. Such data could be of significant value in a forensic investigation, as it may be the only record of the file or files having been present on the machine.

References

Brian Carrier → File system forensic analysis.
Xiaodong Lin → Introductory computer forensics.

Facebook SDK

NTFS Indexing Forensic Analysis

B-Tree

NTFS Directory Indexing

References

Joseph Moronwi

Post a Comment

Post a Comment

Contact Form