
In FAT file systems, every file and folder is described by a structure called a directory entry. Directories are ordinary files whose content is a sequence of directory entries. A directory entry is a 32-byte structure that holds the file’s name and all of the metadata associated with the file. This structure is of vital importance in FAT forensics because it enables the listing of files on the volume, the gathering of file metadata, and the recovery of file content. The directory entries of a directory are organized as a table, known as the Directory Table (DIR) or File Directory Table (FDT). Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system, and volume label), the date and time of creation, the address of the first cluster of the file's or directory's data, and finally the size of the file. Many people confuse the FAT (the cluster allocation map) with the directory table (the collection of directory entries), but they are different structures: the directory table holds file metadata, while the FAT is a linked-list index that records which cluster follows which.

The Root Directory

The FAT file system employs a hierarchical directory structure, with the “root” directory serving as the topmost directory in the hierarchy. Rather than file data, the root directory holds directory entries, each of which describes a file or a subdirectory. The root directory allows us to locate files on the partition. To find a file, you start with the BIOS Parameter Block to locate the root directory, and from there you follow the file's directory path. Each directory entry points to a cluster on the drive. In the case of directories, that cluster contains another set of directory entries. In the case of files, the clusters contain the file data. For FAT12 and FAT16 media, the first sector of the root directory, relative to the first sector of the FAT volume, is computed as follows:

FirstRootDirSecNum = number of reserved sectors + ((number of FATs) * (number of sectors occupied by one FAT))
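The formula above can be sketched in Python. This is a minimal illustration, assuming the standard BPB field offsets in the boot sector (reserved sector count at 0x0E, number of FATs at 0x10, 16-bit sectors-per-FAT at 0x16). The function name and the sample values, which resemble a typical 1.44 MB floppy layout, are hypothetical and not taken from the sample image.

```python
import struct

def first_root_dir_sector(boot_sector: bytes) -> int:
    """First root directory sector of a FAT12/FAT16 volume,
    relative to the first sector of the volume."""
    # Assumed standard BPB offsets, all little-endian:
    # 0x0E reserved sectors (2 bytes), 0x10 number of FATs (1 byte),
    # 0x16 sectors occupied by one FAT (2 bytes).
    reserved, = struct.unpack_from("<H", boot_sector, 0x0E)
    num_fats = boot_sector[0x10]
    sectors_per_fat, = struct.unpack_from("<H", boot_sector, 0x16)
    return reserved + num_fats * sectors_per_fat

# Hypothetical boot sector: 1 reserved sector, 2 FATs, 9 sectors per FAT.
sample_boot_sector = bytearray(512)
struct.pack_into("<H", sample_boot_sector, 0x0E, 1)
sample_boot_sector[0x10] = 2
struct.pack_into("<H", sample_boot_sector, 0x16, 9)

print(first_root_dir_sector(bytes(sample_boot_sector)))  # 1 + 2 * 9 = 19
```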

It is important to understand that the root directory behaves differently depending on the FAT variant being used. On FAT12 and FAT16, the root directory is special. It is not stored like an ordinary directory in the data area. Instead, it lives immediately after the FAT area, in a fixed-size table whose length is derived from the root entry count stored at byte offsets 0x11-0x12 of the BIOS Parameter Block. Because of this legacy layout:

The root directory itself has no timestamps. The entries inside it still carry timestamps; it is only the root directory as an object that lacks them, because it has no directory entry of its own.
It does not contain the implicit "." or ".." directory entries.
It does not have a conventional file name (beyond the implied root path \).
And importantly, this is the only place on the volume where a directory entry may exist that has only the ATTR_VOLUME_ID attribute bit set (i.e., the volume label).

This is why older FAT variants treat the root directory as a structural “header” for the volume rather than a normal directory object. On FAT32, the storage changed. The root directory resides in the data area as an ordinary cluster chain whose first cluster is recorded in the BIOS Parameter Block, so it can grow like any other directory instead of being a fixed-size table. Note, however, that the properties listed above still apply: even on FAT32 the root directory has no directory entry of its own (and therefore no name or timestamps), and it does not contain "." and ".." entries. The format of the entries in all directories (root and subdirectories) is the same. Each entry is 32 bytes (0x20) in size. Figures 1 and 2 below show the decoding of a hexadecimal dump of a FAT32 root directory. Each offset is measured from the start of that particular entry, not from the start of the block.

Figure 1: Decoding a FAT32 root directory entry

Figure 2: Decoding a FAT32 root directory entry (continued)

Byte offsets 0x0–0xA hold the filename. Specifically, byte offsets 0x0–0x7 hold the eight-character base filename, followed by a three-character filename extension at byte offsets 0x8–0xA. This is known as the 8.3 file name or short file name, and the corresponding directory entry is known as the short name directory entry. The 8.3 file names are stored in ASCII code. Base filenames and extensions that are shorter than their fields are padded with spaces (0x20). It can be observed that the file name is longy.txt. Note that the dot is not stored in the entry; it is added between the base filename and the extension when the 8.3 filename is written out.
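As a rough sketch of how the 11 raw name bytes become a displayable 8.3 name, the following Python function strips the space padding and inserts the dot. The function name is hypothetical, and decoding as ASCII is a simplifying assumption: names containing OEM code page characters above 127 would need the appropriate codec instead.

```python
def format_short_name(raw: bytes) -> str:
    """Convert the 11 name bytes of a short entry into NAME.EXT form."""
    # Bytes 0-7 are the base name, bytes 8-10 the extension; both are
    # padded on the right with spaces (0x20).
    base = raw[0:8].decode("ascii", errors="replace").rstrip(" ")
    ext = raw[8:11].decode("ascii", errors="replace").rstrip(" ")
    # The dot is not stored on disk; add it only when there is an extension.
    return f"{base}.{ext}" if ext else base

print(format_short_name(b"README  TXT"))  # README.TXT
print(format_short_name(b"DOCS       "))  # DOCS
```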

The characters comprising a short file name may be any combination of letters, digits, or characters with code point values greater than 127. The following special characters are also allowed:

$   %   '   -   _   @   ~    `   !   (    )   {   }  ^  #  &

Names are stored in a short name directory entry in the OEM code page that the system is configured for at the time the directory entry is created. Short name directory entries remain in the OEM character set for compatibility. OEM characters are single 8-bit characters or, for certain code pages, DBCS character pairs. Short file names passed to the file system are always converted to upper case, and their original case is lost. A problem with most OEM code pages is that they map lower-case extended characters to upper case in a non-unique fashion, i.e., several extended characters map to a single upper-case character. The information that the original extended character carried is therefore not preserved, and some file names that would normally differ cannot both be created, because after the mapping to upper case they become the same file name.

Also, the byte offset 0x1C–0x1F, which represents the file size in bytes, is 42 39 00 00 in raw form. When this raw data is parsed in a meaningful way, it becomes 0x00003942, because it is stored using the Little Endian format. Therefore, by converting this base-16 value to a base-10 value for our own understanding, the size of the file is 14,658 bytes.

Another important step involves identifying the first cluster address, which is split between offsets 0x14-0x15 (high word) and 0x1A-0x1B (low word). Here the cluster address is 3, because combining the high word 0x0000 with the low word 0x0003 results in 0x00000003. How are these combined? Each word is first read as a little-endian 16-bit value; the high word then supplies the upper 16 bits and the low word the lower 16 bits of the 4-byte value 0x00000003. This can now be converted into decimal. Next, we look through the FAT table to determine the chain of clusters that are allocated to the file.
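The two decoding steps just described (the little-endian file size and the split first-cluster address) can be sketched as follows. The function names are hypothetical, and the sample entry is filled in only with the raw bytes quoted in the text; every other byte is left as zero.

```python
def first_cluster(entry: bytes) -> int:
    """Combine the high word (0x14-0x15) and low word (0x1A-0x1B)."""
    high = int.from_bytes(entry[0x14:0x16], "little")
    low = int.from_bytes(entry[0x1A:0x1C], "little")
    # The high word supplies the upper 16 bits of the cluster address.
    return (high << 16) | low

def file_size(entry: bytes) -> int:
    """Read the 32-bit little-endian file size at 0x1C-0x1F."""
    return int.from_bytes(entry[0x1C:0x20], "little")

# 32-byte entry holding only the fields discussed in the text.
entry = bytearray(32)
entry[0x14:0x16] = bytes.fromhex("0000")      # high word of first cluster
entry[0x1A:0x1C] = bytes.fromhex("0300")      # low word of first cluster
entry[0x1C:0x20] = bytes.fromhex("42390000")  # file size, raw form

print(first_cluster(entry))  # 3
print(file_size(entry))      # 14658
```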

Suppose the cluster size (i.e., sectors per cluster multiplied by bytes per sector, both obtainable from the BPB) is 4096 bytes and the file occupies four clusters; the disk space occupied by the file is then 4 * 4096 = 16,384 bytes. However, as we have seen previously, the file size is 14,658 bytes. The two values are not equal, and the difference of 16,384 - 14,658 = 1,726 bytes is called slack space.
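The slack space arithmetic can be expressed as a small helper. This sketch assumes the file is allocated exactly the minimum number of whole clusters needed for its size; the function name is hypothetical.

```python
def slack_space(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file beyond its logical size."""
    # Round up to whole clusters using integer arithmetic only.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size - file_size

print(slack_space(14658, 4096))  # 4 clusters allocated -> 1726
```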

Parsing this data manually allows one to gather information, as defined in the table below. The table shows a summary of a single directory entry; note that the offset is merely from the start of that particular entry, not from the start of the block.

Byte offset (within directory entry) in hexadecimal	Length	Description
0x0	BYTE	First character of the file name in ASCII; also indicates allocation status (0xE5 = deleted entry, 0x00 = unused entry with no entries following it, 0x2E = "." or ".." dot entry rather than a normal file).
0x1	10 bytes	Characters 2-11 of the file name in ASCII.
0xB	BYTE	File attributes (0x01 = read-only, 0x02 = hidden, 0x04 = system, 0x08 = volume label, 0x0F = long file name, 0x10 = subdirectory, 0x20 = archive; 0x40 and 0x80 are not used and must be set to zero).
0x0C	BYTE	Reserved.
0x0D	BYTE	Fine-resolution part of the creation time. Although often described as a millisecond or tenths-of-a-second stamp, the field counts units of 10 ms. Because the creation time at offset 0x0E has a granularity of 2 seconds, the valid value range is 0-199 inclusive.
0x0E	WORD	Creation time. Bits 15-11: hours (0-23); bits 10-5: minutes (0-59); bits 4-0: seconds divided by 2 (0-29). The seconds are therefore recorded only to a 2-second resolution; finer resolution for file creation is found at offset 0x0D.
0x10	WORD	Creation date. Bits 15-9: number of years since 1980 (0 = 1980, 127 = 2107); bits 8-5: month number (1 = January, 12 = December); bits 4-0: day of the month (1-31).
0x12	WORD	Last access date. Note that there is no last access time, only a date. This is the date of the last read or write. In the case of a write, it should be set to the same date as the last write date at offset 0x18-0x19. Same encoding as offset 0x10.
0x14	WORD	High WORD of the entry's first cluster address (always 0 for a FAT12 or FAT16 volume).
0x16	WORD	Last write time (hours, minutes, seconds). Note that file creation is considered a write. See offset 0x0E for the encoding.
0x18	WORD	Date of last write. Note that file creation is considered a write. See offset 0x10 for the encoding.
0x1A	WORD	Low WORD of the entry's first cluster address.
0x1C	DWORD	The file's size in bytes (0 for directories).
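The layout in the table maps directly onto a fixed-size binary record. The sketch below unpacks one 32-byte short entry with Python's `struct` module. The `ShortEntry` type and function name are hypothetical, and the sample entry is assembled from the field values discussed for the sample image (name, attributes, creation and access stamps, first cluster, size); the last-write fields are placeholders copied from the creation fields, since the text does not give them.

```python
import struct
from typing import NamedTuple

class ShortEntry(NamedTuple):
    name: bytes          # 11 raw name bytes (8 + 3, space padded)
    attributes: int
    create_fine: int     # units of 10 ms, 0-199
    create_time: int
    create_date: int
    access_date: int
    first_cluster: int   # high and low words combined
    write_time: int
    write_date: int
    size: int

# Little-endian, no padding: name, attributes, reserved, fine time,
# create time/date, access date, cluster high, write time/date,
# cluster low, size. Total 32 bytes.
ENTRY_FORMAT = "<11sBBBHHHHHHHI"

def parse_short_entry(raw: bytes) -> ShortEntry:
    (name, attributes, _reserved, fine, ctime, cdate, adate,
     cluster_high, wtime, wdate, cluster_low, size) = struct.unpack(
        ENTRY_FORMAT, raw[:32])
    return ShortEntry(name, attributes, fine, ctime, cdate, adate,
                      (cluster_high << 16) | cluster_low, wtime, wdate, size)

# Sample entry; write time/date are placeholder values (assumption).
sample_entry = struct.pack(ENTRY_FORMAT, b"README  TXT", 0x20, 0, 0x7C,
                           0xAF68, 0x4036, 0x4036, 0, 0xAF68, 0x4036, 3, 8827)
parsed = parse_short_entry(sample_entry)
print(parsed.name, parsed.first_cluster, parsed.size)
```

Using a fixed format string keeps the offsets in one place, so a mistake in one field shows up as a size mismatch rather than silently shifting later fields.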

So if we want to look up a file, we first have to check that the entry is "real". We skip all entries whose file attributes byte is 0x0F, because those entries hold data that supports a long file name and have a different format (to be discussed in a later section). We also want to check that the file hasn't been deleted, so we check the first byte of the filename: if it is 0xE5, the file has been deleted.
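These checks can be sketched as a small classifier that follows the order described above (LFN check first, then the deleted check). The labels, the helper `make_entry`, and the sample bytes are hypothetical; masking the attribute byte with 0x3F before comparing is an assumption that ignores the two unused upper bits.

```python
ATTR_VOLUME_ID = 0x08
ATTR_DIRECTORY = 0x10
ATTR_LONG_NAME = 0x0F

def classify_entry(raw: bytes) -> str:
    """Roughly classify one 32-byte directory entry."""
    first_byte = raw[0x00]
    attributes = raw[0x0B]
    if first_byte == 0x00:
        return "unused"
    if (attributes & 0x3F) == ATTR_LONG_NAME:
        return "lfn"            # long file name entry, different format
    if first_byte == 0xE5:
        return "deleted"
    if attributes & ATTR_VOLUME_ID:
        return "volume_label"
    if attributes & ATTR_DIRECTORY:
        return "directory"
    return "file"

def make_entry(first_byte: int, attributes: int) -> bytes:
    """Build a mostly empty 32-byte entry for demonstration only."""
    entry = bytearray(32)
    entry[0x00] = first_byte
    entry[0x0B] = attributes
    return bytes(entry)

samples = [make_entry(0x41, 0x0F), make_entry(0x52, 0x20),
           make_entry(0xE5, 0x20), make_entry(0x00, 0x00)]
print([classify_entry(e) for e in samples])
# ['lfn', 'file', 'deleted', 'unused']
```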

In Figure 3 below, a directory entry has a file named “file.dat.” In addition to the file name, the entry also contains the file size and the address of the first cluster allocated to the file. With the address information of the starting cluster, we can locate the corresponding FAT entry (FAT entry 16), whose content is the next cluster allocated for the file, i.e., the second cluster (or block 17) (note that the FAT entry number starts with 0). This continues until the FAT entry (FAT entry 20) corresponding to the last cluster is reached. The last FAT entry contains a special signature “EOF,” which marks the end of the file. Except for FAT entries 0 and 1, each FAT entry is associated with a cluster whose address is equal to the FAT entry number. As such, we can find out address information for all clusters (or the chain of clusters) allocated for the file, which are Clusters 16–20. According to this address information, we can determine the physical disk location of these clusters and obtain the file contents stored there.

Figure 3: FAT directory entry and cluster chain relationship

Let us look at the raw contents of a directory entry from the root directory using the Sleuth Kit. The starting location of the root directory in a FAT32 file system is given in the boot sector. It is 8192 for my sample FAT image.

Figure 4: Hex dump of a root directory using TSK

The figure above shows a directory entry with the attribute at byte 11 (0x0B) set to 0x20, which means that only the archive attribute bit is set. Byte offset 0x0D holds the fine-resolution part of the creation time in units of 10 ms; its value is 124 (0x7c), which gives 124 * 10 ms = 1.24 sec. Bytes 0x0E - 0x0F hold the created time, 0xaf68 (1010 1111 0110 1000₂), which, when converted as shown below, gives 21:59:16 (or 21:59:17.24 once the 1.24 sec from offset 0x0D is added).

Hours = bits 15 - 11 converted to decimal (i.e., 10101₂ = 21₁₀)
Minutes = bits 10 - 5 converted to decimal (i.e., 111011₂ = 59₁₀)
Seconds = bits 4 - 0 converted to decimal and multiplied by 2 (i.e., 01000₂ = 8₁₀, and 8 * 2 = 16)

The created date is in bytes 0x10 - 0x11 and has a value of 0x4036 (0100 0000 0011 0110₂), which, when converted, is January 22, 2012.

Year = bits 15 - 9 converted to decimal (i.e., 0100000₂ = 32₁₀) = 1980 + 32 = 2012
Month = bits 8 - 5 converted to decimal (i.e., 0001₂ = 1₁₀) = January
Day = bits 4 - 0 converted to decimal (i.e., 10110₂ = 22₁₀) = 22nd.
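The bit-field conversions worked through by hand above can be checked with a short Python sketch. The function names are hypothetical; the shifts and masks follow the bit layouts given in the directory entry table, and the fine-resolution byte is treated as a count of 10 ms units.

```python
from typing import Tuple

def decode_fat_time(value: int) -> Tuple[int, int, int]:
    """Decode a 16-bit FAT time into (hours, minutes, seconds)."""
    hours = (value >> 11) & 0x1F      # bits 15-11
    minutes = (value >> 5) & 0x3F     # bits 10-5
    seconds = (value & 0x1F) * 2      # bits 4-0 store seconds / 2
    return hours, minutes, seconds

def decode_fat_date(value: int) -> Tuple[int, int, int]:
    """Decode a 16-bit FAT date into (year, month, day)."""
    year = 1980 + ((value >> 9) & 0x7F)  # bits 15-9, years since 1980
    month = (value >> 5) & 0x0F          # bits 8-5
    day = value & 0x1F                   # bits 4-0
    return year, month, day

print(decode_fat_time(0xAF68))  # (21, 59, 16)
print(decode_fat_date(0x4036))  # (2012, 1, 22)

# Offset 0x0D = 124 adds one whole second and 24 hundredths.
extra_seconds, hundredths = divmod(0x7C, 100)
print(extra_seconds, hundredths)  # 1 24
```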

The accessed date is in bytes 0x12 - 0x13 and has a value of 0x4036 (0100 0000 0011 0110₂), which, when converted, is January 22, 2012. While creation and modification have both date and time values, the access value has only a date. There is no access time field in the FAT directory entry! The Sleuth Kit (and many other tools) display a value of 00:00:00 where in reality there is no value. Forensic analysts should therefore treat 00:00:00 as meaning "no recorded time" rather than "midnight", because the distinction matters in analysis and in court. Taken at face value, the tool output suggests that this file was accessed at 00:00:00 on January 22, 2012, almost 22 hours before it was created at 21:59:16 on the same day. It is the writer’s belief that an expert's testimony can be called into doubt simply by asking how the file was accessed before it was created. An analyst who is unaware of what the FAT file system actually stores, and of how forensic tools report that data, would be unable to answer the question!

We can see from bytes 0x14 to 0x15 and 0x1A to 0x1B that the starting cluster is 3 (0x00000003), and bytes 0x1C to 0x1F show that the file size is 8,827 (0x0000227b) bytes. To determine all the clusters in this file, we will need to refer to the FAT. First, let us consider the output of the fsstat Sleuth Kit command for our FAT image.

Figure 5: Output of the Sleuth Kit's fsstat command

The table entry for cluster 3 is located in bytes 12 to 15 of the FAT table (which in our case is sector 6316, the first sector following the reserved region) as shown in Figure 6 below. We see that the value is 4 (0x00000004), which indicates that cluster 4 is the next cluster in the chain. The table entry for cluster 4 is located in bytes 16 to 19, and we see that the value is 5 (0x00000005). We can see that consecutive clusters were allocated to this file until we get to entry 11 at bytes 44 to 47, which has an end-of-file marker (0x0fffffff).

Figure 6: Cluster 3 and the FAT cluster chain of the file
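The chain walk described above can be sketched in Python for a FAT32 table held in memory. The function name is hypothetical. The sample FAT reproduces the entries described in the text (entry 3 points to 4, and so on up to entry 11, which holds the end-of-file marker), while the values of entries 0 to 2 are assumed typical values rather than bytes read from the image.

```python
import struct
from typing import List

def cluster_chain(fat: bytes, start: int) -> List[int]:
    """Follow a FAT32 cluster chain starting at cluster `start`."""
    entry_count = len(fat) // 4   # each FAT32 entry is 4 bytes
    chain: List[int] = []
    cluster = start
    # Stop at end-of-chain markers, bad/free values, or out-of-range links.
    while 2 <= cluster < entry_count:
        if cluster in chain:
            raise ValueError("loop detected in cluster chain")
        chain.append(cluster)
        raw_entry, = struct.unpack_from("<I", fat, cluster * 4)
        cluster = raw_entry & 0x0FFFFFFF  # upper 4 bits are reserved
    return chain

# Entries 0-2 are assumed values; entries 3-11 follow the text.
sample_fat = struct.pack("<12I",
                         0x0FFFFFF8, 0x0FFFFFFF, 0x0FFFFFFF,
                         4, 5, 6, 7, 8, 9, 10, 11,
                         0x0FFFFFFF)

print(cluster_chain(sample_fat, 3))  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Entry N sits at byte offset N * 4, which is why cluster 3 is found at bytes 12 to 15 and cluster 11 at bytes 44 to 47.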

We can verify that we have the correct number of clusters by comparing the file size with the allocated space. The file has allocated nine 1,024-byte clusters, so there are 9,216 bytes of storage space. However, as we have seen from the above, the file size is 8827 bytes. Obviously, they are not equal, and the difference is called Slack Space, which is 389 bytes.

Remember that the Sleuth Kit uses sector addresses instead of cluster addresses. To convert cluster 3 to its sector address, we need the sector address of cluster 2. Since we are dealing with a FAT32 file system, we know the sector address of cluster 2 is the first sector of the data area, i.e. sector 8192 (from Figure 5). Also from Figure 5, we know that the sectors per cluster = 1024/512 = 2. The sector address S of a given cluster number C is determined by the following formula:

S = (Sector address of Cluster 2) + ((C - 2) * (Sectors per cluster))

The Sector address of Cluster 3 is therefore:

S = 8192 + ((3 - 2) * 2) = 8194
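The same conversion as a small Python function, using the values from the sample image (data area starting at sector 8192, 2 sectors per cluster). The function name is hypothetical.

```python
def cluster_to_sector(cluster: int, first_data_sector: int,
                      sectors_per_cluster: int) -> int:
    """Sector address of the first sector of a data cluster."""
    if cluster < 2:
        raise ValueError("data clusters are numbered from 2")
    return first_data_sector + (cluster - 2) * sectors_per_cluster

print(cluster_to_sector(3, 8192, 2))   # 8194
print(cluster_to_sector(11, 8192, 2))  # 8210, so cluster 11 ends at sector 8211
```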

The fsstat tool in TSK dumps the contents (in sectors) of the FAT structures. We previously saw the output of the fsstat command in Figure 5. Below is the content of the FAT structures.

Figure 7: FAT content (in sectors)

Here the output shows us the cluster chain for readme.txt from sectors 8194 to 8211 and the End of File. Each cluster is 2 sectors in size, so we can see in the parentheses that there are 18 sectors (nine clusters) in the cluster chain. The istat tool in TSK shows the details of a directory entry, and its output for this entry is shown in Figure 8 below. Using the metadata-addressing scheme of TSK, the readme.txt file is the first entry in the root directory, which means that it has an address of 3.

Figure 8: Output of the istat command

The Long File Name

With Windows 95 and 98, Microsoft removed the limitation of only 8 characters for the file name and 3 characters for the file type from file and directory names and introduced the so-called long file name, which can be up to 255 characters in length. Both upper- and lower-case letters are permitted in long file names and the range of other characters that may be used has also been increased. However, in order to maintain compatibility with MS-DOS and to continue to operate within the existing file systems, the long file name is spread over a number of standard 32-byte directory entries and it is always matched with a short file name (SFN) alias which conforms to the old 8.3 naming conventions. Windows automatically generates the SFN alias from the long file name. This set of additional directory entries (also known as the long name directory entries) must immediately precede the corresponding short name directory entry and is, therefore, physically contiguous with the short name directory entry.

It is worth noting that a file with a long name may have more than one LFN entry; each of these entries stores its part of the long filename in Unicode (UTF-16), so that each character in the name uses two bytes in the directory entry. Note that the Unicode characters are stored in little-endian order.

Byte offset (within LFN directory entry) in hexadecimal	Length	Description
0x0	BYTE	Sequence number, starting at 1 and increasing for each LFN entry until the final entry, whose sequence number is ORed with 0x40; also indicates allocation status (0xe5 if unallocated).
0x1	10 bytes	5 filename characters (UTF-16).
0xB	BYTE	File attributes (always 0x0F). This gives file attributes of read only (0x01), hidden (0x02), system (0x04), and volume (0x08).
0xC	BYTE	Reserved (always 0x00).
0xD	BYTE	Checksum of DOS file name.
0xE	12 bytes	6 filename characters (UTF-16).
0x1A	WORD	Reserved.
0x1C	DWORD	2 filename characters (UTF-16).

There is an 8-bit checksum in each LFN entry, which is computed from the short name. It links each “Long File Name Directory Entry” to the standard “Directory Entry” representing the file. There is also a field called the “sequence number”, a.k.a. the ordinal field, which gives the order of multiple LFN entries (first LFN entry, second LFN entry, etc.). The long name directory entries are stored in reverse order (the last entry in the set is stored first, followed by entry n-1, followed by entry n-2, and so on, until entry 1). Therefore, the first entry that you will find in the directory is the last LFN entry for the file, and it has the largest sequence value. That entry also carries a special mark in its “sequence number” field: bit 6 (mask 0x40) is set to 1 to indicate that it is the last entry. Reading the LFN entries in sequence-number order, the long file name can be pieced together from the characters stored in the three name fields. The name is NULL-terminated (0x0000) if there is room, and any character slots left over after the terminator are padded with 0xFFFF.

From the table above, it can be observed that each LFN entry can hold 13 filename characters (5 + 6 + 2). More LFN entries are needed if a filename is longer than 13 characters. As discussed above, the entries are laid out so that the short name directory entry for the file is immediately preceded by the first LFN entry, which is in turn preceded by the second LFN entry, and so on. Each LFN entry is labelled with a sequence number, starting at 1 and increasing for each LFN entry until the final entry, whose sequence number is ORed with 0x40.
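Piecing a long name together from its LFN entries can be sketched as follows. The function names, the helper `make_lfn_entry`, and the second sample name are hypothetical; the helper exists only to build demonstration entries, and the checksum byte it writes is not validated here (0x00 is a placeholder in the two-entry example).

```python
from typing import List

def lfn_name_bytes(entry: bytes) -> bytes:
    """Concatenate the three name fields: 5 + 6 + 2 UTF-16 characters."""
    return entry[0x01:0x0B] + entry[0x0E:0x1A] + entry[0x1C:0x20]

def assemble_long_name(lfn_entries: List[bytes]) -> str:
    """Rebuild the long name from LFN entries given in any order."""
    # Mask off the 0x40 "last entry" flag and sort by sequence number.
    ordered = sorted(lfn_entries, key=lambda e: e[0] & 0x3F)
    raw = b"".join(lfn_name_bytes(e) for e in ordered)
    name = raw.decode("utf-16-le", errors="replace")
    # Drop the NUL terminator and any 0xFFFF padding that follows it.
    return name.split("\x00", 1)[0].rstrip("\uffff")

def make_lfn_entry(seq: int, chars: str, checksum: int) -> bytes:
    """Build a demonstration LFN entry holding up to 13 characters."""
    units = chars.encode("utf-16-le")
    if len(chars) < 13:
        units = (units + b"\x00\x00").ljust(26, b"\xff")
    entry = bytearray(32)
    entry[0x00] = seq
    entry[0x01:0x0B] = units[0:10]
    entry[0x0B] = 0x0F
    entry[0x0D] = checksum
    entry[0x0E:0x1A] = units[10:22]
    entry[0x1C:0x20] = units[22:26]
    return bytes(entry)

single = [make_lfn_entry(0x41, "primary-1.txt", 0x13)]
print(assemble_long_name(single))  # primary-1.txt

# On-disk order: the last entry (0x42) comes first, then entry 1.
double = [make_lfn_entry(0x42, "-name.txt", 0x00),
          make_lfn_entry(0x01, "a-rather-long", 0x00)]
print(assemble_long_name(double))  # a-rather-long-name.txt
```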

The file attributes of an LFN entry must be 0x0F. The checksum is calculated from the short name of the file, and it should be the same in each of the LFN entries for the file. If any of the checksums in the set of long entries does not agree with the computed checksum of the name contained in the short entry, then the long entries are treated as orphans. This can occur if a disk containing long and short entries is taken to a previous version of MS-DOS/Windows and only the short name of a file or directory with long entries is renamed. The checksum algorithm iterates over the 11 bytes of the short name as stored in the directory entry (no dot, space padded); at each step it rotates the current checksum one bit to the right and then adds the value of the next byte, keeping only the low 8 bits. The algorithm, written here as a Python function, is given below.

def lfn_checksum(short_name: bytes) -> int:
    """Checksum over the 11 short-name bytes as stored on disk."""
    chk = 0
    for byte in short_name[:11]:
        # Rotate chk right by one bit (within 8 bits), then add the next byte
        chk = (((chk & 0x01) << 7) | (chk >> 1)) + byte
        chk &= 0xFF  # keep only the low 8 bits
    return chk

The checksum is used by Windows to detect orphaned or corrupt LFN entries. Let us look at the LFN and normal entry for a file in the root directory of our test image:

We see that byte offset 0xB is set to 0x0f, so we know it is an LFN entry. The sequence number is in byte offset 0x0, and it is 0x41. As stated previously, the sequence byte of the last LFN entry in a set (which is stored first on disk) is OR’d with 0x40. In this case, a value of 0x41 means that this is entry number 1 and also the last entry, i.e., the whole name fits in a single LFN entry. The checksum in byte offset 0xD gives a value of 0x13. The first five characters in this entry are in byte offsets 0x1-0xA, and when decoded as UTF-16 they give ‘prima’. The second section of characters in bytes 0xE-0x19, when decoded, gives ‘ry-1.t’. The final section of characters in byte offsets 0x1C to 0x1F gives ‘xt’. Appending these characters together, we get "primary-1.txt". The short version of this name can be found in the third entry, which is a normal directory entry and has the name ‘PRIMAR~1.TXT’.

One consequence of this approach that a forensic analyst needs to be aware of is that SFNs of copied files, which are generated by Windows at the time of the copy, may be different in different directories. This is because the value of the last digit in the SFN depends on whether there is one or more files with similar starting LFNs in the directory to which the copy is being made. Although the LFN of the copied file will always be the same as the original, the SFN of the copied file might not be the same. This means that we could have LFNs of the same name and referring to copies of the same file residing in different directories, each having entirely different SFNs. We could also have SFNs of exactly the same names in different directories referring to entirely different files.

We can now verify the checksum, but first we need to know the ASCII values for the characters in binary. This is shown in the table below.

Character	Hex	Binary
P	50	0101 0000
R	52	0101 0010
I	49	0100 1001
M	4D	0100 1101
A	41	0100 0001
R	52	0101 0010
~	7E	0111 1110
1	31	0011 0001
T	54	0101 0100
X	58	0101 1000
T	54	0101 0100

The first step is to assign a variable ‘chk’ to the value of the first letter of the name, ‘P’.

chk = 01010000

For the remaining 10 rounds, we rotate the current checksum to the right by one bit and then add the next letter. Rotating and adding the letter R, we obtain:

chk = 00101000 + 01010010 = 0111 1010

Rotating again and adding the next letter, I, we obtain the following:

chk = 0011 1101 + 0100 1001 = 1000 0110

The following table shows the computation of the FAT Long File Name (LFN) checksum for the short filename 'PRIMAR~1TXT'. Each step displays the ASCII binary value, previous checksum, new checksum in binary, and its decimal equivalent.

i	Char	Byte (ASCII binary)	Prev chk (binary)	New chk (binary)	New chk (decimal)
0	P	01010000	00000000	01010000	80
1	R	01010010	01010000	01111010	122
2	I	01001001	01111010	10000110	134
3	M	01001101	10000110	10010000	144
4	A	01000001	10010000	10001001	137
5	R	01010010	10001001	00010110	22
6	~	01111110	00010110	10001001	137
7	1	00110001	10001001	11110101	245
8	T	01010100	11110101	01001110	78
9	X	01011000	01001110	01111111	127
10	T	01010100	01111111	00010011	19 (0x13)

The final value of 0x13 matches the checksum that we saw in the LFN entry.


Analyzing The FAT Directory Entry For Digital Evidence


Joseph Moronwi
