File carving is the process of reconstructing files based solely on their contents. In other words, it is the technique of recovering files from a block of binary data without using any information from the file system structure. Many tools implement this technique; they are called file carvers. Research in digital forensics on recovering data files independently of file metadata began as early as 2002. Using only the content found inside the data blocks themselves, researchers have attempted to reconstruct files in whole or in part using the techniques discussed in this post.
Principles Of File Carving
File carvers are able to recover files using only the information available in the data blocks stored on the disk. Using information about their unique headers, footers (or the file size), and even internal data structure, file carvers reassemble contiguous data blocks beginning with the file header and ending at the file footer, if present. This type of file carving technique is also known as Header/Footer carving.
The figure below shows a hex dump of the first 512 bytes of a BMP file, as displayed by a hex editor. The left side is the offset address, in hexadecimal format, which is used to locate individual bytes (starting at byte offset 0). The middle is the hex dump data, and the right is the ASCII interpretation of the dump data. Each byte is shown as a two-digit hexadecimal number. As can be seen in the figure, the first two bytes are “0x424D” (ASCII “BM”), which is a special marker indicating the beginning (the header) of a BMP file. The 4 bytes that follow, starting at byte offset 2, hold the file size, which is “46 07 00 00”. However, since this field is stored in little-endian order, the real value of this 4-byte number is “0x00000746”. It means the file size is 1862 bytes.
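The header check described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from any carving tool: the function name `parse_bmp_header` is made up for this example, and the sample bytes are just the marker and size field quoted above. It reads the two-byte marker and interprets the four bytes that follow it as a little-endian integer.

```python
import struct

def parse_bmp_header(data: bytes):
    """Return the file size declared in a BMP header, or None if the
    buffer does not start with the BMP marker."""
    if len(data) < 6 or data[0:2] != b"BM":   # 0x42 0x4D
        return None
    # "<I" means little-endian unsigned 32-bit integer; the size field
    # sits right after the two-byte marker.
    (file_size,) = struct.unpack_from("<I", data, 2)
    return file_size

# Marker "BM" followed by the size bytes 46 07 00 00 from the hex dump.
sample = bytes.fromhex("424D46070000")
print(parse_bmp_header(sample))  # 1862
```

A carver can use a size field like this in place of a footer: once the header is found, the declared size tells it how many bytes to extract.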
Header/Footer Carving
Header/footer carving, also known as file structure-based carving, is based on the fact that most known files have distinctive header/footer structures which can be used to identify the start and end of a file.
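As a rough sketch of the idea, the Python function below scans a byte buffer for a header signature and then for the next footer signature. It is an illustration only, not how any particular tool is implemented: the function name, the `max_size` cut-off and the use of JPEG markers (FF D8 FF and FF D9) as the example signatures are assumptions made for this example.

```python
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"

def carve_header_footer(data: bytes, header: bytes, footer: bytes,
                        max_size: int = 10 * 1024 * 1024):
    """Yield (start, end) offsets of candidate files in data.

    A candidate begins at a header signature and ends just after the
    first footer signature found within max_size bytes of the header.
    """
    pos = 0
    while True:
        start = data.find(header, pos)
        if start == -1:
            break
        end = data.find(footer, start + len(header), start + max_size)
        if end == -1:
            pos = start + 1          # no footer in range, try the next header
            continue
        end += len(footer)           # include the footer bytes
        yield start, end
        pos = end                    # resume after the carved candidate

# Usage: each (start, end) pair can be written out as data[start:end].
```

Note that nothing here checks that the bytes between header and footer belong to the same file, which is exactly why this approach only works for contiguous files.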
The growing demand from forensic investigators and corporations to collect digital evidence from devices with damaged or missing file system structures led to a wave of file carving tools that could carve contiguous files automatically. For example, Foremost starts from a configuration file which contains the header/footer information of the file formats of interest. It then tries to match each of the headers with a corresponding footer. Unfortunately, this software repeatedly searches through data that has already been matched, which makes it slow. Richard and Roussev proposed to fix this performance issue by creating a high-performance carver that is independent of the file system. The improved file carver, Scalpel, first indexes all headers and footers, then looks for potential matches within that index, which is stored in memory; this is much faster than repeatedly searching the hard drive. Additionally, the software uses fewer memory-to-memory copy operations and writes its output faster.
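The index-first idea can be illustrated with a small Python sketch. This is not Scalpel's code; the function names and the `max_size` parameter are assumptions made for this example. The data is scanned once per signature to build sorted lists of offsets, and the pairing then works purely on those in-memory lists.

```python
import re
from bisect import bisect_right

def index_offsets(data: bytes, pattern: bytes):
    """Pass 1: record every offset at which pattern occurs."""
    return [m.start() for m in re.finditer(re.escape(pattern), data)]

def match_from_index(headers, footers, header_len, footer_len, max_size):
    """Pass 2: pair each header with the first footer located after it,
    using only the in-memory offset lists (no further reads of the data)."""
    matches = []
    for h in headers:
        # Index of the first footer that starts after the header bytes.
        i = bisect_right(footers, h + header_len - 1)
        if i < len(footers) and footers[i] + footer_len - h <= max_size:
            matches.append((h, footers[i] + footer_len))
    return matches

# Usage with JPEG markers as example signatures:
# headers = index_offsets(image, b"\xff\xd8\xff")
# footers = index_offsets(image, b"\xff\xd9")
# spans = match_from_index(headers, footers, 3, 2, 10 * 1024 * 1024)
```

Because the footer list is sorted, each lookup is a binary search rather than a new scan of the image.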
These two tools, however, make no effort to validate the recovered data, so false positive header/footer matches result in corrupt file recoveries. Especially on a highly fragmented disk, the recovered files may contain gibberish at the end, be incomplete, or not open in the target program, so the investigator has to remove the incorrect data blocks manually.
Due to the poor performance of header/footer carvers and their lower success rate in less than ideal conditions, utilities were developed for searching the dataset for specific file signatures. Investigators looking for specific files could load a source file into the carver and then search the dataset for binary patterns that are similar or identical to the source file. Expert users could create custom hexadecimal signatures (e.g. for file types unknown to the carver) to search for, and manually stitch data blocks together.
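Searching for a custom hexadecimal signature amounts to a byte-pattern search. The sketch below is a minimal Python illustration; the function name `find_signature` is hypothetical, and the ZIP local file header bytes 50 4B 03 04 are used only as a familiar example pattern.

```python
def find_signature(data: bytes, hex_signature: str):
    """Return every offset in data where the signature occurs.

    hex_signature is written the way it appears in a hex editor,
    for example "50 4B 03 04"; spaces between bytes are allowed.
    """
    needle = bytes.fromhex(hex_signature)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)   # keep looking after this hit
    return offsets

sample = b"\x00PK\x03\x04abcPK\x03\x04"
print(find_signature(sample, "50 4B 03 04"))  # [1, 8]
```

The returned offsets are the starting points from which an examiner would inspect, or manually stitch together, the following data blocks.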
Many of these header/footer carvers are still in use today and have enjoyed wide commercial success. The most popular among them is EnCase by Guidance Software, which is widely used by law enforcement agencies as well as corporations for a wide variety of forensic investigations. Other popular file carvers in this category include FTK, TestDisk, and PhotoRec.
Bifragment Gap Carving (BGC)
Bifragment gap carving is a data recovery algorithm for a very specific use case. It can recover fragmented files within the following constraints:
- The file must be stored in exactly two fragments, separated by a gap which can contain arbitrary data.
- The two fragments must be in the correct order (that is, the gap size must be positive).
- The gap size must be below a certain configurable limit. This is not really a theoretical requirement, because one can set the limit to be larger than the media size, effectively allowing any gap size. However, because computational requirements increase as the range of possible gap sizes increases, the practical limit is low (think up to 1000 clusters as of 2018). More fundamentally, increasing the gap size increases the probability of a false positive in which fragments of two different files are merged.
Let’s see the diagram. We have a file header and optionally a footer and the file size computed from the header. Distance between header and footer is D, and file size from the header is S.
If there is no footer, the distance D is assumed to be a constant large enough to accommodate the largest possible gap and the largest possible file (assume D = Lmax + Smax). Now, if S is known, the gap size L can be computed simply as L = D-S. Then, we need to do approximately N ~ (D-L)/B validation tests, trying each possible position of the gap, and see which one produces a valid file. B here is the block size, typically 512 bytes.
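For the case where S is known, the search can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a production carver: the function name is made up, the region is assumed to start on a block boundary, the gap is assumed to be a whole number of blocks, and the validator is passed in by the caller (the CRC comparison in the demo merely stands in for a real format validator such as an image decoder).

```python
import zlib
from typing import Callable, List

def bifragment_gap_carve(region: bytes, file_size: int, block_size: int,
                         is_valid: Callable[[bytes], bool]) -> List[bytes]:
    """Try every block-aligned gap position and return the candidates
    that pass validation.

    region runs from the first header byte to the last footer byte, so
    D = len(region), S = file_size and the gap size is L = D - S.
    """
    gap = len(region) - file_size
    if gap <= 0 or gap % block_size:
        return []                      # not a bifragment case we can handle
    candidates = []
    # The gap cannot start in the first block (it holds the header) and
    # must leave at least part of the last block (it holds the footer).
    for start in range(block_size, file_size, block_size):
        candidate = region[:start] + region[start + gap:]
        if is_valid(candidate):
            candidates.append(candidate)
    return candidates

# Toy demo with 4-byte blocks and an 8-byte gap after the second block.
original = b"HEADaaaabbbbFOOT"
region = original[:8] + b"XXXXYYYY" + original[8:]
known_crc = zlib.crc32(original)
found = bifragment_gap_carve(region, len(original), 4,
                             lambda c: zlib.crc32(c) == known_crc)
print(found == [original])  # True
```

The loop runs about S/B times, which matches the estimate N ~ (D-L)/B, since D-L is the file size S.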
If the validation function is not reliable enough, there may be several positive validations, in which case several probable files are produced and the user has to work out which one is actually needed. The first and last positions for the gap are not tested, because these outermost positions are known to be occupied by the header and the footer, which are obviously parts of the file.
Then, if S is not known, the gap size cannot be determined, and the test has to be repeated for each possible gap size up to Lmax = D-Smin, where Smin is some minimum acceptable file size. Smin is typically guessed by looking at known-good files produced by the same camera at the same settings. This makes for approximately N ~ ((D-Lmax)/B)*(Lmax/B) tests.
Not being able to identify the footer does not change much, except that even more tests must be performed after assuming some maximum size Dmax.
This method has a number of limitations. Firstly, it is necessary to carry out a preliminary analysis of the entire disk (signature search, statistical analysis). Such an analysis, although time-consuming, is necessary to identify the locations that may contain the files in question. The method also does not scale well with increasing distance between the fragments: by the estimate above, the number of validation tests grows roughly quadratically with the number of blocks when the file size is unknown. Current algorithms can recover a file that is split into two parts. A greater degree of fragmentation not only increases the computational complexity, but can also prevent the recovery of a file if any of the fragments is missing. Erroneous classification of file fragments is yet another serious problem: the recovered files, although formally valid (i.e. they open in the associated programs), contain invalid data, as in the example below.
Figure: an incorrectly carved JPG file
File Carving Tools
There are different carving tools available, many of them open source. Some are listed below:
- Foremost
- Scalpel
- TestDisk and PhotoRec
In this post, I will illustrate file carving using TestDisk and PhotoRec.
TestDisk and PhotoRec
TestDisk is an open source, multi-platform, data recovery tool developed by the cgsecurity team. It is capable of running on multiple operating systems such as:
- DOS
- Windows XP, 7, and 10
- Linux
- FreeBSD, NetBSD, OpenBSD
- SunOS
- Mac OS
Unlike file carvers specializing in the recovery of deleted or lost files, TestDisk has many useful features when it comes to data recovery. They are as follows:
- Fix Partition table
- Recover deleted partition
- Recover FAT32 boot sector from backup
- Fix FAT table
- Rebuild FAT12/16/32 boot sector
- Rebuild NTFS boot sector
- Recover NTFS boot sector from its backup
- Fix MFT using MFT mirror
- Locate EXT2/3/4 backup superblock
- Undelete files from FAT, exFAT, NTFS and ext2 file systems
- Copy files from deleted FAT, exFAT, NTFS and EXT2/3/4 partitions
cgsecurity offers another open source application that focuses only on file recovery. The application is called PhotoRec, and it essentially ignores the file system and goes directly for the data. PhotoRec stands for Photo Recovery, and it was originally designed to recover lost pictures or lost files from digital camera memory, so it is the better choice when recovering deleted or lost photos and pictures. PhotoRec is a companion program to TestDisk: it handles file recovery only, while TestDisk offers many other data recovery functions, such as recovering deleted or lost partitions.
File Carving With PhotoRec
PhotoRec is a file carving tool that is widely used by digital forensic examiners. This tool is even built into Autopsy, as a module. PhotoRec can recover a diverse range of file types (more than 480 file formats), but if you think this will not be enough, you can add your own custom signatures, which will help the tool to recover even more data.
If the file system is not corrupted, PhotoRec can get the block/cluster size from the volume boot record or the superblock. Otherwise, it reads the data source sector by sector, looking for the first ten files, and uses their locations to calculate the block/cluster size. Once the block size is known, PhotoRec reads the data source block by block and checks each block against a signature database.
Go to CGSecurity's website and download the latest version of PhotoRec. Unpack it and you are ready to go. It is important to note that PhotoRec supports disk images: not only RAW but also E01 formats.
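The idea of deriving a block size from where the first files start can be illustrated with a short Python sketch. This is not PhotoRec's actual algorithm; it is one plausible approach, assumed for illustration, in which the block size is taken as the largest power of two that divides all distances between the detected file starts. The function name and the sample offsets are hypothetical.

```python
from functools import reduce
from math import gcd

def estimate_block_size(file_offsets, sector_size: int = 512):
    """Guess (block_size, offset_of_first_block) from file start offsets.

    Assumption for this sketch: every file starts on a block boundary,
    so the distances between file starts are multiples of the block size.
    """
    first = file_offsets[0]
    deltas = [off - first for off in file_offsets[1:]]
    common = reduce(gcd, deltas, 0)
    if common == 0:
        return sector_size, first % sector_size
    block = common & -common           # largest power of two dividing common
    block = max(block, sector_size)    # never go below one sector
    return block, first % block

# Hypothetical byte offsets at which file headers were found.
print(estimate_block_size([36864, 40960, 53248]))  # (4096, 0)
```

With only a few samples the result is an upper bound rather than a certainty, which is one reason the estimate is based on several files and not on one or two.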
Open the Windows command prompt with administrator privileges, change to the directory where you unpacked TestDisk, and enter the following command (make sure you type the path to the image you acquired).
photorec_win D:\DRIVE_IMAGE\DRIVE_IMAGE.001
The first dialog box that you see is 'Select a media'. In my case, I am dealing with a RAW image, so I have only one option, and all I need to do is press Enter to proceed.
Now we have the Partition selection dialog box. In our case, we have only one partition of the type NTFS.
Four options are displayed at the bottom of the screen:
- Search - To start recovery of deleted files
- Options - To modify recovery options
- File Opt - To modify file type to be recovered
- Quit - To cancel recovery of files
Navigating to Options, the following is displayed:
- Paranoid - if enabled, verifies recovered files, and invalid files are rejected. Another option here, bruteforce, if enabled, tries to recover fragmented JPG files.
- Keep corrupted files - if enabled, keeps invalid files. Use it if you want to try to repair them with other tools.
- Expert mode - if enabled, allows an examiner to force the block size and the offset.
- Low memory - use it if the recovery crashes because your workstation doesn't have enough memory.
Navigating to File Opt, we have a long list of file types supported by the tool. Use the s button to check all file types or to disable all. Use spacebar if you want to enable or disable some of the types. To save changes use b.
Having gone through the available options, I will start the file recovery process.
Choose Search and press Enter. It's time to choose the file system type. There are only two options, and since we know that there are no EXT partitions on our image, we choose Other.
Next, we need to determine if we are recovering files from the whole partition or from the unallocated space only. Choose Free and press Enter.
Now we need to choose the destination path for the files being recovered. It is recommended that you create one before starting the file carving process. In my case, the destination folder is D:\Carved, as you can see in the following figure.
Use the C button to start the file carving process. Once the process is finished, you will have one or more folders (recup_dir.1, recup_dir.2...) with recovered files. It's important to note that these folders can be accessed before the recovery is finished.
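Once carving has finished, a quick way to review the result is to count the recovered files by extension. The Python sketch below is a small helper written for this post, not part of PhotoRec; the function name is made up, and it only relies on the recup_dir.N folder naming mentioned above.

```python
from collections import Counter
from pathlib import Path

def summarize_recovered(dest: str) -> Counter:
    """Count recovered files per extension across all recup_dir.* folders
    found directly under the destination folder."""
    counts = Counter()
    for folder in Path(dest).glob("recup_dir.*"):
        if not folder.is_dir():
            continue
        for item in folder.iterdir():
            if item.is_file():
                # Normalise case so .JPG and .jpg are counted together.
                counts[item.suffix.lower() or "(no extension)"] += 1
    return counts

# Usage, with the destination folder used in this post:
# for ext, n in summarize_recovered(r"D:\Carved").most_common():
#     print(ext, n)
```

A summary like this makes it easier to decide which file types to examine first.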
File Carving With TestDisk
Launch TestDisk by loading the USB drive image acquired above using the following command.
testdisk_win D:\DRIVE_IMAGE\DRIVE_IMAGE.001
No partitions are showing. Select the “Quick Search” option and press ENTER to continue.
- Confirm whether TestDisk should search for partitions created by Vista or later. Press Y if yes or not sure, N otherwise.
- After the scan is over, all the partitions found by “Quick Search” are displayed. Any detected partition that is not corrupted is shown in green. It can be observed that TestDisk has detected the deleted partition. Next, we restore the deleted partition: press ENTER to continue.
- Select the “Write” option, and Press ENTER to continue.
- Confirm writing the partition table to the disk image by pressing Y.