A forensic keyword search typically follows a process in which an analyst acquires a suspect hard drive and builds a list of keywords, also known as “dirty words”, to search for in the disk image.
However, finding a string keyword is a challenge when the hard disk image consists of binary data. Therefore, we first need to extract the printable data from the binary disk image.
To search a disk or partition image with a tool like “grep”, we first print the strings of printable characters in the image to a text file, and then run the search against that text file instead of the image file. The Sleuthkit provides a tool called srch_strings that prints the strings of printable characters in files. The investigator also needs to print the location of each string, so that the location (the byte offset) can be used later to find the data unit that contains any keyword of interest to the investigation. Afterwards, you can search the resulting text file for your keywords with the “grep” command. It is worth noting that simply running grep on the raw image would not have given us these hits in a usable form, which is why we search the text file produced by srch_strings.
The output file from the srch_strings command contains all the printable data along with its location in the disk image. We can then search that file for the keyword, for example with the grep command. If a match is found, the analyst performs further analysis by finding the metadata structure of the file that occupies the data unit where the keyword resides.
Other Sleuthkit tools that will be useful in our analysis are blkcat, ifind, ffind, and istat. With them, the analyst can:
- Retrieve the data unit that contains the dirty keyword(s) (using blkcat).
- Figure out which file the dirty keyword(s) reside in (using ifind, followed by ffind to get the file name).
- Display the details of the file's metadata structure (using istat).
Grep And Regular Expressions
The grep command, whose name comes from “globally search a regular expression and print”, is a Linux tool that searches input files (or standard input when no file is named) for lines that match a given pattern. The pattern is a regular expression, which is a method for specifying a set of strings. The basic usage of grep is to search the specified file(s) for a specific string, represented by a regular expression.
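grep understands two types of regular expression, basic (BRE) and extended (ERE). The extended type, enabled with grep -E, adds the metacharacters ?, +, and |.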
For example, the basic regular expression [a-z] matches any single lowercase character, while the extended regular expression /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?$/ matches an optional “http://” or “https://”, followed by a series of digits, letters, dots, or hyphens, then a single dot and a suffix of two to six letters or dots, then an optional path, and finally an optional trailing “/”.
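Both patterns can be tried directly with grep. The sketch below is a POSIX adaptation rather than the exact pattern quoted above: `\d` and `\w` are Perl-style shorthands that `grep -E` does not reliably understand, so they are swapped for bracket expressions, and the surrounding `/.../` delimiters are dropped. The `url_re` variable, the `is_url` helper, and the sample inputs are made up for this illustration.

```shell
# Basic regular expression: [a-z] matches one lowercase letter.
# LC_ALL=C keeps the a-z range limited to ASCII lowercase.
printf '%s\n' 'a' 'B' '7' | LC_ALL=C grep -c '[a-z]'   # prints 1

# Extended regular expression (grep -E), adapted from the URL pattern above.
url_re='^(https?://)?([0-9a-z.-]+)\.([a-z.]{2,6})([/[:alnum:]_.-]*)*/?$'

# is_url: hypothetical helper; succeeds when its argument matches url_re.
is_url() {
    printf '%s\n' "$1" | LC_ALL=C grep -E -q "$url_re"
}

for candidate in 'http://example.com/' \
                 'https://example.org/docs/index.html' \
                 'example.com' \
                 'ftp://example.com'; do
    if is_url "$candidate"; then
        echo "match:    $candidate"
    else
        echo "no match: $candidate"
    fi
done
```

The first three candidates should be reported as matches, because the scheme is optional, while the `ftp://` one should not.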
REGULAR EXPRESSION METACHARACTERS
Metacharacter | Description |
^ | Matches the following item at the beginning of a text line |
$ | Matches the preceding item at the end of a text line |
. | Matches any single character |
[...] | A bracket expression. Matches a single character in the bracketed list or range |
[^...] | Matches a single character that is not contained within the brackets |
() | Defines a marked sub-expression. A marked sub-expression is also called a block or capturing group. BRE mode requires \(...\) |
* | Matches the preceding item zero or more times |
{m} | Matches the preceding item exactly m times. BRE mode requires \{m\} |
{m,} | Matches the preceding item m or more times. BRE mode requires \{m,\} |
{m,n} | Matches the preceding item at least m and not more than n times. BRE mode requires \{m,n\} |
\ | Escapes the special meaning of the next character |
The next three metacharacters are available only in extended regular expressions:
Metacharacter | Description |
? | Matches the preceding character, metacharacter, or expression zero or one time |
+ | Matches the preceding character, metacharacter, or expression one or more times. There is no limit to the number of times it can be matched |
| | Matches the character, metacharacter, or expression on either side of it |
Note that to use the grep command to search for meta-characters, you have to use a backslash (\) to escape the meta-character. For example, the regular expression “^\.” matches lines that start with a period.
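A minimal sketch of the escaping rule, using a few made-up sample lines (the `lines` helper is hypothetical). It also shows `grep -F`, which treats the whole pattern as a fixed string and is an alternative to escaping when no regular-expression features are needed.

```shell
# Sample input for the demo (invented for illustration).
lines() {
    printf '%s\n' '.bashrc' 'notes.txt' 'a+b=c' 'aab=c'
}

# Unescaped "." matches any character, so every non-empty line matches.
lines | grep -c '^.'       # prints 4

# Escaped "\." matches only a literal period at the start of the line.
lines | grep -c '^\.'      # prints 1

# In extended mode "+" is a metacharacter: one or more "a" followed by "b".
lines | grep -E 'a+b'      # prints aab=c

# Escaping it makes grep look for a literal plus sign instead.
lines | grep -E 'a\+b'     # prints a+b=c

# -F disables metacharacters entirely, so no escaping is needed.
lines | grep -F 'a+b'      # prints a+b=c
```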
In the table below, I will show how to use grep with examples.
grep hackers files | search files for lines with “hackers” |
grep -E 'hackers?' files | search files for lines with “hackers” or “hacker” (the ? metacharacter needs extended mode, hence -E) |
grep '^hackers' files | “hackers” at the start of a line |
grep 'hackers$' files | “hackers” at the end of a line |
grep '^hackers$' files | lines containing only “hackers” |
grep '[Hh]ackers' files | search for “Hackers” or “hackers” |
grep '\^f' files | search files for lines with “^f”; the “\” escapes the ^ |
grep '^$' files | search for blank lines |
grep '[0-9][0-9][0-9]' files | search for triples of numeric digits |
grep -f hack.txt files | The -f option specifies a file from which grep reads patterns. In this example, the search patterns are contained in a file called hack.txt, one per line |
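The examples in the table can be checked against a small throwaway file. This is only a sketch: the sample lines, the `hits` helper, and the contents of the pattern file are invented for illustration, and `-c` is used so that each command prints a count of matching lines rather than the lines themselves.

```shell
sample=$(mktemp)      # stands in for "files" in the table
patterns=$(mktemp)    # stands in for hack.txt
trap 'rm -f "$sample" "$patterns"' EXIT

# Six sample lines; the fifth is intentionally blank.
printf '%s\n' 'hackers' 'Hackers unite' 'a hacker was here' \
              'the hackers' '' 'call 555 now' > "$sample"

# hits [grep options] PATTERN: count the matching lines in the sample file.
hits() {
    grep -c "$@" "$sample"
}

echo "hackers          : $(hits 'hackers')"            # 2
echo "hackers? (-E)    : $(hits -E 'hackers?')"        # 3
echo "^hackers         : $(hits '^hackers')"           # 1
echo "hackers\$         : $(hits 'hackers$')"           # 2
echo "^hackers\$        : $(hits '^hackers$')"          # 1
echo "[Hh]ackers       : $(hits '[Hh]ackers')"         # 3
echo "^\$ (blank lines) : $(hits '^$')"                 # 1
echo "three digits     : $(hits '[0-9][0-9][0-9]')"    # 1

# -f reads one pattern per line; -i makes the match case-insensitive.
printf '%s\n' 'hacker' '555' > "$patterns"
echo "-i -f patterns   : $(hits -i -f "$patterns")"    # 5
```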
Having introduced srch_strings, grep, and regular expressions, I will now show a demonstration of what has been discussed so far.
Environment Set Up
In this post, I will be using a Windows 10 machine as my forensic workstation. I will run a Kali Linux virtual machine as the guest OS on which I do my keyword search analysis, with Oracle VM VirtualBox as my type 2 hypervisor. My forensic image is an image of a FAT32-formatted USB drive.
To conduct my analysis via my Kali Linux VM, I will create a shared folder between my Windows host OS and my Kali guest OS via the steps shown in the image below.
After following the above steps carefully, I will boot up my Kali Linux VM and my shared folder will be revealed as shown below.
You are one step closer to sharing files between your host OS and guest OS at this point. To fully start sharing files, open the terminal in your Kali VM and run the commands below.
sudo mount -a
sudo usermod -aG vboxsf you   # where "you" is your non-root user
Reboot the Kali Linux VM and you are ready to start sharing files between your host and guest OS. Simply place your forensic image in the shared folder directory of your host OS and it will be seen in the shared folder directory in your guest OS as shown in the figure above.
Keyword Search Forensics
For ease of illustration, we assume that the keyword “secret” identifies the sensitive data we are interested in.
As discussed earlier, we first need to extract the printable data from the binary disk image using TSK’s srch_strings command.
srch_strings -t d fatimage.001 > fat-kw.ascii.str
where the “-t d” option prints the location of each discovered string as a decimal byte offset from the beginning of the partition (the FAT file system in this example).
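To get a feel for what an “offset plus string” listing looks like without a real image, the idea can be approximated with GNU grep alone. This is a rough stand-in and not a replacement for srch_strings: it assumes GNU grep (for the `-a`, `-b`, and `-o` options), and the `printable_strings` helper and the tiny fake “image” are invented for illustration.

```shell
img=$(mktemp)
trap 'rm -f "$img"' EXIT

# Tiny stand-in for a disk image: text fragments separated by
# non-printable bytes (NUL, 0x01, 0xFF).
printf 'AB\000\001secret.docx\000\377hello world\n\000xyz' > "$img"

# printable_strings FILE [MINLEN]
#   -a  treat binary data as text
#   -b  print the byte offset; with -o this is the offset of the match
#   -o  print only the matching run of printable characters
# sed turns grep's "offset:string" into "offset string".
printable_strings() {
    LC_ALL=C grep -a -b -o -E "[[:print:]]{${2:-4},}" "$1" | sed 's/:/ /'
}

printable_strings "$img"
```

Runs shorter than four characters (“AB” and “xyz” here) are skipped, which mirrors the usual default minimum length of strings-style tools.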
Now we can use grep to search for the keywords we are interested in. In my example, I search for the word “secret” using the following command. Note that the search should be case-insensitive here, hence the -i option.
grep -i secret fat-kw.ascii.str
It can be observed that the word “secret” appears in strings located at different byte offsets. Our target file (secret.docx), however, appears in strings located at byte offsets 17477168, 49593271, and 49602096. A hard disk uses sector addresses to locate an area on disk, whereas a file system uses cluster or block numbers to identify a data unit on disk. Thus, we need to convert the byte offset to a sector address and then to a cluster or block address. To convert a byte offset to a sector address within the partition, divide the offset by the sector size (512 bytes here) and take the floor of the quotient, that is, the quotient rounded down to an integer.
sector address = floor(17477168/512) = 34135
sector address = floor(49593271/512) = 96861
sector address = floor(49602096/512) = 96879
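The same conversion can be scripted so that it runs over every hit at once. The sketch below assumes 512-byte sectors as in the text. The helper names, the string contents of the sample listing, and the FAT layout values passed to `sector_to_cluster` (data area starting at sector 8192, 8 sectors per cluster) are hypothetical; only the three byte offsets come from the example above. On a real image the layout values would have to be read from the file system itself.

```shell
SECTOR_SIZE=512   # bytes per sector, as assumed in the text

# byte_to_sector OFFSET: floor(OFFSET / SECTOR_SIZE) via integer division.
byte_to_sector() {
    echo $(( $1 / SECTOR_SIZE ))
}

# keyword_sectors KEYWORD LISTING: for every case-insensitive hit in an
# "offset string" listing, print the byte offset and the sector it falls in.
keyword_sectors() {
    grep -i -- "$1" "$2" | while read -r offset _rest; do
        echo "$offset $(byte_to_sector "$offset")"
    done
}

# sector_to_cluster SECTOR FIRST_DATA_SECTOR SECTORS_PER_CLUSTER
# FAT numbers its clusters from 2, starting at the first data sector.
sector_to_cluster() {
    echo $(( ($1 - $2) / $3 + 2 ))
}

# Stand-in for fat-kw.ascii.str: offsets from the text, strings invented.
listing=$(mktemp)
trap 'rm -f "$listing"' EXIT
printf '%s\n' '17477168 secret.docx' '49593271 SECRET.DOCX' \
              '49602096 secret.docx' '50000000 unrelated' > "$listing"

keyword_sectors secret "$listing"

# Hypothetical layout values, for illustration only.
sector_to_cluster 96879 8192 8
```

The Sleuthkit commands used in this post take the sector address directly for a FAT image, so the cluster step is shown only to complete the chain from byte offset to data unit.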
Now we know the word “secret” resides in the sectors with addresses 34135, 96861, and 96879. From here, we conduct a more in-depth investigation. First, we can view the contents of a data unit (a sector here) using the blkcat command. Using sector address 96879, the blkcat command and its output are as follows.
blkcat -h fatimage.001 96879
Next, the ifind command tells us which metadata structure has allocated this sector. It reports the metadata address, which is the 12 used in the commands that follow.
ifind -f fat -d 96879 fatimage.001
Finding the name of the file using the ffind command reveals the following output.
ffind fatimage.001 12
It can be observed that a file called “instagram.jpg” in the root directory contains the word “secret”. How could an MS Word document (with a .docx extension) be contained in an image (with a .jpg extension)? This could only mean one thing: data hiding.
Displaying the details of the metadata using the istat command reveals the following.
istat -f fat fatimage.001 12
The above output gives more information about the suspect file. The suspect file was hidden inside the image file instagram.jpg and placed in the root directory (as revealed by the ffind command). The file was subsequently deleted by the suspect (as revealed by the istat command). If this were the only instance of the suspect file found on the disk image, the investigator would need to examine the unallocated space. File carving and/or slack space analysis would be the necessary further steps.
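The follow-up steps for a single keyword hit can be collected in a small dry-run helper. This is a sketch only: `tsk_followup` is a hypothetical name, it prints the commands rather than running them, and `INODE` is a placeholder for the metadata address that ifind reports (12 in this walkthrough).

```shell
# tsk_followup IMAGE BYTE_OFFSET [SECTOR_SIZE]
# Prints (does not run) the Sleuthkit commands for one keyword hit.
tsk_followup() {
    image=$1
    sector=$(( $2 / ${3:-512} ))   # floor of offset / sector size
    echo "blkcat -h $image $sector"
    echo "ifind -f fat -d $sector $image"
    echo "ffind $image INODE"
    echo "istat -f fat $image INODE"
}

tsk_followup fatimage.001 49602096
```

Printing the commands first makes it easy to review them, and the sector they point at, before anything is run against the evidence image.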
The quality of the results of a keyword analysis depends on the quality of the keywords. It is advisable to avoid keywords such as a user name or the name of the computer, because these generate thousands of hits both in documents and in the system registry, where the system continuously adds and deletes items. A poor choice of keywords results in a large number of hits in files and in unallocated space. It is much better to use whole sentences as search patterns, but for that you must know the content of the document (e.g. from its printed version).