Keyword Forensics

Forensic keyword searching typically follows a process in which an analyst acquires a suspect hard drive and builds a list of keywords, also known as “dirty words”, to search for in the disk image.

However, it is challenging to find a string keyword when the disk image consists of binary data. Therefore, we first need to extract the printable data from the binary disk image.

For a disk or partition image to be searched with a tool such as “grep”, we first print the strings of printable characters in the image into a text file and then search that text file instead of the image. The Sleuth Kit (TSK) provides a tool called srch_strings that prints the strings of printable characters in files. The investigator also needs to print the location (byte offset) of each string, so that it can be used later to locate the data unit containing any keyword of interest to the investigation. The resulting text file can then be searched for the defined keywords with the “grep” command. It is worth noting that simply running grep on the image itself would not have produced any of these hits, which is why we search the text file produced by srch_strings.

The output .asc file from the srch_strings command contains all the printable data along with its location in the disk image. We can then search for the keyword within the .asc file, for example with the grep command. If a match is found, the analyst performs further analysis by identifying the meta-data structure of the file that occupies the disk unit where the keyword resides.

Other noteworthy tools from The Sleuth Kit that will be useful in our analysis include:

  • blkcat - used to display the contents of the data unit containing keywords.
  • ifind - used to find the metadata structure that allocates or points to a given data unit.
  • istat - used to display details of a given meta-data structure.

Henceforth, the analyst can examine a hit by:

  • Retrieving the data unit that contains the dirty keywords (using blkcat).
  • Finding which file the dirty keyword(s) reside in (using ifind).
  • Displaying the details of the file's meta-data structure (using istat).

    Grep And Regular Expressions

    The Globally search a Regular Expression and Print (grep) command is a Linux tool that searches input files (or standard input when no file is named) for lines matching a given pattern. The pattern is a regular expression, which is a method for specifying a set of strings. The basic usage of grep is to search the specified file(s) for a specific string, represented by a regular expression.

    Regular expressions (RegEx) provide a basic and an extended standard syntax for creating patterns that look up a set of strings in a list of elements or verify that a given string follows a particular arrangement (for example an IP address, email address, or phone number). Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) are largely the same. However, BRE requires the meta-characters ( ) and { } to be written as \( \) and \{ \}, whereas ERE does not. ERE also introduces more meta-characters, including ?, +, and |.
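As a minimal sketch of the brace difference, the two helper functions below run the same interval match once in BRE mode and once in ERE mode. The function names are invented for this illustration, and the sketch assumes a grep that accepts the -E option.

```shell
# Count input lines that consist of exactly two digits, using BRE syntax:
# the interval braces must be escaped as \{ \}.
count_two_digits_bre() {
    grep -c '^[0-9]\{2\}$'
}

# Same match in ERE syntax (-E): the braces need no backslashes.
count_two_digits_ere() {
    grep -cE '^[0-9]{2}$'
}
```

Both functions read standard input, so `printf '12\n345\n67\n' | count_two_digits_bre` and the ERE variant should report the same count.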

    For example, the basic regular expression [a-z] matches any single lowercase letter, while the extended-style expression /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?$/ matches an optional “http://” or “https://”, followed by a series of digits, letters, dots, and hyphens, then a single dot and a suffix of two to six letters or dots, optionally followed by a path and a trailing “/”. Note that the enclosing slashes and the shorthand \d are Perl-style notation and not part of POSIX ERE; with grep -E, write [0-9] instead of \d.
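The URL pattern above uses Perl-style shorthands (\d, \w) that POSIX ERE does not define. A hedged sketch of the same idea in POSIX ERE follows. The variable name url_re and the function is_url_like are made up for this example, and the pattern is only as permissive as the original, not a complete URL validator.

```shell
# POSIX ERE rendering of the URL pattern: \d becomes 0-9 and \w becomes
# [:alnum:] plus underscore. The Perl-style enclosing slashes are dropped.
url_re='^(https?://)?([0-9a-z.-]+)\.([a-z.]{2,6})([/[:alnum:]_.-]*)*/?$'

# Succeeds (exit status 0) when the argument matches url_re.
is_url_like() {
    printf '%s\n' "$1" | grep -Eq "$url_re"
}
```

For instance, `is_url_like 'https://example.com/' && echo match` tests a single candidate string.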





    Meta-character   Description

    ^                Matches the following item at the beginning of a text line
    $                Matches the preceding item at the end of a text line
    .                Matches any single character
    [ ]              A bracket expression. Matches a single character in the bracketed list or range
    [^ ]             Matches a single character that is not contained within the brackets
    ( )              Defines a marked sub-expression, also called a block or capturing group. BRE mode requires \( \)
    *                Matches the preceding item zero or more times
    {m}              Matches the preceding item exactly m times. BRE mode requires \{m\}
    {m,}             Matches the preceding item m or more times. BRE mode requires \{m,\}
    {m,n}            Matches the preceding item at least m and not more than n times. BRE mode requires \{m,n\}
    \                Escapes the special meaning of the next character

    The next three meta-characters are only for extended regular expressions:

    ?                Matches the preceding character, meta-character, or expression zero or one time
    +                Matches the preceding character, meta-character, or expression one or more times, with no upper limit
    |                Matches the character, meta-character, or expression on either side of it

    Note that to use the grep command to search for meta-characters, you have to use a backslash (\) to escape the meta-character. For example, the regular expression “^\.” matches lines that start with a period.

    In the table below, I will show how to use grep with examples.

    Command                          Description

    grep hackers files               Search files for lines containing “hackers”
    grep -E 'hackers?' files         Search files for lines containing “hackers” or “hacker” (? is an ERE meta-character, hence -E)
    grep '^hackers' files            “hackers” at the start of a line
    grep 'hackers$' files            “hackers” at the end of a line
    grep '^hackers$' files           Lines containing only “hackers”
    grep '[Hh]ackers' files          Search for “Hackers” or “hackers”
    grep '\^h' files                 Search files for lines containing “^h”; “\” escapes the ^
    grep '^$' files                  Search for blank lines
    grep '[0-9][0-9][0-9]' files     Search for triples of numeric digits
    grep -f hack.txt files           The -f option names a file from which grep reads patterns, one per line; here the patterns are in hack.txt
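The -f option fits the “dirty word” list naturally: keep one keyword per line in a file and search everything in a single pass. The wrapper below is a small sketch, with a hypothetical function name and caller-supplied file names. It adds -i so the match ignores case.

```shell
# Print every line of the target file that matches any pattern listed
# (one per line) in the pattern file, ignoring case.
#   $1 = pattern file, $2 = file to search
search_dirty_words() {
    patterns_file=$1
    target_file=$2
    grep -i -f "$patterns_file" "$target_file"
}
```

A call such as `search_dirty_words hack.txt files` would behave like the last table entry with case-insensitive matching added.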

    Having introduced srch_strings, grep, and regular expressions, I will now show a demonstration of what has been discussed so far.


    Environment Set Up

    In this post, I use a Windows 10 machine as my forensic workstation. The keyword search analysis runs in a Kali Linux virtual machine (the guest OS), with Oracle VM VirtualBox as the type 2 hypervisor. The forensic image is taken from a FAT32-formatted USB drive.

    To conduct my analysis via my Kali Linux VM, I will create a shared folder between my Windows host OS and my Kali guest OS via the steps shown in the image below.


    After following the above steps carefully, I will boot up my Kali Linux VM and my shared folder will be revealed as shown below.

    You are one step closer to sharing files between your host OS and guest OS at this point. To fully start sharing files, open the terminal in your Kali VM and type the commands below.


    sudo mount -a
    sudo usermod -aG vboxsf you   # replace "you" with your non-root user name

    Reboot the Kali Linux VM and you are ready to start sharing files between your host and guest OS. Simply place your forensic image in the shared folder directory of your host OS and it will be seen in the shared folder directory in your guest OS as shown in the figure above.

    Keyword Search Forensics

    Assume that law enforcement authorities have confiscated a suspect's hard disk (or USB drive) and you are asked to analyze it using the bitstream image provided. In my example case, I have a FAT32-formatted USB drive containing a secret MS Word document named secret.docx. In an attempt to evade detection, the criminal hid this document inside a JPG image file (not yet identified among thousands of JPG files) using the copy command in Windows, so that the image, and not the Word document, is visible to investigators. Your mission is to find this secret Word document and view its content.

    For ease of illustration, we assume that the keyword “secret” is the sensitive data we are interested in.

    As discussed earlier, we first need to extract the printable data from the binary disk image using TSK’s srch_strings command.

    srch_strings -t d fatimage.001 > fat-kw.ascii.str

    where the “-t d” option prints the location of each discovered string as a decimal byte offset from the beginning of the partition (the FAT file system in this example).

    Now we can use grep to search for the keywords we are interested in. In my example, I will search for the particular word “secret” using the following command. Note that the search should be case insensitive here.

    grep -i secret fat-kw.ascii.str

    where the “-i” specifies that the matching will be case insensitive.
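Since only the offsets are needed for the next step, the grep output can be trimmed to the first column. The helper below is a rough sketch with an invented name. It assumes each input line has the layout that “-t d” produces, namely a decimal offset, whitespace, and then the string.

```shell
# Read "offset string" lines on stdin and print only the decimal offset
# of each line that contains the keyword, compared case-insensitively.
#   $1 = keyword
keyword_offsets() {
    keyword=$1
    grep -i -- "$keyword" | awk '{ print $1 }'
}
```

For example, `keyword_offsets secret < fat-kw.ascii.str` would list one offset per hit.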

    It can be observed that the word “secret” appears in strings located at several byte offsets. Our target file (secret.docx), however, appears in strings located at byte offsets 17477168, 49593271, and 49602096. A hard disk uses sector addresses to locate an area on disk, whereas a file system uses cluster or block numbers to identify a data unit. Thus, we need to convert each byte offset to a sector address and then to a cluster or block address. To convert a byte offset to a sector address within a partition, divide the offset by the sector size (512 bytes here) and take the floor (the quotient rounded down to an integer).

    sector address = floor(17477168/512) = 34135
    sector address = floor(49593271/512) = 96861
    sector address = floor(49602096/512) = 96879

    where floor() is the floor function, which outputs the largest integer less than or equal to its input.
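The same conversion can be scripted. The sketch below uses a hypothetical helper name and defaults to the 512-byte sector size used in the text. Shell integer division truncates, which equals the floor for non-negative offsets.

```shell
# Convert a decimal byte offset to a sector address.
#   $1 = byte offset, $2 = sector size in bytes (default 512)
byte_to_sector() {
    offset=$1
    sector_size=${2:-512}
    echo $(( offset / sector_size ))
}
```

Calling `byte_to_sector 17477168` reproduces the first calculation above.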

    Now we know that the word “secret” resides in the sectors at addresses 34135, 96861, and 96879, and we can investigate further. First, we can view the contents of a data unit (a sector here) using the blkcat command. For sector address 96879, the blkcat command and output are as follows:

    blkcat -h fatimage.001 96879
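As a cross-check that does not depend on The Sleuth Kit, the same sector can be read with the standard dd utility. The sketch below assumes a 512-byte sector and an image file that starts at the beginning of the file system. read_sector is a hypothetical helper name.

```shell
# Write one sector of an image file to stdout.
#   $1 = image file, $2 = sector address, $3 = sector size (default 512)
read_sector() {
    image=$1
    sector=$2
    sector_size=${3:-512}
    dd if="$image" bs="$sector_size" skip="$sector" count=1 2>/dev/null
}
```

Piping the result into a dump tool, for example `read_sector fatimage.001 96879 | od -c`, gives a view comparable to the blkcat hex output.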

    Next, let us figure out which file the word resides in. First, we can find the metadata structure that has allocated the above disk unit using the following command.

    ifind -f fat -d 96879 fatimage.001

    Next, we can find the name of the file (or directory) that corresponds to the above metadata structure with the following command.

    ffind fatimage.001 12

    It can be observed that a file called “instagram.jpg” in the root directory contains the word “secret”. How could an MS Word document (with a .docx extension) be contained in an image (with a .jpg extension)? This points to one thing: data hiding.

    We can display the details of the file's meta-data structure using the istat command:

    istat -f fat fatimage.001 12

    The above output gives more information about the suspect file. The suspect file was hidden inside the image file instagram.jpg and placed at the root directory (as revealed by the ffind command). The file was subsequently deleted by the suspect (as revealed by the istat command). If this was the only instance of the suspected file found on the disk image, then an examination of the unallocated space will be required by the investigator. File carving and/or slack space analysis will be necessary further steps.

    Repeating the same process with sector address 34135, the blkcat command and output is as follows:

    Finding the metadata structure using the ifind command reveals the following output.

    Finding the name of the file using the ffind command reveals the following output.

    Displaying the details of the metadata using the istat command reveals the following.

    In this case, the file is allocated and located in the root directory, so the investigator can navigate to that directory and examine the file.

    To retrieve the file hidden inside the picture, we rename the extension of the picture to .zip and then open it with any compression utility (WinRAR was used here). Alternatively, we can right-click the image and open it with WinRAR or 7-Zip without renaming it to view the hidden contents.
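Because a .docx file is a ZIP archive, another way to confirm the hidden payload is to look for the ZIP local file header signature (the bytes “PK” followed by 0x03 0x04) inside the picture. The following is a rough sketch, not the method used in this post. It assumes GNU grep (for the -a, -b, and -o options), uses the hypothetical helper name find_zip_offset, and reports only the first signature found. Image data can contain these four bytes by coincidence, so treat a hit as a lead to verify.

```shell
# Print the byte offset of the first ZIP local-file-header signature
# ("PK" followed by bytes 0x03 0x04) in a file; print nothing if absent.
#   $1 = file to inspect
find_zip_offset() {
    sig=$(printf 'PK\003\004')
    LC_ALL=C grep -a -b -o -F -- "$sig" "$1" | head -n 1 | cut -d: -f1
}
```

If an offset N is printed for instagram.jpg, then `tail -c +$((N + 1)) instagram.jpg > payload.zip` copies everything from the signature onward into a separate file (payload.zip is an illustrative name).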

    The quality of keyword analysis results depends on the quality of the keywords. It is advisable to avoid keywords such as a user name or the name of the computer, because these generate thousands of hits in documents as well as in the system registry, where the system continuously adds and deletes items. A poor choice of keywords results in a large number of hits in files and in unallocated space. It is much better to use whole sentences as search patterns, but then you must know the content of the document (e.g. from its printed version).

    Keyword search is useful for deleted files with lost signatures. Retrieving the desired file then boils down to analyzing the vicinity of the keyword hits. Because the search scope is narrowed considerably, file recovery can be performed manually.
