A forensic keyword search typically follows a process in which an analyst acquires an image of a suspect hard drive and builds a list of keywords, also known as “dirty words”, to search the disk image for.

However, it is a challenge to find a string keyword when the hard disk image consists of binary data. Therefore, we first need to extract the printable data from the binary disk image.

To search a disk or partition image with a tool like “grep”, we first print the strings of printable characters in the image to a text file and then search that text file instead of the image. The Sleuth Kit (TSK) provides a tool called srch_strings that prints the strings of printable characters in files. The investigator also needs to print the location of each string, because that location (the byte offset) is used later to find the data unit that contains a keyword of interest. You can then search the resulting text file for your keywords with the “grep” command. Note that grepping the raw image directly would not give us these hits in a usable form, since the image is binary data; this is why we search the text file produced by srch_strings instead.

The text file produced by the srch_strings command contains all the printable data along with its location in the disk image. We can then search for the keyword within that file, for example with the grep command. If a match is found, the analyst performs further analysis by identifying the metadata structure of the file that occupies the data unit where the keyword resides.

Other noteworthy tools from The Sleuth Kit that will be useful in our analysis include:

blkcat - displays the contents of the data unit containing a keyword.

ifind - finds the metadata structure that allocates or points to a given data unit.

istat - displays the details of a given metadata structure.

With these tools, the analyst can:

Retrieve the data unit that contains the dirty keywords (using blkcat).
Determine which file the dirty keyword(s) reside in (using ifind).
Display the details of the file's metadata structure (using istat).

Grep And Regular Expressions

grep, short for “globally search a regular expression and print”, is a Linux command that searches input files (or standard input when no file is named) for lines that match a given pattern. The pattern is a regular expression, which is a method for specifying a set of strings. The basic usage of grep is to search the specified file(s) for a string described by a regular expression.

Regular expressions (RegEx) come in a basic and an extended standard syntax. Both are used to build patterns that look up a set of strings in a list of elements or verify that a given string follows a particular arrangement (for example an IP address, email address, or phone number). Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) are largely the same. However, BRE requires that the meta-characters ( ) and { } be written as \( \) and \{ \}, whereas ERE does not. ERE also introduces more meta-characters, including ?, +, and |.

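For example, the basic regular expression [a-z] matches any single lowercase letter, while the more elaborate expression /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?$/ matches an optional “http://” or “https://”, followed by a host name made of digits, lowercase letters, dots, and hyphens, then a single dot and a suffix of 2 to 6 letters or dots, then an optional path, and finally an optional trailing “/”. Note that the surrounding / delimiters and the \d shorthand are Perl-style notation rather than POSIX ERE; with grep -E you would drop the delimiters and write [0-9] in place of \d.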

REGULAR EXPRESSION METACHARACTERS
Metacharacter	Description
^	Matches the following item at the beginning of a text line
$	Matches the preceding item at the end of a text line
.	Matches any single character
[...]	A bracket expression. Matches a single character in the bracketed list or range
[^...]	Matches a single character that is not contained within the brackets
()	Defines a marked sub-expression, also called a block or capturing group. BRE mode requires \( \)
*	Matches the preceding item zero or more times
{m}	The preceding item is matched exactly m times. BRE mode requires \{m\}
{m,}	The preceding item is matched m or more times. BRE mode requires \{m,\}
{m,n}	Matches the preceding item at least m and not more than n times. BRE mode requires \{m,n\}
\	Escapes the special meaning of the next character

The next three meta-characters are available only in extended regular expressions:

Metacharacter	Description
?	Matches the preceding character, meta-character, or expression zero or one time
+	Matches the preceding character, meta-character, or expression one or more times. There is no limit to the number of times it can be matched
|	Matches the character, meta-character, or expression on either side of it

Note that to use the grep command to search for meta-characters, you have to use a backslash (\) to escape the meta-character. For example, the regular expression “^\.” matches lines that start with a period.
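The BRE/ERE difference is easiest to see by running the same kind of search both ways. The following is a minimal sketch that assumes GNU grep; count_matches is a hypothetical helper (not part of grep), and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count how many of a few made-up sample lines match.
# $1 = grep mode flag (-G for BRE, -E for ERE), $2 = pattern.
count_matches() {
    printf '%s\n' 'hacker' 'hackers' 'aaa' '.hidden' 'visible.' |
        grep -c "$1" -e "$2" || true   # grep exits 1 on zero matches
}

count_matches -G 'a\{3\}'     # BRE: braces must be escaped      -> 1
count_matches -E 'a{3}'       # ERE: braces work unescaped       -> 1
count_matches -E 'hackers?'   # ERE: ? makes the "s" optional    -> 2
count_matches -G 'hackers?'   # BRE: ? is a literal character    -> 0
count_matches -G '^\.'        # escaped dot at start of a line   -> 1
```

The fourth call is the one to watch for in practice: without -E, a pattern that relies on ?, +, or | silently matches nothing useful.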

In the table below, I will show how to use grep with examples.

grep hackers files	search files for lines with “hackers”
grep -E 'hackers?' files	search files for lines with “hackers” or “hacker” (-E enables ERE, which the ? meta-character requires)
grep '^hackers' files	“hackers” at the start of a line
grep 'hackers$' files	“hackers” at the end of a line
grep '^hackers$' files	lines containing only “hackers”
grep '[Hh]ackers' files	search for “Hackers” or “hackers”
grep '\^f' files	search files for lines with “^f”; “\” escapes the ^
grep '^$' files	search for blank lines
grep '[0-9][0-9][0-9]' files	search for lines with three consecutive digits
grep -f hack.txt files	The -f option specifies a file from which grep reads patterns. In this example, the search patterns are contained in a file called hack.txt, one per line
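The -f option is what makes a “dirty word” list practical, so here is a small sketch of it in use. It assumes GNU grep; search_wordlist is a hypothetical helper, and the file names and contents are invented for the demo.

```shell
#!/bin/sh
# Hypothetical helper: print every line that matches ANY pattern in a list.
# $1 = pattern file (one pattern per line), $2 = file to search.
# -i ignores case, -n prefixes each hit with its line number.
search_wordlist() {
    grep -i -n -f "$1" "$2"
}

# Throwaway demo files; names and contents are made up for illustration.
workdir=$(mktemp -d)
printf '%s\n' 'secret' '^hack' '[0-9][0-9][0-9]' > "$workdir/hack.txt"
printf '%s\n' 'nothing to see' 'Top SECRET plan' \
    'hackers met here' 'room 404' > "$workdir/notes.txt"

search_wordlist "$workdir/hack.txt" "$workdir/notes.txt"
rm -r "$workdir"
```

One caution when maintaining such a list: an empty line in the pattern file is an empty pattern, which matches every line of the input.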

Having introduced srch_strings, grep, and regular expressions, I will now show a demonstration of what has been discussed so far.

Environment Set Up

In this post, I will be using a Windows 10 machine as my forensic workstation. I will be using a Kali Linux virtual machine as my guest OS, on which I will do my keyword search analysis, and Oracle VM VirtualBox as my type 2 hypervisor. My forensic image is of a FAT32-formatted USB drive.

To conduct my analysis via my Kali Linux VM, I will create a shared folder between my Windows host OS and my Kali guest OS via the steps shown in the image below.

After following the above steps carefully, I will boot up my Kali Linux VM and my shared folder will be revealed as shown below.

You are one step closer to sharing files between your host OS and guest OS at this point. To fully start sharing files, open the terminal in your Kali VM and run the commands below.

sudo mount -a
sudo usermod -a -G vboxsf you   # where "you" is the non-root user

Reboot the Kali Linux VM and you are ready to start sharing files between your host and guest OS. Simply place your forensic image in the shared folder directory of your host OS and it will be seen in the shared folder directory in your guest OS as shown in the figure above.

Keyword Search Forensics

I assume that law enforcement authorities confiscated the hard disk (or USB drive) of a suspect and that you are asked to analyze it using the bitstream image provided. In my example case, I have a FAT32-formatted USB drive containing a secret MS Word document named secret.docx. In an attempt to evade detection, the criminal hid this document in a JPG image file using the copy command in Windows, so that the image and not the Word document is visible to investigators. Which JPG file, among thousands on the drive, is not yet known. Your mission is to find this secret Word document and view its content.

For ease of illustration, we assume that the keyword “secret” identifies the sensitive data we are interested in.

As discussed earlier, we first need to extract printable data from the binary disk image using TSK’s srch_strings command.

srch_strings -t d fatimage.001 > fat-kw.ascii.str

where the “-t d” option prints the location of each discovered string before it, as a decimal byte offset from the beginning of the partition (the FAT file system in this example).

Now we can use grep to search for the keywords we are interested in. In my example, I will search for the word “secret” using the following command. Note that the search should be case insensitive here.

grep -i secret fat-kw.ascii.str

where the “-i” option specifies that the matching will be case insensitive.
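Since only the offsets are needed for the next steps, it can help to strip the matched strings away. The sketch below assumes each line of the strings file starts with the decimal offset followed by whitespace and the string, which is the layout “-t d” produces; hit_offsets is a hypothetical helper, and the sample lines are made up rather than taken from the real image.

```shell
#!/bin/sh
# Hypothetical helper: list the byte offsets of all strings that contain
# a keyword, case-insensitively.
# $1 = keyword, $2 = strings file produced with "-t d".
hit_offsets() {
    grep -i -e "$1" "$2" | awk '{ print $1 }'
}

# Made-up sample lines in the "offset string" layout, for illustration only.
sample=$(mktemp)
printf '%s\n' '    512 boot code' '   2048 Secret.doc' \
    ' 900000 top secret' > "$sample"
hit_offsets secret "$sample"
rm -f "$sample"
```

Against the file from this walkthrough, the call would be hit_offsets secret fat-kw.ascii.str.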

It can be observed that the word “secret” appears in strings located at several different byte offsets. Our target file (secret.docx), however, appears in strings located at byte offsets 17477168, 49593271, and 49602096. A hard disk uses sector addresses to locate an area on disk, whereas a file system uses cluster or block numbers to identify a data unit on disk. Thus, we need to convert the byte offset to a sector address and then to a cluster or block address. To convert a byte offset to a sector address within the partition, divide the offset by the sector size (512 bytes here) and take the floor (rounded-down integer) of the quotient.

sector address = floor(17477168/512) = 34135
sector address = floor(49593271/512) = 96861
sector address = floor(49602096/512) = 96879

where floor() is the floor function, which outputs the largest integer less than or equal to its input.
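The same arithmetic can be done in the shell, which avoids calculator slips when there are many hits. This is a minimal sketch; byte_to_sector is a hypothetical helper, and the 512-byte default is the sector size assumed in this example.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte offset to a sector address.
# Shell integer division truncates, which equals floor() for the
# non-negative offsets used here.
# $1 = byte offset, $2 = sector size in bytes (defaults to 512).
byte_to_sector() {
    echo $(( $1 / ${2:-512} ))
}

for offset in 17477168 49593271 49602096; do
    printf '%s -> sector %s\n' "$offset" "$(byte_to_sector "$offset")"
done
```

The loop reproduces the three sector addresses computed above: 34135, 96861, and 96879.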

Now we know the word “secret” resides in the sectors with addresses 34135, 96861, and 96879, and we can investigate further. First, we can view the contents of a data unit (a sector here) using the blkcat command. For sector address 96879, the blkcat command and its output are as follows:

blkcat -h fatimage.001 96879

Next, let us figure out which file the word resides in. First, we can find the metadata structure that has allocated the above data unit using the following command.

ifind -f fat -d 96879 fatimage.001

Next, we can find the name of the file (or directory) from the metadata address returned above (12 in this case) using the following command.

ffind fatimage.001 12

It can be observed that a file called “instagram.jpg” in the root directory contains the word “secret”. How could an MS Word document (with a .docx extension) be contained in an image (with a .jpg extension)? This points to one thing: data hiding.

We can display the details of the file's metadata structure using the istat command.

istat -f fat fatimage.001 12

The above output gives more information about the suspect file. The suspect file was hidden inside the image file instagram.jpg and placed at the root directory (as revealed by the ffind command). The file was subsequently deleted by the suspect (as revealed by the istat command). If this was the only instance of the suspected file found on the disk image, then an examination of the unallocated space will be required by the investigator. File carving and/or slack space analysis will be necessary further steps.

Repeating the same process with sector address 34135, the blkcat command and output is as follows:

Finding the metadata structure using the ifind command reveals the following output.

Finding the name of the file using the ffind command reveals the following output.

Displaying the details of the metadata using the istat command reveals the following.

In this case, the file is allocated and located in the root directory, so the investigator can navigate to that directory and examine the file.

To retrieve the file hidden inside the picture, change the picture's extension to .zip and then open it with any compression utility (I used WinRAR). Alternatively, simply right-click the image and open it with WinRAR or 7-Zip, without renaming it, to view the hidden contents.
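On the Kali side, the same idea can be checked from the command line. The sketch below rests on assumptions that are not confirmed by the case above: that the hidden payload is a ZIP-format archive appended to the end of the JPG (a .docx file is itself a ZIP container), and that GNU grep is available. Both helper names are hypothetical, and the demo file is a stand-in, not a real JPEG.

```shell
#!/bin/sh
# Hypothetical helper: print the 0-based byte offset of the first ZIP
# local-file-header signature ("PK" followed by bytes 0x03 0x04).
# $1 = file to scan. Prints nothing if the signature is absent.
zip_offset() {
    sig=$(printf 'PK\003\004')
    LC_ALL=C grep -a -b -o -F -e "$sig" "$1" | head -n 1 | cut -d: -f1
}

# Hypothetical helper: write everything from a 0-based offset to stdout.
# $1 = file, $2 = offset. tail -c +N counts from 1, hence the +1.
carve_from_offset() {
    tail -c "+$(( $2 + 1 ))" "$1"
}

# Stand-in file for illustration: 8 filler bytes, then a fake archive start.
demo=$(mktemp)
printf 'FAKEJPEGPK\003\004payload' > "$demo"
zip_offset "$demo"    # prints 8
rm -f "$demo"
```

The four signature bytes can also occur by chance inside compressed image data, so treat the first hit as a candidate and confirm it by opening the carved output with an archive tool.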

The quality of keyword analysis results depends on the quality of the keywords. It is advisable to avoid keywords such as the user name or the name of the computer, because these generate thousands of hits both in documents and in the system registry, as the system continuously adds and deletes items. A poor choice of keywords results in a large number of hits in files and in unallocated space. It is much better to use whole sentences as search patterns, but then you must know the content of the document (e.g. from a printed version).

Keyword search is useful in the case of deleted files with lost signatures. Retrieving the desired file boils down to analyzing the vicinity of the keyword hits. Because the search scope is narrowed considerably, file recovery can be performed manually.

Joseph Moronwi
