There are entire books dedicated to Google searching and Google hacking. Most of these focus on penetration testing and securing computer networks. These are full of great information, but are often overkill for the investigator looking for quick personal information. A few simple rules can help locate more accurate data.
Search Operators
Most search engines allow the use of commands within the search field. These commands are not part of the search terms themselves and are referred to as operators. Most operator searches have two parts, separated by a colon. To the left of the colon is the type of operator, such as "site" (website) or "ext" (file extension). To the right is the rule for the operator, such as the target domain or file type. This post explains each operator and its most appropriate uses.
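To make the two-part structure concrete, here is a minimal Python sketch that joins an operator type and its rule with a colon, then turns a finished query into a Google search URL. The helper names `operator_term` and `google_search_url` are hypothetical, and the sketch assumes the standard `https://www.google.com/search?q=` URL form. It only builds strings; it does not send any request.

```python
from urllib.parse import urlencode


def operator_term(operator: str, rule: str) -> str:
    """Join an operator type and its rule with a colon, e.g. site:example.com."""
    return f"{operator}:{rule}"


def google_search_url(query: str) -> str:
    """Percent-encode a finished query and append it to Google's search URL."""
    # urlencode escapes ':' as %3A and '"' as %22, and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = operator_term("site", "eforensicsmag.com") + ' "Joseph Moronwi"'
    print(query)  # site:eforensicsmag.com "Joseph Moronwi"
    print(google_search_url(query))
```

Encoding the query this way keeps quotation marks and colons intact when the URL is pasted into a browser or saved in case notes.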
The Site Operator
The site operator asks Google to search within one website or domain. This operator provides two benefits to the search results. First, it will only provide results of pages located on a specific domain. Second, it will return all of the indexed pages on that domain that contain the search terms. If you want to view every page on a specific domain that includes your target of interest, the site operator is required.
To see how many pages Google has indexed for a domain, enter the following query:
site:eforensicsmag.com
But how many of these are blog posts? Let us find out:
site:eforensicsmag.com/blog
Note: Google only gives a rough approximation when using this operator. If you own the site, Google Search Console gives an accurate count of indexed pages.
Next, I conducted the following exact search.
site:eforensicsmag.com "Joseph Moronwi"
The result was all eleven pages on eforensicsmag.com that include my name within the content. This technique can be applied to any domain. This includes social networks, blogs, and any other website that is indexed by search engines.
To view the subdomains of the target website, enter the following query:
site:*.google.com -www
To find insecure (non-HTTPS) pages of a target domain, enter the following query:
site:google.com -inurl:https
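The site variations above follow one pattern, so they can be generated for any target domain. This is a minimal sketch; `site_dorks` and its dictionary keys are hypothetical names, and the queries are exactly the ones shown above with the domain swapped in.

```python
from typing import Dict, Optional


def site_dorks(domain: str, name: Optional[str] = None) -> Dict[str, str]:
    """Build the site: variations shown above for one target domain."""
    dorks = {
        "indexed_pages": f"site:{domain}",
        # Subdomains of the domain, dropping results that contain "www".
        "subdomains": f"site:*.{domain} -www",
        # Pages whose URL does not contain "https".
        "non_https": f"site:{domain} -inurl:https",
    }
    if name:
        # Quotation marks make Google match the name as an exact phrase.
        dorks["name_on_domain"] = f'site:{domain} "{name}"'
    return dorks


for label, query in site_dorks("eforensicsmag.com", "Joseph Moronwi").items():
    print(f"{label:15} {query}")
```

Keeping the variations together makes it easy to run the same checks against each new domain in an investigation.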
Using the same operator, you can restrict your search to a single top-level domain, such as .gov. An example is given below.
computer forensics site:gov
This searches for the term computer forensics in all websites with the .gov domain. Jung Kim showed a nice dork to find people within GitHub:
site:github.com/orgs/*/people
Simply replace the asterisk (*) with the organisation's name to target people from a specific organisation.
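Both queries in this section are templates with one slot to fill: the top-level domain, or the organisation name in place of the asterisk. A minimal sketch using Python's `string.Template`; the helper names `github_people` and `within_tld` are hypothetical, and the organisation name in the example is only a placeholder.

```python
from string import Template

# "$org" stands in for the asterisk in the GitHub dork; "$tld" is a top-level domain.
GITHUB_PEOPLE = Template("site:github.com/orgs/$org/people")
TLD_SEARCH = Template("$terms site:$tld")


def github_people(org: str) -> str:
    """Substitute an organisation name into the GitHub people dork."""
    return GITHUB_PEOPLE.substitute(org=org.strip())


def within_tld(terms: str, tld: str) -> str:
    """Restrict search terms to one top-level domain, accepting 'gov' or '.gov'."""
    return TLD_SEARCH.substitute(terms=terms, tld=tld.lstrip("."))


print(github_people("google"))                  # site:github.com/orgs/google/people
print(within_tld("computer forensics", ".gov"))  # computer forensics site:gov
```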
The Filetype Operator
Another operator that works with both Google and Bing is the filetype filter. It allows you to filter any search results by a single file type extension. While Google allows this operator to be shortened to "ext", Bing does not. When you add the filetype operator to your search terms, Google restricts the results to files that end with this extension.
Consider the following search attempting to locate PDF files associated with the terror group ISIS.
"ISIS" filetype:pdf
There are many uses for this technique. A search of filetype:doc "resume" "target name" often provides resumes created by the target, which can include cellular telephone numbers, personal addresses, work history, education information, references, and other personal information that would never be intentionally posted to the internet. The "filetype" operator can identify any file by the file type within any website. This can be combined with the "site" operator to find all files of any type on a single domain. By conducting the following searches, I was able to find several documents stored on the website cnn.com:
site:cnn.com filetype:pdf
site:cnn.com filetype:pptx
site:cnn.com filetype:doc
Previously, Google and Bing indexed media files by type, such as MP3, MP4, AVI, and others. Because of abuse involving pirated content, this no longer works well. The following extensions have been found to be indexed and to provide valuable results:
7Z: Compressed File
BMP: Bitmap Image
DOC: Microsoft Word
DOCX: Microsoft Word
DWF: Autodesk
GIF: Animated Image
HTM: Web Page
HTML: Web Page
JPG: Image
JPEG: Image
KML: Google Earth
KMZ: Google Earth
ODP: OpenOffice Presentation
ODS: OpenOffice Spreadsheet
ODT: OpenOffice Text
PDF: Adobe Acrobat
PNG: Image
PPT: Microsoft PowerPoint
PPTX: Microsoft PowerPoint
RAR: Compressed File
RTF: Rich Text Format
TXT: Text File
XLS: Microsoft Excel
XLSX: Microsoft Excel
ZIP: Compressed File
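Running one site and filetype search per extension by hand is tedious. A minimal sketch that pairs a target domain with every extension in the list above; `filetype_sweep` and `INDEXED_EXTENSIONS` are hypothetical names, and the code only builds the query strings for you to run one at a time.

```python
from typing import List, Sequence

# Extensions taken from the list above, lower-case as they are typed in a query.
INDEXED_EXTENSIONS = [
    "7z", "bmp", "doc", "docx", "dwf", "gif", "htm", "html", "jpg", "jpeg",
    "kml", "kmz", "odp", "ods", "odt", "pdf", "png", "ppt", "pptx", "rar",
    "rtf", "txt", "xls", "xlsx", "zip",
]


def filetype_sweep(domain: str, extensions: Sequence[str] = INDEXED_EXTENSIONS) -> List[str]:
    """Return one site: + filetype: query per extension for the given domain."""
    return [f"site:{domain} filetype:{ext}" for ext in extensions]


for query in filetype_sweep("cnn.com", ["pdf", "pptx", "doc"]):
    print(query)
```

Passing a shorter list, as in the usage above, reproduces the three cnn.com searches from earlier in this section.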
The Exclusion Operator (-)
You may want to exclude some content from appearing within results. The hyphen (-) tells most search engines and social networks to exclude the text immediately following from any results. It is important to never include a space between the hyphen and filtered text. The following query shows pages that mention my blog, excluding pages on the blog itself:
site:* digitalinvestigator.blogspot.com -site:digitalinvestigator.blogspot.com
My goal with search filters is to reduce the total results to a manageable amount. When you are overwhelmed with search results, add exclusions one at a time to cut down the amount of data to analyze.
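Adding exclusions one at a time is easy to get wrong by hand, especially the rule about never leaving a space after the hyphen. A minimal sketch, with hypothetical helper names `exclude` and `narrow`; it assumes each exclusion (a word, a phrase, or a single operator term such as site:...) is passed as its own list entry.

```python
from typing import Iterable


def exclude(terms: Iterable[str]) -> str:
    """Prefix each term with a hyphen, never leaving a space after it."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue
        # Quote multi-word phrases so the whole phrase is excluded, not one word.
        if " " in term and not term.startswith('"'):
            term = f'"{term}"'
        parts.append(f"-{term}")
    return " ".join(parts)


def narrow(query: str, exclusions: Iterable[str]) -> str:
    """Append exclusions to a base query."""
    return f"{query} {exclude(exclusions)}".strip()


print(narrow("site:* digitalinvestigator.blogspot.com",
             ["site:digitalinvestigator.blogspot.com"]))
```

Growing the exclusion list gradually and rerunning `narrow` mirrors the approach described above.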
The InURL Operator
The operators discussed so far filter results by a page's content, domain, or file type. This operator, however, searches for a specific word or phrase inside the URL of a web page. When you choose keywords that are likely to appear in a URL, the inurl operator cuts out a great deal of irrelevant data. My favourite search using this technique is to find File Transfer Protocol (FTP) servers that allow anonymous connections.
The following search would identify any FTP servers that possess PDF files containing the term "terror":
inurl:ftp -inurl:(http|https) filetype:pdf "terror"
Obviously, this operator could also be used to locate standard web pages, documents, and files.
In an investigation, you might want to check if your target left a resume online. Simply enter the query below
inurl:curriculum vitae "Julian Assange"
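You can add "all" to the front of this operator (allinurl) to force every listed word to appear in the URL, in any order. Without it, inurl applies only to the word immediately after the colon, which is why vitae in the query above is treated as an ordinary search term. For example, enter
allinurl:OSINT intelligence
and Google will return pages with both OSINT and intelligence in their URLs.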
The InTitle Operator
This operator will filter web pages by details other than the actual content of the page. This filter will only present web pages that have specific content within the title of the page. Practically every web page on the internet has an official title for the page. This is often included within the source code of the page and may not appear anywhere within the content. Most webmasters carefully create a title that will be best indexed by search engines.
At the time of writing, a search for "business email compromise" on Google returned 552,000 results. The following search filtered those to 6,150. These only include web pages that have the search terms within the limited space of a page title.
intitle:"business email compromise"
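You can also add "all" to the front of this operator (allintitle) to force all listed words to appear in the title, in any order. The following would find any sites that have the words business, email, and compromise within the title, regardless of the order. Leave out the quotation marks, since a quoted phrase requires the words in that exact order.
allintitle:business email compromise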
An interesting way to use this search technique is while searching for online folders. We often focus on finding websites or files of interest, but we tend to ignore the presence of online folders full of content related to our search. As an example, I conducted the following search on Google:
intitle:index.of OSINT
The results contain online folders that usually do not have typical website files within them. Each possesses dozens of documents and other files related to our search term of OSINT. Some provide a folder structure that allows access to an entire web server of content. Notice that none of these results point to a specific page; all of them open a folder view of the data present.
The InText Operator
intext:term restricts results to documents containing term in the text. This is a very helpful Google dork. By using intext, you can get a glimpse of the material on a web page without having to open it. Normally we would open a page and press CTRL + F to look for a term; intext instead returns only the results whose text contains the term. For example, suppose we want to gather more information about the web series Tom Clancy's Jack Ryan, such as its characters. The appropriate query is given below:
intext:"jack ryan"
This will display all the results that have Jack Ryan in the content of the web page. The same approach works for the resume check described earlier. Simply enter the query below:
intext:curriculum vitae "Julian Assange"
Adding "all" to the front of this operator (allintext) forces all listed words to appear in the page text, in any order. For example, enter
allintext:TOR Dark markets
and Google will only return pages that have all three terms, TOR, Dark, and markets, within their text.
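The three "all" operators follow the same rule: one all-prefixed operator covers every listed word, while the plain operator covers only the word right after its colon and has to be repeated. A minimal sketch of both forms; `field_query` is a hypothetical helper that only builds strings.

```python
from typing import Sequence

FIELD_OPERATORS = {"inurl", "intitle", "intext"}


def field_query(operator: str, words: Sequence[str], use_all: bool = True) -> str:
    """Build an allin* query, or repeat the plain operator once per word."""
    if operator not in FIELD_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if use_all:
        # One prefix covers every word, e.g. allintitle:business email compromise
        return f"all{operator}:" + " ".join(words)
    # The plain operator applies only to the next word, so repeat it for each word.
    return " ".join(f"{operator}:{word}" for word in words)


print(field_query("intitle", ["business", "email", "compromise"]))
print(field_query("intitle", ["business", "email", "compromise"], use_all=False))
```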
The OR Operator
You may have search terms that are not definitive. You may have a target that has a unique last name that is often misspelled. The OR operator, which must be typed in capital letters and can also be written as a vertical bar (|), returns pages that have just A, just B, or both A and B. For example, entering
DFIR OR OSINT
or entering
DFIR|OSINT
will retrieve pages that contain either the term DFIR or the term OSINT.
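For the misspelled-surname case, it helps to build the OR group from a list of variants so none are forgotten. A minimal sketch; `any_of` is a hypothetical helper, and the spelling variants in the usage line are made-up examples.

```python
from typing import Sequence


def any_of(terms: Sequence[str]) -> str:
    """Group alternatives with OR so that any one of them can match."""
    # Multi-word alternatives are quoted so each is matched as a phrase.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    # OR must be upper case; lower-case "or" is treated as an ordinary word.
    return "(" + " OR ".join(quoted) + ")"


print(any_of(["DFIR", "OSINT"]))            # (DFIR OR OSINT)
print(any_of(["Nakamoto", "Nakamato"]))     # hypothetical spelling variants
```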
The Wildcard Operator
The asterisk (*) represents one or more words to Google and is considered a wild card. Google treats the * as a placeholder for a word or words within a search string. For example, "DFIR * training" tells Google to find pages containing a phrase that starts with "DFIR" followed by one or more words, followed by "training".
Let’s say that you are looking into a person of interest and the only information you have about this individual is a username: JoseffMoro. While a search for JoseffMoro might return other places that the username shows up online, by using the Wildcard Operator and searching for
JoseffMoro*com
we can instead see if any email addresses or other personal details appear publicly online that use the username as the unique identifier.
While this will not always return significantly different results to searching the username itself, it can be used as a quick way to identify an email address that can later be tied to other accounts.
The Range Operator
The "Range Operator" tells Google to search between two identifiers. These could be sequential numbers or years. As an example,
OSINT Training 2015..2018
returns pages that include the terms OSINT and training, and also include any number between 2015 and 2018. I have used this to filter results for online news articles that include a commenting system where readers can express their views. The following search identifies web pages that contain information about Deborah Samuel, a Nigerian Christian student who was lynched by fellow Muslim students in May 2022 after being accused of blasphemy against Islam, and that show between 1 and 999 comments.
"Deborah Samuel" "1..999 comments"
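Google's range syntax puts two periods between the numbers with no spaces, and the lower value first. A minimal sketch that enforces both; `number_range` is a hypothetical helper.

```python
def number_range(low: int, high: int) -> str:
    """Return a range term: two periods, no spaces, lower value first."""
    if low > high:
        low, high = high, low
    return f"{low}..{high}"


print(f"OSINT Training {number_range(2015, 2018)}")  # OSINT Training 2015..2018
print(number_range(2018, 2015))                      # 2015..2018
```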
The Related Operator
To search for similar web pages, use the related operator. It takes a domain and attempts to provide online content related to that address. As an example, I conducted a search on Google with the following syntax:
related:google.com
The results did not include google.com itself, but did list other search engines.
The Cache Operator
The cache operator returns the most recently cached version of a web page, provided the page has been indexed. Investigators can use it to find previous versions of edited or deleted web pages and recover removed intelligence. Note that Google retired its public page cache in 2024, so this operator may no longer return results.
cache:eforensicsmag.com
The Map Operator
The map operator enables users to force Google to show map results for a locational search. The results show only location-specific data and do not include recent news stories. Investigators can use the map operator to focus on geospatial relevant intelligence.
map:Johannesburg
It is important to note that these operators can be combined in whatever ways the OSINT investigator deems fit. Some example combinations follow.
To find guest blogging opportunities, you can combine operators as follows:
digital forensics intitle:"write for us" inurl:"write-for-us"
This uncovers so-called “write for us” pages in the digital forensics niche.
If you know of a serial guest blogger in your niche, try this:
Kronos Banking Trojan intext:"Marcus Hutchins" inurl:"author" -site:evilsite.com
Got someone in mind that you want to reach out to on social media? Try this trick to find their contact details:
Brett Shavers dfir training (site:twitter.com | site:facebook.com | site:linkedin.com)
To find Q+A threads related to your target term, enter the following query
Satoshi Nakamoto site:quora.com intitle:(TOR | "Bitcoin" | "cryptocurrency market")
By focusing on the LinkedIn site, you can look for people with a certain job title in a certain location. One trick that can prove useful is to search for telephone icons, which are Unicode characters:
site:linkedin.com/in "<job title>" (☎ OR ☏ OR ✆ OR 📱) "<location>"
It is also possible to search for a specific target name:
"<name>" (☎ OR ☏ OR ✆ OR 📱)
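Typing the telephone symbols by hand is awkward, so it can help to keep them in one place and fill in the job title and location. A minimal sketch; `linkedin_phone_dork` is a hypothetical helper and the job title and location in the usage line are placeholders. It uses plain straight quotes and leaves out a leading plus sign, since Google no longer treats + as an operator.

```python
from typing import Sequence

# Telephone-style Unicode characters from the query above.
PHONE_ICONS = ["☎", "☏", "✆", "📱"]


def linkedin_phone_dork(job_title: str, location: str,
                        icons: Sequence[str] = PHONE_ICONS) -> str:
    """Build a LinkedIn profile search for a job title, location, and phone icons."""
    icon_group = " OR ".join(icons)
    return f'site:linkedin.com/in "{job_title}" ({icon_group}) "{location}"'


print(linkedin_phone_dork("forensic analyst", "Lagos"))
```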
You can search for copies of databases via Google too. To find some of them, simply search for:
ext:sql intext:"-- phpMyAdmin SQL Dump"
To search for Excel files in a target organisation that have the word contact in their URL, enter the following query (replace google.com with your target domain):
filetype:xls site:google.com inurl:contact
This yields spreadsheets that may contain contact lists from the target organisation.
The power of Google Dorks or search operators in investigations is in combining them. The reader is encouraged to explore various possible combinations of search operators to achieve the desired results.
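When experimenting with combinations, it helps to treat each operator as a separate building block and join them at the end. A minimal sketch with two hypothetical helpers: `any_site` groups several site: operators with the vertical-bar form of OR, and `combine` joins the parts while skipping empty ones. The two usage lines rebuild queries from this section.

```python
from typing import Iterable


def any_site(domains: Iterable[str]) -> str:
    """Group several site: operators with the vertical-bar form of OR."""
    return "(" + " | ".join(f"site:{d}" for d in domains) + ")"


def combine(*parts: str) -> str:
    """Join operator terms and keywords into one query, skipping empty parts."""
    return " ".join(p.strip() for p in parts if p and p.strip())


print(combine("Brett Shavers dfir training",
              any_site(["twitter.com", "facebook.com", "linkedin.com"])))
print(combine("filetype:xls", "site:google.com", "inurl:contact"))
```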