Using The Wayback Machine For OSINT

 


The Internet Archive is a digital library that provides public access to current and historical versions of digitized materials, such as web pages, newspapers, software applications, images, books, and more. The Wayback Machine is the part of that library that archives web pages.

The Internet Archive, a non-profit organization based in California, began archiving the web in 1996 and opened the Wayback Machine to the public in 2001. It has since grown into one of the most powerful tools for open-source research. Currently, the Wayback Machine holds more than 704 billion archived web pages.


The Wayback Machine

The Wayback Machine lets you find older information on previous versions of websites, including sites that are no longer online. This can be anything that was captured: for example, information about organizations, or about the people who worked at them. Much of this older information may no longer be available on an organization's current website. The Wayback Machine can also help you discover connections between different websites and uncover old files and cached images. Sometimes you can gather relevant data such as names, phone numbers, email addresses, and even metadata from older versions of a website.


Quick Search Methods

The quickest way to see when a particular site was archived is to visit the URL:

https://web.archive.org/*/www.example.com

Simply replace www.example.com with your target site's URL. For example:

https://web.archive.org/*/www.google.com

If the site has been archived, a calendar view will appear with colour-coded dots, each colour having a different meaning. Blue dots are the ones you'll want to click on, as they indicate a capture of the web page. Green dots indicate a redirect, orange dots indicate that the crawler received a client error, and red dots indicate a server error. Navigating the timeline displays the dates on which the site was archived.
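The calendar is an interactive page, but the same capture list can be retrieved from a script through the Wayback Machine's CDX server API, and each capture's HTTP status code maps onto the dot colours above (2xx blue, 3xx green, 4xx orange, 5xx red). The sketch below assumes the endpoint `https://web.archive.org/cdx/search/cdx` with its `url`, `output=json`, `fl` and `limit` parameters; the helper names are hypothetical, and the parser is meant to be checked against a saved response rather than trusted blindly.

```python
from __future__ import annotations

import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed CDX server endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_captures_url(site: str, limit: int = 100) -> str:
    """Build a CDX query listing captures of a single URL (the calendar's data)."""
    params = {
        "url": site,
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[dict]:
    """Turn the CDX JSON array (field names in the first row) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def dot_colour(status: str) -> str:
    """Map an HTTP status code to the calendar dot colour described above."""
    if not (len(status) == 3 and status.isdigit()):
        return "unknown"  # assumption: non-numeric codes are left unclassified
    return {"2": "blue", "3": "green", "4": "orange", "5": "red"}.get(status[0], "unknown")


def colour_summary(captures: list[dict]) -> Counter:
    """Count captures per dot colour."""
    return Counter(dot_colour(c.get("statuscode", "")) for c in captures)


def fetch_captures(site: str, limit: int = 100) -> list[dict]:
    """Query the live API (network access required)."""
    with urlopen(cdx_captures_url(site, limit), timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

For example, `colour_summary(fetch_captures("www.google.com"))` would give a quick tally of successful captures versus redirects and errors before you start clicking through the calendar.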

To list every archived URL under a particular domain, use the following direct URL, replacing example.com with your target site:

https://web.archive.org/web/*/www.example.com/*
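The same wildcard listing can be requested from the CDX server API, which is handy when a site has thousands of archived URLs. This is a minimal sketch assuming the CDX parameters `collapse=urlkey` (one row per unique URL) and `filter=mimetype:...` (for example to surface old PDFs); the function names are hypothetical.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

# Assumed CDX server endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url_list_query(site: str, mimetype: str | None = None) -> str:
    """Build a CDX query listing unique archived URLs under site/*."""
    params = [
        ("url", f"{site}/*"),      # same wildcard as the web URL above
        ("output", "json"),
        ("fl", "original,mimetype"),
        ("collapse", "urlkey"),    # one row per unique URL
    ]
    if mimetype:
        # Optional filter, e.g. "application/pdf" to surface old documents.
        params.append(("filter", f"mimetype:{mimetype}"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def unique_urls(body: str) -> list[str]:
    """Extract sorted, de-duplicated original URLs from a CDX JSON response."""
    rows = json.loads(body)
    if not rows:
        return []
    header, *records = rows
    idx = header.index("original")
    return sorted({record[idx] for record in records})
```

The de-duplication in `unique_urls` is kept even though `collapse=urlkey` should already return one row per URL, so the helper stays correct if you drop the collapse parameter.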


Other Search Methods

You can also access the Wayback Machine manually by entering the target site into the search bar at:

https://archive.org/web/

OSINT researchers can easily run basic keyword searches for topics or persons of interest via the following URL:

https://web.archive.org/

The Internet Archive also offers advanced search features for more targeted queries. You can use them via the URL below:

https://archive.org

or via the direct URL below:

https://archive.org/advancedsearch.php
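Advanced search queries can also be built as URLs and their results read as JSON. This sketch assumes the `advancedsearch.php` parameters `q`, repeated `fl[]` (fields to return), `rows`, `page` and `output=json`, and a response whose matches sit under `response.docs`; the query string and helper names are illustrative only.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def advanced_search_url(query: str, fields=("identifier", "title"),
                        rows: int = 50, page: int = 1) -> str:
    """Build an advanced search URL returning JSON."""
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # one fl[] per returned field
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def result_docs(body: str) -> list[dict]:
    """Return the list of matching items from an advanced search JSON response."""
    return json.loads(body).get("response", {}).get("docs", [])
```

For instance, `advanced_search_url('"jane doe" AND mediatype:texts', rows=10)` asks for up to ten text items mentioning a (hypothetical) person of interest; each returned `identifier` can then be opened as an item page.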

Using the advanced search feature, you can sometimes find the email address associated with the user who uploaded a file.

Some files require you to log in to gain access. In that case, the researcher may need to create a research account using a pseudonym and a burner email address to investigate further:


https://archive.org/account/signup

If you identify an email address during OSINT research, you can run additional searches to see whether it has been used elsewhere, for example on search engines or social media sites. Follow the steps below to find the email address associated with an uploaded file:

  • On the item's page, scroll down to find “download options”.
  • Click “show all” to display all files.
  • Click the file whose name ends with “meta.xml”.
  • Press Ctrl+F and search for the word “uploader”; you will see the uploader's email address next to it, for example: donkeykongland2@yahoo.com.
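The steps above can also be scripted. This sketch assumes that an item's files are served under `https://archive.org/download/<identifier>/` and that its metadata file is named `<identifier>_meta.xml`, with the address stored in an `<uploader>` element as the Ctrl+F step suggests; `example-item` and the helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.request import urlopen


def meta_xml_url(identifier: str) -> str:
    """Assumed location of an item's meta.xml file."""
    return f"https://archive.org/download/{identifier}/{identifier}_meta.xml"


def uploader_from_meta(xml_data: str | bytes) -> str | None:
    """Return the text of the first <uploader> element, or None if absent."""
    root = ET.fromstring(xml_data)
    node = next(root.iter("uploader"), None)
    if node is None or not node.text:
        return None
    return node.text.strip()


def fetch_uploader(identifier: str) -> str | None:
    """Download meta.xml and extract the uploader (network access required)."""
    with urlopen(meta_xml_url(identifier), timeout=30) as resp:
        return uploader_from_meta(resp.read())  # pass raw bytes to the XML parser
```

Usage: `fetch_uploader("example-item")` returns the uploader field if the item exposes one, or `None` otherwise.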


Use Collections and Changes (Beta)

Collections help you learn why a URL was archived in the Wayback Machine, for example:

https://web.archive.org/web/collections/2021*/google.com

Changes lets users select two different versions of a URL and compare them side by side:

https://web.archive.org/web/changes/google.com
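If you want a text record of what changed between two captures, you can also diff them locally. This is a swapped-in technique (Python's `difflib`, not the Changes tool itself), and it assumes that adding `id_` after the timestamp in a snapshot URL returns the archived page without the Wayback Machine's banner; the helper names are hypothetical.

```python
import difflib
from urllib.request import urlopen


def raw_snapshot_url(timestamp: str, url: str) -> str:
    """Snapshot URL with the id_ flag (assumed to return the unmodified page)."""
    return f"https://web.archive.org/web/{timestamp}id_/{url}"


def diff_snapshots(old_text: str, new_text: str,
                   old_label: str = "old", new_label: str = "new") -> str:
    """Return a unified diff of two snapshot bodies, line by line."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(diff)


def fetch_text(url: str) -> str:
    """Download a snapshot as text (network access required)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Labelling each side with its timestamp (for example `old_label="20220904051854"`) keeps the diff usable as a sourcing note.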

To request that a page be archived, go to:

https://archive.org/web

The “Save Page Now” option is visible at the bottom right of the screen; you can also go directly to:

https://web.archive.org/save

This “Save Page Now” option captures only that particular page, not the entire website, and it only works for sites that allow crawlers.
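Save Page Now can also be triggered from a script. This minimal sketch assumes that a plain GET request to `https://web.archive.org/save/` followed by the page URL requests a capture of that single page; the service may require a login, rate-limit requests or change its interface, and the function names and User-Agent string are hypothetical.

```python
from urllib.request import Request, urlopen


def save_page_now_url(page: str) -> str:
    """Assumed Save Page Now endpoint: /save/ followed by the full page URL."""
    return f"https://web.archive.org/save/{page}"


def request_capture(page: str) -> str:
    """Ask for a capture of one page and return the final URL reached after redirects.

    Network access required; sites that block crawlers will not be captured.
    """
    req = Request(save_page_now_url(page), headers={"User-Agent": "osint-notes-sketch"})
    with urlopen(req, timeout=120) as resp:
        return resp.geturl()
```

After calling `request_capture("https://www.example.com/")`, check the calendar view for the page to confirm that a new capture actually appeared.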

For sourcing purposes, it may be important to know exactly when the Internet Archive saved something. Consider the link below:

https://web.archive.org/web/20220904051854/https://www.google.com/

The 14-digit number in the middle is a timestamp in the format yyyymmddhhmmss, expressed in UTC, so this capture was made on September 4, 2022 at 05:18:54.
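When citing many captures, the timestamp can be extracted and converted automatically. This sketch treats the 14 digits as UTC and also accepts an optional two-letter flag such as `id_` after them; the regular expression and function names are illustrative assumptions, not an official parser.

```python
import re
from datetime import datetime, timezone

# 14-digit timestamp between /web/ and the archived URL; an optional flag like id_ is allowed.
_WAYBACK_TS = re.compile(r"/web/(\d{14})(?:[a-z]{2}_)?/")


def parse_wayback_timestamp(ts: str) -> datetime:
    """Convert yyyymmddhhmmss into a timezone-aware UTC datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def capture_time(snapshot_url: str) -> datetime:
    """Extract and parse the capture timestamp from a Wayback snapshot URL."""
    match = _WAYBACK_TS.search(snapshot_url)
    if match is None:
        raise ValueError(f"no 14-digit timestamp in {snapshot_url!r}")
    return parse_wayback_timestamp(match.group(1))
```

Applied to the link above, `capture_time(...).isoformat()` gives `2022-09-04T05:18:54+00:00`, an unambiguous form for source notes.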
