The Wayback Machine is a digital library that provides public access to an archive of both current and historical versions of digitized materials, such as web pages, newspapers, software applications, images, books, and more.
The archive was created in 1996 and launched in 2001 by the Californian non-profit organization Internet Archive, and it has grown into one of the most powerful tools for open-source research. The Wayback Machine currently holds more than 704 billion archived web pages.
The Wayback Machine
The Wayback Machine enables you to find older information on previous versions of websites, or on websites that no longer exist. This could be anything that was stored at the time, for example information about organizations or about the people who worked at them. Much of this older information may no longer be available through an organization’s current website. The Wayback Machine may also enable you to discover connections between different websites and to uncover old files and cached images. Sometimes you can gather relevant data such as names, phone numbers, email addresses, and even metadata from older versions of a website.
Quick Search Methods
The quickest way to see when a particular site was archived is to visit the URL
https://web.archive.org/*/www.example.com
Simply replace www.example.com with your target site's URL. For example:
https://web.archive.org/*/www.google.com
If the site has been archived, a calendar view appears with colour-coded dots. Blue dots are the ones you will usually want to click, as they indicate a capture of the web page. Green dots indicate a redirect, orange dots indicate that the crawler received a client error, and red dots indicate a server error. Navigating the timeline displays the dates on which the site was archived.
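The colour coding above lines up with the standard HTTP status classes. The sketch below is only an illustration of that mapping, assuming blue corresponds to 2xx responses, green to 3xx, orange to 4xx and red to 5xx; the function name `dot_colour` is hypothetical and not part of any Wayback Machine API.

```python
def dot_colour(status: int) -> str:
    """Map an HTTP status code to the calendar dot colour described above.

    Assumption: blue = 2xx, green = 3xx, orange = 4xx, red = 5xx.
    """
    if 200 <= status < 300:
        return "blue"      # page was captured
    if 300 <= status < 400:
        return "green"     # crawler was redirected
    if 400 <= status < 500:
        return "orange"    # client error, e.g. 404
    if 500 <= status < 600:
        return "red"       # server error
    return "unknown"       # anything outside the usual ranges


for code in (200, 301, 404, 503):
    print(code, dot_colour(code))
```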
To list every archived URL under a particular domain, use the following direct URL, replacing www.example.com with your target site.
https://web.archive.org/web/*/www.example.com/*
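If you check many sites, the two quick-search URL patterns can be generated rather than typed by hand. This is a minimal sketch that only builds the strings shown above; the helper names `calendar_url` and `all_captures_url` are made up for illustration.

```python
WAYBACK = "https://web.archive.org"


def calendar_url(site: str) -> str:
    """Build the calendar-view URL for a site, e.g. www.example.com."""
    return f"{WAYBACK}/*/{site.strip().rstrip('/')}"


def all_captures_url(site: str) -> str:
    """Build the URL that lists everything archived under a site."""
    return f"{WAYBACK}/web/*/{site.strip().rstrip('/')}/*"


for target in ("www.example.com", "www.google.com"):
    print(calendar_url(target))
    print(all_captures_url(target))
```

Paste the printed URLs into a browser; the script itself makes no network requests.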
Other Search Methods
You can also access the Wayback Machine manually by entering the target site into the search bar at
https://archive.org/web/
OSINT researchers can also run basic keyword searches for topics or persons of interest via the following URL.
https://web.archive.org/
The archive also offers advanced search features for more targeted queries. You can reach these features via the URL below
https://archive.org
or by the direct URL given below
https://archive.org/advancedsearch.php
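Advanced search queries can also be expressed directly in the URL. The sketch below builds such a link using the `q`, `fl[]`, `rows` and `output` query parameters that the advanced search form itself produces; the function name, the default field list and the example query are illustrative assumptions, and the script does not send any request.

```python
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def advanced_search_url(query, fields=("identifier", "title"), rows=50):
    """Build an advanced search URL that asks for JSON results.

    query  -- search expression, as typed into the advanced search form
    fields -- metadata fields to return (assumed defaults, adjust as needed)
    rows   -- maximum number of results to request
    """
    params = [("q", query)]
    params += [("fl[]", field) for field in fields]  # one fl[] per field
    params += [("rows", rows), ("output", "json")]
    return ADVANCED_SEARCH + "?" + urlencode(params)


print(advanced_search_url("osint"))
```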
You can sometimes find the email address associated with a user who uploaded a file using the advanced search feature.
Some files require you to log in to gain access. In that case, the researcher may need to create a research account using a pseudonym and a burner email address to investigate further.
https://archive.org/account/signup
For OSINT research, if you identify an email address, you can run additional searches to see whether it has been used elsewhere, for example through search engines or on social media sites. Follow the steps below to find the email address associated with an uploaded file.
- Scroll down to find “download options”.
- Click on “show all” to display all files.
- Click on the file that ends with “meta.xml”.
- Press Ctrl+F and search for the word “uploader” to see the email address, for example donkeykongland2@yahoo.com.
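Once you have downloaded a “meta.xml” file, the last step can be automated. The sketch below assumes the file contains an `<uploader>` element, as in the manual steps above; the sample XML, the item identifier and the address in it are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_uploader(meta_xml: str) -> Optional[str]:
    """Return the text of the first <uploader> element, or None if absent."""
    root = ET.fromstring(meta_xml)
    node = next(root.iter("uploader"), None)
    if node is None or node.text is None:
        return None
    return node.text.strip()


# Hypothetical sample; a real meta.xml file carries many more fields.
SAMPLE_META = """<metadata>
  <identifier>example-item</identifier>
  <title>Example upload</title>
  <uploader>researcher@example.com</uploader>
</metadata>"""

print(extract_uploader(SAMPLE_META))
```

Not every item exposes an uploader field, so the function returns `None` rather than raising an error when the element is missing.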
Use Collections and Changes (Beta)
Collections are a way to learn why a URL has been archived into the Wayback Machine.
https://web.archive.org/web/collections/2021*/google.com
Changes allows users to select two different versions of a URL and compare them side by side.
https://web.archive.org/web/changes/google.com
To request that a page be archived, use the following URL.
https://archive.org/web
The save button is visible at the bottom right of the screen. Alternatively, go directly to
https://web.archive.org/save
This “Save Page Now” option captures only that particular page, not the entire website, and it works only for sites that allow crawlers.
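A capture can also be requested from a script. The sketch below assumes that appending the target address to the save URL triggers a capture for anonymous requests, much as the web form does; the helper names and the User-Agent string are hypothetical, and the network call only runs if you invoke `request_capture` yourself.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url(target: str) -> str:
    """Build the Save Page Now URL for a single page."""
    return SAVE_ENDPOINT + target.strip()


def request_capture(target: str, timeout: float = 60.0) -> str:
    """Ask for a capture and return the final URL after any redirects.

    Assumption: a plain GET is enough. The returned URL may or may not be
    the new snapshot, so check it manually before citing it.
    """
    request = urllib.request.Request(
        save_url(target),
        headers={"User-Agent": "research-script/0.1"},  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


print(save_url("https://www.example.com/page"))
```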
For sourcing purposes it may be important to understand when something was saved by the Internet Archive. Let’s look at the link below:
https://web.archive.org/web/20220904051854/https://www.google.com/
The digits in the middle follow the format yyyymmddhhmmss, so this page was crawled on September 4, 2022 at 05:18:54.
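When you need to record capture times for many links, the timestamp can be extracted programmatically. This is a minimal sketch, assuming the 14-digit yyyymmddhhmmss form shown above and treating the value as UTC (an assumption you should verify for your own sourcing standards); the function name `parse_snapshot_url` is made up for illustration.

```python
import re
from datetime import datetime, timezone

# 14 digits after /web/, optionally followed by a short modifier such as "id_".
_SNAPSHOT = re.compile(r"web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)")


def parse_snapshot_url(url: str):
    """Return (capture time, original URL) for a Wayback snapshot link."""
    match = _SNAPSHOT.search(url)
    if match is None:
        raise ValueError(f"not a timestamped Wayback URL: {url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    # Assumption: the timestamp is expressed in UTC.
    return captured.replace(tzinfo=timezone.utc), match.group(2)


when, original = parse_snapshot_url(
    "https://web.archive.org/web/20220904051854/https://www.google.com/"
)
print(when.isoformat(), original)
```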