E-mail has emerged as one of the most important applications on the Internet for communicating messages, delivering documents and carrying out transactions, and it is used not only from computers but also from many other electronic gadgets such as mobile phones.
Emails are now used for all sorts of communication, including exchanges that require confidentiality, authentication, non-repudiation and data integrity. As email usage increased, attackers and hackers began to use emails for malicious activities. Spam emails are a major source of concern within the Internet community. Emails are vulnerable to interception and might be used by hackers to learn of secret communication. Emails frequently contain malicious viruses, threats and scams that can result in the loss of data, confidential information and even identity theft.
It has thus become essential to identify and eliminate users and machines misusing the email service. Email forensic analysis studies the source and content of an email message as evidence, identifying the actual sender, the recipient, the date and time it was sent, and so on, in order to collect credible evidence to bring criminals to justice.
Steps in Email Communication
To demonstrate how e-mail delivery works, please refer to the figure below, which shows an e-mail communication between a sender, Alice, with the e-mail address alice@a.org, and a recipient, Bob, with the e-mail address bob@b.org.
- Alice (alice@a.org) composes an email using her computer for Bob (bob@b.org) and sends it to her sending SMTP server (smtp.a.org) using the SMTP protocol.
- The sending SMTP server performs a lookup to find the mail exchange record of the receiving server (b.org) through the Domain Name System (DNS) protocol on DNS server (dns.b.org).
- The DNS server responds with the highest priority mail exchange server (mx.b.org) for the domain (b.org).
- The sending server will establish an SMTP connection with the receiving server and deliver the e-mail message to the mailbox of Bob on the receiving server.
- Bob downloads the message from his mailbox on the receiving server to a local mailbox on his client computer (e.g., in Mozilla Thunderbird) using the POP3 or IMAP protocol, or he uses webmail (through a web browser) to read the e-mail directly on the receiving server.
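The MX lookup in the second and third steps hinges on MX preference values: the lowest preference number denotes the highest priority. The following is a minimal sketch of that selection, assuming the MX answers have already been fetched. The record list and the backup host mx2.b.org are hypothetical and do not come from a real lookup.

```python
from typing import List, Tuple

# (preference, host) pairs as a DNS MX query for b.org might return them.
# mx2.b.org is a hypothetical backup host added purely for illustration.
MX_RECORDS: List[Tuple[int, str]] = [(20, "mx2.b.org"), (10, "mx.b.org")]


def pick_mail_exchanger(records: List[Tuple[int, str]]) -> str:
    """Return the host with the lowest preference value (highest priority)."""
    if not records:
        # Without MX records a sender falls back to the domain's address record.
        raise ValueError("no MX records supplied")
    _preference, host = min(records)
    return host


print(pick_mail_exchanger(MX_RECORDS))
```

A real sending server would obtain the records from DNS and try the next host in preference order if the first one is unreachable.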
Email Actors, Roles, and Responsibilities
To understand the email system, including its security vulnerabilities, we need to understand the roles and responsibilities of each Actor and the relationships between them. E-mail is a highly distributed service involving several actors that play different roles to accomplish end-to-end mail exchange.
According to RFC 5598, these actors fall under three groups:
- User Actors
- Message Handling Service (MHS) Actors
- ADministrative Management Domain (ADMD) Actors
User actors are the sources or sinks of messages. They can be individuals, organizations, and processes. Users can generate, modify or look at the whole message. They can have an exchange that iterates, and they can expand or contract the set of Users that participate in a set of exchanges.
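| User Actor Type | Roles and Responsibilities (per RFC 5598) |
| --- | --- |
| Author | Creates the message, its contents and its list of Recipient addresses. |
| Recipient | Consumes the delivered message. |
| Return Handler | A special form of Recipient that services notifications generated by the MHS as it transfers or delivers the message. |
| Mediator | Receives, aggregates, reformulates and redistributes messages among Authors and Recipients. |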
All types of Mediator user actors set the HELO/EHLO, ENVID, RcptTo and Received fields. Alias actors also typically change the To/CC/BCC and MailFrom fields. Identities relevant to a ReSender are the From, Reply-To, Sender, To/CC/BCC, Resent-From, Resent-Sender, Resent-To/CC/BCC and MailFrom fields. Identities relevant to a Mailing List processor are the List-Id, List-*, From, Reply-To, Sender, To/CC and MailFrom fields. Identities relevant to Gateways are the From, Reply-To, Sender, To/CC/BCC and MailFrom fields.
The Message Handling Service (MHS) Actors perform a single end-to-end transfer on behalf of the Author to reach the Recipient addresses specified in the original RFC 5321 RcptTo commands. Exchanges that are mediated or iterative and protracted, such as those used for collaboration over time, are handled by the User Actors, not by the MHS Actors. These Actors can generate, modify or look at only the transfer data in the message.
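| MHS Actor Type | Roles and Responsibilities (per RFC 5598) |
| --- | --- |
| Originator | Ensures that a message is valid for posting and then submits it to a Relay. |
| Relay | Performs MHS-level transfer-service routing and store-and-forward by transmitting or retransmitting the message towards its Recipients. |
| Gateway | A hybrid of User and Relay that connects heterogeneous mail services. |
| Receiver | Performs final delivery or sends the message to an alternate address. |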
ADministrative Management Domain (ADMD) Actors are associated with different organizations which have their own administrative authority, operating policies and trust-based decision making. There are basically three types of ADMDs:
| ADMD Type | Description |
| --- | --- |
| Edge | Independent transfer services in networks at the edge of the open Internet Mail service. |
| Consumer | Might be a type of Edge service, as is common for web-based email access. |
| Transit | E-Mail Service Providers (ESPs) that offer value-added capabilities for Edge ADMDs, such as aggregation and filtering. |
The mail-level transit service is different from packet-level switching. End-to-end packet transfers usually go through intermediate routers; e-mail exchange across the open Internet can be directly between the Boundary MTAs of Edge ADMDs. Edge networks can use proprietary email standards internally. Common examples of ADMDs are Enterprise Service Providers, Internet Service Providers (ISP) and E-mail Service Providers.
Email Architecture
The Email system is an integration of several hardware and software components, services and protocols which provide interoperability between its users and among the components along the path of transfer. The email architecture consists of the following components:
- Mail User Agent (MUA)
- Mail Submission Agent (MSA)
- Mail Transfer Agent (MTA)
- Mail Delivery Agent (MDA)
- Message Handling System
- Message Store (MS)
- Relays
- Gateway
- Web server
- Mail server
Mail User Agent
A Mail/Message User Agent (also known as an email client) is a computer program that allows the User Actor to send and receive mail. On the Author (or sender) side, it is called the Author MUA (aMUA) and on the Recipient side, it is called the Recipient MUA (rMUA). The aMUA creates messages and performs initial submission via the Mail Submission Agent (MSA). Besides this, it can also archive messages in its Message Store at creation and posting time. The rMUA processes received mail, which includes generating user-level disposition control messages, displaying and disposing of the received message, and closing or expanding the user communication loop by initiating replies and forwarding new messages. A Mediator performs message re-posting and as such is a special MUA. For bulk sending services and automatic responders (serving out-of-office notifications), the MUA can be automated. The identity fields relevant to the MUA are: From, Reply-To, Sender, To, CC and BCC.
All Mail User Agent (MUA) nodes are software packages that run on client computers and allow end users to compose, create and read e-mail. Some MUAs may be used to send e-mail to the receiving MTAs directly or indirectly.
Examples of Mail User Agents (MUAs) include Microsoft Outlook, Microsoft Outlook Express, Lotus Notes, Netscape Communicator, Qualcomm Eudora, KDE KMail, Apple Mail, and Mozilla Thunderbird. Several Web-based e-mail programs and services (known as Webmail), such as AIM Mail, Yahoo Mail, Gmail, and Hotmail, which integrate e-mail clients and servers behind a Web server, are also used as MUAs.
Message Store
The Message Store is a dedicated data store for the delivery, retrieval, and manipulation of Internet mail messages. The Message Store works with the IMAP4 and POP3 client access servers to provide flexible and easy access to messaging.
The Message Store is organized as a set of folders or user mailboxes. The folder or mailbox is a container for messages. Each user has an INBOX where new mail arrives. Each IMAP or Webmail user can also have one or more folders where mail can be stored. Folders can contain other folders arranged in a hierarchical tree. Mailboxes owned by an individual user are private folders. Private folders can be shared at the owner’s discretion with other users on the same Message Store. Messaging Server supports sharing folders across multiple stores by using the IMAP protocol.
There are two general areas in the Message Store, one for user files and another for system files. In the user area, the location of each user’s INBOX is determined by using a two-level hashing algorithm. Each user mailbox or folder is represented by another directory in its parent folder. Each message is stored as a file. When there are many messages in a folder, the system creates hash directories for that folder. Using hash directories eases the burden on the underlying file system when there are many messages in a folder. In addition to the messages themselves, the Message Store maintains an index and cache of message header information and other frequently used data to enable clients to rapidly retrieve mailbox information and do common searches without the need to access the individual message files.
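The two-level hashing idea can be illustrated with a short sketch. The actual hashing algorithm and directory layout of the Message Store are not specified here, so the SHA-256 digest, the root directory `/var/store/users` and the function name `inbox_path` are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def inbox_path(user: str, root: str = "/var/store/users") -> PurePosixPath:
    """Map a user name to a two-level hashed directory (illustrative scheme only)."""
    digest = hashlib.sha256(user.encode("utf-8")).hexdigest()
    # The first two hex characters pick the level-one directory,
    # the next two pick the level-two directory.
    return PurePosixPath(root, digest[:2], digest[2:4], user, "INBOX")


print(inbox_path("bob"))
```

Spreading users across hashed subdirectories keeps any single directory from holding too many entries, which is the same motivation the text gives for hash directories inside large folders.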
Mail Submission Agent
A Mail/Message Submission Agent (MSA) is a computer program or software agent that receives electronic mail messages from a Mail User Agent (MUA) and cooperates with the Mail Transfer Agent (MTA) for the delivery of the mail. It uses ESMTP, a variant of the Simple Mail Transfer Protocol (SMTP) as specified in RFC 6409.
It accepts the message submitted by the aMUA for posting. It enforces the policies of the hosting ADMD and the requirements of Internet standards before posting the message from the Author’s environment to the MHS. These include adding header fields such as Date and Message-ID and expanding an address to its formal Internet Mail Format (IMF) representation. The MHS-focused Mail Submission Agent (hMSA) is responsible for transiting the message to the MTA. The identity fields relevant to the MSA are HELO/EHLO, ENVID, MailFrom, RcptTo, Received, and SourceAddr. The responsibilities of the MUA and the MSA may be integrated in a single Agent. Historically, both the MTA and the MSA used port number 25, but the official port for the MSA is 587.
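As a rough illustration of the header completion an MSA performs, the sketch below adds the Date and Message-ID fields when the submitting client has left them out. It uses Python's standard `email` package; the function name `msa_prepare` is hypothetical, and a real MSA would also apply the ADMD's policies.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def msa_prepare(msg: EmailMessage, domain: str) -> EmailMessage:
    """Add the Date and Message-ID fields if the aMUA left them out."""
    if "Date" not in msg:
        msg["Date"] = formatdate(localtime=True)
    if "Message-ID" not in msg:
        # make_msgid builds a unique identifier ending in @domain.
        msg["Message-ID"] = make_msgid(domain=domain)
    return msg


draft = EmailMessage()
draft["From"] = "alice@a.org"
draft["To"] = "bob@b.org"
draft["Subject"] = "Hello"
draft.set_content("Hi Bob")

posted = msa_prepare(draft, domain="a.org")
print(posted["Date"])
print(posted["Message-ID"])
```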
Mail Transfer Agent
A Mail/Message Transfer Agent (MTA) relays mail for one application-level "hop". MTA nodes are in effect postal sorting agents that have the responsibility of retrieving the relevant Mail eXchange (MX) record from the DNS server for each e-mail to be sent, and thus of mapping the addressee’s domain name to the relevant IP address information. DNS is a distributed directory database that correlates domain names with IP addresses. MTAs can also be used to compose and create e-mail messages. Examples of MTAs include Sendmail, Postfix, Exim, and Exchange Server. A receiving MTA can also deliver the e-mail message to the respective mailbox of the receiver on the mail server, and is then also called a Mail Delivery Agent (MDA). Unlike typical packet switches (and Instant Messaging services), MTAs are expected to store messages in a manner that allows recovery across service interruptions, such as host-system shutdown. The degree of robustness and persistence offered by MTAs can vary. An MTA can perform the well-established roles of Boundary MTA (Outbound or Inbound) or Final MTA. The identity fields relevant to MTAs are HELO/EHLO, ENVID, MailFrom, RcptTo, Received, and SourceAddr.
Mail Delivery Agent
A Mail/Message Delivery Agent (MDA) is a computer program that receives email from a Mail Transfer Agent (MTA), then sorts and delivers the email into the mailbox of the Receiver. Both the MHS-focused MDA (hMDA) and the Receiver-focused MDA (rMDA) are responsible for accepting the message for delivery to distinct addresses. The hMDA functions as an SMTP server engine and the rMDA performs the delivery action. The identity fields relevant to the MDA are Return-Path and Received.
Relays
Email relays are also known as SMTP relays. These are, essentially, nodes that perform email relaying. Relaying is the process of receiving an e-mail message from one SMTP e-mail node and forwarding it to another one. Relays are like packet switches or IP routers and make routing assessments to move the message closer to the Recipients. They also add trace information and have all the roles of MTAs. Simply put, an SMTP relay is the act, performed by an outgoing mail server, of delivering an email to another server.
SMTP relay is required when an email is sent between two different domains, not when an email is sent to a recipient within the same domain (such as end users who work for the same organization). SMTP relay services make it easier for a user to send out a large volume of emails to different domains.
Gateway
Gateway nodes are used to convert e-mail messages from one application layer protocol to another. Gateway nodes denoted GW(SMTP,B) accept SMTP-based e-mails and transfer them using a protocol other than SMTP, while GW(A,SMTP) nodes perform the inverse process at their incoming and outgoing interfaces. Gateway nodes denoted GW(A,B) use SMTP on neither the incoming nor the outgoing interface. When the incoming and outgoing interfaces use the same protocol, these nodes may perform a process called Proxy.
Used as a strategic and layered approach to email security, an SMTP gateway helps protect sensitive information from becoming vulnerable to malware, spam and phishing attacks.
Web Server
These nodes are the e-mail Web servers (WebServ) that provide the Web environment to compose, send and read an e-mail message.
Mail Server
They represent e-mail servers (MailServ) providing users mail access service using IMAP or POP3 protocols. They can also provide an internal interface to a Web server for HTTP based e-mail access.
The email architecture described above is depicted in the figure below which specifies the relationship between its logical components for creation, submission, transmission, delivery and reading processes of an e-mail message.
An email message from the Author to the Receiver that traverses the Author Mail User Agent (aMUA), Author Mail Submission Agent (aMSA), MHS-focused Mail Submission Agent (hMSA), outbound Mail Transfer Agent, inbound Mail Transfer Agent, MHS-focused Mail Delivery Agent (hMDA), Recipient-focused Mail Delivery Agent (rMDA), Recipient-focused Mail Server (rMailServ), and Recipient-focused Mail User Agent (rMUA) is considered a good mail by the Sender Policy Framework (SPF).
Mails that follow other paths are either fully or partially non-SMTP based or use non-standard transfer modes, and are often suspected of containing viruses and spam. Delivery Status Notification (DSN) messages are generated by some components of the MHS (MSA, MTA, or MDA); they provide information about transfer errors or successful deliveries and are sent to the MailFrom address. Message Disposition Notification (MDN) messages are generated by the rMUA; they provide information about post-delivery processing and are sent to the Disposition-Notification-To address. Out Of Office (OOO) messages are sent by the rMDA to the return address.
Protocols Used in The Electronic Mail System
The e-mail nodes establish connections with one or more nodes on specific ports for possible email flow between them using a particular protocol. SMTP is an application layer protocol for TCP/IP based Internet infrastructure which sets conversational and grammatical rules for exchanging e-mail between computers. The most commonly used protocols for e-mail retrieval by client programs are Post Office Protocol Version 3 (POP3) and Internet Message Access Protocol (IMAP). The table below shows the various protocols and associated port numbers.
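The default port numbers for these protocols can be read directly from Python's standard library, which is a convenient way to check them. The sketch below collects them in a dictionary; the dictionary name and labels are illustrative, and the submission port 587 is written as a literal because the standard library defines no constant for it.

```python
import imaplib
import poplib
import smtplib

DEFAULT_PORTS = {
    "SMTP (server to server)": smtplib.SMTP_PORT,
    "SMTP over implicit TLS": smtplib.SMTP_SSL_PORT,
    "POP3": poplib.POP3_PORT,
    "POP3 over TLS": poplib.POP3_SSL_PORT,
    "IMAP": imaplib.IMAP4_PORT,
    "IMAP over TLS": imaplib.IMAP4_SSL_PORT,
    # Message submission port for the MSA; no stdlib constant exists for it.
    "SMTP submission": 587,
}

for protocol, port in DEFAULT_PORTS.items():
    print(f"{protocol}: {port}")
```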
Email Identities and Data
Finding the source of email crimes is a challenging task for forensic investigators. An email message contains several fields that can be used for investigation. I have observed that some investigators in the forensic community lack understanding of the evidence embedded within most of these fields of an email message. Therefore, there is a need to thoroughly understand these identities and what information can be gleaned from them.
Identities used in E-mail are globally unique and they include:
- Mailbox - the destination to which electronic mail messages are delivered. It is the equivalent of a letter box in the postal system. It is identified by an email address (which consists of username and domain name separated by the @ character). Access to a mailbox is controlled by a mailbox provider. Usually, anyone can send messages to a mailbox while only authenticated users can read or delete from their own mailboxes.
- Domain name - A domain name is a global reference to an Internet resource like a host, network or service which maps to IP address(es). Its structure has a hierarchical sequence of labels, separated by dots.
- Message-ID and ENVelope IDentifier (ENVID) - These are message identifiers which respectively pertain to message content and transfer. Message-ID is used for threading, for identifying duplicates and for DSN tracking. The ENVelope Identifier (ENVID) is used for the purpose of message tracking.
An e-mail message comprises an envelope, which contains transit-handling information used by the Message Handling System (MHS), and the message content, which consists of two parts, namely the Header and the Body.
The Body is text but can also include multimedia elements in Hyper Text Markup Language (HTML) and attachments encoded in Multi-Purpose Internet Mail Extensions (MIME). The Header is a structured set of fields that include ‘From’, ‘To’, ‘Subject’, ‘Date’, ‘CC’, ‘BCC’, ‘Return-To’, and so on. Headers are included in the message by the sender or by a component of the e-mail system and also contain transit-handling trace information. Further, the message also contains special control data pertaining to Delivery Status and Message Disposition Notifications, and so on.
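The split between Header and Body can be seen by parsing a raw message with Python's standard `email` package. This is a minimal sketch; the sample message is made up for illustration. Note that the envelope (MailFrom, RcptTo) is exchanged in the SMTP session and is not part of the parsed content.

```python
from email import message_from_string
from email.policy import default

# Made-up sample message: header fields, a blank line, then the body.
RAW = """\
From: Alice <alice@a.org>
To: Bob <bob@b.org>
Subject: Lunch
Date: Mon, 01 Jan 2024 12:00:00 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

See you at noon.
"""

msg = message_from_string(RAW, policy=default)

# Header: a structured set of name/value fields.
header_fields = {name: str(value) for name, value in msg.items()}

# Body: the text part of the message content.
body_text = msg.get_body(preferencelist=("plain",)).get_content()

print(header_fields["Subject"])
print(body_text.strip())
```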
Various identities called fields are present in the message and are used in different parts of email architecture called Layers. These fields serve a specific function in the system and are set by some component of the system. These identities are used for analysing e-mail to determine the source (originator and the author).
| Field | Set By | Description |
| --- | --- | --- |
| Layer - Message Header Fields (Identification Fields) | | |
| Message-ID: | Originator | Globally unique message identification string generated when the message is sent. |
| In-Reply-To: | Originator | Contains the Message-ID of the original message in response to which the reply message is sent. |
| References: | Originator | Identifies other documents related to this message, such as other email messages. |
| Layer - Message Header Fields (Originator Fields) | | |
| From: | Author | Name and email address of the author of the message. |
| Sender: | Originator | Contains the address responsible for sending the message on behalf of the Author, unless it is omitted or is the same as the address specified in the From field. |
| Reply-To: | Author | Email address the author would like recipients to use for replies. If present, it overrides the From field. |
| Layer - Message Header Fields (Originator Date Fields) | | |
| Date: | Originator | Holds the date and time when the message was made available for delivery. |
| Layer - Message Header Fields (Informational Fields) | | |
| Subject: | Author | Describes the subject or topic of the message. |
| Comments: | Author | Contains summarized comments regarding the message. |
| Keywords: | Author | Contains a list of comma-separated keywords that may be useful to the recipients, e.g. when searching mail. |
| Layer - Message Header Fields (Destination Address Fields) | | |
| To: | Author | Specifies a list of addresses of the recipients of the message. These addresses might be different from the addresses in the RcptTo SMTP commands. |
| CC: | Author | Generally the same as the To field. A To field usually specifies the primary recipient who is expected to take some action, while CC addresses receive a copy as a courtesy. |
| BCC: | Author | Address of a recipient whose participation is not disclosed to the recipients specified in the To and CC addresses. |
| Layer - Message Header Fields (Resent Fields) | | |
| Resent-Message-ID: | Mediator | Globally unique message identification string generated when the message is resent. |
| Resent-* | Mediator | When a message is manually forwarded, the resent header fields refer to the forwarding, not to the original message. MIME specifies another way of resending messages, using the "Message" Content-Type. |
| Layer - Message Header Fields (List Fields) | | |
| List-ID | Mediator, Author | Globally unique Mailing List identification string. |
| List-* | Mediator, Author | A collection of header fields for use by Mailing Lists. |
| Layer - Message Header Fields (Trace Fields) | | |
| Received: | Originator, Relay, Mediator, Destination | Contains trace information that includes originating host, Mediators, relays, and MSA host domain names and/or IP addresses. |
| Return-Path: | MDA, from MailFrom | Contains the address recorded by the MDA from the MailFrom SMTP command. When an email message does not make it to its intended recipient, the Return-Path indicates where non-delivery receipts or bounced messages are to be sent. The Return-Path field is verified by the Sender Policy Framework (SPF). |
| Layer - Message Header Fields (Optional Security Fields) | | |
| DKIM-Signature | MUA, MSA or MTA | The signature of the email is stored in the DKIM-Signature header field. This header field contains all of the signature and key fetching data. DKIM uses a simple "tag=value" syntax in several contexts, including in messages and domain signature records. |
| Received-SPF | MTA | Contains Sender Policy Framework (SPF) validation results for a domain and its mail servers. Domain owners publish records via DNS that describe their policy for which machines are authorized to use their domain in the HELO and MAIL FROM addresses, which are part of the SMTP protocol. |
| Layer - Message Body Fields (MIME Header Fields) | | |
| MIME-Version | Author | Describes the version of the MIME message format. |
| Content-* | Author | A collection of MIME header fields describing various aspects of the message body, including signatures. |
| Layer - SMTP | | |
| HELO/EHLO | Latest Relay Client (Originator, MSA, MTA) | Contains the hosting domain for the SMTP HELO and EHLO commands. |
| ENVID | Originator | An opaque string included in a DSN as a means of assisting the Return Address Recipient in identifying the message that produced the DSN, or for message tracking. |
| MailFrom | Originator | A string containing the e-mail address for receiving return control information (such as returned messages reporting transfer-level problems). |
| RcptTo | Author | Specifies the MUA mailbox address of a recipient. |
| ORCPT | Originator | An optional parameter to the RCPT command, indicating the original address to which the current RCPT TO address corresponds after a mapping during transit. |
| Layer - IP | | |
| Source Address | Latest Relay Client | Contains the source address of the host immediately preceding the current receiving SMTP server, from which the IP datagram was sent (the email message is fragmented into IP packets). It is independent of the mail system and is supplied by the IP layer. |
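The "tag=value" syntax mentioned for the DKIM-Signature field is simple enough to split by hand. The sketch below is a simplified parser, not a DKIM verifier; the function name `parse_tag_list` and the sample value (with placeholder signature data) are assumptions for illustration.

```python
def parse_tag_list(value: str) -> dict:
    """Split a 'tag=value; tag=value' list into a dictionary."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # skip empty segments such as a trailing ';'
        name, _, val = part.partition("=")
        # Simplification: drop all whitespace inside values, which is
        # harmless for the tags shown in the sample below.
        tags[name.strip()] = "".join(val.split())
    return tags


# Illustrative header value; bh= and b= hold shortened placeholder data.
SAMPLE = "v=1; a=rsa-sha256; d=a.org; s=mail; h=from:to:subject; bh=AAAA; b=BBBB"

tags = parse_tag_list(SAMPLE)
print(tags["d"], tags["s"], tags["h"].split(":"))
```

For an investigator, the `d=` tag (signing domain) and `s=` tag (selector) are the usual starting points, since together they name the DNS record holding the public key.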
Email Forensic Analysis Techniques
Investigation of email related crimes and incidents involves various approaches. They are discussed below:
Email Header Analysis
Email header analysis is the primary analytical technique. This involves analyzing metadata in the email header. Analysing headers helps to identify the majority of email-related crimes. Email spoofing, phishing, spam, scams and even internal data leakages can be identified by analyzing the header. In using this technique, the following should be observed meticulously by the forensic investigator.
- Check if the ‘From:’ field and ‘Return-Path:’ field match.
- Check if the 'Reply-To:' field is the same as the 'From:' field.
- If the X-Distribution field is set to bulk, it is an indication of spam.
- The X-Spam-Score and X-Spam-Flag fields help determine whether it is a spam email.
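The checks in this list can be automated in a few lines. The following is a minimal sketch using Python's standard `email` package; the function name `header_red_flags`, the flag wording and the sample message are hypothetical. A flag is only a hint for the investigator, since legitimate mailing lists and bulk senders also use a Return-Path that differs from From.

```python
from email import message_from_string
from email.policy import default
from email.utils import parseaddr


def header_red_flags(raw: str) -> list:
    """Return a list of suspicious observations about a raw message's header."""
    msg = message_from_string(raw, policy=default)
    flags = []
    from_addr = parseaddr(str(msg.get("From", "")))[1].lower()
    return_path = parseaddr(str(msg.get("Return-Path", "")))[1].lower()
    reply_to = parseaddr(str(msg.get("Reply-To", "")))[1].lower()
    if return_path and return_path != from_addr:
        flags.append("From and Return-Path differ")
    if reply_to and reply_to != from_addr:
        flags.append("Reply-To differs from From")
    if str(msg.get("X-Distribution", "")).strip().lower() == "bulk":
        flags.append("X-Distribution is bulk")
    if str(msg.get("X-Spam-Flag", "")).strip().lower() == "yes":
        flags.append("X-Spam-Flag is set")
    return flags


# Made-up suspicious message for illustration.
SUSPECT = """\
Return-Path: <bounce@mailer.example>
From: "Bank Support" <support@bank.example>
Reply-To: helpdesk@other.example
X-Distribution: bulk
X-Spam-Flag: YES
Subject: Verify your account

Click the link.
"""

for flag in header_red_flags(SUSPECT):
    print(flag)
```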
Digital forensic experts use this information in their email investigation to track down cybercriminals.
Server Investigation
Server investigation involves the analysis of email copies and logs retained on the mailing server in an effort to identify the source of a message. If the sender or receiver has deleted the email, then investigators look at the Internet Service Provider (ISP) or proxy servers to find a saved copy. A proxy server is an intermediate gateway between the end-user and the website domain.
SMTP servers that store data about the owner of a mailbox (e.g., a credit card number) are of great value in revealing a person’s identity. However, this type of investigation often proves time consuming, as the logs and backup emails need to be requested from either the proxy server or the ISP (some may not co-operate with the investigators), and resource intensive, due to the large amount of processing required to restore any valuable information. Additionally, e-mail copies and server logs are only maintained for limited periods of time (which vary according to the applicable legislation).
It is important that the investigator examine the logs of all servers in the received chain as soon as possible. Time is very important in e-mail cases as HTTP and SMTP logs are archived frequently, especially by large ISPs. If a log is archived, it could take time and effort to retrieve and decompress the log files needed to trace e-mails.
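To know which servers' logs to request, the investigator first needs the list of hops in the Received chain. The sketch below extracts the bracketed IPv4 address from each Received field and returns them in travel order. The function name `received_hops` and the sample trace (using documentation-range addresses) are illustrative, and Received fields added before the first trusted server can be forged, so the result is a lead rather than proof.

```python
import re
from email import message_from_string

# Matches an IPv4 address written in square brackets, e.g. [192.0.2.10].
IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_hops(raw: str) -> list:
    """Return the bracketed IPv4 address of each Received field, oldest hop first."""
    msg = message_from_string(raw)
    hops = []
    # Each server prepends its own Received field, so reverse for travel order.
    for value in reversed(msg.get_all("Received", [])):
        match = IPV4.search(value)
        hops.append(match.group(1) if match else None)
    return hops


# Made-up trace: Alice's PC -> smtp.a.org -> mx.b.org.
TRACE = """\
Received: from smtp.a.org (smtp.a.org [198.51.100.7])
\tby mx.b.org with ESMTP; Mon, 01 Jan 2024 12:00:05 +0000
Received: from alice-pc (unknown [192.0.2.10])
\tby smtp.a.org with ESMTPSA; Mon, 01 Jan 2024 12:00:01 +0000
From: alice@a.org
To: bob@b.org

Hi
"""

print(received_hops(TRACE))
```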
When server investigation is not an option, investigators have the option to turn towards network device investigation, which is known as a notoriously complex type of investigation involving analysis of logs maintained by the network devices such as routers, firewalls and switches.
Network Device Investigation
At times, the above-mentioned logs are not available. This could be due to non-configuration or refusal to share log files by ISPs. In this situation, forensic cyber experts check the data maintained by network devices like switches and routers for evidence.
Bait Tactics
In a bait tactic investigation, an e-mail containing an HTML “<img src>” tag whose image source is on a computer monitored by the investigators is sent to the sender of the e-mail under investigation, which must contain a real (genuine) e-mail address. When the e-mail is opened, a log entry containing the IP address of the recipient (the sender of the e-mail under investigation) is recorded on the HTTP server hosting the image, and thus the sender is tracked. However, if the recipient (the sender of the e-mail under investigation) is using a proxy server, then the IP address of the proxy server is recorded. The log on the proxy server can be used to track the sender of the e-mail under investigation. If the proxy server’s log is unavailable for some reason, then investigators may send a tactic e-mail containing a) an embedded Java applet that runs on the receiver’s computer or b) an HTML page with an ActiveX object, both aiming to extract the IP address of the receiver’s computer and e-mail it to the investigators.
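The image-based variant of this technique can be sketched in two small functions: one builds the HTML body with a uniquely tagged image URL, the other looks up which client IP requested that URL in the web server's access log. The function names, the URL layout (`/pixel.gif?id=...`), the host `monitor.example` and the log line are all hypothetical, and the log format is assumed to be the common log format with the client IP as the first field.

```python
import uuid


def make_bait_html(base_url: str) -> tuple:
    """Return (token, html); the html embeds an image URL carrying the token."""
    token = uuid.uuid4().hex
    html = (
        "<html><body><p>Hello</p>"
        f'<img src="{base_url}/pixel.gif?id={token}" width="1" height="1">'
        "</body></html>"
    )
    return token, html


def ip_for_token(log_lines, token: str):
    """Return the client IP of the first log line that requested the token."""
    for line in log_lines:
        if f"id={token}" in line:
            return line.split()[0]  # client IP is the first field
    return None


token, html = make_bait_html("http://monitor.example")
# Made-up access-log entry for illustration.
log = [f'203.0.113.9 - - [01/Jan/2024:12:00:00 +0000] "GET /pixel.gif?id={token} HTTP/1.1" 200 43']
print(ip_for_token(log, token))
```

Many mail clients block remote images by default, so the absence of a log entry does not show that the message was never opened.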
Software Embedded Identifiers
Another investigation technique involves the analysis of software embedded identifiers, where the investigator looks for information about the creator of the message or of the data it contains (e.g., attached files) that has been incorporated by the email client software used by the sender. This information often takes the form of custom headers or MIME (Multipurpose Internet Mail Extensions) content. Even though this type of analysis proves time consuming, it may reveal some vital information about the sender’s e-mail preferences and options that could help client-side evidence gathering.
This information may be included in the form of custom headers or in the form of MIME content as a Transport Neutral Encapsulation Format (TNEF). MIME is an Internet standard deployed to assist the transfer of single text, multiple texts, or non-text attachments. TNEF is a proprietary format for email attachments used by Microsoft Outlook and Microsoft Exchange Server.
The investigation can reveal PST file names, the Windows logon username, the MAC address, etc. of the client computer used to send the e-mail message. Similarly, the Received header field and the email handling software at the sender side may reveal the software managing emails on the server (due to the different structure of the headers). This analysis forms part of a procedure known as sender mailer fingerprints, which can describe the applications and their versions on the client side; this is useful because it reveals characteristics and vulnerabilities of the host machine.
Sender Mailer Fingerprints
In addition to the ‘Subject:’, ‘From:’ and ‘To:’ headers, emails contain X-headers. Specialists track this piece of information to locate the IP address of the sender’s device.
During the investigation of an email, the sender mailer fingerprints approach identifies the sender’s software and its version, for example Gmail, Outlook or Hotmail. This information about the client computer of the sender can help investigators devise an effective plan and thus prove very useful.
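A first pass at a mailer fingerprint is simply to collect the header fields in which client software tends to identify itself. The sketch below assumes three commonly seen fields (X-Mailer, User-Agent and X-Originating-IP); which of them are present depends on the sending software, and the function name and sample message are illustrative.

```python
from email import message_from_string

# Fields in which sending software commonly identifies itself (an assumption;
# not every client or provider adds them).
FINGERPRINT_FIELDS = ("X-Mailer", "User-Agent", "X-Originating-IP")


def mailer_fingerprint(raw: str) -> dict:
    """Collect the fingerprint-related header fields present in a raw message."""
    msg = message_from_string(raw)
    return {name: msg[name] for name in FINGERPRINT_FIELDS if msg[name] is not None}


# Made-up sample message for illustration.
SAMPLE_MAIL = """\
From: alice@a.org
To: bob@b.org
X-Mailer: Microsoft Outlook 16.0
X-Originating-IP: [192.0.2.10]
Subject: Hi

Hello
"""

print(mailer_fingerprint(SAMPLE_MAIL))
```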
Use of Email Trackers
In some situations, attackers use different techniques and locations to generate emails. In such situations it is important to find out the geographical location of the attacker. To get the exact location of the attacker, investigators often use email tracking software embedded into the body of an email. When a recipient opens a message that has an email tracker attached, the investigator will be notified with the IP address and geographical location of the recipient. This technique is often used to identify suspects in murder or kidnapping cases, where the criminal communicates via email.
Attachment Analysis
Most viruses and malware are sent through email attachments. Investigating attachments is crucial in any email-related investigation. Confidential information leakage is another important field of investigation. There are software tools available to recover email-related data, such as attachments, from computer hard discs. For the analysis of suspicious attachments, investigators can upload documents to an online scanning service such as VirusTotal to check whether the file is malware or not. However, it is important to bear in mind that even if a file passes a test such as VirusTotal’s, this is not a guarantee that it is fully safe. In that case, it is a good idea to investigate the file further in a sandbox environment such as Cuckoo.
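Before an attachment is submitted anywhere, it is useful to extract it and record a cryptographic hash, which can then be searched for in a scanning service without uploading the file itself. The sketch below uses Python's standard `email` and `hashlib` modules; the function name `attachment_hashes` and the sample message are illustrative.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def attachment_hashes(raw: bytes) -> dict:
    """Map each attachment file name to the SHA-256 of its decoded content."""
    msg = message_from_bytes(raw, policy=default)
    digests = {}
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        name = part.get_filename() or "(unnamed)"
        digests[name] = hashlib.sha256(payload).hexdigest()
    return digests


# Build a small sample message with one attachment for illustration.
sample = EmailMessage()
sample["From"] = "alice@a.org"
sample["To"] = "bob@b.org"
sample["Subject"] = "Invoice"
sample.set_content("Please see the attached file.")
sample.add_attachment(
    b"hello", maintype="application", subtype="octet-stream", filename="invoice.bin"
)

print(attachment_hashes(sample.as_bytes()))
```

Recording the hash at extraction time also helps show later that the analysed file is the one that arrived in the message.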
Conclusion
It is clear that it is not viable for investigators to perform most of these analyses on a day-to-day basis (with no real evidence in hand), owing to their time and resource cost and to the risk of not being able to gather evidence of real value. For this reason, I emphasise email header (tracing) analysis, to allow investigators to get to evidence of value in a timely manner.