Windows PE Forensics - DOS And PE Headers



The Windows PE file format is derived from the Common Object File Format (COFF) that was used on Virtual Address eXtension (VAX) systems running the Virtual Memory System (VMS) operating system produced by Digital Equipment Corporation (DEC), which was acquired by Compaq in 1998 and merged with HP in 2002. Most of the original Windows NT development team came from DEC, and when designing the PE format they reused existing code that they were already familiar with. PE is the native Win32 file format. Every Win32 executable (except VxDs and 16-bit DLLs) uses the PE file format. 32-bit DLLs, COM files, OCX controls, Control Panel Applets (.CPL files) and .NET executables are all in PE format. Even NT's kernel-mode drivers use the PE file format.

Why do we need to know about it? There are 2 main reasons: adding code to executables (e.g. keygen injection or adding functionality) and manually unpacking executables. With respect to the latter, most shareware nowadays comes "packed" in order to reduce size and to provide an added layer of protection. In a packed executable, the import tables are usually destroyed and data is often encrypted. The packer inserts code to unpack the file in memory upon execution, and then jumps to the original entry point of the file (where the original program actually starts executing). If we manage to dump this memory region after the packer has finished unpacking the executable, we still need to fix the sections and import tables before our app will run. How will we do that if we don't even know what the PE format is?

The example executable I will be using throughout this series is BASECALC.exe, a very useful app for calculating and converting decimal, hex, binary and octal. It is coded in Borland Delphi 2.0.


Basic Structure Of A PE File


The PE file format consists of the following:

  • DOS MZ Header - All PE files start with the DOS MZ header. It is located at offset 0 of a PE file. The DOS MZ header was placed there to enable a DOS operating system to recognize the PE file as a valid executable file so it can execute the DOS stub.
  • DOS Stub - The DOS stub is a valid DOS executable file. As said above, the DOS MZ header enables DOS to recognize the PE file as a valid executable in DOS so the DOS stub can be executed. The main purpose of executing the DOS stub is to tell the user, in case the program was executed under DOS, that the program is for Windows. The stub simply displays a message that the program cannot be run in DOS mode.
  • PE Header - This structure contains the important fields that the PE loader needs.
  • Section table
  • Sections

At a minimum, a PE file will have 2 sections; one for code and the other for data. An application for Windows NT has 9 predefined sections named .text, .bss, .rdata, .data, .rsrc, .edata, .idata, .pdata, and .debug. Some applications do not need all of these sections, while others may define still more sections to suit their specific needs.

The sections that are most commonly present in an executable are:

  • Executable Code Section, named .text (Microsoft) or CODE (Borland)
  • Data Sections, named .data, .rdata, or .bss (Microsoft) or DATA (Borland)
  • Resources Section, named .rsrc 
  • Export Data Section, named .edata
  • Import Data Section, named .idata
  • Debug Information Section, named .debug  

The names are actually irrelevant as they are ignored by the OS and are present only for the convenience of the programmer. Another important point is that the data structures of a PE file on disk are laid out in the same way as when the file is loaded into memory, so if you can locate info in the file on disk you will be able to find it when the file is loaded into memory.

However, the file is not copied byte for byte into memory. The Windows loader decides which parts need mapping in and omits any others. Data that is not mapped in (e.g. debug information) is placed at the end of the file, past any parts that will be mapped in.

Also, the location of an item in the file on disk will often differ from its location once loaded into memory because of the page-based virtual memory management that Windows uses. When the sections are loaded into RAM they are aligned to fit 4 KB memory pages, each section starting on a new page. Virtual memory is explained below.



The concept of virtual memory is that instead of letting software directly access physical memory, the processor and OS create an invisible layer between the two. Every time an attempt is made to access memory, the processor consults a "page table" that tells the processor which physical memory address to actually use. It wouldn't be practical to have a table entry for each byte of memory (the page table would be larger than the total physical memory), so instead processors divide memory into pages. Virtual memory has several advantages:

  • It enables the creation of multiple address spaces. An address space is an isolated page table that only allows access to memory that is pertinent to the current program or process. It ensures that programs are completely isolated from one another and that an error causing one program to crash is not able to poison another program's address space.
  • It enables the processor to enforce certain rules on how memory is accessed. Sections are needed in PE files because different areas in the file are treated differently by the memory manager when a module is loaded. At load time, the memory manager sets the access rights on memory pages for the different sections based on their settings in the section header. This determines whether a given section is readable, writable, or executable. This means each section must typically start on a fresh page. However, the default page size for Windows is 4096 bytes (1000h) and it would be wasteful to align executables to a 4 KB page boundary on disk as that would make them significantly bigger than necessary. Because of this, the PE header has two different alignment fields: section alignment and file alignment. Section alignment is how sections are aligned in memory, as above. File alignment (usually 512 bytes or 200h) is how sections are aligned in the file on disk and is a multiple of the disk sector size in order to optimize the loading process.
  • It enables a paging file to be used on the hard drive to temporarily store pages from physical memory whilst they are not in use. For instance, if an app has been loaded but becomes idle, its address space can be paged out to disk to make room for another app which needs to be loaded into RAM. If the situation reverses, the OS can simply load the first app back into RAM and resume execution where it left off. An app can also use more memory than is physically available because the system can use the hard drive for secondary storage whenever there is not enough physical memory.
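The effect of the two alignment fields described above can be sketched with a little arithmetic. This is a minimal illustration, assuming the typical values of 1000h for section alignment and 200h for file alignment; the helper name align_up and the sample size are made up for the example.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

SECTION_ALIGNMENT = 0x1000  # typical value: one 4 KB memory page
FILE_ALIGNMENT = 0x200      # typical value: one 512-byte disk sector

raw_size = 0x1234  # hypothetical section holding 1234h bytes of data

# Space the section occupies on disk and in memory respectively
print(hex(align_up(raw_size, FILE_ALIGNMENT)))     # 0x1400
print(hex(align_up(raw_size, SECTION_ALIGNMENT)))  # 0x2000
```

The same section therefore costs 1400h bytes in the file but 2000h bytes of address space once mapped, which is why offsets on disk and addresses in memory drift apart.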

When PE files are loaded into memory by the Windows loader, the in-memory version is known as a module. The starting address where the file mapping begins is called an HMODULE. A module in memory represents all the code, data and resources from an executable file that are needed for execution, whilst the term process basically refers to an isolated address space which can be used for running such a module.


The DOS Header 

 All PE files start with the DOS header which occupies the first 64 bytes of the file. It's there in case the program is run from DOS, so DOS can recognize it as a valid executable and run the DOS stub which is stored immediately after the header. The DOS stub usually just prints a string something like "This program must be run under Microsoft Windows" but it can be a full-blown DOS program. When building an application for Windows, the linker links a default stub program called WINSTUB.EXE into your executable. You can override the default linker behaviour by substituting your own valid MS-DOS-based program in place of WINSTUB and using the -STUB: linker option when linking the executable file.

The DOS header is a structure defined in the windows.inc or winnt.h files. (If you have an assembler or compiler installed you will find them in the \include\ directory.) It has 19 members, of which e_magic and e_lfanew are of interest:

 


IMAGE_DOS_HEADER STRUCT
	e_magic			WORD		? ; Magic DOS signature "MZ" (bytes 4D 5A)
	e_cblp			WORD		? ; Bytes on last page of file
	e_cp			WORD		? ; Pages in file
	e_crlc			WORD		? ; Relocations
	e_cparhdr		WORD		? ; Size of header in paragraphs
	e_minalloc		WORD		? ; Minimum extra paragraphs needed
	e_maxalloc		WORD		? ; Maximum extra paragraphs needed
	e_ss			WORD		? ; Initial (relative) SS value
	e_sp			WORD		? ; Initial SP value
	e_csum			WORD		? ; Checksum
	e_ip			WORD		? ; Initial IP value
	e_cs			WORD		? ; Initial (relative) CS value
	e_lfarlc		WORD		? ; File address of relocation table
	e_ovno			WORD		? ; Overlay number
	e_res			WORD	  4 dup (?) ; Reserved words
	e_oemid			WORD		? ; OEM identifier (for e_oeminfo)
	e_oeminfo		WORD		? ; OEM information; e_oemid specific
	e_res2			WORD	  10 dup (?) ; Reserved words
	e_lfanew		DWORD		? ; Offset to start of PE Header
IMAGE_DOS_HEADER ENDS

In the PE file, the e_magic member of the DOS header contains the bytes 4D 5A (the letters "MZ", for Mark Zbikowski, one of the original architects of MS-DOS), which signify a valid DOS header. Read as a little-endian WORD, the value is 5A4Dh. MZ are the first 2 bytes you will see in any PE file opened in a hex editor.



As can be seen from its definition above, e_lfanew is a DWORD which sits at the end of the DOS header (at offset 3Ch), directly before the DOS stub begins. It contains the offset of the PE header, relative to the beginning of the file. The Windows loader looks for this offset so it can skip the DOS stub and go directly to the PE header.

NOTE: DWORD ("double word") = 4 bytes or a 32-bit value, WORD = 2 bytes or a 16-bit value. Sometimes you will also see dd for DWORD, dw for WORD and db for byte.

 The definitions are helpful as they tell us the size of each member. This allows us to locate information of interest by counting the number of bytes from the start of the section or any other identifiable point.

As I said above, the DOS header occupies the first 64 bytes of the file, that is, the first 4 rows (of 16 bytes each) seen in a hex editor. The last DWORD before the DOS stub begins contains the bytes 00 01 00 00. Allowing for reverse (little-endian) byte order this gives us 00000100h, which is the offset where the PE header begins. The PE header begins with its signature 50 45 00 00 (the letters "PE" followed by two terminating zeroes).
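Following e_lfanew by hand can also be sketched in a few lines of Python. This is only an illustration: the function name find_pe_offset is made up, and the "file" is a synthetic buffer built to match the values quoted above (PE header at 100h) rather than the real BASECALC.exe.

```python
import struct

def find_pe_offset(data: bytes) -> int:
    """Return e_lfanew after checking the MZ and PE signatures."""
    if len(data) < 64 or data[:2] != b"MZ":
        raise ValueError("not a DOS MZ header")
    # e_lfanew is the last DWORD of the 64-byte header (offset 3Ch),
    # stored in little-endian byte order
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature at e_lfanew")
    return e_lfanew

# Synthetic image: DOS header, zero-filled stub, PE signature at 100h
image = bytearray(0x104)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x100)  # written as 00 01 00 00
image[0x100:0x104] = b"PE\0\0"

print(hex(find_pe_offset(bytes(image))))  # 0x100
```

For a real file, the same function could be fed the bytes read from disk, e.g. the contents of BASECALC.exe opened in binary mode.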

If you find an NE signature rather than PE in the Signature field, you're working with a 16-bit Windows New Executable file. Likewise, an LE in the Signature field would indicate a Windows 3.x virtual device driver (VxD). An LX here would be the mark of a file for OS/2 2.0.
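The signature check just described amounts to a small lookup. The sketch below assumes the first two signature bytes have already been read from the offset given by e_lfanew; the table and function names are illustrative only.

```python
# Two-letter signatures mentioned above and the file types they indicate
SIGNATURES = {
    b"PE": "Win32 Portable Executable",
    b"NE": "16-bit Windows New Executable",
    b"LE": "Windows 3.x virtual device driver (VxD)",
    b"LX": "OS/2 2.0 executable",
}

def classify_signature(signature: bytes) -> str:
    """Map the first two bytes of the new-style header to a file type."""
    return SIGNATURES.get(signature[:2], "unknown")

print(classify_signature(b"PE\0\0"))
print(classify_signature(b"NE"))
```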


The PE Header

The PE header is the general term for a structure named IMAGE_NT_HEADERS. This structure contains essential information used by the loader. IMAGE_NT_HEADERS has 3 members and is defined in windows.inc thus:

IMAGE_NT_HEADERS STRUCT
	Signature		DWORD				
	FileHeader		IMAGE_FILE_HEADER		
	OptionalHeader	        IMAGE_OPTIONAL_HEADER32
IMAGE_NT_HEADERS ENDS

 Signature is a DWORD containing the value 50 45 00 00h ("PE" followed by two terminating zeroes).

FileHeader is the next 20 bytes of the PE file and contains info about the physical layout and properties of the file, e.g. the number of sections. OptionalHeader is always present in an executable image and, in a standard 32-bit PE file, forms the next 224 bytes. It contains info about the logical layout inside the PE file, e.g. AddressOfEntryPoint. Its size is given by the SizeOfOptionalHeader member of FileHeader. The structures of these members are also defined in winnt.h.
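The sizes quoted above can be double-checked by describing each structure as a Python struct format string and letting struct.calcsize add up the members. The format strings are my own transcription of the 32-bit definitions, and the 8-byte layout of one data directory entry (an RVA followed by a size) is an assumption taken from winnt.h rather than from the text above.

```python
import struct

FILE_HEADER_FMT = "<HHIIIHH"           # IMAGE_FILE_HEADER
OPTIONAL_FIXED_FMT = "<HBB9I6H4I2H6I"  # IMAGE_OPTIONAL_HEADER32 up to NumberOfRvaAndSizes
DATA_DIRECTORY_FMT = "<II"             # one IMAGE_DATA_DIRECTORY entry (assumed: RVA, size)

file_header_size = struct.calcsize(FILE_HEADER_FMT)
optional_size = (struct.calcsize(OPTIONAL_FIXED_FMT)
                 + 16 * struct.calcsize(DATA_DIRECTORY_FMT))

print(file_header_size)                      # 20
print(optional_size)                         # 224
print(4 + file_header_size + optional_size)  # 248 bytes for IMAGE_NT_HEADERS
```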

FileHeader is defined as follows.

IMAGE_FILE_HEADER STRUCT
	Machine				WORD	?
	NumberOfSections		WORD	?
	TimeDateStamp			DWORD	?
	PointerToSymbolTable	        DWORD	?
	NumberOfSymbols			DWORD	?
	SizeOfOptionalHeader	        WORD	?
	Characteristics			WORD	?
IMAGE_FILE_HEADER ENDS


These are explained in the table below


| Offset | Size | Field Name | Description | Value |
|---|---|---|---|---|
| 04h | WORD | Machine | The CPU that the file is intended for | 0x14D = Intel i860; 0x14C = Intel i386 (same ID used for 486 and 586); 0x162 = MIPS R3000; 0x166 = MIPS R4000; 0x184 = DEC Alpha AXP |
| 06h | WORD | NumberOfSections | The number of sections in the file | |
| 08h | DWORD | TimeDateStamp | The time that the linker produced the file. The field represents the number of seconds since midnight UTC on January 1, 1970 (which is 4 p.m. on December 31, 1969 in US Pacific time). | |
| 0Ch | DWORD | PointerToSymbolTable | The file offset of the COFF symbol table. This field is used only in OBJ files and PE files with COFF debug information. | |
| 10h | DWORD | NumberOfSymbols | The number of symbols in the COFF symbol table. | |
| 14h | WORD | SizeOfOptionalHeader | The size of the IMAGE_OPTIONAL_HEADER structure that follows | |
| 16h | WORD | Characteristics | Flags with information about the file. Some important flags are: 0001h = there are no relocations in the file; 0002h = the file is an executable image; 2000h = the file is a dynamic link library, not a program | |

Offsets are counted from the start of the PE header, i.e. from the Signature field.

 

 

Most of these members are not of use to us, but we must modify NumberOfSections if we add or delete any sections in the PE file. Characteristics contains flags which dictate, for instance, whether this PE file is an executable or a DLL. Back to our example in the hex editor, we can find NumberOfSections by counting a DWORD and a WORD (6 bytes) from the start of the PE header (to allow for the Signature and Machine members).
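The same lookup can be sketched in code. The example below unpacks the 20-byte FileHeader that follows the 4-byte signature and decodes a few of its fields. The function name and the sample header values (8 sections, flags 0003h) are hypothetical and are not taken from BASECALC.exe.

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x14C: "Intel i386", 0x14D: "Intel i860",
            0x162: "MIPS R3000", 0x166: "MIPS R4000"}
CHARACTERISTICS = {0x0001: "no relocations", 0x0002: "executable image",
                   0x2000: "DLL"}

def parse_file_header(data: bytes, pe_offset: int) -> dict:
    """Decode IMAGE_FILE_HEADER, which starts 4 bytes after the PE signature."""
    (machine, num_sections, timestamp, _sym_ptr, _num_syms,
     opt_size, flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "number_of_sections": num_sections,  # the WORD at pe_offset + 6
        "linked": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": opt_size,
        "flags": [name for bit, name in CHARACTERISTICS.items() if flags & bit],
    }

# Hypothetical header values for illustration only
sample = b"PE\0\0" + struct.pack("<HHIIIHH", 0x14C, 8, 0, 0, 0, 224, 0x0003)
info = parse_file_header(sample, 0)
print(info["machine"], info["number_of_sections"], info["flags"])
```

When patching a file by hand, NumberOfSections is the only one of these fields that normally has to be rewritten, using the same offset (PE header + 6).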




This can be verified by using any number of different (freeware) PE tools, for instance PEBrowsePro, LordPE or PEiD, each of which displays the NumberOfSections value among the header fields.


Moving on to OptionalHeader, this takes up 224 bytes, the last 128 of which contain the Data Directory. Its definition is as follows:


IMAGE_OPTIONAL_HEADER32 STRUCT
	Magic				WORD
	MajorLinkerVersion		BYTE
	MinorLinkerVersion		BYTE
	SizeOfCode			DWORD
	SizeOfInitializedData		DWORD
	SizeOfUninitializedData		DWORD
	AddressOfEntryPoint		DWORD
	BaseOfCode			DWORD
	BaseOfData			DWORD
	ImageBase			DWORD
	SectionAlignment		DWORD
	FileAlignment			DWORD
	MajorOperatingSystemVersion	WORD
	MinorOperatingSystemVersion	WORD
	MajorImageVersion		WORD
	MinorImageVersion		WORD
	MajorSubsystemVersion		WORD
	MinorSubsystemVersion		WORD
	Win32VersionValue		DWORD
	SizeOfImage			DWORD
	SizeOfHeaders			DWORD
	CheckSum			DWORD
	Subsystem			WORD
	DllCharacteristics		WORD
	SizeOfStackReserve		DWORD
	SizeOfStackCommit		DWORD
	SizeOfHeapReserve		DWORD
	SizeOfHeapCommit		DWORD
	LoaderFlags			DWORD
	NumberOfRvaAndSizes		DWORD
	DataDirectory			IMAGE_DATA_DIRECTORY 16 dup (<>)
IMAGE_OPTIONAL_HEADER32 ENDS


These are explained in the table below


| Offset | Size | Field Name | Description | Possible Values |
|---|---|---|---|---|
| 18h | WORD | Magic | Identifies the type of optional header. For a normal 32-bit executable image this value is 010Bh (64-bit PE32+ images use 020Bh). | 010Bh |
| 1Ah | BYTE | MajorLinkerVersion | The version of the linker that produced the file. The numbers are written in decimal instead of hex. This field represents the major version number. | |
| 1Bh | BYTE | MinorLinkerVersion | This is the same as MajorLinkerVersion but represents the minor version number (the number after the dot). | |
| 1Ch | DWORD | SizeOfCode | The combined and rounded-up size of all the code sections. Since most files have only one code section, this field usually matches the size of the .text section. | |
| 20h | DWORD | SizeOfInitializedData | This is supposedly the total size of all the sections that are composed of initialized data (excluding code segments). But it does not seem to be consistent with what appears in the file. | |
| 24h | DWORD | SizeOfUninitializedData | The size of the sections that the loader commits space for in the virtual address space but that don't take up any space in the disk file. These sections don't need to have specific values at program startup, which is why they're called uninitialized data. Uninitialized data usually goes into a section called .bss. | |
| 28h | DWORD | AddressOfEntryPoint | The address where the loader will begin execution. This is a relative virtual address, usually found in the code (.text) section. | |
| 2Ch | DWORD | BaseOfCode | The RVA where the file's code section begins. The code sections typically come before the data sections and after the PE header in memory. | 1000h |
| 30h | DWORD | BaseOfData | The RVA where the file's data section begins. The data sections typically come last in memory, after the PE header and the code sections. | |
| 34h | DWORD | ImageBase | This field contains the address of the specific location in memory where the file will be mapped. The default base address for Win32 files is 400000h. | 400000h |
| 38h | DWORD | SectionAlignment | When mapped into memory, each section is guaranteed to start at a virtual address that's a multiple of this value. For paging purposes, the default section alignment is 1000h. | 1000h |
| 3Ch | DWORD | FileAlignment | In the PE file, the raw data that comprises each section is guaranteed to start at a multiple of this value. The default value is 200h bytes, probably to ensure that sections always start at the beginning of a disk sector, which is also 200h bytes in length. | 200h |
| 40h | WORD | MajorOperatingSystemVersion | The minimum version of the operating system required to use this executable. This field defaults to 1.0 in all Win32 executables. | 100h |
| 42h | WORD | MinorOperatingSystemVersion | Same as MajorOperatingSystemVersion, but for the minor version number. | |
| 44h | WORD | MajorImageVersion | A user-definable field, which allows the developer to have different versions of an EXE or DLL. The fields can be set via the linker /VERSION switch. | |
| 46h | WORD | MinorImageVersion | Same as MajorImageVersion, but for the minor version number. | |
| 48h | WORD | MajorSubsystemVersion | This field contains the minimum subsystem version required to run the executable. A typical value for this field is 3.10, which means Windows NT 3.1. | |
| 4Ah | WORD | MinorSubsystemVersion | Same as MajorSubsystemVersion, but for the minor version number. | |
| 4Ch | DWORD | Reserved1 (Win32VersionValue) | Reserved field; it seems to always have 0 as its value. | 00h |
| 50h | DWORD | SizeOfImage | This is the size of the region starting at the image base up to the end of the last section. The end of the last section is rounded up to the nearest multiple of the section alignment. | |
| 54h | DWORD | SizeOfHeaders | This is the size of the PE header and the section (object) table. The raw data for the sections starts immediately after all the header components. | |
| 58h | DWORD | CheckSum | A checksum of the file (despite often being described as a CRC, it is not a true CRC). For ordinary executables this field is ignored and is often set to 0. The exceptions are files such as kernel-mode drivers and trusted services, which must have a valid checksum. | |
| 5Ch | WORD | Subsystem | The type of subsystem that this executable uses for its user interface. | 1 = NATIVE: doesn't require a subsystem (such as a device driver); 2 = WINDOWS_GUI: runs in the Windows GUI subsystem; 3 = WINDOWS_CUI: runs in the Windows character subsystem (a console app); 5 = OS2_CUI: runs in the OS/2 character subsystem (OS/2 1.x apps only); 7 = POSIX_CUI: runs in the Posix character subsystem |
| 5Eh | WORD | DllCharacteristics | A set of flags indicating under which circumstances a DLL's initialization function (such as DllMain) will be called. This value appears to be always set to 0, yet the OS still calls the DLL initialization function for all four events. | 1 = call when DLL is first loaded into a process's address space; 2 = call when a thread terminates; 4 = call when a thread starts up; 8 = call when DLL exits |
| 60h | DWORD | SizeOfStackReserve | The amount of virtual memory to reserve for the initial thread's stack. However, not all of this memory is committed. | |
| 64h | DWORD | SizeOfStackCommit | The amount of memory initially committed for the initial thread's stack. | |
| 68h | DWORD | SizeOfHeapReserve | The amount of virtual memory to reserve for the initial process heap. However, not all of this memory is committed. | |
| 6Ch | DWORD | SizeOfHeapCommit | The amount of memory initially committed in the process heap. | |
| 70h | DWORD | LoaderFlags | These are fields related to debugging support. | 1 = invokes a breakpoint instruction before starting the process; 2 = invokes a debugger on the process after it's been loaded |
| 74h | DWORD | NumberOfRvaAndSizes | This field represents the number of entries in the DataDirectory array. This value is always set to 16, or 10h in hex. | 10h |

As before, offsets are counted from the start of the PE header. The most important of these fields are described in more detail below.

 


  • AddressOfEntryPoint - The Relative Virtual Address (RVA) of the first instruction that will be executed when the PE loader is ready to run the PE file. If you want to divert the flow of execution right from the start, you need to change the value in this field to a new RVA and the instruction at the new RVA will be executed first. Executable packers usually redirect this value to their decompression stub, after which execution jumps back to the original entry point of the app - the OEP. Of further note is the Starforce protection in which the CODE section is not present in the file on disk but is written into virtual memory on execution. The value in this field is therefore a VA.
  • ImageBase - The preferred load address for the PE file. For example, if the value in this field is 400000h, the PE loader will try to load the file into the virtual address space starting at 400000h. The word "preferred" means that the PE loader may not load the file at that address if some other module already occupies that address range. In 99% of cases it is 400000h.
  • SectionAlignment - This is the granularity of the alignment of the sections in memory. For example, if the value in this field is 4096 (1000h), each section must start at multiples of 4096 bytes. If the first section is at 401000h and its size is 10 bytes, the next section must be at 402000h even if the address space between 401000h and 402000h will be mostly unused.  
  • FileAlignment - The granularity of the alignment of the sections in the file. For example, if the value in this field is 512 (200h), each section must start at multiples of 512 bytes. If the first section is at file offset 200h and its size is 10 bytes, the next section must be located at file offset 400h: the space between file offsets 20Ah (522) and 400h (1024) is unused/undefined.
  • SizeOfImage - The overall size of the PE image in memory. It's the sum of all headers and sections aligned to SectionAlignment.
  • SizeOfHeaders - The size of all headers + section table, rounded up to a multiple of FileAlignment. In short, this value is equal to the file size minus the combined size of all sections in the file (provided no extra data has been appended after the last section). You can also use this value as the file offset of the first section in the PE file.
  • DataDirectory - An array of 16 IMAGE_DATA_DIRECTORY structures, each relating to an important data structure in the PE file such as the import address table.
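To show how these fields combine, the sketch below reads AddressOfEntryPoint and ImageBase at the header offsets listed in the table (28h and 34h from the start of the PE header) and adds them to obtain the virtual address at which execution starts if the file is loaded at its preferred base. The function name and the sample values are assumptions for illustration.

```python
import struct

def entry_point_va(data: bytes, pe_offset: int) -> int:
    """Return ImageBase + AddressOfEntryPoint for a 32-bit PE header."""
    (magic,) = struct.unpack_from("<H", data, pe_offset + 0x18)
    if magic != 0x010B:
        raise ValueError("not a 32-bit optional header")
    (entry_rva,) = struct.unpack_from("<I", data, pe_offset + 0x28)
    (image_base,) = struct.unpack_from("<I", data, pe_offset + 0x34)
    return image_base + entry_rva

# Synthetic PE header with hypothetical values: entry RVA 1000h, base 400000h
header = bytearray(0xF8)
header[0:4] = b"PE\0\0"
struct.pack_into("<H", header, 0x18, 0x010B)
struct.pack_into("<I", header, 0x28, 0x1000)
struct.pack_into("<I", header, 0x34, 0x400000)

print(hex(entry_point_va(bytes(header), 0)))  # 0x401000
```

If the loader has to relocate the module, the RVA stays the same and only the base changes, which is why the header stores the entry point relative to ImageBase.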

The overall layout of the PE header can be seen from the following picture in the hex editor. Note that the DOS header and the parts of the PE header are always the same size (and shape) when viewed in the hex editor, whereas the DOS stub can vary in size:


Besides the PE tools mentioned above, OllyDbg can also parse the PE headers into a meaningful display. Open our example in Olly and press the M button or Alt+M to open the memory map. This shows how the sections of the PE file have been mapped into memory:


Now right-click on PE header and select Dump in CPU. Next, in the hex window, right-click again and select Special, then PE header:


Now you should see this:


There are some specific points of interest in the optional header. If the last 2 members before the data directory are both given bogus values, e.g.

LoaderFlags = ABDBFFDEh
NumberOfRvaAndSizes = DFFFDDDEh 

Olly will determine that the binary is a bad image and will eventually run the app without breaking at the entry point. If you were working with a virus then you would be infected. To avoid this when analyzing malware, open the app in the hex editor and check the header first. If the NumberOfRvaAndSizes field alone is changed back to 10h the problem is solved. A bogus value in this field can also cause some versions of SoftICE to reboot.

In addition, the SizeOfRawData field in the section header can be given a very high value for one of the sections. This will then cause difficulties for many debugging and disassembling tools.
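A quick pre-flight check for the header trick described above might look like the following sketch. It reads LoaderFlags and NumberOfRvaAndSizes at offsets 70h and 74h from the start of the PE header and flags anything unexpected. The thresholds are a heuristic of my own (16 directory entries, LoaderFlags no greater than 3), not a rule from any tool, and the function name is made up.

```python
import struct

def check_optional_header(data: bytes, pe_offset: int) -> list:
    """Return warnings about suspicious LoaderFlags / NumberOfRvaAndSizes."""
    loader_flags, num_dirs = struct.unpack_from("<II", data, pe_offset + 0x70)
    warnings = []
    if loader_flags > 3:  # heuristic: only the values 1 and 2 are documented above
        warnings.append(f"LoaderFlags looks bogus: {loader_flags:08X}h")
    if num_dirs != 0x10:  # heuristic: normally 16 data directory entries
        warnings.append(f"NumberOfRvaAndSizes looks bogus: {num_dirs:08X}h")
    return warnings

# Synthetic header carrying the bogus values quoted above
header = bytearray(0xF8)
struct.pack_into("<II", header, 0x70, 0xABDBFFDE, 0xDFFFDDDE)

for warning in check_optional_header(bytes(header), 0):
    print(warning)
```

Running a check like this before opening an unknown sample in a debugger is cheap, since it only reads the file and never executes it.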

Another strange twist exists in the story of the PE header. Some of you may have noticed there is a section of garbage data between the DOS stub and the PE header in files linked by Microsoft's Linker. The origin of this data has been discussed in at least 3 forums and although it is not necessary to know about it, it is interesting so I will outline the details here.

PE files produced using Microsoft's development tools contain extra bytes after the DOS stub, inserted by the linker Link.exe at link time. In all cases, the penultimate DWORD of this data is "Rich". This data is not present in files produced with other linkers (e.g. Borland, GCC, FASM, etc.). This behaviour is exhibited by all versions of Microsoft Link.exe from v5.12.8078 (which is part of the MASM32 package) through the versions which ship with the latest Visual C++ packages.

The data includes encrypted codes which identify the components used to compile the PE file. It is said to have led to the prosecution of a virus writer as it allowed Microsoft to prove that the virus was compiled on his PC.

The DWORD after "Rich" is a key generated by the linker which repeats several times in the garbage data. When we compile a program, the compiler puts the string "@comp.id" followed by a DWORD-sized compiler ID number in our obj file. When we link our obj file, the linker extracts the comp.id number, XORs it with the key and writes the result in the "garbage" as the 2nd DWORD before "Rich".
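Because XOR is its own inverse, the relationship just described can be reversed to recover the comp.id. The sketch below follows only what is stated above: the key is the DWORD after "Rich" and the masked comp.id is the 2nd DWORD before it. The function name and the key value are made up, and the test data is synthetic rather than taken from a real linker output.

```python
import struct

def recover_comp_id(stub: bytes) -> int:
    """XOR the 2nd DWORD before 'Rich' with the key that follows 'Rich'."""
    rich = stub.find(b"Rich")
    if rich < 8 or rich + 8 > len(stub):
        raise ValueError("no usable 'Rich' marker found")
    (key,) = struct.unpack_from("<I", stub, rich + 4)
    (masked,) = struct.unpack_from("<I", stub, rich - 8)
    return masked ^ key

# Synthetic data: hypothetical key, comp.id of ML.EXE 6.14.8444 (1220FCh)
key = 0x12345678
comp_id = 0x001220FC
blob = (struct.pack("<I", comp_id ^ key)  # 2nd DWORD before "Rich"
        + struct.pack("<I", 0)            # DWORD directly before "Rich" (not modelled)
        + b"Rich"
        + struct.pack("<I", key))

print(hex(recover_comp_id(blob)))  # 0x1220fc
```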

The "@comp.id" variables are hard coded:


ML.EXE Ver.6.14.8444 -> comp.id is 1220FC (You can search: FC2012) 
ML.EXE Ver.7.00.9466 -> comp.id is 4024FA (search: FA2440) 
ML.EXE Ver.7.10.2179 -> comp.id is 0F0883 (search: 83080F) 
ML.EXE Ver.7.10.3077-> comp.id is 0F0C05 (search: 050C0F) 
C++ Optimizing Compiler Version 12.00.8804 for 80x86 ->comp.id is 0B2306

The 1st DWORD before "Rich" is the key XORed with a hard coded constant 536E6144h. If we search "@comp.id" in our obj file and substitute the DWORD after it with zeroes we'll see that the second DWORD before "Rich" is equal to the key (DWORD after "Rich"). Here is an example of a simple "hello world" type program coded in MASM32 and open in the hex editor. The extra bytes are highlighted:


Fortunately, it is possible to patch the linker to stop this behaviour. There is a utility called SignFinder.exe by Asterix which allows you to quickly find the code which needs patching in any version of Link.exe.


 

So open Link.exe in Olly and press Ctrl+G. Enter 0044510C (the address reported by SignFinder + the ImageBase of Link.exe, which is 400000h). Then highlight the add instruction as shown, right-click and select Binary > Fill with NOPs:


It should look like this:


Finally, right-click again and select Copy to executable > All modifications. Then click "Copy all", right-click in the new window that pops up and select Save file. If we use the patched linker to rebuild the same example program, we see the extra bytes have gone:


The only other differences between the 2 files are of course e_lfanew (the offset of the PE header), TimeDateStamp and SizeOfHeaders (which is effectively the offset of the first section). 
