Just as a surgeon should understand the human body and its parts to excel in surgery, a malware reverse engineer should understand the structure and components of a binary to be proficient in malware analysis. On Windows, that means understanding the Portable Executable (PE) format.
This article will not discuss every excruciating detail about a Windows executable. If you’re looking to scratch that itch, read through Microsoft’s PE Format and Peering inside the PE articles or start reading about structures defined in winnt.h. Don’t get me wrong – these are excellent reference articles, and I will link to them throughout this post, but tackling each resource in its entirety can be overwhelming.
For this discussion, we will navigate a PE file, focusing primarily on fields associated with the binary’s imported DLLs and functions. My hope is that by concentrating on just this one aspect, you will (1) learn an approach to maneuver the PE structure and (2) apply this approach to better understand terminology related to an executable’s imports.
For this discussion, I will use the freely available CFF Explorer tool that is part of NTCore Explorer Suite. Also, my target file is – brace yourself – notepad.exe. Why use a legitimate file for this exercise? First, to understand the structure of a PE file, you don’t need malware. Second, a deeper understanding of legitimate files allows you to more easily discover anomalies when you analyze suspect files. For more on this topic, read my earlier post on analyzing files, not malware.
Let’s take a walk
We begin our travels through the PE file format, well, at the beginning. After loading notepad.exe into CFF Explorer, you will see headers on the left side that comprise the first bytes of a typical Windows executable. These headers describe the rest of the file, including the executable content, resources, and imports.
Figure 1: MS-DOS Stub
Let’s start with the MS-DOS header and the MS-DOS stub that follows it. The stub is a small program that displays “This program cannot be run in DOS mode” when the executable is run in MS-DOS. At the beginning of the header (see top-right of Figure 1) is the e_magic field, which contains the well-known “MZ” characters, stored on disk as the bytes 0x4D 0x5A (shown as 0x5A4D above because the two-byte value is interpreted as little-endian). Most fields in this header are not relevant to newer operating systems, but the final field, e_lfanew (see below), is significant because it holds the file offset of the PE header, shown in CFF Explorer as Nt Headers.
Figure 2: Pointer (address) to PE header
Clicking on Nt Headers (below) takes us to file offset 0xF0, which matches the value of e_lfanew above. The value stored at that offset translates to the string “PE” (followed by two null bytes), the signature that marks the beginning of the PE header.
Figure 3: PE header
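The walk from e_magic through e_lfanew to the PE signature can also be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function name find_pe_header is my own, and the buffer is synthetic (it only mimics the offsets shown in the figures, with e_lfanew set to 0xF0 as in notepad.exe).

```python
import struct


def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    if len(data) < 0x40:
        raise ValueError("too short to hold an MS-DOS header")
    # e_magic: the first two bytes of the file, the ASCII characters "MZ".
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: a 4-byte little-endian value at offset 0x3C of the MS-DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The PE signature is the string "PE" followed by two null bytes.
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew


# Synthetic buffer (an assumption for illustration, not a real executable).
sample = bytearray(0x100)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0xF0)
sample[0xF0:0xF4] = b"PE\x00\x00"

pe_offset = find_pe_header(bytes(sample))
(e_magic,) = struct.unpack_from("<H", sample, 0)  # read "MZ" as a little-endian word

print(hex(pe_offset))  # 0xf0
print(hex(e_magic))    # 0x5a4d
```

To try this against a real file, you could pass the bytes returned by open(path, "rb").read() to the same function.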
Next on our path is the COFF File Header, displayed simply as File Header in CFF Explorer. This header includes information such as the target machine type (e.g., x64), the compile timestamp and file characteristics (e.g., is the executable a DLL or EXE?).
Figure 4: File header
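To make the File Header fields more concrete, here is a rough sketch that unpacks the 20-byte COFF File Header. The field layout follows the IMAGE_FILE_HEADER structure in winnt.h, but the function name and the sample values (number of sections, timestamp, characteristics) are placeholders I made up for illustration; they are not taken from notepad.exe.

```python
import struct
from datetime import datetime, timezone

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_DLL = 0x2000  # set in Characteristics when the file is a DLL


def parse_file_header(raw: bytes) -> dict:
    """Unpack the 20-byte COFF File Header that follows the PE signature."""
    (machine, number_of_sections, time_date_stamp, _symbol_table,
     _number_of_symbols, size_of_optional_header,
     characteristics) = struct.unpack_from("<HHIIIHH", raw, 0)
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        # The compile timestamp is stored as seconds since 1970-01-01 UTC.
        "compiled": datetime.fromtimestamp(time_date_stamp, tz=timezone.utc),
        "size_of_optional_header": size_of_optional_header,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }


# Hypothetical header values, packed only to exercise the parser.
raw_header = struct.pack("<HHIIIHH",
                         IMAGE_FILE_MACHINE_AMD64, 6, 1_500_000_000,
                         0, 0, 0xF0, 0x0022)
file_header = parse_file_header(raw_header)

print(hex(file_header["machine"]))
print(file_header["compiled"].isoformat())
print("DLL" if file_header["is_dll"] else "EXE")
```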
Then, we have the Optional Header. By the way, this header is “optional” for files like object files, which are not directly executable. For image files like notepad.exe, which are directly executable, this header is required. It contains a wealth of information that supports loading the executable into memory. One field worth mentioning is the ImageBase (below), which specifies the preferred address where the executable should be mapped in memory. If ASLR is enabled, the loader may choose a randomized base address instead.
Figure 5: Optional header
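As a small sketch of how the ImageBase can be read programmatically: the Optional Header starts with a Magic value that distinguishes 32-bit (PE32) from 64-bit (PE32+) images, and the ImageBase sits at a different offset and width in each. The helper name read_image_base is my own, and the buffer below is synthetic, holding only the two fields needed and the ImageBase value used in this article.

```python
import struct

PE32_MAGIC = 0x10B       # 32-bit image
PE32_PLUS_MAGIC = 0x20B  # 64-bit image


def read_image_base(optional_header: bytes) -> int:
    """Return the ImageBase field from a PE32 or PE32+ Optional Header."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic == PE32_PLUS_MAGIC:
        # PE32+: ImageBase is an 8-byte field at offset 24.
        (image_base,) = struct.unpack_from("<Q", optional_header, 24)
    elif magic == PE32_MAGIC:
        # PE32: ImageBase is a 4-byte field at offset 28 (after BaseOfData).
        (image_base,) = struct.unpack_from("<I", optional_header, 28)
    else:
        raise ValueError(f"unexpected Optional Header magic: {magic:#x}")
    return image_base


# Synthetic 64-bit Optional Header fragment (illustration only).
fragment = bytearray(32)
struct.pack_into("<H", fragment, 0, PE32_PLUS_MAGIC)
struct.pack_into("<Q", fragment, 24, 0x140000000)

image_base = read_image_base(bytes(fragment))
print(hex(image_base))  # 0x140000000
```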
At the end of the Optional Header are Data Directories, which point to tables that contain supporting information, including imported and exported functions. As a reminder, we are focused on import-related information for this discussion. Among the listed directories, there are only two groups that refer to imports and have non-zero values, highlighted in red:
Figure 6: Data directories
Both the Import Directory and Import Address Table Directory have RVA and size values. The size is straightforward in that it indicates the size, in bytes, of the table. The Relative Virtual Address (RVA) refers to the location of the specified table. An RVA is a virtual address because it describes a location after the executable is loaded into memory (i.e., after it is “memory-mapped”). It is relative to the ImageBase, so adding the RVA to the ImageBase provides the Virtual Address (VA) in memory of the specified table.
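The RVA-to-VA arithmetic is easy to check in code. The sketch below treats the Data Directories as an array of 8-byte (RVA, size) pairs, where index 1 is the Import Directory and index 12 is the Import Address Table Directory per Microsoft's PE Format documentation. The two RVAs and the ImageBase are the notepad.exe values used in this article; the sizes are placeholders I invented, and the function names are my own.

```python
import struct

IMAGE_BASE = 0x140000000     # notepad.exe's ImageBase, as used in this article
IMPORT_DIRECTORY_INDEX = 1   # Import Directory
IAT_DIRECTORY_INDEX = 12     # Import Address Table Directory


def parse_data_directories(raw: bytes, count: int = 16) -> list:
    """Return a list of (rva, size) tuples, one per Data Directory entry."""
    return [struct.unpack_from("<II", raw, index * 8) for index in range(count)]


def rva_to_va(rva: int, image_base: int = IMAGE_BASE) -> int:
    """A VA is simply the ImageBase plus the RVA."""
    return image_base + rva


# Synthetic directory array: real RVAs from the article, placeholder sizes.
raw_directories = bytearray(16 * 8)
struct.pack_into("<II", raw_directories, IMPORT_DIRECTORY_INDEX * 8, 0x1F300, 0x100)
struct.pack_into("<II", raw_directories, IAT_DIRECTORY_INDEX * 8, 0x1A620, 0x100)

directories = parse_data_directories(bytes(raw_directories))
import_rva, _import_size = directories[IMPORT_DIRECTORY_INDEX]
iat_rva, _iat_size = directories[IAT_DIRECTORY_INDEX]

print(hex(rva_to_va(import_rva)))  # 0x14001f300
print(hex(rva_to_va(iat_rva)))     # 0x14001a620
```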
The final headers we see on the left in CFF Explorer are the Section Headers:
Figure 7: Section headers
The contents of a Windows executable after the headers are organized into sections. The table above provides important information on the name, location (both on disk and in memory) and characteristics of each section. Key sections include “.text” for executable code, “.rdata” for read-only data, and “.rsrc” for resources like icons.
You may have noticed in Figure 6 that both highlighted RVA rows have “.rdata” in the Sections column, indicating both tables reside in that section. How was this determined? First, see .rdata’s Virtual Address value in Figure 7, which is 0x1A000. I should clarify that this column lists RVAs, not VAs as the column heading suggests. Next, note .rdata’s Virtual Size of 0x73A8. Performing simple math shows that the .rdata section will extend from RVA 0x1A000 to 0x213A7 (inclusive). Looking back at Figure 6, RVAs for both the Import Directory and Import Address Table Directory (0x1F300 and 0x1A620, respectively) fall within this range.
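The same range check can be written as a short lookup. This is only a sketch: the Section type and section_for_rva function are names I made up, and the section list contains just the .rdata values quoted above rather than notepad.exe's full section table.

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int  # despite the column heading, this is an RVA
    virtual_size: int


def section_for_rva(rva: int, sections: list) -> Optional[str]:
    """Return the name of the section whose RVA range contains rva, if any."""
    for section in sections:
        start = section.virtual_address
        end = start + section.virtual_size  # exclusive upper bound
        if start <= rva < end:
            return section.name
    return None


# Only .rdata is listed here, using the values from Figure 7.
sections = [Section(".rdata", 0x1A000, 0x73A8)]

print(section_for_rva(0x1F300, sections))  # Import Directory RVA
print(section_for_rva(0x1A620, sections))  # Import Address Table Directory RVA
print(section_for_rva(0x213A8, sections))  # one byte past the end of .rdata
```

The first two lookups land in .rdata, while the last one returns None because 0x213A7 is the final RVA in the section.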
Following the import trail
The Import Directory RVA is 0x0001F300 and notepad.exe’s ImageBase is 0x140000000, so the VA is 0x14001F300. What’s located at that address? Looking at this offset within the file on disk will not be helpful since, as mentioned earlier, the VA is an address in memory. As a result, we must use a tool that will load our executable similar to how the Windows loader would in preparation for execution. One approach is to use a disassembler like IDA Pro, which maps the executable's sections to their virtual addresses much as the Windows loader does during file execution. For this example, I will use IDA Freeware version 7.0 for Windows.
When loading notepad.exe into IDA, you will see the window below with load options. I recommend unchecking “Create imports segment,” at least for now. Leaving this checked means IDA will create an “.idata” section for imports, and for this discussion I prefer to more closely represent the raw binary by not creating additional sections.
Figure 8: IDA load file options
After clicking “OK,” you will also see a prompt asking if you want to take advantage of debug information. Choose “No” for now.
Next, let’s jump to the VA 0x14001F300 by typing ‘g’ and inserting the address:
Figure 9: Jumping to the Import Directory VA in IDA.
Note that jumping to the above address assumes the loader will respect the address in the ImageBase field. IDA Pro takes this approach, but the Windows loader, and therefore debuggers like x64dbg that actually run the executable, will typically use a randomized base address unless ASLR is disabled for this executable (for more information on this point, see Lenny Zeltser’s article here).
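If you do end up in a debugger with a randomized base, the adjustment is simple arithmetic: subtract the preferred ImageBase to recover the RVA, then add the actual base. The sketch below also checks the DYNAMIC_BASE bit (0x0040) in the Optional Header's DllCharacteristics field, which is how an executable opts in to ASLR. The function names, the DllCharacteristics value, and the "actual" base address are all hypothetical values chosen for illustration.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
DLL_CHARACTERISTICS_OFFSET = 70  # same offset in PE32 and PE32+ Optional Headers


def aslr_enabled(optional_header: bytes) -> bool:
    """Return True if the image is marked as relocatable at load time."""
    (dll_characteristics,) = struct.unpack_from(
        "<H", optional_header, DLL_CHARACTERISTICS_OFFSET)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


def rebase(va: int, preferred_base: int, actual_base: int) -> int:
    """Translate a VA computed from the preferred ImageBase to the actual base."""
    return va - preferred_base + actual_base


# Synthetic Optional Header fragment with a made-up DllCharacteristics value.
fragment = bytearray(72)
struct.pack_into("<H", fragment, DLL_CHARACTERISTICS_OFFSET, 0x8160)

uses_aslr = aslr_enabled(bytes(fragment))
# 0x7FF6A0000000 is a hypothetical base a debugger might report.
rebased = rebase(0x14001F300, 0x140000000, 0x7FF6A0000000)

print(uses_aslr)
print(hex(rebased))  # 0x7ff6a001f300
```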
Jumping to 0x14001F300 brings us here:
Figure 10: Beginning of the Import Directory Table
Since we calculated this address using the Import Directory RVA, it should be no surprise that this is the beginning of the Import Directory Table, which contains all the references we need to understand the program’s imports. The above excerpt shows two entries, one per imported DLL. Each entry consists of the following elements:
1. Import Name Table (as shown in IDA Pro) or Import Lookup Table (as described in Microsoft documentation) RVA: This points to a list of entries, one per function imported from the specified DLL, each of which leads to that function's name. Using the first entry as an example, double-clicking on off_14001F558 takes us to the location below:
Figure 11: Import Name Table
At 0x14001F558 we find a list of addresses that appear in close proximity with one another (for more detail on the format of values in the Import Name Table, see here). Let’s double-click on the first address, word_14001FED0. The destination is below:
Figure 12: Hint/Name Table
This is the beginning of the Hint/Name Table. We see references to functions including OpenProcessToken, GetTokenInformation, and DuplicateEncryptionInfoFile, all functions imported from advapi32.dll. This makes sense since we arrived here after double-clicking the first entry in the advapi32.dll Import Name Table.
One Hint/Name Table covers all imported functions for the file. Each entry in the table has 3 components:
- Hint: This is an index into the imported DLL's export name pointer table, and the loader tries it first to locate the required function. In the first Hint/Name table entry above, the value is 0x214.
- Name: The name of the imported function, null terminated. This is used to search for the imported function within the DLL when the Hint does not lead to a match. In the first entry above, this value is OpenProcessToken.
- Padding: A trailing zero byte, added when needed so the next entry starts on an even boundary. IDA Pro shows this as an “align” directive.
2. Time Stamp: This value will generally be zero, unless the DLL is bound. DLL binding is out of scope for this post, but see this article to learn more.
3. Forwarder Chain: A DLL may reference another DLL’s functionality, but similar to the Time Stamp field above, this value is generally zero. Again, the details of this field are out of scope for this article, but you can search for “ForwarderChain” in this article for more information.
4. DLL Name RVA: A pointer (address) to the name of the imported DLL. In the case of advapi32.dll, the DLL Name RVA points to the string “ADVAPI32.DLL.”
5. Import Address Table (IAT) RVA: First, understand that the Import Address Table is populated by the loader when the executable and its imported DLLs are mapped into memory, and it is a table of pointers to the imported functions. Each entry in the table is called a “thunk” and the table is referred to as a “thunk table.” With that in mind, the RVA in this field points to the address of the imported function within the IAT. For example, double-clicking on OpenProcessToken at 0x14001F310 in Figure 10 takes us to the location below.
Figure 13: Import Address Table
The entry for OpenProcessToken at 0x14001A620 is the IAT slot that, once the loader has populated it, holds the address in memory where the function code resides. In other words, 0x14001A620 is referenced when OpenProcessToken is called within notepad.exe. To emphasize this point, highlight OpenProcessToken and hit “x” on the keyboard. The xrefs window (below) shows a CALL to the OpenProcessToken API.
Figure 14: OpenProcessToken references
Also note that the first address 0x14001A620 in Figure 13 matches the Import Address Table Directory RVA specified in Figure 6, if you add the ImageBase. This makes sense, because Figure 13 shows the start of the Import Address Table Directory.
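To tie the pieces of the import trail together, here is a rough Python sketch of the three structures we just visited: an Import Directory Table entry (five 4-byte fields), a 64-bit Import Lookup Table entry, and a Hint/Name Table entry. The layouts follow Microsoft's PE Format documentation, but the function names are my own and the bytes are synthetic. The lookup table RVA (0x1F558), IAT RVA (0x1A620), Hint/Name RVA (0x1FED0), and hint (0x214) come from the figures above, while the DLL Name RVA of 0x20000 is a placeholder I made up.

```python
import struct

ORDINAL_FLAG64 = 1 << 63  # set when a 64-bit lookup entry imports by ordinal
FIELD_NAMES = ("lookup_table_rva", "time_stamp", "forwarder_chain",
               "name_rva", "iat_rva")


def parse_import_directory(table: bytes) -> list:
    """Parse 20-byte Import Directory Table entries until the all-zero terminator."""
    entries = []
    for offset in range(0, len(table) - 19, 20):
        fields = struct.unpack_from("<5I", table, offset)
        if not any(fields):
            break  # an empty entry marks the end of the table
        entries.append(dict(zip(FIELD_NAMES, fields)))
    return entries


def decode_lookup_entry(value: int) -> tuple:
    """Classify a 64-bit Import Lookup Table entry."""
    if value & ORDINAL_FLAG64:
        return ("ordinal", value & 0xFFFF)
    return ("hint_name_rva", value & 0x7FFFFFFF)


def parse_hint_name(raw: bytes, offset: int = 0) -> tuple:
    """Return (hint, name) for the Hint/Name Table entry at offset."""
    (hint,) = struct.unpack_from("<H", raw, offset)
    end = raw.index(b"\x00", offset + 2)  # the name is null terminated
    return hint, raw[offset + 2:end].decode("ascii")


# Synthetic data modeled on the advapi32.dll entry (name RVA is a placeholder).
directory = struct.pack("<5I", 0x1F558, 0, 0, 0x20000, 0x1A620) + bytes(20)
hint_name = struct.pack("<H", 0x214) + b"OpenProcessToken\x00" + b"\x00"

entries = parse_import_directory(directory)
kind, target = decode_lookup_entry(0x1FED0)
hint, name = parse_hint_name(hint_name)

print(entries[0])
print(kind, hex(target))
print(hex(hint), name)
```

Against a real file, each RVA would first need to be converted to a file offset (or the file mapped as the loader would) before reading the bytes it points to.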
This article introduced the PE header and used it as a starting point to explore a file’s imports. To recap, we:
- Began with the MS-DOS Header
- Identified the PE Header
- Observed the ImageBase in the Optional Header
- Viewed the various Data Directories
- Jumped to the Import Directory Table VA using IDA
- Reviewed the components of an Import Directory Table entry, including the Import Lookup Table
- Found its reference to the Hint/Name Table
- Ended at the Import Address Table, which points to the imported functions in memory
If you want to learn more about the PE header and the structures it includes, there are many excellent resources to explore. Below are some of my favorites:
- Microsoft’s Peering Inside the PE
- Microsoft’s PE Format
- Microsoft’s winnt.h documentation
- Corkami’s summarized Portable Executable graphic
- Corkami’s PE 101 graphic
- Corkami’s PE 102 graphic
- MalwareAnalysisForHedgehogs “Malware Theory – Basic Structure of PE Files” Video
About the Author:
Anuj Soni is a Senior Threat Researcher at Cylance, where he performs malware research and reverse engineering. He is also a SANS Certified Instructor and co-author of the course FOR610: Reverse-Engineering Malware. If you would like to learn more about malware analysis strategies, join him at an upcoming SANS FOR610 course.