In this post, we’ll continue exploring some of the helpful capabilities included in REMnux v6. Be sure to regularly update your REMnux VM by running the command update-remnux.
Analyzing suspect files can be overwhelming because there are often numerous paths to explore, and as you continue to observe activity and gather data, the additional areas of analysis seem to explode exponentially. One approach to guide your analysis is to focus first on answering key questions. Another (likely complementary) approach is to apply the scientific method where you:
- Make an observation.
- Generate a hypothesis based on that observation.
- Test the hypothesis.
- Modify the hypothesis based on the outcome of the test and rerun the test.
Static file analysis, where you learn about a suspect file without launching it, can help generate observations that fuel this process. As a reminder, static file analysis typically results in information such as file and section hashes, compile times, extracted strings, library and function dependencies, and digital signature information. Using the scientific method described above, your analysis of a suspect file may involve the following sequence of activities:
- As part of your static analysis process, you extract the ASCII strings from a file and observe the text “HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run”.
- You hypothesize that the suspect file uses this registry key to maintain persistence on a victim machine.
- You run the sample within a Windows 7 virtual machine and realize that this registry key is never modified. You dig deeper via code analysis and realize a Run key is only created if the victim is a Windows XP machine.
- You can now modify your hypothesis to specify the Windows XP caveat, rerun the test in a Windows XP VM, and confirm your theory. In doing so, you’ve performed focused analysis, learned about the sample’s persistence mechanism (which can be translated to an IOC), and identified an associated constraint.
Static file analysis is challenging, not because it is technically difficult, but because it is so hard to resist double-clicking immediately. I feel your pain; the double-click is my favorite part too. However, it is worth developing the discipline to complete a static file review before executing the sample because it fosters methodical analysis and produces tangible results.
REMnux includes some great tools to perform static analysis, including the ones listed here. This post will highlight just a few of my favorites.
pecheck.py, written by Didier Stevens, is a wrapper for the Python pefile module used to parse Windows PE files. Let’s explore this tool by analyzing the BACKSPACE backdoor malware described in FireEye’s APT 30 report. If you want to follow along, you can download the sample here (password: infected). As shown in the output below, running pecheck.py against the sample returns file hashes and file/section entropy calculations. Entropy is a measure of randomness, and more entropy indicates a higher likelihood of encoded or encrypted data. While this information is helpful, I want to focus on the “Dump Info:” section shown towards the end of the excerpt. This section basically runs the pefile dump_info() function, which parses the entire file and outputs, well, a lot of data (see the complete output here).
Figure 1: pecheck.py output
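To make the entropy figures less abstract, here is a minimal sketch of how a per-byte entropy value can be computed. It assumes the usual Shannon entropy measured in bits per byte (0.0 to 8.0); the function name shannon_entropy is my own and is not taken from pecheck.py.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A buffer holding a single repeated byte scores 0.0, while a buffer in which every byte value is equally likely scores 8.0. Packed or encrypted sections tend to land near the top of that range, which is why a high value is worth a closer look.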
Among other information, the output includes the contents of the file’s Import Address Table (IAT), which represents the shared libraries (i.e., DLLs) and functions within those DLLs that the program relies upon:
Figure 2: pecheck.py Import Address Table (IAT) output
I like the <DLL>.<FUNCTION> format because 1) over time, it can help you remember which functions a DLL contains and 2) you can grep for the DLL name or function name and retrieve the entire line (not the case with output from other tools). In this particular excerpt, we can immediately see some Windows API calls that are often used for malicious purposes. For example, we see references to the CreateToolhelp32Snapshot, Process32First, and Process32Next functions, which malware commonly uses to capture a list of running processes and iterate through that list to enumerate activity or target specific programs. We could explore this hypothesis by using a debugger to set breakpoints on these API calls and determine whether the code is looking for a particular process. Oh, and in case you’re wondering, the hint is a suggested index into the corresponding DLL’s export name table, which the loader checks first when resolving the function. It’s an optimization that, in this case, is not helpful given that all values are zero.
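If you would rather script the grep step, the sketch below filters saved output for a watch list of API names. It assumes each import line begins with a <DLL>.<FUNCTION> token, as described above; the watch list, the function name find_watched_imports, and the sample lines are all hypothetical and are not copied from the real pecheck.py output.

```python
# Hypothetical watch list; extend it with whatever APIs matter to your case.
WATCHED_APIS = {"CreateToolhelp32Snapshot", "Process32First", "Process32Next"}


def find_watched_imports(dump_lines, watched=WATCHED_APIS):
    """Return (dll, function) pairs for lines that begin with <DLL>.<FUNCTION>."""
    hits = []
    for line in dump_lines:
        fields = line.split()
        if not fields:
            continue
        # Split on the last dot so "KERNEL32.dll" stays intact as the DLL name.
        dll, sep, function = fields[0].rpartition(".")
        if sep and function in watched:
            hits.append((dll, function))
    return hits


# Illustrative lines only, written to match the assumed format.
sample_lines = [
    "KERNEL32.dll.CreateToolhelp32Snapshot Hint[0]",
    "KERNEL32.dll.Process32First Hint[0]",
    "KERNEL32.dll.Sleep Hint[0]",
]
```

Calling find_watched_imports(sample_lines) returns the two process-enumeration entries and skips Sleep. Each hit is a candidate for a debugger breakpoint when you move on to testing the hypothesis.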
When a program imports a function by ordinal rather than by name, the output indicates this clearly:
Figure 3: pecheck.py Import Address Table (IAT) output by ordinal
Note that since the above functions are imported by ordinal only, the function names (e.g., “ioctlsocket”) will not be listed in the strings output:
Figure 4: Grepping for Windows API
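Since an ordinal-only import leaves no function name in the file, mapping the ordinal back to a name requires a lookup table. The sketch below assumes the DLL in question is ws2_32.dll and includes only a handful of its well-known ordinals; the table and the function name resolve_ordinal are mine and deliberately incomplete. The pefile project ships a fuller table in its ordlookup package.

```python
# Partial ordinal-to-name table for ws2_32.dll (a few entries only).
WS2_32_ORDINALS = {
    3: "closesocket",
    4: "connect",
    10: "ioctlsocket",
    16: "recv",
    19: "send",
    23: "socket",
}


def resolve_ordinal(dll_name, ordinal):
    """Return the function name for a known ws2_32.dll ordinal, else None."""
    if dll_name.lower() != "ws2_32.dll":
        return None  # this sketch only knows about one DLL
    return WS2_32_ORDINALS.get(ordinal)
```

For example, resolve_ordinal("WS2_32.dll", 10) returns "ioctlsocket", while an ordinal missing from the table returns None instead of a guess.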
Beyond the IAT, pecheck.py output includes section hashes, version information, and resource information, and the tool can be configured with a PEiD database to search for packer signatures. While pecheck.py may not be the first script you turn to due to the large volume of output, I prefer it to others because I can extract the information I desire based on grep searches or modifications to the Python code. In addition, dump_info() sometimes results in parsing errors that may reveal other interesting anomalous characteristics associated with the target file.
pestr is part of the pev PE file analysis framework, and its primary purpose is to extract strings from Windows executable files. However, it goes beyond the traditional strings tool by providing options to show the offset of a string within a file and the section where it resides. For example, below are output excerpts after running pestr against the file analyzed above, using the --section option to print the section where the respective string is found (see complete output here):
Figure 4: pestr output #1
Figure 5: pestr output #2
Figure 4 shows the command executed and the beginning of the output. The first few strings are found in the PE header, so they are labeled as appearing in the “none” section. Figure 5 shows strings in the “.rdata” section, including DLL and Windows API function names. The “.rdata” section commonly contains the Import Address Table, which could explain the presence of these strings here. Looking at the pecheck.py output, we can confirm these strings are, in fact, present in the IAT.
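The idea of tagging each string with its offset and section can be sketched in a few lines of Python. This is not how pestr is implemented; it is a rough approximation that assumes you already know each section's raw file offset and raw size (for example, from the section table in the pecheck.py output). The function names and the minimum length of 4 are my own choices.

```python
import re

MIN_LENGTH = 4  # hypothetical default, mirroring the classic strings utility


def ascii_strings(data: bytes, min_length: int = MIN_LENGTH):
    """Yield (offset, text) for each run of printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


def section_for_offset(offset, sections):
    """Map a file offset to a section name.

    sections is a list of (name, raw_offset, raw_size) tuples. Offsets that
    fall outside every section, such as the PE header, map to "none".
    """
    for name, raw_offset, raw_size in sections:
        if raw_offset <= offset < raw_offset + raw_size:
            return name
    return "none"
```

Feeding every offset from ascii_strings() through section_for_offset() gives the same kind of view as the figures above: header strings fall into "none", and import names land in whichever section holds the import data.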
Perusing the remaining pestr output shows additional strings, including the following:
Figure 6: pestr output #3
Note the presence of GetTickCount, a Windows function that returns the number of milliseconds that have passed since the system was started. This is a popular anti-analysis function because it can help detect if too much time has elapsed during code execution (possibly due to debugging activity). Interestingly, pestr output reveals this function name is located in the “.data” section, rather than the “.rdata” section where the IAT resides. We might hypothesize that this is an attempt by the developer to evade traditional import table analysis by manually calling this function during program execution. We can dig deeper by finding the reference to this string in IDA Pro:
Figure 7: IDA Pro string reference
While we will not dive into code analysis details in this post, Figure 7 makes it clear that the GetTickCount string reference is indeed used to call the function at runtime using LoadLibraryA and GetProcAddress.
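The observation that led us here, an API name that appears as a string but not as an import, is easy to automate. The sketch below compares extracted strings against the imported function names and reports known API names that show up only as strings. The function name and the idea of a known_apis list are my own assumptions; you would supply that list yourself.

```python
def apis_referenced_only_as_strings(extracted_strings, imported_functions, known_apis):
    """Return known API names found in the strings but absent from the imports."""
    imported = {name.lower() for name in imported_functions}
    return sorted(
        s for s in set(extracted_strings)
        if s in known_apis and s.lower() not in imported
    )
```

With strings containing GetTickCount and LoadLibraryA, and imports containing LoadLibraryA and GetProcAddress, the function returns only GetTickCount. A hit is not proof of runtime resolution, but it is a reasonable hypothesis to test in a disassembler or debugger.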
readpe.py + pe-carv.py
readpe.py can output information such as PE header data, imports and exports. For this post, I’ll highlight its simple ability to detect an overlay. An overlay is data appended to the end of an executable (i.e., it falls outside of any data described in the PE header). Using the following command against a Neshta.A specimen, readpe.py can detect if an overlay exists:
Figure 8: readpe.py overlay output
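Conceptually, an overlay check compares the file size with the end of the last section's raw data. The sketch below shows that comparison under simplifying assumptions: it takes the section table as (PointerToRawData, SizeOfRawData) pairs you have already parsed, and it ignores special cases such as an appended Authenticode certificate. It is not necessarily how readpe.py performs the check, and the function name overlay_offset is mine.

```python
def overlay_offset(file_size, sections):
    """Return the file offset where an overlay starts, or None if there is none.

    sections is an iterable of (pointer_to_raw_data, size_of_raw_data) pairs.
    """
    ends = [ptr + size for ptr, size in sections if size > 0]
    if not ends:
        return None  # no section data to measure against
    end_of_sections = max(ends)
    # Anything past the last byte of section data is treated as overlay.
    return end_of_sections if file_size > end_of_sections else None
```

For instance, with sections ending at 0x1A00 and a file size of 0x3000, the function returns 0x1A00, meaning 0x1600 bytes of appended data are waiting to be examined.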
Upon detecting an overlay, the next step is to evaluate the contents of this additional data. Malware often includes executable content in the overlay, so you might consider using a tool called pe-carv.py, which is purpose-built to carve out embedded PE files:
Figure 9: pe-carv.py extracted file
As shown in the figure above, pe-carv.py successfully extracted a file it called 1.exe, and we could proceed with further static file analysis to better understand this embedded content.
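To give a feel for what carving involves, here is a simplified sketch that locates candidate embedded PE files in a blob of bytes. It looks for the “MZ” signature, reads the e_lfanew field at offset 0x3C of the DOS header, and confirms that a “PE\0\0” signature sits where that field points. This is not pe-carv.py's implementation; it only finds start offsets and makes no attempt to work out where each embedded file ends. The function name and the skip_first parameter are my own.

```python
import struct


def find_embedded_pe_offsets(data: bytes, skip_first: bool = True):
    """Return offsets in data where a plausible PE file begins."""
    offsets = []
    pos = data.find(b"MZ")
    while pos != -1:
        # The DOS header is 0x40 bytes; e_lfanew is its last 4-byte field.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_pos = pos + e_lfanew
            if data[pe_pos:pe_pos + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    # The host executable itself starts at offset 0, so optionally skip it.
    if skip_first and offsets and offsets[0] == 0:
        offsets = offsets[1:]
    return offsets
```

Run against the bytes of a file with an overlay, any offset returned beyond the start of the overlay is a good candidate for extraction and a fresh round of static analysis.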
Static analysis can generate useful data about a file, but it can also help direct your reverse engineering efforts. While running the tools mentioned above may get you the information you need, I encourage you to check out the source code and customize it based on your preferences. In particular, if you’re just getting started with Python, tweaking this code can serve as a great introduction and motivate further study.
If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.
About the Author:
Anuj Soni is a Senior Threat Researcher at Cylance, where he performs malware research and reverse engineering. He is also a SANS Certified Instructor and co-author of the course FOR610: Reverse-Engineering Malware. If you would like to learn more about malware analysis strategies, join him at an upcoming SANS FOR610 course.