REMnux v6 for Malware Analysis (Part 2): Static File Analysis

Introduction

In this post, we’ll continue exploring some of the helpful capabilities included in REMnux v6. Be sure to regularly update your REMnux VM by running the command update-remnux.

Analyzing suspect files can be overwhelming because there are often numerous paths to explore, and as you continue to observe activity and gather data, the additional areas of analysis seem to explode exponentially. One approach to guide your analysis is to focus first on answering key questions. Another (likely complementary) approach is to apply the scientific method where you:

  1. Make an observation.
  2. Generate a hypothesis based on that observation.
  3. Test the hypothesis.
  4. Modify the hypothesis based on the outcome of the test and rerun the test.

Static file analysis, where you learn about a suspect file without launching it, can help generate observations that fuel this process.  As a reminder, static file analysis typically results in information such as file and section hashes, compile times, extracted strings, library and function dependencies, and digital signature information. Using the scientific method described above, your analysis of a suspect file may involve the following sequence of activities:

  1. As part of your static analysis process, you extract the ASCII strings from a file and observe the text “HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run”.
  2. You hypothesize that the suspect file uses this registry key to maintain persistence on a victim machine.
  3. You run the sample within a Windows 7 virtual machine and realize that this registry key is never modified. You dig deeper via code analysis and realize a Run key is only created if the victim is a Windows XP machine.
  4. You can now modify your hypothesis to specify the Windows XP caveat, rerun the test in a Windows XP VM, and confirm your theory. In doing so, you’ve performed focused analysis, learned about the sample’s persistence mechanism (which can be translated to an IOC), and identified an associated constraint.
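The first step of that sequence, extracting ASCII strings and spotting the Run key, can be sketched in a few lines of Python 3. This is a minimal illustration rather than a replacement for the strings utility: the function names ascii_strings and run_key_observations and the four-character minimum length are my own assumptions, and it ignores UTF-16 strings entirely.

```python
import re
from typing import List

# Registry path we want to notice in extracted strings.
RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"


def ascii_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def run_key_observations(data: bytes) -> List[str]:
    """Return extracted strings that mention the Run key (case-insensitive)."""
    needle = RUN_KEY.lower()
    return [s for s in ascii_strings(data) if needle in s.lower()]
```

To try it on a file, read the file in binary mode and pass the bytes in, for example run_key_observations(open(path, "rb").read()). A hit is only an observation; the hypothesis still has to be tested as described above.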

Static file analysis is challenging, not because it is technically difficult, but because it is so hard to resist double-clicking immediately. I feel your pain; the double-click is my favorite part too. However, it is worth developing the discipline to complete a static file review before executing the sample because it fosters methodical analysis and produces tangible results.

REMnux includes some great tools to perform static analysis, including the ones listed here. This post will highlight just a few of my favorites.

pecheck.py

pecheck.py, written by Didier Stevens, is a wrapper for the Python pefile module used to parse Windows PE files. Let’s explore this tool by analyzing the BACKSPACE backdoor malware described in FireEye’s APT 30 report. If you want to follow along, you can download the sample here (password: infected). As shown in the output below, running pecheck.py against the sample returns file hashes and file/section entropy calculations. Entropy is a measure of randomness, and more entropy indicates a higher likelihood of encoded or encrypted data. While this information is helpful, I want to focus on the “Dump Info:” section shown towards the end of the excerpt. This section basically runs the pefile dump_info() function, which parses the entire file and outputs, well, a lot of data (see the complete output here).

Figure 1: pecheck.py output
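To make the entropy figures more concrete, here is a small sketch of a Shannon entropy calculation over raw bytes in Python 3. It is not taken from pecheck.py or pefile, so their numbers may be computed differently; the function name shannon_entropy is my own. The result ranges from 0 (every byte identical) to 8 (every byte value equally likely), measured in bits per byte.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Values approaching 8 for a section are what you would expect from compressed or encrypted content, while plain code and text usually score noticeably lower.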

Among other information, the output includes the contents of the file’s Import Address Table (IAT), which represents the shared libraries (i.e., DLLs) and functions within those DLLs that the program relies upon:

Figure 2: pecheck.py Import Address Table (IAT) output

I like the <DLL>.<FUNCTION> format because 1) over time, it can help you remember which functions a DLL contains and 2) you can grep for the DLL name or function name and retrieve the entire line (not the case with output from other tools). In this particular excerpt, we can immediately see some Windows API calls that are often used for malicious purposes. For example, we see references to the CreateToolhelp32Snapshot, Process32First, and Process32Next functions commonly used by malware to capture a list of running processes and iterate through that list to enumerate activity or target specific programs. We could explore this hypothesis by using a debugger to set breakpoints on these API calls and determine if there is a certain process the code is looking for. Oh, and in case you’re wondering, the hint refers to the potential location of the function within the corresponding DLL – it’s an optimization that, in this case, is not helpful given that all values are zero.
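If you would rather build this kind of listing yourself, the sketch below walks the import directory of a parsed pefile.PE object and emits one line per import. It assumes the third-party pefile module is installed and relies on its DIRECTORY_ENTRY_IMPORT, dll, imports, name, ordinal and import_by_ordinal attributes. The helper names and the output format are my own approximation, not pecheck.py's code.

```python
# The three process-enumeration APIs called out in the text above.
PROCESS_ENUM_APIS = {"CreateToolhelp32Snapshot", "Process32First", "Process32Next"}


def format_imports(pe):
    """Return one '<DLL>.<FUNCTION>' line per import of a parsed pefile.PE object."""
    lines = []
    # DIRECTORY_ENTRY_IMPORT is absent when the file has no import directory.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        dll = entry.dll.decode("ascii", errors="replace")
        for imp in entry.imports:
            by_ordinal = getattr(imp, "import_by_ordinal", imp.name is None)
            if by_ordinal or imp.name is None:
                lines.append("%s Ordinal[%s] (imported by ordinal)" % (dll, imp.ordinal))
            else:
                name = imp.name.decode("ascii", errors="replace")
                lines.append("%s.%s" % (dll, name))
    return lines


def flag_process_enumeration(lines):
    """Return the import lines whose function name is in PROCESS_ENUM_APIS."""
    return [line for line in lines if line.rsplit(".", 1)[-1] in PROCESS_ENUM_APIS]


def list_imports(path):
    """Parse the file at path with pefile and return its formatted imports."""
    import pefile  # third-party module, assumed to be installed

    return format_imports(pefile.PE(path))
```

Calling flag_process_enumeration(list_imports(path)) gives you the same kind of result as grepping the pecheck.py output for those three function names.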

When a program imports a function by ordinal rather than by name, this is indicated clearly:

Figure 3: pecheck.py Import Address Table (IAT) output by ordinal

Note that since the above functions are imported by ordinal only, the function names (e.g., “ioctlsocket”) will not be listed in the strings output:

Figure 4: Grepping for Windows API

Beyond the IAT, pecheck.py output includes section hashes, version information, and resource information, and the tool can be configured with a PEiD database to search for packer signatures. While pecheck.py may not be the first script you turn to due to the large volume of output, I prefer it to others because I can extract the information I desire based on grep searches or modifications to the Python code. In addition, dump_info() sometimes results in parsing errors that may reveal other interesting anomalous characteristics associated with the target file.

pestr

pestr is part of the pev PE file analysis framework, and its primary purpose is to extract strings from Windows executable files. However, it goes beyond the traditional strings tool by providing options to show the offset of a string within a file and the section where it resides. For example, below are output excerpts after running pestr against the file analyzed above, using the --section option to print the section where the respective string is found (see complete output here):

Figure 4: pestr output #1

Figure 5: pestr output #2

Figure 4 shows the command executed and the beginning of the output. The first few strings are found in the PE header, so they are labeled as appearing in the “none” section. Figure 5 shows strings in the “.rdata” section, including DLL and Windows API function names. The “.rdata” section commonly contains the Import Address Table, which could explain the presence of these strings here. Looking at the pecheck.py output, we can confirm these strings are, in fact, present in the IAT.
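The idea of labeling each string with its section is easy to reproduce. The sketch below is my own approximation in Python 3, not pestr's implementation: it maps each string's file offset onto a list of (name, raw offset, raw size) tuples and falls back to "none" when the offset lies outside every section. The helper sections_from_pefile assumes a parsed pefile.PE object and its sections, Name, PointerToRawData and SizeOfRawData attributes.

```python
import re


def sections_from_pefile(pe):
    """Build (name, raw_offset, raw_size) tuples from a parsed pefile.PE object."""
    return [
        (s.Name.rstrip(b"\x00").decode("ascii", errors="replace"),
         s.PointerToRawData,
         s.SizeOfRawData)
        for s in pe.sections
    ]


def strings_with_sections(data, sections, min_len=4):
    """Return (file_offset, section_name, string) for each printable ASCII run."""
    results = []
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        offset = match.start()
        section = "none"  # same label the text describes for header strings
        for name, start, size in sections:
            if start <= offset < start + size:
                section = name
                break
        results.append((offset, section, match.group().decode("ascii")))
    return results
```

Filtering the result for strings outside ".rdata" that look like API names is a quick way to surface the kind of anomaly discussed next.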

Perusing the remaining pestr output shows additional strings, including the following:

Figure 6: pestr output #3

Note the presence of GetTickCount, a Windows function that returns the number of milliseconds that have passed since the system was started. This is a popular anti-analysis function because it can help detect if too much time has elapsed during code execution (possibly due to debugging activity). Interestingly, pestr output reveals this function name is located in the “.data” section, rather than the “.rdata” section where the IAT resides. We might hypothesize that this is an attempt by the developer to evade traditional import table analysis by manually calling this function during program execution. We can dig deeper by finding the reference to this string in IDA Pro:

Figure 7: IDA Pro string reference

While we will not dive into code analysis details in this post, Figure 7 makes it clear that the GetTickCount string reference is indeed used to call the function at runtime using LoadLibraryA and GetProcAddress.
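You can turn this observation into a repeatable check: look for API names that show up in the strings output but not in the import table. The sketch below is a rough heuristic of my own, and the WATCHLIST contents are only an illustrative assumption that you would tune to your needs. It also reports whether GetProcAddress is imported, since that makes runtime resolution more plausible.

```python
# Illustrative watchlist of API names worth noticing (an assumption, not exhaustive).
WATCHLIST = {"GetTickCount", "IsDebuggerPresent", "VirtualAlloc", "CreateRemoteThread"}


def runtime_resolution_candidates(strings, imported_functions):
    """Return (candidates, resolver_imported).

    candidates: watchlist API names present as strings but not imported.
    resolver_imported: True when GetProcAddress is in the import table.
    """
    imported = set(imported_functions)
    candidates = sorted(s for s in set(strings) if s in WATCHLIST and s not in imported)
    return candidates, "GetProcAddress" in imported
```

A candidate is only a lead. As with the IDA Pro check above, code analysis is what confirms whether the string is actually passed to GetProcAddress.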

readpe.py + pe-carv.py

readpe.py can output information such as PE header data, imports and exports. For this post, I’ll highlight its simple ability to detect an overlay. An overlay is data appended to the end of an executable (i.e., it falls outside of any data described in the PE header). Using the following command against a Neshta.A specimen, readpe.py can detect if an overlay exists:

Figure 8: readpe.py overlay output
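For a sense of how an overlay check can work, here is a minimal Python 3 sketch that uses only the standard library. It reads the section table and treats anything past the end of the last section's raw data as overlay. This is my own simplification and readpe.py may do it differently; pefile also offers a get_overlay_data_start_offset() method if you prefer not to parse headers by hand. Note that an Authenticode signature is also commonly stored after the last section, so a signed file may be reported here as well.

```python
import struct


def overlay_offset(data: bytes):
    """Return the file offset where overlay data starts, or None if there is none."""
    try:
        if data[:2] != b"MZ":
            return None
        # e_lfanew at 0x3C points to the "PE\0\0" signature.
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
            return None
        (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
        (opt_header_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
        # The section table follows the 24-byte signature/COFF header and optional header.
        table = e_lfanew + 24 + opt_header_size
        end_of_sections = 0
        for i in range(num_sections):
            # SizeOfRawData and PointerToRawData sit at offsets 16 and 20 of each 40-byte entry.
            raw_size, raw_ptr = struct.unpack_from("<II", data, table + i * 40 + 16)
            end_of_sections = max(end_of_sections, raw_ptr + raw_size)
    except struct.error:
        return None  # truncated or malformed headers
    if 0 < end_of_sections < len(data):
        return end_of_sections
    return None
```

If the function returns an offset, data[offset:] is the appended content to examine next.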

Upon detecting an overlay, the next step is to evaluate the contents of this additional data. Malware often includes executable content in the overlay, so you might consider using a tool called pe-carv.py, which is purpose-built to carve out embedded PE files:

Figure 9: pe-carv.py extracted file

As shown in the figure above, pe-carv.py successfully extracted a file it called 1.exe, and we could proceed with further static file analysis to better understand this embedded content.
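Carving can be approximated with a simple signature scan. The Python 3 sketch below is a simplified illustration of the general idea and not pe-carv.py's actual logic: it finds every "MZ" whose e_lfanew field points at a valid "PE\0\0" signature and, for each hit past offset zero, returns everything from there to the end of the file. It makes no attempt to compute the embedded file's true size, and the function names are my own.

```python
import struct


def find_pe_offsets(data: bytes):
    """Return offsets of every 'MZ' header whose e_lfanew leads to 'PE\\0\\0'."""
    offsets = []
    pos = data.find(b"MZ")
    while pos != -1:
        # A DOS header needs at least 0x40 bytes to hold the e_lfanew field.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            sig_at = pos + e_lfanew
            if data[sig_at:sig_at + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets


def carve_embedded(data: bytes):
    """Return (offset, bytes_to_end_of_file) for each PE header after offset 0."""
    return [(off, data[off:]) for off in find_pe_offsets(data) if off > 0]
```

Each carved blob can be written to disk and run back through the same static analysis steps described earlier.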

Closing Thoughts

Static analysis can generate useful data about a file, but it can also help direct your reverse engineering efforts. While running the tools mentioned above may get you the information you need, I encourage you to check out the source code and customize it based on your preferences. In particular, if you’re just getting started with Python, tweaking this code can serve as a great introduction and motivate further study.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni

Anuj Soni is a Senior Incident Responder at Booz Allen Hamilton, where he leads intrusion investigations and performs forensic and malware analysis to investigate security incidents. He also teaches FOR610: Reverse-Engineering Malware for the SANS Institute. Anuj excels not only in delivering rigorous technical analysis, but also in process development, knowledge management, and team leadership to accelerate incident response efforts. Anuj presents at events including the U.S. Cyber Crime Conference, SANS DFIR Summit, and the Computer and Enterprise Investigations Conference (CEIC). He received his Bachelor's and Master's degrees from Carnegie Mellon University.
