SANS FOR610 Reverse-Engineering Malware – Now, with Ghidra

I’m excited to announce that the SANS FOR610 Reverse-Engineering Malware course I co-author with Lenny Zeltser now uses Ghidra for static code analysis. Ghidra is a free and open-source software (FOSS) reverse engineering platform developed by the National Security Agency (NSA). It has an active community of users and contributors, and we are optimistic about the future of this analysis tool. I found it an invaluable addition to my toolkit, as have many other malware analysts.

Ghidra includes a full-featured, visual disassembler. Moreover, it comes with a built-in decompiler, which provides a C representation of the disassembly. Decompiled output complements disassembly nicely, and this additional perspective can accelerate the malware analysis process. For example, let’s compare some disassembly (Figure 1) with the decompiled code (Figure 2):

Figure 1: Disassembly Example

Figure 2: Decompiled Code

Some aspects of the analysis benefit from the low-level insights that the disassembler provides. Other tasks are faster when looking at the decompiler’s output, which is easier to review and assess. When reverse-engineering malware, I found it helpful to switch between Ghidra’s disassembler and decompiler output.

Ghidra also supports scripts and plugins for extensibility, providing ample opportunity for analysts to automate their work as their reverse engineering skills grow with experience. In addition, Ghidra has multiple collaborative work features to support teamwork for complex analysis tasks. The built-in help menu is an excellent resource to learn more about these features and many more.

If you’re wondering how you might incorporate Ghidra into your toolkit, take a look at the walkthrough I published earlier as an Introduction to Code Analysis With Ghidra. For additional insights, view the 20-minute video I recorded to explain a typical analysis workflow with Ghidra.

I hope you’ll join me and other FOR610 instructors at an upcoming course to explore this impressive analysis framework and strengthen your reverse engineering skills.

-Anuj Soni


About the Author:
Anuj Soni is a Senior Threat Researcher at Cylance, where he performs malware research and reverse engineering. He is also a SANS Certified Instructor and co-author of the course FOR610: Reverse-Engineering Malware. If you would like to learn more about malware analysis strategies, join him at an upcoming SANS FOR610 course.

REMnux v6 for Malware Analysis (Part 2): Static File Analysis

Introduction

In this post, we’ll continue exploring some of the helpful capabilities included in REMnux v6. Be sure to regularly update your REMnux VM by running the command update-remnux.

Analyzing suspect files can be overwhelming because there are often numerous paths to explore, and as you continue to observe activity and gather data, the additional areas of analysis seem to explode exponentially. One approach to guide your analysis is to focus first on answering key questions. Another (likely complementary) approach is to apply the scientific method where you:

  1. Make an observation.
  2. Generate a hypothesis based on that observation.
  3. Test the hypothesis.
  4. Modify the hypothesis based on the outcome of the test and rerun the test.

Static file analysis, where you learn about a suspect file without launching it, can help generate observations that fuel this process. As a reminder, static file analysis typically yields information such as file and section hashes, compile times, extracted strings, library and function dependencies, and digital signature information. Using the scientific method described above, your analysis of a suspect file may involve the following sequence of activities:

  1. As part of your static analysis process, you extract the ASCII strings from a file and observe the text “HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run”.
  2. You hypothesize that the suspect file uses this registry key to maintain persistence on a victim machine.
  3. You run the sample within a Windows 7 virtual machine and find that this registry key is never modified. You dig deeper via code analysis and learn that a Run key is created only if the victim is a Windows XP machine.
  4. You can now modify your hypothesis to specify the Windows XP caveat, rerun the test in a Windows XP VM, and confirm your theory. In doing so, you’ve performed focused analysis, learned about the sample’s persistence mechanism (which can be translated to an IOC), and identified an associated constraint.

Static file analysis is challenging, not because it is technically difficult, but because it is so hard to resist double-clicking immediately. I feel your pain; the double-click is my favorite part too. However, it is worth developing the discipline to complete a static file review before executing the sample because it fosters methodical analysis and produces tangible results.

REMnux includes some great tools to perform static analysis, including the ones listed here. This post will highlight just a few of my favorites.

pecheck.py

pecheck.py, written by Didier Stevens, is a wrapper for the Python pefile module used to parse Windows PE files. Let’s explore this tool by analyzing the BACKSPACE backdoor malware described in FireEye’s APT 30 report. If you want to follow along, you can download the sample here (password: infected). As shown in the output below, running pecheck.py against the sample returns file hashes and file/section entropy calculations. Entropy is a measure of randomness, and more entropy indicates a higher likelihood of encoded or encrypted data. While this information is helpful, I want to focus on the “Dump Info:” section shown towards the end of the excerpt. This section basically runs the pefile dump_info() function, which parses the entire file and outputs, well, a lot of data (see the complete output here).

Figure 1: pecheck.py output
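To make the entropy figure more concrete, here is a minimal Python sketch of a Shannon entropy calculation over raw bytes. It illustrates the general formula and is not code taken from pecheck.py or pefile; the function name shannon_entropy is a hypothetical name chosen for this example. The result is expressed in bits per byte, so it ranges from 0.0 (every byte identical) to 8.0 (all 256 byte values equally frequent).

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Count how often each byte value occurs.
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the byte values that actually appear.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"\x00" * 64))       # a run of identical bytes
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Compressed or encrypted data tends to score close to 8, while plain code and text usually score noticeably lower, which is why a high-entropy section is worth a closer look.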

Among other information, the output includes the contents of the file’s Import Address Table (IAT), which represents the shared libraries (i.e., DLLs) and functions within those DLLs that the program relies upon:

Figure 2: pecheck.py Import Address Table (IAT) output

I like the <DLL>.<FUNCTION> format because 1) over time, it can help you remember which functions a DLL contains and 2) you can grep for the DLL name or function name and retrieve the entire line (not the case with output from other tools). In this particular excerpt, we can immediately see some Windows API calls that are often used for malicious purposes. For example, we see references to the CreateToolhelp32Snapshot, Process32First, and Process32Next functions commonly used by malware to capture a list of running processes and iterate through that list to enumerate activity or target specific programs. We could explore this hypothesis by using a debugger to set breakpoints on these API calls and determine if there is a certain process the code is looking for. Oh, and in case you’re wondering, the hint refers to the potential location of the function within the corresponding DLL – it’s an optimization that, in this case, is not helpful given that all values are zero.
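Because each import sits on one line in <DLL>.<FUNCTION> form, the output is easy to post-process. The sketch below groups such lines by DLL and flags the process-enumeration APIs mentioned above. The sample lines are hypothetical and only modeled on the format described here (the exact spacing and the ordinal wording are assumptions), and the function names are invented for this illustration.

```python
from collections import defaultdict

# APIs named in the text as commonly used to enumerate running processes.
PROCESS_ENUM_APIS = {"CreateToolhelp32Snapshot", "Process32First", "Process32Next"}


def group_imports(lines):
    """Map each DLL to the functions imported from it by name."""
    imports = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if not tokens or "Ordinal[" in line:
            continue  # skip blank lines and imports by ordinal (no name available)
        # Split "KERNEL32.dll.Process32First" at the last dot.
        dll, sep, func = tokens[0].rpartition(".")
        if sep and dll:
            imports[dll].append(func)
    return dict(imports)


def flag_process_enumeration(imports):
    """Return the sorted process-enumeration APIs found among the imports."""
    found = set()
    for funcs in imports.values():
        found.update(PROCESS_ENUM_APIS.intersection(funcs))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical lines, assumed to resemble the IAT portion of the output.
    sample = [
        "KERNEL32.dll.CreateToolhelp32Snapshot Hint[0]",
        "KERNEL32.dll.Process32First Hint[0]",
        "KERNEL32.dll.Process32Next Hint[0]",
        "KERNEL32.dll.CloseHandle Hint[0]",
        "WS2_32.dll Ordinal[10] (Imported by Ordinal)",
    ]
    print(flag_process_enumeration(group_imports(sample)))
```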

When a program imports a function by ordinal rather than by name, the output indicates this clearly:

Figure 3: pecheck.py Import Address Table (IAT) output by ordinal

Note that since the above functions are imported by ordinal only, the function names (e.g., “ioctlsocket”) will not be listed in the strings output:

Figure 4: Grepping for Windows API

Beyond the IAT, pecheck.py output includes section hashes, version information, and resource information, and the tool can be configured with a PEiD database to search for packer signatures. While pecheck.py may not be the first script you turn to due to the large volume of output, I prefer it to others because I can extract the information I desire based on grep searches or modifications to the Python code. In addition, dump_info() sometimes results in parsing errors that may reveal other interesting anomalous characteristics associated with the target file.

pestr

pestr is part of the pev PE file analysis framework, and its primary purpose is to extract strings from Windows executable files. However, it goes beyond the traditional strings tool by providing options to show the offset of a string within a file and the section where it resides. For example, below are output excerpts after running pestr against the file analyzed above, using the --section option to print the section where the respective string is found (see complete output here):

Figure 4: pestr output #1

Figure 5: pestr output #2

Figure 4 shows the command executed and the beginning of the output. The first few strings are found in the PE header, so they are labeled as appearing in the “none” section. Figure 5 shows strings in the “.rdata” section, including DLL and Windows API function names. The “.rdata” section commonly contains the Import Address Table, which could explain the presence of these strings here. Looking at the pecheck.py output, we can confirm these strings are, in fact, present in the IAT.
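If you are curious how a strings tool can report where each string lives, the following sketch extracts printable ASCII runs from a byte buffer together with their file offsets. It is a simplified illustration, not pestr's implementation: it does not handle Unicode strings and does not map offsets to PE sections. The function name ascii_strings and the default minimum length of 4 are assumptions made for this example.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each printable ASCII run of at least min_len bytes."""
    # Printable ASCII spans 0x20 (space) through 0x7e (tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    blob = b"\x00\x01GetTickCount\x00\xffabc\x00kernel32.dll\x00"
    for offset, text in ascii_strings(blob):
        print(f"0x{offset:08x}  {text}")
```

Mapping an offset to a section would additionally require reading the section table and checking which section's raw data range contains that offset.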

Perusing the remaining pestr output shows additional strings, including the following:

Figure 6: pestr output #3

Note the presence of GetTickCount, a Windows function that returns the number of milliseconds that have passed since the system was started. This is a popular anti-analysis function because it can help detect if too much time has elapsed during code execution (possibly due to debugging activity). Interestingly, pestr output reveals this function name is located in the “.data” section, rather than the “.rdata” section where the IAT resides. We might hypothesize that this is an attempt by the developer to evade traditional import table analysis by manually calling this function during program execution. We can dig deeper by finding the reference to this string in IDA Pro:

Figure 7: IDA Pro string reference

While we will not dive into code analysis details in this post, Figure 7 makes it clear that the GetTickCount string reference is indeed used to call the function at runtime using LoadLibraryA and GetProcAddress.

readpe.py + pe-carv.py

readpe.py can output information such as PE header data, imports and exports. For this post, I’ll highlight its simple ability to detect an overlay. An overlay is data appended to the end of an executable (i.e., it falls outside of any data described in the PE header). Using the following command against a Neshta.A specimen, readpe.py can detect if an overlay exists:

Figure 8: readpe.py overlay output
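The overlay definition above translates into a short check: find where the last section's raw data ends and compare that with the file size. The sketch below does this with only the standard library. It is an assumption-laden illustration rather than readpe.py's actual logic; the function name overlay_offset is hypothetical, and the check deliberately ignores details such as an Authenticode certificate table, which also lives past the last section.

```python
import struct


def overlay_offset(data: bytes):
    """Return the file offset where an overlay starts, or None if there is none."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # COFF header follows the signature: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    section_table = e_lfanew + 24 + opt_size
    end_of_sections = 0
    for i in range(num_sections):
        # Each section header is 40 bytes; SizeOfRawData and PointerToRawData start at +16.
        raw_size, raw_ptr = struct.unpack_from("<II", data, section_table + i * 40 + 16)
        if raw_size:
            end_of_sections = max(end_of_sections, raw_ptr + raw_size)
    if 0 < end_of_sections < len(data):
        return end_of_sections
    return None
```

Calling overlay_offset(open(path, "rb").read()) on a sample would return the offset of the appended data, which you could then slice off and examine separately.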

Upon detecting an overlay, the next step is to evaluate the contents of this additional data. Malware often includes executable content in the overlay, so you might consider using a tool called pe-carv.py, which is purpose-built to carve out embedded PE files:

Figure 9: pe-carv.py extracted file

As shown in the figure above, pe-carv.py successfully extracted a file it called 1.exe, and we could proceed with further static file analysis to better understand this embedded content.
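For a sense of how carving embedded executables can work, here is a rough sketch that scans a buffer for “MZ” markers and keeps only those whose e_lfanew field points at a valid “PE” signature. This is one plausible approach and not a description of pe-carv.py's algorithm; the function name and the choice to skip offset 0 by default (the host file's own header) are assumptions for this example.

```python
import struct


def find_embedded_pe_offsets(data: bytes, start: int = 1):
    """Return offsets at or after start that look like the beginning of a PE file."""
    offsets = []
    pos = data.find(b"MZ", start)
    while pos != -1:
        # A DOS header is at least 0x40 bytes; e_lfanew sits at +0x3C.
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_sig = pos + e_lfanew
            # Keep the candidate only if e_lfanew leads to a PE signature.
            if data[pe_sig:pe_sig + 4] == b"PE\x00\x00":
                offsets.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return offsets
```

Each returned offset is only a candidate start; determining where the embedded file ends requires parsing its section table, and encoded or compressed payloads will not be found this way.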

Closing Thoughts

Static analysis can generate useful data about a file, but it can also help direct your reverse engineering efforts. While running the tools mentioned above may get you the information you need, I encourage you to check out the source code and customize it based on your preferences. In particular, if you’re just getting started with Python, tweaking this code can serve as a great introduction and motivate further study.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni


REMnux v6 for Malware Analysis (Part 1): VolDiff

Introduction

As you may have heard, Lenny Zeltser recently released version 6 of his popular REMnux malware analysis Linux distribution. I’m a big fan of REMnux because it reduces some of the overhead associated with malware analysis. Rather than spending hours downloading software, installing tools, and navigating through dependency hell, this distribution gives you access and exposure to numerous tools quickly. Once you see the value of a tool for yourself, you can then dive into the code and configuration files to develop a deeper understanding of its inner workings and customize it to your needs.

This is the first in a series of posts where I will highlight my favorite new additions to REMnux and why you should include them in your malware analysis process.

VolDiff

One quick, effective approach to assessing a suspicious file is to capture a snapshot of system activity, execute the file, capture another snapshot, and then compare the two system states to determine the impact of execution. The popular regshot tool uses this approach to log registry and file system changes after an event like double-clicking malware. VolDiff, included in REMnux v6, allows us to perform similar analysis against memory dumps. Developed by @aim4r, VolDiff is a Python script that uses the Volatility memory analysis framework to analyze two memory dumps and output the differences between them. When applied to memory analysis, this script will focus your attention on memory artifacts generated after, and possibly as a result of, code execution. This can expedite your analysis of large memory dumps to detect activity such as code injection and provide visibility into packed or obfuscated code. However, keep in mind that memory is in a state of flux, so changes included in the diff results are not necessarily caused by executing the suspect file.
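The core idea of comparing two system states can be sketched in a few lines: treat each plugin's output as a list of lines and keep only the lines that appear after execution. The example below is a conceptual illustration, not VolDiff's actual implementation; the function name and the sample process listings are hypothetical.

```python
def new_entries(baseline_lines, infected_lines):
    """Return lines present in the infected output but absent from the baseline."""
    seen = {line.strip() for line in baseline_lines}
    result = []
    for line in infected_lines:
        entry = line.strip()
        # Keep non-empty lines that were not in the baseline, preserving order.
        if entry and entry not in seen:
            result.append(entry)
    return result


if __name__ == "__main__":
    # Hypothetical, simplified process listings (name and PID only).
    baseline = ["svchost.exe 812", "explorer.exe 1500"]
    infected = ["svchost.exe 812", "explorer.exe 1500", "svchost.exe 2976"]
    print(new_entries(baseline, infected))
```

Real plugin output contains volatile fields such as memory offsets and timestamps, so a practical diff would need to normalize or drop those columns before comparing.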

VolDiff resides on REMnux in /opt/remnux-scripts/, but you can run it from anywhere since this location is included in the PATH environment variable. If you do not have the latest version of VolDiff (v2.1 at the time of this writing), you can update your remnux-scripts directory by running the commands sudo apt-get update and then sudo apt-get install remnux-scripts.

An Example with VolDiff 

Let’s explore the value of VolDiff with a malware sample. I used a file named funfile.exe, and if you want to follow along, you can download the sample here (password: infected).

A few words about my test environment – since it’s advisable to perform malware analysis within a virtual machine, I used VMware Fusion. For this analysis, I started 1) a REMnux v6 VM with host-only networking and 2) a 32-bit Windows 8.1 VM with host-only networking. Within the Windows VM, I configured the Default Gateway and Preferred DNS Server with REMnux’s IP address. Lastly, I verified connectivity by pinging each host from the other. Note: If you have dedicated more than a couple of GB of memory to your Windows VMs, consider decreasing this value or you may be waiting hours for VolDiff processing to complete.

Some initial behavioral analysis indicated that this sample generated network traffic to an IP address. Since our goal is to assess memory artifacts, I chose to launch several “fake” services in REMnux to encourage activity. Specifically, I ran the following from a REMnux terminal:

  • accept-all-ips start: This bash shell script written by Lenny Zeltser redirects all network traffic destined for an IP address to the REMnux VM.
  • inetsim: This tool simulates a variety of network services, including HTTP, HTTPS, FTP, and SMTP. If my suspect file expected to contact a web server, for example, I wanted it to do so to facilitate additional activity.

To compare memory dumps using VolDiff, we need to capture a memory image before and after infecting a sacrificial host. With VMware, one approach to obtaining a memory image is to use the snapshot feature. Whenever a snapshot is created, VMware saves a “.vmem” file that includes the contents of memory at the time the snapshot was created. This file can then be analyzed using a memory analysis tool like Volatility. To create the memory dumps VolDiff requires, I followed these steps:

  • I copied funfile.exe to the Windows VM desktop.
  • I created a VM snapshot and noted the new “.vmem” file name on my host.
  • In the Windows VM, I right-clicked funfile.exe and selected “Run as administrator” to execute the sample with admin rights.
  • After giving the sample a couple minutes to run, I created another VM snapshot and noted this second “.vmem” file name.

I then copied these files into REMnux for analysis. While there are several ways to do this, I chose to start the SSH server on REMnux and SCP the files into the VM. To ensure I did not confuse the two “.vmem” files, I renamed my baseline file to “baseline.vmem” and my second snapshot to “infected.vmem”.

To kick off VolDiff against my two memory dumps, I ran the command shown below. Note that the command requires the correct OS profile for the memory images.

Figure 1: VolDiff command to compare two memory dumps

VolDiff processed my 2 GB (each) memory images in about 45 minutes. The result was a directory of output, but the critical file to review is VolDiff-report.txt. This file contained the key differences between the two memory dumps. My entire output file can be viewed here, but let’s discuss some excerpts.

Figure 2: VolDiff malfind results

The output above shows new malfind results. The malfind Volatility plugin helps identify injected code, and in this case it discovered a suspicious memory segment within the svchost.exe process with PID 2976. Looking at the ASCII representation of the first few bytes of this segment, you may recognize the “MZ” string. This likely indicates we are looking at injected, executable code. It’s important to note that running malfind does sometimes result in hits even on a clean system; running the malfind plugin against my baseline image produced one hit. However, VolDiff’s diff operation focused my efforts only on new activity.

Let’s look at some more output:

Figure 3: VolDiff netscan results

The output above shows new netscan entries. The netscan Volatility plugin locates network artifacts in memory. Running this plugin against my “infected.vmem” alone revealed 57 connection artifacts. Since VolDiff highlights changes in the victim system’s state, it trimmed my analysis data set to only two connections, one of which I have included above. This output clearly shows that the suspicious svchost.exe (based on malfind output) established a TCP connection over port 443.

VolDiff also includes a --malware-checks option to look for anomalous activity in an infected memory dump. You can run this option against a single memory dump if you do not have a baseline, or you can simply add it to the command line to both perform a diff and check the infected memory dump for potentially malicious behavior:

Figure 4: VolDiff --malware-checks option

Much of the output mirrors the earlier VolDiff-report.txt, but it includes additional checks that compare the infected memory dump against characteristics of a known-good Windows system. You can view the entire output file here, but let’s look at one example included in the report:

Figure 5: VolDiff --malware-checks result excerpt

In this case, VolDiff indicates that svchost.exe is running in an unexpected session. Session 0 is reserved for system processes and services, and a legitimate svchost.exe process should be running in that session. However, the svchost.exe with PID 2976 is running in session 1, which is associated with a user session. In this way, VolDiff goes beyond simply diffing two memory snapshots and includes built-in heuristics to identify potentially malicious activity. At its core, this is an even more powerful diff operation, because it relies on certain absolutes (i.e., a legitimate svchost.exe always runs in session 0) and makes no assumptions about the state of your baseline image.
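A heuristic like the session check is straightforward to express once process data is in a structured form. The sketch below flags processes that are expected in session 0 but appear elsewhere. It is illustrative only and not VolDiff's code: the tuple layout, the function name, and the sample records (other than the svchost.exe with PID 2976 in session 1) are assumptions.

```python
# Process names expected to run only in session 0 (illustrative; extend as needed).
SESSION_ZERO_ONLY = {"svchost.exe"}


def unexpected_sessions(processes):
    """Return (name, pid, session) records for session-0 processes seen elsewhere."""
    flagged = []
    for name, pid, session in processes:
        # Compare case-insensitively, since process names may vary in case.
        if name.lower() in SESSION_ZERO_ONLY and session != 0:
            flagged.append((name, pid, session))
    return flagged


if __name__ == "__main__":
    # Hypothetical process records: (image name, PID, session ID).
    procs = [
        ("svchost.exe", 812, 0),
        ("svchost.exe", 2976, 1),
        ("explorer.exe", 1500, 1),
    ]
    print(unexpected_sessions(procs))
```

Because the rule depends only on what a legitimate process should look like, it works on a single memory image and does not need a baseline.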

In case you’re wondering, this sample has a 39/53 detection rate on VirusTotal. Microsoft identifies it as a Win32/Tofsee variant, a spambot that is commonly spread via email. As we suspected, it launches and injects executable code into svchost.exe and attempts to connect to IP addresses for command and control.

Closing Thoughts

Diffing two system states is a powerful malware analysis technique because it shines a spotlight on new activity. VolDiff, included in REMnux v6, uses this approach to focus your analysis on memory artifacts most likely associated with code execution. I encourage you to explore this and other REMnux v6 tools on your own, or join me at the upcoming FOR610 Reverse-Engineering Malware course in Virginia Beach this August.

-Anuj Soni

