Analyze files, not malware

Let’s dive right in. In the last post, I mentioned the value of reviewing the import address table (IAT) when performing static file analysis. Take a look at some IAT excerpts from three files:

[Figure: IAT excerpts from the three files]

Several functions may pique your interest, including:

  • IsDebuggerPresent and GetTickCount: functions that may be used to detect debugging activity.
  • RegCreateKeyW and RegSetValueExW: functions used to manipulate the registry, perhaps to configure persistence.
  • LoadLibraryW and GetProcAddress: functions used to call other functions at runtime, a strategy that hinders static file analysis.
  • FindResourceW and LoadResource: functions used to access embedded resources, where additional code may reside.

Let’s look behind the curtain:

  • File A = notepad.exe
  • File B = searchindexer.exe
  • File C = spoolsv.exe

These are all legitimate files found on a clean Windows 7 64-bit system.

This is not meant to be a trick, but rather a reminder. The rush of successfully identifying malware is one we all yearn for, but that glorious destination must be earned through careful analysis. This might be as simple as matching a suspect file’s hash against a known bad hash, or it might require more robust static, behavioral, and code analysis. Not all observations are created equal, so we must weigh the severity of each one (i.e., how definitively it indicates malicious behavior) and consider their cumulative value when deciding whether a file is malware. Files are innocent until proven guilty, and I challenge you to demonstrate, beyond a reasonable doubt, that a particular file is bad.

So how can you sharpen your ability to spot unusual characteristics that may indicate nefarious activity? As in all areas of incident identification and response, we need to understand the normal to discover the anomalous. Pick your favorite legitimate Windows programs and apply your file analysis process. You will likely identify characteristics that might otherwise seem alarming and, with practice, you will raise your threshold for what truly counts as suspicious.

Inspecting known good files can also help validate (or invalidate) indicators of potential compromise. Think you’ve discovered a group of API calls, a set of strings, or a particular PE file characteristic that only exists in malware? Search across a large sample of legitimate files to test your theory.
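
If you want to automate that sanity check, a few lines of Python with the pefile module will do. This is just a minimal sketch, assuming pefile is installed; the folder path and the IsDebuggerPresent example are placeholders for whatever indicator you are testing:

    import pathlib
    import pefile

    def imports_api(path, api=b"IsDebuggerPresent"):
        # Return True if the PE at 'path' imports the named API.
        try:
            pe = pefile.PE(str(path), fast_load=True)
            pe.parse_data_directories()  # populate DIRECTORY_ENTRY_IMPORT
            for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
                if any(imp.name == api for imp in entry.imports):
                    return True
        except pefile.PEFormatError:
            pass  # not a valid PE; skip it
        return False

    # Placeholder path: point this at any folder of known-good binaries.
    clean_dir = pathlib.Path("C:/Windows/System32")
    hits = [p.name for p in clean_dir.glob("*.exe") if imports_api(p)]
    print(f"{len(hits)} known-good files also import IsDebuggerPresent")

If a characteristic shows up in dozens of clean system binaries, treat it as context rather than a conviction.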

There is nothing wrong with identifying indications of evil during the file analysis process; that is arguably the point of initiating an investigation. However, it is critical to treat your suspicions about a file as hypotheses that you prove or disprove with empirical evidence. Otherwise, you might miscategorize a legitimate file as malware, and that not only reflects poorly on you if someone checks your work, but it also makes the file sad.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni

Anuj Soni is a Senior Incident Responder at Booz Allen Hamilton, where he leads intrusion investigations and performs forensic and malware analysis to investigate security incidents. He also teaches FOR610: Reverse-Engineering Malware for the SANS Institute. Anuj excels not only in delivering rigorous technical analysis, but also in process development, knowledge management, and team leadership to accelerate incident response efforts. Anuj presents at events including the U.S. Cyber Crime Conference, SANS DFIR Summit, and the Computer and Enterprise Investigations Conference (CEIC). He received his Bachelor's and Master's degrees from Carnegie Mellon University.


REMnux v6 for Malware Analysis (Part 2): Static File Analysis

Introduction

In this post, we’ll continue exploring some of the helpful capabilities included in REMnux v6. Be sure to regularly update your REMnux VM by running the command update-remnux.

Analyzing suspect files can be overwhelming because there are often numerous paths to explore, and as you continue to observe activity and gather data, the number of areas worth investigating seems to grow exponentially. One approach to guide your analysis is to focus first on answering key questions. Another (likely complementary) approach is to apply the scientific method, where you:

  1. Make an observation.
  2. Generate a hypothesis based on that observation.
  3. Test the hypothesis.
  4. Modify the hypothesis based on the outcome of the test and rerun the test.

Static file analysis, where you learn about a suspect file without launching it, can help generate observations that fuel this process.  As a reminder, static file analysis typically results in information such as file and section hashes, compile times, extracted strings, library and function dependencies, and digital signature information. Using the scientific method described above, your analysis of a suspect file may involve the following sequence of activities:

  1. As part of your static analysis process, you extract the ASCII strings from a file and observe the text “HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run” (a minimal extraction sketch follows this list).
  2. You hypothesize that the suspect file uses this registry key to maintain persistence on a victim machine.
  3. You run the sample within a Windows 7 virtual machine and observe that this registry key is never modified. You dig deeper via code analysis and discover that a Run key is only created if the victim is a Windows XP machine.
  4. You can now modify your hypothesis to specify the Windows XP caveat, rerun the test in a Windows XP VM, and confirm your theory. In doing so, you’ve performed focused analysis, learned about the sample’s persistence mechanism (which can be translated to an IOC), and identified an associated constraint.
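
The extraction in step 1 does not require anything fancy. Here is a minimal Python sketch of the idea, equivalent to running a strings utility against the file; the filename is a placeholder:

    import re

    def ascii_strings(path, min_len=6):
        # Runs of printable ASCII at least min_len characters long, like the 'strings' utility.
        data = open(path, "rb").read()
        pattern = rb"[\x20-\x7e]{%d,}" % min_len
        return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

    # Placeholder filename; flag anything that mentions the Run key.
    for s in ascii_strings("suspect.exe"):
        if "CurrentVersion\\Run" in s:
            print(s)

Seeing the Run key in the strings output is an observation, not proof of persistence; turning it into a conclusion is exactly what steps 2 through 4 are for.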

Static file analysis is challenging, not because it is technically difficult, but because it is so hard to resist double-clicking the sample immediately. I feel your pain; the double-click is my favorite part too. However, it is worth developing the discipline to complete a static file review before executing the sample, because it fosters methodical analysis and produces tangible results.

REMnux includes some great tools to perform static analysis, including the ones listed here. This post will highlight just a few of my favorites.

pecheck.py

pecheck.py, written by Didier Stevens, is a wrapper for the Python pefile module used to parse Windows PE files. Let’s explore this tool by analyzing the BACKSPACE backdoor malware described in FireEye’s APT 30 report. If you want to follow along, you can download the sample here (password: infected). As shown in the output below, running pecheck.py against the sample returns file hashes and file/section entropy calculations. Entropy is a measure of randomness, and more entropy indicates a higher likelihood of encoded or encrypted data. While this information is helpful, I want to focus on the “Dump Info:” section shown towards the end of the excerpt. This section basically runs the pefile dump_info() function, which parses the entire file and outputs, well, a lot of data (see the complete output here).


Figure 1: pecheck.py output
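
If you are curious where this data comes from, the underlying pefile calls are only a few lines. Here is a minimal sketch of the library that pecheck.py wraps (not pecheck.py itself); the filename is a placeholder:

    import pefile

    pe = pefile.PE("sample.exe")  # placeholder filename

    # Per-section entropy; values approaching 8.0 suggest packed or encrypted data.
    for section in pe.sections:
        name = section.Name.rstrip(b"\x00").decode(errors="replace")
        print(f"{name:10} entropy={section.get_entropy():.2f}")

    # The verbose parse that pecheck.py surfaces under "Dump Info:" (the full dump is long).
    print(pe.dump_info()[:1000])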

Among other information, the output includes the contents of the file’s Import Address Table (IAT), which represents the shared libraries (i.e., DLLs) and functions within those DLLs that the program relies upon:


Figure 2: pecheck.py Import Address Table (IAT) output

I like the <DLL>.<FUNCTION> format because 1) over time, it can help you remember which functions a DLL contains and 2) you can grep for the DLL name or function name and retrieve the entire line (not the case with output from other tools). In this particular excerpt, we can immediately see some Windows API calls that are often used for malicious purposes. For example, we see references to the CreateToolhelp32Snapshot, Process32First, and Process32Next functions commonly used by malware to capture a list of running processes and iterate through that list to enumerate activity or target specific programs. We could explore this hypothesis by using a debugger to set breakpoints on these API calls and determine if there is a certain process the code is looking for. Oh, and in case you’re wondering, the hint refers to the potential location of the function within the corresponding DLL – it’s an optimization that, in this case, is not helpful given that all values are zero.
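
If you want to generate a similar DLL.FUNCTION listing yourself, or feed it into another script, a rough pefile equivalent looks like this (the filename is a placeholder):

    import pefile

    pe = pefile.PE("sample.exe")  # placeholder filename
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        dll = entry.dll.decode()
        for imp in entry.imports:
            # Imports by ordinal carry no name; fall back to the ordinal number.
            name = imp.name.decode() if imp.name else f"ord({imp.ordinal})"
            print(f"{dll}.{name}")

Each line stands on its own, so grep works just as it does against the pecheck.py output. The ordinal fallback matters for the case described next.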

If a program imports a function by ordinal rather than by name, this is indicated clearly:


Figure 3: pecheck.py Import Address Table (IAT) output by ordinal

Note that since the above functions are imported by ordinal only, the function names (e.g., “ioctlsocket”) will not be listed in the strings output:


Figure 4: Grepping for Windows API

Beyond the IAT, pecheck.py output includes section hashes, version information, and resource information, and the tool can be pointed at a PEiD database to search for packer signatures. While pecheck.py may not be the first script you turn to, given the large volume of output, I prefer it to others because I can extract the information I want with grep searches or small modifications to the Python code. In addition, dump_info() sometimes hits parsing errors that may themselves reveal interesting anomalous characteristics of the target file.

pestr

pestr is part of the pev PE file analysis framework, and its primary purpose is to extract strings from Windows executable files. However, it goes beyond the traditional strings tool by providing options to show the offset of a string within a file and the section where it resides. For example, below are output excerpts after running pestr against the file analyzed above, using the --section option to print the section where the respective string is found (see complete output here):


Figure 5: pestr output #1


Figure 6: pestr output #2

Figure 5 shows the command executed and the beginning of the output. The first few strings are found in the PE header, so they are labeled as appearing in the “none” section. Figure 6 shows strings in the “.rdata” section, including DLL and Windows API function names. The “.rdata” section commonly contains the Import Address Table, which could explain the presence of these strings here. Looking at the pecheck.py output, we can confirm these strings are, in fact, present in the IAT.
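
You can make the same section determination without leaving Python, since pefile can map a raw file offset back to the section that contains it. A minimal sketch, with the filename and target string as placeholders:

    import pefile

    filename = "sample.exe"      # placeholder
    target = b"GetProcAddress"   # any string of interest

    data = open(filename, "rb").read()
    pe = pefile.PE(data=data)

    offset = data.find(target)
    if offset == -1:
        print("string not found")
    else:
        section = pe.get_section_by_offset(offset)
        where = section.Name.rstrip(b"\x00").decode() if section else "header/overlay"
        print(f"{target.decode()} found at offset {hex(offset)} in {where}")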

Perusing the remaining pestr output shows additional strings, including the following:


Figure 7: pestr output #3

Note the presence of GetTickCount, a Windows function that returns the number of milliseconds that have elapsed since the system was started. This is a popular anti-analysis function because it can help detect whether too much time has passed during code execution (possibly due to debugging activity). Interestingly, pestr output reveals this function name is located in the “.data” section, rather than the “.rdata” section where the IAT resides. We might hypothesize that this is an attempt by the developer to evade traditional import table analysis by resolving and calling this function manually during program execution. We can dig deeper by finding the reference to this string in IDA Pro:


Figure 8: IDA Pro string reference

While we will not dive into code analysis details in this post, Figure 8 makes it clear that the GetTickCount string is indeed used to resolve and call the function at runtime via LoadLibraryA and GetProcAddress.
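
For reference, the runtime-resolution pattern itself is only a couple of calls. The snippet below is a Windows-only Python/ctypes analog of what the sample's C code is doing, purely for illustration (it is not the sample's code):

    import ctypes
    from ctypes import wintypes

    # Windows-only: ctypes.WinDLL is not available on other platforms.
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.LoadLibraryA.restype = wintypes.HMODULE
    kernel32.LoadLibraryA.argtypes = [ctypes.c_char_p]
    kernel32.GetProcAddress.restype = ctypes.c_void_p
    kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, ctypes.c_char_p]

    # Resolve GetTickCount by name at runtime instead of importing it.
    hmod = kernel32.LoadLibraryA(b"kernel32.dll")
    addr = kernel32.GetProcAddress(hmod, b"GetTickCount")
    print(hex(addr))

Notice that the library and function names end up as plain strings in the program's data rather than as entries in the import table, which is exactly the artifact pestr surfaced.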

readpe.py + pe-carv.py

readpe.py can output information such as PE header data, imports and exports. For this post, I’ll highlight its simple ability to detect an overlay. An overlay is data appended to the end of an executable (i.e., it falls outside of any data described in the PE header). Using the following command against a Neshta.A specimen, readpe.py can detect if an overlay exists:


Figure 9: readpe.py overlay output
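
pefile exposes the same check if you want to verify the finding independently. A minimal sketch, with a placeholder filename standing in for the Neshta.A specimen:

    import os
    import pefile

    filename = "neshta_sample.exe"  # placeholder
    pe = pefile.PE(filename)

    start = pe.get_overlay_data_start_offset()  # None if there is no overlay
    if start is None:
        print("no overlay")
    else:
        size = os.path.getsize(filename) - start
        print(f"overlay starts at {hex(start)}: {size} bytes of appended data")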

Upon detecting an overlay, the next step is to evaluate the contents of this additional data. Malware often includes executable content in the overlay, so you might consider using a tool called pe-carv.py, which is purpose-built to carve out embedded PE files:


Figure 10: pe-carv.py extracted file

As shown in the figure above, pe-carv.py successfully extracted a file it called 1.exe, and we could proceed with further static file analysis to better understand this embedded content.
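
To give a sense of what such a carver has to do, here is a simplified sketch of the general idea (this is not pe-carv.py's actual logic, and the filename is again a placeholder): scan the overlay for "MZ" markers, try to parse each candidate as a PE, and write out the ones that validate.

    import pefile

    data = open("neshta_sample.exe", "rb").read()  # placeholder filename
    overlay = pefile.PE(data=data).get_overlay() or b""

    pos, count = 0, 0
    while (pos := overlay.find(b"MZ", pos)) != -1:
        try:
            embedded = pefile.PE(data=overlay[pos:])  # raises if this is not a valid PE
            carved = embedded.trim()                  # drop any data past the embedded PE's own end
            with open(f"carved_{count}.exe", "wb") as out:
                out.write(carved)
            count += 1
            pos += len(carved)
        except pefile.PEFormatError:
            pos += 2  # false positive; keep scanning
    print(f"carved {count} embedded file(s)")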

Closing Thoughts

Static analysis can generate useful data about a file, but it can also help direct your reverse engineering efforts. While running the tools mentioned above may get you the information you need, I encourage you to check out the source code and customize it based on your preferences. In particular, if you’re just getting started with Python, tweaking this code can serve as a great introduction and motivate further study.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni


REMnux v6 for Malware Analysis (Part 1): VolDiff

Introduction

As you may have heard, Lenny Zeltser recently released version 6 of his popular REMnux malware analysis Linux distribution. I’m a big fan of REMnux because it reduces some of the overhead associated with malware analysis. Rather than spending hours downloading software, installing tools, and navigating through dependency hell, this distribution gives you access and exposure to numerous tools quickly. Once you see the value of a tool for yourself, you can then dive into the code and configuration files to develop a deeper understanding of its inner workings and customize it to your needs.

This is the first in a series of posts where I will highlight my favorite new additions to REMnux and why you should include them in your malware analysis process.

VolDiff

One quick, effective approach to assessing a suspicious file is to capture a snapshot of system activity, execute the file, capture another snapshot, and then compare the two system states to determine the impact of execution. The popular regshot tool uses this approach to log registry and file system changes after an event like double-clicking malware. VolDiff, included in REMnux v6, allows us to perform similar analysis against memory dumps. Developed by @aim4r, VolDiff is a Python script that uses the Volatility memory analysis framework to analyze two memory dumps and output the differences between them. When applied to memory analysis, this script will focus your attention on memory artifacts generated after, and possibly as a result of, code execution. This can expedite your analysis of large memory dumps to detect activity such as code injection and provide visibility into packed or obfuscated code. However, keep in mind that memory is in a state of flux, so changes included in the diff results are not necessarily caused by executing the suspect file.
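
To make the diffing idea concrete, here is a rough sketch of the kind of comparison VolDiff automates, using the Volatility 2 command-line tool. This is not VolDiff's own code, and the image names and profile below are placeholders for your environment:

    import subprocess

    def plugin_lines(image, plugin, profile="Win8SP1x86"):
        # Run a Volatility 2 plugin against a memory image and return its output lines.
        result = subprocess.run(
            ["vol.py", "-f", image, f"--profile={profile}", plugin],
            capture_output=True, text=True)
        return set(result.stdout.splitlines())

    baseline = plugin_lines("baseline.vmem", "pslist")
    infected = plugin_lines("infected.vmem", "pslist")

    # Anything present only in the infected image deserves a closer look.
    for line in sorted(infected - baseline):
        print(line)

VolDiff runs many plugins this way against both dumps and organizes the differences into a single report, which is what makes it so much faster than doing the comparison by hand.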

VolDiff resides on REMnux in /opt/remnux-scripts/, but you can run it from anywhere since this location is included in the PATH environment variable. If you do not have the latest version of VolDiff (v2.1 at the time of this writing), you can update your remnux-scripts directory by running the commands sudo apt-get update and then sudo apt-get install remnux-scripts.

An Example with VolDiff 

Let’s explore the value of VolDiff with a malware sample. I used a file named funfile.exe, and if you want to follow along, you can download the sample here (password: infected).

A few words about my test environment: since it’s advisable to perform malware analysis within a virtual machine, I used VMware Fusion. For this analysis, I started two VMs with host-only networking: 1) a REMnux v6 VM and 2) a 32-bit Windows 8.1 VM. Within the Windows VM, I configured the Default Gateway and Preferred DNS Server with REMnux’s IP address. Lastly, I verified connectivity by pinging each host from the other. Note: if you dedicated more than a couple of GB of memory to your Windows VM, consider decreasing this value, or you may be waiting hours for VolDiff processing to complete.

Some initial behavioral analysis indicated that this sample generated network traffic to an IP address. Since our goal is to assess memory artifacts, I chose to launch several “fake” services in REMnux to encourage activity. Specifically, I ran the following from a REMnux terminal:

  • accept-all-ips start: This bash shell script written by Lenny Zeltser redirects all network traffic destined for an IP address to the REMnux VM.
  • inetsim: This tool simulates a variety of network services, including HTTP, HTTPS, FTP, and SMTP. If my suspect file expected to contact a web server, for example, I wanted it to do so to facilitate additional activity.

To compare memory dumps using VolDiff, we need to capture a memory image before and after infecting a sacrificial host. With VMware, one approach to obtaining a memory image is to use the snapshot feature. Whenever a snapshot is created, VMware saves a “.vmem” file that includes the contents of memory at the time the snapshot was created. This file can then be analyzed using a memory analysis tool like Volatility. To create the memory dumps VolDiff requires, I followed these steps:

  • I copied funfile.exe to the Windows VM desktop.
  • I created a VM snapshot and noted the new “.vmem” file name on my host.
  • In the Windows VM, I right-clicked funfile.exe and selected “Run as administrator” to execute the sample with admin rights.
  • After giving the sample a couple minutes to run, I created another VM snapshot and noted this second “.vmem” file name.

I then copied these files into REMnux for analysis. While there are several ways to do this, I chose to start the SSH server on REMnux and SCP the files into the VM. To ensure I did not confuse the two “.vmem” files, I renamed my baseline file to “baseline.vmem” and my second snapshot to “infected.vmem”.

To kick off VolDiff against my two memory dumps, I ran the command shown below. Note that the command requires the correct OS profile for the memory images.


Figure 1: VolDiff command to compare two memory dumps

VolDiff processed my 2 GB (each) memory images in about 45 minutes. The result was a directory of output, but the critical file to review is VolDiff-report.txt. This file contained the key differences between the two memory dumps. My entire output file can be viewed here, but let’s discuss some excerpts.


Figure 2: VolDiff malfind results

The output above shows new malfind results. The malfind Volatility plugin helps identify injected code, and in this case it discovered a suspicious memory segment within the svchost.exe process with PID 2976. Looking at the ASCII representation of the first few bytes of this segment, you may recognize the “MZ” string. This likely indicates we are looking at injected, executable code. It’s important to note that running malfind does sometimes result in hits even on a clean system; running the malfind plugin against my baseline image produced one hit. However, VolDiff’s diff operation focused my efforts only on new activity.

Let’s look at some more output:


Figure 3: VolDiff netscan results

The output above shows new netscan entries. The netscan Volatility plugin locates network artifacts in memory. Running this plugin against my “infected.vmem” alone revealed 57 connection artifacts. Since VolDiff highlights changes in the victim system’s state, it trimmed my analysis data set to only two connections, one of which I have included above. This output clearly shows that the suspicious svchost.exe (based on malfind output) established a TCP connection over port 443.

VolDiff also includes a --malware-checks option to look for anomalous activity in an infected memory dump. You can run this option against a single memory dump if you do not have a baseline, or you can simply add it to the command line to both perform a diff and check the infected memory dump for potentially malicious behavior:


Figure 4: VolDiff --malware-checks option

Much of the output mirrors the earlier VolDiff-report.txt, but it includes additional checks that compare the infected memory dump against characteristics of a known good Windows system. You can view the entire output file here, but let’s look at one example included in the report:


Figure 5: VolDiff --malware-checks result excerpt

In this case, VolDiff indicates that svchost.exe is running in an unexpected session. Session 0 is reserved for system processes and services, and a legitimate svchost.exe process should be running in that session. However, the svchost.exe with PID 2976 is running in session 1, which is associated with a user session. In this way, VolDiff goes beyond simply diffing two memory snapshots and includes built-in heuristics to identify potentially malicious activity. At its core, this is an even more powerful diff operation, because it relies on certain absolutes (i.e., a legitimate svchost.exe always runs in session 0) and makes no assumptions about the state of your baseline image.

In case you’re wondering, this sample has a 39/53 detection rate on VirusTotal. Microsoft identifies it as a Win32/Tofsee variant, a spambot that is commonly spread via email. As we suspected, it launches and injects executable code into svchost.exe and attempts to connect to IP addresses for command and control.

Closing Thoughts

Diffing two system states is a powerful malware analysis technique because it shines a spotlight on new activity. VolDiff, included in REMnux v6, uses this approach to focus your analysis on memory artifacts most likely associated with code execution. I encourage you to explore this and other REMnux v6 tools on your own, or join me at the upcoming FOR610 Reverse-Engineering Malware course in Virginia Beach this August.

-Anuj Soni


Key Questions to Guide Malware Analysis

Introduction

Performing malware analysis during incident response can be an exciting, creative exercise. But it can also be a nebulous process, with no clear beginning and no defined end state. While there are numerous articles, books, and tools that cover the topic, the sheer volume of resources can sometimes lead to decision fatigue, and the question becomes: What do I do next? To focus your attention and guide your analysis, begin by answering the four key questions set forth below.

Ask Yourself

To be clear, my intent is not to create a comprehensive list of questions, but to highlight the ones that will yield the most value. If you work on answering these questions first, you will stay on task, make real progress, and better understand the next few steps in the context of your specific incident.

1) What are the artifacts of execution?

This question will fuel your static and behavioral analysis of the sample. Your precise goal is to document activity on the file system, in memory, and across the network. This includes launched processes, created and deleted files, modified registry entries, and command and control network traffic. Assume the malware sample has the highest level of privilege on your network and has access to all the local and online resources it needs. Also, consider interacting with your analysis environment by launching enterprise applications, browsing to common sites, and rebooting the machine to facilitate activity. Be sure to record any activity you observe.

2) What is the potential impact of code execution?

This question expands upon the first question by requiring you to dig deeper and piece together observed activity to determine functionality and purpose. For example, perhaps you observed a file created during behavioral analysis. You must now determine what this file is used for. Does it log keystrokes? Perhaps it stores encoded configuration data that the malware relies upon. Answering this question often requires iterative testing (an important reason to use virtual machines and create snapshots throughout analysis). Reaching a solution may be as simple as completing a few Google searches or may involve more complex code analysis.

3) What is the potential impact of code execution in your environment?

Notice the difference between this question and the second one. While understanding the absolute potential impact is important, in the context of an enterprise security incident, your management is most concerned about the impact on the corporate network. Being able to answer this question, and to distinguish it from the previous one, is what separates a shining, proficient malware analyst from a reckless one.

4) What are the sample’s key host and network indicators of compromise (IOCs)?

Review your artifacts of execution, including registry keys, file names and locations, hashes, C&C specifics, and strings, to highlight key information that will allow you to seek out similar activity across the network. This information will not only expedite the detection of other compromised machines on the network, but it will also feed into valuable threat intelligence for your organization and any information-sharing partners.

Final Thoughts

As you try to answer these questions, remember that malware analysis is an iterative process. You may not answer each question in its entirety, with 100% certainty, before moving on to the next. This is why performing malware analysis is similar to practicing an art: not because it is indescribable or intangible (those are usually symptoms of a poor process), but because it requires patience and the discipline to know which approaches are working, which ones are not, when you need to start over, and when you need to step away and return to the problem at another time.

Clearly, there are other important questions to answer, but investigating the ones listed above will get you moving in the right direction, and answering them as quickly as possible will arm you with the information management often needs once an incident kicks off.

-Anuj Soni
