Python for Malware Analysis – Getting Started

Introduction

Improving your Python programming skills is likely on your to-do list – just like cleaning your closet, painting that wall, or tightening that loose screw (you know which one I’m talking about).

Scripting, in general, is a useful skill to have across most security disciplines.  Writing a script can help you automate a menial task, scale your analysis to large volumes of data, and share your work.

Although there are multiple programming languages to choose from, Python is a popular choice because, among other reasons, it is cross-platform and relatively easy to read and write. Many existing open-source security tools are also written in Python, so learning this language helps you better understand existing capabilities.

This blog post introduces Python programming for Portable Executable (PE) file analysis. In this context, a script can enable you to quickly parse an individual file and extract key characteristics, or scale that activity across numerous files to help prioritize work.

Note that this post assumes the reader has had some basic exposure to Python and programming concepts.

Learning Python

With some basic programming skills, it’s possible to improve your knowledge of Python by simply reviewing existing code and making changes as needed. While tweaking code may yield the desired results in some cases, many will likely benefit from a more formal introduction to the language. A quick online search will reveal many freely available written and video Python tutorials. For a structured, interactive introduction, I recommend Codecademy. If you’re looking for a more rigorous, immersive Python learning experience, consider the SANS SEC573 “Automating Information Security with Python” course (full disclosure, I’m a SANS Certified Instructor).

Existing Tools

There are many Python-based malware analysis tools you can use today for static file analysis.

These tools produce useful output and serve as excellent starting points for understanding Python. By simply viewing the source code and performing research as necessary, you can learn from what the authors wrote and modify the code to serve your own purpose. However, as you build experience in technical analysis, you will likely encounter scenarios where existing tools do not meet your needs, and a customized solution must be developed. Rest assured, these cases do not require you to write code from scratch. Instead, you can rely upon existing Python libraries to extract data and manipulate output in a way specific to your needs.

A popular, long-standing library for PE file analysis is aptly called pefile. This module provides easy access to the structure of a portable executable. Another fairly recent and more versatile cross-platform library is called Library to Instrument Executable Formats (LIEF), and it includes a Python module for PE file analysis (documented here).

This blog post will focus on using Python 2 and pefile for file analysis. Note that pefile is a third-party module, not one built into a standard Python install. As a result, you may have to install it first; try pip install pefile.

Exploring pefile

For our environment, we will use the REMnux malware analysis Linux distribution, which you can download here. We begin by launching the Python interactive shell to explore the pefile module and write some initial code. Rather than diving straight into creating a script, the interactive shell is a great way to learn about available modules and perform quick testing. Simply type python at the terminal and you’ll see a prompt similar to the following:

$ python
>>>
Next, import pefile to make use of its functionality:

>>> import pefile

Let’s explore this module by viewing its help information:

>>> help(pefile)

In addition to an overview of the module, the help output describes the classes contained within it. Scrolling down provides information about each class. For now, we will only focus on the PE class.


The description tells us that this class will give us access to the structure of a PE file, which is precisely what we need for our Windows file analysis. The output also explains how to create an instance of the PE class. Let’s read in a file for testing. For this post, we’ll use an emotet sample.

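Creating the instance is a one-liner. The filename below is a stand-in, since the sample’s actual name is not reproduced here:

>>> pe = pefile.PE("emotet_sample.exe")   # path to your own sample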

We can return to the help menu to read more about the methods and attributes of the PE class. Alternatively, we can view a summary of this information by typing dir(pefile.PE):

>>> dir(pefile.PE)

There is a lot of text here, and much of it may not make sense depending on your prior exposure to PE file analysis. However, let’s look for some basic terms we may recognize. We see references to multiple methods beginning with “get_” that are helpful for collecting some basic static information about a file. For example, get_imphash() returns an MD5 hash of the Import Address Table (IAT). Let’s give this a try using our file instance.

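On our pe instance, the call is simply:

>>> pe.get_imphash()   # returns the import table hash as an MD5 hex string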

The get_imphash() method worked as expected, providing the file’s import table hash.

Another “get_” function I find valuable is get_warnings(). When pefile parses a Windows executable, it may encounter errors along the way. The get_warnings() function returns a list of warnings generated as the PE file is processed. Security analysis is all about investigating anomalies, so this output can reveal useful starting points for further review. For example, this function’s output may indicate the file is obfuscated, even if the specific packer cannot be identified by common tools that look for packer signatures (e.g., ExeInfo or PEiD). In this particular case, however, executing the function did not return any warnings:

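Since parsing generated no warnings for this sample, the call simply returns an empty list:

>>> pe.get_warnings()
[]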

Let’s continue our journey with pefile and extract other static information often reviewed during initial malware analysis. For example, how can we use pefile to understand which DLLs and functions are imported by this executable? To answer this question, we will again use the built-in help() system with some old-fashioned trial and error. This methodology can be used with any well-documented Python module.

First, let’s review our options by learning more about the PE class. We can type help(pefile.PE) and scroll through the output:

>>> help(pefile.PE)

We see references to many “DIRECTORY_ENTRY_” attributes, which point to the location of key file components. Since we’re interested in imports, we will focus on DIRECTORY_ENTRY_IMPORT, which is described as a list of ImportDescData instances. Let’s begin by iterating through this list to see what information it provides:

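A simple loop over this attribute shows what the list holds:

>>> for entry in pe.DIRECTORY_ENTRY_IMPORT:
...     print(entry)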

Just as the help output specified, we see a list of ImportDescData objects. What do these objects represent? We will return to help again and type help(pefile.ImportDescData):

>>> help(pefile.ImportDescData)

The help output shows that this structure contains the name of the DLL and a list of imported symbols. This sounds like the information we need. Let’s again iterate to confirm:

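One more loop, this time descending into each entry’s imports list:

>>> for entry in pe.DIRECTORY_ENTRY_IMPORT:
...     for imp in entry.imports:
...         print(imp)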

We’re making progress, but we have a new structure to investigate. We type help(pefile.ImportData):

>>> help(pefile.ImportData)

For now, we will just focus on imports by name, so the name attribute should have the information we need. Let’s incorporate this into our code and make the output a bit more readable.

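One way to write that loop under Python 2 (where pefile returns these names as plain strings) looks like the following sketch:

>>> for entry in pe.DIRECTORY_ENTRY_IMPORT:
...     print(entry.dll)
...     for imp in entry.imports:
...         if imp.name:                  # None when a function is imported by ordinal only
...             print("\t" + imp.name)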

Success! This code provided us with the name of each imported DLL and its corresponding imported function names. We could make this output more elegant, but the information we need is here.

Scaling

As discussed in the Introduction, automating work with a script enables you to scale a task across a larger volume of data. The individual file analysis performed above has its place, but if your day-to-day job involves malware analysis, you may have hundreds or thousands of files to sift through before choosing one for closer review. In these scenarios, extracting key information from all files allows you to group and prioritize samples for more efficient analysis.

Let’s again consider a file’s imphash. Across a large number of samples, grouping by imphash makes it easier to identify similar functionality or a common packer/packaging tool used to generate the binary. To explore this idea, we will write a small script to extract the imphash from a directory of files. The code should accomplish the following tasks:

  1. Create a list of all files in the directory (full path).
  2. Open an XLSX file for writing (I often use Excel for easy viewing/sorting, but you can certainly output to CSV or, even better, write this information to a database).
  3. Calculate and write each file’s sha256 hash and imphash to the XLSX file.
  4. Autofilter the data.

Below is one way to approach these tasks.

#!/usr/bin/env python
import sys,os
import pefile
import hashlib
import xlsxwriter

if __name__ == "__main__":

	#Identify specified folder with suspect files
	dir_path = sys.argv[1]

	#Create a list of files with full path
	file_list = []
	for folder, subfolder, files in os.walk(dir_path):
		for f in files:
			full_path = os.path.join(folder, f)
			file_list.append(full_path)

	#Open XLSX file for writing
	file_name = "pefull_output.xlsx"
	workbook = xlsxwriter.Workbook(file_name)
	bold = workbook.add_format({'bold':True})
	worksheet = workbook.add_worksheet()

	#Write column headings
	row = 0
	worksheet.write('A1', 'SHA256', bold)
	worksheet.write('B1', 'Imphash', bold)
	row += 1

	#Iterate through file_list to calculate imphash and sha256 file hash
	for item in file_list:

		#Get sha256
		fh = open(item, "rb")
		data = fh.read()
		fh.close()
		sha256 = hashlib.sha256(data).hexdigest()

		#Get import table hash
		pe = pefile.PE(item)
		ihash = pe.get_imphash()			 

		#Write hashes to doc
		worksheet.write(row, 0, sha256)
		worksheet.write(row, 1, ihash)
		row += 1

	#Autofilter the xlsx file for easy viewing/sorting
	worksheet.autofilter(0, 0, row - 1, 1)
	workbook.close()

I titled the above script pe_stats.py and ran it against a directory named “suspect_files” with the command python pe_stats.py suspect_files. To populate the target directory, I downloaded 100 files with high detection rates from VirusTotal (specifically, I used the basic VirusTotal Intelligence query “type:peexe positives:50+”).


Opening the output in Microsoft Excel, a quick glance at the first few rows immediately reveals a pattern in the imphash values. As a next step, perhaps you will investigate the largest cluster of import table hashes to understand why these groups of files have the same imphash. You may also revisit the pefile library documentation to explore additional static characteristics worth including in this spreadsheet. With more detail, this document could help you triage and prioritize samples for analysis. I leave these tasks to you for further exploration.
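
As one starting point for that exploration, the sketch below (independent of pe_stats.py, but walking the same directory) counts how many samples share each imphash so the largest clusters stand out immediately:

import os
import sys
from collections import Counter
import pefile

dir_path = sys.argv[1]
file_list = [os.path.join(folder, f)
             for folder, subfolders, files in os.walk(dir_path)
             for f in files]

imphash_counts = Counter()
for item in file_list:
    try:
        imphash_counts[pefile.PE(item).get_imphash()] += 1
    except pefile.PEFormatError:
        continue    # skip anything pefile cannot parse as a PE

# Print the most common import table hashes first
for ihash, count in imphash_counts.most_common():
    print("%s\t%d" % (ihash, count))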

Conclusion

This post provided an initial approach to analyzing PE files using Python. Most importantly, it walked through how to use the built-in Python help feature and some basic knowledge of PE files to systematically explore a file’s characteristics and then scale that process to a larger set of files.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

Analyze files, not malware

Let’s dive right in. In the last post, I mentioned the value of reviewing the import address table (IAT) when performing static file analysis. Consider the IAT excerpts from three files, which we’ll call File A, File B, and File C.


Several functions may pique your interest, including:

  • IsDebuggerPresent and GetTickCount: functions that may be used to detect debugging activity.
  • RegCreateKeyW, RegSetValueExW: functions used to manipulate the registry, perhaps to configure persistence.
  • LoadLibraryW and GetProcAddress: functions used to call other functions at runtime, a strategy that hinders static file analysis.
  • FindResourceW and LoadResource: functions used to access embedded resources, where additional code may reside.

Let’s look behind the curtain:

  • File A = notepad.exe
  • File B = searchindexer.exe
  • File C = spoolsv.exe

These are all legitimate files found on a clean Windows 7 64-bit system.

This is not meant to be a trick, but instead a reminder. The rush of successfully identifying malware is one we all yearn for, but that glorious destination must be earned through careful analysis. This might be as simple as matching a suspect file’s hash with a known bad file hash, or it might require more robust static, behavioral and code analysis. All observations are not created equal, so we must weigh the severity of each one (i.e., how definitively it indicates malicious behavior) and consider their cumulative value when deciding if a file is malware. Files are innocent until proven guilty, and I challenge you to demonstrate, beyond a reasonable doubt, that a particular file is bad.

So how can you sharpen your ability to spot unusual characteristics that may indicate nefarious activity? As in all areas of incident identification and response, we need to understand the normal to discover the anomalous. Pick your favorite legitimate Windows programs and apply your file analysis process. You will likely identify characteristics that might otherwise seem alarming, and with practice this will recalibrate your sense of which characteristics are truly suspicious.

Inspecting known good files can also help validate (or invalidate) indicators of potential compromise. Think you’ve discovered a group of API calls, a set of strings, or a particular PE file characteristic that only exists in malware? Search across a large sample of legitimate files to test your theory.
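
As a hedged sketch of that kind of check, the snippet below walks a directory of known-good executables (the directory path and the API name are placeholders) and reports which legitimate files import a function you suspect is “malware-only”:

import os
import sys
import pefile

SUSPECT_API = "IsDebuggerPresent"   # the indicator under test (placeholder)
clean_dir = sys.argv[1]             # e.g., a folder of known-good Windows binaries

for folder, subfolders, files in os.walk(clean_dir):
    for f in files:
        path = os.path.join(folder, f)
        try:
            pe = pefile.PE(path, fast_load=True)
            pe.parse_data_directories()
        except pefile.PEFormatError:
            continue                # not a PE file, skip it
        for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
            if any(imp.name == SUSPECT_API for imp in entry.imports):
                print(path)         # a legitimate file also imports the "suspicious" API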

There is nothing wrong with identifying indications of evil during the file analysis process, and that’s arguably the point of initiating an investigation. However, it is critical to view your suspicions about a file as hypotheses that you prove or disprove based on empirical evidence. Otherwise, you might miscategorize a legitimate file as malware, and that not only reflects poorly on you if someone checks your work, it makes the file sad too.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni


REMnux v6 for Malware Analysis (Part 2): Static File Analysis

Introduction

In this post, we’ll continue exploring some of the helpful capabilities included in REMnux v6. Be sure to regularly update your REMnux VM by running the command update-remnux.

Analyzing suspect files can be overwhelming because there are often numerous paths to explore, and as you continue to observe activity and gather data, the additional areas of analysis seem to multiply. One approach to guide your analysis is to focus first on answering key questions. Another (likely complementary) approach is to apply the scientific method, where you:

  1. Make an observation.
  2. Generate a hypothesis based on that observation.
  3. Test the hypothesis.
  4. Modify the hypothesis based on the outcome of the test and rerun the test.

Static file analysis, where you learn about a suspect file without launching it, can help generate observations that fuel this process.  As a reminder, static file analysis typically results in information such as file and section hashes, compile times, extracted strings, library and function dependencies, and digital signature information. Using the scientific method described above, your analysis of a suspect file may involve the following sequence of activities:

  1. As part of your static analysis process, you extract the ASCII strings from a file and observe the text “HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run”.
  2. You hypothesize that the suspect file uses this registry key to maintain persistence on a victim machine.
  3. You run the sample within a Windows 7 virtual machine and realize that this registry key is never modified. You dig deeper via code analysis and realize a Run key is only created if the victim is a Windows XP machine.
  4. You can now modify your hypothesis to specify the Windows XP caveat, rerun the test in a Windows XP VM, and confirm your theory. In doing so, you’ve performed focused analysis, learned about the sample’s persistence mechanism (which can be translated to an IOC), and identified an associated constraint.

Static file analysis is challenging, not because it is technically difficult, but because it is so hard to resist double-clicking immediately. I feel your pain, the double-click is my favorite part too. However, it is worth developing the discipline to complete a static file review before executing the sample because it fosters methodical analysis and produces tangible results.

REMnux includes some great tools to perform static analysis, including the ones listed here. This post will highlight just a few of my favorites.

pecheck.py

pecheck.py, written by Didier Stevens, is a wrapper for the Python pefile module used to parse Windows PE files. Let’s explore this tool by analyzing the BACKSPACE backdoor malware described in FireEye’s APT 30 report. If you want to follow along, you can download the sample here (password: infected). As shown in the output below, running pecheck.py against the sample returns file hashes and file/section entropy calculations. Entropy is a measure of randomness, and more entropy indicates a higher likelihood of encoded or encrypted data. While this information is helpful, I want to focus on the “Dump Info:” section shown towards the end of the excerpt. This section basically runs the pefile dump_info() function, which parses the entire file and outputs, well, a lot of data (see the complete output here).


Figure 1: pecheck.py output

Among other information, the output includes the contents of the file’s Import Address Table (IAT), which represents the shared libraries (i.e., DLLs) and functions within those DLLs that the program relies upon:


Figure 2: pecheck.py Import Address Table (IAT) output

I like the <DLL>.<FUNCTION> format because 1) over time, it can help you remember which functions a DLL contains and 2) you can grep for the DLL name or function name and retrieve the entire line (not the case with output from other tools). In this particular excerpt, we can immediately see some Windows API calls that are often used for malicious purposes. For example, we see references to the CreateToolhelp32Snapshot, Process32First, and Process32Next functions commonly used by malware to capture a list of running processes and iterate through that list to enumerate activity or target specific programs. We could explore this hypothesis by using a debugger to set breakpoints on these API calls and determine if there is a certain process the code is looking for. Oh, and in case you’re wondering, the hint refers to the potential location of the function within the corresponding DLL – it’s an optimization that, in this case, is not helpful given that all values are zero.
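
If you want to pull just those entries out programmatically, a few lines of pefile will do it (a sketch; the local filename backspace.exe is hypothetical):

import pefile

TARGET_APIS = {"CreateToolhelp32Snapshot", "Process32First", "Process32Next"}

pe = pefile.PE("backspace.exe")     # hypothetical local copy of the sample
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    for imp in entry.imports:
        if imp.name in TARGET_APIS:
            print("%s.%s" % (entry.dll, imp.name))   # mirrors the <DLL>.<FUNCTION> format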

If a program imports a function by ordinal rather than by name, this is indicated clearly:


Figure 3: pecheck.py Import Address Table (IAT) output by ordinal

Note that since the above functions are imported by ordinal only, the function names (e.g., “ioctlsocket”) will not be listed in the strings output:


Figure 4: Grepping for Windows API

Beyond the IAT, pecheck.py also outputs section hashes, version information, and resource information, and it can be configured with a PEiD database to search for packer signatures. While pecheck.py may not be the first script you turn to due to the large volume of output, I prefer it to others because I can extract the information I desire based on grep searches or modifications to the Python code. In addition, dump_info() sometimes results in parsing errors that may reveal other interesting anomalous characteristics associated with the target file.

pestr

pestr is part of the pev PE file analysis framework, and its primary purpose is to extract strings from Windows executable files. However, it goes beyond the traditional strings tool by providing options to show the offset of a string within a file and the section where it resides. For example, below are output excerpts after running pestr against the file analyzed above, using the --section option to print the section where the respective string is found (see complete output here):


Figure 5: pestr output #1


Figure 6: pestr output #2

Figure 5 shows the command executed and the beginning of the output. The first few strings are found in the PE header, so they are labeled as appearing in the “none” section. Figure 6 shows strings in the “.rdata” section, including DLL and Windows API function names. The “.rdata” section commonly contains the Import Address Table, which could explain the presence of these strings here. Looking at the pecheck.py output, we can confirm these strings are, in fact, present in the IAT.

Perusing the remaining pestr output shows additional strings, including the following:


Figure 7: pestr output #3

Note the presence of GetTickCount, a Windows function that returns the number of milliseconds that have passed since the system was started. This is a popular anti-analysis function because it can help detect if too much time has elapsed during code execution (possibly due to debugging activity). Interestingly, pestr output reveals this function name is located in the “.data” section, rather than the “.rdata” section where the IAT resides. We might hypothesize that this is an attempt by the developer to evade traditional import table analysis by manually calling this function during program execution. We can dig deeper by finding the reference to this string in IDA Pro:


Figure 8: IDA Pro string reference

While we will not dive into code analysis details in this post, Figure 8 makes it clear that the GetTickCount string reference is indeed used to call the function at runtime using LoadLibraryA and GetProcAddress.
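
We could also reproduce pestr’s section attribution with pefile: find the file offset of the string, then ask pefile which section contains that offset. A minimal sketch (again assuming a hypothetical local copy named backspace.exe):

import pefile

path = "backspace.exe"                      # hypothetical local copy of the sample
data = open(path, "rb").read()
offset = data.find("GetTickCount")          # file offset of the ASCII string
if offset != -1:
    pe = pefile.PE(path)
    section = pe.get_section_by_offset(offset)
    if section is not None:
        print(section.Name.rstrip("\x00"))  # prints the section name, e.g., .data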

readpe.py + pe-carv.py

readpe.py can output information such as PE header data, imports and exports. For this post, I’ll highlight its simple ability to detect an overlay. An overlay is data appended to the end of an executable (i.e., it falls outside of any data described in the PE header). Using the following command against a Neshta.A specimen, readpe.py can detect if an overlay exists:


Figure 9: readpe.py overlay output

Upon detecting an overlay, the next step is to evaluate the contents of this additional data. Malware often includes executable content in the overlay, so you might consider using a tool called pe-carv.py, which is purpose-built to carve out embedded PE files:


Figure 10: pe-carv.py extracted file

As shown in the figure above, pe-carv.py successfully extracted a file it called 1.exe, and we could proceed with further static file analysis to better understand this embedded content.
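
If you prefer to script the overlay check yourself, pefile exposes the same idea; a hedged sketch (the filenames are placeholders):

import pefile

pe = pefile.PE("neshta_sample.exe")                  # hypothetical local copy of the specimen
overlay_offset = pe.get_overlay_data_start_offset()  # None if there is no overlay
if overlay_offset is not None:
    overlay = pe.get_overlay()                       # raw bytes appended after the PE data
    print("Overlay starts at offset %d (%d bytes)" % (overlay_offset, len(overlay)))
    # If the appended data itself starts with an executable, it will begin with "MZ"
    if overlay[:2] == "MZ":
        open("carved_overlay.bin", "wb").write(overlay)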

Closing Thoughts

Static analysis can generate useful data about a file, but it can also help direct your reverse engineering efforts. While running the tools mentioned above may get you the information you need, I encourage you to check out the source code and customize it based on your preferences. In particular, if you’re just getting started with Python, tweaking this code can serve as a great introduction and motivate further study.

If you would like to learn more about malware analysis strategies, join me at an upcoming SANS FOR610 course.

-Anuj Soni


REMnux v6 for Malware Analysis (Part 1): VolDiff

Introduction

As you may have heard, Lenny Zeltser recently released version 6 of his popular REMnux malware analysis Linux distribution. I’m a big fan of REMnux because it reduces some of the overhead associated with malware analysis. Rather than spending hours downloading software, installing tools, and navigating through dependency hell, this distribution gives you access and exposure to numerous tools quickly. Once you see the value of a tool for yourself, you can then dive into the code and configuration files to develop a deeper understanding of its inner workings and customize it to your needs.

This is the first in a series of posts where I will highlight my favorite new additions to REMnux and why you should include them in your malware analysis process.

VolDiff

One quick, effective approach to assessing a suspicious file is to capture a snapshot of system activity, execute the file, capture another snapshot, and then compare the two system states to determine the impact of execution. The popular regshot tool uses this approach to log registry and file system changes after an event like double-clicking malware. VolDiff, included in REMnux v6, allows us to perform similar analysis against memory dumps. Developed by @aim4r, VolDiff is a Python script that uses the Volatility memory analysis framework to analyze two memory dumps and output the differences between them. When applied to memory analysis, this script will focus your attention on memory artifacts generated after, and possibly as a result of, code execution. This can expedite your analysis of large memory dumps to detect activity such as code injection and provide visibility into packed or obfuscated code. However, keep in mind that memory is in a state of flux, so changes included in the diff results are not necessarily caused by executing the suspect file.
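
VolDiff automates this across many plugins at once, but the underlying idea is easy to sketch: run the same Volatility plugin against both images and keep only the lines that are new in the infected capture. A minimal illustration (not VolDiff’s actual code; the profile value is a placeholder you would set for your own images):

import subprocess

BASELINE = "baseline.vmem"
INFECTED = "infected.vmem"
PROFILE = "Win8SP1x86"        # placeholder; use the correct profile for your images

def run_plugin(image, plugin):
    # Volatility 2.x command line: vol.py -f <image> --profile=<profile> <plugin>
    output = subprocess.check_output(
        ["vol.py", "-f", image, "--profile=" + PROFILE, plugin])
    return set(output.splitlines())

for plugin in ["pslist", "netscan", "malfind"]:
    new_lines = run_plugin(INFECTED, plugin) - run_plugin(BASELINE, plugin)
    print("=== New %s entries ===" % plugin)
    for line in sorted(new_lines):
        print(line)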

VolDiff resides on REMnux in /opt/remnux-scripts/, but you can run it from anywhere since this location is included in the PATH environment variable. If you do not have the latest version of VolDiff (v2.1 at the time of this writing), you can update your remnux-scripts directory by running the commands sudo apt-get update and then sudo apt-get install remnux-scripts.

An Example with VolDiff 

Let’s explore the value of VolDiff with a malware sample. I used a file named funfile.exe, and if you want to follow along, you can download the sample here (password: infected).

A few words about my test environment: since it’s advisable to perform malware analysis within a virtual machine, I used VMware Fusion. For this analysis, I started 1) a REMnux v6 VM with host-only networking and 2) a 32-bit Windows 8.1 VM with host-only networking. Within the Windows VM, I configured the Default Gateway and Preferred DNS Server with REMnux’s IP address. Lastly, I verified connectivity by pinging each host from the other. Note: if you dedicated more than a couple of GB of memory to your Windows VM, consider decreasing this value or you may be waiting hours for VolDiff processing to complete.

Some initial behavioral analysis indicated that this sample generated network traffic to an IP address. Since our goal is to assess memory artifacts, I chose to launch several “fake” services in REMnux to encourage activity. Specifically, I ran the following from a REMnux terminal:

  • accept-all-ips start: This bash shell script written by Lenny Zeltser redirects network traffic destined for any IP address to the REMnux VM.
  • inetsim: This tool simulates a variety of network services, including HTTP, HTTPS, FTP, and SMTP. If my suspect file expected to contact a web server, for example, I wanted it to do so to facilitate additional activity.

To compare memory dumps using VolDiff, we need to capture a memory image before and after infecting a sacrificial host. With VMware, one approach to obtaining a memory image is to use the snapshot feature. Whenever a snapshot is created, VMware saves a “.vmem” file that includes the contents of memory at the time the snapshot was created. This file can then be analyzed using a memory analysis tool like Volatility. To create the memory dumps VolDiff requires, I followed these steps:

  • I copied funfile.exe to the Windows VM desktop.
  • I created a VM snapshot and noted the new “.vmem” file name on my host.
  • In the Windows VM, I right-clicked funfile.exe and selected “Run as administrator” to execute the sample with admin rights.
  • After giving the sample a couple minutes to run, I created another VM snapshot and noted this second “.vmem” file name.

I then copied these files into REMnux for analysis. While there are several ways to do this, I chose to start the SSH server on REMnux and SCP the files into the VM. To ensure I did not confuse the two “.vmem” files, I renamed my baseline file to “baseline.vmem” and my second snapshot to “infected.vmem”.

To kick off VolDiff against my two memory dumps, I ran the command shown below. Note that the command requires the correct OS profile for the memory images.


Figure 1: VolDiff command to compare two memory dumps

VolDiff processed my 2 GB (each) memory images in about 45 minutes. The result was a directory of output, but the critical file to review is VolDiff-report.txt. This file contained the key differences between the two memory dumps. My entire output file can be viewed here, but let’s discuss some excerpts.


Figure 2: VolDiff malfind results

The output above shows new malfind results. The malfind Volatility plugin helps identify injected code, and in this case it discovered a suspicious memory segment within the svchost.exe process with PID 2976. Looking at the ASCII representation of the first few bytes of this segment, you may recognize the “MZ” string. This likely indicates we are looking at injected, executable code. It’s important to note that running malfind does sometimes result in hits even on a clean system; running the malfind plugin against my baseline image produced one hit. However, VolDiff’s diff operation focused my efforts only on new activity.

Let’s look at some more output:


Figure 3: VolDiff netscan results

The output above shows new netscan entries. The netscan Volatility plugin locates network artifacts in memory. Running this plugin against my “infected.vmem” alone revealed 57 connection artifacts. Since VolDiff highlights changes in the victim system’s state, it trimmed my analysis data set to only two connections, one of which I have included above. This output clearly shows that the suspicious svchost.exe (based on malfind output) established a TCP connection over port 443.

VolDiff also includes a --malware-checks option to look for anomalous activity in an infected memory dump. You can run this option against a single memory dump if you do not have a baseline, or you can simply add it to the command line to both perform a diff and check the infected memory dump for potentially malicious behavior:


Figure 4: VolDiff --malware-checks option

Much of the output mirrors the earlier VolDiff-report.txt, but it includes additional checks that compare the infected memory dump against characteristics of a known good Windows system. You can view the entire output file here, but let’s look at one example included in the report:


Figure 5: VolDiff --malware-checks result excerpt

In this case, VolDiff indicates that the svchost.exe is running in an unexpected session. Session 0 is reserved for system processes and services, and a legitimate svchost.exe process should be running in that session. However, the svchost.exe with PID 2976 is running in session 1, which is associated with a user session. In this way, VolDiff goes beyond simply diffing two memory snapshots and includes built-in heuristics to identify potential malicious activity. At its core, this is an even more powerful diff operation, because it relies on certain absolutes (i.e., a legitimate svchost.exe always runs in session 0) and makes no assumptions about the state of  your baseline image.

In case you’re wondering, this sample has a 39/53 detection rate on VirusTotal. Microsoft identifies it as a Win32/Tofsee variant, a spambot that is commonly spread via email. As we suspected, it launches and injects executable code into svchost.exe and attempts to connect to IP addresses for command and control.

Closing Thoughts

Diffing two system states is a powerful malware analysis technique because it shines a spotlight on new activity. VolDiff, included in REMnux v6, uses this approach to focus your analysis on memory artifacts most likely associated with code execution. I encourage you to explore this and other REMnux v6 tools on your own, or join me at the upcoming FOR610 Reverse-Engineering Malware course in Virginia Beach this August.

-Anuj Soni


Key Questions to Guide Malware Analysis

Introduction

Performing malware analysis during incident response can be an exciting, creative exercise. But it can also be a nebulous process, with no clear beginning and no defined end state. While there are numerous articles, books, and tools that cover the topic, the sheer volume of resources can sometimes lead to decision fatigue, and the question becomes: What do I do next? To focus your attention and guide your analysis, begin by answering the four key questions set forth below.

Ask Yourself

To be clear, my intent is not to create a comprehensive list of questions, but to highlight the ones that will yield the most value. If you work on answering these questions first, you will stay on task, make real progress, and better understand the next few steps in the context of your specific incident.

1) What are the artifacts of execution?

This question will fuel your static and behavioral analysis of the sample. Your precise goal is to document activity on the file system, in memory, and across the network. This includes launched processes, created and deleted files, modified registry entries, and command and control network traffic. Assume the malware sample has the highest level of privilege on your network and has access to all the local and online resources it needs. Also, consider interacting with your analysis environment by launching enterprise applications, browsing to common sites, and rebooting the machine to facilitate activity. Be sure to record any activity you observe.

2) What is the potential impact of code execution?

This question expands upon the first question by requiring you to dig deeper and piece together observed activity to determine functionality and purpose. For example, perhaps you observed a file created during behavioral analysis. You must now determine what this file is used for. Does it log keystrokes? Perhaps it stores encoded configuration data that the malware relies upon. Answering this question often requires iterative testing (an important reason to use virtual machines and create snapshots throughout analysis). Reaching a solution may be as simple as completing a few Google searches or may involve more complex code analysis.

3) What is the potential impact of code execution in your environment?

Notice the difference between this question and the second question. While understanding the absolute potential impact is important, in the context of an enterprise security incident, your management is most concerned about the impact on the corporate network. Being equipped with the answer to this question, and knowing how it differs from the previous one, distinguishes a proficient malware analyst from a reckless one.

4) What are the sample’s key host and network indicators of compromise (IOCs)?

Review your artifacts of execution, including registry keys, file names and locations, hashes, C&C specifics, and strings to highlight key information which will allow you to seek out similar activity across the network. This information will not only expedite the detection of other compromised machines on the network, but it will also feed into generating valuable threat intel for your organization and any information sharing partners.

Final Thoughts

As you try to answer these questions, remember that malware analysis is an iterative process. You may not answer each question in totality with 100% certainty before moving on to the next. This is why performing malware analysis is similar to the practice of an art: not because it is indescribable or intangible (these are usually symptoms of a poor process), but because it requires patience and the discipline to know which approaches are working, which ones are not, when you need to start over, and when you need to step away and refocus on the problem at another time.

Clearly, there are other important questions to answer, but investigating the ones listed above will get you moving in the right direction, and answering them as quickly as possible will arm you with the information management often needs once an incident kicks off.

-Anuj Soni

Anuj Soni is a Senior Incident Responder at Booz Allen Hamilton, where he leads intrusion investigations and performs forensic and malware analysis to investigate security incidents. He also teaches FOR610: Reverse-Engineering Malware for the SANS Institute. Anuj excels not only in delivering rigorous technical analysis, but also in process development, knowledge management, and team leadership to accelerate incident response efforts. Anuj presents at events including the U.S. Cyber Crime Conference, SANS DFIR Summit, and the Computer and Enterprise Investigations Conference (CEIC).  He received his Bachelors and Masters degrees from Carnegie Mellon University.