Over the past few weeks, I've spent some time learning Visual Basic for Applications (VBA), specifically for creating malicious Word documents to act as an initial stager. While thinking about operational security and brainstorming ways to evade macro detection, I asked myself: how does anti-virus detect a malicious macro?
My hypothesis was that anti-virus would parse the macro content out of the Word document and scan the macro code for a variety of malicious techniques; nothing crazy. A common way I've seen attackers counter this sort of detection is macro obfuscation, which effectively scrambles macro content in an attempt to evade the malicious patterns anti-virus looks for.
The questions I wanted answered were:
According to Wikipedia, Office Open XML (OOX) "is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents". This is the file format used by the common Microsoft Word extensions `docx` and `docm`. The fact that Microsoft Office documents are essentially a ZIP file of XML files certainly piqued my interest.
Since the OOX format is just a ZIP file, I found that parsing macro content from a Microsoft Word document was simpler than you might expect. All an anti-virus would need to do is open the document as a ZIP archive and extract `word/vbaProject.bin`, the file that holds the macro project.

What I was interested in was how different ZIP parsers handle errors and corruption. For example, common implementations of ZIP extraction will often have error checks such as: does the local file header signature match `0x04034b50`?

What I was really after was finding ways to break the ZIP parser in anti-virus without breaking the ZIP parser used by Microsoft Office.
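To make that naive pipeline concrete, here is a minimal sketch in Python using the standard library's `zipfile` module. The `extract_vba_project` name is my own, and a real engine would go on to parse the OLE stream it returns; this only covers the ZIP step. Note that `zipfile` is itself one of the stricter parsers: when it reads a member, it raises `BadZipFile` if that member's local file header signature is not `0x04034b50`.

```python
import io
import zipfile
from typing import Optional

# Path of the macro project inside a macro-enabled Word document.
VBA_PROJECT_PATH = "word/vbaProject.bin"


def extract_vba_project(document: bytes) -> Optional[bytes]:
    """Return the raw vbaProject.bin bytes, or None if they can't be read.

    None covers both "no macro project" and "archive could not be parsed",
    so a scanner that treats None as "nothing to scan" will also skip a
    macro it merely failed to parse.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(document)) as archive:
            return archive.read(VBA_PROJECT_PATH)
    except KeyError:
        # The archive is fine but has no macro project (e.g. a plain .docx).
        return None
    except zipfile.BadZipFile:
        # Not a ZIP at all, or a structural check failed, such as a bad
        # local file header signature on the requested member.
        return None
```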
Before we get into corrupting anything, we need a base sample. As an example, I wrote a basic macro that displays "Hello World!" when the document is opened.
For the purposes of testing macro detection, I needed another sample document that was heavily detected by anti-virus. After a quick Google search, I found a few samples shared by @malware_traffic. The sample named `HSOTN2JI.docm` had the highest detection rate, with 44 of 61 engines marking the document as malicious.
To ensure that detections were specifically based on the malicious macro inside the document's `vbaProject.bin` OLE file, I...

1. Opened both the "Hello World" and `HSOTN2JI` macro documents as ZIP files.
2. Replaced the `vbaProject.bin` OLE file in my "Hello World" macro document with the `vbaProject.bin` from the malicious `HSOTN2JI` macro document.

Running the scan again resulted in the following detection rate:
Fortunately, these anti-virus products were detecting the actual macro and not solely relying on conventional methods such as blacklisting the hash of the document. Now with a base malicious sample, we can begin tampering with the document.
The methodology I used to test each method of corruption was:
Before continuing, it's important to note that the methods discussed in this blog post do come with drawbacks, specifically:
Although adding any user interaction certainly increases the complexity of the attack, if a victim was going to enable macros anyway, they'd probably also be willing to recover the document.
We'll start with the effects of general corruption on a Microsoft Word document. By this, I mean corrupting the file using methods that are not specific to the ZIP file format.
First, let's observe the impact of adding random bytes to the beginning of the file.
With a few bytes at the beginning of the document, we were able to decrease detection by about 33%. This made me confident that future attempts could reduce this even further.
Result: 33% decrease in detection
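Why would a few leading bytes matter? I can't see inside these engines, so treat this as a plausible explanation rather than a finding: some ZIP parsers expect the archive to start with a local file header at offset 0, while others, Python's `zipfile` among them, locate the archive from its end and correct every offset for any leading bytes (the same property that lets self-extracting archives work). The sketch below contrasts the two approaches; all three function names are hypothetical.

```python
import io
import zipfile

LOCAL_FILE_HEADER_MAGIC = b"PK\x03\x04"  # 0x04034b50 stored little-endian


def starts_with_local_header(data: bytes) -> bool:
    """Naive check: the file must begin with a local file header."""
    return data.startswith(LOCAL_FILE_HEADER_MAGIC)


def read_from_end(data: bytes, member: str) -> bytes:
    """zipfile finds the central directory from the end of the file and
    shifts every entry offset by however many bytes precede the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(member)


def find_first_local_header(data: bytes) -> int:
    """A more forgiving forward scan: offset of the first local header, or -1."""
    return data.find(LOCAL_FILE_HEADER_MAGIC)
```

Under this model, a parser of the first kind sees garbage and gives up, while an end-anchored reader still reaches the macro.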
This time, let's do the same thing, except prepend a JPG file: in this case, a photo of my cat!
You might expect prepending random data to produce the same detection rate as prepending an image, but some anti-virus engines marked the file as clean as soon as they saw an image.
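I don't know why an image triggers this, but one hypothesis is that some engines pick a file-type handler from the leading magic bytes and stop once they decide the file is an image, whereas unrecognized random data might fall through to a broader scan. Here is a toy version of such a dispatcher; `sniff_file_type` and its table are my own illustration, not any vendor's logic.

```python
# Leading "magic" bytes for a few formats relevant to this post.
MAGIC_SIGNATURES = [
    (b"\xFF\xD8\xFF", "jpeg"),                     # JPEG start-of-image marker
    (b"PK\x03\x04", "zip"),                        # ZIP local file header (OOXML)
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole"),  # OLE/CFB, e.g. vbaProject.bin
]


def sniff_file_type(data: bytes) -> str:
    """Classify a file by its first bytes only, as a naive dispatcher might."""
    for magic, kind in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"
```

A cat photo followed by the document is classified as `jpeg` here even though a complete ZIP archive follows it, which would be consistent with the engines listed below marking it clean.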
To aid in future research, the anti-virus engines that marked the random data document as malicious but did not mark the cat document as malicious were:
- Ad-Aware
- ALYac
- DrWeb
- eScan
- McAfee
- Microsoft
- Panda
- Qihoo-360
- Sophos ML
- Tencent
- VBA32
This list is larger than the actual difference in detection because some engines, strangely, detected the cat document but not the random data document.
Result: 50% decrease in detection
Purely appending data to the end of a macro document barely impacts the detection rate; instead, we'll combine appending data with other methods, starting with my cat.
What was shocking was that even when the ZIP file was sandwiched between two images, Microsoft's parser reliably recovered the document and macro! With only extremely basic modifications to the document, we were able to prevent most detection of the macro.
Result: 88% decrease in detection
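Many ZIP readers find the archive by searching backwards from the end of the file for the end-of-central-directory (EOCD) record. Because the archive comment that may follow that record is at most 65,535 bytes long, readers typically only search that far back; Python's `zipfile`, for example, only reads roughly the last 64 KiB. Here is a minimal sketch of that bounded search (`find_eocd` is my own illustrative name, and I haven't confirmed which anti-virus engines, if any, parse this way):

```python
from typing import Optional

EOCD_MAGIC = b"PK\x05\x06"  # end-of-central-directory signature, 0x06054b50
EOCD_FIXED_SIZE = 22        # EOCD record size without the trailing comment
MAX_COMMENT_SIZE = 0xFFFF   # the comment length field is only 16 bits wide


def find_eocd(data: bytes) -> Optional[int]:
    """Return the offset of the EOCD record, searching backwards only as far
    as the longest legal archive comment would allow."""
    window_start = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT_SIZE)
    pos = data.rfind(EOCD_MAGIC, window_start)
    if pos == -1 or pos + EOCD_FIXED_SIZE > len(data):
        return None  # no complete EOCD record inside the search window
    return pos
```

Under that assumption, an appended image larger than the search window pushes the EOCD record out of reach of end-anchored readers, while the prepended image trips up readers that expect a local file header at offset 0. Doing both could defeat readers anchored at either end, which would be consistent with the 88% figure above, though I haven't verified that this is what the engines are doing.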
Microsoft's fantastic document recovery isn't limited to general methods of file corruption. Let's take a look at how it handles corruption specific to the ZIP file format.
The only file we care about preventing access to is the `vbaProject.bin` file, which contains the malicious macro. Without corrupting the data, could we corrupt the file header for the `vbaProject.bin` file and still have Microsoft Word recognize the macro document?
Let's take a look at the structure of a local file header from Wikipedia:
I decided that corrupting the local file header signature would be the least likely to break Word's file parsing, hoping that Microsoft Word didn't care whether the file header had the correct magic value. If Microsoft Word didn't care about the magic, corrupting it had a good chance of interfering with ZIP parsers that perform integrity checks such as verifying the value of the magic.
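For reference, the fixed 30-byte part of a local file header can be unpacked with Python's `struct` module. This is a minimal sketch (the `LocalFileHeader`, `parse_local_file_header` and `has_valid_signature` names are my own); a strict parser would reject an entry as soon as `has_valid_signature` returns False, which is the check this corruption is meant to trip.

```python
import struct
from typing import NamedTuple, Tuple

LOCAL_FILE_HEADER_SIGNATURE = 0x04034B50  # "PK\x03\x04" read as a little-endian uint32
# Little-endian, no padding: 4-byte signature, five 2-byte fields,
# CRC-32 and two 4-byte sizes, then the name and extra-field lengths (30 bytes).
LOCAL_FILE_HEADER_FORMAT = "<IHHHHHIIIHH"


class LocalFileHeader(NamedTuple):
    signature: int
    version_needed: int
    flags: int
    compression: int
    mod_time: int
    mod_date: int
    crc32: int
    compressed_size: int
    uncompressed_size: int
    name_length: int
    extra_length: int


def parse_local_file_header(data: bytes, offset: int) -> Tuple[LocalFileHeader, str]:
    """Unpack the fixed header at `offset` and decode the file name after it."""
    header = LocalFileHeader(*struct.unpack_from(LOCAL_FILE_HEADER_FORMAT, data, offset))
    name_start = offset + struct.calcsize(LOCAL_FILE_HEADER_FORMAT)
    raw_name = data[name_start:name_start + header.name_length]
    # General-purpose flag bit 11 marks UTF-8 names; otherwise ZIP uses CP437.
    encoding = "utf-8" if header.flags & 0x800 else "cp437"
    return header, raw_name.decode(encoding)


def has_valid_signature(header: LocalFileHeader) -> bool:
    """The integrity check a strict parser applies before trusting the entry."""
    return header.signature == LOCAL_FILE_HEADER_SIGNATURE
```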
After corrupting only the file header signature of the `vbaProject.bin` file entry, we get the following result:
With a ZIP-specific corruption method, we almost completely eliminated detection.
Result: 90% decrease in detection
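From the defender's side, this suggests two cheap countermeasures, sketched below under my own assumptions (the function names are hypothetical, only stored and deflated entries are handled, and the central directory is assumed intact, as it is in this corruption method): flag any entry whose local header signature is wrong, since a well-formed archive never contains one, and extract the member anyway by trusting the central directory for its offset and sizes instead of giving up. Python's `zipfile` reads only the central directory when opening an archive, so it can still enumerate entries whose local headers are corrupted.

```python
import io
import struct
import zipfile
import zlib
from typing import List

LOCAL_FILE_HEADER_MAGIC = b"PK\x03\x04"
LOCAL_FILE_HEADER_SIZE = 30  # fixed part, before the file name and extra field


def find_bad_local_headers(data: bytes) -> List[str]:
    """Names of entries whose local file header lacks the expected signature."""
    bad = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            offset = info.header_offset  # zipfile already adjusts for prepended bytes
            if data[offset:offset + 4] != LOCAL_FILE_HEADER_MAGIC:
                bad.append(info.filename)
    return bad


def read_member_ignoring_magic(data: bytes, name: str) -> bytes:
    """Extract a member using central-directory metadata, skipping the local
    header signature check that zipfile.ZipFile.read would perform."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        info = archive.getinfo(name)
    # The name and extra-field lengths sit at offsets 26 and 28 of the local header.
    name_len, extra_len = struct.unpack_from("<HH", data, info.header_offset + 26)
    start = info.header_offset + LOCAL_FILE_HEADER_SIZE + name_len + extra_len
    raw = data[start:start + info.compress_size]
    if info.compress_type == zipfile.ZIP_STORED:
        payload = raw
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        payload = zlib.decompress(raw, -15)  # raw DEFLATE stream, no zlib header
    else:
        raise NotImplementedError(f"compression method {info.compress_type}")
    if zlib.crc32(payload) != info.CRC:
        raise ValueError(f"CRC mismatch for {name}")
    return payload
```

Arguably, a corrupted local header on `word/vbaProject.bin` is a strong enough anomaly to be suspicious on its own, regardless of what the macro contains.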
With all of these methods, we've been able to reduce static detection of malicious macro documents quite a bit, but it's still not 100%. Could these methods be combined to achieve even lower rates of detection? Fortunately, yes!
Method | Detection Rate Decrease |
---|---|
Prepending Random Bytes | 33% |
Prepending an Image | 50% |
Prepending and Appending an Image | 88% |
Corrupting ZIP File Header | 90% |
Prepending/Appending Image and Corrupting ZIP File Header | 100% |
Interested in trying out the last corruption method, which reduced detection by 100%? I made a script to do just that! To use it, execute the script with the document filename as the first argument and a picture filename as the second. The script will write the patched document to your current directory.
As stated before, even though these methods can bring the detection of a macro document down to 0%, they come at a high cost in attack complexity. A victim will not only need to click to recover the document, but will also need to save the recovered document before the malicious macro executes. Whether that added complexity is worth it for your team will depend heavily on the environment you're up against.
Regardless, the team working on Microsoft Office deserves serious applause, especially those who designed the fantastic document recovery functionality. Even compared to tools specifically designed to recover ZIP files, the recovery capability in Microsoft Office exceeds all expectations.