January 16, 2022 in Yara sigs
A lot of people are sharing their Yara creations (look for the #100DaysofYARA tag on Twitter), so I thought I would share a bit too.
This is a very unusual way of using Yara and I hope you will find it interesting.
When we think of Yara rules, we usually have a very specific cluster of strings in mind: an API name, a debug string, a snippet of code, etc. What if we instead used Yara to scan files for much larger sets of strings? While it may sound counterintuitive, Yara is really very well prepared to do “carpet bombing” string scans on target files. It’s actually super fast and efficient.
Let’s have a look at an example.
Imagine that you want to find all English words inside a file. I choose “English” because it’s easy to demo, but you could use any other language really. The traditional approach would rely on running the “strings” tool over the target file and then manually combing through the results, cherry-picking words that “look” English. For other languages you may need a localized version of the “strings” tool (e.g. my old tool hstrings could help), but the principle is the same. In some cases you could also apply knowledge of the file structure so that you could extract some of the strings ‘natively’ (e.g. from resources in a PE file).
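To make the traditional approach concrete, here is a rough Python sketch of what a “strings”-like pass does: it pulls out runs of printable ASCII characters and their UTF-16LE (“wide”) counterparts. The function name `extract_strings` and the 6-character minimum are my own choices for illustration, not how any particular “strings” implementation works.

```python
import re


def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable ASCII and UTF-16LE runs of at least min_len characters."""
    # Runs of printable ASCII bytes (space through tilde).
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters stored as UTF-16LE: each one followed by a zero byte.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```

The output of such a pass still has to be reviewed by hand, which is exactly the tedious part.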
We can also approach it from a different angle. We will build a list of all English words and then search for them in the file. All at once. There are obvious caveats: we can never be sure we have a list of all English words (e.g. gobbledygook or ragamuffin may not be on the list), and short words will certainly cause a lot of false positives, but it’s just a POC of an idea.
So, we find a random English word list. We write a small script that keeps only strings of 6 or more characters, excludes strings starting with digits, and converts the result into a set of Yara rules. Yara accepts up to 10K strings per rule, so we have to split the dictionary into multiple rules.
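my $cnt = 0;   # index of the current rule
my $n   = 0;   # number of strings written to the current rule

while (<>) {
    s/[\r\n]+//g;               # strip line endings
    next if length($_) < 6;     # skip short words
    next if /^[0-9]/;           # skip entries starting with a digit
    s/([\\"])/\\$1/g;           # escape backslashes and double quotes
    if ($n == 0) {
        # open a new rule
        print "\nrule " . sprintf("eng_%04d", $cnt) . "\n{\n  strings:\n";
    }
    print "    \$ = \"$_\" ascii wide nocase\n";
    $n++;
    if ($n > 9999) {
        # 10,000 strings written: close this rule and start the next one
        $cnt++;
        $n = 0;
        print "\n  condition:\n    any of them\n}\n";
    }
}
# close the last, partially filled rule (if there is one)
print "\n  condition:\n    any of them\n}\n" if $n > 0;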
The resulting rules can be saved into an eng.yar file and then compiled with yarac to eng.yac:
yarac eng.yar eng.yac
We will get a lot of warnings about the rule slowing down the scanning, but who cares 🙂
warning: rule "eng_0000" in eng.yar(10008): rule is slowing down scanning
warning: rule "eng_0001" in eng.yar(20016): rule is slowing down scanning
warning: rule "eng_0002" in eng.yar(30024): rule is slowing down scanning
warning: rule "eng_0003" in eng.yar(40032): rule is slowing down scanning
warning: rule "eng_0004" in eng.yar(50040): rule is slowing down scanning
...
Note, the resulting file is gigantic: ~600MB in size. You can reduce it by tinkering with the “ascii wide nocase” modifier set (if you drop the modifiers, the file will be only ~70MB).
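If you want to experiment with that trade-off, a generator with configurable modifiers makes it easy to produce several variants of the rule file. Below is a minimal Python sketch of the same idea: the names `build_rules` and `_render` and the parameter defaults are my own, and the 10,000-strings-per-rule chunk size is simply the limit mentioned above.

```python
import re
from typing import Iterable, Iterator


def _render(index: int, lines: list[str]) -> str:
    """Wrap a chunk of string definitions into one Yara rule."""
    body = "\n".join(lines)
    return (f"rule eng_{index:04d}\n{{\n  strings:\n{body}\n"
            f"  condition:\n    any of them\n}}\n")


def build_rules(words: Iterable[str], modifiers: str = "ascii wide nocase",
                min_len: int = 6, per_rule: int = 10000) -> Iterator[str]:
    """Yield the text of one Yara rule per chunk of at most per_rule words."""
    chunk: list[str] = []
    index = 0
    for raw in words:
        word = raw.strip("\r\n")
        # Skip short words and entries starting with a digit.
        if len(word) < min_len or re.match(r"[0-9]", word):
            continue
        # Escape backslashes and double quotes for a Yara text string.
        word = word.replace("\\", "\\\\").replace('"', '\\"')
        chunk.append(f'    $ = "{word}" {modifiers}'.rstrip())
        if len(chunk) == per_rule:
            yield _render(index, chunk)
            index += 1
            chunk = []
    if chunk:  # last, partially filled rule
        yield _render(index, chunk)
```

Calling `build_rules(open("words.txt"), modifiers="")` (with a hypothetical words.txt) would give plain strings without any modifiers, while `modifiers="ascii nocase"` would drop only the wide variants. Each variant still has to be compiled with yarac to see how much the size actually changes.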
We can now use the rules on e.g. Notepad:
yara -s -C eng.yac c:\windows\notepad.exe
-s – will extract strings
-C – will tell yara the rules are compiled
The results will look like this:
eng_0000 c:\windows\notepad.exe
0x280b5:$: Accelerator
0x2822a:$: Accelerator
0x2822a:$: Accelerators
0x26f00:$: Accept
0x2b9bd:$: Access
0x2862e:$: Acquire
0x286f6:$: Acquire
0x2862e:$: AcquireS
0x286f6:$: AcquireS
0x28df9:$: Activation
0x27faf:$: Active
0x28e04:$: actory
0x286d9:$: Address
0x289c5:$: alLock
0x28b68:$: alLock
eng_0001 c:\windows\notepad.exe
0x25050:$: A\x00p\x00p\x00l\x00i\x00c\x00a\x00t\x00i\x00o\x00n\x00
0x25260:$: A\x00p\x00p\x00l\x00i\x00c\x00a\x00t\x00i\x00o\x00n\x00
0x2ba0f:$: application
0x2baf4:$: application
eng_0002 c:\windows\notepad.exe
0x2b75e:$: Archit
0x2b88f:$: Archit
0x2b75e:$: Architect
0x2b88f:$: Architect
0x2b75e:$: Architecture
0x2b88f:$: Architecture
0x227ba:$: A\x00r\x00o\x00u\x00n\x00d\x00
0x2b6c8:$: assembl
0x2b713:$: assembl
0x2b7e5:$: Assembl
0x2b7f9:$: assembl
[...]
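As the output shows, overlapping dictionary entries (Archit, Architect, Architecture) are all reported at the same offset, and wide strings are printed with \x00 escapes. A small post-processing step can tidy this up. The Python sketch below keeps only the longest match per offset and folds the printed \x00 escapes away. The function name `longest_per_offset` is made up for this example, and the parsing assumes the output looks exactly like the lines above.

```python
import re

# Matches lines such as "0x280b5:$: Accelerator" as printed by `yara -s`.
MATCH_RE = re.compile(r"^(0x[0-9a-fA-F]+):\$[^:]*: (.*)$")


def longest_per_offset(output: str) -> dict[int, str]:
    """Map each file offset to the longest string reported at that offset."""
    best: dict[int, str] = {}
    for line in output.splitlines():
        m = MATCH_RE.match(line.strip())
        if not m:
            continue  # rule header lines such as "eng_0000 <path>"
        offset = int(m.group(1), 16)
        # Wide strings are printed with literal "\x00" escapes; drop them.
        text = m.group(2).replace("\\x00", "")
        if len(text) > len(best.get(offset, "")):
            best[offset] = text
    return best
```

Sorting the resulting dictionary by offset gives a compact, strings-like listing of only the dictionary words found in the file.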