The art of artifact collection and hoarding for the sake of forensic exclusivity… – Part 2
2024-05-04 07:29:59 Author: www.hexacorn.com

In the first part I promised that I would demonstrate that piracy is good! (sometimes)

I kinda lied back there, but I am not going to lie today: I will tell you all about it in part 3.

Forensic data hoarding has a lot of benefits. It helps to solve many very common yet often difficult problems (I will cover one of them later in this post), and it also has a nice side effect: it makes us more aware of the available forensic artifacts, and of the fact that everyone in this field has, or at least should have, a very basic need to collect data.

For example, I keep reminding everyone who wants to listen that there are many localized versions of Windows, and that there are lots of architectural quirks around OS folders as well. Yes, it means that your c:\Program Files folder name, like many others, can be localized and often is. This doesn’t stop people from continuing to write English-centric detections, but at least my conscience is clean…

I mentioned these common yet often difficult problems… Let’s focus on one of them for a moment.

When you analyze malware you often come across code that focuses on terminating processes and/or stopping or removing services (service processes). The easy ones do it by the book: they use direct string comparisons via Windows APIs, and the list of targets is often present inside the malware in the form of a string list. The more advanced ones use various hashing algorithms for comparison, and instead of the actual strings they store hashes identifying the targets inside the malware samples.
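To make the hash-based variant concrete, here is a minimal sketch of how such a sample might match process names. The ROR13-style hash and the uppercasing step are assumptions chosen for illustration (ROR13 is a common choice in shellcode); real families use whatever algorithm their author picked.

```python
def ror13_hash(name: str) -> int:
    """Rotate-right-13 hash over the uppercased process name (32-bit)."""
    h = 0
    for ch in name.upper():
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF  # rotate right by 13
        h = (h + ord(ch)) & 0xFFFFFFFF
    return h

# Instead of shipping the plain strings, the sample only stores the hashes:
TARGET_HASHES = {ror13_hash("msmpeng.exe"), ror13_hash("ekrn.exe")}

def is_target(process_name: str) -> bool:
    """Check a running process against the embedded hash list."""
    return ror13_hash(process_name) in TARGET_HASHES
```

Because only the hashes are embedded, a strings dump of the binary reveals nothing about which processes are targeted.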

As analysts looking at such hash lists we face an obvious challenge – given a list of hashes, how can we reconstruct a list of strings that these hashes were generated from?

This may sound easy, but it is not. We can brute-force all combinations, but that can turn out to be very costly; worse, a brute-force attack may produce random process names whose calculated hash happens to be identical to one on the target list without being the correct source string (a so-called hash collision). A more promising approach relies on a dictionary attack: we compute hashes for all the known process names, and then compare them against the targets. But that’s not easy either…
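The dictionary attack can be sketched in a few lines. The `sample_hash` routine below is a stand-in for whatever algorithm the analyst reverse-engineered from the sample, and the tiny dictionary is a placeholder for the large process-name list discussed next:

```python
def sample_hash(name: str) -> int:
    """Stand-in for the hash algorithm recovered from the sample."""
    h = 0
    for ch in name.upper():
        h = (((h >> 13) | (h << 19)) + ord(ch)) & 0xFFFFFFFF
    return h

def crack(target_hashes: set, dictionary: list) -> dict:
    """Map each target hash back to the first dictionary word producing it."""
    recovered = {}
    for word in dictionary:
        h = sample_hash(word)
        if h in target_hashes and h not in recovered:
            recovered[h] = word
    return recovered

# Hashes pulled from the sample (simulated here), cracked with the dictionary:
dictionary = ["notepad.exe", "msmpeng.exe", "sqlservr.exe", "outlook.exe"]
targets = {sample_hash("sqlservr.exe"), sample_hash("outlook.exe")}
recovered = crack(targets, dictionary)
```

The whole approach stands or falls on the dictionary: any target name absent from it stays an opaque hash.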

Why?

For the latter to be successful, one needs a large list of legitimate process names in the first place. Googling around and GitHub searches may give you a head start, but that’s often not enough. Yes, many of these process names are related to security software, so an extensive list of process names used by antivirus, EDR, firewall, etc. software may help, but it’s often not enough either. Nowadays, the target lists are often far wider than that; e.g. ransomware often kills many other programs as well: multiple variants of Office software, database software, various backup services, email clients, and so on and so forth.

It’s time for a recipe.

If you were about to collect the largest list of process names, how would you do it?

The below list is not exhaustive, but may help you out:

  • Find your first list of interesting process names 🙂
  • Use a set of the most unique process names from that list and Google them
  • Search github repositories as well

Chances are that a set of these will lead you to many interesting process lists.

And now you have your base. It is probably around 1% of all the process names that you want, though…

So… we dig deeper.

  • Web-scrape data – there are a lot of websites out there that have somehow managed to accumulate a lot of process names; they use them to generate a tremendous number of landing pages for search engines to hit when people search for issues related to various software packages; these sites are the trash of the internet, but in this particular case they are great, as we can leverage all these landing pages to collect more process names
  • Download software and driver repositories and post-process them – many installers can now be unpacked, and their metadata and installer scripts can be extracted and analyzed – they are very juicy material for building a database of process names
  • Analyze large corpora of malware samples – for every advanced and complex malware family you will find 10 if not 100 dumb families that do it all in the open and include their target process and/or service lists in plain text – easy to extract and collect
  • If you have access to any telemetry – from EDR, Sysmon, event 4688 – extract those process lists as well, and best of all, do it on a regular basis
  • Web-scrape forums for HijackThis log reports – old, but helpful
  • Web-scrape forums for OTL log file reports – as above
  • Post-process bundles of cybersecurity reports – they often reference a lot of interesting process and service names
  • Actively install software of interest and gather the lists of processes and services they create – nah, just kidding, this takes too much time
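The web-scraping steps above boil down to one recurring trick: pull every executable-looking token out of raw page text. A hedged sketch, where the regex, the normalization, and the sample page are my own assumptions rather than any specific site’s format:

```python
import re

# Match tokens that look like Windows executable names, e.g. "msmpeng.exe".
EXE_TOKEN = re.compile(r"\b[\w.-]{1,60}\.exe\b", re.IGNORECASE)

def extract_process_names(html: str) -> set:
    """Return lowercased *.exe tokens found anywhere in a page."""
    return {m.group(0).lower() for m in EXE_TOKEN.finditer(html)}

page = "<td>svchost.exe</td><p>What is WinDefender.EXE? Fix MsMpEng.exe high CPU</p>"
names = sorted(extract_process_names(page))
# ['msmpeng.exe', 'svchost.exe', 'windefender.exe']
```

Run the same extractor over every scraped landing page, forum post, or report bundle, union the results, and deduplicated lowercase names accumulate into the dictionary.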

My personal process list is 1.7M items long. I used it to crack quite a few malware families’ target lists. Yet it still fails me sometimes. Yes, the hoarding never stops.


Source: https://www.hexacorn.com/blog/2024/05/03/the-art-of-artifact-collection-and-hoarding-for-the-sake-of-forensic-exclusivity-part-2/