When you look at a large repository of clean files there is always an opportunity to find something interesting. For instance, a list of precursors to forensic artifacts that one can find in legitimate software installation packages, both pre- and post-install.
Why might these come in handy?
Well, while this will never be a 100%-reliable solution, these may help to automate at least some of the digital forensic triage process. And by that I mean, for example, exclusions via file names or their clusters (as opposed to hashes).
I wrote about it a long time ago in the context of filelighting, but there are perhaps other, simpler avenues to pursue as well. The filelighting idea focuses on looking for file names referenced by files residing in the installed program folder. We can just as well expand it to pre-install directories, be it temporarily created folders, manually unpacked drivers, software package installation folders, etc. And while some of this is no longer that important (after all, more and more updates and installations happen in the background, often without the user’s knowledge, via app stores, etc., and frankly, people probably download less software directly today than, say, 10 years ago), well… it still does happen a lot, and if we can help with some automation… why not?
One of the most interesting sources of information about software packages is the good old-fashioned .inf file. The other is the good ol’ NSRL database. Yes, the latter focuses primarily on post-install, but we should use whatever is available.
The .inf files reference everything that is there to be installed, often in many configurations, and they provide a list of created or modified files and directories, but also Registry keys, service names, you name it. It’s a gold mine of information about what ‘good’ Windows software looks like. It’s a gold mine of forensic artifact precursors. The NSRL database is kinda similar: it is really a superset of everything good, but it’s also obviously limited to the data available in a dump.
Let’s have a look.
The top of the .inf file usually includes a [Version] section. You can find descriptions of the .inf format elsewhere; here, we are focused on stats only. I must note that parsing .inf files is not as easy as it may seem: they rely heavily on self-referencing, multiple .inf files can be merged together, and there is also a mechanism of string substitution (tokens) in play. Lots of quirks to take care of.
The top occurrences of fields within this section are as follows:
Combing .inf files for, say, the CatalogFile field can give us a list of all legitimate .cat files out there (with the obvious caveat that the list is only as good as our ‘good files’ repo). Still, this may come in handy for filename-based exclusions. There is a double-edged sword lying somewhere there, of course: if you are a bad guy, knowing which good file names are available in legit software packs will serve your nefarious purposes very well, as you can simply pick a file name for your payload from the list… Oh well…
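To make the combing step a bit more concrete, here is a minimal sketch in Python. It is not the tooling used for this post, just one possible approach under simplifying assumptions: it only understands plain `key = value` lines, strips `;` comments naively, resolves `%token%` references from a single [Strings] section, and ignores includes and merged .inf files entirely. The function names and the sample .inf content are hypothetical.

```python
import re
from collections import Counter

SECTION_RE = re.compile(r"^\[(.+?)\]\s*$")
TOKEN_RE = re.compile(r"%([^%\s]+)%")


def parse_inf_sections(text):
    """Return {section_lower: {key_lower: value}} for simple key=value lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        # Naive comment stripping: a ';' inside a quoted value is not handled.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = sections.setdefault(match.group(1).strip().lower(), {})
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip().strip('"')
    return sections


def expand_tokens(value, strings):
    """Replace %token% with its [Strings] value; unknown tokens are left as-is."""
    return TOKEN_RE.sub(lambda m: strings.get(m.group(1).lower(), m.group(0)), value)


def catalog_files(text):
    """Collect CatalogFile (and CatalogFile.<platform>) values from [Version]."""
    sections = parse_inf_sections(text)
    strings = sections.get("strings", {})
    version = sections.get("version", {})
    return sorted({
        expand_tokens(value, strings).lower()
        for key, value in version.items()
        if key == "catalogfile" or key.startswith("catalogfile.")
    })


def version_field_counts(inf_texts):
    """Count how often each [Version] field name occurs across many .inf texts."""
    counts = Counter()
    for text in inf_texts:
        counts.update(parse_inf_sections(text).get("version", {}).keys())
    return counts


# Hypothetical sample, not taken from any real package.
SAMPLE_INF = """
; hypothetical driver package
[Version]
Signature = "$Windows NT$"
Class = Net
CatalogFile = %CatName%.cat
CatalogFile.NTamd64 = example64.cat

[Strings]
CatName = "example"
"""

if __name__ == "__main__":
    print(catalog_files(SAMPLE_INF))
    print(version_field_counts([SAMPLE_INF]).most_common())
```

Real-world .inf files come in various encodings (UTF-16 is common), so reading them from disk would need an extra decoding step that this sketch leaves out.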
The NSRL database is well known, so it doesn’t need any introduction. What is interesting about the set is the often-forgotten ProductCode field. This is an indicator of where the file/hash/tuple comes from. If you cluster the set by ProductCode, you may end up with clusters of file names that belong to a specific product. For example, if we look at, say, product code 196184 we get this result. As a side note, some of the file names seem to be section names of executables, so the drill-down the NSRL guys use seems to go really deep.
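The clustering itself can be sketched in a few lines of Python. This assumes the classic comma-separated NSRLFile.txt layout, with a header row that includes FileName and ProductCode columns; if your dump uses a different format, the column access will need adjusting. The sample rows, hashes and product codes below are made-up placeholders, not real NSRL data.

```python
import csv
import io
from collections import defaultdict


def cluster_by_product(lines):
    """Map ProductCode -> set of lower-cased file names.

    `lines` is any iterable of CSV text lines (an open file works too).
    """
    clusters = defaultdict(set)
    for row in csv.DictReader(lines):
        clusters[row["ProductCode"]].add(row["FileName"].lower())
    return clusters


# Placeholder rows: hashes and product codes are invented for illustration.
SAMPLE_NSRL = (
    '"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"\n'
    '"0000000000000000000000000000000000000001","00000000000000000000000000000001","00000001","Setup.exe",1024,1001,"WIN",""\n'
    '"0000000000000000000000000000000000000002","00000000000000000000000000000002","00000002","driver.sys",2048,1001,"WIN",""\n'
    '"0000000000000000000000000000000000000003","00000000000000000000000000000003","00000003","README.txt",512,1002,"WIN",""\n'
)

if __name__ == "__main__":
    for code, names in sorted(cluster_by_product(io.StringIO(SAMPLE_NSRL)).items()):
        print(code, sorted(names))
```

File names are lower-cased here because Windows file names are case-insensitive; hashes are deliberately ignored, since the whole point is name-based clustering.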
So… there you have it… parse your good .inf files, enrich them with clusters of file names extracted from the NSRL set, and you may generate a nice cluster-based exclusion list! Happy filelighting!
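How such a cluster-based list would actually be applied is left open here, so what follows is only one possible interpretation, sketched in Python: take the file names seen in a folder under triage and check which known cluster explains the largest share of them. The scoring (fraction of observed names found in a cluster), the function name and the sample clusters are all assumptions for illustration.

```python
def best_cluster(observed_names, clusters):
    """Return (product_code, score) for the cluster covering most observed names.

    score = share of observed file names that appear in the cluster (0.0 to 1.0).
    Returns (None, 0.0) when nothing matches.
    """
    observed = {name.lower() for name in observed_names}
    best = (None, 0.0)
    if not observed:
        return best
    for code, names in clusters.items():
        score = len(observed & names) / len(observed)
        if score > best[1]:
            best = (code, score)
    return best


# Hypothetical clusters keyed by made-up product codes.
KNOWN_CLUSTERS = {
    "1001": {"setup.exe", "readme.txt", "driver.sys"},
    "1002": {"setup.exe", "tool.dll"},
}

if __name__ == "__main__":
    folder = ["Setup.exe", "README.txt", "driver.sys", "evil.exe"]
    print(best_cluster(folder, KNOWN_CLUSTERS))
```

A folder that scores high against a single cluster is a candidate for exclusion, while the names that fall outside the cluster are exactly the ones worth a second look. Given the double-edged sword mentioned earlier, a name match alone should lower priority, not prove innocence.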
Bonus:
Okay, not everything is rosy. Here’s a list of .cat file names I have collected during this exercise. Lots of them. I think they can only make sense in the context of either a software installation package (hint: the one with the .inf file) or a ProductCode in NSRL.