sysdiagnose is a utility on most macOS and iOS devices that can be used to gather system-wide diagnostic information. Currently on version 3.0, sysdiagnose collects a large amount of data from a wide array of locations on the system.
This blog post outlines the immediate value of the data collected by sysdiagnose for the purpose of an investigation. There have been multiple guides and breakdowns of sysdiagnose in the past; however, none that I identified sought to address the shortcomings of the collected data in identifying compromise. As a result, this post wraps sysdiagnose in a collection script that makes up for some of these limitations. It then provides some example detections, using data gathered by the revised collection script to investigate malicious activity.
sysdiagnose can be useful in host investigations, either to conduct live forensics or where a full disk acquisition is a required part of the investigation.
Forensic imaging of macOS devices is challenging compared to other operating systems and relies on specialized commercial software; therefore, live forensics can often be a more suitable option for conducting investigations on these devices.
The output of the sysdiagnose command provides preliminary triage data that can identify areas for further investigation once full disk acquisition has finished. The data collected includes:
Reference: Information on data collected by sysdiagnose from the man page.
Data collected by sysdiagnose can be valuable in a variety of investigations. For investigations involving malware, the data captured can help identify the malicious binary, a persistence mechanism, or any C2 connections. For investigations where data exfiltration is a concern, the network data can identify any open connections, and Apple's unified log archives can identify any USB devices mounted to the file system.
In this section we will review several data sources collected by sysdiagnose. With brevity in mind, this post will not cover all artefacts in detail, but will seek to highlight those valuable for forensic investigations. These artefacts can mostly be analysed using tools on non-macOS operating systems, with the only notable exception being Apple’s unified logs.
When executed, sysdiagnose displays a warning message noting that the output will often contain personal information and a lot of device information; therefore, any data collected using this utility should be handled securely. By default, the resulting collection is a ‘tar.gz’ file saved to ‘/var/tmp/sysdiagnose_[Timestamp]_[Hostname].tar.gz’. There are options to leave the output uncompressed and to save it to an alternate location. The screenshot below shows the command line output of running sysdiagnose as well as the directory opened after completion.
Reference: Screenshot showing sysdiagnose being run in Terminal and the output directory the collection was saved to.
A typical collection includes:
Among the more verbose logs collected by sysdiagnose are the Apple unified logs, introduced in iOS 10 and macOS 10.12[1]. Making use of the OSLog framework[2], this system of logging collates several important log sources into one. There is vast potential value in these logs, but due to their complexity and size, a full exploration is out of scope for this blog post. A limitation of this log format is the difficulty of parsing it into a human-readable form: the log command is exclusive to macOS, and the files contained in the 'system_logs.logarchive' directory are in a proprietary format, referred to here as a 'logarchive' bundle.
Included below are a few methods for reading and analysing the logs that demonstrate their usefulness. The first method uses the log show command with the predicate flag and syntax[3]. This search returns USB/file system mounting events that can show when a user mounts a USB device to the system. This might be useful in investigations where data exfiltration is a concern.[4]
log show --archive /path/to/sysdiagnose/output/system_logs.logarchive --predicate 'eventMessage contains[cd] "USBMSC" or processImagePath contains[cd] "fseventsd" or subsystem = "com.apple.imagecapture"'
It is possible to use the same command to look for examples of any remote logins via SSH, a common piece of evidence to check for attacker activity.
log show --archive /path/to/sysdiagnose/output/system_logs.logarchive --predicate 'processImagePath contains[cd] "sshd"'
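When the same archive is queried repeatedly, it can help to keep the predicates as named queries and generate the command lines from them. The following is a minimal Python sketch of that idea, not part of the collection script: the PREDICATES dictionary and the build_log_show helper are hypothetical names introduced here, and the predicates themselves are copied from the two commands above.

```python
import shlex

# Predicates copied from the commands above, kept as named, reusable queries.
PREDICATES = {
    "usb_mounts": (
        'eventMessage contains[cd] "USBMSC" '
        'or processImagePath contains[cd] "fseventsd" '
        'or subsystem = "com.apple.imagecapture"'
    ),
    "ssh_logins": 'processImagePath contains[cd] "sshd"',
}


def build_log_show(archive_path, query_name):
    """Return the argument list for one filtered `log show` run."""
    return [
        "log", "show",
        "--archive", archive_path,
        "--predicate", PREDICATES[query_name],
    ]


if __name__ == "__main__":
    archive = "/path/to/sysdiagnose/output/system_logs.logarchive"
    for name in PREDICATES:
        # shlex.join quotes the predicate so the line can be pasted into a shell.
        print(shlex.join(build_log_show(archive, name)))
```

The argument list can also be handed to subprocess.run on a macOS analysis host, which avoids shell quoting problems with the nested quotes in the predicates.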
The Console application on macOS is another method for searching the logs and has a simple, easy-to-use search syntax. An example is shown later in this post.
Reference: Screenshot of console showing a record of sudo being used to run collection.sh.
There is a Python script on GitHub by ydkhatri that will read a ‘logarchive’[5]. The script reads the files that make up the ‘logarchive’ and provides several options for output format, including TSV and an SQLite database. As the project status says, this is a work in progress. Apple may well release a change to the format, rendering a component unreadable on other operating systems.
It should be noted that the log command can also be used to create a file containing each log line as JSON using the style flag (--style json), though due to the number of logs contained in a ‘logarchive’ it is not recommended to dump the contents to file unfiltered.
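Once a filtered JSON export exists, it can be triaged on any operating system. The sketch below is a hedged illustration only: it assumes the export is a JSON array of objects and that each record carries the timestamp, processImagePath and eventMessage keys, which should be verified against the output of the macOS version in use. The filter_log_records function and the SAMPLE records are hypothetical and are not real log output.

```python
import json


def filter_log_records(raw_json, needle):
    """Return (timestamp, process, message) for messages containing needle."""
    hits = []
    for record in json.loads(raw_json):
        # Key names are an assumption about the JSON export; .get() keeps the
        # loop from failing if a record lacks one of them.
        message = record.get("eventMessage") or ""
        if needle.lower() in message.lower():
            hits.append(
                (record.get("timestamp"), record.get("processImagePath"), message)
            )
    return hits


# Hypothetical records for illustration, not taken from a real archive.
SAMPLE = json.dumps([
    {
        "timestamp": "2020-11-16 10:34:00.000000+0000",
        "processImagePath": "/usr/bin/sudo",
        "eventMessage": "COMMAND=/usr/bin/osascript apfell.js",
    },
    {
        "timestamp": "2020-11-16 10:35:00.000000+0000",
        "processImagePath": "/usr/sbin/sshd",
        "eventMessage": "Connection closed",
    },
])

if __name__ == "__main__":
    for hit in filter_log_records(SAMPLE, "osascript"):
        print(hit)
```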
The acdiagnose text files contain details of the various accounts associated with the user’s local and iCloud accounts. Details include UUIDs, account configuration, syncing status and a breakdown of the types of accounts plus the ‘supportingDataClasses’ they can access. The information contained within this file can be used to answer a number of investigative questions, such as:
System Integrity Protection (SIP) was introduced in OS X 10.11 El Capitan to prevent users with root access from manipulating system files. SIP can only be disabled from recovery mode, a safe boot feature allowing a user to configure the disk, reinstall the OS, or restore from backup.
This is a relatively simple file, though a valuable one, which shows whether SIP is enabled or disabled. The data stored in the 'csrutil.txt' file should always indicate that SIP is enabled. Opening a terminal window in recovery mode and running csrutil disable turns off SIP, allowing changes to system files. SIP can then be re-enabled in recovery mode by running csrutil enable.
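Checking this file across many collections is easy to automate. The following Python sketch assumes that 'csrutil.txt' mirrors the output of csrutil status, that is, a line of the form "System Integrity Protection status: enabled."; the sip_status function name is hypothetical.

```python
import re


def sip_status(csrutil_text):
    """Return the reported SIP state in lower case, or 'unknown' if not found."""
    # Assumes the file contains a `csrutil status` style line.
    match = re.search(r"System Integrity Protection status:\s*(\w+)", csrutil_text)
    return match.group(1).lower() if match else "unknown"


if __name__ == "__main__":
    print(sip_status("System Integrity Protection status: enabled."))
```

Any collection where the result is not "enabled" is worth a closer look, since the file should normally report SIP as enabled.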
launchctl is the command-line interface to launchd, the service that loads and unloads launch agents and daemons, which are applications and services that are executed at boot and/or login. This can be useful for identifying possible persistence mechanisms. The data in 'launchctl-dumpstate.txt' contains environment variables and paths to applications and executables. The other files contain individual user launchctl configurations.
There are a couple of potentially valuable logs contained within this directory, for example:
Reference: loginwindow process log, as seen in Console.app, showing Spotify being launched.
Reference: dock process log, as seen in Console.app, showing Spotify being launched.
There are several plain text files containing readouts of top and ps from the time of collection. This could be valuable in identifying anomalous processes. However, where a process has an inconspicuous name, the data is too limited to conclusively identify it as suspicious.
Included in the sysdiagnose collection is a directory of assorted network data, such as ifconfig output, routing, proxy and netstat command output. This data contains some active network connections around the time of collection that could be used to identify suspicious network activity.
There are some limitations, for example linking a connection back to a process or file is not as trivial as it could be. However, there is more valuable networking data that could be collected and used to greatly supplement analysis, which is covered in the following section.
There are some areas sysdiagnose does not cover that would add value in most incident response investigations. For this reason, we created a collection script that addresses these shortcomings and collects the following:
Reference: A screenshot of the code that handles collecting metadata on each file in a file listing.
Reference: Screenshot of the code that handles collecting extended attributes.
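To give a rough idea of what a file listing with metadata involves, here is a minimal Python sketch. It is not the collection script shown in the screenshots: the function names, the chosen fields and the decision to return errors as a separate list are all assumptions made for illustration, and extended attributes are not covered.

```python
import hashlib
import os
import stat


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_file_metadata(root):
    """Walk root and return (rows, errors) for every regular file found."""
    rows, errors = [], []
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
                if not stat.S_ISREG(info.st_mode):
                    continue
                rows.append({
                    "path": path,
                    "size": info.st_size,
                    "mode": stat.filemode(info.st_mode),
                    "uid": info.st_uid,
                    "modified": info.st_mtime,
                    "sha1": sha1_of_file(path),
                })
            except OSError as error:
                # Unreadable files are recorded rather than stopping the walk.
                errors.append((path, str(error)))
    return rows, errors
```

Keeping the errors separate from the rows means one unreadable file does not interrupt the listing, and the failures can still be reviewed afterwards.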
Apfell/Mythic is a cross-platform C2 framework[7] with capability on macOS. By setting up an Apfell agent in a virtual machine, we can explore some examples of data from the collection script that might be useful.
First, we check the ‘logarchive’ for osascript logs. osascript is the command used to run scripting languages such as AppleScript and JavaScript. The Apfell agent can have a variety of payloads; the one selected for this example had a JavaScript payload. The ‘logarchive’ showed osascript running and making network connections every 10 seconds or so. There are also logs indicating successful transfer of data; however, the destination IP in both of these logs is obfuscated.
Reference: Highlighted in the screenshot above is a block of logs indicating an established connection to the C2
By searching the data collected we can try to identify the C2 address. Based on the logs above, it is possible to identify the victim host's IP address and the port on the C2 that the victim host is connecting to. Running a simple grep over the data for the victim's IP and ‘:80’ should provide us with some interesting information. However, this did not return the expected results because, in netstat output on macOS, ports are denoted not by a colon but by a period. A grep for ‘.80’ and our local IP shows matches from the 'WiFi' directory of the sysdiagnose output.
./sysdiagnose.../WiFi/netstat-POST.txt: tcp4 0 0 172.16.88.130.51203 172.16.88.134.80 ESTABLISHED
./sysdiagnose.../WiFi/netstat-PRE.txt: tcp4 0 0 172.16.88.130.51203 172.16.88.134.80 ESTABLISHED
Reference: The WiFi directory contains two netstat outputs (netstat -n), both showing ‘ESTABLISHED’ connections to the C2 address on port 80.
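Because a grep pattern of ‘.80’ treats the period as a wildcard, it will also match ports such as 8081. Splitting each endpoint on its final period gives an exact match on the port instead. The Python sketch below shows one way to do this; the function names are hypothetical, the field positions are assumed from the netstat lines shown above, and the second sample line is invented for illustration.

```python
def split_endpoint(endpoint):
    """Split a macOS netstat endpoint such as '172.16.88.134.80' into host and port."""
    host, _sep, port = endpoint.rpartition(".")
    return host, port


def find_connections(lines, local_ip, remote_port):
    """Return TCP connections from local_ip to exactly remote_port."""
    hits = []
    for line in lines:
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local, foreign, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_host, local_port = split_endpoint(fields[3])
        remote_host, port = split_endpoint(fields[4])
        if local_host == local_ip and port == str(remote_port):
            hits.append((local_host, local_port, remote_host, port, fields[5]))
    return hits


SAMPLE_LINES = [
    "tcp4 0 0 172.16.88.130.51203 172.16.88.134.80 ESTABLISHED",
    # Hypothetical line: a different port that a loose '.80' grep would also match.
    "tcp4 0 0 172.16.88.130.51210 172.16.88.134.8081 TIME_WAIT",
]

if __name__ == "__main__":
    for hit in find_connections(SAMPLE_LINES, "172.16.88.130", 80):
        print(hit)
```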
Using the C2 address we can identify some further information. Because this is a test, your mileage may vary, but for demonstration purposes it is possible to identify the URL and how the payload was downloaded. These can be seen in the ‘History.db’ file and the ‘Downloads.plist’ snippets below.
{
  "DownloadHistory" => [
    0 => {
      "DownloadEntryBookmarkBlob" => {length = 756, bytes = 0x626f6f6b f4020000 00000410 30000000 ... 04000000 00000000 }
      "DownloadEntryDateAddedKey" => 2020-11-16 10:29:44 +0000
      "DownloadEntryDateFinishedKey" => 2020-11-16 10:29:44 +0000
      "DownloadEntryIdentifier" => "831B3780-1959-4BC5-978B-FB63AA430C25"
      "DownloadEntryPath" => "/Users/test_account/Downloads/apfell.js"
      "DownloadEntryProgressBytesSoFar" => 112116
      "DownloadEntryProgressTotalToLoad" => 112116
      "DownloadEntryRemoveWhenDoneKey" => 0
      "DownloadEntryShouldUseRequestURLAsOriginURLIfNecessaryKey" => 0
      "DownloadEntryURL" => "http://172.16.88.134:8081/apfell.js"
    }
  ]
}
Reference: The output of plutil -p run against the 'Downloads.plist' from the collected Safari data.
Reference: The 'apfell.js' payload being downloaded by Safari.
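The same plist can be read away from macOS, since Python's plistlib handles both XML and binary property lists. The following is a small sketch under that assumption; the list_safari_downloads function name is hypothetical, and the key names are the ones visible in the plutil output above.

```python
import plistlib


def list_safari_downloads(plist_bytes):
    """Return (url, local_path) pairs from the bytes of a Safari Downloads.plist."""
    data = plistlib.loads(plist_bytes)  # format (XML or binary) is auto-detected
    return [
        (entry.get("DownloadEntryURL"), entry.get("DownloadEntryPath"))
        for entry in data.get("DownloadHistory", [])
    ]


if __name__ == "__main__":
    # Path is illustrative; point it at the Downloads.plist in the collection.
    with open("Downloads.plist", "rb") as handle:
        for url, path in list_safari_downloads(handle.read()):
            print(url, "->", path)
```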
Searching on ‘apfell.js’ shows some interesting records in the ps and shell history output. In ‘ps.txt’ we can see the osascript process running. Similarly, we can see the script being run in the ‘.zsh_history’ file; however, this will of course not always be the case.
./BashData/.zsh_history:sudo osascript apfell.js
./sysdiagnose.../ps.txt:root 0 2040 2037 0.0 0.2 31 0 4310940 6364 - s001 S+ 10:34AM 0:00.04 sudo osascript apfell.js
Reference: Additional searching in the 'logarchive' data identified the osascript being executed with sudo privileges for the ‘apfell.js’ script.
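Pivoting on an indicator such as ‘apfell.js’ across the whole collection is essentially a recursive grep. For analysts working on a host without grep, the Python sketch below does a comparable search; the search_collection function name is hypothetical, and files are read as bytes so that binary artefacts do not cause decoding errors.

```python
import os


def search_collection(root, indicator):
    """Return (path, line_number, line) for each line under root containing indicator."""
    needle = indicator.encode("utf-8")
    hits = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            try:
                with open(path, "rb") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            text = line.decode("utf-8", "replace").rstrip()
                            hits.append((path, number, text))
            except OSError:
                # Skip files that cannot be opened, for example broken symlinks.
                continue
    return hits


if __name__ == "__main__":
    # '.' is illustrative; use the directory holding the collected data.
    for path, number, text in search_collection(".", "apfell.js"):
        print(f"{path}:{number}:{text}")
```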
Testing against this framework highlighted a number of further interesting files worthy of collection. Therefore, support was added for file listings on file extensions and detection of ‘LoginItems’, a persistence mechanism, in the ‘backgrounditems.btm’ file.
Although intended for troubleshooting crashes and diagnosing problems, there are some useful applications of sysdiagnose data in forensic investigations. The inclusion of larger log sources as well as some process and network data makes it an excellent tool for gathering triage information. Wrapping sysdiagnose in a collection script with some useful additions results in a comprehensive tool for gathering macOS triage data.
There is room for improvement here. Originally the script used MD5 hashes and was then changed to SHA-1, but further research could be done in benchmarking which would be more performant. It should be noted that the extended attributes collection is not completely stable. This appears to be due to running xattr on certain files located in iCloud; errors for all sections are saved to the '*.errors' files. As understanding of macOS grows and attacker tradecraft continues to be exposed, there is opportunity for further expansion of the collection. It is hoped that the release of this script will be valuable for other blue teams in their investigations and will serve as a baseline for further development.
sysdiagnose? More like sysdiag-’the more you knows’.