I’ve done a lot of research on threat hunting and detection over the course of my career. For the past six or seven years, much of that work has focused on cloud threat hunting and detection, which is a different beast because of the complexity of cloud infrastructure and the extensive use of API-driven services. Click here to see Uptycs anomaly detections in action, as I walk through the use cases discussed in this article and demonstrate detections right from the Uptycs platform.

Very often, people bring their existing security tools, technologies, standards, and practices to the cloud and expect to do security the same way they did in the on-premises world. It's not that you can't do that, or that it has no value. It does have some value: there are still virtual networks to monitor, and we can put EDR agents, endpoint agents, or other instrumentation on virtual machines and servers. But there are other parts of the cloud where those traditional tools simply are not revealing.

The vast majority of cloud environments consist of services that can’t be monitored with a firewall, an intrusion detection system (IDS), or any kind of host-based or endpoint instrumentation. Looking for and detecting threats in the cloud takes a fundamentally different approach. That’s why Uptycs incorporates anomaly detection into our threat hunting and detection capabilities.

What is Anomaly Detection?

Let me start with a basic definition of what I mean by “anomaly detection.” In practical terms, anomaly detection means looking for a behavior, a transaction, or a combination of factors, observables, or field values that represents an outlier: something that is extremely rare, extremely dense, or possibly new because it hasn’t manifested before. Large spikes in certain kinds of event streams, such as errors, or in data volumes flowing to or from particular kinds of destinations, can also be interesting. In short, anomaly detection looks for things that are statistically sparse or dense, at the extreme opposite ends of that spectrum, and for things that are simply new, that we haven’t seen before.

Anomaly detection is good at discovering items of potential concern within the haystack that initially appear too similar to normal, innocent activity to stand out. It's also good at finding the next emerging threat, where we don't yet know what it is or what it looks like; we need to be able to see what's coming. Though I’m a big advocate of anomaly detection, I’m not saying you should get rid of existing threat detection technologies and use it alone. No single detection technology uncovers every kind and class of threat. You get much better results and much better efficacy when you use different kinds of detection technologies and tools in concert, in ways that complement each other.
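To make the “large spike” idea concrete, here is a minimal sketch that assumes a simple list of hourly error counts derived from an event stream; the window size and threshold are arbitrary assumptions, and this is an illustration rather than how Uptycs models anomalies.

```python
from statistics import mean, stdev

# Minimal sketch: flag hours whose error count spikes well above a rolling
# baseline. Window and threshold are arbitrary, illustrative choices.

def spike_hours(hourly_error_counts, window=24, k=3.0):
    """Return indexes of hours exceeding baseline mean + k * stdev."""
    spikes = []
    for i in range(window, len(hourly_error_counts)):
        baseline = hourly_error_counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if hourly_error_counts[i] > mu + k * max(sigma, 1.0):
            spikes.append(i)
    return spikes
```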
Case Study – How Anomaly Detection Could Have Identified Unusual Cloud Activity

There are a few public-record case studies of data breaches that contain elements relevant to anomaly detection. One event occurred in the political sector in 2016 (see pg. 13, § 34). Here, an attacker obtained cloud user credentials, authenticated as that person, then shared snapshots to exfiltrate data. The attacker wasn’t copying data around a network or anything like that. They were simply sharing snapshots with their own account as a method of large-scale data exfiltration, forklifting data from the victim account into the attacker’s account. It was a case of credentialed access used for large-scale exfiltration.

There’s nothing inherently unusual or malicious about sharing snapshots. That’s true of most cloud actions, because most organizations have many accounts where such activity is completely normal. On any given day, there might be up to 13,000 other actions to consider, performed by hundreds or thousands of users. That makes it really hard to find those few suspicious events among millions, or possibly billions, of normal actions. That’s quite the formidable haystack.
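To ground this in the log data: in AWS, sharing an EBS snapshot typically appears in CloudTrail as an ec2.amazonaws.com ModifySnapshotAttribute event that adds a createVolumePermission for another account. The sketch below is a simple illustrative filter, not Uptycs code and not an anomaly model; the account allow-list is hypothetical and the exact requestParameters layout should be treated as an assumption.

```python
# Sketch: surface CloudTrail events that share EBS snapshots with AWS
# accounts outside a (hypothetical) allow-list of known organization accounts.

KNOWN_ACCOUNTS = {"111111111111", "222222222222"}  # hypothetical account IDs

def external_snapshot_shares(events):
    """Yield ModifySnapshotAttribute events granting access to unknown accounts."""
    for e in events:
        if e.get("eventSource") != "ec2.amazonaws.com":
            continue
        if e.get("eventName") != "ModifySnapshotAttribute":
            continue
        params = e.get("requestParameters") or {}
        adds = (
            params.get("createVolumePermission", {})
                  .get("add", {})
                  .get("items", [])
        )
        for item in adds:
            account = item.get("userId")
            if account and account not in KNOWN_ACCOUNTS:
                yield e
```

A static filter like this only works once you already know the technique; the point of anomaly detection is to surface the same behavior as a rare or first-seen action even before any rule exists.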
Case Study – Correlating Activities to Spot an Employee’s Rogue Behavior

Another interesting case study involves an employee who had legitimate cloud service access but, as alleged in the public-record indictment, started using that access to do illegitimate things. Their suspicious activity began at 3:00 AM. You might ask, “Can I just look for people logging in at that time and assume it’s malicious activity?” If everyone were a 9-to-5 knowledge worker, that might have good efficacy. But in large distributed systems, cloud or otherwise, there are SREs and people doing break/fix who log in that early because they’re on pager duty and something is broken. There is a good amount of perfectly normal activity from people debugging and fixing things in the middle of the night or at odd hours, so alerting on a single data point (i.e., odd activity time) likely isn’t a viable detection. Moreover, people roam in the post-COVID era and log in from various time zones.

Geolocation traceability – Another data point that differed in this second case study was how the user logged in via a VPN, possibly to obfuscate their location and identity. The presence of that particular VPN provider in the log data was likely new; it didn’t normally appear in the system logs. (Some VPNs are located in geographies that don’t make sense in relation to a company’s business; they’re fertile ground for anomalies to appear.) Uptycs enriches CloudTrail events with the name of the source, city, region, and country, as well as the latitude-longitude coordinates, so the activity can be plotted on a map. This enables analysts to reason about which geographies users are logging in from and what kinds of activity correspond with those geolocations.
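As a rough illustration of that kind of enrichment (not Uptycs’ implementation), the sketch below attaches city, region, country, and coordinates to each event’s sourceIPAddress using the MaxMind geoip2 library; the GeoLite2-City.mmdb path is an assumption, and lookups for private addresses or AWS service principals are simply skipped.

```python
import geoip2.database  # pip install geoip2; the .mmdb file path is an assumption

# Sketch: enrich CloudTrail-style events with geolocation fields derived
# from sourceIPAddress (illustrative only).

def enrich_with_geo(events, mmdb_path="GeoLite2-City.mmdb"):
    with geoip2.database.Reader(mmdb_path) as reader:
        for e in events:
            ip = e.get("sourceIPAddress", "")
            try:
                geo = reader.city(ip)
            except Exception:
                continue  # private ranges, service principals, bad input
            e["geo"] = {
                "city": geo.city.name,
                "region": geo.subdivisions.most_specific.name,
                "country": geo.country.name,
                "lat": geo.location.latitude,
                "lon": geo.location.longitude,
            }
    return events
```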
Something else of note is that the user modified the retention policies on specific logs. Presumably, the rationale was to shorten log retention so that, by the time anybody looked at the affected log the next day or a few days later, there might be no evidence left to use in an investigation. Tampering with log retention, truncation, destruction, and deactivation is a common attacker move. Yet in a cloud environment, modifying lifecycle retention policies for logging is not abnormal by itself. Applications, systems, and entire environments get spun up and torn down; logging and associated services get turned on and off. But much of that is done by automation or by a small number of users, so for the user in question, working in this particular service and modifying and truncating logs was likely anomalous.

Better anomaly detection results are often achieved by using a combination of fields and observables. For example, don’t just look for rare values in a single field; look for rare combinations of, say, user and service, such as a user working in a service they don’t normally use. (It could be that somebody started a new project, but it could also be unauthorized credentialed access: their account has been hijacked.) The sketch below combines these two ideas, flagging retention changes made by a user who doesn’t normally work in that service.
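This is a minimal sketch under assumed CloudTrail field names; the retention-related event names are examples rather than a complete or authoritative list, and the baseline logic is deliberately simplistic.

```python
from collections import defaultdict

# Sketch: flag log-retention/logging changes made by a user who does not
# normally work in that service. Event names are illustrative examples.

RETENTION_ACTIONS = {
    "PutBucketLifecycle",               # S3 lifecycle rules
    "PutBucketLifecycleConfiguration",  # S3 lifecycle rules (newer API name)
    "PutRetentionPolicy",               # CloudWatch Logs retention
    "DeleteLogGroup",                   # CloudWatch Logs removal
    "StopLogging",                      # CloudTrail deactivation
}

def baseline_services(events):
    """Map each user ARN to the services (eventSource) they normally use."""
    seen = defaultdict(set)
    for e in events:
        seen[e["userIdentity"]["arn"]].add(e["eventSource"])
    return seen

def suspicious_retention_changes(baseline, new_events):
    """Retention changes in a service the acting user doesn't normally touch."""
    hits = []
    for e in new_events:
        if e["eventName"] not in RETENTION_ACTIONS:
            continue
        user, service = e["userIdentity"]["arn"], e["eventSource"]
        if service not in baseline.get(user, set()):
            hits.append(e)
    return hits
```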
How Uptycs Can Detect New and Emerging Threats

One of the best characteristics of anomaly detection is that you don’t have to know what you’re looking for in order to find a potential threat. A good example is how Uptycs found suspicious activity with no threat intel, no rules, and no entry in the MITRE ATT&CK matrix, because the activity relates to relatively new, emerging threat research. The research involves a method called GetFederationToken that is being used as a persistence mechanism: a bad actor creates a federated user and can continue to persist in an account even after the original user is disabled. The process around this technique is a bit complex and quite interesting, and from what I can see in the data I’ve studied, use of the technique is relatively unusual. We surface it through output from a function called New Action for User.

In one case, we were able to detect this unique (and potentially malicious) event earlier this year when a user called the GetFederationToken method. Though we didn’t know in advance what we were looking for, we were still able to discover and alert on this service event: it was an action the user had never performed before, and it didn’t make sense for users to be doing it. It’s a great example of how anomaly detection can reveal the risk of an attacker maintaining persistence in an environment, a risk that would not otherwise be detected.
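Conceptually, a “new action for user” check can be as simple as remembering which API calls each identity has made before and flagging the first occurrence of a new one. The sketch below illustrates that idea under assumed CloudTrail field names; it is not the Uptycs implementation.

```python
# Sketch of a "new action for user" check: flag the first time an identity
# calls an API it has never called before (e.g., sts:GetFederationToken
# suddenly appearing for a user who has never used it).

def new_actions_for_user(history, new_events):
    """history: set of (user_arn, eventName) pairs seen during the baseline."""
    alerts = []
    for e in new_events:
        key = (e["userIdentity"]["arn"], e["eventName"])
        if key not in history:
            alerts.append(e)
            history.add(key)  # alert only once per new combination
    return alerts
```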
Uptycs' Dashboard Displays Anomalous Detections

Uptycs provides a dashboard that displays anomalous findings, or anomaly detection results, from CloudTrail data. Threat hunters can assess the high-level information, drill down to learn more about what has been discovered, then initiate an investigation if necessary.

The dashboard also has a Create Alert button that lets you decide for yourself what to alert on. That customization helps avoid alert fatigue by giving you more of a say in what is and is not turned into a detection or an alert. Uptycs does have alert and event rules, but this function gives you powerful flexibility to decide for yourself what an alert should be based on.

Uptycs plans to apply anomaly detection to other data sources in addition to cloud data. Being able to discover what’s coming next, even when we don’t know what we’re looking for, provides another great tool in the threat hunter’s toolbox.

Interested in seeing more? Explore our webinar uncovering alerts using anomaly detections.