What if I told you that dark data is lying in wait in some of your API responses?
You know the data. It’s the extra stuff being returned in an endpoint that probably shouldn’t be getting returned at all.
It’s a potential treasure trove of insights and a possible security risk.
In this article, I will explain what dark data is all about and how to look for it when testing an API.
Let’s get to it!
Dark data can be defined as any data collected and stored by an organization but not generally used for any practical purpose.
It often resides in the shadows of databases and storage systems, hidden away from the light of analytics and business intelligence.
Typically, organizations consider this type of data low-value or irrelevant, leading to its oversight or neglect.
However, when it comes to API responses, dark data can contain sensitive information that should not be exposed to the public.
There are several reasons why you should pay attention to dark data in your API responses:
It might surprise you to know that developers are human. That’s right. A shocker there.
The reality is modern development practices rely heavily on object models. They are used to bind data to database records. Structured variables and objects in code depend on them. And yes… the serialized JSON objects normally included in a data contract in API endpoints rely on them, too.
In many cases, developers don’t even realize that they may be publishing dark data when they expose an object model in an API call.
Let me give you a practical example of what dark data may look like.
Let’s refer to the vulnerability of Broken Object Property Level Authorization that is defined in the OWASP API Security Top 10.
Pay close attention to the description of the impact of this class of vulnerability:
“Unauthorized access to private/sensitive object properties may result in data disclosure, data loss, or data corruption. Under certain circumstances, unauthorized access to object properties can lead to privilege escalation or partial/full account takeover.”
Source: API3:2023 Broken Object Property Level Authorization
Imagine you have an API endpoint that returns User information. Let’s say the User object contains a few properties, such as an email address, a last login date, and location information.
It seems harmless enough. But the last login date and the location information probably aren’t things you want to share with other users. Nor is the email address.
Depending on how the object model is used, though, the API may very well return all of it on a direct call to the Users endpoint, or indirectly through an endpoint that nests the User object in its responses, such as one that lists the members of a group or project.
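To make that concrete, here is a minimal sketch in Python. The `User` fields and the `group_members_response` helper are assumptions I made up for illustration; they do not come from any real API. The point is only that serializing the whole object model nests every property into the response.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class User:
    # Hypothetical object model; the field names are illustrative only.
    id: int
    display_name: str
    email: str
    last_login: str
    location: str


def group_members_response(group_name, members):
    # Serializing the whole model nests every property in the response,
    # whether or not the front end ever displays it.
    return {"group": group_name, "members": [asdict(m) for m in members]}


alice = User(1, "alice", "alice@example.com", "2024-01-15T09:30:00Z", "Vancouver, CA")
response = group_members_response("project-x", [alice])
print(json.dumps(response, indent=2))
```

Even if the front end only ever shows `display_name`, the email, last login, and location are all sitting in the raw response.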
Usually, developers don’t want to modify the schema of object models or change API contracts. They filter out the data in the front end and only share/use the data they think is pertinent.
But here’s the rub. At that point, it’s too late. We will have captured the dark data in the attack proxy and can exfiltrate all the information to use as we see fit.
See where I am going with this?
As an API hacker, it is essential to identify any dark data present in an API.
Here are a few things you can do…
First and foremost, it is crucial to understand the API contract or documentation. Look for any mention of sensitive data that should not be returned in responses. Try to map out the schema of every object model in the API and see if any properties exist that may look like dark data.
Just remember, developers don’t always keep their API documentation and definitions up to date.
This is one of the reasons I am a big fan of generating my own rogue API documentation based on the real-world traffic to the API. It will capture the data as it flows through and show you EXACTLY what data is being used and what the object model looks like.
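As a rough sketch of the idea, the snippet below walks captured JSON response bodies and records every property path it sees, along with the value types. The helper names and the sample bodies are my own assumptions; in practice you would feed it responses exported from your proxy, and dedicated tooling will do a far more complete job.

```python
import json


def collect_paths(value, prefix="", found=None):
    """Record every property path in a JSON value with the scalar types seen there."""
    if found is None:
        found = {}
    if isinstance(value, dict):
        for key, child in value.items():
            collect_paths(child, f"{prefix}.{key}" if prefix else key, found)
    elif isinstance(value, list):
        for child in value:
            collect_paths(child, f"{prefix}[]", found)
    else:
        found.setdefault(prefix, set()).add(type(value).__name__)
    return found


def infer_schema(captured_bodies):
    """Merge the paths observed across many captured response bodies."""
    schema = {}
    for body in captured_bodies:
        collect_paths(json.loads(body), found=schema)
    return schema


# Hypothetical captured traffic for the same endpoint.
captured = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "name": "bob", "profile": {"email": "bob@example.com"}}',
]
schema = infer_schema(captured)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Paths that only show up in some responses, like `profile.email` here, are good places to start looking.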
Use tools like Burp Suite or Postman to explore all possible endpoints and parameters. Try different combinations and variations to see if you can uncover any dark data.
Look closely at GET operations that may filter or return specific properties based on query parameters. It may be that normal responses don’t include dark data, but with additional parameters, they can return the data directly.
This is especially helpful with complex REST and GraphQL endpoints, where changing the query can expose significantly more dark data than you should normally have access to.
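One way to approach this is to request the same resource with and without candidate parameters and diff the properties that come back. The sketch below assumes a `fetch(path, params)` callable that returns the parsed JSON body. Here it is a stand-in called `fake_fetch` so the example runs offline, and the parameter names (`fields`, `expand`, `include`) are guesses, not parameters of any particular API.

```python
from urllib.parse import urlencode


def probe_parameters(fetch, path, candidates):
    """Return the properties that only appear when a candidate parameter is sent."""
    baseline = set(fetch(path, {}))
    findings = {}
    for params in candidates:
        extras = sorted(set(fetch(path, params)) - baseline)
        if extras:
            findings[urlencode(params)] = extras
    return findings


def fake_fetch(path, params):
    # Stand-in for a real HTTP call routed through your proxy (assumption).
    body = {"id": 7, "display_name": "alice"}
    if params.get("expand") == "true":
        body.update({"email": "alice@example.com", "last_login": "2024-01-15T09:30:00Z"})
    return body


candidates = [{"fields": "all"}, {"expand": "true"}, {"include": "profile"}]
findings = probe_parameters(fake_fetch, "/users/7", candidates)
print(findings)
```

Against a real target you would swap `fake_fetch` for a function that issues the request and parses the response, keeping the diffing logic as is.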
Use a proxy tool like Burp Suite to intercept and inspect the API responses. Look for any unexpected data that does not seem to be relevant or necessary.
One trick is to monitor how objects/resources are created in the API by looking at what is sent in a POST operation. Then, look at how they are updated in PUT, PATCH, and/or POST operations. Use a tool like Burp Comparer to see what differs between the request bodies.
If you see different properties being used, you might be able to map out dark data properties you didn’t know existed.
Then compare that with any GET operations that return the objects. Anything additional there? Anything missing? These are leading indicators of possible dark data.
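The comparison itself boils down to a few set operations on property names. Here is a minimal sketch, assuming you have already pulled the JSON bodies out of your proxy history; the sample payloads and property names are hypothetical.

```python
def compare_write_and_read(create_body, update_body, read_body):
    """Diff the top-level properties sent on writes against those returned on reads."""
    written = set(create_body) | set(update_body)
    read = set(read_body)
    return {
        "only_in_update": sorted(set(update_body) - set(create_body)),
        "returned_but_never_sent": sorted(read - written),
        "sent_but_not_returned": sorted(written - read),
    }


# Hypothetical bodies captured for one resource.
create_body = {"name": "alice", "email": "alice@example.com", "password": "hunter2"}
update_body = {"name": "alice", "email": "alice@example.com", "timezone": "UTC"}
read_body = {"id": 1, "name": "alice", "email": "alice@example.com",
             "timezone": "UTC", "role_id": 3}

report = compare_write_and_read(create_body, update_body, read_body)
for label, names in report.items():
    print(label, names)
```

Properties that are returned but never sent, like the made-up `role_id` above, are the ones worth a closer look.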
As mentioned earlier, sensitive data can sometimes be included indirectly through nesting in an endpoint response. Make sure to check all related objects and their properties. Look for any unexpected or sensitive data that may have been included.
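Walking nested responses by hand gets tedious, so a small recursive search can help with triage. The sketch below flags property names that match a short pattern list. Both the pattern list and the sample response are assumptions for illustration; you would tune the patterns to your target, and a name match is only a hint that something deserves a manual look.

```python
import re

# Illustrative name patterns only; extend or replace for your target.
SENSITIVE = re.compile(r"(email|password|token|secret|role|location|last_login)",
                       re.IGNORECASE)


def find_sensitive_paths(value, prefix=""):
    """Return the paths of properties whose names look sensitive, at any depth."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if SENSITIVE.search(key):
                hits.append(path)
            hits.extend(find_sensitive_paths(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_sensitive_paths(child, f"{prefix}[{index}]"))
    return hits


# Hypothetical response that nests User objects inside a channel listing.
channel_response = {
    "channel": "general",
    "members": [
        {"id": 1, "name": "alice",
         "profile": {"email": "alice@example.com", "last_login": "2024-01-15T09:30:00Z"}},
    ],
}
hits = find_sensitive_paths(channel_response)
print(hits)
```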
I once stumbled upon an endpoint that exposed a list of all Users who were members of a specific channel in a collaboration app. I noticed a parameter in the GET request that ultimately returned the complete, expanded list of User objects, including hidden properties I had never seen before that were internal to the system.
That discovery allowed me to learn about a field being used to map users to particular roles in the tenant. The access token for the API reflected these roles on creation.
The result? I discovered a method to overwrite that hidden property with a PUT operation, enabling my next login to possess greater privileges than intended.
Privesc thanks to the discovery of dark data.
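I can’t share the real application, but the underlying pattern can be simulated in a few lines. This is a sketch under my own assumptions: the `tenant_role` property, the stored record, and both update functions are hypothetical. It contrasts an update handler that binds every property the client sends with one that only accepts an allowlist.

```python
# Properties a regular user should be allowed to change (assumption).
ALLOWED_UPDATE_FIELDS = {"display_name", "timezone"}


def naive_update(stored, payload):
    # Binds every client-supplied property onto the stored object.
    return {**stored, **payload}


def allowlisted_update(stored, payload):
    # Drops any property the client is not supposed to control.
    safe = {k: v for k, v in payload.items() if k in ALLOWED_UPDATE_FIELDS}
    return {**stored, **safe}


stored_user = {"id": 7, "display_name": "alice", "timezone": "UTC",
               "tenant_role": "member"}
# PUT body that includes the hidden property learned from the expanded response.
payload = {"display_name": "alice", "tenant_role": "admin"}

print(naive_update(stored_user, payload)["tenant_role"])
print(allowlisted_update(stored_user, payload)["tenant_role"])
```

If the API behaves like `naive_update`, the hidden property is writable, and that is exactly the kind of Broken Object Property Level Authorization issue you are hunting for.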
The elusive nature of dark data in an API can pose significant security risks if left unattended.
However, with the right approach and tools, you can illuminate these hidden data patterns, enhancing your application’s security.
By meticulously examining the objects and their properties in API operations and looking for unexpected, sensitive, or missing data, you can uncover dark data. And probably some Broken Object Property Level Authorization vulnerabilities too.
This process is not just a test of diligence; it is a journey of discovery. It challenges us to think differently, to question the norms, and to leave no stone unturned in our quest for secure and efficient applications.
Vulnerabilities may lurk in the shadows of nested objects or hidden properties, but with persistence and curiosity, we can bring them to light.
So, let’s keep probing, keep questioning, and keep discovering. Because in the world of API security, the hunt for dark data never ends.
Good luck!
The post Finding “dark data” in an API appeared first on Dana Epp's Blog.
*** This is a Security Bloggers Network syndicated blog from Dana Epp's Blog authored by Dana Epp. Read the original post at: https://danaepp.com/finding-dark-data