Devfile file write vulnerability in GitLab

This post is a extensive walkthrough of the process of identifying and exploiting CVE-2024-0402.

The vulnerability in GitLab was fixed on January 25th 2024 with a Critical Security Release.

Sometimes exploits for vulnerabilities are like onions: they have layers. This particular exploit had several layers and two underlying vulnerabilities which were used in combination to achieve an arbitrary file write on a GitLab instance. This file write then can be further abused to run arbitrary commands on a GitLab instance. This post will cover the general approaches to identify such issues and the technical details the exploit relies upon. However we'll spare the full end-to-end exploit as we deem the underlying techniques and approaches much more relevant in general than sharing an exploit for a patched vulnerability.

Starting point: dependencies

The adventure started by looking at the dependencies of the GitLab main project.

While looking for some low-hanging fruit in the project dependencies I noticed the devfile Gem making calls to an external helper binary by using Open3.capture3. Calling external binaries is a typical red flag for me, there are many things which might go sideways. Command or Argument injections for example, or other surprises from lesser known features of the called binary. Not to speak of actual vulnerabilities in the binary or its dependencies.

Notably the devfile Gem is written in-house at GitLab. A quick review of the repository revealed some Go based code from which the devfile binary called by the Ruby Gem was being created. I didn't know much about Devfiles at this point. I only had the vague concept in the back of my head that those were YAML files used to describe the environments for GitLab's Workspaces feature.

Workspaces are isolated web-based development environments which are deployed by the GitLab application into Kubernetes clusters. Zooming out a bit we have:

Devfiles: YAML files which are used to describe Workspaces in Kubernetes environments
A Ruby library which parses those devfiles by calling a Go based helper binary

A quick check of the Ruby code calling the Go binary showed there were no surprises or low hanging fruits. The binary is being called directly with no shell being involved, simple command injections were not possible. Also the Go binary had no opportunity for Argument injections.

Digging deeper

Sometimes it's great to just start messing around with a software to get a feeling for potential vulnerabilities. The design of the devfile Gem made this easy. I could just use the included Go based binary and feed it some YAML. Digging into the documentation and looking for some sample files to use I came across the parent feature which allows another devfile to be specified. That file is then used as a base for your current devfile. I used the example from the documentation with the devfile binary in the Gem like so:

joern@host2:~/projects/deps/ruby/3.2.0/gems/devfile-0.0.25.pre.alpha1-x86_64-linux/bin$ ls
devfile
joern@host2:~/projects/deps/ruby/3.2.0/gems/devfile-0.0.25.pre.alpha1-x86_64-linux/bin$ ./devfile flatten 'schemaVersion: 2.2.0
metadata:
  name: my-project-dev
parent:
  uri: https://raw.githubusercontent.com/devfile/registry/main/stacks/nodejs/devfile.yaml'
failed to populateAndParseDevfile: error getting devfile info from url: failed to retrieve https://raw.githubusercontent.com/devfile/registry/main/stacks/nodejs/devfile.yaml, 404: Not Foundjoern@host2:~/projects/deps/ruby/3.2.0/gems/devfile-0.0.25.pre.alpha1-x86_64-linux/bin$

I was quite surprised when I did a ls in the directory after that command:

joern@host2:~/projects/deps/ruby/3.2.0/gems/devfile-0.0.25.pre.alpha1-x86_64-linux/bin$ ls
2.1.1  2.2.0  devfile  OWNERS  stack.yaml

A directory from the devfile/registry repository has been copied into the working directory where I ran the ./devfile command. This was the moment I figured I might be on to something worth spending some more time on.

One step back and three steps forward

The I'll copy some stuff from the Internet into your working directory when you parse a devfile thing was very promising. I was hoping to use this vector to gain an arbitrary file write from it. Should be easy enough, all I needed to do was to figure out the code path from within GitLab to trigger the dependency to flatten a devfile I crafted. That, and yes maybe a traversal out of the working directory, maybe not. That would depend on my current unknowns where on a GitLab instance this working directory would be and what I would control in terms of written files. But overall, this issue looked like a juicy starting place.

So I took a deeper dive into the main Ruby on Rails codebase of GitLab, just to find out that there's a validation to prevent the usage of that parent feature in a devfile. This was really inconvenient, I almost saw that awesome file write vanish in that line of code:

return err(_("Inheriting from 'parent' is not yet supported")) if devfile['parent']

But it wasn't really that bad, this roadblock was something for which I was prepared. In May 2023 I was "messing" with YAML files. This was inspired by a blog post from Jake Miller about parser differentials in JSON parsers, as JSON and YAML are somewhat similar in their use cases, but YAML is a bit more complex I thought it might be worthwhile looking at YAML from a parser differential perspective. Back then I was able to craft a YAML file which would parse differently in Ruby and Python. However it would parse the same in Ruby and in Go, so I "just" needed to find a similar parser differential in Go and in Ruby. Let's first look at the initial YAML file targeting Ruby/Go vs. Python:

joern@host2:~/projects/devfile$ cat 1.yaml 
test: python
!!binary dGVzdA==: ruby & go
joern@host2:~/projects/devfile$ python -c 'import yaml;x = yaml.safe_load(open("1.yaml"));print(x["test"])'
python
joern@host2:~/projects/devfile$ ruby -ryaml -e 'x = YAML.safe_load(File.read("1.yaml"));puts x["test"]'
ruby & go
joern@host2:~/projects/devfile$ cat g.go 
package main

import (
        "fmt"
        "log"
        "os"
        "gopkg.in/yaml.v3"
)

func main() {
    data, _ := os.ReadFile(os.Args[1])
        unmarshalled := &yaml.Node{}

        err := yaml.Unmarshal([]byte(data), unmarshalled)
        if err != nil {
                log.Fatalf("error: %v", err)
        }
        var expanded interface{}
        err = unmarshalled.Content[0].Decode(&expanded)
        if err != nil {
                log.Fatalf("error: %v", err)
        }

        d, err := yaml.Marshal(expanded)
        if err != nil {
                log.Fatalf("error: %v", err)
        }
        fmt.Printf("%s\n", string(d))
}

joern@host2:~/projects/devfile$ go run g.go 1.yaml
test: ruby & go

The very simple "secret sauce" here is using the !!binary notation to introduce a Base64 encoded key:

test: python
!!binary dGVzdA==: ruby & go

The Base64 (!!binary) string dGVzdA== becomes test when being decoded. In Ruby and in Go this will overwrite the previously defined test: python value. But in Python the following will happen:

python -c 'import yaml;y = yaml.safe_load(open("1.yaml"));print(y)'        
{'test': 'python', b'test': 'ruby & go'}

The !!binary notation will create a Binary Sequence (b'test') in Python which is different from the string test. So we'll have two keys, test and b'test' here, instead of one overwriting the other like it happens in Ruby and Go.

This behavior was the base I had ready, would GitLab have been a Python code base with the same Go parser backend this would've been ready to use right away to bypass the parent key filtering. Unfortunately this isn't the case so I had to find a different way to bypass the key filter.

I spent a bit of time reading about Tags in YAML and noticed the line Local tags start with “!”. Well OK then I thought, let's just try and see what happens when I use !binary instead of !!binary:

joern@host2:~/projects/devfile$ cat 2.yaml
test: non-binary 
!binary dGVzdA==: binary
joern@host2:~/projects/devfile$ ruby -ryaml -e 'x = YAML.safe_load(File.read("2.yaml"));puts x'
{"test"=>"binary"}
joern@host2:~/projects/devfile$ go run g.go 2.yaml
dGVzdA==: binary
test: non-binary

joern@host2:~/projects/devfile$

Yes, it really was that "easy". Having this !binary behavior difference in Ruby and Go we can construct a YAML file like:

whatever: is here
!binary parent: hehehe injected

Now when we look at it through the eyes of Ruby it will appear as:

joern@host2:~/projects/devfile$ ruby -ryaml -e 'x = YAML.safe_load(File.read("what.yaml"));puts x'
{"whatever"=>"is here", "\xA5\xAA\xDE\x9E"=>"hehehe injected"}

The !binary value has been decoded to a binary key "\xA5\xAA\xDE\x9E" and now, drumroll for the Go parser:

joern@host2:~/projects/devfile$ go run g.go what.yaml 
parent: hehehe injected
whatever: is here

The !binary tag has just been silently dropped and the resulting YAML key is called parent. These two behaviors combinded is exactly what we need to bypass the Ruby validation for the parent key in the Devfile YAML.

Writing files where they don't belong

Now that we're able to sneak the parent key past Ruby and into the Go code it is time to dig deeper into the devfile library and the odd file writing behavior to the working directory I noticed earlier.

First of all I wanted to know where that working directory was when the devfile binary from the Gem was called on a GitLab instance. I was hoping that it would be somewhat useful directory from an exploitation perspective.

To find this out I looked at the documentation howto set up GitLab Workspaces. There's quite a lot of requirements here to set up those Workspaces properly, a Kubernetes Cluster and a GitLab Agent for it, as well as certain configuration projects on the GitLab instance to set up and tie everything together. The TL;DR of my test setup was to run everything locally. GitLab was run using Docker and the Kubernetes side was set up with minikube.

With those two pieces in place we can connect the minkube cluster with the GitLab Agent to a project in a group on the GitLab instance. In another within the same group we can then create a .devfile.yaml with the following content:

schemaVersion: 2.2.0
!binary parent: 
    uri: https://raw.githubusercontent.com/devfile/registry/main/stacks/nodejs/devfile.yaml'

To trigger the Devfile parsing we now just need to create a workspace for that project:

new workspace

After this step I logged into the GitLab Docker container and searched for the file stack.yaml which was present when I parsed the same Devfile earlier when I initially observed the file writing behavior.

root@localhost:/# find . -name stack.yaml 2> /dev/null 
./var/opt/gitlab/gitlab-rails/working/stack.yaml
root@localhost:/# cd /var/opt/gitlab/gitlab-rails/working/
root@localhost:/var/opt/gitlab/gitlab-rails/working# ls -lart
total 46
drwxr-xr-x 9 git root  12 Apr 11 12:59 ..
-rw-r--r-- 1 git git  283 Apr 12 13:09 stack.yaml
-rw-r--r-- 1 git git   73 Apr 12 13:09 OWNERS
drwxr-xr-x 2 git git    2 Apr 12 13:09 2.2.0
drwxr-xr-x 2 git git    2 Apr 12 13:09 2.1.1
drwx------ 4 git root   6 Apr 12 13:09 .

This result was not very promising, the directory was empty besides the files written via the Devfile parser. While there might be some race conditions with other parts of GitLab where we could write into temporary directories I decided to not go down this route and instead dig deeper into the Devfile library.

The main logic which parses the parent key in the Devfile was quickly identified. It starts in parseParentAndPlugin(). The name of the function already indicates another similar feature comparable to parent, namely the plugin. As both features, parent and plugin had pretty much the same underlying logic in a switch statement for plugin and for parent:

switch {
case parent.Uri != "":
    parentDevfileObj, err = parseFromURI(parent.ImportReference, d.Ctx, resolveCtx, tool)
case parent.Id != "":
    parentDevfileObj, err = parseFromRegistry(parent.ImportReference, resolveCtx, tool)
case parent.Kubernetes != nil:
    parentDevfileObj, err = parseFromKubeCRD(parent.ImportReference, resolveCtx, tool)
default:
    return fmt.Errorf("devfile parent does not define any resources")
}

I digged deeper into the parseFrom* methods. At first I looked into parseFromURI, a URI to download a Devfile from I thought, should be easy enough. Surprisingly it was not that easy. The parseFromURI function had quite some logic involved about local and remote URLs. What caught my attention was:

if tool.downloadGitResources {
    destDir := path.Dir(curDevfileCtx.GetAbsPath())
    err = tool.devfileUtilsClient.DownloadGitRepoResources(newUri, destDir, token)
    if err != nil {
        return DevfileObj{}, err
    }

Litte did I know back when I examined this code about CVE-2023-49569:

CVE-2023-49569 A path traversal vulnerability was discovered in go-git versions prior to v5.11. This vulnerability allows an attacker to create and amend files across the filesystem. In the worse case scenario, remote code execution could be achieved.

This vulnerability was published on January 12th, and could have been very useful for my proposed attack as the Devfile library utilizes go-git for the underlying Git operations. However when I was looking at the Devfile library this vulnerability was not public yet. I might have found it if I had decided to dig deeper into the Git operations, but I didn't ;). Instead when I was realizing that a simple URL pointing to one of either gitlab.com, github.com, raw.githubusercontent.com or bitbucket.org the Devfile library would do its magic and try to git clone the according repository to get the referenced files.

The relevant implementation parts are

In pkg/devfile/parser/util/utils.go

// DownloadGitRepoResources downloads the git repository resources
func (c DevfileUtilsClient) DownloadGitRepoResources(url string, destDir string, token string) error {
    var returnedErr error
    if util.IsGitProviderRepo(url) {
        gitUrl, err := util.NewGitURL(url, token)
        if err != nil {
            return err
        }

        if !gitUrl.IsFile || gitUrl.Revision == "" || !ValidateDevfileExistence((gitUrl.Path)) {
            return fmt.Errorf("error getting devfile from url: failed to retrieve %s", url)
        }

        stackDir, err := os.MkdirTemp("", "git-resources")
        if err != nil {
            return fmt.Errorf("failed to create dir: %s, error: %v", stackDir, err)
        }

        defer func(path string) {
            err := os.RemoveAll(path)
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
            }
        }(stackDir)

        gitUrl.Token = token

        err = gitUrl.CloneGitRepo(stackDir)
        if err != nil {
            returnedErr = multierror.Append(returnedErr, err)
            return returnedErr
        }

        dir := path.Dir(path.Join(stackDir, gitUrl.Path))
        err = util.CopyAllDirFiles(dir, destDir)
        if err != nil {
            returnedErr = multierror.Append(returnedErr, err)
            return returnedErr
        }
    } else {
        return fmt.Errorf("failed to download resources from parent devfile.  Unsupported Git Provider for %s ", url)
    }

    return nil
}

and in /pkg/util/util.go:

// IsGitProviderRepo checks if the url matches a repo from a supported git provider
func IsGitProviderRepo(url string) bool {
    if strings.Contains(url, RawGitHubHost) || strings.Contains(url, GitHubHost) ||
        strings.Contains(url, GitLabHost) || strings.Contains(url, BitbucketHost) {
        return true
    }
    return false
}

So instead of digging into the go-git path here I performed some simple checks using symbolic links in repositories to see if that would bring me any further. That wasn't the case, so next I started looking into parseFromRegistry. A registry for Devfiles is based on the Open Container Initiative (OCI) Specification and pretty much behaves like for instance a Docker registry.

Diving into parseFromRegistry quickly escalated as I was confronted with just another dependency of the dependency. parseFromRegistry calls getResourcesFromRegistry which itself leaves the heavy-lifting to registryLibrary. This library, registry-support is also developed by the Devfile project and I decided to take a look at it. After following the code flow, I arrived at the PullStackFromRegistry function, which calls the decompress function, which takes a tar.gz archive from the registry library and extracts the files inside that archive. Let's have a look at that decompress function:

// decompress extracts the archive file
func decompress(targetDir string, tarFile string, excludeFiles []string) error {
    var returnedErr error

    reader, err := os.Open(filepath.Clean(tarFile))
...
    gzReader, err := gzip.NewReader(reader)
...
    tarReader := tar.NewReader(gzReader)
    for {
...

        target := path.Join(targetDir, filepath.Clean(header.Name))
        switch header.Typeflag {
...
        case tar.TypeReg:
            /* #nosec G304 -- target is produced using path.Join which cleans the dir path */
            w, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
            /* #nosec G110 -- starter projects are vetted before they are added to a registry.  Their contents can be seen before they are downloaded */
            _, err = io.Copy(w, tarReader)
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
            err = w.Close()
            if err != nil {
                returnedErr = multierror.Append(returnedErr, err)
                return returnedErr
            }
        default:
            log.Printf("Unsupported type: %v", header.Typeflag)
        }
    }

    return nil
}

This looked promising: the line

target := path.Join(targetDir, filepath.Clean(header.Name))

followed by:

/* #nosec G304 -- target is produced using path.Join which cleans the dir path */
w, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))

The gosec rule 304, File path provided as taint input, had alerted here and the developer had instructed the gosec scanner to ignore the finding. The comment even gives us the reasoning for this: target is produced using path.Join which cleans the dir path, which references how the filepath.Clean(header.Name) is used a little earlier in the code flow.

However filepath.Clean does not work as expected here. It's not really obvious from the documentation either, but when supplying a relative path to filepath.Clean the Clean()ed path will stay relative.

Consider the following example:

package main

import (
    "fmt"
    "path/filepath"
)

func main() {
    fmt.Println(filepath.Clean("/../../../../../../../tmp/test")) // absolute path
    fmt.Println(filepath.Clean("../../../../../../../tmp/test"))  // relative path
}

The output of this program is:

/tmp/test
../../../../../../../tmp/test

And this was the missing puzzle piece for a successful exploit. Tar files can contain /es and .s in their entry names. So we can traverse out of the intended directory and decompress and write files to arbitrary locations on disk when including a parent from a registry in the Devfile.

After this path traversal was identified a lot of time actually went into setting up a fake registry server and delivering a proper payload. In total this vulnerability, from starting to look into the Devfile Gem to having the a working exploit ready took about two working days where a lot of time was spent in setting up both the Workspaces feature and the fake registry.

Conclusions

There are several takeaways more or less hidden between the lines in this post I would like to highlight here.

Parser differentials

Parser differentials can be a very powerful tool when it comes to exploitation. They are very context dependent though and hard to generalize in their use for exploiting software.

For once the SAST scanner was right. The path traversal was not stopped by filepath.Clean, but the comment's author thought it was. They explicitly turned off the gosec warning. The whole point in software exploitation is to let some software do what it wasn't intended to do by the authors. This means when reading comments in source code they should be taken as an inspiration to think how can I falsify this comment?.

`../` keeps on giving

The character sequence ../ is really a gift which keeps on giving. Path traversals most of the time are simple and reliably to exploit. They've been around 30+ years, still this vulnerability class has not yet been solved.

Can't find all the bugs

The go-git vulnerability (CVE-2023-49569) was disclosed only a few days after I internally reported the file write issue based on the registry parser. This vulnerability used in combination with the parser differential would have been another way to write files where they don't belong. The message here is kind of two fold: while it might not be possible to find all the bugs there's often enough bugs to reach your goal in a sufficiently big code base. ;)

Keep digging everyone

Finally, I'd like to highlight that to find vulnerabilities it's always worth digging into source code, reading it and trying to understand the assumptions under which it has been developed. The real hard part is to "know" where to look and when to stop looking.