Fuzzing Farm #1: Fuzzing GEGL with fuzzuf

Author: arata-nvm

Our Fuzzing Farm team mainly works on open source software to find bugs in applications using a variety of techniques. Starting with this blog post, we will introduce our works in the Fuzzing Farm team in four parts: Using Fuzzer, Measuring Fuzzing Performance, Developing 1-day Exploits, and Developing 0-day Exploits.

In Part 1 of this series, we will present an example of how we can use fuzzing to find vulnerabilities in real-world programs.

As part of the Fuzzing Farm team’s works, we utilize fuzzuf, a fuzzing framework we are developing, to find bugs in various products. This article describes the process of finding and fixing bugs in an image processing library called GEGL.

GEGL is a library for image processing developed in the GNOME project. GIMP, which is widely used for image editing, as well as some other programs in GNOME use GEGL for image processing.

The most important feature of GEGL is that each step of image processing is represented by a data structure called DAG. DAG stands for Directed Acyclic Graph, which is a directed graph without closed paths. GEGL receives the DAG as XML and processes the image as described in the XML. For example, if we pass the following XML to GELG, GELG will load the image in.png and output an image with Gaussian blurring applied.

<?xml version='1.0' encoding='UTF-8'?>
<gegl>
  <node operation='gegl:gaussian-blur'>
    <params>
      <param name='std-dev-x'>0.999</param>
      <param name='std-dev-y'>0.999</param>
    </params>
  </node>
  <node operation='gegl:load'>
    <params>
      <param name='path'>in.png</param>
    </params>
  </node>
</gegl>

Let’s discuss more about the detail of the XML format mentioned earlier.

In GEGL, each image processing step is called an “operation.” This is a concept that other image processing tools call “filters” and so on. GEGL has the following operations for instance. (Identifiers in GEGL are in parentheses)

Crop（gegl:crop）
Gamma correction（gegl:gamma）
Invert（gegl:invert）

Each operation can also take some parameters. For example, cropping can take parameters such as x,y,width,height to specify the range of the image to be cropped. GEGL treats these operations and parameters together as a single DAG node; a list of operations supported by GEGL. Their details can be found on the official GEGL website. For example, the following node represents cropping an image from coordinates (10, 10) to a width of 100px and a height of 100px.

<node operation='gegl:crop'>
  <params>
     <param name='x'>10</param>
     <param name='y'>10</param>
     <param name='width'>100</param>
     <param name='height'>100</param>
  </params>
</node>

A combination of multiple nodes can be represented by arranging the nodes in sequence at the same depth. For example, the following DAG represents a sequence of image loading and scale reduction.

<?xml version='1.0' encoding='UTF-8'?>
<gegl>
  <node operation='gegl:scale-ratio'>
    <params>
      <param name='sampler'>cubic</param>
      <param name='x'>0.5</param>
      <param name='y'>0.5</param>
    </params>
  </node>
  <node operation='gegl:load'>
    <params>
      <param name="path">data/grid.png</param>
    </params>
  </node>
</gegl>

You can also apply the effect of a node only to a portion of the node by nesting the nodes as shown below. In this example, checkerboard generation and overlay are applied.

<?xml version='1.0' encoding='UTF-8'?>
<gegl>
  <node operation='svg:src-over'>
 （redacted）
    <node operation='gegl:checkerboard'>
        <params>
            <param name='color1'>rgb(0.0, 0.0, 0.0)</param>
            <param name='color2'>rgb(1.0, 1.0, 1.0)</param>
            <param name='x'>32</param>
            <param name='y'>32</param>
            <param name='format'>YA float</param>
        </params>
    </node>
  </node>
 （redacted）
  <node operation='gegl:checkerboard'>
      <params>
          <param name='color1'>rgb(0.0, 0.0, 0.0)</param>
          <param name='color2'>rgb(1.0, 1.0, 1.0)</param>
          <param name='x'>32</param>
          <param name='y'>32</param>
      </params>
  </node>
</gegl>

As explained, GEGL has a plenty of ways to represent operations since the user can decide in detail how to set up and combine each operation.

Fuzzing GEGL

We have performed fuzzing on GEGL using fuzzuf, a fuzzing framework we are developing. For more information on fuzzuf, please refer to this blog post.

Instrumenting GEGL

In this fuzzing, we will use the AFL mode of fuzzuf. To use this mode, we will first instrument and build GELG. Since the build instruction of GELG is documented in detail, we will refer to the official GEGL page. We used the version with commit ID a566b738331757cf25118af5bdc65218ae5eb3b2 this time.

First, install the dependencies of GEGL.

$ sudo apt update
$ sudo apt install build-essential pkg-config python3 python3-pip \\
  ninja-build git libglib2.0-dev libjson-glib-dev libpng-dev libgegl-dev
$ sudo pip3 install meson

Next, build GEGL using afl-gcc and afl-g++ included in AFL.

$ git clone --depth 1 <https://gitlab.gnome.org/GNOME/gegl> && cd gegl
$ CC=afl-gcc CXX=afl-g++ meson _build
$ ASAN_OPTIONS=detect_leaks=0 AFL_USE_ASAN=1 ninja -C _build
$ export BABL_PATH=/usr/lib/x86_64-linux-gnu/babl-0.1
$ export GEGL_PATH=/usr/lib/x86_64-linux-gnu/gegl-0.4

You can find the instrumented binaries in _build/bin/gegl after the build is complete.

Collecting Corpus

We need to collect corpus to start fuzzing, which can be done by using search engines such as Google, but in this case we used the set of XML files prepared for testing GELG. We decided this was the best method since the names of operations and parameters in the XML must be correct for GEGL to properly recognize the XML.

Writing Harness

We can now start fuzzing the GEGL binary that we have just built. However, it may cause performance problems if we simply fuzz the GEGL binary because it includes initialization of functions and parsing of command line arguments, which are all unnecessary for this fuzzing.

Therefore, we decided to extract only the main processing from the functions corresponding to the entry point of GEGL and use the following code as a harness.

gint main(gint argc, gchar **argv) {
  GeglNode    *gegl      = NULL;
  gchar       *script    = NULL;
  GError      *err       = NULL;
  gchar       *path_root = NULL;

  // [1]
  gegl_init(NULL, NULL);
  gegl_path_smooth_init();
  path_root = g_get_current_dir ();

  // [2]
  g_file_get_contents (argv[1], &script, NULL, &err);
  if (err != NULL) {
    return 1;
  }

  // [3]
  gegl = gegl_node_new_from_xml (script, path_root);
  if (!gegl) {
    return 1;
  }

  // [4]
  GeglNode *output = gegl_node_new_child (gegl,
                                          "operation", "gegl:save",
                                          "path", "out.png",
                                          NULL);                              
  gegl_node_connect_from (output, "input", gegl, "output");
  gegl_node_process (output);

  // [5]
  g_object_unref (output);
  g_object_unref (gegl);

  g_free (script);
  g_clear_error (&err);
  g_free (path_root);
  gegl_exit ();

  return 0;
}

Let’s briefly check the code. First, [1] initializes the GEGL internals: programs for each GEGL operation is not included in the binary but is included in a shared library in a specific directory (under the GEGL_PATH environment variable). This initialization process loads those operations and makes them available for use in image processing. Next, in [2] and [3], it reads the contents of the XML file as strings, which are then parsed and stored as GeglNode-type data. Then, in [4], we specify out.png as the output destination for images, and then the actual image processing is performed. Finally, at [5], the memory allocated so far is freed and the program terminates.

Fuzzing

Now we can finally start fuzzing. Since the default option would cause fuzzuf to exit on timeout, we set the timeout to 10 seconds by passing --exec_timelimit_ms 10000.

$ ASAN_OPTIONS=detect_leaks=0:abort_on_error=1:symbolize=0 fuzzuf afl -i ./corpus -o ./out --exec_timelimit_ms 10000 -- _build/bin/gegl @@

Triaging Crashes

After fuzzing GEGL for about two weeks, we found a total of 134 unique crashes, and after triaging the crashes found by AFLTriage(), we found the following vulnerabilities in GEGL.

2 Heap Buffer Overflow
2 Integer Overflow
1 DoS caused by resource consumption
1 Stack Buffer Overflow
1 Stack Buffer Underflow
10 NULL pointer dereferences

Root-Cause Analysis

In this article, we will discuss a NULL pointer dereference vulnerability as a simple example of the crashes we discovered. Let’s check the root-cause of this vulnerability. The following code shows gegl_path_parse_string function which has the vulnerability.

void
gegl_path_parse_string (GeglPath    *vector,
                        const gchar *path)
{
  GeglPathPrivate *priv = GEGL_PATH_GET_PRIVATE (vector);
  const gchar *p = path;
  InstructionInfo *previnfo = NULL; // [3]
  gdouble x0, y0, x1, y1, x2, y2;

  while (*p)
    {
      gchar            type = *p;
      InstructionInfo *info = lookup_instruction_info(type);

      if (!info && ((type>= '0' && type <= '9') || type == '-')) // [1]
        {
          if (previnfo->type == 'M') // [2]
            {
              info = lookup_instruction_info(type = 'L');
            }
          else if (previnfo->type == 'm')
            {
              info = lookup_instruction_info(type = 'l');
            }
          else if (previnfo->type == ' ')
            g_warning ("EEEK");
        }

      // redacted

      if (*p)
        p++;
    }

  gegl_path_dirty (vector);
}

This function is called if the XML file received by GEGL contains a path string, and parses the string. A path string is used to describe a shape composed of multiple straight lines like an SVG path.

Let’s now pass the following XML file to GEGL.

<gegl:fill-path d='0'/>

Then the gegl_path_parse_string function is passed the string ”0” as the path argument. In the first loop of the while statement, the character ’0’ is assigned to type, so the conditional expression in the if statement in [1] evaluates to true. Next, previnfo is dereferenced in [2]. In [3], however, previnfo is initialized to NULL and has not been changed since. The resulting NULL pointer dereference causes the program to crash.

Writing Patch

Since the NULL pointer dereference is caused by previnfo being initialized with NULL, we can simply modify the program to perform a NULL check before dereferencing. As you can see from the code below, the fix is very simple.

// redacted
if (previnfo && previnfo->type == 'M')
// redacted
else if (previnfo && previnfo->type == 'm')
// redacted
else if (!previnfo || previnfo->type == ' ')
// redacted

Conclusion

In this article, we described the fuzzing of GEGL and its vulnerabilities. GEGL is a widely used library that has been developed for more than 20 years, but it is not a major target for fuzzing. We were able to find previously undiscovered vulnerabilities through fuzzing.

We will continute to fuzz variety of software and work with developers to reduce the number of threats lurking in applications.

In the next article, we will cover theoretical topics related to benchmark of fuzzers.