Restructuring the Binary Ninja Decompiler

Restructuring the Binary Ninja Decompiler
2024-6-20 01:33:37 Author: binary.ninja(查看原文) 阅读量:9 收藏

For the upcoming Binary Ninja 4.1, we will be releasing a new implementation of our decompiler’s control flow recovery. You can try it today by switching to the development channel and updating to the latest build. It isn’t fully optimized yet and may produce non-optimal results in some cases, but there should already be improvements in readability of the output, including a reduction in nesting depth and a significant reduction in the complexity of conditional expressions.

This new implementation aims to improve the readability of the decompiler output while simultaneously improving accuracy. It also aims to significantly improve maintainability, allowing us to iterate on our decompiler faster. We have additionally added a new suite of tests to allow us to make changes to the decompiler and have more confidence that the changes haven’t caused regressions in accuracy.

Control Flow Recovery in Binary Ninja 4.0 and Below

In Binary Ninja up to and including version 4.0, we use an algorithm we call condition graph resolution in order to take an arbitrary control flow graph and turn it into High Level IL (which can then be displayed as C code if you prefer). This algorithm is similar in nature to the approach taken by the No More Gotos paper¹ describing the Dream decompiler.

Given a control flow graph, Binary Ninja first tries to find regions of code that have a single entry point and a single exit destination. It traverses the graph in a way that causes the innermost structure of the graph to be resolved first, simplifying regions into blocks of structured IL. A region of a graph looks like the highlighted region below:

flowchart TB
X["if (X)"] -- true --> Y
X -- false --> Z
Y --> Z
Z --> A
subgraph cfg[" "]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
D & E --> F

Loops are resolved by treating the loop body as a new subgraph, and resolving the subgraph in the same way. If the loop has abnormal entries or exits, there are rules for transforming the graph into one that can be resolved.

Often there will be a region of code that can’t be trivially resolved into a simple set of if conditions. In this case, Binary Ninja will take the nodes that make up the region and compute the condition required to reach each node. The nodes are traversed in a way to preserve ordering of the program. The result is called the condition graph. An example is below:

flowchart LR
subgraph cfg [Control Flow Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph cond [Condition Graph]
direction LR
cX["if (!A)"] --> cC["C"]
cZ["if (!A || !B)"] --> cE["E"]
cY["if (A && B)"] --> cD["D"]
end
subgraph ifelse [If/Else Detection]
direction LR
iX["if (!A)"] --> iC["C"]
iY["if (!A || !B)"] -- true --> iE["E"]
iY -- false --> iD["D"]
end
subgraph result [Resolution]
R["<div style="text-align: left;"><pre>

if (!A)
    C

if (!A || !B)
    E
else
    D</pre></div>"]
end
cfg --> cond
cond --> ifelse
ifelse --> result

Once the condition graph is constructed, rules are applied to it to simplify the graph. This includes merging common conditions, detecting if/else constructs, and resolving switch statements. Once there are no longer optimizations that can be applied, Binary Ninja turns the condition graph into High Level IL, and replaces the entire region of the graph with a single node containing that High Level IL.

You can watch the control flow resolution process in detail for any function by entering the following into the Python console in Binary Ninja:

current_function.request_debug_report("high_level_il")

This will display a detailed report of how the decompiler resolved the function, including graphs for each step of the process.

Advantages of Condition Graph Resolution

Traditional approaches to control flow resolution struggle to handle control flow graphs that the compiler has optimized to share code. The condition graph resolver is designed to try and avoid falling back to goto statements to resolve optimized graphs, producing code that is more natural.

Even the example control flow graph above shows the advantages of using condition graphs to resolve control flow. Binary Ninja 4.0 and prior give results that are consistent with what a typical programmer might write when asked to write C code for the graph:

if (!A) {
    C
}

if (!A || !B) {
    E
} else {
    D
}

When a compiler comes across such a structure, it might see the condition !A is shared between two of the if statements, and optimize that into a single comparison. Binary Ninja uses the condition graph to recreate the original set of conditions, undoing the compiler’s optimization.

The following is what the other major decompilers, which use more traditional methods, will produce for such a construct:

if (!A) {
    C
    goto label;
}

if (B) {
    D
} else {
label:
    E
}

Traditional methods tend to struggle when compilers optimize duplicated code away and leave a single shared piece of code. Without undoing the compiler’s optimization, the decompiler can’t produce valid code without using a goto statement.

Condition graphs are not without their own problems, however.

Condition Complexity Explosion

When faced with code where the compiler deduplicated code, not a condition, or when the original author used a goto statement to manually combine two or more paths, the condition graph resolver can have trouble producing readable conditions. Take the following graph of a switch statement with an extra edge coming in from elsewhere into the block of code making up the default case:

flowchart TB
X["if (x > 0)"] -- true --> A["if (x == 4)"]
X -- false --> Y["switch (y)"]
Y --> c0["case 0: A"]
Y --> c1["case 1: B"]
Y --> c2["case 2: C"]
Y --> c3["case 3: D"]
Y --> c4["case 4: E"]
Y --> c5["case 5: F"]
Y --> c6["case 6: G"]
Y --> c7["default: H"]
c0 & c1 & c2 & c3 & c4 & c5 --> I
A -- true --> c7
c7 --> I

When generating the condition for H in the default case, the condition graph produces a large condition:

x == 4 || (x <= 0 && y != 0 && y != 1 && y != 2 && y != 3 && y != 4 && y != 5 && y != 6)

The more cases there are, the larger the condition. If the switch statement is enclosed in another if statement, the conditions compound and produce an extremely hard to read condition. Large, complex software with optimizations often creates complex graphs that yield conditions that are orders of magnitude worse than the one shown here.

Binary Ninja has heuristics built in to try and avoid producing these large conditions when possible. In this case, it will redirect the if (x == 4) node with a goto statement to reduce the graph to use trivial conditions.

Sometimes the graph is so complex that the condition graph resolver struggles to make sense of it all. When you see warnings like the following in the log, Binary Ninja has encountered this issue:

[Default] Function 0x31d5684 has HLIL condition that exceeds maximum reduction iterations, condition not optimized (step 7)
[Default] Function 0x321244 at 0x3212d4 exceeds 'analysis.hlil.maxIntermediateConditionComplexity' limit with: 1552969, omitting condition (step 45)

We have an internal condition complexity measurement that measures the size of a condition, and we call these issues “condition complexity explosion” as it tends to be exponential in nature. Given unlimited time and memory, it might be possible to optimize or transform them into something usable, but that isn’t a luxury we have.

Redirection with goto statements was an escape hatch to make sure that Binary Ninja doesn’t spend an inordinate amount of time and/or memory trying to decompile functions. Unfortunately, we recently discovered that the goto redirection has problems with soundness. Additionally, we discovered some bugs with the data flow accuracy when resolving the condition graph. We were also having issues with bug fixes uncovering new bugs that were masked by the original bug. The decompiler is a large, complex system, and fixing bugs or trying to add improvements was causing difficult regressions in places that were previously working well.

It turns out that condition graph resolution may not be sound when any goto statements are introduced into the code. This means that there can’t be any escape hatches, and you’d effectively need unlimited amounts of time and memory to resolve correct control flow when faced with large, complex modern software like browsers and operating system kernels.

We needed a new approach to control flow resolution. Our goal was to preserve the advantages of the condition graph approach while avoiding the problems with conditional complexity explosion and the issues with keeping the results sound in a performant way.

The Traditional Approach

First we need to look at the traditional approach to decompilation.

Traditional approaches take the control flow graph and iteratively apply a set of rules to collapse patterns in the graph into control flow structures. When it can’t find a rule to apply, a goto statement is usually inserted to simplify the structure of the graph.

Once the graph is resolved into structured control flow, another set of rules is applied to simplify the code into something more readable. The approach looks like the following:

flowchart LR
subgraph cfg [Initial Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph goto [Irreducible, Insert Goto]
gA["if (A)"] -- true --> gB["B"]
gA -- false --> gC["C; goto label"]
gB["if (B)"] -- true --> gD["D"]
gB -- false --> gE["label: E"]
end
subgraph if1 [If/Else Pattern]
iA["if (A)"] -- true --> iB["<div style="text-align: left;"><pre>
if (B) {
    D
} else {
label:
    E
}</pre></div>"]
iA -- false --> iC["C; goto label"]
end
subgraph if2 [If/Else Pattern]
jA["<div style="text-align: left;"><pre>
if (!A) {
    C
    goto label;
} else {
    if (B) {
        D
    } else {
    label:
        E
    }
}</pre></div>"]
end
subgraph simplify [Simplify]
kA["<div style="text-align: left;"><pre>
if (!A) {
    C
    goto label;
}

if (B) {
    D
} else {
label:
    E
}</pre></div>"]
end
cfg --> goto
goto --> if1
if1 --> if2
if2 --> simplify

The advantage of this approach is that it is flexible. The developer can add as many rules as they need to cover new or uncommon patterns found in the latest compilers.

The disadvantage is that you need to maintain a large set of these rules to get good decompilation results. Products using these techniques have been iterating on their control flow resolution rules for decades.

Restructuring the Problem

We would like to preserve the advantages of the condition graph approach, but gain the flexibility and maintainability of the traditional approach.

A key insight into making this possible is that condition graphs are really just another way of transforming the control flow graph. You can re-express the entire condition graph resolution algorithm as a set of transformations on the control flow graph itself. These representations are equivalent:

flowchart LR
subgraph cfg [Control Flow Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph equal [These are Equivalent]
direction LR
subgraph cond [Condition Graph]
direction LR
cX["if (!A)"] --> cC["C"]
cZ["if (!A || !B)"] --> cE["E"]
cY["if (A && B)"] --> cD["D"]
end
subgraph xform [Transformed Control Flow Graph]
direction TB
xX["if (!A)"] -- true --> xC["C"]
xC --> xZ
xX -- false --> xZ
xZ["if (!A || !B)"] -- true --> xE["E"]
xE --> xY
xZ -- false --> xY
xY["if (A && B)"] -- true --> xD["D"]
end
cond <--> xform
end
cfg --> equal

This means that the algorithms from the condition graph resolver can be done on the control flow graph itself, and they can be done selectively. The ability to selectively apply the technique allows us to avoid situations where it can’t be applied while maintaining correct output, and also lets us expand our options on how to resolve graphs.

Representing the problem as graph transformations means that we can make an algorithm that is a superset of both the traditional approach and the condition graph approach, and even allows us to add new approaches that aren’t used in either method.

Overall Algorithm

The overall structure of the new control flow recovery algorithm looks a lot like the existing condition graph approach. The graph is broken into regions with a single entry and a single exit destination, and the graphs that make up those regions are resolved in isolation. The innermost regions are resolved first, allowing the graphs to resolve to be as simple as possible. Loops are resolved in the same way as before, where the loop body is extracted into a subgraph to be resolved.

Once a region is identified for resolution, the algorithm changes significantly. Instead of constructing a condition graph, the new algorithm first creates a subgraph containing the nodes within the region.

The end goal for resolving a region is to make a graph where there are no nodes with more than one incoming edge. Graphs that only have at most one incoming edge for every node are trivially resolvable into control flow structures.

In the example control flow graph we’ve been using, the highlighted node E is one of these nodes with more than one incoming edge. This is the node that would have a goto label inserted when using the traditional approach:

flowchart TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
subgraph highlight[" "]
E["`**Node with more than one
incoming edge**
E`"]
end

When nodes have more than one incoming edge, we have a set of transformation primitives that can be applied to the graph to reduce the number of incoming edges. We apply transformations until the goal is reached, then structured High Level IL is emitted for the resulting graph. The region is then replaced with a single code containing the IL.

The `&&` and `||` Transformation

The simplest transformation takes nested conditionals and turns them into a single conditional. Here, node D has two incoming edges and must be resolved. The transformation combines the A and B conditionals into a single node, and now there are no longer nodes with more than one incoming edge:

flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> D
B["if (B)"] -- true --> C
B -- false --> D
end
subgraph output[Output]
direction TB
cX["if (A && B)"] -- true --> cC["C"]
cX -- false --> cD["D"]
end
input --> output

There are four combinations of this pattern, depending on whether the innermost node is reached by the true or false edges. The resulting condition will be either an && or an || depending on which edges are traversed.

Condition Duplication

Condition duplication is effectively a re-representation of the condition graph resolution method using graph transformation. This transformation duplicates one or more conditions to reduce the complexity of the graph:

flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB
cA -- false --> cC["C"]
cC --> cB["if (A && B)"]
cB -- true --> cD["D"]
cB -- false --> cE["E"]
end
input --> output

If you represent the condition graph resolution algorithm from Binary Ninja 4.0 in this new context, this is the only transformation that was available. The rest of the algorithm was trying to simplify the resulting graph.

Notice that there is still a node with more than one incoming edge after performing this transformation. However, now there are two subgraphs that have a single entry and a single exit destination. We can resolve the graph fully with the next transformation.

Sub-region Resolution

After applying transformations to simplify the graph, it is possible to end up with new regions of the graph that have a single entry point and a single exit destination. These regions are resolvable independently. If one of these sub-regions appears in the graph, we extract it into a new graph and resolve it using the same algorithms. The result is a node with structured IL, and this replaces the entire region with a single node. The process then continues with the simplified graph.

flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
C --> B["if (A && B)"]
B -- true --> D
B -- false --> E
end
subgraph regions[Regions Identified]
direction TB
subgraph a[" "]
direction TB
cA["if (A)"] -- false --> cC["C"]
end
subgraph b[" "]
direction TB
cB["if (A && B)"] -- true --> cD["D"]
cB -- false --> cE["E"]
end
a --> b
end
subgraph output[Output]
direction TB
oA["<div style="text-align: left;"><pre>
if (!A)
    C</pre></div>"]
oB["<div style="text-align: left;"><pre>
if (A && B)
    D
else
    E</pre></div>"]
oA --> oB
end
input --> regions
regions --> output

Code Duplication

The next transformation duplicates code instead of conditions to reduce graph complexity. While this idea was developed independently, during review (thanks Zion, and make sure to checkout his decompilation wiki² as well as related SAILR work³!) it was pointed out that it is quite similar to prior work⁴ from the rev.ng team. Going back to the graph of a switch case statement with an extra edge to the default case, we can duplicate the code in node H to make a resolvable graph:

flowchart TB
subgraph input[Input]
direction TB
X["if (x > 0)"] -- true --> A["if (x == 4)"]
X -- false --> Y["switch (y)"]
Y --> c0["case 0: A"]
Y --> c1["case 1: B"]
Y --> c2["case 2: C"]
Y --> c3["case 3: D"]
Y --> c4["case 4: E"]
Y --> c5["case 5: F"]
Y --> c6["case 6: G"]
subgraph h[" "]
c7["`default: H
**Problematic incoming
edges from optimizations**`"]
end
Y --> c7
c0 & c1 & c2 & c3 & c4 & c5
A -- true --> c7
end
subgraph output[Output]
direction TB
oX["if (x > 0)"] -- true --> oA["if (x == 4)"]
oX -- false --> oY["switch (y)"]
oY --> oc0["case 0: A"]
oY --> oc1["case 1: B"]
oY --> oc2["case 2: C"]
oY --> oc3["case 3: D"]
oY --> oc4["case 4: E"]
oY --> oc5["case 5: F"]
oY --> oc6["case 6: G"]
subgraph oh[" "]
oc7["default: H"]
oH["H"]
end
oY --> oc7
oA -- true --> oH
end
input --> output

This is a common optimization done by compilers when the source contains duplicated code. This transformation undoes that optimization and produces a clean, easy to read decompilation.

Jump Table Transformation

When lifting jump tables, Binary Ninja will emit IL that contains a switch statement with a flag indicating that the default case can never be reached. This is known by the analysis because it verifies the data flow of the jump table before emitting the IL for it.

Because of the way that compilers emit jump tables, these switch statements are almost always preceded by an if statement that ensures that the jump table will never be accessed out of bounds. This if statement still shows up in the control flow graph, but it often produces extra edges to what should be the default case, as shown below:

flowchart LR
subgraph input[Input]
direction TB
A["if (x >= 5)"] -- true --> E
A -- false --> F["`switch (x)
**Unreachable default case**`"]
F ---> B["case 1: B"]
F ---> C["case 3: C"]
F ---> D["case 4: D"]
F ---> E["case 0, 2: E"]
end
subgraph output[Output]
direction TB
cF["switch (x)"] ---> cB["case 1: B"]
cF ---> cC["case 3: C"]
cF ---> cD["case 4: D"]
cF ---> cE["default: E"]
end
input --> output

This transformation looks for switch statements with the “unreachable default” flag and checks the surrounding conditionals. If there is a node where the edge from the condition covers all possible cases except for the other edges in the switch statement, it rewrites the graph to turn that node into a default case and removes the conditional, as it is now implicit.

Merge Adjacent Code Nodes

After some transformations, it is possible to end up with two blocks of code adjacent in the graph.

flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
B ---> C
A -- false --> D
C ---> E
D ---> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB["B; C"]
cA -- false --> cD["D"]
cB ---> cE["E"]
cD ---> cE
end
input --> output

Nodes B and C in this graph can be combined into a single node containing the code from both. This doesn’t reduce the number of incoming edges, but it does allow the rest of the transformations to be recognized more easily.

Goto Insertion

If there are no transformations that can be reasonably applied to the graph, we fall back to redirecting nodes using goto statements. The goto statement removes an edge from the graph and allows it to be resolved.

flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB
cA -- false --> cC["C; goto label"]
cB["if (B)"] -- true --> cD["D"]
cB -- false --> cE["label: E"]
end
input --> output

This is the least desirable transformation, but may be chosen over others if the other transformations would have significant negative effects. For example, it would be chosen instead of repeated condition duplication transformations if that would cause the condition complexity explosion issue that the condition graph resolution method commonly ran into.

All possible graphs can be transformed using this method to reach the goal of a graph with at most one incoming edge on every node. This allows the algorithm to always complete.

Flexibility for the Future

With this new algorithm, any number of additional transformation primitives can be added to the decompiler. Any type of algorithm can be added, as long as we properly define the rules for when it can be safely applied. As we discover new techniques for resolving control flow, we can add them to the overall algorithm and get the benefits of everything combined.

This gives us the flexibility that the traditional methods provide, but without giving up the advantages of alternative strategies we are already employing or might find in the future.

Which Transformation Should Be Applied?

One significant challenge of this approach is deciding which transformation to apply. In many cases, more than one transformation is available for a given graph. In fact, the goto insertion transformation is always available regardless of whether there are additional transformations that can be applied.

During the early days of development of this new algorithm we will be exploring the heuristics necessary to determine which is the “best” transformation to apply in various cases. This will take time to perfect, and there isn’t necessarily a unique “right answer” to many control flow constructs. We welcome any feedback on control flow recovery that you feel is suboptimal.

Test Suite

When reimplementing a major component of the decompiler, it is important to have some way of knowing if the results are correct and improved. To do this, we implemented a new test generator framework in the Rust programming language using our Rust API.

We made several test generators to test the various control flow constructs found in typical programming languages:

Simple control flow: random, small control flow graphs without loops.
Complex control flow: random, large control flow graphs without loops.
Data flow: random control flow graphs with attempts to break data flow of surrounding conditionals.
Simple loops: random loops with normal entry and normal exits.
Complex loops: random loops with abnormal entries and exits coming to and from different parts of the control flow graph.
Simple switch: random switch statements, using both condition trees and jump tables, with normal entry and exit.
Complex switch: random switch statements, using both condition trees and jump tables, with random extra edges to or from other parts of the graph.

The test generators all directly output Aarch64 machine code to avoid the compiler optimizing the code and reducing the complexity and coverage of the tests. The generated tests are run directly on the CPU, then the known correct result is compared against a Low Level IL, Medium Level IL, and High Level IL simulator that interprets the IL and returns the result.

The tests are designed to trigger any problems with the soundness of control flow recovery and are not designed to be representative of normal code. These tests are what we use to verify that new control flow transformation code produces correct results.

The test framework also outputs various readability statistics to catch any changes to the number of goto statements inserted, the number of for loop constructs recovered, and the number of switch statement cases recovered.

Running this test framework on Binary Ninja 4.0 and comparing to the new algorithm, we see a drastic improvement in accuracy:

xychart-beta horizontal
title "Binary Ninja 4.0 Test Suite Accuracy"
x-axis ["Simple Control", "Complex Control", "Data Flow", "Simple Loops", "Complex Loops", "Simple Switch", "Complex Switch"]
y-axis "Accuracy %" 0 --> 100
bar [99.2, 75.7, 76.4, 98.7, 41.2, 91.9, 45.6]

xychart-beta horizontal
title "New Graph Transformation Resolver Test Suite Accuracy"
x-axis ["Simple Control", "Complex Control", "Data Flow", "Simple Loops", "Complex Loops", "Simple Switch", "Complex Switch"]
y-axis "Accuracy %" 0 --> 100
bar [100, 100, 100, 100, 99.8, 100, 100]

It is worth stating again that the test generator is in no way representative of normal code. It is specifically designed to find weaknesses in control flow recovery algorithms, and the 7000 test cases generated are very effective at uncovering bugs in the old algorithm. Normal, everyday code is not going to trigger these bugs so effectively.

Results

We wrote the test suite before starting work on the new control flow resolution algorithm. The first test was to implement the simplest possible version of the graph transformation strategy: use only the goto insertion transformation and nothing else. This yielded 100% accuracy on the tests, and we had a baseline to work from. Improvements to control flow recovery could now be verified as we made them.

We quickly improved the output, making changes that had been previously discarded in earlier iterations of the decompiler because of difficulties in making them work well with the rest of the system. The results were striking in some cases.

Example 1

Take the following function from a challenge from the Cyber Apocalypse CTF 2022. This is the output from Binary Ninja 4.0:

if (allocated == 0xf)
    puts(str: "Nothing more!")
else
    allocated += 1
    printf(format: "Choose an index: ")
    int64_t index
    __isoc99_scanf(format: "%lu", &index)

    if (weapons[index].detail_size != 0 || weapons[index].details != 0 || index u> 0xe)
        puts(str: "[-] Invalid!")
    else
        printf(format: "\nHow much space do you need for it: ")
        uint64_t n
        __isoc99_scanf(format: "%lu", &n)

        if (n u<= 0x1f || n u> 0x38)
            puts(str: "[-] Your inventory cannot provide this type of space!")
        else
            weapons[index].detail_size = n
            weapons[index].details = calloc(n, elem_size: 1)

        if (weapons[index].details == 0)
            puts(str: "[-] Something didn't work out...")
            exit(status: 0xffffffff)
            noreturn

        puts(str: "Input your weapon's details: ")
        read(fd: 0, buf: weapons[index].details, nbytes: n + 1)

Here is the same function with the new graph transformation algorithm:

if (allocated == 0xf)
    puts(str: "Nothing more!")
    return

allocated += 1
printf(format: "Choose an index: ")
int64_t index
__isoc99_scanf(format: "%lu", &index)

if (weapons[index].detail_size != 0 || weapons[index].details != 0 || index u> 0xe)
    puts(str: "[-] Invalid!")
    return

printf(format: "\nHow much space do you need for it: ")
uint64_t n
__isoc99_scanf(format: "%lu", &n)

if (n u<= 0x1f || n u> 0x38)
    puts(str: "[-] Your inventory cannot provide this type of space!")
    return

weapons[index].detail_size = n
weapons[index].details = calloc(n, elem_size: 1)

if (weapons[index].details == 0)
    puts(str: "[-] Something didn't work out...")
    exit(status: 0xffffffff)
    noreturn

puts(str: "Input your weapon's details: ")
read(fd: 0, buf: weapons[index].details, nbytes: n + 1)

This reads like what you’d expect out of the original source code.

Example 2

Let’s look at another example, this time from WinMain in a 32-bit Windows application. Here is the output from Binary Ninja 4.0:

void lpStartupInfo
GetStartupInfoA(&lpStartupInfo)
int32_t pNumArgs
PWSTR* eax_1 = CommandLineToArgvW(lpCmdLine: GetCommandLineW(), &pNumArgs)
data_40c2c8 = GetModuleHandleA(lpModuleName: nullptr)
int32_t result

if (sub_404691() == 0)
    MessageBoxA(hWnd: nullptr, lpText: "Unable to register window classe…", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
    result = 1
else
    char var_24
    enum SHOW_WINDOW_CMD eax_4
    int16_t var_20

    if ((var_24 & 1) == 0)
        eax_4 = SW_SHOWDEFAULT
    else
        eax_4 = zx.d(var_20)

    if (sub_40474f(eax_4, pNumArgs, eax_1) != 0)
        HACCEL hAccTable = LoadAcceleratorsA(hInstance: data_40c2c8, lpTableName: 0x66)

        while (true)
            void lpMsg

            if (GetMessageA(&lpMsg, hWnd: nullptr, wMsgFilterMin: 0, wMsgFilterMax: 0) == 0)
                break

            if (TranslateAcceleratorA(hWnd: data_40c2c4, hAccTable, &lpMsg) == 0)
                TranslateMessage(&lpMsg)
                DispatchMessageA(&lpMsg)

        uint32_t uExitCode
        ExitProcess(uExitCode)
        noreturn

    MessageBoxA(hWnd: nullptr, lpText: "Unable to create main window.", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
    result = 0

return result

Here is the same WinMain function with the new algorithm:

void lpStartupInfo
GetStartupInfoA(&lpStartupInfo)
int32_t pNumArgs
PWSTR* eax_1 = CommandLineToArgvW(lpCmdLine: GetCommandLineW(), &pNumArgs)
data_40c2c8 = GetModuleHandleA(lpModuleName: nullptr)

if (sub_404691() == 0)
    MessageBoxA(hWnd: nullptr, lpText: "Unable to register window classe…", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
    return 1

char var_24
enum SHOW_WINDOW_CMD eax_5
int16_t var_20

if ((var_24 & 1) == 0)
    eax_5 = SW_SHOWDEFAULT
else
    eax_5 = zx.d(var_20)

if (sub_40474f(eax_5, pNumArgs, eax_1) == 0)
    MessageBoxA(hWnd: nullptr, lpText: "Unable to create main window.", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
    return 0

HACCEL hAccTable = LoadAcceleratorsA(hInstance: data_40c2c8, lpTableName: 0x66)

while (true)
    void lpMsg

    if (GetMessageA(&lpMsg, hWnd: nullptr, wMsgFilterMin: 0, wMsgFilterMax: 0) == 0)
        break

    if (TranslateAcceleratorA(hWnd: data_40c2c4, hAccTable, &lpMsg) == 0)
        TranslateMessage(&lpMsg)
        DispatchMessageA(&lpMsg)

uint32_t uExitCode
ExitProcess(uExitCode)
noreturn

As you can see, the new algorithm tends to produce much more natural output, looking more like what you’d expect to see in the original source code.

The Current State

The various transformations described in this post are implemented, but may not be optimal yet. As such, we may still have some regressions in decompiler output quality, producing goto statements where there weren’t any before.

However, we now have several transformations that were not available in Binary Ninja 4.0 that can significantly improve output. There is also a drastic reduction in the number of functions that don’t successfully decompile, and there shouldn’t be many, if any, warnings about conditions left in the logs.

If you’re curious if the new algorithm is already better than the previous one for your workflow, you can try it today by switching to the development channel.

The new algorithm will be improved up to the Binary Ninja 4.1 release and beyond. It is now the new default control flow recovery algorithm in 4.1, taking care of several accuracy bugs in the process.

The Future

We now have a much better framework for improving decompiler output quality going forward. We can be more confident that our changes don’t cause regressions in accuracy, and have measurements to check for improvements in readability. However, it’s hard to know what control flow constructs are the most important to cleanly resolve next. If you find some goto statements that you don’t think should be there (or optimally, you have source code to show the original construct), or see control flow that is hard to understand, let us know on Slack or by submitting an issue on GitHub.

The following resources may be helpful for understanding additional approaches to decompilation, provided motivating examples, or directly inspired work described here. Additionally, one of the primary motivations for this improvement was a privately reported decompilation flaw from Zao Yang and Dr. Stefan Nagy of the FuTURES³ Lab. Keep an eye on their forthcoming research and we’re grateful for their notification!

文章来源: https://binary.ninja/2024/06/19/restructuring-the-decompiler.html
如有侵权请联系:admin#unsafe.sh