For the upcoming Binary Ninja 4.1, we will be releasing a new implementation of our decompiler’s control flow recovery. You can try it today by switching to the development channel and updating to the latest build. It isn’t fully optimized yet and may produce non-optimal results in some cases, but there should already be improvements in readability of the output, including a reduction in nesting depth and a significant reduction in the complexity of conditional expressions.
This new implementation aims to improve the readability of the decompiler output while simultaneously improving accuracy. It also aims to significantly improve maintainability, allowing us to iterate on our decompiler faster. We have additionally added a new suite of tests to allow us to make changes to the decompiler and have more confidence that the changes haven’t caused regressions in accuracy.
In Binary Ninja up to and including version 4.0, we use an algorithm we call condition graph resolution in order to take an arbitrary control flow graph and turn it into High Level IL (which can then be displayed as C code if you prefer). This algorithm is similar in nature to the approach taken by the No More Gotos paper1 describing the Dream decompiler.
Given a control flow graph, Binary Ninja first tries to find regions of code that have a single entry point and a single exit destination. It traverses the graph in a way that causes the innermost structure of the graph to be resolved first, simplifying regions into blocks of structured IL. A region of a graph looks like the highlighted region below:
flowchart TB
X["if (X)"] -- true --> Y
X -- false --> Z
Y --> Z
Z --> A
subgraph cfg[" "]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
D & E --> F
Loops are resolved by treating the loop body as a new subgraph, and resolving the subgraph in the same way. If the loop has abnormal entries or exits, there are rules for transforming the graph into one that can be resolved.
Often there will be a region of code that can’t be trivially resolved into a
simple set of if
conditions. In this case, Binary Ninja will take the nodes that
make up the region and compute the condition required to reach each node. The
nodes are traversed in a way to preserve ordering of the program. The result is
called the condition graph. An example is below:
flowchart LR
subgraph cfg [Control Flow Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph cond [Condition Graph]
direction LR
cX["if (!A)"] --> cC["C"]
cZ["if (!A || !B)"] --> cE["E"]
cY["if (A && B)"] --> cD["D"]
end
subgraph ifelse [If/Else Detection]
direction LR
iX["if (!A)"] --> iC["C"]
iY["if (!A || !B)"] -- true --> iE["E"]
iY -- false --> iD["D"]
end
subgraph result [Resolution]
R["<div style="text-align: left;"><pre>
if (!A)
C
if (!A || !B)
E
else
D</pre></div>"]
end
cfg --> cond
cond --> ifelse
ifelse --> result
Once the condition graph is constructed, rules are applied to it to simplify the graph. This includes merging common conditions, detecting if/else constructs, and resolving switch statements. Once there are no longer optimizations that can be applied, Binary Ninja turns the condition graph into High Level IL, and replaces the entire region of the graph with a single node containing that High Level IL.
You can watch the control flow resolution process in detail for any function by entering the following into the Python console in Binary Ninja:
current_function.request_debug_report("high_level_il")
This will display a detailed report of how the decompiler resolved the function, including graphs for each step of the process.
Traditional approaches to control flow resolution struggle to handle
control flow graphs that the compiler has optimized to share code.
The condition graph resolver is designed to try and avoid falling
back to goto
statements to resolve optimized graphs, producing code
that is more natural.
Even the example control flow graph above shows the advantages of using condition graphs to resolve control flow. Binary Ninja 4.0 and prior give results that are consistent with what a typical programmer might write when asked to write C code for the graph:
if (!A) {
C
}
if (!A || !B) {
E
} else {
D
}
When a compiler comes across such a structure, it might see the condition
!A
is shared between two of the if
statements, and optimize that into
a single comparison. Binary Ninja uses the condition graph to recreate the
original set of conditions, undoing the compiler’s optimization.
The following is what the other major decompilers, which use more traditional methods, will produce for such a construct:
if (!A) {
C
goto label;
}
if (B) {
D
} else {
label:
E
}
Traditional methods tend to struggle when compilers optimize duplicated
code away and leave a single shared piece of code. Without undoing the
compiler’s optimization, the decompiler can’t produce valid code without
using a goto
statement.
Condition graphs are not without their own problems, however.
When faced with code where the compiler deduplicated code, not a
condition, or when the original author used a goto
statement to
manually combine two or more paths, the condition graph resolver
can have trouble producing readable conditions. Take the following graph
of a switch statement with an extra edge coming in from elsewhere into
the block of code making up the default case:
flowchart TB
X["if (x > 0)"] -- true --> A["if (x == 4)"]
X -- false --> Y["switch (y)"]
Y --> c0["case 0: A"]
Y --> c1["case 1: B"]
Y --> c2["case 2: C"]
Y --> c3["case 3: D"]
Y --> c4["case 4: E"]
Y --> c5["case 5: F"]
Y --> c6["case 6: G"]
Y --> c7["default: H"]
c0 & c1 & c2 & c3 & c4 & c5 --> I
A -- true --> c7
c7 --> I
When generating the condition for H
in the default case, the condition
graph produces a large condition:
x == 4 || (x <= 0 && y != 0 && y != 1 && y != 2 && y != 3 && y != 4 && y != 5 && y != 6)
The more cases there are, the larger the condition. If the switch statement
is enclosed in another if
statement, the conditions compound and produce
an extremely hard to read condition. Large, complex software with optimizations
often creates complex graphs that yield conditions that are orders of magnitude
worse than the one shown here.
Binary Ninja has heuristics built in to try and avoid producing these large
conditions when possible. In this case, it will redirect the if (x == 4)
node with a goto
statement to reduce the graph to use trivial conditions.
Sometimes the graph is so complex that the condition graph resolver struggles to make sense of it all. When you see warnings like the following in the log, Binary Ninja has encountered this issue:
[Default] Function 0x31d5684 has HLIL condition that exceeds maximum reduction iterations, condition not optimized (step 7)
[Default] Function 0x321244 at 0x3212d4 exceeds 'analysis.hlil.maxIntermediateConditionComplexity' limit with: 1552969, omitting condition (step 45)
We have an internal condition complexity measurement that measures the size of a condition, and we call these issues “condition complexity explosion” as it tends to be exponential in nature. Given unlimited time and memory, it might be possible to optimize or transform them into something usable, but that isn’t a luxury we have.
Redirection with goto
statements was an escape hatch to make sure
that Binary Ninja doesn’t spend an inordinate amount of time and/or memory
trying to decompile functions. Unfortunately, we recently discovered
that the goto
redirection has problems with soundness. Additionally, we discovered
some bugs
with the data flow accuracy when resolving the condition graph.
We were also having issues with bug fixes uncovering new bugs that
were masked by the original bug. The decompiler is a large, complex system, and
fixing bugs or trying to add improvements was causing difficult regressions in
places that were previously working well.
It turns out that condition graph resolution may not be sound when any
goto
statements are introduced into the code. This means that there can’t
be any escape hatches, and you’d effectively need unlimited amounts of time
and memory to resolve correct control flow when faced with large, complex
modern software like browsers and operating system kernels.
We needed a new approach to control flow resolution. Our goal was to preserve the advantages of the condition graph approach while avoiding the problems with conditional complexity explosion and the issues with keeping the results sound in a performant way.
First we need to look at the traditional approach to decompilation.
Traditional approaches take the control flow graph and iteratively apply
a set of rules to collapse patterns in the graph into control flow
structures. When it can’t find a rule to apply, a goto
statement is usually
inserted to simplify the structure of the graph.
Once the graph is resolved into structured control flow, another set of rules is applied to simplify the code into something more readable. The approach looks like the following:
flowchart LR
subgraph cfg [Initial Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph goto [Irreducible, Insert Goto]
gA["if (A)"] -- true --> gB["B"]
gA -- false --> gC["C; goto label"]
gB["if (B)"] -- true --> gD["D"]
gB -- false --> gE["label: E"]
end
subgraph if1 [If/Else Pattern]
iA["if (A)"] -- true --> iB["<div style="text-align: left;"><pre>
if (B) {
D
} else {
label:
E
}</pre></div>"]
iA -- false --> iC["C; goto label"]
end
subgraph if2 [If/Else Pattern]
jA["<div style="text-align: left;"><pre>
if (!A) {
C
goto label;
} else {
if (B) {
D
} else {
label:
E
}
}</pre></div>"]
end
subgraph simplify [Simplify]
kA["<div style="text-align: left;"><pre>
if (!A) {
C
goto label;
}
if (B) {
D
} else {
label:
E
}</pre></div>"]
end
cfg --> goto
goto --> if1
if1 --> if2
if2 --> simplify
The advantage of this approach is that it is flexible. The developer can add as many rules as they need to cover new or uncommon patterns found in the latest compilers.
The disadvantage is that you need to maintain a large set of these rules to get good decompilation results. Products using these techniques have been iterating on their control flow resolution rules for decades.
We would like to preserve the advantages of the condition graph approach, but gain the flexibility and maintainability of the traditional approach.
A key insight into making this possible is that condition graphs are really just another way of transforming the control flow graph. You can re-express the entire condition graph resolution algorithm as a set of transformations on the control flow graph itself. These representations are equivalent:
flowchart LR
subgraph cfg [Control Flow Graph]
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph equal [These are Equivalent]
direction LR
subgraph cond [Condition Graph]
direction LR
cX["if (!A)"] --> cC["C"]
cZ["if (!A || !B)"] --> cE["E"]
cY["if (A && B)"] --> cD["D"]
end
subgraph xform [Transformed Control Flow Graph]
direction TB
xX["if (!A)"] -- true --> xC["C"]
xC --> xZ
xX -- false --> xZ
xZ["if (!A || !B)"] -- true --> xE["E"]
xE --> xY
xZ -- false --> xY
xY["if (A && B)"] -- true --> xD["D"]
end
cond <--> xform
end
cfg --> equal
This means that the algorithms from the condition graph resolver can be done on the control flow graph itself, and they can be done selectively. The ability to selectively apply the technique allows us to avoid situations where it can’t be applied while maintaining correct output, and also lets us expand our options on how to resolve graphs.
Representing the problem as graph transformations means that we can make an algorithm that is a superset of both the traditional approach and the condition graph approach, and even allows us to add new approaches that aren’t used in either method.
The overall structure of the new control flow recovery algorithm looks a lot like the existing condition graph approach. The graph is broken into regions with a single entry and a single exit destination, and the graphs that make up those regions are resolved in isolation. The innermost regions are resolved first, allowing the graphs to resolve to be as simple as possible. Loops are resolved in the same way as before, where the loop body is extracted into a subgraph to be resolved.
Once a region is identified for resolution, the algorithm changes significantly. Instead of constructing a condition graph, the new algorithm first creates a subgraph containing the nodes within the region.
The end goal for resolving a region is to make a graph where there are no nodes with more than one incoming edge. Graphs that only have at most one incoming edge for every node are trivially resolvable into control flow structures.
In the example control flow graph we’ve been using, the highlighted node E
is one of
these nodes with more than one incoming edge. This is the node that would have a
goto
label inserted when using the traditional approach:
flowchart TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
subgraph highlight[" "]
E["`**Node with more than one
incoming edge**
E`"]
end
When nodes have more than one incoming edge, we have a set of transformation primitives that can be applied to the graph to reduce the number of incoming edges. We apply transformations until the goal is reached, then structured High Level IL is emitted for the resulting graph. The region is then replaced with a single code containing the IL.
&&
and ||
TransformationThe simplest transformation takes nested conditionals and turns them into
a single conditional. Here, node D
has two incoming edges and must be
resolved. The transformation combines the A
and B
conditionals into
a single node, and now there are no longer nodes with more than one
incoming edge:
flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> D
B["if (B)"] -- true --> C
B -- false --> D
end
subgraph output[Output]
direction TB
cX["if (A && B)"] -- true --> cC["C"]
cX -- false --> cD["D"]
end
input --> output
There are four combinations of this pattern, depending on whether the innermost
node is reached by the true
or false
edges. The resulting condition will be
either an &&
or an ||
depending on which edges are traversed.
Condition duplication is effectively a re-representation of the condition graph resolution method using graph transformation. This transformation duplicates one or more conditions to reduce the complexity of the graph:
flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB
cA -- false --> cC["C"]
cC --> cB["if (A && B)"]
cB -- true --> cD["D"]
cB -- false --> cE["E"]
end
input --> output
If you represent the condition graph resolution algorithm from Binary Ninja 4.0 in this new context, this is the only transformation that was available. The rest of the algorithm was trying to simplify the resulting graph.
Notice that there is still a node with more than one incoming edge after performing this transformation. However, now there are two subgraphs that have a single entry and a single exit destination. We can resolve the graph fully with the next transformation.
After applying transformations to simplify the graph, it is possible to end up with new regions of the graph that have a single entry point and a single exit destination. These regions are resolvable independently. If one of these sub-regions appears in the graph, we extract it into a new graph and resolve it using the same algorithms. The result is a node with structured IL, and this replaces the entire region with a single node. The process then continues with the simplified graph.
flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
C --> B["if (A && B)"]
B -- true --> D
B -- false --> E
end
subgraph regions[Regions Identified]
direction TB
subgraph a[" "]
direction TB
cA["if (A)"] -- false --> cC["C"]
end
subgraph b[" "]
direction TB
cB["if (A && B)"] -- true --> cD["D"]
cB -- false --> cE["E"]
end
a --> b
end
subgraph output[Output]
direction TB
oA["<div style="text-align: left;"><pre>
if (!A)
C</pre></div>"]
oB["<div style="text-align: left;"><pre>
if (A && B)
D
else
E</pre></div>"]
oA --> oB
end
input --> regions
regions --> output
The next transformation duplicates code instead of conditions to reduce graph complexity.
While this idea was developed independently, during review (thanks Zion, and make sure to checkout his decompilation wiki2 as well as related SAILR work3!) it was pointed out that it is quite similar to prior work4 from the rev.ng team.
Going back to the graph of a switch case statement with
an extra edge to the default case, we can duplicate the code in node H
to make a resolvable graph:
flowchart TB
subgraph input[Input]
direction TB
X["if (x > 0)"] -- true --> A["if (x == 4)"]
X -- false --> Y["switch (y)"]
Y --> c0["case 0: A"]
Y --> c1["case 1: B"]
Y --> c2["case 2: C"]
Y --> c3["case 3: D"]
Y --> c4["case 4: E"]
Y --> c5["case 5: F"]
Y --> c6["case 6: G"]
subgraph h[" "]
c7["`default: H
**Problematic incoming
edges from optimizations**`"]
end
Y --> c7
c0 & c1 & c2 & c3 & c4 & c5
A -- true --> c7
end
subgraph output[Output]
direction TB
oX["if (x > 0)"] -- true --> oA["if (x == 4)"]
oX -- false --> oY["switch (y)"]
oY --> oc0["case 0: A"]
oY --> oc1["case 1: B"]
oY --> oc2["case 2: C"]
oY --> oc3["case 3: D"]
oY --> oc4["case 4: E"]
oY --> oc5["case 5: F"]
oY --> oc6["case 6: G"]
subgraph oh[" "]
oc7["default: H"]
oH["H"]
end
oY --> oc7
oA -- true --> oH
end
input --> output
This is a common optimization done by compilers when the source contains duplicated code. This transformation undoes that optimization and produces a clean, easy to read decompilation.
When lifting jump tables, Binary Ninja will emit IL that contains a
switch
statement with a flag indicating that the default case can never
be reached. This is known by the analysis because it verifies the data
flow of the jump table before emitting the IL for it.
Because of the way that compilers emit jump tables, these switch
statements
are almost always preceded by an if
statement that ensures that the jump
table will never be accessed out of bounds. This if
statement still shows
up in the control flow graph, but it often produces extra edges to what
should be the default case, as shown below:
flowchart LR
subgraph input[Input]
direction TB
A["if (x >= 5)"] -- true --> E
A -- false --> F["`switch (x)
**Unreachable default case**`"]
F ---> B["case 1: B"]
F ---> C["case 3: C"]
F ---> D["case 4: D"]
F ---> E["case 0, 2: E"]
end
subgraph output[Output]
direction TB
cF["switch (x)"] ---> cB["case 1: B"]
cF ---> cC["case 3: C"]
cF ---> cD["case 4: D"]
cF ---> cE["default: E"]
end
input --> output
This transformation looks for switch
statements with the “unreachable default”
flag and checks the surrounding conditionals. If there is a node where the edge
from the condition covers all possible cases except for the other edges in the
switch
statement, it rewrites the graph to turn that node into a default
case
and removes the conditional, as it is now implicit.
After some transformations, it is possible to end up with two blocks of code adjacent in the graph.
flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
B ---> C
A -- false --> D
C ---> E
D ---> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB["B; C"]
cA -- false --> cD["D"]
cB ---> cE["E"]
cD ---> cE
end
input --> output
Nodes B
and C
in this graph can be combined into a single node containing
the code from both. This doesn’t reduce the number of incoming edges, but it does
allow the rest of the transformations to be recognized more easily.
If there are no transformations that can be reasonably applied to the graph, we
fall back to redirecting nodes using goto
statements. The goto
statement removes
an edge from the graph and allows it to be resolved.
flowchart LR
subgraph input[Input]
direction TB
A["if (A)"] -- true --> B
A -- false --> C
B["if (B)"] -- true --> D
B -- false --> E
C --> E
end
subgraph output[Output]
direction TB
cA["if (A)"] -- true --> cB
cA -- false --> cC["C; goto label"]
cB["if (B)"] -- true --> cD["D"]
cB -- false --> cE["label: E"]
end
input --> output
This is the least desirable transformation, but may be chosen over others if the other transformations would have significant negative effects. For example, it would be chosen instead of repeated condition duplication transformations if that would cause the condition complexity explosion issue that the condition graph resolution method commonly ran into.
All possible graphs can be transformed using this method to reach the goal of a graph with at most one incoming edge on every node. This allows the algorithm to always complete.
With this new algorithm, any number of additional transformation primitives can be added to the decompiler. Any type of algorithm can be added, as long as we properly define the rules for when it can be safely applied. As we discover new techniques for resolving control flow, we can add them to the overall algorithm and get the benefits of everything combined.
This gives us the flexibility that the traditional methods provide, but without giving up the advantages of alternative strategies we are already employing or might find in the future.
One significant challenge of this approach is deciding which transformation to apply. In many cases, more than one transformation is available for a given graph. In fact, the goto insertion transformation is always available regardless of whether there are additional transformations that can be applied.
During the early days of development of this new algorithm we will be exploring the heuristics necessary to determine which is the “best” transformation to apply in various cases. This will take time to perfect, and there isn’t necessarily a unique “right answer” to many control flow constructs. We welcome any feedback on control flow recovery that you feel is suboptimal.
When reimplementing a major component of the decompiler, it is important to have some way of knowing if the results are correct and improved. To do this, we implemented a new test generator framework in the Rust programming language using our Rust API.
We made several test generators to test the various control flow constructs found in typical programming languages:
The test generators all directly output Aarch64 machine code to avoid the compiler optimizing the code and reducing the complexity and coverage of the tests. The generated tests are run directly on the CPU, then the known correct result is compared against a Low Level IL, Medium Level IL, and High Level IL simulator that interprets the IL and returns the result.
The tests are designed to trigger any problems with the soundness of control flow recovery and are not designed to be representative of normal code. These tests are what we use to verify that new control flow transformation code produces correct results.
The test framework also outputs various readability statistics to catch any
changes to the number of goto
statements inserted, the number of for
loop
constructs recovered, and the number of switch
statement cases recovered.
Running this test framework on Binary Ninja 4.0 and comparing to the new algorithm, we see a drastic improvement in accuracy:
xychart-beta horizontal
title "Binary Ninja 4.0 Test Suite Accuracy"
x-axis ["Simple Control", "Complex Control", "Data Flow", "Simple Loops", "Complex Loops", "Simple Switch", "Complex Switch"]
y-axis "Accuracy %" 0 --> 100
bar [99.2, 75.7, 76.4, 98.7, 41.2, 91.9, 45.6]
xychart-beta horizontal
title "New Graph Transformation Resolver Test Suite Accuracy"
x-axis ["Simple Control", "Complex Control", "Data Flow", "Simple Loops", "Complex Loops", "Simple Switch", "Complex Switch"]
y-axis "Accuracy %" 0 --> 100
bar [100, 100, 100, 100, 99.8, 100, 100]
It is worth stating again that the test generator is in no way representative of normal code. It is specifically designed to find weaknesses in control flow recovery algorithms, and the 7000 test cases generated are very effective at uncovering bugs in the old algorithm. Normal, everyday code is not going to trigger these bugs so effectively.
We wrote the test suite before starting work on the new control flow resolution algorithm. The first test was to implement the simplest possible version of the graph transformation strategy: use only the goto insertion transformation and nothing else. This yielded 100% accuracy on the tests, and we had a baseline to work from. Improvements to control flow recovery could now be verified as we made them.
We quickly improved the output, making changes that had been previously discarded in earlier iterations of the decompiler because of difficulties in making them work well with the rest of the system. The results were striking in some cases.
Take the following function from a challenge from the Cyber Apocalypse CTF 2022. This is the output from Binary Ninja 4.0:
if (allocated == 0xf)
puts(str: "Nothing more!")
else
allocated += 1
printf(format: "Choose an index: ")
int64_t index
__isoc99_scanf(format: "%lu", &index)
if (weapons[index].detail_size != 0 || weapons[index].details != 0 || index u> 0xe)
puts(str: "[-] Invalid!")
else
printf(format: "\nHow much space do you need for it: ")
uint64_t n
__isoc99_scanf(format: "%lu", &n)
if (n u<= 0x1f || n u> 0x38)
puts(str: "[-] Your inventory cannot provide this type of space!")
else
weapons[index].detail_size = n
weapons[index].details = calloc(n, elem_size: 1)
if (weapons[index].details == 0)
puts(str: "[-] Something didn't work out...")
exit(status: 0xffffffff)
noreturn
puts(str: "Input your weapon's details: ")
read(fd: 0, buf: weapons[index].details, nbytes: n + 1)
Here is the same function with the new graph transformation algorithm:
if (allocated == 0xf)
puts(str: "Nothing more!")
return
allocated += 1
printf(format: "Choose an index: ")
int64_t index
__isoc99_scanf(format: "%lu", &index)
if (weapons[index].detail_size != 0 || weapons[index].details != 0 || index u> 0xe)
puts(str: "[-] Invalid!")
return
printf(format: "\nHow much space do you need for it: ")
uint64_t n
__isoc99_scanf(format: "%lu", &n)
if (n u<= 0x1f || n u> 0x38)
puts(str: "[-] Your inventory cannot provide this type of space!")
return
weapons[index].detail_size = n
weapons[index].details = calloc(n, elem_size: 1)
if (weapons[index].details == 0)
puts(str: "[-] Something didn't work out...")
exit(status: 0xffffffff)
noreturn
puts(str: "Input your weapon's details: ")
read(fd: 0, buf: weapons[index].details, nbytes: n + 1)
This reads like what you’d expect out of the original source code.
Let’s look at another example, this time from WinMain
in a 32-bit Windows
application. Here is the output from Binary Ninja 4.0:
void lpStartupInfo
GetStartupInfoA(&lpStartupInfo)
int32_t pNumArgs
PWSTR* eax_1 = CommandLineToArgvW(lpCmdLine: GetCommandLineW(), &pNumArgs)
data_40c2c8 = GetModuleHandleA(lpModuleName: nullptr)
int32_t result
if (sub_404691() == 0)
MessageBoxA(hWnd: nullptr, lpText: "Unable to register window classe…", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
result = 1
else
char var_24
enum SHOW_WINDOW_CMD eax_4
int16_t var_20
if ((var_24 & 1) == 0)
eax_4 = SW_SHOWDEFAULT
else
eax_4 = zx.d(var_20)
if (sub_40474f(eax_4, pNumArgs, eax_1) != 0)
HACCEL hAccTable = LoadAcceleratorsA(hInstance: data_40c2c8, lpTableName: 0x66)
while (true)
void lpMsg
if (GetMessageA(&lpMsg, hWnd: nullptr, wMsgFilterMin: 0, wMsgFilterMax: 0) == 0)
break
if (TranslateAcceleratorA(hWnd: data_40c2c4, hAccTable, &lpMsg) == 0)
TranslateMessage(&lpMsg)
DispatchMessageA(&lpMsg)
uint32_t uExitCode
ExitProcess(uExitCode)
noreturn
MessageBoxA(hWnd: nullptr, lpText: "Unable to create main window.", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
result = 0
return result
Here is the same WinMain
function with the new algorithm:
void lpStartupInfo
GetStartupInfoA(&lpStartupInfo)
int32_t pNumArgs
PWSTR* eax_1 = CommandLineToArgvW(lpCmdLine: GetCommandLineW(), &pNumArgs)
data_40c2c8 = GetModuleHandleA(lpModuleName: nullptr)
if (sub_404691() == 0)
MessageBoxA(hWnd: nullptr, lpText: "Unable to register window classe…", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
return 1
char var_24
enum SHOW_WINDOW_CMD eax_5
int16_t var_20
if ((var_24 & 1) == 0)
eax_5 = SW_SHOWDEFAULT
else
eax_5 = zx.d(var_20)
if (sub_40474f(eax_5, pNumArgs, eax_1) == 0)
MessageBoxA(hWnd: nullptr, lpText: "Unable to create main window.", lpCaption: "Initialization Failed", uType: MB_ICONHAND)
return 0
HACCEL hAccTable = LoadAcceleratorsA(hInstance: data_40c2c8, lpTableName: 0x66)
while (true)
void lpMsg
if (GetMessageA(&lpMsg, hWnd: nullptr, wMsgFilterMin: 0, wMsgFilterMax: 0) == 0)
break
if (TranslateAcceleratorA(hWnd: data_40c2c4, hAccTable, &lpMsg) == 0)
TranslateMessage(&lpMsg)
DispatchMessageA(&lpMsg)
uint32_t uExitCode
ExitProcess(uExitCode)
noreturn
As you can see, the new algorithm tends to produce much more natural output, looking more like what you’d expect to see in the original source code.
The various transformations described in this post are implemented, but may not be
optimal yet. As such, we may still have some regressions in decompiler output quality,
producing goto
statements where there weren’t any before.
However, we now have several transformations that were not available in Binary Ninja 4.0 that can significantly improve output. There is also a drastic reduction in the number of functions that don’t successfully decompile, and there shouldn’t be many, if any, warnings about conditions left in the logs.
If you’re curious if the new algorithm is already better than the previous one for your workflow, you can try it today by switching to the development channel.
The new algorithm will be improved up to the Binary Ninja 4.1 release and beyond. It is now the new default control flow recovery algorithm in 4.1, taking care of several accuracy bugs in the process.
We now have a much better framework for improving decompiler output quality going
forward. We can be more confident that our changes don’t cause regressions in accuracy,
and have measurements to check for improvements in readability. However, it’s hard to
know what control flow constructs are the most important to cleanly resolve next. If
you find some goto
statements that you don’t think should be there (or optimally, you
have source code to show the original construct), or see control flow that is hard to
understand, let us know on Slack or by submitting an issue on GitHub.
The following resources may be helpful for understanding additional approaches to decompilation, provided motivating examples, or directly inspired work described here. Additionally, one of the primary motivations for this improvement was a privately reported decompilation flaw from Zao Yang and Dr. Stefan Nagy of the FuTURES³ Lab. Keep an eye on their forthcoming research and we’re grateful for their notification!