Introducing Tantō: Slicing Functions for Faster Finds
2022-06-20 21:37:00 Author: binary.ninja

Not everything in a function is equally important. Sometimes, especially with large functions, you want a way to hide all that extra conditional cruft so you can focus on just the execution path that matters to you. Enter Tantō: a brand-new official plugin for Binary Ninja that splits functions into smaller chunks (or “slices”) to help you understand functions faster.

Example of slicing a function with Tantō

What’s a slice?

Slices, for our purposes, are similar to function traces, but broader in context: they need not represent a single linear flow of instructions, a real instruction path, or even contain all the instructions in a similar execution trace.

Tantō provides two types of function slices: block slices and variable slices.

A block slice is the combined graph of all traces through a function that reach a given basic block. A variable slice is similar, but takes some extra state into account. We’ll cover block slices first as they provide a simpler illustration of what Tantō is doing.

Block slices

Block slice of sh_unset_nodelay_mode from Bash 5.0.0(1)

I’ll be using Bash version 5.0.0(1) for all the screenshots in this blog post; it’s a decent-sized open-source binary that provides good coverage of our analysis pipeline. In the screenshot above, you can see the original function from the binary on the left (sh_unset_nodelay_mode), and a possible block slice from it on the right.

In the original control flow graph (CFG, on left), there is a selected “target” block highlighted in green. Every basic block in the function that could flow through that target block is highlighted in blue. Basic blocks that can neither reach the target block nor be reached from it are highlighted in red.

On the right is the same function, represented as a reduced-CFG (or slice), that has had the basic blocks highlighted in red on the left pruned. This causes Binary Ninja’s flowgraph algorithm to change the layout of the function, so the target block is highlighted in green (as it is on the left) as a visual aid.
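To make the pruning concrete, here is a minimal Python sketch of a block slice as plain graph reachability. It is not Tantō’s implementation and does not use the Binary Ninja API; the CFG, the block names, and the helper functions are all hypothetical, and the slice is assumed to be "every block that can reach the target, plus every block reachable from it".

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from `start` (including itself) via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Flip every edge so predecessors can be walked like successors."""
    flipped = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            flipped.setdefault(dst, []).append(src)
    return flipped

def block_slice(cfg, target):
    """Blocks that can reach `target` plus blocks reachable from it."""
    return reachable(reverse(cfg), target) | reachable(cfg, target)

# Hypothetical CFG: successor lists keyed by block name.
cfg = {
    "entry": ["check", "error"],
    "check": ["target", "skip"],
    "target": ["exit"],
    "skip": ["exit"],
    "error": [],
    "exit": [],
}

kept = block_slice(cfg, "target")  # the blue and green blocks
pruned = set(cfg) - kept           # the red blocks
print(sorted(kept))    # ['check', 'entry', 'exit', 'target']
print(sorted(pruned))  # ['error', 'skip']
```

In this toy graph, "skip" and "error" play the role of the red blocks: no path through them ever touches the target.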

This is just about the most basic function slice imaginable, but I hope it illustrates the basic premise. There are, of course, more creative function slices that can accelerate day-to-day reverse engineering tasks. One possible use I’d like to highlight is reducing a large switch statement (top) to the single case you want to analyze (bottom).

Block slice of print_redirection from Bash 5.0.0(1)

The switch statement in question above (on the top) is from the function print_redirection and contains 19 possible cases (with a total of 88 basic blocks). On the bottom you can see we’ve selected case 0xC (in green as before), bringing the resultant CFG basic-block-count down to just 12.

In some cases, you can reduce your scope even more dramatically. Below is just the original CFG of the function parameter_brace_expand, which has over eight hundred basic blocks; the red/blue/green highlighting shows what the slice would be (reduced CFG not shown):

Block slice of parameter_brace_expand from Bash 5.0.0(1)

Creating a block slice

To start slicing a function, you first need to open a Tantō pane. Go to View -> Show Slices - Tantō or search “Tanto” from the command palette.

With a Tantō pane open, select a line in the block you want your slice to be based on, then go to Tools -> Tanto -> Block Slices and select Add Block to Slice (or search for it in the command palette). You can also assign a keyboard shortcut through the Keybindings menu (under Preferences).

Advanced block slices

What gets included in a function slice is based on reachability, which can sometimes cause undesired loop contents to be included in a slice. To cut through that, Tantō provides the ability to blacklist blocks from a slice.

Block slice of stream_on_stack from Bash 5.0.0(1)

The function above is stream_on_stack. It appears to be searching through a linked list to see if it contains the value in arg1. It returns false early if the linked list hasn’t been initialized, and will return true from the loop as soon as the value is found. It will also return false if it reaches the end of the list without finding the value. In this minimal example, we only care about the first two of these three returns as we do not want to see when the search fails (makes for great screenshots for your boss; if the function can’t return false, you’ll never have an error, right?). We’d also like to hide the loop.

Tantō can easily handle adding multiple target blocks. However, we’ll run into a problem when selecting the return 1 block: there exists possible program flow through the loop to get to that block. Thankfully, Tantō provides an easy workaround: a basic-block blacklist! By blacklisting the do while block (as indicated by the white highlight), we finally get the block slice as shown above. If you were to blacklist the rax = *rax block instead, you would still see the do while block in the slice because the target block (return 1) would still be reachable from the do while block.

Slices do not need to flow through the entry block of a function. Slices represent all possible paths to reach a given set of target blocks that don’t contain basic blocks from the specified blacklist.
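The effect of a blacklist can be sketched as the same reachability walk, except that it refuses to step into blacklisted blocks. The Python below is an illustration under assumptions, not Tantō’s code: the CFG is a hypothetical graph loosely modeled on the description of stream_on_stack (the block names and edges are invented), and `sliced` and `reach` are made-up helpers.

```python
def reach(edges, starts, banned):
    """Iterative DFS from `starts` that never enters a banned block."""
    seen = set()
    stack = [s for s in starts if s not in banned]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in edges.get(node, ()) if n not in banned)
    return seen

def sliced(cfg, targets, blacklist=()):
    """Union of backward and forward reachability, avoiding the blacklist."""
    banned = set(blacklist)
    preds = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    return reach(preds, targets, banned) | reach(cfg, targets, banned)

# Hypothetical CFG: an early exit, then a search loop.
cfg = {
    "entry": ["return_0_early", "if_match"],
    "if_match": ["return_1", "advance"],   # found the value?
    "advance": ["do_while"],               # stands in for rax = *rax
    "do_while": ["if_match", "return_0_end"],
    "return_0_early": [],
    "return_1": [],
    "return_0_end": [],
}
targets = ["return_0_early", "return_1"]

print(sorted(sliced(cfg, targets)))                # loop blocks are included
print(sorted(sliced(cfg, targets, ["do_while"])))  # loop is gone
print(sorted(sliced(cfg, targets, ["advance"])))   # do_while is still present
```

Blacklisting "advance" leaves "do_while" in the result because "do_while" can still reach "return_1" through "if_match", which mirrors the behavior described above.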

The blacklist can be especially helpful for functions that have large switch statements inside of a loop or for when you want to center-slice out a portion of a function for analysis.

About the missing basic blocks…

So far, I haven’t shown you any screenshots using the default settings for Tantō. With the functions Tantō is actually useful for (ones that are large and may have their graphs drastically reorganized), it can become easy to lose your place between the original CFG and the slice. Enabling Show Address (as it has been in all the prior images) helps by providing concrete locations, but by default Tantō has additional features to help you map between the original CFG and the slice.

Missing Children Highlight option in Tantō

Any basic block that in the original CFG could branch to a basic block that’s been eliminated (on the left, in red) by the slice is highlighted in yellow on both sides. This can be toggled in the function slice by going to Tools -> Tanto and selecting “Toggle Missing Children Highlight”. In those menus you’ll also see a few other options to toggle different highlights on and off to match your preferences.
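The rule behind the yellow highlight is simple enough to sketch in a few lines of Python. This is an assumption about the logic based on the description above, not Tantō’s source; the graph, the `kept` set, and the function name are hypothetical.

```python
def missing_children(cfg, kept):
    """Kept blocks that have at least one successor outside the slice."""
    return {
        block
        for block in kept
        if any(succ not in kept for succ in cfg.get(block, ()))
    }

# Hypothetical CFG and slice: block "b" has been eliminated.
cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"], "exit": []}
kept = {"entry", "a", "exit"}

print(missing_children(cfg, kept))  # {'entry'}
```

Only "entry" is flagged, since it is the one kept block with a branch into the eliminated part of the graph.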

There’s one additional option for Tantō called “Subgraph Replacement”. In some cases, instead of completely eliminating unreachable basic blocks, it can be more helpful to “collapse” them instead.

Subgraph Replacement option in Tantō

In this view, eliminated portions of the CFG (eliminated subgraphs) are collapsed into a single red basic block. All of the control flow out of the eliminated subgraph is then applied to that block. This could make for some very strange looking CFGs, but also might help you understand a slice’s context within the larger function.
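One way to picture the collapse is as a graph rewrite: group eliminated blocks that are connected to each other, swap each group for one placeholder, and re-point the edges. The Python sketch below works under those assumptions and is not taken from Tantō; the `collapsed_N` names are invented, and the sketch assumes every block appears as a key in the CFG dictionary.

```python
def collapse_eliminated(cfg, kept):
    """Replace each connected group of eliminated blocks with one placeholder."""
    removed = set(cfg) - set(kept)

    # Undirected adjacency among eliminated blocks only.
    near = {block: set() for block in removed}
    for src, dsts in cfg.items():
        for dst in dsts:
            if src in removed and dst in removed:
                near[src].add(dst)
                near[dst].add(src)

    # Label every eliminated block with the placeholder of its group.
    label = {}
    for block in sorted(removed):
        if block in label:
            continue
        name = f"collapsed_{len(set(label.values()))}"
        stack = [block]
        while stack:
            cur = stack.pop()
            if cur not in label:
                label[cur] = name
                stack.extend(near[cur])

    # Rebuild edges, dropping those internal to an eliminated group.
    out = {}
    for src, dsts in cfg.items():
        edges = out.setdefault(label.get(src, src), [])
        for dst in dsts:
            if src in removed and dst in removed:
                continue
            new_dst = label.get(dst, dst)
            if new_dst not in edges:
                edges.append(new_dst)
    return out

# Hypothetical CFG where blocks "x" and "y" were eliminated by the slice.
cfg = {"entry": ["a", "x"], "a": ["exit"], "x": ["y"], "y": ["exit"], "exit": []}
collapsed = collapse_eliminated(cfg, kept={"entry", "a", "exit"})
print(collapsed)
```

Here "x" and "y" fold into a single "collapsed_0" block that inherits the edge to "exit", which is the kind of odd-looking but context-preserving graph described above.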

Other perspectives

If CFGs aren’t your thing, then you might be interested in the “Toggle Line Highlights” option. When this is enabled, Tantō will apply highlights by IL address. This lets you set targets in any of our IL forms or views and see the resultant slice in any other IL form as a graph. This allows you to slice a function in Binary Ninja’s High Level Intermediate Language (HLIL) and see exactly which assembly instructions are executing along that path*.

* Gaps in the highlighting may represent instructions eliminated in the IL you selected a target block in or areas where IL mappings are incomplete.

How it helps

All of this is intended to help you understand a function more quickly. By identifying key points of interest in a function, Tantō can simplify what you spend your time analyzing. With the various highlighting options, you can choose what best helps you understand how that slice of the function fits into the larger program.

However, basic block-based function slices are somewhat primitive; programs aren’t simply the sequence of executed instructions, they’re the manipulation of data. For that, we need variable slices.

Variable slices

The thought that inspired Tantō was: leveraging variables, can we make a visualization or interaction that enables users to gain new insight into the behavior of a function? I believe the answer is “yes” and variable slices are the solution to that problem.

Variable slices combine the concept of reachability from basic block slices with variable liveness. In short: a given variable is considered “live” if it has both been defined and will be used. A variable that is not live is free to be garbage collected, overwritten, or manipulated in any way. Dead stores are instances where values are stored and never used, or stored and then overwritten (and thus also never used).
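Liveness and dead stores are easiest to see on straight-line code. The following Python sketch is a textbook-style backward pass over a made-up statement list, offered as an illustration of the definitions above rather than anything Tantō or Binary Ninja actually runs; the `(defined, used)` tuple format is an assumption chosen for brevity.

```python
def dead_stores(statements, live_at_exit=()):
    """Walk backwards, tracking which variables will still be read."""
    live = set(live_at_exit)
    dead = []
    for index in range(len(statements) - 1, -1, -1):
        defined, used = statements[index]
        if defined is not None:
            if defined not in live:
                dead.append(index)  # value is never read after this store
            live.discard(defined)   # earlier values of it are shadowed
        live.update(used)           # operands must be live before this point
    return sorted(dead)

# Hypothetical program: (variable defined, variables used) per statement.
program = [
    ("a", []),           # 0: a = 1
    ("b", ["a"]),        # 1: b = a + 2
    ("a", []),           # 2: a = 5  (overwritten by 3 before any use)
    ("a", []),           # 3: a = 7
    (None, ["a", "b"]),  # 4: return a + b
]

print(dead_stores(program))  # [2]
```

Statement 2 is the dead store: its value is replaced by statement 3 before anything reads it.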

Variable slice of absolute_pathname from Bash 5.0.0(1)

In the image above, in the function absolute_pathname, rax is the only variable selected for the slice. All the instructions that use rax are highlighted in green, and the control flow required to reach those instructions (instructions that don’t themselves use rax) is highlighted in yellow. Note that this starts at the first assignment of rax rather than the entry of the function. On the right, you can see this reduced control flow graph; instructions that don’t use rax are faded in gray.
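A rough block-level approximation of this idea can be written as "blocks that touch the variable, plus whatever lies on a path between two such blocks". The Python sketch below rests on that assumption; it is not how Tantō computes variable slices (which operate on HLIL instructions), and the CFG, the `block_vars` mapping, and the function names are hypothetical.

```python
def reach(edges, starts):
    """Every node reachable from any node in `starts`, including the starts."""
    seen = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen

def variable_slice(cfg, block_vars, variable):
    """Return (blocks touching the variable, blocks connecting them)."""
    preds = {}
    for src, dsts in cfg.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    touching = {b for b, names in block_vars.items() if variable in names}
    # A connecting block is reachable from one touching block and can
    # reach another, so it sits on a path between two uses.
    between = reach(cfg, touching) & reach(preds, touching)
    return touching, between - touching

# Hypothetical CFG; "assign" defines rax and "use" reads it.
cfg = {
    "entry": ["setup"],
    "setup": ["assign"],
    "assign": ["branch"],
    "branch": ["use", "other"],
    "use": ["exit"],
    "other": ["exit"],
    "exit": [],
}
block_vars = {"assign": {"rax"}, "branch": {"rdi"}, "use": {"rax", "rdi"}}

green, yellow = variable_slice(cfg, block_vars, "rax")
print(sorted(green))   # ['assign', 'use']
print(sorted(yellow))  # ['branch']
```

Note that "entry" and "setup" are left out: as in the screenshot, the slice begins at the first assignment rather than at the function entry.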

(At the time of writing, variable slices only work on HLIL (SSA and non-SSA). Tantō could be extended to support variable slicing on MLIL, too, but that didn’t make it into the initial release.)

An intuitive way to understand variable slices might be to think of them as a dataflow graph (DFG). But, compared to “real” DFGs, Tantō’s variable slices encode control flow dependencies instead of dataflow dependencies. At a high level, variable slices represent dataflow/data-lifetimes inside the CFG, or “a function from the variable’s perspective.” There are some interesting consequences of this, but I’m not sure I’ve completely figured those out, and that goes beyond the scope of this blog post.

Much like the basic block function slices, any number of target variables can be combined in a slice, allowing you to reconstruct dataflow through multiple variables, or view the life cycle and interactions among many variables.

Potential uses (and some speculation)

Originally, I made variable slices to solve the problem of “this function is too large and I have no idea where this variable was last used”. One possible solution would be to add keybindings to navigate forward/backward between variable uses and definitions, though variable slices might be able to provide a similar amount of context without having to reorient yourself in a CFG after each jump.

The most straightforward use for this type of function slice would be for dataflow analysis. Combining multiple variables together in a slice would allow you to see how one variable interacts with/flows through another and better understand how data is propagating through a function. As you analyze a function, you could add variables to your slice and build a view of the function that only contains the data/data-interactions that you care about.

What I’ve found most interesting, and have been speculating about, is using variable slices like these (though not directly through the UI) in the function fingerprinting space. I’ve long suspected that a dataflow-based approach to fingerprinting might prove to be robust against compiler optimizations and versions, as well as source code changes that don’t completely redefine the implementation of a function. Treating each variable slice as a record of what happens to data within a function, a similarity score for two slices could be generated from the count of matching operations, matching constant interactions (2 + var_a), and matching variable interactions (var_f + var_a). Considering all slices, a confidence metric could be derived for whether two functions match. Significant research would be required to answer these questions concretely… and, if you could solve subgraph isomorphism at the same time, that’d be great!
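As a purely speculative illustration of that scoring idea, the Python sketch below counts the three kinds of features mentioned (operations, constant interactions, variable interactions) and compares two slices with a multiset Jaccard ratio. Everything here is an assumption: the `(operation, left, right)` tuple encoding, the choice of Jaccard, and the function names are invented for the example and do not describe any existing fingerprinting tool.

```python
from collections import Counter

def features(slice_ops):
    """Count operations, constant interactions, and variable interactions."""
    counts = Counter()
    for op, left, right in slice_ops:
        counts[("op", op)] += 1
        constants = [x for x in (left, right) if isinstance(x, int)]
        for value in constants:
            counts[("const", op, value)] += 1  # e.g. 2 + var_a
        if not constants:
            counts[("var", op)] += 1           # e.g. var_f + var_a
    return counts

def similarity(slice_a, slice_b):
    """Multiset Jaccard: shared feature count over combined feature count."""
    fa, fb = features(slice_a), features(slice_b)
    union = sum((fa | fb).values())
    return sum((fa & fb).values()) / union if union else 1.0

# Hypothetical slices from two builds; variable names deliberately differ.
slice_a = [("add", 2, "var_a"), ("add", "var_f", "var_a"), ("mul", "var_a", 4)]
slice_b = [("add", 2, "x"), ("add", "y", "x"), ("sub", "x", 1)]

print(similarity(slice_a, slice_b))  # 0.5
```

Variable names are ignored on purpose, since they would not survive recompilation; only the shape of each interaction contributes to the score.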

Conclusion

We believe that visualizations are one of the most important tools to help humans make sense of what’s going on inside code. Everything from default themes that are easy on your eyes, to stack view, variables view, linear view, the feature map, bytes view, and so on, are provided to make the information that our users need as accessible as possible. We are working hard to improve the results of our analysis and how users interact with Binary Ninja itself. New visualizations like these are important for the future of Binary Ninja, as well as reverse engineering tools at large.

If you’d like to give Tantō a spin, you can download it through the Plugin Manager (Preferences -> Manage Plugins) or install it yourself from GitHub. Like almost all of our plugins, it is open source. If you encounter any issues, please file an issue report or let us know on the Binary Ninja Slack. And, if you’d like to help us make Tantō better, we’d love to review any pull requests you send our way.


Source: https://binary.ninja/2022/06/20/introducing-tanto.html