SVF: Interprocedural Static Value-Flow Analysis in LLVM

SVF is a static analysis framework implemented in LLVM that allows value-flow construction and pointer analysis to be performed in an iterative manner (sparse analysis – analysis conducted into stages, from overapproximate analysis to precise, expensive analysis). It uses (default) points-to information from Andersen’s analysis and constructs an interprocedural memory SSA (Static-Single Assignment) form where def-use chains of both top-level and address-taken variables are included. From the design perspective, the SVF can be seen as two modules: pointer analysis and memory SSA. To be capable of allowing another points-to analysis than Andersen’s, the design of pointer analysis is split into three parts: Graph, Rules, and Solver. To give control of scalability and precision, the memory SSA is designed to allow users to partition memory into a set of regions. Comparing to previous work, SVF takes a big leap by constructing an interprocedural value-flow graph. To identify bugs, it is an important requirement to be able to analyze a program across its procedural boundaries.

The latest version of SVF is implemented on LLVM-7.0. SVF takes LLVM IR code (a bitcode file) as input (LLVM IR module) and can analysis both C/C++ programs. To use SVF, the source needs to be compiled with Clang to generate the bitcode (clang -emit-llvm source.c -o source.bc) and later links all (.bc) file into a large single (.bc) file through LLVM Gold plugin (llvm-link source1.bc source2.bc -o source.bc).

As mentioned before, the pointer analysis consists of three loosely coupled components: Graph, Rules, and Solver. The graph is a higher level abstraction, shows where a pointer analysis should take place. A rule defines how to derive the points-to information from a statement in the graph. The ‘rules’ is the location where anyone can implement a different set of points-to analysis mechanism than Andersen’s. As the pointer-analysis will take place interprocedural, it is an important requirement to design a completely separate part to indicate which order should the points-to analysis work on the graph, i.e. the Solver. As an example, for Andersen’s points-to analysis, it uses transitive closure rules and a solver named wave. Similarly, a flow-sensitive pointer analysis uses a set of flow-sensitive strong/weal update rules with a points-to propagation solver on a sparse value-flow graph.

There are four steps in the value-flow graph construction. First, it performs a ‘Mod-Ref Analysis’ to capture interprocedural reference and modification side effect for each variable (a set of indirect defs). The ‘Mem Region Partitioning’ separates the memory partition into regions to let the user have the capability of deciding scalability and precision trade-offs. The ‘Memory SSA’ take place once the indirect uses and defs are known. It follows a standard SSA conversion algorithm to structure the program in memory SSA form. Finally, the ‘VFG Construction’ module links the uses of a variable to its defs and construct the value-flow graph.

The following statements are important to construct a complete value-flow graph for address-taken variables:

  1. ADDROF is known as an allocation site (a stack object [alloca instruction] or a global object [an allocation site or a program function name] or a dynamically created object at its allocation site). Note: For flow-sensitive pointer analysis, the initializations for global objects take place at the entry of main().
  2. Copy denotes either a casting instruction or a value assignment from a PHI instruction in LLVM.
  3. PHI is introduced at a confluence point in the CFG to select the value of a variable from different control-flow branches.
  4. LOAD is a memory accessing instruction that reads a value from an address-taken object.
  5. STORE is a memory accessing instruction that writes a value into an address-taken object.
  6. CALL denotes a call instruction, where the target can be either a global variable (for a direct call) or a stack virtual register (for an indirect call). In a cell site, a return value or in the entry of a procedure, the parameter could be defined as direct or indirect.

One featured client of SVF is source-sink analysis (detecting security bugs). An example bug detector is a memory leak detector where it is important to find out whether a memory allocation (source) will reach a free site (sink) or not. SVF’s value-flow graph could be used by a source-sink analysis client to achieve it. The other client could be a selective instruction client which will allow dynamic analysis tool to look only at suspicious sites. It could also be used for developing a debug tool to guide the developer.

Besides their 40K lines of code for SVF, the group also open-sourced a pointer analysis mirco benchmark, PTABen.