GRIFFIN: Guarding Control Flows Using Intel Processor Trace
Summary: The author only attempt to prove the performance overhead optimization using Intel PT for online verification. They claim to verify the enforcement policy for both backward and forward indirect control transfer with different strictness of policy when they completely discard the discussion regarding how they achieve these policies to verify with.
Design: To efficiently extract the information from Intel PT trace log, they propose a fast lookup design. The basic block to derefered pointer lookup is basically a mechanism similar to a cache. They hardly describe anything related their Coarse-grained and Fine-grained policy, but as they only trace TIP packet in Intel PT trace, they at-most can use indirect control transfer information based policy. They simply omit enough discussion on stateful policy for forward control transfer by mentioning they use a similar approach to PathArmor.
Advantage: They briefed about advanced topics like how to handle context switch, process fork etc. when they omit much needed basic discussion.
Disadvantage: The writing is a complete hide and seek game. It misses much of the basic discussion. It completely discards anything about how they get the policies. However, they put enforcement in a parallel process and enforce strict restriction in sensitive syscall entry. This overall concise the whole idea of control flow integrity. Hence, it is not meaningful how the performance overhead they achieve.
Discussion: An offline debugging hardware should not be considered for online verification, it will be just a waste of time.
Transparent and Efficient CFI Enforcement with Intel Processor Trace
Source Code: N/A
Summary: The project is aiming to protect indirect control transfer through coarse-grained indirect control flow graph using Intel PT only at security sensitive system call point. The system uses a fast and slow check hybrid method to achieve efficiency. The fast check doesn’t require to decode the trace and only available if the dynamically fuzzing based training CFG have the exact trace. Otherwise, a slow path has to decode the trace and verify it against a static analysis based CFG (TypeArmor).
Design: The design consists of four tasks. First of all, there is a static analysis based CFG. Then, they will rank the CFG edges based dynamic fuzzing training CFG. There will be a system call interceptor in the kernel to warn security sensitive system call. The flow checker will collect trace and based on its trace, will match with the fast path or slow path. The CFG edges are only from indirect control transfer as the runtime will only use Intel PT TIP packets.
Advantage: The performance overhead is acceptable, but the security guarantee is very low. The paper tries to describe every step.
Disadvantage: The design is very concise. First of all, they only monitor security sensitive indirect control flow transfer. The fast path only available for edges with fuzzy CFG (depends on code coverage). The slow path uses static CFG which is overapproximate. The CFG itself only consists of indirect control transfer edges as Intel PT only use with TIP packets.
Discussion: This is the 3rd paper who also tries to use an offline debug purpose hardware, Intel PT, to use for online verification. Their limiting monitor on security sensitive system call once again prove that this hardware is not good choice for this purpose.
PT-CFI: Transparent Backward-Edge Control Flow Violation Detection Using Intel Processor Trace
Source Code: N/A
Summary: Intel PT is an Intel hardware support for offline debugging. It can capture compressed data packets for indirect control flow transfer, conditional branch taken/not taken etc. PT-CFI attempts to use the hardware feature for the backward edges through enforcing a shadow style protection. It leaves forward indirect control transfer out of its scope due to the unavailability of complete point-to analysis to generate the required CFI policy. It also limits its verification to critical syscall to undo the performance overhead.
Design: PT-CFI has four components: 1) packet parser will generate the TIP sequence and fed to 2) TIP graph matching (build upon training). If not match, invokes 3) Deep inspection to decode and construct shadow stack. If a match with shadow stack, add the new TIP sequence to TIP graph; otherwise 4) the syscall hooker will be informed to terminate.
Advantage: The paper describe the basic knowledge on Intel PT well.
Disadvantage: The paper fails to clarify crucial parts. For example, in deep inspection, they construct a shadow stack based on Intel PT traces and claim to match with shadow stack (what shadow stack?). They leave forward indirect control transfer out of consideration. The performance overhead is still very high (e.g. gcc spec with 67%).
Discussion: Intel PT is introduced for offline analysis, using it for online validation is not overall a good idea. The CFI policy generation with training concept is not well described. Mostly, the deep inspection is hugely misleading.
Enforcing Unique Code Target Property for Control-Flow Integrity
Source Code: https://github.com/uCFI-GATech
Summary: The project has tried to achieve an ambitious goal: based on their execution history, enforce a CFI policy that will allow only one valid target for an indirect jump/call. For decades, researchers have tried to design a strict enforcement, a strong CFI policy. But the performance overhead and complex real-world program fail them every time. This project is also no exception to them. They have failed to evaluate spec benchmark like gcc, xalancbmk, omnetpp etc. which are the most important spec benchmark to evaluate considering their complexity with implementing CFI defense (they also have the most number of indirect calls). The research hugely depends on Intel PT hardware support which usually fails to capture complete information if they have frequently encountered. Basically, Intel PT is designed for offline debug purpose, so there is no chance to fix this issue for an online analysis.
Design: The project can be divided into three parts: 1) constrain data identification 2) arbitrary data collection and 3) online analysis. Constrain data is a program variable which is not a function pointer but has a significant impact to fix where a function pointer will target. For example: if there is an array of function pointer, and someone uses variable i as the index for that array to call an indirect function; then i is going to be a constraint data. They use compile-time analysis to identify first instructions related to function pointer calculation, second, they check the operands of those instructions if they are variable. Once they know constraint data, they have to collect the data at runtime (2). They use compile-time instrumentation for this purpose. Now, as they are planning to use Intel PT to track execution history, so they have to figure out a way how to force the instrumention to entry the arbitrary value in Intel PT trace. Hence, they come out with a trick: they call their defined write function with constraining data value as the parameter and from there, they create multiple indirect function call with return only body where the target will be fixed by a chunk of the passed value. To reduce the number of packet trace, they also do the same for the sensitive basic block where they also assign a unique id for each of them. With all of these, they can now get rid of checking TNT packets (for conditional control transfer) while only checking TIP packets (indirect control transfer). For online analysis(3), they decide to it in separate processor parallel to the original program. They have defined a read method to reconstruct the arbitrary data from Intel PT packet trace and validate it with a mapping.
Advantage: If you can perfectly identify every constraint data for every sensitive indirect control transfer, then if you have a perfectly working hardware to trace the history, then if the separate validation does not harm your security guarantee, then it is a very good system.
Disadvantage: Unfortunately, not any of them is possible to work perfectly. The static analysis with the complex real-world application will never be successful with such a simple algorithm. There will be always special cases and you have to keep updating your analysis mechanism time by time. Next, a missing perfect hardware is eventually reported by themselves. And everyone knows, you need to know before you jump. Pause your program in sensitive system call-points can easily cause a DoS attack by creating a real long pending list of validation check through frequently created indirect control flow transfer. About performance overhead claim, they have mentioned 10% average performance overhead. But, for perlbench and h264ref they report 47% and 24% performance overhead when they have not evaluated similar benchmark gcc, omnetpp, and xalancbmk. If we even consider 25% overhead for each of them, the overall overhead will go up at least 15%. They have evaluated nginx and vsftpd but they have not mentioned what test suites they have used. Moreover, their system is creating more indirect call, maybe even more than that exists in the target program.
Discussion: To emphasize their security improvement, they have mentioned a case study from h264ref spec benchmark. They show there is a structure that has a member function pointer. This structure is later used to create an array of size 1200. With that, they simply assume, there are 1200 valid targets for that member function pointer (unique one for each of them) and they have reduced it to 1. But the truth is that member function has only taken three valid functions in the entire source code. The project is too ambitious and hence not practical.