MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

Source Code: N/A

Summary: Syzkaller is one of the most popular kernel fuzzer. It generates a sequence of random system calls. Due to the randomness, most of them are unrealistic cases. They lose the efficiency because they don’t consider dependency (both implicit and explicit) among system calls. There are some other kernel fuzzer who uses a large number of benchmark program and use their sequence of system call to their fuzzer. In this project, they use a filtering algorithm that can identify dependent syscalls from real-world applications and use these syscall blocks into the fuzzer to the randomness of the blocks.

Design: The project basically extends the syzkaller and replace their random syscall sequence generator with random syscall block sequence generator. In the first step, they pour down the real-world application and observe their explicit and implicit syscall dependency. One example could be a file read; a file first must open, then read and usually they also close the file. If we think deeper, before a file read, the application also usually allocate some memory as the buffer to store the read data. In kernel perspective, these dependencies could be implicit or explicit. An explicit dependency is if a return of a syscall is used as an argument to another syscall. An implicit dependency is if two different syscalls use the same data structure in the kernel. When an explicit syscall is straightforward to resolve, the implicit dependency requires a kernel level static analysis where it collects if two different syscalls may use the same global variable in the kernel where one is a write and another is a read.

Advantage: The motivation is realistic, the design is simple, hence the project is significant and practical. They have found 17 new vulnerabilities of which 7 of them can be also found through explicit dependency check.

Disadvantage: The system don’t try to prove exploitability like every other. But, as the goal of the system is to generate more logical fuzzing input than complete randomness, their probability of exploitability is higher. They have confirmed 9 out of 17 have already fixed.

Discussion: The motivation is genuine. Although the workload is not heavy, it is the impact which let it be a significant work.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.