HexType: Efficient Detection of Type Confusion Errors for C++
Source Code: https://github.com/HexHive/HexType
Summary: In a type-based programming language, typecasting is a common phenomenon. With the object-oriented programming paradigm, this feature turns into a dangerous attack surface. When a derived class object cast to parent class object (upcast), it is usually safe, considering parent class is a subset of the derived class. On the other hand, casting a parent class object to the derived class pointer (downcast) can raise a serious issue (if the derived class has an extra item than parent class). Moreover, these casting as often indirect (an object pass to a function as a class pointer type). The research work attempts to resolve the issue with runtime checking (similar checking as dynamic casting, but expensive and limited to classes with virtual function). The research expands the coverage with a more efficient runtime checking with their defined data structure.
Design: The design can be divided into three parts. 1) Type Relationship Information: This information is available to the compiler at compile time to verify static cast, but they have to instrument it to the binary to use it at runtime. 2) Object Type Tracing: Whenever an object is allocated, a table will update the object information with its original class type. To do that they instrument every allocation site. The table they use is a combination of a hashmap with each hash item represents an RB-tree. The hashmap check considers as a fast-path check while RB-tree traverse considers as slow-path check. The hashmap will frequently update to recent object verification. 3) Type Casting Verification: In every type casting location, instrumentation will ask for runtime verification of that casting object. The verification will collect runtime information and verify against (1)[if casting path allowed according to the tree] and (2)[original type of the object is casting of]. They expand this verification coverage to dynamic cast too by replacing it to theirs. However, they also optimize the performance by discarding safe object from being tracked and safe casting from being asked to verify (detect the safe object and casting through static analysis).
Advantage: The design is simple to be scalable. They also detect couples of type confusion bugs using this. Performance overhead is better than CaVer but worst than TypeSan. They also achieve better code coverage.
Disadvantage: Static analysis based optimization is not described completely. Traditional analyses are not reliable. Handling reinterpret cast is also dubious.
Discussion: Overall, type confusion is an interesting issue to cover. Detection through runtime verification is hopeful. C also could have a similar bug, they should all be covered.