Binary Relational Querying For Structural Source Code Analysis

Stc
Date: 2008-12-11

Time: 11:45

Room: BBL room 471

Speaker: Peter Rademaker

Title: Binary relational querying for structural source code analysis (thesis defense)

Abstract

Software analysis involves the retrieval of large numbers of relations from the source code of the program. These include call, inheritance, inclusion, and database access relations. The querying of these relations for the purpose of obtaining high-level overviews of internal architecture and information flow is a non-trivial task.

The graph library used by the Software Improvement Group (SIG) for this task has limitations regarding its expressiveness, abstraction facilities, and offered functionality. This makes it difficult and time-consuming to write queries that retrieve the necessary information. The goal of this thesis project was to find a new solution that removes these limitations.

First, we have compared existing source code query tools. The tools were compared with respect to ten criteria concerning language and tool features. The language features were compared by implementing four archetypical source code queries in each of the tools. Although the available solutions are promising, we found that only JRelCal offers the combination of good abstraction and extension facilities together with an API.

JRelCal is a prototype Java library that is based on Tarski's binary relational calculus. Binary relational calculus allows us to express source code analysis queries in a concise and declarative manner. JRelCal, however, has significant shortcomings: the performance of some of its operations is poor, it lacks some essential features, and its implementation contains several bugs. To overcome these shortcomings we have created a new implementation of JRelCal. The new implementation includes an optimised transitive closure operation, support for predicates, and an efficient representation for relations. We validated the new JRelCal in an extensive case study.