Answering Reachability Queries On Very Large Directed Graphs

Stc
Date: 2010-09-29

Time: 15:00

Room: BBL 083

Speaker: Bas van Schaik

Title: Answering reachability queries on very large directed graphs -- introducing a new data structure using bit vector compression (thesis defense)

Abstract

Answering reachability queries on graphs has been subject of extensive research for the last couple of decades. Reachability data structures and algorithms provide an answer to probably one of the easiest sounding questions in graph theory: can some vertex i reach another vertex j (possibly using intermediate vertices), or not? Such queries can be answered in many different ways, for example by using data structures based on a representation of the transitive closure of a graph.

Starting in the 1950's, computer scientists and mathematicians have presented multiple ways to process a graph, extract reachability information and represent the transitive closure. Data sources -- and the graphs representing them -- are vastly growing, forcing researchers to look for new ways to efficiently represent a transitive closure, which grows quadratically in the number of vertices of a graph. Additionally, the amount of time required to process a graph and build a transitive closure data structure should be limited, as well as the amount of time required to answer reachability queries using the data structure.

This thesis provides an introduction to transitive closure computation and introduces the concept of bit vector compression to reduce the amount of memory required to represent a transitive closure. A new data structure (based on bit vector compression) is presented, together with both a theoretical and experimental analysis to compare its performance -- in terms of memory usage, construction time and query response time -- to data structures presented in publications at major conferences.