The TOP project deals with the development of tool and library support for building compilers
which deal with improving user feedback during program analysis. We have plenty of experience
with type inferencing higher-order polymorphic functional language, which we have also built into
compiler which we use for educational purposes. The compiler supports
the Haskell 98
language almost fully. The accepted proposal
for the NWO Open 2004 Spring edition
Motivation from type inferencing
When confronted with the user feedback given by compilers for higher-order functional languages,
one often has to stop and think what is meant. An expert programmer is used to getting this kind
of feedback from his compiler, but the messages do form a large hurdle for novice programmers.
This problem is even larger in programming languages with high level of abstractions, because
error messages tend to be cast in these abstractions, and not necessarily with the same syntactic
sugar as they are offered to the programmer. We advocate staying as close as possible to the input
source code (notwithstanding the extra
effort this may bring for the compiler builder).
We do not stop there: every human being has his own style of programming, a different way of
arriving at his solution. Now, if his solution is type-erroneous, we would like to be able
to adapt the type inferencing process to the programming style. At this point this takes the form
of passing certain parameters to the compiler.
Going even further, a programmer can adapt the type inference for certain sets of expressions,
and provide tailor-made error messages.
This is especially useful for embedded domain specific language, to override the messages
which usually only reflect the underlying encoding in the language, and do not highlight
the concepts from the domain.
Constraint based type inferencing in Helium
We have used our library for the type inferencer which is part of the Helium
It has the following properties:
- We generate precise position information and preserve type synonyms in error messages.
- The programmer can choose the type inference strategy of his liking (M and W and other greedy variants, and the unbiased type graph based implementations have been implemented).
- The type graph implementation uses quite a number of heuristics to decide what is the most likely source of the error.
- A logging facility is available in Helium which has given us a large amount of correct and erroneous Haskell programs which can be used to improve our type inferencer. In the future these programs can also be used for benchmarking optimizations and many other purposes. The programs have been anonimized, but the relation between programs by the same programmer has been kept intact. Various questions can then be answered: Do our hints help? Are they used? It is easy to come up with many interesting questions. Currently we have about 300 MB of sources from a single functional programming course.
- A major innovation is the ability for a programmer to develop his domain specific type rules for a combinator library he might be writing. In addition, he may specify that his experiences are that certain functions are often mixed up. As a result, a compiler may give the hint that (++) should be used instead of (:), because (++) happens to fit in the context.
The domain specific type inference rules are automatically checked for soundness, and a programmer does not have to be familiar with the process of type inferencing as it currently takes place within the compiler.
An article on this facility can be found in the ICFP '03 proceedings (reference below).
An extension of our work which copes with type classes and introduces special type class
directives is described in a paper at PADL 2005. Some aspects have been included in the
Going beyond type inferencing
At its most general, Top
is a framework for constructing abstract interpretations which focuses on
giving good feedback on why an abstract interpretation fails.
In a mathematical notation this is usually made explicit by returning the top
element of a
(complete) lattice. Our work can be understood as how to refine this single top element
into a number of elements, each of which has information about why abstract interpretation failed.
Additionally, we want to offer heuristics for ending up in the 'right' top element. These heuristics
can be generally applicable (like a majority heuristic which blames that part of the program
for which we have the least evidence that it is correct), or highly problem domain specific
(since students tend to confuse : and ++ in Haskell, we have a heuristic that detects this