Below is a list of topics for experimentation projects in the general theme of programming/software technology. It is possible to do these projects even if you don't do the Programming Technology line, as long as you have the needed background. It is possible that after sometime the list becomes out-dated, but it should still give you a hint as to the general research areas of our researchers. You are welcome to approach them to ask if they have new projects.
Projects (in arbitrary order)
Using EHC's backend as a backend for Helium.
Currently, Helium's backend is an interpreter written in C and because the original developer left,
hard to maintain. EHC has a backend of its own, and supports quite a bit more of Haskell
than Helium currently does. In this project, you try to connect the two together via
the core language. The main complication will be that the object files of Helium also store
some type inference information, and this must be preserved in some way (probably through
, Bastiaan Heeren, Jeroen Fokker, Jurriaan Hage
Using Nix for building EHC.
Implementing and testing a generic plagiarism detection tool for computer programs
Marble was written by JurriaanHage
in Perl, mainly for plagiarism detection for Java programs.
I'd like to extend this to other languages, but without copying Marble and making a new
instantiation. You should investigate in what respect existing programming languages differ and
how they can be parameterized to yield a system that also support plagiarism detection for languages
similar or not so similar to Java.
Other extensions to the existing system are:
- dealing with programming assignments in which parts are supplied by the lecturer
- addition of new heuristics that can serve to add to the power and precision of the system (two heuristics know more than one).
If you think the tool can and should be implemented in a different language: this is certainly
open to discussion. Haskell is a good option as far as I am concerned.
Deliverables: the source code for the system, a manual and an architectural description.
Prerequisites: Grammars and Parsing, and Compiler Construction.
Repetitive Reduction Patters in Lambda Calculus with
with that title is being worked on. It describes an optimisation on λ-calculus with
(which means that it is applicable to Haskell) that lifts certain 'constant' parameters out of recursive positions. This transformation leads to a form that needs fewer β-reductions to evaluate. This is, however, a rather theoretical result, since the amount of β-reductions is not a good measure for efficiency on practical evaluators such as the run-time systems of existing Haskell implementations. Therefore besides the primary task of this project to implement the optimisation (analysis + transformation) it would also be interesting to determine in which cases exactly the optimisation actually increases run-time efficiency.
Code generation for type classes in Helium
Jeroen Weijers has added type classes to Helium's frontend.
We, however, still lack some facilities in the backend for
executing programs that have type classes.
Prerequisites: at the very least Compiler Construction. Advanced Compiler Construction would be helpful.
Adding logging facilities to GHC
This is a relatively lightweight thing to do.
The idea is that when a programmer compiles with --make
then he may elect to log the results of the compile
to a server somewhere. I do not think internal knowledge of the
compiler is necessary, programming will be through the GHC API.
Since this is a relatively small task, there's bound to be a need for doing
something extra, that may not be directly related to this particular project.
Develop a website for cross-comparing type error messages in Java
Nabil el Boustani and I developed a system for generating improved type
error messages for generic method invocations. We would like to offer
this as a service to programmers and students. The idea is to
have people upload a Java program, we compile it with our own compiler,
with Sun's javac and Eclipse's ejc and show all error messages.
We log the programs we thus obtain, and ask the people who use
the site to indicate what they think of the messages.
Information should be stored in a database for later querying.
Further extensions to PHP Validator
Patrick Camphuijsen developed a tool to perform all kinds of analyses
on PHP code, in Java. A new version was implemented by Henk Erik van der Hoek
in Haskell, which was again extended by Marcelo Sousa.
Your task is to further extend the system. For this you will need knowledge
of dataflow analysis (which you may learn during the course Automatic Program Analysis).
Making type directives for Helium more mature
We introduced type inference directives in a paper at ICFP 2003.
These have been implemented into the Helium compiler.
In this project, you shall implement additional features
to make type inference directives more user-friendly.
These may include also the type class inference directives
described in our paper at PADL 2005.
, Bastiaan Heeren
One way to give feedback to a programmer when his program contains a type error, is to show
possible ways to correct the program. Some years ago, Arjen Langebaerd made a first investigation into
this issue, see the thesis
We want to take this further by offering a domain specific language to programmers to allow them
to tune the repair process to their liking, for example by allowing them to tune the priorities, or by allowing
them to specify repair heuristics themselves and providing a general method inside the compiler to
find the best reparation and execute.
The project takes place within the context of Helium
a substantial part of your work will be to program an extension to the existing extension made by
Arjen Langebaerd and to test it thoroughly to obtain experimental results.
If succesful, this work can lead to a publication.
and Bastiaan Heeren
Design/build a Wiki system using Haskell
Experimentation projects in automated testing
Contact: Wishnu Prasetya
Automating Blackbox Debugging of Flash-based Applications
Imagine a Flash-based application that have been instrumented, so that user interactions are logged. The resulting log is thus essentially a sequence of entries, and each entry represents a user interaction/event (such as clicking on a button). When an error occurs, we can in theory replay the sequence of interactions to reproduce the error, which then allows a programmer to further inspect the execution. When an error is observed after an event e, it is not necessarily so that e itself is the cause of this error. It could be another event b that precedes a. Debugging here is defined as trying to identify the event that causes the error. A simple strategy to debug is to remove one event x from the log, e.g. we choose the last event, and then execute it. If the error does not
disappear then x is the the cause. Then we repeat the procedure, selecting a different event y to try. If after removing x the error does disappear, then x could be the cause. But it is also possible, that removing x causes the application to take a different execution path; one that avoids the actual cause of the error. Since we don't know this upfront, we will have to manually inspect x.
Since the log can be very long, the above approach can be very inefficient. To improve this, you will get set of equations, e.g. ab = b, or ab=ba, saying that from a certain abstraction the sequence ab will behave as b, or will behave as ba. For example, imagine the sequence:
The event d is called the witness event. It is the event that exposes the error, denoted by x, so that you notice it. Note that d is not necessarily the cause of the error. If you were given the equations ab=b and bb=b, the above sequence can be reduced to
which would then be easier to debug. Another example:
If we know cd = dc, it can be rewritten to:
But since d was the witness, it is likely that when we execute the latter sequence, the error will again surface after d (and not after a c). This implies that it is then only necessarily to debug
Notice that what we essentially do is that we remove events of which we know that they are more likely not to contribute towards the error. Events that we cannot remove have to debugged in the traditional way as described before. The new scheme may seem to be more efficient, but there is also a challenge there. The equations were determined based on some abstraction. So, applying them to reduce a log may produce a new log that is actually not executable. So, only reductions that produce executable logs should be applied. But to check whether a reduced log is executable, we have to execute it and see if it does not break in the middle. This costs time. So, some strategy would be needed here.
, which we will use to control a Flash-based application from a web browser. You need to produce a modular implementation that can be easily extended later. You need to implement at least one strategy as meant above, and then conduct some benchmarking on the strategy.
Inferring, Injecting, and Testing Specifications
Currently we have tools to infer pre and post-conditions of methods from the logs they generate during the testing phase. The strength is limited (this holds for any approach of dynamic inference), which means that the generated specifications are only partial. Nevertheless, we find a way to strengthen it by parameterizing the specifications with scenarios. Unfortunately we do not yet have the facility to measure how strong the resulting specifications are.
This can be solved by building a tool to inject the inferred specifications back into the methods, and providing a mutation test capability to measure their strength. Special attentions may have to be given on how to encode scenarios parameters, and how to inject mutations that do not break them. Additionally, mutation test is notoriously expensive. Creative ideas to cut down the computation cost will be greatly appreciated.
Logging Exceptional Behavior
Currently we have a quite sophisticated infrastructure to log ActionScript?
programs. However, we are still unable to log exceptional behavior. We really would like to improve this because in practice it is such behavior that usually leads to failure, and thus worth logging. A target method may show exceptional behavior if it throws an exception. It then enters a handler, if one is provided, or the method exits. This is currently not logged. Worse, in the second case the forced exit will case an important logging invariant to be broken, namely if the method entrance is logged, then there should be a corresponding exit log entry. We need you to extend the existing framework.
Design/build a web based interface for EHC
Generate a pretty printed visualisation (LaTeX and/or Html) of type derivation trees for EHC
Parser Combinator Related
With the new parser combinator library (under development) several issues are to be implemented
Generate Syntax Diagrams
Give a redefinition of the combinators (and probably extend them) such that syntax diagrams (e.g. using Tikz) can be generated. Change the error messages in such a way that they contain a reference (url) to these digarams, such that students can see why their input is erroneous.
Re-implement the abstract interpretation part
for the new library, such that parsing speed increases. In order to keep the library modular this may be done by giving an attribute-grammar based description of the library.
In principle we know how we animate
the execution of a function. For this we have however to write code. Can we use the Hat tracer to avoid having to write the specialised cases over and over again?