Welcome to Dataflow and Abstract Interpretation project.
Note that the description below can be understood as describing
a family of projects: variants are possible, and may be necessary
in order to deal varying group sizes.
The following aspects are always negotiable
I do insist that you use (embellished) monotone frameworks to solve your problem.
- what language to analyze
- which analysis to perform
- in which language to make the implementation
- what parts of the chosen language will be omitted (and possible other restrictions).
In the end, I want not only the code, but also: a range of examples, and documentation
containing a talk through of one of the examples, the restrictions you imposed,
what your lattice is, how join is defined, whether you use widening and what the operator
is, the architecture of the implementation (what can be found where in your code),
how to compile, how to run, what packages and tools do I need,
major design choices (flow sensitive or not, context sensitive or not), etc.
I reward students that come up with their own challenging assignments (and succeed), but
the practice of teaching APA over the last few years is that students that are not part
of the Programming Technology track/studielijn of the COSC master have a hard time
dealing with the issues that arise in this case. My hope is that students who want to excel
choose a realistic language with a realistically advanced analysis, and that it motivates
them to work hard on it.
Below I first discuss a default assignment for those not too familiar with programming languages
and the problems that arise when parsing and dealing with a realistic language.
This default assignment is like a safe route to a passing grade, if you are not too
sure about yourself. You can always come to me for advice on this.
Default assignment: The While Language
In the recommended book for APA (see Literature), the standard language to analyze
is the While language (and we include procedures!). Instead of dealing with the intricacies
of a modern programming language, you can also spend more time dealing with
the complexity of analyzing programs instead.
The idea is ``simply'' to implement embellished monotone frameworks
an elegant fashion in the implementation language of your choice.
You should then instantiate that framework is at least three ways to
attain that number of analysis: one of the four (Available Expressions,
Live Variables, Busy Expressions and ), Constant Propagation, one other analysis
(invented by you, or taken from the exercises in the book).
Do the above very well (well = correct, usable, well-documented and cleanly coded),
and you are on route to an 8. You can expand on this assignment (and attain a higher
grade), by choosing a complicated analysis (e.g. the shape analysis of Chapter 2),
or by adding complicated new language constructs (continue, breaks, simultaneous assignments,
function calls etc.). Essentially, in the latter case, you are growing towards
a realistic programming language.
A restriction for this assignment is that you can only do it alone or in pairs.
Real-life assignment: Soft Typing for PHP
The language under analysis is PHP (typically the version that you happened to find a parser for).
Soft typing aims to discover at every program point, and for every variable which types
the variable may have at that program point.
Note that these sets may be infinite, so at some point we shall need to apply
widening to obtain a computable analysis.
The main ingredients of the assignment are to parse PHP and build an AST, to contruct
a control-flow graph for the program, to generate constraints between types,
and to combine these in an implementation of monotone frameworks that is rich
enough to deal with the parts of PHP that you want to support.
Restrictions on the language
There can be many ways to restrict the problem:
- expressions do not have assignments (but may have side-effecting function calls)
- no eval
- no static variables, and every variable is either local or a global to a function, but never both.
- no classes
- simple or no import statements
- no nested functions
You typically need a way to deal with imported functions. I can give you a list of
functions and variables defined in PHP with their respective (polymorphic) types.
Easiest is to do this like Patrick Camphuijsen did.
I advise to start from a small language (i.e., no subroutines)
and make that work first. Only then add new features.
Restrictions on the analysis
Do a context-insensitive analysis.
Note that what Patrick Camphuijsen attempted to do is a flow-insensitive analyses for
global variables (one set per variable), and a flow-sensitive one for local variables (one set per variable
per program point). This makes his algorithms bit a less elegant, because this is hard-coded.
The topic of Patrick Camphuijsen was to generate warnings. You can do that too, but
at this point I prefer you to make the computation of the sets of types your priority.
Further reading and material
Reading material: the master thesis of Patrick Camphuijsen
be of help. I do think the generated constraints should be redesigned, and would like the implementation
to be made in Haskell. There is also something of a paper/technical report:
, that is somewhat more concise.
Chris Eidhof pointed me to a Haskell parser for PHP: FaceBook PHP parser
Other languages to analyze
It would be wise to avoid closures at this point.
Perform forward or backward static slicing for PHP
This project is largely like the one described at the top, but the subject analysis is now slicing.
The basic idea of backwards slicing is, for a given expression, to ``remove'' the parts
of the program that do not contribute to the value of the expression.
Similarly a forward variant can be formulated.
To read up on slicing, read:
For small groups a taintedness analysis (as in Perl) can be performed on PHP.
A more complicated analysis, would be to consider protocol checking. For example,
to make sure that a file is always open when it is read from, or a database connection
is known to exist when a query is sent there. To make this work for PHP, we certainly
need some restrictions. The thesis of Patrick Camphuijsen has
some information on this kind of analysis a well.
- 10 May 2010