Assignment Run Time Code Generation
The paper `Runtime code generation in JVM and CLR' by Peter Sestoft describes how run-time code generation can be achieved efficiently in the JVM and CLR.
The approach builds on APIs for bytecode generation.
While the approach is practical, algorithms written as sequences of bytecode API calls are hard to read.
The goal of this assignment is to develop a syntactic embedding of Java in Java and `assimilate' that embedding to sequences of calls to the GNU Bytecode API.
Run-Time Code Generation with a Bytecode Library
Consider the following example from `Runtime code generation in JVM and CLR'. The Power method is defined as follows:
public static int Power(int n, int x) {
  int p;
  p = 1;
  while(n > 0)
  {
    if(n % 2 == 0) {
      x = x * x;
      n = n / 2;
    } else {
      p = p * x;
      n = n - 1;
    }
  }
  return p;
}
The PowerGen method generates a specialization of the Power method for fixed values of n using the gnu.bytecode library:
public static void PowerGen(Method mo, int n) {
  CodeAttr jvmg = mo.getCode();
  Scope scope = mo.pushScope();
  Variable varx = jvmg.addLocal(Type.int_type, "x");
  Variable varp = jvmg.addLocal(Type.int_type, "p");
  jvmg.emitPushInt(1);
  jvmg.emitStore(varp);        // p = 1;
  while (n > 0) {
    if (n % 2 == 0) {
      jvmg.emitLoad(varx);     // x is arg_0
      jvmg.emitLoad(varx);
      jvmg.emitMul();
      jvmg.emitStore(varx);    // x = x * x
      n = n / 2;
    } else {
      jvmg.emitLoad(varp);     // load p
      jvmg.emitLoad(varx);     // load x (arg_0)
      jvmg.emitMul();
      jvmg.emitStore(varp);    // p = p * x
      n = n - 1;
    }
  }
  jvmg.emitLoad(varp);
  jvmg.emitReturn();           // return p;
  mo.popScope();
}
For n = 16 this produces the following sequence of bytecodes:
0: iconst_1
1: istore_1
2: iload_0
3: iload_0
4: imul
5: istore_0
6: iload_0
7: iload_0
8: imul
9: istore_0
10: iload_0
11: iload_0
12: imul
13: istore_0
14: iload_0
15: iload_0
16: imul
17: istore_0
18: iload_1
19: iload_0
20: imul
21: istore_1
22: iload_1
23: ireturn
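The relation between the generator loop and the listing above can be checked without the gnu.bytecode library. The following is a minimal sketch under stated assumptions: RecordingCode is a hypothetical stand-in for CodeAttr that only records mnemonics, the slot numbers 0 (for x) and 1 (for p) are hard-wired, and the short instruction forms it prints are only valid for the small constants and slots used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for gnu.bytecode's CodeAttr: it only records mnemonics.
final class RecordingCode {
    final List<String> instructions = new ArrayList<>();

    void emitPushInt(int value) { instructions.add("iconst_" + value); } // small constants only
    void emitLoad(int slot)     { instructions.add("iload_" + slot); }   // slots 0..3 only
    void emitStore(int slot)    { instructions.add("istore_" + slot); }  // slots 0..3 only
    void emitMul()              { instructions.add("imul"); }
    void emitReturn()           { instructions.add("ireturn"); }
}

final class PowerGenSketch {
    static final int VAR_X = 0; // assumption: x is the only argument of a static method
    static final int VAR_P = 1; // assumption: p is the first local after the argument

    // Mirrors PowerGen: the loop over n runs at generation time,
    // only the multiplications end up in the recorded code.
    static List<String> powerGen(int n) {
        RecordingCode code = new RecordingCode();
        code.emitPushInt(1);
        code.emitStore(VAR_P);          // p = 1;
        while (n > 0) {
            if (n % 2 == 0) {
                code.emitLoad(VAR_X);
                code.emitLoad(VAR_X);
                code.emitMul();
                code.emitStore(VAR_X);  // x = x * x;
                n = n / 2;
            } else {
                code.emitLoad(VAR_P);
                code.emitLoad(VAR_X);
                code.emitMul();
                code.emitStore(VAR_P);  // p = p * x;
                n = n - 1;
            }
        }
        code.emitLoad(VAR_P);
        code.emitReturn();              // return p;
        return code.instructions;
    }

    public static void main(String[] args) {
        List<String> code = powerGen(16);
        // Every instruction used here is one byte long, so the index equals the offset.
        for (int i = 0; i < code.size(); i++) {
            System.out.println(i + ": " + code.get(i));
        }
    }
}
```

For n = 16 the loop takes the even branch four times and the odd branch once, which accounts for the four squarings of x followed by a single multiplication into p in the listing.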
Syntactic Embedding
It would be attractive to be able to write concrete Java syntax instead of bytecode API calls.
To achieve this we create an embedding of Java in Java that allows us to quote pieces of Java code.
A basic syntactic embedding is defined by the following SDF definition:
module Java-15-in-Java-15
imports Java-15 Java-15-Prefixed
exports
  context-free start-symbols CompilationUnit
  context-free syntax
    "genclass" "|[" JavaClassDec "]|"       -> Expr    {cons("QuoteClassDec")}
    "genbstms" "|[" JavaBlockStm* "]|" ";"  -> Stm     {cons("QuoteBlockStms")}
    "#genbstms" "|[" BlockStm* "]|" ";"     -> JavaStm {cons("EscapeFromStm")}
    "#var[" JavaId "]"                      -> Id      {cons("MetaVar")}
A complete syntax definition and parse table can be created using the following Makefile:
JAVAFRONT = $(HOME)/.nix-profile/
JAVAFRONTSDF = $(JAVAFRONT)/share/sdf/java-front
SDFINCLUDES = -I $(JAVAFRONTSDF) -Idef $(JAVAFRONTSDF)/Java-15-Prefixed.def

all : Java-15-in-Java-15.tbl Java-15-in-Java-15.str

Java-15-in-Java-15.def : Java-15-in-Java-15.sdf
	pack-sdf -i $< -o $@ $(SDFINCLUDES)

Java-15-in-Java-15.tbl : Java-15-in-Java-15.def
	sdf2table -i $< -o $@ -m Java-15-in-Java-15

Java-15-in-Java-15.rtg : Java-15-in-Java-15.def
	sdf2rtg -i $< -o $@ --main Java-15-in-Java-15

Java-15-in-Java-15.str : Java-15-in-Java-15.rtg
	rtg2sig -i $< -o $@
An example of the use of the embedding is the following generator for the power function:
public static ClassType GenPowerClass(int n) {
  return genclass |[
    public class MyClass {
      public static int MyPower(int x) {
        int p;
        p = 1;
        #genbstms|[
          while (n > 0) {
            if (n % 2 == 0) {
              genbstms|[ x = x * x; ]|;
              n = n / 2;
            } else {
              genbstms|[ p = p * x; ]|;
              n = n - 1;
            }
          }
        ]|
        return p;
      }
    }
  ]|;
}
The quotation genclass|[ class declaration ]| creates a ClassType object and fills it with bytecodes according to the declaration.
The anti-quotation #genbstms|[ statements ]| escapes to the meta-level to execute the statements at generation time.
In the example the anti-quotation has the effect of unrolling the loop.
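The division of labour between quotation and anti-quotation can be imitated in plain Java by treating quoted statements as strings. This is only a rough sketch of the staging idea, not of the embedding itself: the class and method names are hypothetical, and the output is source text rather than bytecode.

```java
import java.util.ArrayList;
import java.util.List;

final class UnrollSketch {
    // Generation-time code: n is consumed here and never appears in the output.
    static List<String> genPowerBody(int n) {
        List<String> quoted = new ArrayList<>();
        quoted.add("int p;");
        quoted.add("p = 1;");
        while (n > 0) {                     // plays the role of #genbstms|[ ... ]|
            if (n % 2 == 0) {
                quoted.add("x = x * x;");   // plays the role of genbstms|[ x = x * x; ]|
                n = n / 2;
            } else {
                quoted.add("p = p * x;");   // plays the role of genbstms|[ p = p * x; ]|
                n = n - 1;
            }
        }
        quoted.add("return p;");
        return quoted;
    }

    public static void main(String[] args) {
        // Print the unrolled method body for n = 5.
        for (String stm : genPowerBody(5)) {
            System.out.println(stm);
        }
    }
}
```

The while loop and the tests on n run when the generator is called; only the quoted assignments survive into the generated body, which is what unrolling means here.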
Assimilation Scheme
Given a syntactic embedding we need an assimilation that translates the embedded code to an appropriate implementation. The idea of the Java-in-Java embedding considered here is that a quoted Java fragment should produce a sequence of calls to the gnu.bytecode API that creates a class file implementing the Java code.
A problem to consider is how different pieces of generating code can interact with each other.
For instance, how do the quoted code fragments in the anti-quoted while loop above end up in the specialized Power method?
To do this smoothly we adopt a couple of conventions, which are illustrated by the following assimilation of GenPowerClass (and discussed afterwards):
public static gnu.bytecode.ClassType GenPowerClass(int n)
{
  ClassType thisClass;
  thisClass = new ClassType("MyClass");
  thisClass.setSuper("java.lang.Object");
  thisClass.setModifiers(Access.PUBLIC);
  Method thisMethod = thisClass.addMethod("MyPower");
  thisMethod.setSignature("(I)I");
  thisMethod.setModifiers(Access.PUBLIC | Access.STATIC);
  thisMethod.initCode();
  CodeAttr thisCode = thisMethod.getCode();
  Variable var_0 = thisCode.addLocal(Type.int_type, "x");
  thisCode.pushScope();
  Variable var_1 = thisCode.addLocal(Type.int_type, "p");
  thisCode.emitPushInt(1);
  thisCode.emitStore(var_1);
  while(n > 0)
  {
    if(n % 2 == 0)
    {
      {
        thisCode.emitLoad(var_0);
        thisCode.emitLoad(var_0);
        thisCode.emitMul();
        thisCode.emitStore(var_0);
      }
      n = n / 2;
    }
    else
    {
      {
        thisCode.emitLoad(var_1);
        thisCode.emitLoad(var_0);
        thisCode.emitMul();
        thisCode.emitStore(var_1);
      }
      n = n - 1;
    }
  }
  thisCode.emitLoad(var_1);
  thisCode.emitReturn();
  thisCode.popScope();
  return thisClass;
}
The gnu.bytecode library provides an API for creating elements of a class file.
A complete class is represented by a ClassType object and is a container for class member declarations.
For instance, a method is added using the addMethod method, which returns a Method object.
A Method has a CodeAttr, which stores the bytecodes for the method. New bytecode instructions can be added by applying emit methods to the CodeAttr.
The idea of the assimilation now is that a code fragment knows where to add code by referring to the appropriate this object.
For instance, a method always lives in an environment where there exists a thisClass.
Similarly, a statement always lives in the context of a thisCode.
These simple conventions make it possible to assimilate quoted code fragments in isolation.
The programmer has to be aware of these conventions.
For example, factoring out the generation of the body of a method requires that the appropriate environment is passed to the method generating the method body. In the following example, the PowerGen method is called with the thisCode environment. Furthermore, any variables that are in scope in the body should be passed as well.
That is the purpose of the #var[identifier] quotation:
public static ClassType GenPowerClass(int n) {
  return genclass |[
    public class MyClass // extends java.lang.Object
    {
      public static int MyPower(int x) {
        #genbstms|[ PowerGen(thisCode, n, #var[x]); ]|;
      }
    }
  ]|;
}
public static void PowerGen(CodeAttr thisCode, int n, Variable #var[x]) {
  genbstms|[
    int p;
    p = 1;
    #genbstms|[
      while (n > 0) {
        if (n % 2 == 0) {
          genbstms|[ x = x * x; ]|;
          n = n / 2;
        } else {
          genbstms|[ p = p * x; ]|;
          n = n - 1;
        }
      }
    ]|
    return p;
  ]|;
}
Consider the result of assimilating these methods:
public static gnu.bytecode.ClassType GenPowerClass(int n)
{
  ClassType thisClass;
  thisClass = new ClassType("MyClass");
  thisClass.setSuper("java.lang.Object");
  thisClass.setModifiers(Access.PUBLIC);
  Method thisMethod = thisClass.addMethod("MyPower");
  thisMethod.setSignature("(I)I");
  thisMethod.setModifiers(Access.PUBLIC | Access.STATIC);
  thisMethod.initCode();
  CodeAttr thisCode = thisMethod.getCode();
  Variable var_0 = thisCode.addLocal(Type.int_type, "x");
  thisCode.pushScope();
  PowerGen(thisCode, n, var_0);
  ;
  thisCode.popScope();
  return thisClass;
}
public static void PowerGen(gnu.bytecode.CodeAttr thisCode, int n, gnu.bytecode.Variable var_1)
{
  {
    Variable var_2 = thisCode.addLocal(Type.int_type, "p");
    thisCode.emitPushInt(1);
    thisCode.emitStore(var_2);
    while(n > 0)
    {
      if(n % 2 == 0)
      {
        {
          thisCode.emitLoad(var_1);
          thisCode.emitLoad(var_1);
          thisCode.emitMul();
          thisCode.emitStore(var_1);
        }
        n = n / 2;
      }
      else
      {
        {
          thisCode.emitLoad(var_2);
          thisCode.emitLoad(var_1);
          thisCode.emitMul();
          thisCode.emitStore(var_2);
        }
        n = n - 1;
      }
    }
    thisCode.emitLoad(var_2);
    thisCode.emitReturn();
  }
}
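The convention of passing the code environment and the variable handles explicitly can be tried out in isolation. The sketch below is an assumption-laden miniature, not the gnu.bytecode API: MiniCode is a hypothetical code buffer whose variable handles are plain slot numbers, and its run method is a tiny interpreter added only so that the generated instruction sequence can be executed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a code buffer; not the gnu.bytecode API.
final class MiniCode {
    final List<String> ops = new ArrayList<>();
    private int nextSlot = 0;

    int addLocal()           { return nextSlot++; }   // a slot number serves as variable handle
    void emitPushInt(int v)  { ops.add("push " + v); }
    void emitLoad(int slot)  { ops.add("load " + slot); }
    void emitStore(int slot) { ops.add("store " + slot); }
    void emitMul()           { ops.add("mul"); }
    void emitReturn()        { ops.add("ret"); }

    // Interpret the recorded instructions with the argument stored in slot 0.
    int run(int arg) {
        int[] locals = new int[Math.max(nextSlot, 1)];
        int[] stack = new int[8];
        int sp = 0;
        locals[0] = arg;
        for (String op : ops) {
            String[] parts = op.split(" ");
            switch (parts[0]) {
                case "push":  stack[sp++] = Integer.parseInt(parts[1]); break;
                case "load":  stack[sp++] = locals[Integer.parseInt(parts[1])]; break;
                case "store": locals[Integer.parseInt(parts[1])] = stack[--sp]; break;
                case "mul":   sp--; stack[sp - 1] = stack[sp - 1] * stack[sp]; break;
                case "ret":   return stack[--sp];
                default:      throw new IllegalStateException(op);
            }
        }
        throw new IllegalStateException("no return instruction");
    }
}

final class EnvPassingSketch {
    // Like the factored-out PowerGen: receives the code environment and the handle for x.
    static void powerGen(MiniCode thisCode, int n, int varX) {
        int varP = thisCode.addLocal();
        thisCode.emitPushInt(1);
        thisCode.emitStore(varP);               // p = 1;
        while (n > 0) {
            if (n % 2 == 0) {
                thisCode.emitLoad(varX);
                thisCode.emitLoad(varX);
                thisCode.emitMul();
                thisCode.emitStore(varX);       // x = x * x;
                n = n / 2;
            } else {
                thisCode.emitLoad(varP);
                thisCode.emitLoad(varX);
                thisCode.emitMul();
                thisCode.emitStore(varP);       // p = p * x;
                n = n - 1;
            }
        }
        thisCode.emitLoad(varP);
        thisCode.emitReturn();                  // return p;
    }

    // Like GenPowerClass: sets up the environment, declares x, then delegates.
    static MiniCode genPower(int n) {
        MiniCode thisCode = new MiniCode();
        int varX = thisCode.addLocal();         // the parameter x gets slot 0
        powerGen(thisCode, n, varX);
        return thisCode;
    }

    public static void main(String[] args) {
        System.out.println(genPower(10).run(2)); // 2 to the power 10
    }
}
```

Because powerGen receives both thisCode and the handle for x, it can be written and tested separately from the code that creates them, which is the point of the convention.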
Assignment
The goal of the assignment is to create a fairly complete embedding and assimilation for Java in Java.
We measure completeness by means of the examples provided by Sestoft.
- Create concrete syntax versions of the Sestoft examples
- Extend the syntactic embedding if necessary
- Develop an assimilator that translates embedded Java to gnu.bytecode API calls
- Test the embedding and assimilator against the examples, and develop your own test suite
Experimental Setup
We cannot use the Java parser from the Dryad library, since that is for plain Java.
Instead we use the parser for the embedding that we created above. The following
Makefile defines a couple of actions to apply to Java-in-Java files:
JARS=../gnu-bytecode/kawa-1.8.jar
javainjava = ../embedding/Java-15-in-Java-15.tbl

# parse Java file with parse table for embedding
%.ajava : %.java
	sglri -p $(javainjava) -i $< | pp-aterm -o $@

# assimilate embedded Java code to bytecode API calls
%.assim.ajava : %.ajava ../src/java-bytecode-assimilation
	../src/java-bytecode-assimilation -i $< | pp-aterm -o $@

# flatten uses of expression blocks
%.eflat.ajava : %.assim.ajava ../src/java-bytecode-assimilation
	core-lift-eblocks -i $< | pp-aterm -o $@

# pretty-print
%.txt : %.ajava
	pp-java -i $< -o $@

# rename to proper Java file
# convention: FooGen.java uses concrete syntax, but defines
# a class named FooGenerated, which ends up in FooGenerated.java
%erated.java : %.eflat.txt
	cp $< $@

clean:
	rm -f *.ajava

# compile Java code
%.class : %.java
	javac -cp $(JARS) $<

# run the Java code with appropriate command-line arguments
%.run : %.class
	java -cp $(JARS) $* $(RUNARGS_$*)

RUNARGS_RTCG4Generated = 1024
RUNARGS_RTCG4bGenerated = 1024

test : RTCG4Generated.class
Outline of the Assimilator
The assimilator proper uses the Dryad library to reclassify ambiguous names.
This is stretching the proper use of that operation, but it seems to work if we assume that the generated code lives in the same (package/import) environment as the generator.
module java-bytecode-assimilation
imports libdryad Java-15-in-Java-15 Java-EBlock

strategies

  main =
    init-observables
    ; xtc-multi-io-wrap(
        observables-wrap(
          map(read-from)
          ; map(define-compilation-unit)
          ; dryad-reclassify
          ; map(get-ast)
          ; alltd(assimilate-in-method)
        )
      )

  init-observables =
    where(
      <set-config> (ObservableClasses(), [
        <xtc-find> "rt.classes",
        "<yourpathhere>/gnu-bytecode/kawa-1.8.jar"
      ])
    )

rules

  // find quoted code fragments in methods
  assimilate-in-method :
    MethodDec(head1, block1) -> MethodDec(head2, block2)
    where {| LocalVar
           : <alltd(declare-meta-param)> head1 => head2
           ; <alltd(assimilate-quotes)> block1 => block2
           |}

  // declare parameters marked as meta-var;
  // use dynamic rule to propagate to uses of variables
  declare-meta-param :
    Param([], t, MetaVar(Id(x))) -> Param([], t, var)
    where <newjavaid> "var" => var
        ; rules( LocalVar : x -> var )

  // assimilated quote fragments
  assimilate-quotes =
    assimilate-quote-block-stms
    <+ assimilate-quoted-class
    <+ assimilate-meta-var

  // a class fragment is an expression that produces a ClassType
  assimilate-quoted-class :
    QuoteClassDec(cdec) ->
    expr|[ {| ClassType thisClass; ~bstm*:<assimilate-class> cdec | thisClass |} ]|

  assimilate-quote-block-stms :
    QuoteBlockStms(stms) -> Block(<mapconcat(assimilate-block-statement)>stms)

  assimilate-meta-var :
    MetaVar(Id(x)) -> var
    where <LocalVar> x => var

  assimilate-block-statement =
    fail

  assimilate-class =
    fail

strategies // utils

  // Create a new Java identifier given a string.
  newjavaid =
    !Id(<newname>)
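The stubs assimilate-block-statement and assimilate-class in the outline are left to be filled in. To give a feeling for the shape of the translation, the following sketch translates a tiny assignment statement into the text of emit calls by a post-order traversal. It is written in Java rather than Stratego and rests on assumptions: the AST classes are hypothetical simplifications of the real ATerm representation, only int variables, literals and multiplication are covered, and the map localVar merely imitates the role of the LocalVar dynamic rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature AST for quoted code; the real input is an ATerm.
interface Expr {}
final class Var implements Expr { final String name; Var(String name) { this.name = name; } }
final class Lit implements Expr { final int value; Lit(int value) { this.value = value; } }
final class Mul implements Expr {
    final Expr left, right;
    Mul(Expr left, Expr right) { this.left = left; this.right = right; }
}
final class Assign {
    final String target; final Expr value;
    Assign(String target, Expr value) { this.target = target; this.value = value; }
}

final class AssimilateSketch {
    // Maps object-level variable names to meta-level Variable names.
    private final Map<String, String> localVar = new HashMap<>();

    void declare(String name, String metaName) { localVar.put(name, metaName); }

    // Post-order traversal: code for the operands first, then the operator.
    List<String> assimilateExpr(Expr e) {
        List<String> out = new ArrayList<>();
        if (e instanceof Var) {
            out.add("thisCode.emitLoad(" + lookup(((Var) e).name) + ");");
        } else if (e instanceof Lit) {
            out.add("thisCode.emitPushInt(" + ((Lit) e).value + ");");
        } else if (e instanceof Mul) {
            Mul m = (Mul) e;
            out.addAll(assimilateExpr(m.left));
            out.addAll(assimilateExpr(m.right));
            out.add("thisCode.emitMul();");
        } else {
            throw new IllegalArgumentException("unsupported expression");
        }
        return out;
    }

    // An assignment evaluates its right-hand side and stores the result.
    List<String> assimilateStm(Assign s) {
        List<String> out = assimilateExpr(s.value);
        out.add("thisCode.emitStore(" + lookup(s.target) + ");");
        return out;
    }

    private String lookup(String name) {
        String metaName = localVar.get(name);
        if (metaName == null) throw new IllegalStateException("undeclared variable " + name);
        return metaName;
    }

    public static void main(String[] args) {
        AssimilateSketch a = new AssimilateSketch();
        a.declare("x", "var_0");
        a.declare("p", "var_1");
        // p = p * x;
        for (String call : a.assimilateStm(new Assign("p", new Mul(new Var("p"), new Var("x"))))) {
            System.out.println(call);
        }
    }
}
```

For p = p * x with x bound to var_0 and p bound to var_1, the sketch yields the same four calls that appear in the else branch of the assimilated GenPowerClass shown earlier.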
Note that in order to parse Stratego with embedded Java you need to declare a .meta file (java-bytecode-assimilation.meta), which declares the syntax to use, in this case the Java-EBlock in Stratego embedding:
Meta([Syntax("Stratego-Java-EBlock")])
# Makefile for compiling java-bytecode-assimilation.str

# declare path for pkg-config
export PKG_CONFIG_PATH := $(HOME)/.nix-profile/lib/pkgconfig:$(PKG_CONFIG_PATH)

JAVAFRONT = $(HOME)/.nix-profile
DRYAD = $(HOME)/.nix-profile
XTCFLAGS = `pkg-config --variable=strcxtcflags dryad`
STRCFLAGS = `pkg-config --variable=strcflags dryad`

# assume that syntax definition is in ../embedding
STRINCLUDES = \
	-I ../embedding \
	-I $(JAVAFRONT)/share/sdf/java-front/ \
	-I $(JAVAFRONT)/share/java-front/ \
	-I $(DRYAD)/share/dryad

STRRUNLIBS = -la stratego-lib

all: java-bytecode-assimilation

java-bytecode-assimilation : java-bytecode-assimilation.str Makefile
	strc -i $< $(STRINCLUDES) $(STRCFLAGS) $(STRRUNLIBS) $(XTCFLAGS)
References
Paper describing run-time code generation with examples
The bytecode API to be used in the assignment
Another bytecode library
If you need to know how to assimilate some Java source code to Java bytecode, compile a sample Java source file with javac and disassemble the resulting class file with javap -c -p (the -c option prints the bytecode of each method, -p includes private members). You can also use the tool class2aterm, which is provided by Dryad. There are also many resources on Java bytecode basics available on the web.
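A small class such as the following could serve as input for such an experiment. The class and method names are purely illustrative and not part of the assignment material.

```java
// Hypothetical sample input for javap; the names are illustrative only.
final class SquareSample {
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        System.out.println(square(12));
    }
}
```

After saving this as SquareSample.java and compiling it with javac, running javap -c -p SquareSample should show a body for square along the lines of iload_0, iload_0, imul, ireturn, which is the pattern that the emitLoad, emitMul and emitReturn calls need to reproduce.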
Resources
- Makefile: makefile for syntactic embedding
- maak: script to test the Java examples