Light Speed Project Report
The Light Speed Team
Strives to produce a lightning-fast ray tracer: electric signals in computer circuits travel at nearly the speed of light, and we try to take advantage of those speeds to produce a ray tracer that will strike you!
Members: Erik, Dragos, Bas, Frans-Willem
We have successfully implemented the Tier-3 features from the ICFP 2000 contest specification describing GML, the Generative Modelling Language. In this document we briefly give an overview of our code, what libraries we used, and which extra features we have added.
This document also contains a breakdown of what every team member has accomplished during the development of this project.
Libraries Used
- parsec for implementing the parser.
- cpphs for handling cpp include statements and comments in the GML files.
- cmdargs for parsing command line arguments.
- AC-PPM for writing PPM files.
- Vec and vector (Data.Vector) for graphics vector and matrix calculations, and anti-aliasing optimization respectively.
- transformers and mtl for monad transformers and different monads.
- GLUT and OpenGL for previews.
- Ranged-set for representation of intersections.
- time for timing in the GUI.
- wx (wxHaskell) for building the GUI.
- stm for making parts of the raytracer concurrent.
Extra Features
- GUI with options to move the output image, change the field of view, the tracing depth, and more; see the LightSpeedGUI page for more information
- Anti-aliasing through 2x supersampling
- Parsing of cpp-includes and comments in GML files
- Multi-threading (and overall speed) of the code
- Bounding boxes
- Preview language in GML with GLUT rendering
Overview of the Code
Main.hs Entry point for the program, handles command line arguments
+-- Evaluator.hs Implements all 9 GML evaluation rules
+-- Render.hs Forwarding render module with some data modification
+-- AbstractSyntax.hs Syntax definition for GML in data types
+-- DataTypes.hs Data type definition for the evaluation (intermediate) values
+-- DegOptTrig.hs Degree-based Optimized Trigonometry
+-- Stack.hs Stack push and pop functions
+-- EvaluatorMod.hs Modification of the evaluator for use with the GUI
+-- LightSpeedGUI.hs Main and only GUI module, contains all controls and layout code
+-- Parser.hs The parser for GML files
+-- AntiAliasing.hs Code for the extra feature: anti-aliasing
+-- Light.hs For the handling of the light sources
+-- Math.hs Various vector and matrix math functions
+-- Scene.hs Definition of scene object data and some manipulation functions
+-- ScenePreview.hs Preview option that shows the scene in GLUT
+-- Shadow.hs Shadow handling functions
+-- Trace.hs Casting the rays (concurrently) for the ray tracing
+-- BoundedBoxes.hs Functions for creating and manipulating bounding boxes
+-- Intersections.hs Calculating the intersections between the scene objects
+-- LiftTransforms.hs Scene object algebra that lifts all transformations
+-- Bindings.hs Keyboard bindings for scene navigation
+-- Display.hs Display functions for the scene preview
Division of Work
- Proof of concept of the lexer and datatype: Frans-Willem
- Parser and cpp includes: Bas
- Interpreter: Mostly Dragos, also Frans-Willem
- Scene descriptions, intersections etc.: Mostly Frans-Willem, also Dragos
- GLUT scene preview: Dragos
- Anti-aliasing: Dragos
- GUI: Erik
- Build server: Erik
- Comments: All
Choice of Lexer/Parser
As a proof of concept, the first start of a parser was a lexer written in Alex. The corresponding datatype defined by Frans-Willem proved very useful and was close to the final data type we ended up using. This enabled us to start on the interpreter a bit earlier, although testing was harder. The poor error messages and less functional nature of Alex (and Happy) urged Bas to switch to Parsec.
The final choice of parser library was parsec. The implementation of the GML grammar went quite smoothly, except for the floating point numbers with optional decimal points and exponents. The main problem was that Parsec's standard lexeme parsers consume trailing whitespace, which does not correspond to the grammar defined in the ICFP description. After defining our own set of lexeme parsers and doing some extensive left-factoring, we were left with a practically non-backtracking parser. The final step was to add some more helpful error messages to the parsers.
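To illustrate, here is a minimal sketch of such a whitespace-free, left-factored number parser (illustrative only; Parser.hs is organized differently):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A number parser that consumes no trailing whitespace, unlike the lexeme
-- parsers from Parsec's Token module. Left-factored: read the integer part
-- first, then an optional fraction and/or exponent, so no backtracking is
-- needed to decide between integers and reals.
number :: Parser Double
number = do
  sign <- option "" (string "-")
  int  <- many1 digit
  frac <- option "" ((:) <$> char '.' <*> many1 digit)
  expo <- option "" ((:) <$> oneOf "eE" <*> signedDigits)
  return (read (sign ++ int ++ frac ++ expo))
  where
    signedDigits = (++) <$> option "" (string "-") <*> many1 digit
```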
Cpp Includes and Comments
The hardest part here was the lack of documentation on the options of the cpphs library, which gave us quite some headaches. After finally defining a correct BoolOptions, the rest fell into place quite easily, also enabling us to make commenting easier in the GML files.
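For reference, a sketch of how cpphs can be driven this way (runCpphs and the BoolOptions fields are from cpphs's Language.Preprocessor.Cpphs module, but exact field names and defaults may differ between versions):

```haskell
import Language.Preprocessor.Cpphs

-- Expand #include lines and strip C-style comments from a GML file
-- before handing the text to the parser.
preprocessGml :: FilePath -> IO String
preprocessGml path = do
    src <- readFile path
    runCpphs opts path src
  where
    opts = defaultCpphsOptions
      { boolopts = defaultBoolOptions
          { macros    = False  -- no macro expansion, just includes
          , locations = False  -- keep #line markers out of the output
          , stripC89  = True   -- remove /* ... */ comments
          , stripEol  = True   -- remove // ... comments
          }
      }
```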
GLUT Scene preview
The GML evaluator features a preview command to display the scene primitives using GLUT, as wireframe objects.
Intersection and Difference won't work as intended; instead, they will be displayed as if Union were used.
Initially, we wanted some visualization of geometric primitives and rays to help us compute intersections of rays and objects, but this plan was abandoned: the scene description data structures changed, and Frans-Willem was making good progress on ray/object intersections without the help of visualization.
Then we thought it would make a nice extra feature to be able to preview all the scene objects by adding the following GML function, which opens a new window containing a 3D environment that can be navigated using the arrow keys:
obj fov wid ht *preview*
Representing Intersections
Initial attempt: list of entry/exit points
Our initial plan for representing intersections and merging them under union and intersect was a simple list of entry and exit points, with distances along the ray. However, we quickly found that in several cases this does not provide enough information.
For example, if a ray runs parallel to a plane and below it, it is inside the solid for its entire length, and intersecting it with any other intersection list A should return A. But both an always-inside ray and an always-outside ray are represented by an empty list, so the merge cannot tell the two apart.
Second attempt: state at ±infinity and a list of entry/exit points
Our first fix was to add two Booleans to the intersection list, one representing being inside at -infinity and one at +infinity. With some pretty large merging functions we got union and intersect working, and difference could be implemented by inverting the second list and intersecting instead.
The trouble was that even after removing redundant information (such as whether an intersection is an entry or an exit, or whether the ray ends inside or outside), and even after a lot of restructuring, the code looked awful and was pretty difficult to understand.
Third and final attempt: Ranged-sets library
After some thinking, Dragos suggested that we could also represent the intersections as the ranges where the ray is inside a solid. After some googling we found the "Ranged-sets" library for Haskell. Somewhat reluctantly (after all, Frans-Willem had spent quite some time on the former method), we decided to at least try a little prototype.
After about 2 minutes we had the entire thing working, including difference, and the large merge functions had been replaced by simple ranged-set operators. This convinced us to use the library instead of rolling our own solution.
At a much later stage, it turned out that the ranged-sets library was our major bottleneck, but we decided to stick with it because it would keep the code much more concise.
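The idea in miniature, assuming the Data.Ranged API of the Ranged-sets package (the interval helper is our own illustration):

```haskell
import Data.Ranged

-- The set of distances along a ray at which we are inside a solid.
type Inside = RSet Double

-- A single entry/exit pair becomes one range.
interval :: Double -> Double -> Inside
interval a b = makeRangedSet [Range (BoundaryBelow a) (BoundaryAbove b)]

-- CSG along a ray reduces to set operations on the ranges.
unionI, intersectI, differenceI :: Inside -> Inside -> Inside
unionI      = rSetUnion
intersectI  = rSetIntersection
differenceI = rSetDifference
```

For instance, differenceI (interval 1 3) (interval 2 4) immediately yields the range from 1 up to 2, exactly the bookkeeping our hand-written merge functions struggled with.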
Reference images trouble
After we got some images working, we tried to find some reference images to check whether we had done everything correctly. Some images looked "OK", but definitely weren't the same as the reference images. We had been advised that the images needn't be pixel-perfect, and we weren't completely sure whether the differences meant our ray tracer had an error, or whether it was just JPG artifacts messing with us, or the GIF color palette leading us on.
At first we decided to just move on, but some of us couldn't put the differences between the reference images and our renders behind us, and spent some time hunting down a working ICFP 2000 implementation to make sure it wasn't just the JPG or GIF artefacts. It turned out the PLClubIn entry, which won that year, was still to be found somewhere, and with some hacking would still compile with the latest OCaml.
It turned out that we did indeed have some errors in the rendering, and having found them at this stage saved us a lot of trouble later with the reflections.
The problems we found using PLClubIn as a reference renderer:
- Our sphere intersection miscalculated the normals and distances slightly. With plain single spheres it'd look fine, but with shadowing it would be slightly off, and reflections would be completely off. Switching to the algebraic method instead of the geometric method of finding intersections solved this (see the sketch after this list).
- We used the wrong matrix to convert surface normals back after transformation (normals must be transformed with the inverse transpose of the object's matrix). This gave only minor color offsets in basic pictures, but quite a few errors in (for example) the snowgoon image.
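For reference, the algebraic method in miniature: substituting the ray p(t) = o + t·d into the unit-sphere equation |p|² = 1 gives a quadratic in t whose roots are the hit distances, and the normal at a hit on the unit sphere is simply the hit point itself (the tuple-based Vec and dot below are stand-ins for our Math.hs types):

```haskell
type Vec = (Double, Double, Double)

dot :: Vec -> Vec -> Double
dot (x1,y1,z1) (x2,y2,z2) = x1*x2 + y1*y2 + z1*z2

-- Hit distances of a ray (origin o, direction d) with the unit sphere:
-- solve (d.d) t^2 + 2 (o.d) t + (o.o - 1) = 0.
sphereHits :: Vec -> Vec -> [Double]
sphereHits o d
  | disc < 0  = []                                  -- ray misses the sphere
  | otherwise = [(-b - sq) / (2*a), (-b + sq) / (2*a)]
  where
    a    = d `dot` d
    b    = 2 * (o `dot` d)
    c    = o `dot` o - 1
    disc = b*b - 4*a*c
    sq   = sqrt disc
```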
Problems in PLClubIn
We did run into some problems with PLClubIn's renderer, though. One notable error is that for the bottom faces of cylinders and cones, it uses polar coordinates like the sphere, even though the ICFP document clearly states that both u and v should be linearly linked to x and y.
Because of this, our chess.gml looks significantly different from the PLClubIn reference render. We are, however, convinced that our render is the proper one, at least on the texturing side ;).
Speed Optimizations
Most new tricks you can teach a ray tracer involve tracing more rays, thus making rendering even slower. Because of this, we felt that speed was a high priority before focusing on other extras like anti-aliasing. After all our optimizations we're nowhere near as fast as PLClubIn, nor as fast as we could be, because of some design choices (e.g. using ranged-sets instead of a custom-built representation), but compared to our very first full renderer we've achieved at least a 15-fold increase in speed using the techniques below.
One of the biggest bottlenecks in the first renderer wasn't the actual intersections, but the matrix multiplications in transforms. Each transform would do at least one matrix multiplication per ray, and that alone is good for about 16 floating point multiplications.
We figured that by pre-multiplying those matrices, we could ensure that a minimal number of matrix multiplications is performed for each ray.
For example, a single object that is transformed twice can be made to use just one transformation by multiplying those matrices.
A union that consists of two transformed objects and is itself transformed can be made to use just two transformations instead of three.
Lifting all the way is easiest
We thought about an algorithm to minimize the number of transforms, but quickly concluded that just “lifting” the transforms all the way to the unit objects was probably easiest, and would gain a significant speed increase already. After all, the only case where this would not be optimal is when a union object consisting of two non-transformed objects is transformed. Here the number of total transformations would go from just one to two. In all other cases, the total number of transformations would decrease or stay the same.
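A sketch of the idea behind LiftTransforms.hs, on a deliberately simplified scene type (the Scene constructors and matrix helpers are illustrative stand-ins, not our actual definitions):

```haskell
import Data.List (transpose)

type Matrix = [[Double]]  -- 4x4, row-major (stand-in for our Math.hs type)

identityMatrix :: Matrix
identityMatrix = [[if i == j then 1 else 0 | j <- [0..3]] | i <- [0..3]]

matMul :: Matrix -> Matrix -> Matrix
matMul a b = [[sum (zipWith (*) row col) | col <- transpose b] | row <- a]

data Scene
  = Sphere Matrix            -- unit sphere under one accumulated transform
  | Union Scene Scene
  | Transform Matrix Scene

-- Push every transform down to the leaves, composing matrices on the way,
-- so each primitive ends up carrying exactly one matrix.
liftTransforms :: Scene -> Scene
liftTransforms = go identityMatrix
  where
    go m (Transform m' s) = go (m `matMul` m') s
    go m (Union a b)      = Union (go m a) (go m b)
    go m (Sphere m')      = Sphere (m `matMul` m')
```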
Even though we minimized the number of transforms, they were still the slowest part of our renderer. We used an old ray-tracing trick here: pre-calculating a bounding box for every transformed object. By checking whether the ray crosses the bounding box before transforming the ray, we skip that transformation for the many rays that never cross the object, saving quite a bit of time.
Later we found that our difference operator was also quite slow, so we pre-calculated the bounding box for the first object and checked that before doing any further hit-testing.
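The box check itself is the classic "slab" test; a self-contained sketch (the tuple-based Vec3 is a stand-in for our Math.hs vector type):

```haskell
type Vec3 = (Double, Double, Double)

-- Ray/box test: on each axis the ray is between the two box planes for
-- some parameter interval; the box is hit iff the three intervals overlap
-- somewhere in front of the ray origin.
hitsBox :: Vec3 -> Vec3 -> Vec3 -> Vec3 -> Bool
hitsBox (ox,oy,oz) (dx,dy,dz) (lx,ly,lz) (hx,hy,hz) =
    tmax >= max tmin 0
  where
    inf = 1 / 0
    slab o d l h
      | d == 0    = if o < l || o > h then (inf, -inf) else (-inf, inf)
      | otherwise = let t1 = (l - o) / d
                        t2 = (h - o) / d
                    in (min t1 t2, max t1 t2)
    (x1, x2) = slab ox dx lx hx
    (y1, y2) = slab oy dy ly hy
    (z1, z2) = slab oz dz lz hz
    tmin = maximum [x1, y1, z1]
    tmax = minimum [x2, y2, z2]
```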
We felt that at this point we were only calculating the bare minimum, and apart from the ranged-set library, which was out of our hands, the only thing that could still be slow was the math itself.
Because we had split all math-related functions into a separate module from the very start, it was very easy to try new approaches. We tried hard-coding all calculations, including matrix multiplication, leaving out unneeded calculations altogether, and we played with unboxing and strictness. None of this yielded any significant speed increase, so we opted to stay with the vector library we'd used from the start.
When we felt we had optimized everything we could, it was time to use additional computing power. Up till now everything had been single-threaded, which on a dual-core computer effectively uses only half of the available power. Since each ray can be computed separately, ray tracing is relatively easy to multi-thread, so we figured we should do that.
First we tried the simple `par` and `pseq` operators. It was rather difficult to find places to apply them, especially since our entire renderer runs in the IO monad to allow rendering from surface functions. With `par` and `pseq` we were unable to gain any significant performance boost, and it didn't really look like the second core was being used at all.
forkIO all rays
Our second approach was a hand-made solution where all rays were placed in a Chan; several worker threads would take rays from that channel, calculate them, and write the results to an output channel. The main thread would then read the output channel and merge the results back into the original order.
In theory this should have worked. In practice, the overhead of reading from and writing to a channel was bigger than the average calculation time of a single ray: the multi-threaded approach was actually slower than the single-threaded one, even though it was using 100% CPU.
Final: Several worker threads and batches.
To fix this, we split the rays into several packages and had each worker thread take packages instead of individual rays. For example, in our current setup there are 4 worker threads and the rays are split into 32 packages.
This gave the hoped-for performance boost of roughly 40%, and this is what we stuck with.
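In outline (simplified relative to our Trace.hs; the helper names are ours for illustration), the batched scheme looks like this:

```haskell
import Control.Concurrent
import Control.Monad (forM, replicateM_)

-- Split a list into batches ("packages") of at most n elements.
inBatches :: Int -> [a] -> [[a]]
inBatches _ [] = []
inBatches n xs = let (b, rest) = splitAt n xs in b : inBatches n rest

-- Run 'trace' over all rays using the given number of worker threads.
-- Workers pull whole packages from a shared Chan; one MVar per package
-- lets the main thread reassemble results in the original order.
traceAll :: Int -> Int -> (a -> IO b) -> [a] -> IO [b]
traceAll workers batchSize trace rays = do
  jobs  <- newChan
  slots <- forM (inBatches batchSize rays) $ \batch -> do
    slot <- newEmptyMVar
    writeChan jobs (Just (batch, slot))
    return slot
  replicateM_ workers (writeChan jobs Nothing)   -- one stop marker per worker
  let loop = do
        job <- readChan jobs
        case job of
          Nothing            -> return ()
          Just (batch, slot) -> mapM trace batch >>= putMVar slot >> loop
  replicateM_ workers (forkIO loop)
  concat <$> mapM takeMVar slots                 -- blocks until all done
```

For real parallelism this must be compiled with -threaded and run with +RTS -N; with our settings, traceAll would be called with 4 workers and a batch size that yields about 32 packages.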
Anti-aliasing
There are several ways of doing anti-aliasing; for simplicity, the one we chose is 2x supersampling of the entire image.
By shifting the sample points slightly (rotating the 4 points that form a square around their center), visually better results were obtained.
The algorithm was first implemented using Prelude list functions; re-implementing it with boxed vectors (Data.Vector) then gave a significant speed gain for little work, while keeping the code readable.
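The downsampling half in miniature, assuming a row-major Data.Vector of RGB pixels (downsample2x is a hypothetical helper, not our AntiAliasing.hs code): the scene is rendered at twice the width and height, and each 2x2 block is averaged into one output pixel.

```haskell
import qualified Data.Vector as V

-- Average each 2x2 block of a (2w x 2h) supersampled image down to one
-- pixel of the final (w x h) image, stored row-major.
downsample2x :: Int -> Int -> V.Vector (Double,Double,Double)
             -> V.Vector (Double,Double,Double)
downsample2x w h big = V.generate (w*h) $ \i ->
  let (y, x)   = i `divMod` w
      px sx sy = big V.! ((2*y + sy) * (2*w) + (2*x + sx))
      avg4 (r1,g1,b1) (r2,g2,b2) (r3,g3,b3) (r4,g4,b4) =
        ((r1+r2+r3+r4)/4, (g1+g2+g3+g4)/4, (b1+b2+b3+b4)/4)
  in avg4 (px 0 0) (px 1 0) (px 0 1) (px 1 1)
```

The rotated sample points mentioned above would be applied when generating the rays for the double-size render, not in this averaging step.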
Conclusion
All in all, we can now render most reference GMLs within several minutes, most even within seconds.
Our renderer appears to be true to the specification, rendering near pixel-perfect matches to PLClubIn and other reference images, and it seems to be relatively fast.
Known issues:
- chess.gml has minor surface acne on the backs of the white horses' heads. We believe this is because the heads are an intersection between a cylinder and a cone whose bottom faces carry different pictures; since both bottoms lie at the same location, rounding errors favour either the cone or the cylinder at different pixels.
- In cones there is a border case we don't (yet) take into account: if a ray passes through the cone's apex, it could also be partly inside the cone, rather than producing just a single hit. This will likely only result in a slightly different pixel for very specific rays and will not cause any big problems.