Optimization & Vectorization

Universiteit Utrecht - Information and Computing Sciences

academic year 2017/18 – 1st period





News

Recent news

October 16:

October 13:

  • Grades for assignment 2 can be downloaded here.

October 12:

October 9:

October 5:

  • Grades for the caching assignment can be downloaded here.

October 4:

Older news is still available here.

P4 Ranking


Course Overview

Course: INFOMOV is a practical course on optimization: the art of improving software performance without affecting functionality. We apply high-level and low-level optimizations in a structured manner. For the low-level optimizations in particular, we must intimately understand the hardware platform (CPU, GPU, memory, caches) and modify our code to use it efficiently.

Vectorization: Modern processors achieve their performance levels using parallel execution. This happens on the thread level, but also on the instruction level. Being able to produce efficient vectorized code is an important factor in achieving peak performance.

GPGPU: Graphics processors employ a streaming code execution model, taking vectorization to extremes, both in the programming model and the underlying architecture. Leveraging GPU processing power is an important option when optimizing existing code.

Context: Optimization is a vital skill for game engine developers, but also applies to other fields.

Lecturer: Jacco Bikker (j.bikker@uu.nl)

Comms: join us on Slack: INFOMOV2017.


  • Mondays, 09:00h - 12:45h
    Room BBG-165
  • Wednesdays, 09:00h - 12:45h
    Room BBG-165

Lectures are given in English. Warning: a decent level of C/C++ is expected.

Files


Cross-platform version of the C/C++ template, by Kevin van Mastrigt and Mathijs Lardinoije.
SIMD tutorial document.
Gravity example, with SIMD code.
GPGPU tutorial document.
OpenCL template, to be used with assignment 3. New: now with updated project files.

Additional resources will be made available during the course.

Lecture Slides & Recommended Readings

Below is a list of all lectures with a very brief summary of the topics, slides downloads, and recommended readings to prepare for the lecture.
This list is tentative.

Lecture 01
Mon Sep 11

Topic: Introduction - This lecture serves as an introduction to the course.
And: Profiling - With or without knowledge of optimization, it proves hard to 'guess' application performance bottlenecks. Profiling is a vital first (and often repeated) step in a structured approach to optimization.

Suggested readings:

Designing for Performance, Scalability & Reliability: StarCraft II's Approach

lecture1 - introduction
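
To illustrate measuring rather than guessing, here is a minimal manual timing sketch in C++. It is not taken from the lecture or the course template: the helper `best_time_ms` and the workload `sum_of_squares` are hypothetical names invented for this example, and a real profiler will usually tell you far more than a stopwatch around one function.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>

// Hypothetical helper (not part of the course template): runs `work`
// `repeats` times and returns the fastest run in milliseconds.
// Taking the minimum reduces noise from other processes.
template <typename F>
double best_time_ms(F&& work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;
    }
    return best;
}

// Example workload to measure: sum of i*i for i in [0, n).
long long sum_of_squares(std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * static_cast<long long>(i);
    }
    return total;
}
```

A call such as `best_time_ms([&] { result = sum_of_squares(n); }, 5)` times the workload five times; storing the result in a variable that is used afterwards keeps the compiler from discarding the work.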

Lecture 02
Wed Sep 13

Topic: Low Level Optimization - In this lecture, we explore various low level factors that determine application performance.

Suggested readings:

Michael Karbo, Inside the CPU (Chapter 30 from PC Architecture)

lecture2 - low level

Lecture 03
Mon Sep 18

Topic: Caching (1) - Considering the huge latencies involved in fetching data from RAM, caches play a crucial role in 'feeding the beast'. We explore various cache architectures and investigate implications in software development.

Suggested readings:

What Every Programmer Should Know About Memory


lecture3 - caching (1)
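
One software implication of caching can be sketched with two traversal orders over the same matrix. This is an illustrative example, not material from the slides; the function names are hypothetical. Both functions return the same sum, but the first touches consecutive addresses while the second jumps `cols` elements between accesses, which tends to use each fetched cache line poorly on large matrices. How large the difference is depends on the hardware, so measure it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored row-major in a flat vector,
// visiting elements in memory order (consecutive addresses).
long long sum_row_order(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += m[r * cols + c];
        }
    }
    return total;
}

// Same result, but the inner loop strides `cols` elements per step,
// so consecutive accesses usually land in different cache lines.
long long sum_column_order(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```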

Lecture 04
Wed Sep 20

Topic: Caching (2) - Continuation of the topic of the previous lecture.

Suggested readings:

Game Programming Patterns - Data Locality


lecture4 - caching (2)

Lecture 05
Mon Sep 25

Topic: SIMD (1) - With CPU clock speeds reaching practical limits, parallelism becomes the main source of further advances in hardware performance. In this lecture, Intel's approach to SIMD programming (SSE) is introduced.


lecture5 - simd (1)
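
As a small taste of SSE programming, the sketch below adds two float arrays four lanes at a time using standard intrinsics from `<xmmintrin.h>`. It is an illustration written for this page, not code from the slides; the function name `add_sse` is hypothetical, and the code assumes an x86 or x86-64 target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Adds two float arrays element-wise. The main loop processes four
// floats per iteration; a scalar loop handles the remaining 0..3 items.
std::vector<float> add_sse(const std::vector<float>& a,
                           const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> out(n);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a.data() + i);  // unaligned load
        const __m128 vb = _mm_loadu_ps(b.data() + i);
        _mm_storeu_ps(out.data() + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];  // scalar tail
    }
    return out;
}
```

Unaligned loads and stores are used here so that the sketch works with any `std::vector<float>`; aligned data and `_mm_load_ps` are a common refinement.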

Lecture 06
Wed Sep 27

Topic: SIMD (2) - Building on the concepts of lecture 5, we investigate advanced SIMD topics such as gather / scatter and masking.

Suggested readings:

Looking for 4x speedups? SSE to the rescue! (warning: by Intel)


lecture6 - simd (2)
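
Masking replaces a per-element branch with a compare that produces a bit mask, followed by bitwise selection. The sketch below is an illustration written for this page, not code from the slides; `clamp_max_sse` is a hypothetical name, and for brevity the function assumes the array length is a multiple of four. This particular operation could also be written with `_mm_min_ps`; the point is the general select pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Per element: v[i] = (v[i] > limit) ? limit : v[i], without branching.
void clamp_max_sse(std::vector<float>& v, float limit) {
    assert(v.size() % 4 == 0);  // sketch: no scalar tail handling
    const __m128 vlimit = _mm_set1_ps(limit);
    for (std::size_t i = 0; i < v.size(); i += 4) {
        const __m128 x = _mm_loadu_ps(v.data() + i);
        // Lanes where x > limit become all-ones, the others all-zeros.
        const __m128 mask = _mm_cmpgt_ps(x, vlimit);
        // Select: take `limit` where the mask is set, keep `x` elsewhere.
        const __m128 result = _mm_or_ps(_mm_and_ps(mask, vlimit),
                                        _mm_andnot_ps(mask, x));
        _mm_storeu_ps(v.data() + i, result);
    }
}
```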

Lecture 07
Wed Oct 4

Topic: Data-Oriented Design - Where Object-Oriented Design focuses on ease of development and maintainability, Data-Oriented Design focuses on data layout that is optimal for performance. Targeting the computer rather than the developer has significant impact on software architecture, but also on performance.

Suggested readings:

Data-Oriented Design (Or Why You Might Be Shooting Yourself in the Foot With OOP)

lecture7 - data oriented
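
A common way to illustrate data-oriented layout is the switch from an array of structures (AoS) to a structure of arrays (SoA). The sketch below is an assumption-laden illustration, not code from the lecture: the particle fields and all names are hypothetical. Summing the masses in the SoA layout reads one contiguous float array, whereas the AoS loop drags the unused position and id fields through the cache as well.

```cpp
#include <cassert>
#include <vector>

// Array-of-structures layout: all fields of one particle sit together.
struct ParticleAoS {
    float x, y, z;
    float mass;
    int id;
};

// Structure-of-arrays layout: each field is stored contiguously.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
    std::vector<int> id;
};

// Converts the AoS representation into the SoA representation.
ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa;
    for (const ParticleAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
        soa.id.push_back(p.id);
    }
    return soa;
}

// Reads every particle, although only `mass` is needed.
float total_mass_aos(const std::vector<ParticleAoS>& ps) {
    float total = 0.0f;
    for (const ParticleAoS& p : ps) total += p.mass;
    return total;
}

// Reads only the contiguous `mass` array.
float total_mass_soa(const ParticlesSoA& ps) {
    assert(ps.mass.size() == ps.x.size());  // layout invariant
    float total = 0.0f;
    for (float m : ps.mass) total += m;
    return total;
}
```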

Lecture 08
Mon Oct 9 

Topic: GPGPU (1) - For certain problems, a streaming processor is a good (and powerful) alternative to the CPU. In this lecture, we briefly explore GPU architecture and the concept of GPGPU.


lecture8 - gpgpu (1)

Lecture 09
Mon Oct 16

Topic: GPGPU (2) - Building on the previous lecture, we investigate the implementation of a Verlet fluid simulator on the GPU.


Verlet physics - stages

lecture9 - gpgpu (2)
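
For readers unfamiliar with Verlet integration, the following CPU sketch shows the basic position update, x_new = 2x - x_prev + a * dt^2, for a set of 2D particles under a constant acceleration. It is not the simulator from the lecture and contains no fluid or constraint handling; the struct, the function name, and the parameters are hypothetical. Each particle is updated independently of the others, which is what makes this step a natural fit for a GPU.

```cpp
#include <cassert>
#include <vector>

// Position Verlet keeps the previous position instead of a velocity.
struct Particle {
    float x, y;            // current position
    float prev_x, prev_y;  // position one time step ago
};

// Advances all particles by one step of size dt under a constant
// acceleration (ax, ay): x_new = 2x - x_prev + a * dt^2.
void verlet_step(std::vector<Particle>& ps, float ax, float ay, float dt) {
    assert(dt > 0.0f);
    const float dt2 = dt * dt;
    for (Particle& p : ps) {
        const float new_x = 2.0f * p.x - p.prev_x + ax * dt2;
        const float new_y = 2.0f * p.y - p.prev_y + ay * dt2;
        p.prev_x = p.x;
        p.prev_y = p.y;
        p.x = new_x;
        p.y = new_y;
    }
}
```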

Lecture 10
Wed Oct 18

Topic: GPGPU (3) - GPGPU-specific algorithms for common problems.
And: Optimizing for GPU - Like CPU software, GPGPU code benefits from hardware-specific optimizations. Several examples for AMD and NVIDIA are explored.

Suggested readings:

A Survey of General-Purpose Computation on Graphics Hardware


will be made available after the lecture

Lecture 11
Mon Oct 23

Topic: Fixed Point Math - Floating point calculations can often be done with integer arithmetic, and there are good reasons for doing so. In this lecture, the 'lost art' of fixed point arithmetic is introduced.

Suggested readings:

The Neglected Art of Fixed Point Arithmetic


will be made available after the lecture
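
To give an idea of what fixed point arithmetic looks like in practice, here is a minimal 16.16 sketch: a 32-bit integer whose lower 16 bits hold the fraction. The format choice and all names are assumptions made for this example, not the lecture's material, and overflow handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// 16.16 fixed point: value = raw / 65536.
using fixed = std::int32_t;
constexpr int kFracBits = 16;
constexpr std::int64_t kOne = std::int64_t{1} << kFracBits;

// Converts a double to 16.16 (truncates toward zero).
constexpr fixed to_fixed(double v) {
    return static_cast<fixed>(v * static_cast<double>(kOne));
}

// Converts 16.16 back to a double.
constexpr double to_double(fixed v) {
    return static_cast<double>(v) / static_cast<double>(kOne);
}

// Multiplies in 64 bits, then drops the extra 16 fraction bits.
// The right shift of a negative value is arithmetic on mainstream
// compilers and guaranteed to be so from C++20 onward.
inline fixed fixed_mul(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Pre-scales the dividend so the quotient keeps 16 fraction bits.
inline fixed fixed_div(fixed a, fixed b) {
    assert(b != 0);
    return static_cast<fixed>((static_cast<std::int64_t>(a) * kOne) / b);
}
```

Addition and subtraction need no helpers: two values in the same format can be added or subtracted as plain integers.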

Lecture 12
Mon Oct 30

Topic: Snippets - Various examples of optimizations that worked and didn't work.


will be made available after the lecture

Lecture 13
Mon Oct 30

To be determined.

Lecture 14
Wed Nov 1

Topic: Process & Grand Recap - Final lecture, recap of the structured optimization process, recap of various concepts, exam preparation.

will be made available after the lecture


Course Schedule

Period 1 Schedule (tentative)

| Date | Lecture / Exams | Practicum | Deadlines |
|---|---|---|---|
| Mon Sep 11 | Lecture 1: Introduction & Profiling | First practicum: profiling tutorial. Document, project files. | |
| Wed Sep 13 | Lecture 2: Low-level optimization | Project files for glass example. | |
| Mon Sep 18 | Lecture 3: Caching (1) | P1 will be introduced. | |
| Wed Sep 20 | Lecture 4: Caching (2) | | |
| Mon Sep 25 | Lecture 5: SIMD (1) | | |
| Wed Sep 27 | Lecture 6: SIMD (2) | P2 will be introduced. | |
| Mon Oct 2 | No lecture (extended LAB) | | Tue Oct 3: Deadline Assignment 1. Click here for P1 details. Wed Oct 4, 23:59: Extended Deadline (-1 pt) |
| Wed Oct 4 | Lecture 7: Data-Centric Design | | |
| Mon Oct 9 | Lecture 8: | | |
| Wed Oct 11 | No lecture (extended LAB) | | Thu Oct 12: Deadline Assignment 2. Click here for P2 details. Fri Oct 13, 23:59: Extended Deadline (-1 pt) |
| Mon Oct 16 | Lecture 9: | P3 will be introduced. | |
| Wed Oct 18 | Lecture 10: | | |
| Mon Oct 23 | Lecture 11: Fixed Point Math | | |
| Wed Oct 25 | Lecture 12: No lecture (extended LAB) | | Thu Oct 26: Deadline Assignment 3. Click here for P3 details. Fri Oct 27, 23:59: Extended Deadline (-1 pt) |
| Mon Oct 30 | Lecture 13: | | Tue Nov 2, 23:59: Deadline Final Assignment |
| Wed Nov 1 | Lecture 14: Process & Grand Recap | P4 final results / showcase | |
| Tue Nov 7, 17:00 | Final Exam in BEATRIX 7th floor | | |


Assignment P1 - Caching

"We've Seen this Before" - Cache simulator.
Details can be found in the P1 assignment description.
The project can be downloaded here. EDIT: a test application that doesn't crash can be downloaded here. EDIT2: fixed version of original application.
Deadline: Tuesday October 3rd, 23:59. Extended deadline (1pt penalty): Wednesday October 4th, 23:59.

Assignment P2 - SIMD

"Use your Brain" - Neural Network.
Details can be found in the P2 assignment description (v2).
The project can be downloaded here.
Deadline: Thursday October 12th, 23:59. Extended deadline (1pt penalty): Friday October 13th, 23:59.

Assignment P3 - GPGPU

"It's Alive" - Conway's Game of Life.
Details can be found in the P3 assignment description.
The project can be downloaded here. The OpenCL template is available from the files section.
Deadline: Thursday October 26th, 23:59. Extended deadline (1pt penalty): Friday October 27th, 23:59.


Assignment P4 - Tanks

"This Is Not a Drill" - Final assignment. Note: assignment P4 is available at the start of the course and is designed to overlap the other three assignments.
Details can be found in the P4 assignment description.
The project can be downloaded here.
Deadline: Tuesday November 2nd, 23:59. Note: no extended deadline.

Alternative Assignment P4

Instead of working on the standard P4 assignment, you may propose your own project. This is intended for students who want to use the INFOMOV course to optimize a specific application. Please contact me for details before proceeding.

Exam & Grading


Programming assignments: Your practical grade P is based on three programming assignments P1, P2, P3 (20% each) and one final assignment P4 (40%).

Exam: Your exam / theory grade T is based on a single final exam.

Final grade: Your final grade is (3P + T) / 4. You must score at least 4.0 (before rounding) on the exam to pass this course.
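
As a worked illustration of the weights and formula above, assume the hypothetical grades P1 = 7, P2 = 8, P3 = 6, P4 = 8 and T = 6 (numbers invented for this example only):

```latex
\begin{align*}
P &= 0.2\,(P_1 + P_2 + P_3) + 0.4\,P_4
   = 0.2 \cdot 21 + 0.4 \cdot 8 = 7.4 \\
\text{final grade} &= \frac{3P + T}{4}
   = \frac{3 \cdot 7.4 + 6}{4} = \frac{28.2}{4} = 7.05
\end{align*}
```

In this example the exam grade T = 6 is above the 4.0 minimum, so the exam requirement is met.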


Retake: To qualify for a retake, the final grade must be at least 4 (before rounding). You may repair your final grade by redoing one of the four assignments, or the exam. Exact terms will be discussed individually.


Literature & Links

Overview of literature used during this course:

Previous editions


News Archive

Old posts

October 2:

September 27:

September 25:

September 20:

September 14:

September 14:

September 13:

September 11:

  • Lecture 1 slides now available.
  • Cross-platform version of the template can be downloaded from the files section.

September 8:

August 28:

  • Tweaked grading parameters, reordered lectures, P4 now overlaps P1..P3.
  • Added a Slack team: INFOMOV2017.

August 1:

  • 2017/2018 site online.