Optimization & Vectorization

Universiteit Utrecht - Information and Computing Sciences

academic year 2016/17 – 1st period

title image title image title image

Navigation

News

Lectures & Slides

Examination & Grading

Course Overview

Schedule

Literature & Links

Newsback to navigation

Recent news

November 12:

  • Updated gradelist (with late submissions). Please check carefully.
  • Please remember to leave feedback on Caracal!

November 11:

  • Grades for P3 (including final grades) can now be downloaded.
  • Click here for review notes on P3.

November 9:


November 3:

  • Slides for lecture 13 and 14 are now available.

November 1:

  • FYI: the exam will be held in EDUC-ALFA on Tuesday, November 8, 17.00.


Click here for older news.

 

Course Overview back to navigation

bunny logo image

Course: INFOMOV is a practical course on optimization: the art of improving software performance, without affecting functionality. We apply high level and low level optimizations, in a structured manner. Especially for the low level optimizations, we must intimately understand the hardware platform (CPU, GPU, memory, caches) and modify our code to use it efficiently.

Vectorization: Modern processors achieve their performance levels using parallel execution. This happens on the thread level, but also on the instruction level. Being able to produce efficient vectorized code is an important factor in achieving peak performance.

GPGPU: Graphics processors employ a streaming code execution model, taking vectorization to extremes, both in the programming model and the underlying architecture. Leveraging GPU processing power is an important option when optimizing existing code.

Context: Optimization is a vital skill for game engine developers, but also applies to other fields.

Lecturer: Jacco Bikker (j.bikker@uu.nl)

Venue:

  • Mondays, 09:00h - 10:45h
    Room BBG-161
  • Mondays, 11:00h - 12:45h (LAB),
    Room BBG-169
  • Wednesdays, 09:00h - 10:45h
    Room BBG-023  (exception: week 41: BBG-209)
  • Wednesdays, 11:00h - 12:45h (LAB),
    Room BBG-223

The lecture will be given in English. Warning: A decent level of C/C++ is expected.


Files back to navigation

Downloads

Will be made available during the course.

Lecture Slides & Recommended Readingsback to navigation

Below is a list of all lectures with a very brief summary of the topics, slides downloads, and recommended readings to prepare for the lecture.

Lecture 01
Mon Sep 12

Topic: Introduction This lecture serves as an introduction to the course. And: Profiling With or without knowledge of optimization, it proves hard to 'guess' application performance bottlenecks. Profiling is a vital first (and often repeated) step in a structured approach to optimization.

Suggested readings:

Designing for Performance, Scalability & Reliability: StarCraft II's Approach

Slides:
lecture1 - introduction.pdf

 
Lecture 02
Wed Sep 14

Topic: Low Level Optimization In this lecture, we explore various low level factors that determine application performance.

Suggested readings:

Michael Karbo, Inside the CPU (Chapter 30 from PC Architecture)

Files:
Slides:
lecture2 - low-level.pdf

 
Lecture 03
Mon Sep 19

Topic: Caching (1) Considering the huge latencies involved in fetching data from RAM, caches play a crucial role in 'feeding the beast'. We explore various cache architectures and investigate implications in software development.

Suggested readings:

What Every Programmer Should Know About Memory

Slides:

lecture3 - caching (1).pdf


 
Lecture 04
Wed Sep 21

Topic: Low Level Optimization (2) As a follow-up on lecture 2, we will discuss the various practical low level optimizations that can be applied to the provided code in a hands-on session. We will go as deep as time permits.

Slides:
lecture4 - low level (2).pdf



 
Lecture 05
Mon Sep 26

Topic: Caching (2) Continuation of the topic of the previous lecture.

Suggested readings:

Game Programming Patterns - Data Locality

Files:

Slides:

lecture5 - caching (2).pdf


Lecture 06
Wed Sep 28

Topic: SIMD (1) With CPU clock speeds reaching practical limits, parallelism becomes the main source of further advances in hardware performance. In this lecture, Intel's approach to SIMD programming (SSE) is introduced.

Files:

Slides:

lecture6 - simd (1).pdf

 
Lecture 07
Wed Oct 5

Topic: SIMD (2) Building on the concepts of lecture 6, we investigate advanced SIMD topics such as gather / scatter and masking.

Suggested readings:

Looking for 4x speedups? SSE to the rescue! (warning: by Intel)

Slides:

lecture7 - simd (2).pdf


Lecture 08
Mon Oct 10 

Topic: Data-Oriented Design Where Object-Oriented Design focuses on ease of development and maintainability, Data-Oriented Design focuses on data layout that is optimal for performance. Targeting the computer rather than the developer has significant impact on software architecture, but also on performance.

Suggested readings:

Data-Oriented Design (Or Why You Might Be Shooting Yourself in the Foot With OOP)

Slides:
lecture8 - data oriented.pdf




 Lecture 09
Wed Oct 12

Topic: Fixed Point Math Floating point calculations can often be done with integer arithmetic, and there are good reasons for doing so. In this lecture, the 'lost art' of fixed point arithmetic is introduced.

Suggested readings:

The Neglected Art of Fixed Point Arithmetic

Slides:

lecture9 - fixed point.pdf

 
Lecture 10
Wed Oct 19

Topic: GPGPU (1) For certain problems, a streaming processor is a good (and powerful) alternative to the CPU. In this lecture, we briefly explore GPU architecture and the concept of GPGPU.

Files:

Slides:

lecture10 - gpgpu (1).pdf



Lecture 11
Mon Oct 24

Topic: GPGPU (2) Building on the previous lecture, we investigate the implementation of a Verlet fluid simulator on the GPU.

Slides:
lecture11 - gpgpu (2).pdf



 
Lecture 12
Wed Oct 26

Topic: GPGPU (3) - GPGPU-specific algorithms for common problems.
And: Optimizing for GPU Like CPU software, GPGPU code benefits from hardware-specific optimizations. Several examples for AMD and NVidia are explored.

Suggested readings:

A Survey of General-Purpose Computation on Graphics Hardware

Slides:

lecture12 - gpgpu (3).pdf

 
Lecture 13
Mon Oct 31

Topic: Applied Optimization - During this lecture, we will explore various high level and low level optimizations in small-scale applications. This is a code-intensive lecture.

Files:

Slides:

lecture13 - practical examples.pdf


 
Lecture 14
Wed Nov 2

Topic: Process & Grand Recap Final lecture, recap of the structured optimization process, recap of various concepts, exam preparation.

Slides:
lecture14 - process & recap.pdf


 
 

Course Schedule back to navigation

Period 1 Schedule

Week Date Lecture / Exams Practicum Deadlines
Literature / practice
37
Mon Sep 12
09:30-11:15
Lecture 1:
Introduction & Profiling
  Please watch: "Designing for Performance, Scalability & Reliability: StarCraft II's Approach". Have a look at the run-time profiling mechanism in water.zip. Consider how this could be efficiently extended to a generic run-time profiling system, and how this can be extended beyond measuring just time.
Wed Sep 14
11:00-12:45
Lecture 2:
Low-level optimization

Optimize the provided glass ball rendering code for single-core CPU. Bring your results to lecture 3 on Wednesday September 19.
38
Mon Sep 19
09:00-10:45  
Lecture 3:
Caching (1)
   
Wed Sep 21
09:00-10:45
Lecture 4:
Low level optimization (2)

 
39
Mon Sep 26
09:00-10:45
Lecture 5:
Caching (2)
   
Wed Sep 28
09:00-10:45
Lecture 6:
SIMD (1)

 
40
Mon Oct 3
09:00-10:45
No lecture (extended LAB)
Tue Oct 4: Deadline Assignment 1
Click here for P1 details

Wed Oct 5
09:00-10:45
Lecture 7:
SIMD (2)

 
41
Mon Oct 10
09:00-10:45
Lecture 8:
Data-Centric Design
   
Wed Oct 12
09:00-10:45
Lecture 9:
Fixed Point Math

 
42
Mon Oct 17
09:00-10:45
No lecture (extended LAB)
Tue Oct 18: Deadline Assignment 2
Click here for P2 details.
 
Wed Oct 19
09:00-10:45
Lecture 10:
GPGPU (1)

 
43
Mon Oct 24
09:00-10:45
Lecture 11:
GPGPU (2)
   
Wed Oct 26
09:00-10:45
Lecture 12:
GPGPU (3)

 
44
Mon Oct 31
09:00-10:45
Lecture 13:
Applied Optimization
   
Wed Nov 2
09:00-10:45
Lecture 14:
Process & Grand Recap

 

45


Tue Nov 8: Final Exam -
EDUC-ALFA, 17.00
Thu Nov 10, 23:59:
Deadline Final Assignment
Fri Nov 11, 23:59:
Extended Deadline Final Assignment (-1 pt)

 



Assignment P1

Develop a cache simulator.

Requirements (for a 6):

  • implement a correct N-way set associative cache within the supplied demo application;
  • implement a reasonable eviction policy. Note: this requires some research.

Optional (towards a perfect 10):

  • cover the full L1-L2-L3 cache hierarchy;
  • experiment with various eviction policies;
  • do a real-time visualization of cache efficiency;
  • implement read/write of 16 and 32-bit data types.

See slides for lecture 3 for more details.
Downloads:  fractal.zip,   OS X / Linux:  fractal_gen.zip

Deadline: Tuesday, October 4, 23.59.
Late delivery: Wednesday, October 5, 23.59 (1pt penalty).

Assignment P2

Apply the structured optimization process to a small game.

For a passing grade:

  • Execute profiling-guided optimizations in a logical order.
  • Mind the strict time budget; allocate effort wisely.
  • Provide proof of your deliberate approach.

Scores above the passing grade are awarded for:

  • high level optimizations beyond the obvious ones;
  • highly efficient code;
  • 'deep' optimizations, e.g. those taking into account CPU and cache hardware features.

Note that multi-threading and GPGPU will not significantly increase your grade: these is a labor-intensive optimizations which are considered 'obvious' and 'orthagonal'. Please focus on skills that fit the scope of the course instead.

Download: rts.zip

Deadline: Tuesday, October 18, 23.59.
Late delivery: Wednesday, October 19, 23.59 (1pt penalty).
 

Assignment P3

"This Is Not a Drill" - Optimize an application of choice using the structured optimization process. You can use any existing application for this. The ideal optimization target has a substantial size, is written in C++, and has optimization potential, preferably in all areas: high level, low level, SIMD, GPGPU. Projects in other programming languages are allowed; be aware of limits in terms of low level access in that case. Projects that force you to focus on one or a few areas are also allowed; be aware that either you need to go very deep, or accept a lower score. Contact me when in doubt.

Six projects have been prepared for you. These projects compile in Visual Studio, and represent varying  ambition levels.
  • goliath : procedural world generation. Details pop in with a delay; get rid of the delay. Challenge: setting up objective timing. Medium to high difficulty.
  • manifolds : implementation of a research paper. Small scale, but with lots of opportunities for GPGPU, low-level and high-level optimization. Low to medium difficulty.
  • pseuthe : game. Increase the number of entities to increase the workload. Medium difficulty.
  • sumatra : full-featured PDF viewer. Optimize for complex PDFs. Difficult.
  • windirstat : improve the time it takes to visualize folder contents. Medium difficulty.
  • yafe : fractal viewer. Port this to the GPU. Small scale application, low difficulty.

Improved project files:

Also check out the list of open source game clones.

Deliver the optimized project, with a full report, on or before the deadline.

Deadline: Thursday, November 10, 23.59.
Late delivery: Friday, November 11, 23.59 (1pt penalty). 

Exam & Grading back to navigation

GRADING

Programming assignments: Your practical grade P is based on two programming assignments (25% each) and a final assignment (50%).

Exam: Your exam / theory grade T is based on a single final exam.

Final grade: Your final grade is (2P + T) / 3.

RETAKES AND REQUIREMENTS

Retake: To qualify for a retake, the final grade must be at least 4 (before rounding). You may repair your final grade by redoing one of the three assignments, or the exam. Exact terms will be discussed individually.

 

Literature & Links back to navigation

Overview of literature used during this course:


Previous editions

 

News Archive back to navigation

Old posts


October 31:

  • Slides for GPGPU(3) now available.
  • Merged manifold / OpenCL template project, and notes on the process.
  • Optimized DotCloud files.
  • No slides yet; due to power outage we'll continue on Wednesday.

October 28:


October 25:


October 24:


October 19:

  • Some projects for P3 have been uploaded.
  • Added CUDA template code for lecture 10.
  • Slides for lecture 10 on GPGPU(1) now available.

October 13:


October 10:

  • Slides for the lecture on DOD are now available.

October 6:

  • Grades for the first assignment can be downloaded here.

October 5:

  • Slides for the second lecture on vectorization can now be downloaded.
  • Assignment 2 details can be found in the relevant section.

September 29:

  • Slides for the first lecture on vectorization can now be downloaded.
  • Note: next Monday is a lab-only session because of the October 4th deadline.

September 26:

  • Slides for Caching 2 are online.
  • Updated game.cpp for the cache assignment, so it skips the cache for visualization.

September 21:

  • Slides for the optimization session (lecture 4) are online.

September 20:


September 19:


September 15:


September 14:


September 12:

  • Slides for lecture 1 are now online.


September 7:

  • Schedule update: first lecture starts at 9.30 on September 12 in BBG-161.


August 16:

  • Website for 2016/2017 uploaded.