Apr, 2018: Zukunftstag 2018. Two pupils of the Gymnasium Horn came to visit the University of Bremen and created their own website with wordpress.

Apr, 2018: We got invited by KUKA to showcase our demo at the European Robotics Forum 2018 in Tampere, Finland

Feb, 2018: The House of Science, Bremen hosts an exhibition about local scientists and science projects with collaborators around the world. One of the featured exhibits is a demo of our Autonomous Surgical Lamps, developed by Jörn Teuber of the Computer Graphics and Virtual Reality group. The exhibition will be open until the 21st of April (photos).

Feb, 2018: The University of Bremen participates in the opening of a research laboratory in Bangkok.

Nov, 2017: 2017 VRST Best Poster Award Winner. Michael Bonfert, Melina Cahnbley, Inga Lehne, Ralf Morawe, Gabriel Zachmann and Johannes Schöning are winning the award for the poster titled "Augmented Invaders: A Mixed Reality Multiplayer Outdoor Game."

Nov, 2017: Organizers of the French VR conference and trade show Laval Virtual immersed themselves into a variety of different virtual environments where they learned about current projects of the Computer Graphics & Virtual Reality lab at the University of Bremen (full report, in German).

Sep, 2017: Founding Everyday Activity Science and Engineering (EASE). EASE is a interdisciplinary research center at the University of Bremen that investigates everyday activity science & engineering. For more Information click here.

Jun 17, 2017: Haptic and hand tracking demos at the Open Campus 2017.

Feb-Apr 2017: David Vilela (Mechanical Engineering Laboratory, University of Coruna, Spain) visited our lab. He is working on benchmarks to compare different intersection calculation methods in collisions, and also different force models.

Feb 2017: G. Zachmann and J. Teuber visited the Mahidol University in Bangkok, Thailand as part of a delegation from the University of Bremen. The goal of the visit was to foster the cooperation between the two universities and lay ground-work for future colaborations.

Jun 2016: Radio Bremen visited our lab to film the works of the Creative Unit "Intra-Operative Information" for a news magazine on the local TV station. Click here for the film at Radio Bremen. And Click here for the same film on our Website.

May 16, 2016: Patrick Lange was honored with the SIGSIM Best PhD Award at the ACM SIGSIM PADS Conference 2016.

Jun 19-21, 2015: G. Zachmann gives invited talk at the DAAD-Stipendiatentreffen in Bremen, Germany.

Jun 2015: Haptic and hand tracking demos at the Open Campus 2015.

Dec 08-10, 2014: ICAT-EGVE 2014 and EuroVR 2014 conferences at the University of Bremen organized by G. Zachmann.

Sep 25-26, 2014: GI VR/AR 2014 conference at the University of Bremen organized by G. Zachmann.

Sep 24-25, 2014: VRIPHYS 2014 conference at the University of Bremen organized by G. Zachmann .

Feb 4, 2014: G. Zachmann gives invited talk on Interaction Metaphors for Collaborative 3D Environments at Learntec.

Jan 2014: G. Zachmann got invited to be a Member of the Review Panel in the Human Brain Project for the Competitive Call for additional project partners

Nov 2013: Invited Talk at the "Cheffrühstück 2013"

Oct 2013: PhD thesis of Rene Weller published in the Springer Series on Touch and Haptic Systems.

Jun 2013: G. Zachmann participated in the Dagstuhl Seminar Virtual Realities (13241)

Jun 2013: Haptic and hand tracking demos at the Open Campus 2013.

Jun 2013: Invited talk at Symposium für Virtualität und Interaktion 2013 in Heidelberg by Rene Weller.

Apr 2013: Rene Weller was honored with the EuroHaptics Ph.D Award at the IEEE World Haptics Conference 2013.

Jan 2013: Talk at the graduation ceremony of the University of Bremen by Rene Weller.

Oct 2012: Invited Talk by G. Zachmann at the DLR VROOS Workshop Servicing im Weltraum -- Interaktive VR-Technologien zum On-Orbit Servicing in Oberpfaffenhofen, Munich, Germany.

Oct 2012: Daniel Mohr earned his doctorate in the field of vision-based pose estimation.

Sept 2012: G. Zachmann: Keynote Talk at ICEC 2012, 11th International Conference on Entertainment Computing.

Sep 2012: "Best Paper Award" at GI VR/AR Workshop in Düsseldorf.

Sep 2012: Rene Weller earned his doctorate in the field of collision detection.

Aug 2012: GI-VRAR-Calendar 2013 is available!

Massively Parallel Algorithms - SS 2018


About this Course

There are big changes afoot. The era of increased performance from faster single cores and optimized single core programs has ended. Instead, highly parallel GPU cores, initially developed for shading, can now run hundreds or thousands of threads in parallel. Consequently, they are increasingly being adopted to offload and augment conventional (albeit multi-core) CPUs. And the technology is getting better, faster, and cheaper. It will probably even become a general computing processor on mobile devices, because it offers more processing power per energy amount.

The high number of parallel cores, however, poses a great challenge for software and algorithm design that must expose massive parallelism to benefit from the new hardware architecture. The main purpose of the lecture is to teach practical algorithm design for such parallel hardware.

Simulation is widely regarded as the third pillar of science (in addition to experimentation and theory). Simulation has an ever increasing demand for high-performance computing. The latter has received a boost with the advent of GPUs, and it is even becoming -- to some extent -- a commodity.

There are many scientific areas where the knowledge you will gain in this course can be very valuable and useful to you:

In this course, you will get hands-on experience in developing software for massively parallel computing architectures. For the first half of the lecture, there will be supervised exercises to familiarize yourself with the CUDA parallel programming model and environment. The exercises will comprise, for instance, image processing algorithms, such as you might find in Photoshop. In the second half, you can decide whether to continue with exercises, or whether to work on a small self-assignment for the rest of the cource.

Prerequisites are:

  1. A little bit of experience with C/C++ ; note that we will need just plain old C during this course, but the concept of pointers should be familiar.
  2. Liking for algorithmic thinking.

Useful, but not required, is a computer/notebook with an Nvidia GPU that is CUDA capable. You can find a list of all supported GPU's here. If you don't have access to such a computer, you are welcome to work on your assignments in our lab.

Not required are

Some of the envisioned topics (these can change during the semester):

  1. Simple parallel program models and laws
  2. Introduction to CUDA and massively parallel programming
  3. Dense matrix algorithms
  4. Parallel prefix-sum
  5. Parallel sorting
  6. Dynamic Parallelism
  7. Introduction to Thrust
  8. Parallel sphere packing
  9. Parallel Hashing
All topics are accompanied and illustrated by lots of application examples.


The following table contains the topics in more detail, the accompanying slides, and the assignments (it will be filled step-by-step after the respective lectures).

Week Topics Slides Assignments Frameworks
1. Introduction: More Moore, stream programming model, Flynn's taxonomy, overall speedup, Amdahl's law, Gustafson's law.
Fundamental Concepts of Parallel Programming 1: terminology, control flow, transfering data,
2. Fundamental Concepts of Parallel Programming 2: blocks, data flow in general multi-dimensional blocks and grids, classes in CUDA, constant memory, simplistic raytracer, warps, thread divergence, more details on the GPU architecture, warps, measuring performance, GPU/CPU synchronization, shared memory, parallel computation of the dot product PDF Assignment 1 Examples cmake
3. Fundamental Concepts of Parallel Programming 3: barrier synchronization, race conditions, document similarity, utilizing lockstepped threads instad of explicit synchronization, heat transfer simulation, double buffering pattern, texture memory, edge detection (Sobel operator), GPU's memory hierarchy, parallel histogram computation, atomic operations, pipelining host-GPU calls PDF
4. Dense matrix algorithms: matrix vector product, column major storage, AoS versus SoA, coalesced memory access, auto-tuning, arithmetic intensity, matrix-matrix multiplication, blocked matrix multiplication, All Pairs Shortest Path as matrix multiplication, PDF Assignment 2
5. Parallel prefix-sum 1: definition, simple examples, Hillis-Steele algorithm, depth & work complexity, Blelloch's algorithm, optimization, Brent's theorem & application to prefix-sum, theoretical speedup based on Brent, introduction to sequential radix sort, radix sort on the GPU, split operation, stream compaction, compressed sparse row (CSR) storage, sparse matrix-vector multiplication PDF
6. Parallel prefix-sum 2: summed area tables, increased percision for SATs, complexity of computing SAT's, depth-of-field rendering (gathering & scattering), anisotropic texture filtering, face detection with Viola-Jones. PDF Assignment 3 MatrixMul
7. Parallel sorting: comparator networks, the 0-1 principle, odd-even mergesort, bitonic sorting, work and depth complexities, (digression: adaptive bitonic sorting) PDF
8. Applications of parallel sorting: BVH construction (space filling curves, Morton codes); deformable collision detection by sweep plane approach, PCA transformation, and clustering; faster ray-tracing by coherent ray packets. PDF Assignment 4 LineOfSight
9. Dynamic Parallelism: general usage, execution model, simple example, caveats, Mariani-Silver algorithm for the Mandelbrot set, the "W" pattern for multigrids.
Random Forests 1: classification problem, simple solutions, concept of decision trees, information, entropy, information gain,
10. Random Forests 2: conditional entropy, construction of decision trees, problems of decision trees, wisdom of crowds, random forests, bootstrapping/subsampling, randomization during construction, variants of RF's, out-of-bag error estimation, parallel construction of RFs, handwritten digit recognition, body tracking using depth images. PDF
11. Introduction to thrust.
Lab meeting
PDF (NVidia) Assignment 5 Raytracer
12. Fine-grained dynamic load balancing, example application: Reyes rendering pipeline.
Parallel sphere packings: collision detection basics, inner sphere trees, bacht neural gas, parallel bounding volume hierarchy construction,
13. Parallel Hashing: application intersection of point clouds, application geometric hashing, cuckoo hashing.
Lab meeting

Some very simple examples that can serve as a starting point. They contain a makefile and should compile out-of-the-box at least under Max OS X and Linux.


The following textbooks can help review the material covered in class:

Please note that the course is not based on one single textbook! Some topics might even not be covered in any current textbook! So, I suggest you first look at the books in the library before purchasing a copy.

If you plan on buying one of these books, you might want to consider buying a used copy -- they can often be purchased for a fraction of the price of a new one. Two good internet used book shops are Abebooks and BookButler.

Grades and Points achieved by the Assignments

For taking part in a so-called "Fachgespräch" (mini oral exam), you need a grade from the assignments >= 4.0 . You can get this by successfully completing at least 40% of all asignments.

Additional Literature

Other Interesting Bits and Pieces

Gabriel Zachmann
Last modified: Wed Jul 04 15:25:42 CEST 2018