Sunday, February 17, 2008

Creating OpenGL Objects in a Second Thread - Mac, Linux, Windows

This blog entry explains how to set up OpenGL to create new OpenGL "objects" (vertex buffer objects, display lists, and textures) from a second thread. The benefits of this technique are:
  • It moves CPU-intensive driver operations to a second core even if the driver is not multi-threaded internally*. CPU-based on-the-fly texture compression would be an example where you can get a win.
  • It allows you to move the entire process of building up and loading a scene-graph to a second thread, regardless of the use of OpenGL loading commands (like building of VBOs).
  • Because the main thread does none of the loading work, the main rendering loop can continue to run at full speed (assuming the user has a dual-core machine).
The basic strategy is to build an OpenGL context for each worker thread that will do loading such that the contexts share objects (display lists, etc.) with the main rendering thread's OpenGL context.

(This information comes from in-field tests; if you work on the GL implementation and spot something wrong, or if you find in deployment that this doesn't work, please let me know. Also a warning: the latest version of this code hasn't been in the field very long, and in the past edge cases have appeared that caused us to redesign threaded loading more times than I'd like to admit.)

A warning on thread safety: OpenGL's asynchronous nature (sequential deferred execution within a context) leads to some tricky threading bugs that you wouldn't normally see - see this blog post for examples of what can go wrong.

Terminology

While the structures differ between AGL/CGL (Macintosh), WGL (Windows), and GLX (Linux/X11), there are some common features in the way the window manager and the GL interact:
  • "Pixel Format." All three operating systems define a pixel format as an opaque description of a series of properties for a rendering context. In my tests, I use the same pixel format for all contexts.
  • "Drawable." Basically any window, full-screen destination, or off-screen hardware-accelerated buffer is a "drawable"; it defines where the results of GL commands issued to a context end up.

Macintosh

The Macintosh is probably the easiest OS to set this up on, because you don't need to use a drawable for the loader contexts.
  1. Create new contexts for each worker thread, using the main rendering context as a source for shared objects.
  2. At the start of the worker thread, set the worker thread's context to be current.
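
As a rough illustration, the two steps might look like this with CGL. This is a minimal sketch, not the code from my app: the function names `create_worker_context` and `worker_thread_begin` are made up for this example, and it assumes the main context already exists and that reusing its pixel format (via `CGLGetPixelFormat`) is acceptable. The `__APPLE__` guard is only there so the file still compiles on other platforms.

```c
#ifdef __APPLE__
#include <OpenGL/OpenGL.h> /* CGL types and entry points */
#include <stddef.h>

/* Step 1: create a loader context that shares objects (textures, VBOs,
 * display lists) with main_ctx. Hypothetical helper name. */
CGLError create_worker_context(CGLContextObj main_ctx, CGLContextObj *out_worker)
{
    CGLPixelFormatObj pix;

    if (main_ctx == NULL || out_worker == NULL)
        return kCGLBadContext;

    /* Assumption: use the same pixel format as the main context. */
    pix = CGLGetPixelFormat(main_ctx);

    /* The second argument is the context to share objects with. */
    return CGLCreateContext(pix, main_ctx, out_worker);
}

/* Step 2: first call made by the worker thread. No drawable is attached. */
CGLError worker_thread_begin(CGLContextObj worker_ctx)
{
    return CGLSetCurrentContext(worker_ctx);
}
#endif /* __APPLE__ */
```

Both functions return `kCGLNoError` on success, so the loader can bail out early if sharing is refused.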

Windows

On Windows, you'll need to do all of the creation work inside the worker thread:
  1. Get a DC from the main window. This should create a new DC that your worker thread can use. (There is apparently nothing wrong with requesting multiple DCs from a single window.)
  2. Create a new context for the worker thread.
  3. Set the new context to share objects with your main context.
  4. Set the worker context to current using the DC you acquired.
Note that your main thread will have to temporarily drop the main context as its current context; otherwise you'll get a "resource busy" error when you try to share contexts. The main thread effectively has that context locked while it is current, so set the main thread's current context to null, share, then set it back when done.

Also note that you can't use window-owned or class-owned DCs in this design, because either of those will cause you to get back the same DC every time you call GetDC, and you really need a unique DC for each thread. Be sure to release the DC on the same thread that acquired it!
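
Here is a minimal sketch of the four steps plus the cleanup, to be called on the worker thread. The names `WorkerGL`, `worker_gl_attach`, and `worker_gl_detach` are hypothetical. It assumes the window already has its pixel format set, and that the main thread has called `wglMakeCurrent(NULL, NULL)` and keeps the main context released until `worker_gl_attach` returns; that handshake between the threads is not shown. Link against opengl32, gdi32, and user32.

```c
#ifdef _WIN32
#include <windows.h> /* GetDC, ReleaseDC, and the wgl* functions */
#include <stddef.h>

typedef struct WorkerGL {
    HWND  hwnd; /* window the DC was taken from */
    HDC   dc;   /* DC owned by this worker thread */
    HGLRC rc;   /* worker's rendering context */
} WorkerGL;

/* Call on the worker thread. Precondition: main_rc is not current on any
 * thread while this runs (see the "resource busy" note above). */
BOOL worker_gl_attach(HWND hwnd, HGLRC main_rc, WorkerGL *out)
{
    if (hwnd == NULL || main_rc == NULL || out == NULL)
        return FALSE;

    out->hwnd = hwnd;
    out->rc = NULL;
    out->dc = GetDC(hwnd);                      /* 1. DC from the main window */
    if (out->dc == NULL)
        return FALSE;

    out->rc = wglCreateContext(out->dc);        /* 2. new context */
    if (out->rc != NULL &&
        wglShareLists(main_rc, out->rc) &&      /* 3. share main's objects */
        wglMakeCurrent(out->dc, out->rc))       /* 4. make it current */
        return TRUE;

    /* Something failed: undo whatever was created. */
    if (out->rc != NULL)
        wglDeleteContext(out->rc);
    ReleaseDC(hwnd, out->dc);
    out->rc = NULL;
    out->dc = NULL;
    return FALSE;
}

/* Call on the same worker thread, so the DC is released where it was acquired. */
void worker_gl_detach(WorkerGL *w)
{
    wglMakeCurrent(NULL, NULL);
    if (w->rc != NULL)
        wglDeleteContext(w->rc);
    if (w->dc != NULL)
        ReleaseDC(w->hwnd, w->dc);
    w->rc = NULL;
    w->dc = NULL;
}
#endif /* _WIN32 */
```

`wglShareLists` wants the context that already owns the objects as its first argument and the fresh, still-empty context as its second, which is why sharing happens right after creation.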

Linux

Linux is similar to the Macintosh, except that the worker threads need a drawable. For this I used pbuffers.
  1. The main thread creates a new context, shared with the main context.
  2. The worker thread makes a new small pbuffer for itself.
  3. The worker thread sets the worker context as current, using the pbuffer as its drawable.
I tried using the main window as the drawable for all threads, but this causes some kind of locking or thrashing between threads and hurts performance of the main renderer. Extra hidden windows were unreliable for reasons I was not able to determine.
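
The three steps could be sketched with GLX 1.3 calls roughly as follows. The function names and the 16x16 pbuffer size are my own choices for illustration. The sketch assumes the `GLXFBConfig` used for the main context also supports pbuffers, and that `XInitThreads()` was called before any other Xlib call, since two threads use the same `Display`. The `WITH_GLX` guard is a hypothetical build flag so the file compiles on machines without the GLX headers; build with something like `cc -DWITH_GLX ... -lGL -lX11`.

```c
#ifdef WITH_GLX /* hypothetical opt-in flag, see the note above */
#include <X11/Xlib.h>
#include <GL/glx.h>
#include <stddef.h>

/* Step 1 (main thread): create a context that shares objects with main_ctx. */
GLXContext create_worker_context(Display *dpy, GLXFBConfig cfg, GLXContext main_ctx)
{
    if (dpy == NULL || main_ctx == NULL)
        return NULL;
    /* The fourth argument is the context to share objects with. */
    return glXCreateNewContext(dpy, cfg, GLX_RGBA_TYPE, main_ctx, True);
}

/* Steps 2 and 3 (worker thread): make a small pbuffer and bind the worker
 * context to it. Returns the pbuffer, or None on failure. */
GLXPbuffer worker_make_current(Display *dpy, GLXFBConfig cfg, GLXContext worker_ctx)
{
    /* The size is arbitrary; nothing is ever drawn into this buffer. */
    static const int attribs[] = {
        GLX_PBUFFER_WIDTH, 16,
        GLX_PBUFFER_HEIGHT, 16,
        None
    };
    GLXPbuffer pbuf;

    if (dpy == NULL || worker_ctx == NULL)
        return None;

    pbuf = glXCreatePbuffer(dpy, cfg, attribs);
    if (pbuf == None)
        return None;

    /* Same pbuffer as both draw and read drawable. */
    if (!glXMakeContextCurrent(dpy, pbuf, pbuf, worker_ctx)) {
        glXDestroyPbuffer(dpy, pbuf);
        return None;
    }
    return pbuf;
}
#endif /* WITH_GLX */
```

When the worker is done it should release the context with `glXMakeContextCurrent(dpy, None, None, NULL)` and destroy the pbuffer with `glXDestroyPbuffer`.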

Alternatives

The alternative to this design is to use an RPC-like mechanism to execute all OpenGL activity in the main thread, e.g. once per frame pick up any OpenGL work to be done from worker threads, run it, then let them continue. This has some advantages and disadvantages:
  • On the pro side, it doesn't require multiple contexts - you may find that some drivers do better executing everything on one context.
  • OpenGL work is done at the same time in every frame, possibly resulting in more controlled operation, particularly on single-threaded machines.
  • On the con side, loader code that needs to create OpenGL objects before proceeding (to keep memory footprint down, for example) may operate slowly since it has limited access to the GL.
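
To make the alternative concrete, here is a minimal sketch of such a queue using POSIX mutexes. Everything in it (`gl_queue`, `gl_queue_post`, `gl_queue_drain`, the fixed capacity of 64) is hypothetical naming for illustration. Worker threads post jobs; the main thread drains a bounded number of them once per frame while its context is current. Signaling the worker when its job has finished (so it can "continue") is left out to keep the example short; a condition variable per job would be one way to do it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define GL_QUEUE_CAP 64 /* arbitrary capacity for this sketch */

typedef void (*gl_job_fn)(void *arg);

typedef struct {
    gl_job_fn       fn[GL_QUEUE_CAP];
    void           *arg[GL_QUEUE_CAP];
    size_t          head;  /* index of the oldest job */
    size_t          count; /* number of queued jobs */
    pthread_mutex_t lock;
} gl_queue;

void gl_queue_init(gl_queue *q)
{
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Worker threads: queue a job. Returns 1 on success, 0 if the queue is full. */
int gl_queue_post(gl_queue *q, gl_job_fn fn, void *arg)
{
    int ok = 0;

    assert(fn != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < GL_QUEUE_CAP) {
        size_t tail = (q->head + q->count) % GL_QUEUE_CAP;
        q->fn[tail] = fn;
        q->arg[tail] = arg;
        q->count++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Main thread, once per frame: run up to max_jobs jobs in FIFO order.
 * Returns how many jobs were run. */
size_t gl_queue_drain(gl_queue *q, size_t max_jobs)
{
    size_t ran = 0;

    while (ran < max_jobs) {
        gl_job_fn fn;
        void *arg;

        pthread_mutex_lock(&q->lock);
        if (q->count == 0) {
            pthread_mutex_unlock(&q->lock);
            break;
        }
        fn = q->fn[q->head];
        arg = q->arg[q->head];
        q->head = (q->head + 1) % GL_QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);

        fn(arg); /* run outside the lock so workers can keep posting */
        ran++;
    }
    return ran;
}

/* Stand-in for real GL work such as building a VBO: it only bumps a counter. */
void example_job(void *arg)
{
    ++*(int *)arg;
}
```

Capping the number of jobs per frame is what gives the "more controlled operation" mentioned above: the main loop decides how much loading work it is willing to absorb in each frame.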

* This assumes that the driver is not multi-threaded internally, but is capable of performing these CPU-based operations without locking out the render thread via a mutex...a big assumption that I cannot validate or invalidate at this point.

4 comments:

  1. Nice article, Benjamin. I played around with this some time ago and created a small example for X11/glX-based systems, in case somebody wants a starting point.

  2. Thanks for the article! Great, now I have to go change my design... :)

  3. Thanks for the great information! Have you tried to use two threads (one write, one read) to access one drawable on Linux? I've seen the locking/performance problem as well. Do you have any suggestion?

  4. Very useful information in your article. Thanks for that. Could you post a Windows example that shows the solution. I am trying to load my display list on the worker thread. However, the display list object is created on the main thread. Would appreciate more feedback on this.
