- It moves CPU-intensive driver operations to a second core even if the driver is not multi-threaded internally*. CPU-based on-the-fly texture compression would be an example where you can get a win.
- It allows you to move the entire process of building up and loading a scene-graph to a second thread, even when that work issues OpenGL loading commands (such as building VBOs).
- Because none of this work touches the main thread, the main rendering loop can continue to run at full speed (assuming the user has a dual-core machine).
(This information comes from in-field tests; if you work on the GL implementation and spot something wrong, or if you find in deployment that this doesn't work, please let me know. Also a warning: the latest version of this code hasn't been in the field very long - in the past, edge cases have appeared that have caused us to redesign threaded loading more times than I'd like to admit.)
A warning on thread safety: OpenGL's asynchronous nature (sequential deferred execution within a context) leads to some tricky threading bugs that you wouldn't normally see - see this blog post for examples of what can go wrong.
Terminology
While the structures differ between AGL/CGL (Macintosh), WGL (Windows) and GLX (Linux/X11), there are some common features in the way the window manager and the GL interface with each other.
- "Pixel Format." All three operating systems define a pixel format as an opaque description of a series of properties for a rendering context. In my tests, I use the same pixel format for all contexts.
- "Drawable." Basically any window, full screen destination, or off-screen hardware accelerated buffer is a "drawable", and it defines where the results of GL commands sent to a context end up.
The Macintosh is probably the easiest OS to set this up on, because you don't need to use a drawable for the loader contexts.
- Create new contexts for each worker thread, using the main rendering context as a source for shared objects.
- At the start of the worker thread, make the worker thread's context current.
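As a rough sketch, those two steps could look like this with CGL. This is not necessarily the code used in the tests above: the function names `cgl_create_loader_context` and `cgl_loader_thread_main` are invented for the example, and reusing the main context's pixel format via `CGLGetPixelFormat` is an assumption that matches the "same pixel format for all contexts" setup.

```cpp
#include <cassert>

// macOS-only sketch; on other platforms this block compiles to nothing.
#if defined(__APPLE__)
#include <OpenGL/OpenGL.h>  // CGL

// Main thread: create a loader context that shares objects with main_ctx.
// Returns nullptr on failure. (Function name is illustrative.)
inline CGLContextObj cgl_create_loader_context(CGLContextObj main_ctx) {
    assert(main_ctx != nullptr);
    // Assumption: reuse the main context's pixel format for the loader.
    CGLPixelFormatObj pf = CGLGetPixelFormat(main_ctx);
    CGLContextObj loader_ctx = nullptr;
    if (CGLCreateContext(pf, main_ctx, &loader_ctx) != kCGLNoError) return nullptr;
    return loader_ctx;
}

// Worker thread entry point: bind the context before any GL call.
// No drawable is ever attached to loader_ctx.
inline bool cgl_loader_thread_main(CGLContextObj loader_ctx) {
    if (CGLSetCurrentContext(loader_ctx) != kCGLNoError) return false;
    // ... build VBOs, upload textures, etc. ...
    CGLSetCurrentContext(nullptr);  // unbind before the thread exits
    return true;
}
#endif
```

The main thread would call `cgl_create_loader_context(CGLGetCurrentContext())` once per worker and hand the result to the thread it starts, for example `std::thread(cgl_loader_thread_main, ctx)`.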
On Windows, you'll need to do all of the creation work inside the worker thread:
- Get a DC from the main window. This should create a new DC that your worker thread can use. (There is apparently nothing wrong with requesting multiple DCs from a single window.)
- Create a new context for the worker thread.
- Set the new context to share objects with your main context.
- Set the worker context to current using the DC you acquired.
Also note that you can't use window-owned or class-owned DCs (the CS_OWNDC and CS_CLASSDC class styles) in this design: with either of those, GetDC hands back the same DC every time you call it, and you really need a unique DC for each thread. Be sure to release the DC on the same thread you used to acquire it!
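The Windows steps above might look roughly like this. It is only a sketch: `wgl_loader_thread_main`, `main_wnd` and `main_rc` are names invented for the example, error handling is minimal, and it assumes the main window's pixel format was already set when the main context was created.

```cpp
#include <cassert>

// Windows-only sketch; on other platforms this block compiles to nothing.
#if defined(_WIN32)
#include <windows.h>  // GetDC, ReleaseDC, wgl* (link against opengl32)

// Runs entirely on the worker thread. main_wnd is the main GL window and
// main_rc the main rendering context (both names are illustrative).
inline bool wgl_loader_thread_main(HWND main_wnd, HGLRC main_rc) {
    assert(main_wnd != NULL && main_rc != NULL);
    bool ok = false;

    // 1. Get a DC for the main window, for this thread's use only.
    HDC dc = GetDC(main_wnd);
    if (!dc) return false;

    // 2. Create a new context for this thread.
    HGLRC worker_rc = wglCreateContext(dc);
    if (worker_rc) {
        // 3. Share objects. The first argument is the context whose objects
        //    are shared; the second is the fresh context joining it.
        // 4. Make the worker context current on the DC acquired above.
        if (wglShareLists(main_rc, worker_rc) && wglMakeCurrent(dc, worker_rc)) {
            // ... build VBOs, upload textures, etc. ...
            wglMakeCurrent(NULL, NULL);
            ok = true;
        }
        wglDeleteContext(worker_rc);
    }

    // Release the DC on the same thread that acquired it.
    ReleaseDC(main_wnd, dc);
    return ok;
}
#endif
```

Sharing is set up before the worker context has created any objects of its own, which is what `wglShareLists` expects of its second argument.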
Linux
Linux is similar to the Macintosh, except that the worker threads need a drawable. For this I used pbuffers.
- The main thread creates a new context, shared with the main context.
- The worker thread makes a new small pbuffer for itself.
- The worker thread sets the worker context as current, using the pbuffer as its drawable.
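A minimal GLX 1.3 sketch of these three steps follows. Several things here are assumptions rather than details from the text: the function names are invented, the 16x16 pbuffer size is arbitrary, the `GLXFBConfig` passed in is assumed to support pbuffers (`GLX_PBUFFER_BIT` in its `GLX_DRAWABLE_TYPE`), and the application is assumed to have made Xlib usable from several threads (for example by calling `XInitThreads` first, or by giving each thread its own `Display` connection).

```cpp
#include <cassert>

// Compiled only where the GLX headers are available; otherwise this block
// compiles to nothing.
#if __has_include(<GL/glx.h>)
#include <GL/glx.h>  // also pulls in Xlib

// Main thread: create a loader context shared with main_ctx.
// Returns nullptr on failure. (Function name is illustrative.)
inline GLXContext glx_create_loader_context(Display* dpy, GLXFBConfig cfg,
                                            GLXContext main_ctx) {
    assert(dpy != nullptr && main_ctx != nullptr);
    return glXCreateNewContext(dpy, cfg, GLX_RGBA_TYPE, main_ctx, True);
}

// Worker thread: make a small pbuffer, then bind the context to it.
inline bool glx_loader_thread_main(Display* dpy, GLXFBConfig cfg,
                                   GLXContext loader_ctx) {
    // The pbuffer is only a placeholder drawable; nothing is rendered into it.
    const int attribs[] = { GLX_PBUFFER_WIDTH, 16, GLX_PBUFFER_HEIGHT, 16, None };
    GLXPbuffer pbuf = glXCreatePbuffer(dpy, cfg, attribs);
    if (!pbuf) return false;

    bool ok = false;
    if (glXMakeContextCurrent(dpy, pbuf, pbuf, loader_ctx)) {
        // ... build VBOs, upload textures, etc. ...
        glXMakeContextCurrent(dpy, None, None, nullptr);
        ok = true;
    }
    glXDestroyPbuffer(dpy, pbuf);
    return ok;
}
#endif
```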
Alternatives
The alternative to this design is to use an RPC-like mechanism to execute all OpenGL activity in the main thread, e.g. once per frame pick up any OpenGL work to be done from worker threads, run it, then let them continue. This has some advantages and disadvantages:
- On the pro side, it doesn't require multiple contexts - you may find that some drivers do better executing everything on one context.
- OpenGL work is done at the same time in every frame, possibly resulting in more controlled operation, particularly on single-core machines.
- On the con side, loader code that needs to create OpenGL objects before proceeding (to keep memory footprint down, for example) may operate slowly since it has limited access to the GL.
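To make the RPC-like alternative concrete, here is one possible shape for it: a small thread-safe queue that loader threads post work into and that the main thread drains once per frame. The class name `GLJobQueue` and its methods are hypothetical, and the jobs are plain callables, so the sketch itself contains no GL calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Queue of GL jobs posted by loader threads and run by the thread that owns
// the single GL context. (Hypothetical helper, not from any library.)
class GLJobQueue {
public:
    // Called from any worker thread: hand a job to the main thread.
    void post(std::function<void()> job) {
        assert(job && "empty job posted");
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(job));
    }

    // Called once per frame on the main thread. Runs every job posted so far,
    // in posting order, and returns how many were run.
    std::size_t run_pending() {
        std::vector<std::function<void()>> batch;
        {
            // Hold the lock only long enough to take the batch, so workers
            // can keep posting while the jobs execute.
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& job : batch) job();  // GL calls happen here
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

A loader thread would call `queue.post([=]{ /* glGenBuffers, glBufferData, ... */ })`, and the render loop would call `queue.run_pending()` at a fixed point in each frame. A loader that needs the resulting object before it can continue has to wait for the next frame's drain, which is the slowdown described in the last bullet.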