Here's the fine print:
- Calls to OpenGL from the same thread on the same context are executed in-order.
- Calls from multiple contexts (and multiple threads always means multiple contexts) can execute in a different order from the one in which they were issued.
- All calls before a flush are executed before all calls after a flush, even across contexts, as long as the renderer is the same. (At least I think this is true.)
What ends up happening is that the VBO create code is executed after the VBO draw code, because in-order execution is not guaranteed across contexts! This usually causes the driver to complain that the VBO isn't fully built. The solution is to flush after the VBO is created, which forces VBO creation to happen before any future calls.
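To make the ordering rules concrete, here is a toy model in Python. It is not OpenGL code and not how any real driver works: the `Renderer` and `Context` classes, the `call` and `flush` methods, and the `run` helper are all made up for illustration, and the model assumes (as a simplification) that a context's commands only reach the renderer when that context flushes. It shows one possible schedule where the draw runs before the create, and how a flush after the create rules that schedule out.

```python
class Renderer:
    """Toy stand-in for the GPU: records commands in the order they execute."""

    def __init__(self):
        self.executed = []


class Context:
    """Toy stand-in for an OpenGL context with its own pending command queue."""

    def __init__(self, renderer):
        self.renderer = renderer
        self.pending = []

    def call(self, command):
        # Issued on this context, but not yet handed to the renderer.
        self.pending.append(command)

    def flush(self):
        # Hand everything issued so far to the renderer, in issue order.
        self.renderer.executed.extend(self.pending)
        self.pending.clear()


def run(flush_after_create):
    """Play out the VBO scenario and return the order the renderer saw."""
    gpu = Renderer()
    worker, main = Context(gpu), Context(gpu)

    worker.call("create VBO")
    if flush_after_create:
        worker.flush()

    # The worker thread now tells the rendering thread the VBO is ready.
    main.call("draw VBO")
    main.flush()    # e.g. the end of the frame

    worker.flush()  # the worker's queue drains eventually
    return gpu.executed


print(run(flush_after_create=False))
print(run(flush_after_create=True))
```

Without the flush, the draw reaches the renderer first in this model; with it, the create is always ahead of anything issued afterwards.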
In the second example, a PBO (buffer of image data) is filled on the rendering thread, then sent to a worker thread to extract and process. We have the same problem: because we're across threads, the OpenGL extract operation can execute before the fill operation. Once again, flushing after the first thread-op synchronizes us.
Generally speaking, OpenGL isn't a great API for threading. It has very few options for synchronization - glFlush has real (and often quite negative) performance implications and it's a real blunt tool.
There is a 'fence' operation that allows a thread to block until part of the command-stream has finished, but it isn't really useful because:
- It's vendor-specific (all Macs, NVIDIA PCs), so it's not everywhere. It's not fun to maintain two threading models in an app because the synchronization primitive only exists on some platforms.
- It's not cross-thread! The fence is not shared between contexts, so we can't block a worker thread based on completion in a rendering thread - we have to block the renderer itself, which sucks.
The result is that async queries (PBO framebuffer readbacks, occlusion queries) get a full frame to complete before we ask for their results, which helps prevent blocking.
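The "ask one frame later" idea can be sketched as a toy model too. Again this is plain Python, not OpenGL: the `DeferredQueries` class and its `frame` method are hypothetical names, and the strings stand in for whatever query or readback object the real code would hold. Each frame collects the query issued during the previous frame, then issues a new one.

```python
class DeferredQueries:
    """Toy model: a query's result is requested one frame after it was issued."""

    def __init__(self):
        self.in_flight = None  # the query issued last frame, if any

    def frame(self, frame_number):
        # Collect the query issued during the previous frame (None on frame 0)...
        ready = self.in_flight
        # ...then issue this frame's query, to be collected next frame.
        self.in_flight = f"query issued in frame {frame_number}"
        return ready


queries = DeferredQueries()
for n in range(3):
    print(n, queries.frame(n))
```

The cost is that every result is one frame stale, which is usually an acceptable trade for not stalling the renderer.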