So first of all, if libdispatch and X-Plane's thread pool got into a fight, libdispatch would thoroughly kick our asses. The X-Plane thread pool is a simple cross platform message queue and pool system - totally CS101. libdispatch is highly performance tuned, provides a wide variety of dispatch semantics, and tries a lot harder to be low overhead.
I'm feeling okay about that. The X-Plane thread pool took about two days to code, and pretty much just works the way we want. In particular, the overhead may be higher than libdispatch, but we know what we put in our thread pool, and we know our work items are fairly big; thread pool overhead doesn't show up in a profile.
There is one trick that we can do that libdispatch doesn't do (and probably shouldn't): inspecting the work queue. Our pool lets a client "edit" the queue, going in and pulling out any queued items that it has a grudge against. The main thread pool is locked for editing, so editing has to be fast and worth it.
Why allow such a weird operation? Well, X-Plane's work items are fairly expensive. And there is one case where this is a win. When it comes time to throw out a whole hunk of scenery, we need to wait until all of its enqueued mesh-building work tasks are done. We also, coincidentally, happen to throw out scenery at the same time that we shift the coordinate system, which is something you don't want to do while meshes are being built. The whole fire drill goes something like this:
- Insert a "barrier" into the queue. This is a custom object that basically clogs all of the worker threads. Once the barrier is in place we know that our worker threads are idle.
- "Edit" the queue removing any enqueued (but not running) tasks that belong to scenery that is going to get thrown out. This is strictly an optimization, but a nice one.
- Wait for the barrier to fall into place. Basically we need to wait for each core to finish whatever it was doing.
- At this point, the scenery system is "frozen". We can throw out any old scenery, and shift the coordinate system. (In fact, our barrier let's us reuse the hijacked, sleeping worker threads, so they do the shift.
- Pull the barrier out and let things resume as normal.
- Ref-count all of the scenery. The "tasks" to operate on the scenery add a ref-count, effectively deferring deallocation of each mesh until it's task has completed. This sucks for one reason: memory is usually rather tight, and we might end up "ballooning" memory as new scenery is loaded while old scenery hasn't gone down to zero references and been thrown out.
- Simply wait for all of the tasks we care about to complete before we throw out old scenery. If the shift and load is waiting on this (see above about memory) then this sucks too - our tasks might be at the back of a very long queue.
- Option 3 is our "editing" trick. By first inserting the barrier and then editing out any tasks in the queue, we guarantee that by the time our barrier stops threaded operation, every task referencing the scenery we want to throw out has either been (a) run to completion or (b) edited out of the queue before it could run.
- When we detect that we'd like scenery to shift, we mark the tiles we don't like as "dead" and tag each one with a completion task - the tile we'd like loaded when it is destroyed.
- The dead tile stops queuing async work.
- When the last task for the tile completes, it signals the tile, which queues itself for an asynch-destroy operation.
- When the tile destroys itself the last thing it does is queue an async load of the new tile.