## Thursday, September 03, 2009

### OpenGL Matrix Tricks: The Fastest Culling Path?

In my last post I suggested that a clever way to cull would be:
• Transform the view clip planes back into eye space.
• For each cull, transform the sphere center into eye space, then test it against the planes.
This idea is not my own - I first saw the idea of using plane tests against spheres in SSG - and I do not know who first thought of the idea. I can certainly say it's a great way to do culling when your camera orientation can be arbitrary.
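
As a rough illustration of the two steps above, here is a minimal C++ sketch. The names (`Vec3`, `Plane`, `to_eye`, `sphere_visible`) are made up for this example and are not taken from SSG or X-Plane. It assumes the six planes are already in eye space with unit-length normals pointing into the visible volume, and that the model-view matrix is a column-major affine matrix with no scale, so the sphere radius is unchanged by the transform.

```cpp
#include <array>

struct Vec3  { float x, y, z; };
// Plane a*x + b*y + c*z + d = 0, unit-length normal pointing into the visible volume.
struct Plane { float a, b, c, d; };

// Transform a point by a column-major 4x4 model-view matrix (OpenGL layout).
// Assumes an affine matrix (bottom row 0 0 0 1), so w stays 1 and is not computed.
Vec3 to_eye(const float mv[16], const Vec3& p) {
    return {
        mv[0] * p.x + mv[4] * p.y + mv[8]  * p.z + mv[12],
        mv[1] * p.x + mv[5] * p.y + mv[9]  * p.z + mv[13],
        mv[2] * p.x + mv[6] * p.y + mv[10] * p.z + mv[14],
    };
}

// Returns false as soon as the sphere lies entirely outside any one plane.
bool sphere_visible(const std::array<Plane, 6>& eye_planes,
                    const float mv[16], const Vec3& center, float radius) {
    const Vec3 e = to_eye(mv, center);
    for (const Plane& p : eye_planes) {
        const float dist = p.a * e.x + p.b * e.y + p.c * e.z + p.d;
        if (dist + radius < 0.0f) return false;  // early exit: fully outside
    }
    return true;  // inside or intersecting the volume
}
```

This version makes no assumptions about the shape of the view volume, so every plane costs a full four-term plane equation.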

But we have to ask: if we can transform our clip planes back to eye space, can we transform them all the way back to model-view space (the model's own coordinates, before the model-view matrix is applied) and save some time?

The short answer is: I don't know. To really tell, you'd have to run a statistical analysis on real data: culling has a lot of early-exit cases, so you need to know how often each early exit is taken to know the average "cost" of a cull.

To compare the computational cost, note that if we know we are using a frustum we can make a few optimizations to the cull. (X-Plane does take these optimizations.)
• We can drop the "D" constant on our left, right, top and bottom planes because we know they pass through the origin, saving an add on each plane check.
• Because we know that the near and far clip planes are parallel to the XY plane, we can simply do a single add and compare instead of a plane equation.
• The cost of transforming from model-view to eye space (3 adds and 3 multiplies per coordinate) does not have to be paid in full every time: we can transform only the Z coordinate, do a fast test against the near and far clip planes, and then calculate X and Y in eye space only if we really need them.
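
The three optimizations above might look something like the following C++ sketch. This is not X-Plane's actual code: `EyeFrustum` and `sphere_visible_eye` are hypothetical names, the camera is assumed to look down the negative Z axis (the usual OpenGL convention), and the model-view matrix is assumed to be column-major and free of scale.

```cpp
struct Vec3 { float x, y, z; };

// Frustum described in eye space (hypothetical layout for this sketch).
struct EyeFrustum {
    float neg_near;  // -near_distance: eye-space Z of the near plane
    float neg_far;   // -far_distance:  eye-space Z of the far plane
    Vec3  side[4];   // inward unit normals of left, right, bottom, top (D is 0)
};

bool sphere_visible_eye(const EyeFrustum& f, const float mv[16],
                        const Vec3& c, float r) {
    // Transform only Z first: 3 multiplies, 3 adds.
    const float ez = mv[2] * c.x + mv[6] * c.y + mv[10] * c.z + mv[14];

    // Near and far planes are parallel to the XY plane: one add and one compare each.
    if (ez - r > f.neg_near) return false;  // entirely in front of the near plane
    if (ez + r < f.neg_far)  return false;  // entirely beyond the far plane

    // Only now pay for X and Y.
    const float ex = mv[0] * c.x + mv[4] * c.y + mv[8] * c.z + mv[12];
    const float ey = mv[1] * c.x + mv[5] * c.y + mv[9] * c.z + mv[13];

    // Side planes pass through the eye, so there is no D term to add.
    for (const Vec3& n : f.side) {
        if (n.x * ex + n.y * ey + n.z * ez + r < 0.0f) return false;
    }
    return true;
}
```

A sphere behind the camera or past the far plane is rejected after only the Z transform, which is the early exit counted later in this post.
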
If we had clip planes in model-view space:
• We can skip the 3 multiplies and 3 adds per coordinate that transform the sphere center from model-view to eye space. We never need them.
• We have to do a full four-part plane equation (3 multiplies, 3 adds, plus the add and compare for the sphere volume) for all six planes, since their orientation is now arbitrary.
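
For comparison, here is a hedged C++ sketch of the model-view-space variant. It relies on the standard rule that a plane is carried from eye space back to the model's space by multiplying it by the transpose of the model-view matrix. The function names are invented for this example, and the matrix is again assumed to be column-major with no scale, so the plane normals stay unit length.

```cpp
struct Vec3  { float x, y, z; };
struct Plane { float a, b, c, d; };  // a*x + b*y + c*z + d >= 0 means "inside"

// Carry an eye-space plane back to the model's space: plane_model = transpose(MV) * plane_eye.
// mv is column-major, so mv[0..3] is the first column of MV (the first row of its transpose).
Plane eye_plane_to_model(const float mv[16], const Plane& e) {
    return {
        mv[0]  * e.a + mv[1]  * e.b + mv[2]  * e.c + mv[3]  * e.d,
        mv[4]  * e.a + mv[5]  * e.b + mv[6]  * e.c + mv[7]  * e.d,
        mv[8]  * e.a + mv[9]  * e.b + mv[10] * e.c + mv[11] * e.d,
        mv[12] * e.a + mv[13] * e.b + mv[14] * e.c + mv[15] * e.d,
    };
}

// No per-sphere transform, but every plane now needs the full four-term equation.
bool sphere_visible_model(const Plane (&planes)[6], const Vec3& c, float r) {
    for (const Plane& p : planes) {
        if (p.a * c.x + p.b * c.y + p.c * c.z + p.d + r < 0.0f) return false;
    }
    return true;
}
```

The six planes would be converted once each time the model-view matrix changes, then reused for every sphere culled under that matrix.
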
Adding it all up, the two approaches are very similar in the worst case: model-view clip planes save 9 adds and 9 multiplies (the transform) but pick up six multiplies and ten adds due to the more expensive plane equations.

But the early exit case looks better to me in the eye-space (SSG) case: 3 multiplies, 5 adds, and two compares to get through both the near and far clip planes vs. 6 multiplies, 8 adds and two compares for the model-view case.

There is one other possible issue with using model-view clip planes: you need a combined model-view + projection matrix to find the clip planes. Depending on how often your model-view matrix is changed and how it is calculated, this might mean calculating a combined matrix that you would not normally have to calculate.
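
To make that extra cost concrete, here is a sketch of one common way to pull the six planes out of a combined matrix, by adding and subtracting its rows (a technique usually attributed to Gribb and Hartmann). It is not necessarily how any particular engine does it, and `mat4_mul` and `extract_planes` are names made up for this example. Matrices are column-major, as in OpenGL.

```cpp
#include <cmath>

struct Plane { float a, b, c, d; };  // a*x + b*y + c*z + d >= 0 means "inside"

// out = a * b for column-major 4x4 matrices: 64 multiplies and 48 adds.
void mat4_mul(const float a[16], const float b[16], float out[16]) {
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}

// Scale the plane so that its normal (a, b, c) has unit length.
static Plane normalized(float a, float b, float c, float d) {
    const float len = std::sqrt(a * a + b * b + c * c);
    return { a / len, b / len, c / len, d / len };
}

// m is projection * model-view. Output order: left, right, bottom, top, near, far.
// Each plane is row 3 of m plus or minus row 0, 1, or 2 of m.
void extract_planes(const float m[16], Plane out[6]) {
    for (int axis = 0; axis < 3; ++axis) {
        out[axis * 2]     = normalized(m[3]  + m[axis],     m[7]  + m[4 + axis],
                                       m[11] + m[8 + axis], m[15] + m[12 + axis]);
        out[axis * 2 + 1] = normalized(m[3]  - m[axis],     m[7]  - m[4 + axis],
                                       m[11] - m[8 + axis], m[15] - m[12 + axis]);
    }
}
```

The matrix multiply and the six square roots are paid once per model-view change, not once per cull, which is why the ratio of culls to transforms matters.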

(Whether this is really the case is going to depend a lot on how your program shadows OpenGL state, handles transforms, and what your shaders look like, as well as the ratio of culls to view transforms. For what it's worth, if you are transforming the camera a lot relative to culls, you may have other, non-culling related performance bottlenecks.)