So this is what I learned today: when it comes to geometry shaders and the 8800, triangle strips matter. Now after you read the details, this will seem so obvious that you can only conclude that I am a complete dufus (something I will not necessarily dispute). But the 8800 (like most modern cards) is so bloody fast that triangle strips are actually not a win in almost all other configurations.
The test: a mesh of 1000 x 1000 quads (each in turn is two triangles), being rotated. Using a single static vertex buffer with static indexes, this runs at around 50-55 fps. Each vertex has 8 components (XYZ, normal, texture ST).
Now some numbers:
- The baseline is around 54 fps.
- Cutting the geometry to a 500x500 mesh brings us to around 204 fps, which is what we expect for a vertex-bound operation. The pixel shading has been kept intentionally simple to achieve this result.
- Using a geometry shader which simply passes through the geometry has no effect on fps.
- Cutting the mesh to 500x500 and using a geometry shader that splits one triangle into four by emitting 12 vertices and 4 primitives (i.e. separate tris) runs at a creeping 25 fps.
- Cutting the mesh to 500x500 and using a geometry shader that splits one triangle into four by emitting 8 vertices and 2 primitives (i.e. strips) runs at 68 fps.
- When using this strip-based geometry shader, sorting the mesh indices into strip order (e.g. 0 1 2 2 1 3 2 3 4 4 3 5) improves fps to 73 or so. When not using the geometry shader, this strip sorting has no impact.
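To make the "strip order" in that last bullet concrete, here is a minimal sketch of how such an index list could be generated for a single strip. The function name is made up for illustration, and it assumes the vertices are already numbered in strip order; a real mesh would need one such run per row of quads.

```python
def strip_ordered_indices(num_triangles):
    """Return GL_TRIANGLES-style indices that walk one strip in order.

    Hypothetical helper: assumes vertices 0..num_triangles+1 are already
    laid out in strip order.
    """
    indices = []
    for i in range(num_triangles):
        if i % 2 == 0:
            indices += [i, i + 1, i + 2]
        else:
            # Swap the first two vertices so odd triangles keep the same winding.
            indices += [i + 1, i, i + 2]
    return indices


print(strip_ordered_indices(4))
# prints [0, 1, 2, 2, 1, 3, 2, 3, 4, 4, 3, 5]
```

The draw call still uses plain indexed triangles; only the order in which the triangles appear in the index buffer changes.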
Let's tease that mess apart and see what it means. Basically my goal was to test the performance of "dynamically created" geometry (i.e. creating more vertices from fewer using a geometry shader) vs. "mesh updating" (i.e. periodically re-tessellating the mesh and saving the results to new VBOs). The latter technique's best performance is simulated by the 1000x1000 VBO in VRAM; the former by the geometry shader.
As you can see, geometry shaders can outperform straight VBO drawing (68-73 fps vs. around 54 fps; the 500x500 mesh split four ways produces the same number of final triangles as the 1000x1000 mesh), but only if they are set up carefully. In particular, a geometry shader can't output a single primitive made up of multiple separate triangles, so if we want to draw distinct triangles, we have to end a lot of primitives. There is also no vertex indexing out the back of a geometry shader, so strips are a win.
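To show where the vertex and primitive counts come from, here is a rough CPU-side sketch of the one-into-four split, written in Python rather than shader code. The 5+3 strip layout below is just one arrangement that adds up to 8 vertices in 2 strips, not necessarily the one used in the test, and all function names are made up for illustration.

```python
def midpoint(p, q):
    """Midpoint of two points given as tuples."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def split_as_triangles(a, b, c):
    """Four separate triangles: 12 emitted vertices, 4 primitives."""
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [[a, ab, ca], [ab, b, bc], [ca, bc, c], [ab, bc, ca]]


def split_as_strips(a, b, c):
    """Same four triangles as two strips: 8 emitted vertices, 2 primitives.

    Assumed layout: one 5-vertex strip (3 triangles) plus one 3-vertex strip.
    """
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [[a, ab, ca, bc, c], [ab, b, bc]]


def triangles_in(primitives):
    """Expand each strip into the set of triangles it covers (winding ignored)."""
    tris = set()
    for prim in primitives:
        for i in range(len(prim) - 2):
            tris.add(frozenset(prim[i:i + 3]))
    return tris


a, b, c = (0.0, 0.0), (2.0, 0.0), (0.0, 2.0)
for name, prims in (("tris", split_as_triangles(a, b, c)),
                    ("strips", split_as_strips(a, b, c))):
    print(name, "vertices:", sum(len(p) for p in prims), "primitives:", len(prims))
```

Each inner list stands for one run of vertex emits followed by a primitive end. Both versions cover the same four triangles; the strip version just reuses shared vertices instead of re-emitting them.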
(Contrast this with drawing out of a VBO: with indexing, multiple triangles per call, and a huge cost to restarting primitives, indexed GL_TRIANGLES is usually faster than strips.)
What's surprising here is not that strips are faster in the geometry shader, but that they are so much faster! With strips we've cut down the geometry data by about a third (from 12 vertices to 8), but we get an almost 3x improvement in throughput (25 fps to 68 fps). My theory is that emitting fewer primitives is what wins; we've cut down geometry and cut the number of primitives in half.
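For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope calculation using only the frame rates listed earlier. It assumes every triangle of the 500x500 mesh is split into exactly four, so both setups draw the same number of final triangles per frame.

```python
# Frame rates as reported in the numbers above.
FPS_BASELINE = 54    # 1000x1000 mesh, no geometry shader
FPS_GS_TRIS = 25     # 500x500 mesh, shader emits 4 separate triangles
FPS_GS_STRIPS = 68   # 500x500 mesh, shader emits 2 strips

# Final triangles per frame: two per quad, times four when the shader splits.
baseline_tris = 1000 * 1000 * 2
gs_tris = 500 * 500 * 2 * 4


def tris_per_second(tris_per_frame, fps):
    """Final triangles drawn per second at a given frame rate."""
    return tris_per_frame * fps


vertex_saving = 1 - 8 / 12                    # fraction of emitted vertices saved
strip_speedup = FPS_GS_STRIPS / FPS_GS_TRIS   # strips vs. separate triangles

print("baseline  :", tris_per_second(baseline_tris, FPS_BASELINE), "tris/sec")
print("GS tris   :", tris_per_second(gs_tris, FPS_GS_TRIS), "tris/sec")
print("GS strips :", tris_per_second(gs_tris, FPS_GS_STRIPS), "tris/sec")
print("vertex saving: {:.0%}, speedup: {:.2f}x".format(vertex_saving, strip_speedup))
```

So a one-third cut in emitted vertices lines up with roughly a 2.7x jump in frame rate, which is why the vertex count alone doesn't seem to explain it.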
The moral of the story is: it pays to find a way to strip-ify the output of geometry shaders.