The good news with X-Plane is that we don't have a complex legacy material system to support. If a game is well batched, a forward renderer can associate separate shaders with batches by material, so each 'material' can be radically different in how it handles and transfers light. With a G-Buffer, the lighting equation must be unified, and thus everything needed to support some common lighting format must go into the G-Buffer. (One thing you can do is pack a material index into the G-Buffer.) Fortunately, since X-Plane doesn't have such a multi-shader beast, we only had one property to save: a shininess ratio (0-1, about 8 bits of precision needed).
What we did have to support was both a full albedo and a full RGB emissive texture; additive emissive textures have been in the sim for over a decade, and authors use them heavily. (They can be modulated by datarefs within the sim, which makes them useful for animating and modulating baked lighting effects.) X-Plane 10 also has a static shadow/baked ambient occlusion type term on some of the new scenery assets that needs to be preserved, also with 8 bits of precision.
The other challenge is that X-Plane authors use alpha translucency fairly heavily; deferred renderers don't do this very well. One solution we have for airplanes is to pull out some translucent surfaces and render them post-deferred-renderer as a forward over-draw pass. But for surfaces that can't be pulled out, we need to do the least-bad thing.
Thus we pack our G-Buffer into 16 bytes:
- RGBA8 albedo (with alpha)
- RG16F normal (Z vector is reconstructed)
- RG16F depth + shadow-merged-with-shininess
- RGBA8 emissive (with alpha)
(As a side note, with a different layout we could blend the shininess ratio, for example when we keep a physical position for a fragment, to try to limit shininess on translucent elements.)
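To illustrate the RG16F normal channel above: only X and Y of the unit normal are stored, and Z is rebuilt at lighting time. A minimal Python sketch (the real code would live in GLSL; the function names are mine, and the reconstruction assumes the eye-space normal faces the camera, i.e. Z >= 0, which is the usual G-Buffer assumption):

```python
import math

def encode_normal(n):
    """Write only X and Y of a unit normal into the RG16F channel."""
    return (n[0], n[1])

def decode_normal(xy):
    """Rebuild Z from X and Y.

    Assumes the normal faces the camera (Z >= 0); the max() guards
    against tiny negative values from rounding."""
    x, y = xy
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

The trade-off is that any normal pointing away from the eye decodes with the wrong Z sign, which is why this trick only works for opaque geometry surviving the depth test.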
Note that on OS X 10.5 red-green textures are not available, so we have to fall back to four RGBA_16F textures, doubling VRAM. This costs us at least 20% fill-rate on an 8800, but the quick sanity test I did wasn't very abusive; heavier cases are probably a bit worse.
So far we seem to be surviving with a 16-bit floating point eye-space depth coordinate. It does not hold up very well in a planet-wide render, though, for rendering techniques where reconstructing the fragment's position is important (e.g. O'Neil-style atmospheric scattering). A simple workaround would be to calculate the intersection of the fragment ray with the planet directly by doing a transform of the planet sphere from model-view space. (That is, if we know our fragment came from a sphere, why not work with the original mathematical sphere?)
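That workaround amounts to a standard ray-sphere intersection in eye space. A Python sketch, for illustration only (in practice the center and radius would come from transforming the planet sphere, and the ray direction from the fragment's screen position):

```python
import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the nearest positive ray parameter t where the ray
    origin + t*direction hits the sphere, or None on a miss.

    Solves |o + t*d - c|^2 = r^2; direction is assumed normalized,
    so the quadratic's 'a' coefficient is 1."""
    oc = [origin[i] - center[i] for i in range(3)]
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None                      # ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0     # nearer of the two roots
    return t if t > 0.0 else None
```

The appeal is that the recovered position is exact regardless of depth-buffer precision, since we never round-trip the position through the 16F depth channel at all.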
16F depth does give us reasonably accurate shadows, at least up close, and far away the shadows are going to be limited in quality anyway. I tried logarithmic depth, but it made shadows worse.
Packing Shadowing and Shine
For the static shadow/AO ("shadow") term and the level of specularity ("shine") we have two parameters that need about 8 bits of precision, and we have a 16-bit channel. Perfect, right? Well, not so much.
- NVidia cards won't render to 16-bit integer channels on some platforms.
- ATI cards don't export a bit-cast from integer to floating point, making it hard to pack a real 16-bit int (or two 8-bit ints) into a float.
- If we change the channel to RGBA8 (and combine RG into a virtual 16F) we can't actually use the fourth byte (alpha) because the GL insists on dumping our alpha values. Extended blend would fix this but it's not supported on OS X and even on Windows you can't use it with multiple render targets.
So we simply encode 256.0 * shadow + shine. The resulting quantity gives shininess over 8 bits of precision without shadow, and reduces shininess to around 2 bits in full shadow. If you view the decoded channels separately you can see banding artifacts come in on the shine channel as the shadows kick in. But when you view the two together in the final shader, the artifacts aren't visible at all because the shadow masks out the banded shininess.
(What this trick has done is effectively recycle the exponent as a 'mixer', allocating bits between the two channels.)
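A Python sketch of that encoding, using the `struct` module's half-precision format to emulate storage in a 16F channel (the function names are mine, not the shipping shader code):

```python
import math
import struct

def to_half(v):
    """Round-trip a float through IEEE 754 half precision,
    emulating storage in a 16F G-Buffer channel."""
    return struct.unpack('e', struct.pack('e', v))[0]

def encode(shadow, shine):
    """Pack shadow and shine (both in [0,1], shine < 1) into one value.

    Shadow is quantized to 8 bits first so that 256*shadow lands on
    an integer; shine rides in the fractional part."""
    return to_half(round(shadow * 256.0) + shine)

def decode(packed):
    """Recover (shadow, shine): integer part is shadow, fraction is shine."""
    s = math.floor(packed)
    return s / 256.0, packed - s
```

At shadow = 0 the stored value is small, so the half float's full mantissa covers shine; at shadow = 1 the value sits near 256, where consecutive representable halves are 0.25 apart, leaving shine roughly 2 bits. That quantization of shine as shadow grows is exactly the banding described above.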
A future extension would be to add a fifth 4-byte target. This would give us room to extend to full 32-bit floating point depth (should that prove to be useful), with shadow and shine in 8 bit channels with one new 8-bit channel left. Or alternatively we could:
- Keep Z at 16F and keep an extra channel.
- Technically if we can accept this 'lossy' packing we can get four components into an RG16F, while we can only get 3 pure 8-bit components. (This is due to how the GL manages alpha.)
- If we target hardware that provides float-to-int bit casts, we could have four 8-bit components in the new channel.