Monday, November 19, 2012

Deferred Weirdness: What Have We Learned?

To paraphrase the famous Nike Tiger Woods ad, "...and, did you learn anything?"

X-Plane is more of a platform than a game - we support content that comes from third parties, is not updated in synchronization with the engine itself, is sometimes significantly older than the current engine code, and is typically created by authors who have limited or no access to the engine developers.

The easiest, quickest, most efficient way to make deferred rendering work is to set the rules first and customize the content creation process around them.  X-Plane's platform status makes this impossible, and most of the nuttiness in our deferred pipeline comes from bending the new engine to work with old content.

We have a number of design problems that are all "difficult" in a deferred engine:
  • A wide Z buffer range means we need to render twice.
  • Content is often permeated with alpha but still needs to be part of a deferred render.
  • Some content must be linearly blended, some must be sRGB blended, and there isn't any wiggle room.
Coping with any one of these problems is quite doable within a deferred renderer, but coping with all three at once becomes quite chaotic.  Having to maintain two blending equations (sRGB and linear) is particularly painful.

(This sits on top of the typical problems of being geometry bound on solid stuff and fill-rate bound on particles.)

So if there's a "learn from our fail"* in this, it might be: try to limit your design problem to only one "rough edge" in the deferred renderer if possible - it's not hard to put a few hacks into a deferred renderer but keeping several of them going at once is juggling.

* I should say: I'm just being snarky with "learn from our fail" - I consider our deferred rendering solution a success in that it provides our users with the benefits of deferred rendering (huge numbers of lights, better geometric complexity, soft particles, and there are other techniques we're just starting to experiment with) while supporting legacy content with very little modification.

The code is under stress because it does a lot, and while it's cute for me as a graphics programmer to go "oh my life would be easier if I could ignore alpha, color space, and go drink margaritas on the beach", the truth is that the code adds a lot of value to the product by providing a lot of features at once.

I just don't want to have to touch it ever again. :-)

Sunday, November 18, 2012

Deferred Weirdness: When Not to be Linear

In my previous post, I described X-Plane 10's multipass scheme for deferred rendering.  Because we need to simultaneously operate in two depth domains with both deferred and forward rendered elements, the multi-pass approach gets very complicated, and much stenciling ensues.

But the real wrench in the machinery is sRGB blending.  It was such a pain in the ass to get right that I can safely say it wouldn't be in the product if there was any way to cheat around it with art.

Color Space Blending

First: when I say "sRGB" blending, what I mean is blending two colors in the sRGB color space.  sRGB is a roughly perceptual color space, meaning equi-distant numeric pixel values appear about equi-distant in brightness - if you saw a color ramp, it would look "even" to you.  Physically, it's anything but even!  The sRGB stripe looks even to humans because we have increased visual sensitivity in the darker brightness ranges. (If we didn't, how could we stumble for the light switch and trip over the dog at night?)

In the sRGB world you mix red (255,0,0) and green (0,255,0) and get "puke" (128,128,0).

Linear color spaces refer to color spaces where the amount of photons (or physical-whatever-thingies) is linear.  Double the RGB values and you get twice as many photons.  Linear spaces do not look "linear" to humans - see above about dark and dogs.

In linear space if you mix red (255,0,0) and green (0,255,0) you get yellow (186,186,0 if we translate back to sRGB space).
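
Here's a tiny worked example of those two mixes, using the common gamma-2.2 approximation of the sRGB curve (the exact sRGB transfer function has a small linear toe and lands closer to 188 than 186); a sketch, not production code:

#include <cmath>
#include <cstdio>

static float srgb_to_linear(float c) { return std::pow(c, 2.2f); }
static float linear_to_srgb(float c) { return std::pow(c, 1.0f / 2.2f); }

int main()
{
    // One pixel has this channel fully on (red's R, green's G), the other fully off.
    float on = 1.0f, off = 0.0f;

    // Mix the encoded values directly: 0.5 -> 128, the "puke" result.
    float srgb_mix = 0.5f * (on + off);

    // Decode, mix, re-encode: 0.5 linear -> ~0.73 encoded -> 186, the yellow result.
    float linear_mix = linear_to_srgb(0.5f * (srgb_to_linear(on) + srgb_to_linear(off)));

    std::printf("sRGB-space mix:   %d\n", int(srgb_mix * 255.0f + 0.5f));    // 128
    std::printf("linear-space mix: %d\n", int(linear_mix * 255.0f + 0.5f));  // 186
    return 0;
}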

When to Be Linear


There are two cases where we found we just absolutely had to be linear:
  1. When accumulating light from deferred lights, the addition must be linear.  Linear accumulated lights look realistic, sRGB accumulated lights look way too stark with no ambiance.
  2. When accumulating light billboards, it turns out that linear also looks correct, and sRGB additive blending screws up the halos drawn into the billboards.
In both cases, linear addition of light gives us something really important: adding more light doesn't double the perceived brightness.  (Try it: go into a room, turn on one light, then a second.  Does the room's brightness appear to double?  No!)  For billboards, if a bunch of billboards "pile up" having the perceptual brightness curve taper off is a big win.
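
To put rough numbers on that, here is the same gamma-2.2 approximation again, with a made-up contribution of 0.25 linear units per light; a sketch only:

#include <cmath>
#include <cstdio>

static float linear_to_srgb(float c) { return std::pow(c, 1.0f / 2.2f); }

int main()
{
    float one_light  = 0.25f;              // hypothetical linear contribution of a single light
    float two_lights = 2.0f * one_light;   // additive accumulation in linear space

    // Encoded (roughly perceptual) values: the second light clearly adds
    // brightness, but nowhere near double - matching what your eyes expect.
    std::printf("one light:  %.0f\n", 255.0f * linear_to_srgb(one_light));    // ~136
    std::printf("two lights: %.0f\n", 255.0f * linear_to_srgb(two_lights));   // ~186
    return 0;
}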

When to Be sRGB

The above "linear" blending cases are all additive.  Positive numbers in the framebuffer represent light, adding to them "makes more light", and we want to add some kind of linear unit for physically correct lighting.

But for more conventional "blending" (e.g. adding light on top takes away light underneath) linear isn't always good.

Traditional alpha blending is an easy way for artists to create effects that would be too complex to simulate directly on the GPU.  For example, the windows of an airplane are a witches' brew of specularity, reflection, refraction, and absorption, and the BRDF changes based on the amount of grime on the various parts of the glass. Or you can just let your artist make a translucent texture and blend it.

But this 'photo composited' style of blending that provides such a useful way to simulate effects also requires sRGB blending, not linear blending.  When an 'overlay' texture is blended linearly, the results are too bright, and it is hard to tell which layer is "on top".  The effect looks a little bit like variable transparency with brightness, but the amount of the effect depends on the background.  It's not what the art guys want.
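
As an illustration of the "too bright" problem, here is a 50%-opaque mid-gray overlay (sRGB 128) composited over black, again with the gamma-2.2 approximation; the values are made up for the example:

#include <cmath>
#include <cstdio>

static float srgb_to_linear(float c) { return std::pow(c, 2.2f); }
static float linear_to_srgb(float c) { return std::pow(c, 1.0f / 2.2f); }

int main()
{
    float over  = 128.0f / 255.0f;   // overlay color, sRGB-encoded
    float under = 0.0f;              // background: black
    float alpha = 0.5f;

    // The "photo composite" the artists expect: blend the encoded values.
    float srgb_blend = alpha * over + (1.0f - alpha) * under;

    // The physically-motivated version: decode, blend, re-encode.
    float linear_blend = linear_to_srgb(alpha * srgb_to_linear(over) +
                                        (1.0f - alpha) * srgb_to_linear(under));

    std::printf("sRGB blend:   %.0f\n", 255.0f * srgb_blend);    // 64
    std::printf("linear blend: %.0f\n", 255.0f * linear_blend);  // ~93 - visibly brighter
    return 0;
}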

Alpha When You Least Expect It


Besides the usual sources of "blended" alpha (blended meaning the background is darkened by the opacity of the foreground) - smoke particles, clouds, glass windows, and light billboards - X-Plane picks up another source of alpha: 3-d objects alpha-fade with distance.  This is hugely problematic because it requires that the alpha blending you get when you put alpha through the deferred renderer be not-totally-broken; we can't just exclude all 3-d from the deferred render.

I mention this now because we have to ask the question: what blending mode are we trying to get in the deferred renderer (to the extent that we have any control over such things)?  The answer here is again sRGB blending.

Faking Alpha in a GBuffer


Can we fake alpha in our deferred renderer?  Sort of.  We can choose to blend (or not blend) various layers of the G-Buffer by writing different alphas to each fragment data component; we can gain more flexibility by using pre-multiplied alpha, which lets us run half the blending equation in the shader (and thus gives us separate coefficients for foreground and background if desired).
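
At the API level, the pre-multiplied setup is just a blend function choice; a minimal sketch, assuming a plain OpenGL framebuffer with multiple color attachments all sharing one blend state:

#include <GL/gl.h>

// With pre-multiplied alpha the shader outputs (rgb * a, a), so the hardware
// blend becomes dst = src.rgb + dst.rgb * (1 - src.a): the shader controls the
// foreground coefficient, the fixed-function blender controls the background
// coefficient, and each G-Buffer attachment can opt in or out of blending
// simply via the alpha it writes (a = 0 keeps the destination, a = 1 replaces it).
void enable_premultiplied_gbuffer_blend()
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
}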

Fake alpha in a G-Buffer has real problems.  We only get one 'sample' per G-Buffer, so instead of getting the blend of the lighting equation applied to a number of points in space, we get a single lighting equation applied to the blend of all of the surface properties at that pixel.  But blending is better than nothing.  We blend our emissive and albedo channels, our specular level, our AO, and our normal.  (Our normals are stored as a Lambert azimuthal equal-area projection, and they blend tolerably if you set your bar low.)

The only channel we don't blend for our blended fragments is eye position Z; even the slightest change to position reconstruction causes shadow maps to alias and fail - hell, shadow maps barely work on a good day.

The G-Buffer blending is all in sRGB - the albedo and emissive layers are 8-bit sRGB encoded.

Adding It All Up


The emission and albedo layers of the G-Buffer must be added in sRGB space.  This is not ideal (because emission layers contain light) but it is necessary.  Consider two layers of polygons being drawn into a G-Buffer.  The bottom is heavy on albedo, the top heavy on emissive texture.  As we "cross-fade" them with alpha, we are actually darkening the albedo and lightening the emission layer - two separate raster ops into two separate images.  This only produces a correct sRGB blend if we know that the two layers will later be added together in sRGB.  In other words:
blend(A_alb + A_lit, B_alb + B_lit, alpha)
is only equal to
blend(A_alb, B_alb, alpha) + blend(A_lit, B_lit, alpha)
if the blending and addition all happen in the same color space. The top equation is how "blended" geometry works in a forward renderer (albedo and emissive light are summed in the shader before being blended into the framebuffer) and the bottom equation is how a deferred renderer works (blending is done per layer and the light addition happens later on the finished blend).
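
A quick numeric check of that claim, with made-up albedo/emissive values for one channel and the usual gamma-2.2 approximation:

#include <cmath>
#include <cstdio>

static float srgb_to_linear(float c) { return std::pow(c, 2.2f); }
static float linear_to_srgb(float c) { return std::pow(c, 1.0f / 2.2f); }

int main()
{
    // Hypothetical sRGB-encoded values for one channel of two layers.
    float A_alb = 0.7f, A_lit = 0.1f;   // bottom layer: heavy on albedo
    float B_alb = 0.1f, B_lit = 0.6f;   // top layer: heavy on emissive
    float alpha = 0.5f;

    auto blend = [alpha](float a, float b) { return alpha * b + (1.0f - alpha) * a; };

    // Forward renderer: sum in the shader, then blend into the framebuffer (all sRGB).
    float forward = blend(A_alb + A_lit, B_alb + B_lit);

    // Deferred renderer: blend each G-Buffer layer (sRGB), then add the layers (sRGB).
    float deferred_srgb_add = blend(A_alb, B_alb) + blend(A_lit, B_lit);

    // Deferred renderer, but with the final add done in linear space instead.
    float deferred_lin_add = linear_to_srgb(srgb_to_linear(blend(A_alb, B_alb)) +
                                            srgb_to_linear(blend(A_lit, B_lit)));

    std::printf("forward (sRGB):       %.3f\n", forward);            // 0.750
    std::printf("deferred, sRGB add:   %.3f\n", deferred_srgb_add);  // 0.750 - matches
    std::printf("deferred, linear add: %.3f\n", deferred_lin_add);   // ~0.515 - differs
    return 0;
}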

Once we add and blend in sRGB space in our deferred renderer a bunch of things do work 'right':
  • Alpha textures that need sRGB blending but can't be excluded from the deferred renderer now work, as do alpha fades with distance.
  • We can mix & match our emissive and albedo channels the way we would in a forward renderer and not be surprised.
  • Additive light from spills is still linear, since it is a separate accumulation into the HDR framebuffer.
There is one casualty:
  • We cannot draw linear additive-blended "stuff" into the deferred renderer.
This last point is a sad casualty - looking at the 11-step train-wreck of MRT changes in the previous post, if we could draw linear-blended stuff into the deferred renderer (even if just by using framebuffer-sRGB) we could save a lot of extra steps.  But we would lose sRGB alpha blending for deferred drawing.
 
I will try to summarize this mess in another post.

Saturday, November 17, 2012

Deferred Weirdness: Collapsing Two Passes

X-Plane's deferred pipeline changed a lot in our 10.10 patch, into a form that I hope is final for the version run, because I don't want to have to retest it again.  We had to fix a few fundamental problems.

Our first problem was to collapse two drawing passes.  X-Plane needs more precision than the Z buffer provides.  Consider the case where you are in an airplane in high orbit, in your 3-d cockpit.  The controls are a lot less than 1 meter away, but the far end of the planet below you might be millions of meters away.  With the near and far clip planes so far apart (and the near clip plane so close) there's no way to avoid Z thrash.

X-Plane traditionally solves this with two-pass rendering.  Because an airplane cockpit is sealed, we can draw the entire outside world in one coordinate space, blast the depth buffer, and then draw the interior cockpit with reset near/far clip planes.  The depth fragments of the cockpit are thus farther than parts of the scenery (in raw hardware depth buffer units) but the depth clear ensures clean ordering.

(This technique breaks down if something from the outside world needs to appear in the cockpit - we do some funny dances as the pilot exits the airplane and walks off around the airport to transition, and you can create rendering errors if you know what to look for and have nothing better to do.)
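
The scheme itself is simple; what follows is a schematic with made-up clip-plane values and hypothetical engine hooks (set_projection, draw_outside_world, draw_cockpit_interior are stand-ins, not X-Plane's real functions):

#include <GL/gl.h>

// Two-pass rendering with a depth clear between the passes.  The world and
// cockpit share the same eye/world space; only the projection's near/far
// planes change, so depth ordering within each pass is fine and the clear
// guarantees the cockpit always wins against the scenery.
void set_projection(double near_clip, double far_clip);
void draw_outside_world();
void draw_cockpit_interior();

void draw_frame_two_pass()
{
    // Pass 1: the outside world, with clip planes wide enough for a planet.
    set_projection(50.0, 5.0e6);
    draw_outside_world();

    // Blast the depth buffer so the interior pass starts clean.
    glClear(GL_DEPTH_BUFFER_BIT);

    // Pass 2: the cockpit, with clip planes tight enough for sub-meter geometry.
    set_projection(0.05, 100.0);
    draw_cockpit_interior();
}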

The Dumb Way


So when we first put in deferred rendering, we did the quickest thing that came to mind: two full deferred rendering passes.

(Driver writers at NVidia: please stop bashing your heads on your desks - we're trying to help you sell GPUs!  :-)

Suffice it to say, two full deferred passes was a bit of a show-stopper; deferred renderers tend to be bandwidth bound, and by consuming twice as much bandwidth as a normal, sane game, we were destined to get half the framerate our users expected.

No Z Tricks


Unfortunately, I didn't find a non-linear Z buffer approach I liked.  Logarithmic Z in the vertex shader clips incorrectly, and any Z re-encoding in the fragment shader bypasses early-Z optimizations.  X-Plane has some meshes with significant over-draw so losing early Z isn't much fun.

Particle + HDR = Fail


There was a second performance problem that tied into the issue of bandwidth: X-Plane's cloud system is heavy on over-draw and really taxes fill rate and ROPs, and in the initial pipeline it went down into an HDR surface, costing 2x the memory bandwidth.  So we needed a solution that would put particle systems into an 8-bit surface if possible.

One Last Chainsaw

One last chainsaw to throw into the mix as we try to juggle them: our engine supports a "post-deferred" pass where alpha, lighting effects, particles, and other G-buffer-unfriendly stuff can live; these effects are forward rendered on top of the fully resolved deferred rendering.  We have these effects both outside of the airplane and inside the airplane!

Frankenstein is Born

The resulting pipeline goes something like this:
  • We have a G-Buffer, HDR buffer, and LDR buffer all of the same size, all sharing a common Z buffer.  The G-Buffer stores depth in eye space in half-float meters, which means we can clear the depth buffer and not lose our G-Buffer resolve.
  • Our interior and exterior coordinate systems are exactly the same except for the near/far clip planes of the projection matrix.  In particular, both the interior and exterior drawing phases are the same in eye space and world space.
  1. We pre-fill our depth buffer with some simple parts of the cockpit, depth-only, with the depth range set to the near clip plane.  This is standard depth pre-fill for speed; because the particle systems in step 4 will be depth tested, this means we can pre-occlude a lot of cloud particles with our cockpit shell.
  2. We render the outside solid world to the G-Buffer.
  3. We draw the volumes of our volumetric heat blur effects, stencil only, to "remember" which pixels are actually exposed (because our depth buffer is going to get its ass kicked later).
  4. We draw the blended post-G-Buffer outside world into the LDR buffer, using correct alpha to get an "overlay" ready for later use.  (To do this, set the alpha blending to src_alpha, 1-src_alpha, 1, 1-src_alpha - see the sketch just after this list.)  This drawing phase has to be early to get correct Z testing against the outside world, and has the side effect of getting our outside-world particles into an LDR surface for performance.
  5. We draw our light billboards to our HDR buffer.
  6. We clear the depth buffer and draw the inside-cockpit solid world over the G-Buffer.  We set stenciling to mark another bit ("inside the cockpit") in this pass.
  7. We draw the heat markers again, using depth-fail to erase the stencil set, thus 'clipping out' heat blur around the solid interior.  This gets us a heat blur stencil mask for later that is correct for both depth buffers.  (Essentially we have used two stenciling paths to 'combine' two depth tests on two depth buffers that were never available at the same time.)
  8.  We go back to our HDR buffer and blit a big black quad where the stencil marks us as "in-cockpit".  This masks out exterior light billboards from step 5 that should have been over-drawn by the solid cockpit (that went into the G-Buffer).  This could be done better with MRT, but would add a lot of complexity to already-complex configurable shaders.
  9. We "mix down" our G-Buffer to our HDR buffer.  Since this is additive, light billboards add up the way we want, in linear space.
  10. We draw another stenciled black quad on our LDR buffer to mask out the particles from step 4.
  11. Finally, we render in-cockpit particles and lights directly into the LDR buffer.
Yeah.  I went through a lot of Scotch this last patch.
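
For step 4, the "overlay-ready" blend state maps directly onto glBlendFuncSeparate; a sketch, assuming a loader that exposes the GL 1.4 entry point:

#include <GL/glew.h>   // or whatever loader you use

// Color blends as usual, but destination alpha accumulates coverage
// (src_a * 1 + dst_a * (1 - src_a)), so the LDR buffer ends up with an
// alpha channel we can use to composite it as an overlay later.
void setup_overlay_blend()
{
    glEnable(GL_BLEND);
    glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA,   // RGB
                        GL_ONE,       GL_ONE_MINUS_SRC_ALPHA);  // alpha
}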

A few observations on the beast:
  • That's a lot of MRT changes, which is by far the weakest aspect of the design.  We don't ever have to multi-pass over any surface except the depth buffer, but we're still jumping around a lot.
  • The actual number of pixels filled is pretty tame.
  • Night lighting is really sensitive to color space, and we picked up a few steps by insisting that we be in exactly the right color space at all times. Often the difference between a good and bad looking light is in the 0-5 range of 8-bit RGB values!  When lights are rendered to a layer and that layer is blended, we have to be blending in linear color space both when we draw our lights and when we composite the layer later!
In particular, there's one really weird bit of fine print: while spill lights accumulate in our HDR buffer linearly (which is a requirement for deferred lighting), pretty much every other blending equation in the deferred engine runs in sRGB space.  That's weird enough that it still surprises me, it makes everything way more complicated than it has to be, and I will describe why we need sRGB blending in the next post.

Friday, November 16, 2012

Deferred Lighting: Stenciling is not a Win

I've been meaning to write up a summary of the changes I made to X-Plane's deferred rendering pipeline for X-Plane 10.10, but each time I go to write up an epic mega-post, I lose steam and end up with another half-written draft, with no clue about what I meant to say.  So in the next few posts I'll try to cover the issues a little bit at a time.

One other note from the pipeline work we did: using the stencil buffer to reject screen space for deferred lights is not a win in X-Plane.

The technique is documented quite a bit in the deferred rendering powerpoints and PDFs.  Basically when drawing your deferred lights you:
  1. Use real 3-d volumetric shapes like pyramids and cubes to bound the lights.
  2. Use two-sided stenciling to mark only the area where there is content within the light volume.  A second pass over the volume then fills only this area.
The stenciling logic is exactly the same as for stencil shadow volumes, and the result is that only pixels within the light volume are lit; screen space both in front of and behind the volume is rejected.
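
For reference, a sketch of the state for those two passes, using depth-fail counting with two-sided stencil ops; real code also has to reset the stencil between lights and handle the camera sitting inside the volume, and the draw calls here are hypothetical stand-ins:

#include <GL/glew.h>   // or any loader that exposes the GL 2.0 stencil entry points

void draw_light_volume_geometry();   // hypothetical: draws the pyramid/cube/etc.
void draw_light_shading_pass();      // hypothetical: runs the lighting shader

void stenciled_deferred_light()
{
    // Pass 1: stencil-only.  Depth-fail counting with two-sided stencil.
    glEnable(GL_STENCIL_TEST);
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDisable(GL_CULL_FACE);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);
    glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_INCR_WRAP, GL_KEEP);  // depth-fail: +1
    glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);  // depth-fail: -1
    draw_light_volume_geometry();

    // Pass 2: shade only where the stencil count is non-zero (content inside the volume).
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glStencilFunc(GL_NOTEQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glDisable(GL_DEPTH_TEST);
    draw_light_shading_pass();

    glDisable(GL_STENCIL_TEST);
    glDepthMask(GL_TRUE);
}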

For X-Plane, it's not worth it.  YMMV, but in our case the costs of (1) using a lot more vertices per light volume and (2) doing two passes over the light volume far outweigh the saved screen space.

For a few very pathological cases, stenciling is a win, but I really found myself having to put the camera in ridiculous places with ridiculously lopsided rendering settings to see a stencil win, even on an older GPU. (I have a Radeon 4870 in my Mac - and if it's not a win there, it's not a win on a GeForce 680. :-)

The cost of volumes is even worse for dynamic lights - our car headlights all cast spill, and the light volume transform is done per-frame on the CPU.  Again, the increased vertex count isn't worth it.

For 10.10 we turned off the stencil optimization, cutting the vertex throughput of lights from two passes to one.

For a future version we will probably switch from volumes to screen-space quads, for a nice big vertex-count win.

Finally, I have looked at using instancing to push light volumes/quads for dynamic objects.  In the case of our cars, we have a relatively small set of cars whose lights are transformed a large number of times.  We could cut eight vertices (two quads per car) down to a single 3x4 affine transform matrix.
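
A sketch of what that instanced setup might look like; the attribute locations, struct layout, and function names here are illustrative, not X-Plane's actual code:

#include <GL/glew.h>   // or any loader that exposes the GL 3.3 instancing entry points

struct LightInstance { float xform[12]; };  // 3 rows of 4 floats: rotation/scale + translation

// The base quad (or volume) lives in its own per-vertex buffer; each instance
// only supplies one 3x4 affine transform, fed as three vec4 attributes that
// advance once per instance instead of once per vertex.
void setup_instanced_light_transforms(GLuint instance_vbo, GLuint base_attr /* e.g. 4 */)
{
    glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
    for (int row = 0; row < 3; ++row)
    {
        GLuint attr = base_attr + row;
        glEnableVertexAttribArray(attr);
        glVertexAttribPointer(attr, 4, GL_FLOAT, GL_FALSE,
                              sizeof(LightInstance),
                              (const void*)(sizeof(float) * 4 * row));
        glVertexAttribDivisor(attr, 1);   // per-instance, not per-vertex
    }
    // Then: glDrawArraysInstanced(GL_TRIANGLES, 0, verts_per_light, num_cars);
}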

Again, YMMV; X-Plane is a very geometry-heavy title with relatively stupid shaders.  If there's one lesson, it's this: it is a huge win to keep instrumentation code in place.  In our case, we had the option to toggle stenciling and view performance (and the effect on our stat counters) at any time.