The distribution of precision in the

default Z buffer configuration is not ideal. Precision is distributed via a 1 / x curve. This means more precision close to the camera, and less precision far away from it (which is good) but the curve goes asymptotic near the front, so it uses up

*too much* precision. If you've had to try to tune your near clip plane, you know that bringing that near clip plane in just a little bit makes things much worse in the distance.

In order to get correct perspective rendering via a view frustum, our projection matrix has to put our eye-space Z coordinate into our clip space W coordinate. But we don't get to pick and choose - perspective divide is part of fixed function hardware, and all of X, Y and Z in clip space are divided by the same W. So if we are dividing by our eye-space Z for our screen location, we

*have* to do so for depth as well. This is what distributes our depth precision.

####
We Have Shaders - We Can Do Better

Now that we have shaders, we have the power to mess with this scheme. Let's say we want our Z buffer to contain an eye-space linear distribution. No problem! At the end of our vertex shader, we just load whatever Z we want, then multiply by our clip-space W to pre-nullify the perspective divide. Perfect!

You can use this to have a

logarithmic Z buffer, or you can use an eye-space Z buffer in floating point and in both cases you get really great distribution of depth precision. You can have a really, really close near clip plane and still have a huge depth range.

####
The Problem With Not Being Linear

All of these custom Z distributions have one thing in common: the clip space Z values distributed across a triangle are

*non-linear*. That is, if we find a point half-way along the edge of a triangle, the Z value that our formula produces will not be the same as the Z value the

*hardware interpolator* produces based on the two triangle corners.

When are you non-linear? Pretty much any time you do anything to your Z value that can't be described as Ax + By + Cz + D. So for example, for eye-space linear, if we need to multiply by eye-space Z

*again*, we're not linear.

The problem with not being linear is that the Z buffers written by the

*middle* of a triangle may be very different from where it would have been if the triangle was tessellated. If you have a large-ish terrain triangle with a small decal like a rock on top of it, the middle of the large triangle might interpolate to be too close (or too far), causing Z errors. If you have a mesh with layers that requires hidden surface removal and it isn't extremely uniform in vertex density, again, interior interpolation error can lead to Z artifacts. Any polygon that intersects the clip planes is also going to go somewhat haywire, as the clipping changes Z mid-triangle

*while the camera moves*.

####
Let's Try The Fragment Shader

The above problems are almost certainly a show-stopper for any engine that shows artist-made 3-d meshes. The solution is to move the custom Z calculation to the fragment shader, so it can be done per pixel. This works great from a correctness standpoint - every pixel results in the right Z according to your encoding, but it has a nasty side effect: you lose early-Z.

Early Z is an optimization where your GPU runs the depth-stencil test

*before* the fragment shader, instead of after. This is a huge win for any scene with significant hidden surfaces; fragment shading costs for hidden surfaces are completely removed, because the hardware doesn't even dispatch the fragments for shading. Without Early Z, your fragment shader runs, fetching textures, burning bandwidth, consuming ALU, and then throws the result out at the very end when it turns out that, oh noes, you've been drawing a secret base that's behind a closed door.

In order for early-Z to be effective, the Z coordinate out of the rasterizer has to be correct - that is, the fragment shader can't modify it. So any scheme that encodes a non-linear depth buffer in fragment shader defeats this.

The really bad news is: you don't just pay once. Now that your Z buffer is encoded with a special encoding, anything that is Z tested (even if it isn't Z-writing) has to calculate Z in the fragment shader as well. So for example, particle systems, clouds, and other fill-rate-intensive effects become more expensive.

In summary: the need to move a custom Z function to the fragment shader is caused by the very thing that gives it better depth distribution, and this defeats early Z. So you can never have early Z and a super depth buffer distribution at the same time.

There is one work-around I have seen that gets around this: use floating point depth and

reverse the near and far encoding values; since floating point has more precision at 0.0, you are using the higher precision of float where we need it (where 1/Z is imprecise).

####
Split Depth for Now

X-Plane, like many games, uses two separate Z environments to cope with a very close view (inside the cockpit, where millimeters matter) and the world (where you can be millions of meters from the horizon). This isn't ideal - the cost of separating the Z passes isn't zero.

I wrote this post up after having worked out the math behind custom Z encodings (log Z, floating point eye-space, etc.); I had a todo item to investigate whether we could move X-Plane to a single Z pass and save some overhead.

The answer is, unfortunately, no for now. Losing early Z on our particle and cloud effects is a non-starter; we'd need to use a floating point depth buffer. Unfortunately,

ARB_clip_control isn't wide-spread enough for us to count on. We'd also eat bandwidth in moving from a D24_S8 integer depth buffer to a D32F_S8 depth buffer (which pads out to 64 bits per depth sample.

One last note: X-Plane uses a floating point channel of the depth buffer to write floating point depth in eye space. While we split the actual Z buffer and draw in two passes, we build a single unified G-Buffer; our eye-space linear floating point depth has enough range to span both passes, and lets us then shade our G-Buffer in a single pass.

(When we run our two depth passes, we change

*only* the near and far clip planes and nothing else; this means we can reuse our shadow maps and share our G-Buffer. This takes most of the cost out of our two passes.)