Wednesday, August 13, 2008

Fixing Shadow Maps

In the old days of stencil shadow volumes, the code to draw the scene might look something like this:
1. For each light source
2.    Clear the stencil buffer
3.    For each shadow caster
4.       Draw its shadow volume
5.    Draw the scene
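For one light of that loop, the stencil state might look roughly like this - a depth-fail sketch in legacy OpenGL, where the two draw_* calls are placeholders for your own drawing code, and production code would use wrapping increments and two-sided stencil rather than drawing the volumes twice:

#include <GL/gl.h>

// App-specific placeholders:
void draw_shadow_volumes(void);            // every caster's extruded volume, for this light
void draw_scene_lit_by_this_light(void);   // additive lighting pass over the scene

void stencil_shadow_pass_for_one_light(void)
{
    glClear(GL_STENCIL_BUFFER_BIT);

    // Build the stencil mask: touch stencil only, not color or depth.
    glEnable(GL_STENCIL_TEST);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_CULL_FACE);
    glDepthMask(GL_FALSE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glStencilFunc(GL_ALWAYS, 0, 0xFF);

    // Back faces: increment where the volume fails the depth test...
    glCullFace(GL_FRONT);
    glStencilOp(GL_KEEP, GL_INCR, GL_KEEP);
    draw_shadow_volumes();

    // ...front faces: decrement where the volume fails the depth test.
    glCullFace(GL_BACK);
    glStencilOp(GL_KEEP, GL_DECR, GL_KEEP);
    draw_shadow_volumes();

    // Light everything that is NOT inside a shadow volume (stencil == 0).
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    glStencilFunc(GL_EQUAL, 0, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    draw_scene_lit_by_this_light();

    glDisable(GL_STENCIL_TEST);
}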
The big problem here is the "shadow volumes" - that's geometry that may be as complex as the scene, but isn't the scene, and changes as the sun moves. We can get clever and try to write a geometry shader, but back in the old days you had to build your shadow volumes yourself on the CPU. Ouch.

Another problem with stenciled shadow volumes is that "fake" geometry shaped by alpha transparency is ignored - the volume comes from the mesh, not the texture.

Shadow Maps

Shadow maps bake the scene's depth, as seen from the light source's point of view, into a texture. But we have two ways to code this, both pretty painful:
1. For each light source
2.    Draw the scene (depth only) from the light's perspective
3. For the scene itself
4.    Draw the scene with a shader that knows about all the shadow maps
Of course, while that looks nice and fast, that shader has to do one shadow calculation per light source.

An alternative is to build the scene in passes:
1. For each light source
2.    Draw the scene (depth only) from the light's perspective
3.    Draw the scene with a shader to accumulate that light's contribution
The first option uses a huge shader and a lot of VRAM; the second one hits the scene graph a lot. The second case might be a win if we can use our scene graph culling code to draw only the part of the scene that could be affected by the light (assuming the culling works that well).
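A rough sketch of that second, per-light variant (legacy OpenGL, one depth-only FBO reused per light). LightInfo and the set_* / render_* calls are hypothetical stand-ins for the app's own scene-graph code, and an ambient/depth pre-pass is assumed to have filled the main framebuffer first:

#include <GL/glew.h>
#include <vector>

struct LightInfo;   // hypothetical: light direction/position, view + projection, etc.

// App-specific stand-ins, not a real API:
void set_light_view_projection(const LightInfo& L);
void render_scene_depth_only(const LightInfo& L);                 // ideally culled to the light
void render_scene_lit_by(const LightInfo& L, GLuint shadow_tex);  // camera view, shadow lookup in shader

void accumulate_shadowed_lights(const std::vector<LightInfo>& lights,
                                GLuint shadow_fbo, GLuint shadow_tex)
{
    for (size_t i = 0; i < lights.size(); ++i)
    {
        // Pass 1: depth-only render from the light's point of view.
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, shadow_fbo);
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);   // skip color write-out
        glClear(GL_DEPTH_BUFFER_BIT);
        set_light_view_projection(lights[i]);
        render_scene_depth_only(lights[i]);
        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);

        // Pass 2: re-draw the scene, adding this one light's contribution.
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);     // accumulate on top of previous lights
        glDepthFunc(GL_LEQUAL);          // we are re-drawing over existing depth
        render_scene_lit_by(lights[i], shadow_tex);
        glDisable(GL_BLEND);
    }
}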

It should be noted that for one directional light (e.g. the sun) the shadow map case is great: we need one texture, one extra pass, and we automatically get shadows on everything. Transparent geometry is not shadowed, and we don't have to ask "how do I make a stencil volume" for everything that might show up in the scene graph. Render time for the shadow pass can be very fast because we can skip texturing and color write-out.

The Problem With Shadow Maps

The problem with shadow maps is shadow resolution. You lose resolution in 3 dimensions: the width/height of the shadow map (which shows up as big blocky pixels in the shadow) and "depth" - if the precision of depth values isn't good enough, "close calls" between the occluder and occludee are not resolved correctly due to rounding errors.

Depth precision is less of a factor now that modern hardware supports formats like R32F (a single 32-bit floating point channel), but horizontal resolution is limited by texture size; even a 2048x2048 texture doesn't provide much detail over more than about 500 meters of "world" - that's roughly 24 cm per texel. For a flight simulator, that's not enough.

To make matters worse, the shadow texture's position is based on the light source, not the viewer's camera. This means that often the most detailed part of the shadow map is somewhere useless (like far away), while the ugly, crude part is right in front of the user. There are a number of algorithms, like trapezoidal shadow maps (TSM), that try to reshape the shadow map to use its pixels better, but if your scene graph is huge, this isn't an improvement of the right order of magnitude.

Splitting the Lights

One way to solve the resolution problem is to break the light source into multiple "lights", each shadowing a different part of the scene. (Think of a directional light as really bounded by a cube, because this is the area the shadow map casts a shadow for. We are making many small cubes that cover the volume of the original light cube, which had to be big enough to cover the whole scene.)

This approach gives us a potentially arbitrary improvement in resolution, but it puts us in the "multiple lights" scenario - two passes over (at least part of) the scene graph per light. If you have efficient culling there might be a win, but remember that shadows are cast: a small light volume can still end up shadowing a large area at some sun angles.
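If it helps to picture the "small cubes", each one is just an orthographic light camera fitted around a sub-volume. A sketch, where the sub-cube is described by a hypothetical center and radius:

#include <GL/gl.h>
#include <GL/glu.h>

// Fit an orthographic light camera around one shadow sub-volume.
// sub_center / sub_radius describe the (hypothetical) sub-cube; sun_dir is
// the normalized direction from the scene toward the sun.
void setup_light_camera_for_subvolume(const float sub_center[3],
                                      float       sub_radius,
                                      const float sun_dir[3])
{
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    // The box must cover the sub-cube (far = 2 * radius from the eye point
    // below), and the near plane is pushed way back so that casters between
    // the sub-cube and the sun still land in the depth map.
    glOrtho(-sub_radius, sub_radius,
            -sub_radius, sub_radius,
            -10.0 * sub_radius, 2.0 * sub_radius);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(sub_center[0] + sun_dir[0] * sub_radius,   // eye: nudged toward the sun
              sub_center[1] + sun_dir[1] * sub_radius,
              sub_center[2] + sun_dir[2] * sub_radius,
              sub_center[0], sub_center[1], sub_center[2],
              0.0, 1.0, 0.0);   // pick a different up vector if the sun is straight overhead
}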

Stenciled Shadow Maps With GBuffers

I'm not sure if anyone has done this before - for all I know it is a known production technique and I just haven't found the paper. The technique I describe here uses a G-Buffer to save on iterations over the scene graph, and the stencil buffer to save on shadow map memory. If you already have a G-Buffer for deferred shading, this works well.

What Is a G-Buffer

A G-Buffer is a series of frame buffers which encode everything you could want to know about a given pixel. Typically for a given pixel X, at a minimum you would know the XYZ position in eye space, the normal vector, and the unlit ("raw") color of the source point on the source geometry that filled pixel X.

In other words, a G-Buffer is sort of a spatial index, giving you constant time access to your scene graph, given screen-coordinate X & Y as an input. With deferred shading, you can now create light effects - for a given screen-space pixel, you know what a given light would have done to that pixel, because you still have its normal, position, original color, etc.
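To make "a series of frame buffers" concrete, here is one plausible minimal layout - eye-space position, eye-space normal, and raw albedo as three color attachments on a single FBO (GL 2.x with EXT_framebuffer_object; the formats and the split are just one choice, not necessarily X-Plane's):

#include <GL/glew.h>

// Create a bare-bones G-Buffer: eye-space position, eye-space normal, and
// unlit ("raw") albedo, each in its own render target, plus a depth buffer.
// Returns the FBO; the three textures come back through out_tex[3].
GLuint create_gbuffer(int width, int height, GLuint out_tex[3])
{
    const GLenum formats[3] = { GL_RGBA32F_ARB,   // eye-space position
                                GL_RGBA16F_ARB,   // eye-space normal
                                GL_RGBA8 };       // unlit albedo
    glGenTextures(3, out_tex);
    for (int i = 0; i < 3; ++i)
    {
        glBindTexture(GL_TEXTURE_2D, out_tex[i]);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, formats[i], width, height, 0,
                     GL_RGBA, GL_FLOAT, NULL);
    }

    GLuint depth_rb, fbo;
    glGenRenderbuffersEXT(1, &depth_rb);
    glBindRenderbufferEXT(GL_RENDERBUFFER_EXT, depth_rb);
    glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH_COMPONENT24, width, height);

    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    for (int i = 0; i < 3; ++i)
        glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT + i,
                                  GL_TEXTURE_2D, out_tex[i], 0);
    glFramebufferRenderbufferEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                                 GL_RENDERBUFFER_EXT, depth_rb);

    // The scene-fill shader writes to all three targets at once.
    const GLenum bufs[3] = { GL_COLOR_ATTACHMENT0_EXT,
                             GL_COLOR_ATTACHMENT1_EXT,
                             GL_COLOR_ATTACHMENT2_EXT };
    glDrawBuffers(3, bufs);

    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    return fbo;
}

The point is simply that one geometry pass fills every attachment at once; the exact channel packing is up to you.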

G-Buffering is a win for lighting because we can radically reduce the number of pixels for which per-pixel lighting is applied.
  • Because we will apply lights later in screen space (by reading the G-Buffer to get the normals, positions, etc. of the pixels involved), there is never any overdraw. If your scene was created by drawing the screen 4x over, the G-Buffer contains only the final result - the pixels that actually showed up - so 4x overdraw becomes a 4x reduction in lighting calculations.
  • If a light has limited scope (e.g. it has a tight directional focus or is attenuated), you only have to apply the lighting calculation to the small set of pixels that it could affect. With traditional lights, every pixel in the scene must consider every light (or you must turn the lights off and on very carefully). With G-Buffering, the "second pass" of accumulating lighting can be done for small screen areas by drawing a trivial bounding volume for the light (see the sketch after this list). This means that small lights are much, much cheaper in terms of pixel fill rate.
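A sketch of what that second-pass state might look like, assuming a deferred-lighting GLSL program (not shown) that reads the G-Buffer textures; bind_deferred_light_shader and draw_light_bounding_volume are hypothetical hooks, not a real API:

#include <GL/glew.h>

struct LightInfo;   // hypothetical light description

// App-specific placeholders: a GLSL program that reads the G-Buffer and
// computes one light's contribution, and a crude bounding volume (sphere,
// cone, or screen-aligned quad) that covers every pixel the light can reach.
void bind_deferred_light_shader(const LightInfo& L, GLuint gbuf_tex[3]);
void draw_light_bounding_volume(const LightInfo& L);

void accumulate_one_light(const LightInfo& L, GLuint gbuf_tex[3])
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);   // add this light to what is already on screen
    glDepthMask(GL_FALSE);         // depth is already final in the G-Buffer
    glDisable(GL_DEPTH_TEST);      // the bounding volume is not real geometry

    bind_deferred_light_shader(L, gbuf_tex);
    draw_light_bounding_volume(L); // only these pixels run the lighting shader

    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}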
Be warned that G-Buffering also has three weaknesses that you have to eat just to get started:
  • To draw the scene itself, you have to fill a huge amount of data - perhaps 12 to 16 floating point values per pixel. That's a much higher fill rate to the framebuffer than you'd have with conventional texturing (or even HDR). So you'll pay a price just to have a G-Buffer. You only win if you save enough lighting calculations later to justify the G-Buffer now.
  • FSAA is not an option; the G-Buffer needs exact values from precisely one scene graph element for each pixel. So you may need to render at a higher res (ouch) or apply post-processing to smooth out geometry. (Post processing is recommended in GPU Gems 2.)
  • Alpha-blending works, um, badly. Basically you only get the final values of the front-most pixel. So you will have to compromise and accept a hack...for example, the front-most pixel's lighting calculation applies to the blend of the texture colors of all contributing pixels. If this hack ruins your content, your only alternative is to fully render the scene before blending, then do a second pass - at that point the G-Buffer is probably not a win.
G-Buffering and Shadow Maps

G-Buffering works well with shadow maps; you can "project" a shadow map onto a scene using only the contents of the G-Buffer, not the original scene itself. (To project the shadow you really only need each pixel's position, which is in the G-Buffer.)

But what about shadow map resolution? We can use the stencil buffer to accumulate the shape of the shadow cast by each shadowing element in the scene graph, and then bake the light. The full rendering loop would look like this:
1. Draw the entire scene to the G-Buffer
2. For each light source
3.    Clear the stencil buffer
4.    For each shadow-casting object (for this source)
5.       Draw shadow map for this object
6.       Project shadow map into stencil buffer using G-Buffer
7.    Accumulate contribution of light, using stencil buffer
Note that for the sun-light case, our time complexity is relatively similar to the original shadow-map algorithm: step 5 in the worst case involves drawing a shadow map for the entire scene. (I'm waving my hands here - we'll be filling a lot more pixels, because we've got a full-sized shadow map for each individual element. It's the equivalent of being allowed to have a truly enormous shadow map.)

But what's interesting about this is that the costs of these steps aren't necessarily as high as you'd think in terms of fill rate (a rough state sketch follows this list):
  • Step 5 (drawing the shadow map for one object) can use as many or as few pixels as we want -- we can choose our shadow map size based on the position of the object in eye space! That is, we can finally spend our shadow fill rate budget on things that are close to us. Since each object gets its own shadow map, our max resolution can be relatively high.
  • Step 6 requires us to draw some kind of "shadow casting volume" onto the screen - that is, to cover the pixels that could be shadowed by a given occluder. This has the potential to be a lot less than the entire screen, if we can draw relatively sane volumes. (But the only penalty for a cruder, larger volume is a few more shader ops.)
  • Step 7 requires us to burn in the light, which (as with all G-Buffering) only requires touching the pixels that the light could illuminate; so we get a win for lights with limited range or tight focuses.
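Putting steps 3 through 7 together, the control flow and stencil state might look roughly like this. The shadow-map pass, the shader that compares G-Buffer positions against the map, and the light-accumulation pass are all app-specific placeholders here, not real API:

#include <GL/glew.h>
#include <vector>

struct LightInfo;     // hypothetical: per-light view/projection, color, etc.
struct CasterInfo;    // hypothetical: per-object bounds, distance to camera, etc.

// App-specific placeholders.  render_object_shadow_map is assumed to restore
// the main framebuffer and viewport before returning.
void render_object_shadow_map(const LightInfo& L, const CasterInfo& C,
                              GLuint shadow_fbo, int resolution);        // step 5
void bind_shadow_projection_shader(const LightInfo& L, const CasterInfo& C,
                                   GLuint shadow_tex, GLuint gbuf_pos_tex);
void draw_caster_screen_volume(const CasterInfo& C);   // covers pixels this object could shadow
void accumulate_light(const LightInfo& L, GLuint gbuf_tex[3]);           // step 7

void stenciled_shadow_maps(const std::vector<LightInfo>&  lights,
                           const std::vector<CasterInfo>& casters,
                           GLuint shadow_fbo, GLuint shadow_tex, GLuint gbuf_tex[3])
{
    // Step 1 (filling the G-Buffer) is assumed to have happened already.
    for (size_t l = 0; l < lights.size(); ++l)                   // step 2
    {
        glClear(GL_STENCIL_BUFFER_BIT);                          // step 3
        glEnable(GL_STENCIL_TEST);

        for (size_t c = 0; c < casters.size(); ++c)              // step 4
        {
            // Step 5: a private shadow map, sized by how close the caster is.
            render_object_shadow_map(lights[l], casters[c], shadow_fbo, 1024);

            // Step 6: mark shadowed pixels.  The shader reads the eye-space
            // position from the G-Buffer, transforms it into light space,
            // compares against the shadow map, and discards fragments that
            // are NOT in shadow; the survivors write 1 into the stencil.
            glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
            glStencilFunc(GL_ALWAYS, 1, 0xFF);
            glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
            bind_shadow_projection_shader(lights[l], casters[c], shadow_tex, gbuf_tex[0]);
            draw_caster_screen_volume(casters[c]);
            glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        }

        // Step 7: add this light's contribution only where the stencil is still 0.
        glStencilFunc(GL_EQUAL, 0, 0xFF);
        glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
        accumulate_light(lights[l], gbuf_tex);

        glDisable(GL_STENCIL_TEST);
    }
}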
This algorithm has some nice scalability properties too:
  • Because we know where in the final scene's eye space we are working, we can tune quality on a per-shadow basis to spend our computation budget where we need it. (E.g. turn off PCF for far-away shadows, use smaller maps, don't shadow at all.)
  • We can even make content-level decisions about shadows (e.g. shadow only buildings).
  • The algorithm trades quality for time. Traditional shadow maps are limited by maximum texture size, which is a painful way to scale. This lets us simply trade a faster CPU/GPU combo for better results, with no maximum limit.
Future Parallelization

If we ever get GPUs that can dispatch multiple command streams, the inner loop (4, 5, 6) could be parallelized. Essentially you'd have a few shadow map textures, and you'd pipeline step 5 across multiple cores, feeding the results to step 6. So a few GPUs would be prepping shadow maps while one baked them into the stencil buffer. This would let us scale up the number of shadow-casting elements.

Monday, August 11, 2008

Perspective Correct Texturing in OpenGL

99.9% of the time, you don't have to worry about perspective correct texturing in OpenGL. If you draw a textured rectangle, it will look correct from any angle. But there are a few cases where you need to manage perspective yourself.

I blogged here on what perspective vs. non-perspective texturing looks like when a rectangle is deformed into another shape. That post doesn't mention a minor detail: OpenGL will never give you that middle image - instead what you get is something like this. (What you see is the two triangles that OpenGL decomposes the quad into.)

Creating Perspective with the Q Coordinate

If you can easily identify the amount of "squishing" you have applied to one end of a quad, you can use the "Q" coordinate to produce perspective correct textures. We do this in X-Plane: the snow effect is a cylinder that wraps around the plane. In order to ensure that the entire screen is covered, we close up one end to make a cone in some cases. When we do this, the quads that make up the sides of the cylinder become trapezoids, then triangles. By using the Q coordinate technique, we keep our texture perspective correct.

There are two caveats of this technique:
  1. You can't really make the texture go to infinite distance (that would be a Q coordinate of zero). If you do, all texture coordinates are zero and some of your texture information is lost. To hack around this, we clamp to a very small minimum Q. Don't tell anyone!
  2. This technique will produce perspective changes in both directions of the texture! This is the "correct" perspective thing to do, but it may be surprising. In particular, as we make the far end of our cylinder into a cone, that end visually stretches out to be much farther away, so the texture is stretched along the length of the cylinder, making snowflakes that are close to us very tall and ones that are far away very short.
Why does this technique work? Well, basically texture coordinates are 4-component vectors: the S & T coordinates you are used to get divided by the Q coordinate (the 4th component). A smaller Q makes the divided S & T effectively larger, which means more texture packed into a small area, which reads as "farther away".

But the real magic is in how interpolation happens. When you specify a per-vertex variable that is "varying", it is interpolated across the polygon. In our case, the Q coordinate is interpolated, creating a pattern of shrinking Q's that stretches the texture dynamically. The Y=1/X shape of this curve creates "perspective".
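Here's what that looks like in immediate-mode OpenGL for a single trapezoid, where q is the ratio of the squished (far) edge to the full (near) edge:

#include <GL/gl.h>

// Draw a trapezoid whose far (top) edge is q times as wide as the near
// (bottom) edge, with perspective correct texturing.  The trick: scale S, T
// *and* Q by q at the squished end; the per-fragment divide by Q undoes the
// scale, while the interpolation of Q supplies the 1/x "perspective" curve.
// q must stay greater than zero (see caveat 1 above).
void draw_trapezoid_with_perspective(float q)
{
    glBegin(GL_QUADS);

    // Near (full-width) edge: ordinary texture coordinates, Q = 1.
    glTexCoord4f(0.0f,     0.0f,     0.0f, 1.0f);  glVertex2f(-1.0f, -1.0f);
    glTexCoord4f(1.0f,     0.0f,     0.0f, 1.0f);  glVertex2f( 1.0f, -1.0f);

    // Far (squished) edge: S, T and Q all multiplied by q.
    glTexCoord4f(1.0f * q, 1.0f * q, 0.0f, q);     glVertex2f( q,    1.0f);
    glTexCoord4f(0.0f * q, 1.0f * q, 0.0f, q);     glVertex2f(-q,    1.0f);

    glEnd();
}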

Creating Perspective Via Matrices

This snippet of code is the nuclear weapon of perspective. You simply give it a trapezoid (four corners) in 2-d and it calculates a matrix that applies the texture in a manner that looks like the appropriate perspective. In other words, it's effectively calculating the vanishing points.

We use this in X-Plane for the skewed instruments - when the user drags the instrument corners, we simply calculate a matrix that creates the correct perspective.

The snippet of code contains comments with the derivation. One interesting property: given 8 unknowns in a 2-d perspective matrix and 8 constraints (the four 2-d corners), there can exist only one perspective matrix for any given trapezoid.

One warning if you decide to use this with OpenGL: make sure your computed transform matrix doesn't produce negative W coordinates! (To do this, simply apply the resulting matrix to one of your input corners and check the resulting W. If it is negative, negate the entire matrix.)

I use LAPACK to solve the matrix in my version of this code, and LAPACK doesn't make a lot of guarantees for the "constant" term (a constant factor applied to all matrix values). If that constant is negative, you will get output coordinates where X, Y, Z and W are all negative - the perspective division would cancel this negation, but at least on the machines I've used, the negative "W" causes anarchy long before we get that far.
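The linked snippet isn't reproduced here, but for reference, the unit-square-to-quad matrix can also be written in closed form (this follows Heckbert's standard derivation rather than a LAPACK solve); corner 0 maps to (0,0), corner 1 to (1,0), corner 2 to (1,1), and corner 3 to (0,1):

#include <math.h>

// Compute the 3x3 homography H (row major) that maps the unit square to the
// quad (x[i], y[i]).  A point (u,v) maps to:
//    x = (H[0]*u + H[1]*v + H[2]) / (H[6]*u + H[7]*v + H[8])
//    y = (H[3]*u + H[4]*v + H[5]) / (H[6]*u + H[7]*v + H[8])
// Returns false if the quad is degenerate.
bool square_to_quad(const double x[4], const double y[4], double H[9])
{
    double dx1 = x[1] - x[2], dx2 = x[3] - x[2], dx3 = x[0] - x[1] + x[2] - x[3];
    double dy1 = y[1] - y[2], dy2 = y[3] - y[2], dy3 = y[0] - y[1] + y[2] - y[3];

    if (fabs(dx3) < 1e-12 && fabs(dy3) < 1e-12)
    {
        // The quad is a parallelogram: the mapping is affine.
        H[0] = x[1] - x[0];  H[1] = x[2] - x[1];  H[2] = x[0];
        H[3] = y[1] - y[0];  H[4] = y[2] - y[1];  H[5] = y[0];
        H[6] = 0.0;          H[7] = 0.0;          H[8] = 1.0;
    }
    else
    {
        double det = dx1 * dy2 - dx2 * dy1;
        if (fabs(det) < 1e-12) return false;     // degenerate quad
        double g = (dx3 * dy2 - dx2 * dy3) / det;
        double h = (dx1 * dy3 - dx3 * dy1) / det;
        H[0] = x[1] - x[0] + g * x[1];  H[1] = x[3] - x[0] + h * x[3];  H[2] = x[0];
        H[3] = y[1] - y[0] + g * y[1];  H[4] = y[3] - y[0] + h * y[3];  H[5] = y[0];
        H[6] = g;                       H[7] = h;                       H[8] = 1.0;
    }
    return true;
}

For texturing you typically want the inverse of H (mapping the quad back to the unit square); either way, the denominator row is what ends up driving the Q/W divide discussed below.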

The reason you can use a projection matrix anywhere in OpenGL is that projection fundamentally relies on storing information in the "W" coordinate (the 4th coordinate) to be divided later - the "W" just gets carried along for the ride and then later produces the correct perspective effect.