- They are horribly dependent on the size of the objects in your world. For small objects they produce crisp shadows - for big ones they produce muck.
- Odds are the number of objects floating around (cars, buildings, etc.) is several orders of magnitude larger than the number of CSM layers you might use. I get good results with 8 CSM layers, and can probably reduce that with careful optimization. I usually have more than 8 buildings on screen. (That means a lot of thrash as we create, then use each shadow map, with GL setup each time.)
For each "layer" (that is, a distance-wise partition of the user's frustum that gets a separate shadow map) we normally calculate the shadow map's bounding cube around the corners of the user's view sub-frustum (for that layer).
But that's really a much bigger bounding box than we need. For the price of an iteration over the scene graph* we can calculate a smaller bounding box that is "pinched in" to the edge of the content we need to draw.
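The pinch-in can be sketched roughly as follows. This is a minimal illustration, not the actual engine code: it assumes every box is axis-aligned in the sun camera's space, that the layer box has already been extended toward the sun to catch off-screen shadow casters, and that the scene graph has been flattened into a plain list of object boxes. The names `intersect`, `union`, and `pinched_box` are made up for this example.

```python
from typing import Iterable, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner) in the sun camera's space


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlap of two boxes, or None if they do not overlap."""
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    if any(lo[i] > hi[i] for i in range(3)):
        return None
    return lo, hi


def union(a: Box, b: Box) -> Box:
    """Return the smallest box that encloses both boxes."""
    lo = tuple(min(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(max(a[1][i], b[1][i]) for i in range(3))
    return lo, hi


def pinched_box(layer_box: Box, object_boxes: Iterable[Box]) -> Optional[Box]:
    """Shrink layer_box to the part of it that actually contains content.

    Returns None when no object touches the layer at all.
    """
    result: Optional[Box] = None
    for obj in object_boxes:
        clipped = intersect(layer_box, obj)
        if clipped is None:
            continue  # object lies entirely outside this layer
        result = clipped if result is None else union(result, clipped)
    return result
```

Each object is clipped to the layer before it is merged in, so one huge object can never make the pinched box larger than the original layer box.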
Is this a win? Well, it depends on:
- The size of your scenery content. You won't get a pinch in that's much smaller than the smallest indivisible entities in your world.
- The overall shape of your scenery content - if it really fills the frustum, no win.
- The nearest layers see very little benefit. They are usually so small (a few dozen meters) that the scenery entities they contain are much larger than the layer itself. (Our ground patches can be several kilometers, so no win.) But for the far layers, we might reduce the bounding box by 25-50% in some dimensions...that's almost like doubling your texture resolution!
- The shape of our content is a win. Since the world is sort of flat and 2-dimensional, usually at least one axis of the bounding box (depending on the sun's angle) is horribly wasted. That's where we get a win.
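To put a number on the "doubling your texture" claim, here is a small back-of-the-envelope sketch. The 2048-pixel map size and the 4000 meter extent are assumptions picked for illustration, and both helper functions are hypothetical.

```python
def texels_per_meter(map_size_px: int, box_extent_m: float) -> float:
    """Shadow map resolution along one axis of the bounding box."""
    return map_size_px / box_extent_m


def equivalent_map_size(map_size_px: int, reduction: float) -> float:
    """Map size an un-pinched box would need to match the pinched density.

    reduction is the fraction shaved off one dimension (0.25 means 25%).
    """
    return map_size_px / (1.0 - reduction)


# Assumed example: a 2048-pixel map stretched over a 4000 m wide far layer.
before = texels_per_meter(2048, 4000.0)
after = texels_per_meter(2048, 4000.0 * 0.5)  # box pinched in by 50%
```

With a 50% reduction the density doubles along that axis, which is what a 4096-pixel map would give on the un-pinched box. With a 25% reduction the gain is about a third.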
But in practice, there is no scenery below the ground. So we can "pinch in" the far clip plane of the sun's camera (which is really far from the sun, just in case anything below the surface of the earth is visible), bringing that far clip plane all the way up to the lowest ground point. Similarly, if we're not shadowing clouds (they are handled separately) the near clip plane can be pushed down to the tallest building.
This makes the far layers much, much more useful. Normally if the layer is 10 km away at 60 degrees FOV, the bounding box for the shadow map is going to span on the order of 10000 meters from its near to far plane. If we "pinch in", we can reduce this "depth of field" to the difference between the lowest and highest scenery items, which might only be 100 or 200 meters.
(This is of course a best-case scenario...put a mountain in there and angle the sun a bit and the win is much more modest.)
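The best case and the angled-sun case can be compared with a little arithmetic. The sketch below assumes a z-up world, content described by the size of its axis-aligned bounding box, an orthographic sun camera (so depth is linear), and a 16-bit depth buffer. All of these are assumptions for illustration, and `depth_span` and `depth_step` are hypothetical helpers.

```python
import math
from typing import Sequence


def depth_span(box_size: Sequence[float], sun_dir: Sequence[float]) -> float:
    """Extent of an axis-aligned box measured along the sun direction.

    For an axis-aligned box this is the sum of each side length weighted
    by the absolute value of the matching component of the unit direction.
    """
    length = math.sqrt(sum(c * c for c in sun_dir))
    return sum(abs(c) / length * s for c, s in zip(sun_dir, box_size))


def depth_step(span_m: float, depth_bits: int = 16) -> float:
    """Approximate size in meters of one depth buffer step (linear depth)."""
    return span_m / (2 ** depth_bits - 1)


# Content 10 km x 10 km on the ground, 200 m from lowest to highest point.
content = (10000.0, 10000.0, 200.0)

overhead = depth_span(content, (0.0, 0.0, -1.0))  # sun straight up
tilt = math.radians(30.0)                         # sun 30 degrees off vertical
angled = depth_span(content, (math.sin(tilt), 0.0, -math.cos(tilt)))
```

With the sun directly overhead the span is just the 200 meter height range. Tilt the sun 30 degrees and the horizontal size of the content leaks into the depth axis, pushing the span past 5 km, which is why the win shrinks so quickly.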
As a side effect, the scene graph traversal lets us completely eliminate layers that contain no content - I am finding I drop at least one layer that way.
EDIT: the numbers above are misleading. They hold when only 3-d content (buildings, cars, etc.) is shadowed. But if the terrain mesh is included (it is much larger, and its granularity is coarser), the savings all vanish.
* Lazy evaluation can make this a lot faster - simply skip whole sub-trees of the scene graph that are already entirely inside the "pinched" cube.
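
The lazy traversal from the footnote might look something like this. It is only a sketch under assumptions: a hypothetical `Node` class whose `box` encloses all of its children, axis-aligned boxes in the sun camera's space, and an optional `visited` list that exists purely to show how many leaves were actually examined.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


class Node:
    """Hypothetical scene graph node; box must enclose all children."""

    def __init__(self, box: Box, children: Optional[List["Node"]] = None):
        self.box = box
        self.children = children or []


def intersect(a: Box, b: Box) -> Optional[Box]:
    lo = tuple(max(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(min(a[1][i], b[1][i]) for i in range(3))
    return None if any(lo[i] > hi[i] for i in range(3)) else (lo, hi)


def union(a: Box, b: Box) -> Box:
    lo = tuple(min(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(max(a[1][i], b[1][i]) for i in range(3))
    return lo, hi


def contains(outer: Box, inner: Box) -> bool:
    return all(outer[0][i] <= inner[0][i] and inner[1][i] <= outer[1][i]
               for i in range(3))


def pinch(node: Node, layer_box: Box, current: Optional[Box] = None,
          visited: Optional[List[Node]] = None) -> Optional[Box]:
    """Grow the pinched box over a subtree, skipping what cannot grow it."""
    clipped = intersect(layer_box, node.box)
    if clipped is None:
        return current  # subtree is entirely outside this layer
    if current is not None and contains(current, clipped):
        return current  # subtree is already inside the pinched box: skip it
    if not node.children:
        if visited is not None:
            visited.append(node)
        return clipped if current is None else union(current, clipped)
    for child in node.children:
        current = pinch(child, layer_box, current, visited)
    return current
```

If `pinch` returns `None` for a layer, nothing in the scene touches it and the layer can be dropped entirely. How much the skip saves depends on traversal order, since a big object visited early swallows many later sub-trees.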