Tuesday, February 24, 2009

Uber-Shaders: Evolution or Optimization

Let's just imagine that you have an uber-shader, and an uber-interface for it.  The uber-shader can do about 100 cool shading tricks, and is set with a struct like this:
struct shader {
 int tex_mode;
 int tex_ref;
 int want_shadows;
 int want_emissive_tex;
 int lit_ref;
 /* ... */
};
A single function "setup" takes a shader struct and sets all of the OpenGL parameters to make it happen.  This function knows what the GLSL code looks like and does the right thing.

This design has been a win for us with X-Plane because:
  • The encapsulated setup function can deal with hw-specific issues.  For example, if you can approximate the shader state request using the fixed-function pipeline on old hardware, this gets hidden in "setup" and client code doesn't care.
  • Since you have access to all state at the same time, you can do things like pick from a set of customized shaders based on state combinations.  (In other words, you can create a large number of highly optimized shaders for specific cases.)
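
As a rough sketch of both points, here is one way a setup function could choose between a fixed-function fallback and a specialized shader variant. None of this is X-Plane's actual code: the g_has_glsl flag, the variant_key helper, the program-id scheme, and the meaning assigned to each struct field are assumptions made for illustration, and setup returns an id instead of making OpenGL calls.

```c
#include <assert.h>
#include <stddef.h>

/* Fields copied from the post's struct; their meanings are assumed. */
struct shader {
    int tex_mode;
    int tex_ref;
    int want_shadows;
    int want_emissive_tex;
    int lit_ref;
};

/* Hypothetical capability flag: 0 simulates an old card without GLSL. */
static int g_has_glsl = 1;

/* Program id 0 stands for the fixed-function approximation. */
enum { PROGRAM_FIXED_FUNCTION = 0 };

/* Pack the on/off options into a bit mask. Each distinct mask would map
 * to one specialized GLSL program in a real renderer. */
static unsigned variant_key(const struct shader *s)
{
    unsigned key = 0;
    if (s->tex_mode != 0)     key |= 1u << 0;
    if (s->want_shadows)      key |= 1u << 1;
    if (s->want_emissive_tex) key |= 1u << 2;
    return key;
}

/* Stand-in for setup(): returns the id of the program it would bind.
 * Client code never sees which path was taken. */
static unsigned setup(const struct shader *s)
{
    assert(s != NULL);
    if (!g_has_glsl)
        return PROGRAM_FIXED_FUNCTION;  /* approximate on old hardware */
    return 1u + variant_key(s);         /* ids 1..8 are GLSL variants */
}
```

Because the whole request is visible at once, the variant is picked from the combination of options rather than one option at a time.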

What do you do if you need to change one parameter of the shader?  The naive answer is to tear down the OpenGL state, change the request, then build it up again.
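
A minimal sketch of that naive sequence, assuming a reduced request struct and a simulated GL context that just counts state changes. The want_blend field, the reset and change_texture_naive names, and the counter are hypothetical; only the reset/change/setup order comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced request: tex_ref is from the post, want_blend is hypothetical. */
struct shader {
    int tex_ref;
    int want_blend;
};

/* Simulated GL context; g_gl_calls counts the state changes issued. */
static int g_program, g_tex, g_blend;
static int g_gl_calls;

/* Build up all state for the request (3 simulated GL calls). */
static void setup(const struct shader *s)
{
    assert(s != NULL);
    g_program = 1;             g_gl_calls++;  /* like glUseProgram */
    g_tex     = s->tex_ref;    g_gl_calls++;  /* like glBindTexture */
    g_blend   = s->want_blend; g_gl_calls++;  /* like glEnable/glDisable */
}

/* Tear everything back down to defaults (3 more simulated calls). */
static void reset(void)
{
    g_program = 0;
    g_tex = 0;
    g_blend = 0;
    g_gl_calls += 3;
}

/* Naive parameter change: tear down, edit the request, rebuild. */
static void change_texture_naive(struct shader *s, int new_tex)
{
    assert(s != NULL);
    reset();
    s->tex_ref = new_tex;
    setup(s);
}
```

In this simulation a one-field change costs six state changes, even though only the texture binding ends up different.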

Well, that seems inefficient, doesn't it?  What if there is a fast path?  (For example, if all you are changing is polygon offset, all you really need to do is call glPolygonOffset.)

One extension to the uber-shader interface is a series of 'evolution' APIs, each of which changes a single parameter (for example, the texture) of a shader that is already set up.  Naively this is equivalent to the reset/change/setup sequence above, but the implementation might do something clever, like only rebind the texture unit but leave the shader object alone.
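
Here is a sketch of what one evolution call might look like, again against a simulated GL context that counts state changes. The name shader_set_texture, the want_blend field, and the counter are made up for illustration; the only idea taken from the text is that the call rebinds the texture and leaves everything else alone.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced request: tex_ref is from the post, want_blend is hypothetical. */
struct shader {
    int tex_ref;
    int want_blend;
};

/* Simulated GL context; g_gl_calls counts the state changes issued. */
static int g_program, g_tex, g_blend;
static int g_gl_calls;

/* Full setup: program, texture, and blend state (3 simulated calls). */
static void setup(const struct shader *s)
{
    assert(s != NULL);
    g_program = 1;             g_gl_calls++;
    g_tex     = s->tex_ref;    g_gl_calls++;
    g_blend   = s->want_blend; g_gl_calls++;
}

/* Hypothetical evolution API. Assumes s is the shader currently set up.
 * Only the texture binding is touched; program and blend stay as-is. */
static void shader_set_texture(struct shader *s, int new_tex)
{
    assert(s != NULL);
    if (s->tex_ref == new_tex)
        return;                 /* nothing to do */
    s->tex_ref = new_tex;
    g_tex = new_tex;
    g_gl_calls++;               /* like a single glBindTexture */
}
```

Note that the function trusts the caller about the current state, which is exactly the assumption that gets broken once something else rewrites the request.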

Is this a win?  It seems reasonable to hope so.  For example, if the state being changed is effectively a uniform passed to the GPU (or GPU state not related to shading), we might be a lot closer to minimal state change.


What happens when your shader gets really big and complex?  One problem is that the logic in client code that sets up the shader gets big and complex.  For example: if the source texture has no alpha channel and there is no overlay texture, you can disable alpha blending.  Disabling alpha blending might be a huge performance win - maybe your app is bottlenecked on raster ops.  But having this logic everywhere in the client code isn't good - it means that you're not sure that you have ideal optimization at every shader point.

One way around this is to write an optimization function as part of the uber-shader code.  The optimizer goes through all the requested shader state and "harmonizes" it.  Because the optimizer is part of the shader setup code, the knowledge of how the shader really works is now isolated to the one place in the app that should know such things.  Now you can put fairly complex logic in place to detect fast paths and take them every time.
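
A small sketch of such an optimizer, covering only the alpha-blending rule described above. The overlay_tex_ref and want_blend fields, the texture table, and the function names are assumptions for illustration, not the real X-Plane interface.

```c
#include <assert.h>

struct shader {
    int tex_ref;          /* from the post; 0 is assumed to mean "none" */
    int overlay_tex_ref;  /* hypothetical; 0 = no overlay texture */
    int want_blend;       /* hypothetical; the caller's blending request */
};

/* Hypothetical texture table: entry N says whether texture N has alpha. */
static const int k_tex_has_alpha[] = { 0, 0, 1, 0 };
enum { TEX_COUNT = sizeof k_tex_has_alpha / sizeof k_tex_has_alpha[0] };

static int tex_has_alpha(int tex_ref)
{
    assert(tex_ref >= 0 && tex_ref < TEX_COUNT);
    return k_tex_has_alpha[tex_ref];
}

/* Returns a "harmonized" copy of the request. The caller's original is
 * taken by value, so it is never modified. */
static struct shader optimize(struct shader req)
{
    /* No alpha in the source texture and no overlay: blending is a no-op,
     * so turn it off and save the raster work. */
    if (req.want_blend && req.overlay_tex_ref == 0 &&
        !tex_has_alpha(req.tex_ref))
        req.want_blend = 0;
    return req;
}
```

Client code keeps asking for blending whenever it might matter, and the one function that knows the texture formats decides whether it actually does.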

Clash of Optimizations

The problem with optimization vs. evolution is they don't play nice together.  The evolution functions assume that you know the start state of your shader before you change it.  But the optimization API might have changed your shader in an unexpected way.  For example:
  • You set up a shader with a texture and blending.
  • You run the optimizer on it.  The optimizer turns off blending because the texture doesn't actually have an alpha channel.
  • You run the evolution API to change to a texture that does have an alpha channel.
At this point you're screwed: blending has been turned off and is gone.
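
The three steps above can be reproduced in a few lines. This is a sketch under assumptions: the want_blend field, the texture table, and the optimize and evolve_texture functions are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct shader {
    int tex_ref;     /* from the post */
    int want_blend;  /* hypothetical */
};

/* Hypothetical texture table: texture 1 has no alpha, texture 2 does. */
static const int k_tex_has_alpha[] = { 0, 0, 1 };
enum { TEX_COUNT = sizeof k_tex_has_alpha / sizeof k_tex_has_alpha[0] };

static int tex_has_alpha(int tex_ref)
{
    assert(tex_ref >= 0 && tex_ref < TEX_COUNT);
    return k_tex_has_alpha[tex_ref];
}

/* Optimizer: drop blending when the texture has no alpha channel. */
static struct shader optimize(struct shader req)
{
    if (req.want_blend && !tex_has_alpha(req.tex_ref))
        req.want_blend = 0;
    return req;
}

/* Evolution API: changes the texture and nothing else. It cannot know
 * that blending was requested originally and later optimized away. */
static void evolve_texture(struct shader *live, int new_tex)
{
    assert(live != NULL);
    live->tex_ref = new_tex;
}

/* Runs the three steps and returns the final blend flag. */
static int blend_after_optimize_then_evolve(void)
{
    struct shader request = { 1, 1 };         /* texture + blending */
    struct shader live = optimize(request);   /* blending turned off */
    evolve_texture(&live, 2);                 /* alpha texture arrives */
    return live.want_blend;                   /* still off: the bug */
}
```

Re-running optimize on the original request with the new texture would keep blending on, which is the approach the rest of the post takes.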

My solution to this is a bit crude but goes like this:
  • There are no evolution APIs.
  • Changing state requires changing the original shader and re-optimizing.
  • Inside the shader, all state changes are lazy and tracked (e.g. we only change GL state if we really need to).
  • We never reset state while in the middle of shader ops.
So in the above case what's going to happen is:
  1. We calculate the optimal shader for filling.
  2. When we go to change state, the "reset" of the shader actually does nothing.
  3. We calculate a new optimal shader.
  4. When we go to set up that new shader, almost all of the GL state change is a no-op.  In particular, if we could have "evolved" (e.g. really we only need to change the texture), that's all we will really change.
This design isn't perfect - it's burning CPU to calculate ideal GL state change at runtime rather than compile time.  That's the down-side.  The up-side is that we get optimal GL state under all conditions.
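
To make the scheme concrete, here is a minimal sketch of lazy, tracked state changes combined with re-optimization. Everything here is an assumption for illustration: the cache struct, the set_* helpers, the texture table, and the call counter simulate OpenGL rather than call it, and a real state tracker would cover far more state.

```c
#include <assert.h>
#include <stddef.h>

struct shader {
    int tex_ref;     /* from the post */
    int want_blend;  /* hypothetical */
};

/* Hypothetical texture table: only texture 2 has an alpha channel. */
static const int k_tex_has_alpha[] = { 0, 0, 1, 0 };

static struct shader optimize(struct shader req)
{
    if (req.want_blend && !k_tex_has_alpha[req.tex_ref])
        req.want_blend = 0;
    return req;
}

/* Shadow copy of GL state; -1 means "unknown, must be set". */
static struct { int program, tex, blend; } g_cache = { -1, -1, -1 };
static int g_gl_calls;  /* counts simulated GL state changes */

/* Lazy setters: touch "GL" only when the value really changes. */
static void set_program(int p) { if (g_cache.program != p) { g_cache.program = p; g_gl_calls++; } }
static void set_tex(int t)     { if (g_cache.tex != t)     { g_cache.tex = t;     g_gl_calls++; } }
static void set_blend(int b)   { if (g_cache.blend != b)   { g_cache.blend = b;   g_gl_calls++; } }

/* Deferred reset: deliberately leaves GL state alone, so the next setup
 * can diff against whatever is already bound. */
static void reset(void) { }

/* Always start from the caller's original request, never from a
 * previously optimized copy. */
static void setup(const struct shader *request)
{
    assert(request != NULL);
    struct shader opt = optimize(*request);
    set_program(1);
    set_tex(opt.tex_ref);
    set_blend(opt.want_blend);
}
```

With this in place the client writes the naive reset/change/setup sequence, and the tracked setters reduce it to the minimal change. Swapping between two textures without alpha costs one simulated call; swapping to a texture with alpha costs two, because blending comes back on.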


  1. Cool, that is an interesting idea. How many lines is your uber-shader? Do you run into limits on older hardware for maximum instructions etc?

  2. If you've got an uber-shader system in place, I thought the simplest but most efficient way is:
    + Make all shader + shader parameter permutations offline, and store each combination as a new compiled program. You could either validate or optimize the resulting programs offline, for example put unit tests on blend availability for particular programs.
    + At runtime, bind the whole program (compiled shader + parameter block) as a unit.
    + But bind through a custom state manager whose point is to accept new state(s) and cache the change(s). It also drops the changed state(s) down to the GPU as a state block (group) when the actual DIP command arrives.

    Does it seem similar to your post? :)

  3. Chris: not sure, but the shaders are in the X-Plane distro in source. We do run into limitations - you'll see #defines where we eliminate some shading details for the R300. This is _despite_ pre-compiling the combinations. (That is, we overflow the R300 simply by doing too much math and texture fetching.)

    Troika: our way is similar to yours - the main difference being that we don't precompute the combinations - rather we generate them on demand.

    You are right that what we miss is a chance for static offline optimization. Unfortunately we don't have a tool that would automate that...and we have a very large number of shaders. So the only optimizations we get are ones coded into the "master source" (which gets really fugly really quick) and whatever the optimizing compiler can do.

    Fortunately compiling technology is getting better. :-)

    We do something pretty similar to the custom state manager, which abstracts whole shader-block changes vs. uniform changes, and attempts to touch minimum GL state. We don't need to be lazy about pushing state in the state manager because X-Plane already manages this very carefully at the scene-graph level from our old fixed-function days.