The idea of instancing is to pay once in CPU time for the entire set of instances that share the same art assets, rather than paying once per instance. Since CPU time is often the limiting factor on rendering, this lets us get a lot closer to the geometry throughput the card is capable of.
Now the simplest way to draw a lot of stuff is something like this:
Set up all GL state first
For each instance
This is instancing by matrix transforms, and it is faster than you'd think. (That is, I am surprised that the matrix transforms aren't more expensive.) But we are still paying per instance.
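A minimal C sketch of the matrix-transform loop follows. The loop body is not spelled out here, so the push/multiply/draw/pop sequence is an assumption, as are the function name `draw_instances_matrix_stack`, the `USE_REAL_GL` switch, and the 16-bit index type. Without `USE_REAL_GL`, the GL calls are replaced by counting stand-ins so the sketch can be compiled and run without a GL context.

```c
#include <stddef.h>

#ifdef USE_REAL_GL
#include <GL/gl.h>              /* <OpenGL/gl.h> on OS X */
#else
/* Counting stand-ins (assumption, for illustration only): they mirror the
   real GL signatures but just record how often they are called. */
typedef unsigned int GLenum;
typedef int          GLsizei;
typedef float        GLfloat;
#define GL_TRIANGLES      0x0004
#define GL_UNSIGNED_SHORT 0x1403
static int g_matrix_calls = 0;
static int g_draw_calls   = 0;
static void glPushMatrix(void) { }
static void glPopMatrix(void)  { }
static void glMultMatrixf(const GLfloat *m) { (void)m; ++g_matrix_calls; }
static void glDrawElements(GLenum mode, GLsizei count, GLenum type,
                           const void *indices)
{
    (void)mode; (void)count; (void)type; (void)indices;
    ++g_draw_calls;
}
#endif

/* Draws one mesh instance_count times. `transforms` holds one column-major
   4x4 matrix (16 floats) per instance. All other state (shader, textures,
   vertex and index buffer bindings) must already be set by the caller;
   nothing but the matrix changes inside the loop. */
static void draw_instances_matrix_stack(const GLfloat *transforms,
                                        size_t instance_count,
                                        GLsizei index_count)
{
    size_t i;
    for (i = 0; i < instance_count; ++i) {
        glPushMatrix();
        glMultMatrixf(transforms + 16 * i);   /* per-instance transform */
        /* Offset 0 into the currently bound element array buffer. */
        glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT,
                       (const void *)0);
        glPopMatrix();
    }
}
```

The per-instance cost here is one matrix update plus one draw call, which is exactly the "paying per instance" part.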
Before continuing, I cannot say this strongly enough: no state change inside the for loop!!! I say this because if you allow state change but try to minimize it, it can sneak up on you. I went to inventory X-Plane's objects for "inner loop state change", expecting to find about 20% of our urban-area art assets doing it. As it turns out, it's more like 80%. We're losing 30% or more of our fps to this state change.
Immediate Mode Instancing
Immediate mode instancing goes something like this:
Set up all OpenGL state
For each instance
    glVertexAttrib (to set matrix transform)
In conjunction, your vertex shader decodes the matrix from the vertex attributes it is passed down in.
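Here is a hedged C sketch of the immediate mode version. It assumes the transform is a column-major 4x4 matrix passed as a `mat4` attribute (which occupies four consecutive attribute locations, one per column), and that a draw call follows the attribute update inside the loop. The names `draw_instances_immediate`, `instance_matrix`, `k_instance_vs` and the `USE_REAL_GL` switch are hypothetical; without `USE_REAL_GL` the GL calls are counting stand-ins so the sketch runs without a GL context.

```c
#include <stddef.h>

#ifdef USE_REAL_GL
#include <GL/gl.h>   /* must expose GL 2.0 entry points; <OpenGL/gl.h> on OS X */
#else
/* Counting stand-ins (assumption, for illustration only). */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
typedef int          GLsizei;
typedef float        GLfloat;
#define GL_TRIANGLES      0x0004
#define GL_UNSIGNED_SHORT 0x1403
static int    g_attrib_calls = 0;
static int    g_draw_calls   = 0;
static GLuint g_last_attrib_index = 0;
static void glVertexAttrib4fv(GLuint index, const GLfloat *v)
{
    (void)v;
    g_last_attrib_index = index;
    ++g_attrib_calls;
}
static void glDrawElements(GLenum mode, GLsizei count, GLenum type,
                           const void *indices)
{
    (void)mode; (void)count; (void)type; (void)indices;
    ++g_draw_calls;
}
#endif

/* Hypothetical vertex shader (GLSL 1.20): the per-instance matrix arrives
   as a mat4 attribute instead of through the built-in matrix stack. */
static const char *k_instance_vs =
    "#version 120\n"
    "attribute mat4 instance_matrix;\n"
    "void main() {\n"
    "    gl_Position = gl_ModelViewProjectionMatrix\n"
    "                * instance_matrix * gl_Vertex;\n"
    "}\n";

/* `matrix_loc` is the location of the first column of instance_matrix,
   e.g. as returned by glGetAttribLocation. The attribute arrays for these
   locations must be disabled so the "current" values set here apply to
   every vertex of the draw. */
static void draw_instances_immediate(GLuint matrix_loc,
                                     const GLfloat *transforms,
                                     size_t instance_count,
                                     GLsizei index_count)
{
    size_t i;
    GLuint c;
    for (i = 0; i < instance_count; ++i) {
        const GLfloat *m = transforms + 16 * i;
        for (c = 0; c < 4; ++c)
            glVertexAttrib4fv(matrix_loc + c, m + 4 * c);  /* one column */
        glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT,
                       (const void *)0);
    }
}
```

The `transforms` array is read with the same layout a per-instance attribute array would have, so the same data could feed either path.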
It turns out that this code is at least 30% faster than matrix-transform instancing on OS X. And in hindsight this shouldn't be surprising. I would expect the built-in matrix stack to be uniform state, as it isn't expected to change per vertex. And I would expect updating attribute state to be faster than updating uniform state.
It may also be that the GL has to do legacy processing with matrices (such as computing inverse matrices) that can be avoided.
Why would you use immediate mode instancing instead of real hardware instancing? Well, if you are on an OS that, despite the availability of hardware instancing for years, doesn't provide the extension, immediate mode instancing provides a useful half-way point.
(In particular, you can pull your immediate mode instance values directly out of the instance array you would have used for hardware instancing.)