In a performance tuning situation, sometimes you have to figure out:
- Is the problem that I'm not doing each operation fast enough or...
- Is the problem that I'm simply doing too many operations?
(Typically I will leave the operation slow while I cut down the number of operations, because having slow primitives makes it easy to observe the effects of improving algorithm time in a profiler. Then once I fix the algorithm, I tune the operation itself, possibly even temporarily disabling the algorithm improvement to make it easier to "see" what's going on when profiling.)
For WED (the X-Plane scenry tool) I decided to "code first, tune later" -- given that
the design could change a lot while in progress, it seemed best to not spend a lot of
time on code that might get thrown out. I knew that the fundamental design (a group of airports, each spatially far apart) could easily be tuned to prevent major O(n) parts just using indexing and culling.
The resulting unoptimized code was, well, simply awful. What really surprised me was how much time was being spent in dynamic_cast - pretty close to all of it.
The solution was mostly to fix stupid code, but in some cases to not use dynamic_cast.
- The biggest single problem was a lack of caching. If we have tree structure, the intermediate nodes can't ask the leaf nodes for information on every function call - the result makse time complexity worse, not better!
- When dealing with a large dataset, pass counts matter. For example, avoiding iterating on the hierarchy of airports when drawing a layer that doesn't care was a big win - it takes a long time just to visit all of that data.
I'm not a big fan of visitor, so my original design simply used dynamic cast within the specific UI construct to figure out which datamodel construct it had. It turns out the performance problem with this isn't that we're double-dispatching, but only how we do it, and for a large number of objects, dynamic_cast is simply too expensive. Providing a virtual function to get some kind of "type info" on the generic object is much faster.
So if there's any lesson on dynamic_cast here, I think it's just "dynamic_cast does some serious work" ... don't use it unless you really have to, and don't use it inside a hot loop, or where N is large.