Tuesday, January 15, 2008

Burned by the STL - again!

I have blogged about this before, but it just burned me again.

Vector is usually very fast, but it also makes it really easy to write something really slow.

How about this?
struct forest_goo {
vector stuff;
vector > more_stuff;
};
vector many_forests;

How fast is this?

for(vector::iterator i = many_forests.begin(); i != many_forests.end();)
if(okay_forest(*i)) ++i;
else i = many_forests.erase(i);

Here's a hint: not very. The problem is the mid-vector erase. Vector is a bit stupid - when you erase a vector item, it copies every one of them backward by one spot to close the hole. But copying a vector means allocating memory for the vector (and for a vector of vectors it's worse). So in our case erasing one forest_goo item mid-vector is horribly expensive as long as there is a lot more forest goo after it.

The solution is two-part:
  1. Specialize std::swap() for the forest_goo struct so that you can swap two of them and have them change places without allocating memory. (On most STLs, swap() of vectors is already specialized, so you just have to tell the compiler to swap all the pieces.)
  2. Rewrite the algorithm to swap the to-be-killed forest_goo structs to the end of the vector. The swap is a no-mem-alloc operation. When done, you can then chop the entire end off of the array with clear, which won't require moving forest_goo structs around.

No comments:

Post a Comment