Ever wonder how the STL list works in GCC?
What? No? You haven't?
Where's your sense of curiosity about the way your code works?
What? You have a life?!? What are you reading this blog for then?
Anyway, it's pretty simple. It's a doubly-linked list, where the node has a previous and next pointer. The head and tail of the list are linked together, and the list itself keeps a pointer to the head node.
This means that begin() and end() are fast.
Unfortunately, the list doesn't keep a size counter. This means that size() is O(n). So you can find the back of the list easily but getting its index is expensive.
That's probably fine, as turning the index back into an iterator is O(n) and the iterators are stable. By comparison though, CodeWarrior's list caches the number of items.
Moral of the story: empty() is more efficient than size()==0, and, um, know what's in your STL.
(Insert rant about STL being like sausage here...)
Friday, June 15, 2007
Thursday, June 14, 2007
Virtual Memory Dumping for Fun and Profit
We're trying to answer the question: why did X-Plane run out of memory? To answer that, we need to look at the contents of virtual memory. Now dumping virtual memory isn't so bad.
On Windows you call VirtualQuery with a base address. Start at 0 and increment by the returned RegionSize. When you get to dwTotalVirtual (as returned by GlobalMemoryStatus) that's a good time to stop.
On the Mac call vm_region, starting at 0 and incrementing by "size" until it returns an error code. One tricky thing: vm_region will skip over unused virtual memory. To "see" these holes (this is address space that can be used later) compare the address you pass in to the one that's returned. If you pass in a pointer that's in an unused region, the address will be advanced to the next region.
What if you want to show the DLLs/dylibs that are mapped into a given region? I make no promise for the performance of this method but...
On Windows, use GetMappedFileName. Pass in your process handle and a base address from VirtualQuery and it fills in a buffer with a DLL name if possible. This is in psapi.dll so it isn't available on non-NT-derived Windows.
On Mac, use dladdr, passing in the base address of the region. You'll get a dylib file path and a symbol name, although the symbol name tends to be junk.
Enum tags - C vs C++
So you're minding your own business when you come across a gem like this:
typedef enum __MIDL___MIDL_itf_qnetwork_0082_0001 {
AM_EXSEEK_CANSEEK = 1,
AM_EXSEEK_CANSCAN = 2
} AMExtendedSeekingCapabilities;
It doesn't take much to bring out the inherent beauty of C...it rolls off the tongue like the name of a fine perfume. Anyway, this is legal in both C and C++, but the rules are a little bit different.
First of all, this is a typedef statement. __MIDL___MIDL_itf_qnetwork_0082_0001 is the enum tag and AMExtendedSeekingCapabilities is the defined type. (I know, those aren't the real C++ spec names...)
A few things are clear:
- In C or C++ we can assign the enumerated values (AM_EXSEEK_CANSEEK, AM_EXSEEK_CANSCAN) to an integral type, and we'll get the values 1 and 2.
- I have no idea how big one of these enums is...check your local compiler settings. It may be that this enum only takes up 8 bits or it may have been allocated a full 32 bits. (We use ints for our enums in the X-Plane SDK to be sure how big they are. We lose type safety but see below.)
typedef enum tag { a, b } def;
enum tag var1; // legal - enum keyword and tag go together.
tag var2; // illegal - tag requires the enum keyword.
enum def var3; // illegal - enum keyword cannot be used with a typedef.
def var4; // legal - typedef is usable on its own.
By comparison, C++ accepts the bare tag on its own -- in C++ an enum name is a type name in its own right, so tag var2; compiles. (Strictly speaking, enum def var3; is still ill-formed in C++ too, since an elaborated-type-specifier can't name a typedef, though some compilers are lenient about it.) So you can see C++ code like this:
enum letters {
a, b, c, d
};
letters my_var = a;
This won't work in C.
But C++ is more strict than C when it comes to typing. Given our above example, we can do this in C:
def my_var = 5; // Illegal in C++
In C++ you'll get an error -- you can assign an enum to an int but not vice versa.
Tuesday, June 12, 2007
Dynamic Cast is Slow but...
...stupid code is slower.
In a performance tuning situation, sometimes you have to figure out:
- Is the problem that I'm not doing each operation fast enough or...
- Is the problem that I'm simply doing too many operations?
(Typically I will leave the operation slow while I cut down the number of operations, because having slow primitives makes it easy to observe the effects of improving algorithm time in a profiler. Then once I fix the algorithm, I tune the operation itself, possibly even temporarily disabling the algorithm improvement to make it easier to "see" what's going on when profiling.)
For WED (the X-Plane scenery tool) I decided to "code first, tune later" -- given that the design could change a lot while in progress, it seemed best not to spend a lot of time on code that might get thrown out. I knew that the fundamental design (a group of airports, each spatially far apart) could easily be tuned to prevent major O(n) parts just using indexing and culling.
The resulting unoptimized code was, well, simply awful. What really surprised me was how much time was being spent in dynamic_cast - pretty close to all of it.
The solution was mostly to fix stupid code, but in some cases to not use dynamic_cast.
- The biggest single problem was a lack of caching. If we have a tree structure, the intermediate nodes can't ask the leaf nodes for information on every function call - that makes time complexity worse, not better!
- When dealing with a large dataset, pass counts matter. For example, avoiding iterating on the hierarchy of airports when drawing a layer that doesn't care was a big win - it takes a long time just to visit all of that data.
I'm not a big fan of visitor, so my original design simply used dynamic cast within the specific UI construct to figure out which datamodel construct it had. It turns out the performance problem with this isn't that we're double-dispatching, but only how we do it, and for a large number of objects, dynamic_cast is simply too expensive. Providing a virtual function to get some kind of "type info" on the generic object is much faster.
So if there's any lesson on dynamic_cast here, I think it's just "dynamic_cast does some serious work" ... don't use it unless you really have to, and don't use it inside a hot loop, or where N is large.
Monday, June 11, 2007
Cause and Effect
I have a tree-view (think like the tree of files in an Explorer or a Finder window). Since all of my rows are based on objects with integer IDs, I have something like this:
map<int, bool> mOpen;
bool Is_Open(int id)
{
map<int, bool>::iterator i = mOpen.find(id);
return (i == mOpen.end()) ? true : i->second;
}
Note that if we don't have an ID in the map, we pretend its value is true. This "defaults" the hierarchy to open whenever we come upon a new ID - similarly "clearing" the map opens everybody.
So then I write this clever toggle routine:
void Toggle_Open(int id)
{
mOpen[id] = !Is_Open(id);
}
And it doesn't work. Why not? Well, it turns out that the l-value mOpen[id] is evaluated before the right-side expression. Thus we have already called mOpen[id] before we ever call Is_Open.
Here's the subtle bug: calling the array operator on a map creates a default entry (by the default ctor of the value type) if there is no entry for the key. So mOpen[id] on an id that isn't present in the map inserts a default 0 into the map.
Now Is_Open was modified to return "true" for unknown IDs, but this code is making the ID known and false.
The result was: the first time the user clicks on an open triangle, it stays open. This is because it is being set to closed by map::operator[] and then opened.
The correct code is
void Toggle_Open(int id)
{
bool old = Is_Open(id);
mOpen[id] = !old;
}
This forces the current open state to be evaluated before the array operator can make a default entry.
To quote from chapter 5 of the C++ spec:
"Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions and the order in which the side effects take place, is unspecified."
I knew the specification was good for something. :-)
Interfaces and Diamonds
I have a set of abstract interfaces that go something like this:
class IPoint {
public:
virtual void SetLocation(const Point2& location)=0;
};
class IPoint_Heading : public virtual IPoint {
public:
virtual void SetHeading(double heading)=0;
};
I derive my point-heading from the point because it makes the interface more useful for clients. Because ALL point-headings MUST BE points, clients can simply set the location of a point-heading and use it like a point without having to do dynamic casting and failure-checking. So that seemed good.
Why is it virtual? Well, here's why:
class WED_Point : public virtual IPoint {
public:
virtual void SetLocation(const Point2& p);
};
class WED_Point_Heading : public WED_Point {
public:
virtual void SetHeading(double heading);
};
Basically I inherit the implementation of my point into my point-with-heading to avoid recoding points. If IPoint isn't a virtual base class, then the implementation of SetLocation doesn't apply to WED_Point_Heading.
Now before I go on, let's examine that fact in more detail. It is a surprising detail. You might think: "well, WED_Point_Heading derives from WED_Point, which implements SetLocation -- what else would I need?" That's what I thought until I tried it, and then, when GCC rejected it, read the C++ spec.
The problem is simple: given a derived class with two bases (B1 and B2), a virtual function in B1 cannot provide an override for B2. (I don't see any C++ implementation reason why the language couldn't work that way, e.g. it is possible to create this structure in memory, but my guess is that the ensuing chaos of mixing in two unrelated bases and having one change the guts of the other would be worse than what we have now, by a lot.)
The reason that the use of IPoint as a virtual base solves this problem is because WED_Point_Heading only has one IPoint. The compiler merges the IPoint sub-object and thus WED_Point's overrides apply everywhere, because they're not being copied from one base to another, they're being applied to one virtual base and there is only one virtual base.
Now I tend to say that if you have an inheritance diamond you've probably already screwed up, and this case is no exception. The real problem here is the use of inheritance of implementation to get code reuse. The real way to unravel this is:
- Define two non-virtual, non-polymorphic implementation classes (point-guts, heading-guts) that provide the true run-time implementations of the "SetLocation" code and "SetHeading" code.
- Derive WED_Point_Heading only from IPoint_Heading.
- Use those same guts in WED_Point and WED_Point_Heading. Because the common implementation has been factored out, the only "code duplication" is the calling of the common implementation. I would say this is not even code duplication, but code disambiguation (a careful description of how we want to map our interface hierarchy onto a set of reusable implementation pieces).