Usually I post here after concluding something, often after a discussion. But this post is the part that comes first - grappling with the underlying issues.
Unfortunately a number of my software engineering books are temporarily inaccessible. We packed our (unsorted) library when we moved a few months ago, and because we are redoing a large chunk of the house, everything is still in boxes. As if having to search linearly for any book wasn't demoralizing enough to a programmer, some of the boxes are on top of each other, making whole ranges very difficult to get to. My spirit is broken by O(N).
Anyway, the issue is: given a series of interfaces to a conceptual hierarchy and an implementation hierarchy, how do you implement this in C++? The following gets very ugly very fast:
class IPoint {
public:
    virtual Point2 Get()=0;
};
class IBezierPoint : public IPoint {
public:
    virtual Vector2 GetCtl()=0;
};
class WED_Point : public IPoint {
public:
    virtual Point2 Get() { return internal_pt; }   // the base implementation
private:
    Point2 internal_pt;
};
If you've done this kind of design before, you can see how we're about to go straight off the cliff...
class WED_BezierPoint : public, um, um.
The problem is that if I inherit from my base implementation (WED_Point) and my full supported interface (IBezierPoint) I pick up two instances of the interface IPoint. This will not make C++ happy and is almost surely not what we want. (The result will be a bunch of "method not overridden" warnings and probably a lot of incorrect polymorphic behavior.)
There are only three ways to solve this in C++ and they all suck in different ways:
1. Don't inherit implementation. This is what I would tell anyone else - inheriting implementation is evil, bla bla bla. In other words, WED_BezierPoint derives only from IBezierPoint and reimplements storage for its base point. Because WED_BezierPoint supports the IPoint interface (by way of IBezierPoint) clients still get the appearance of polymorphism via the interface, which is what we really care about.
The only problem with this is that we end up recoding a lot of implementation. Inheritance of implementation is almost always a lousy idea, but in this case I can't help but think that it's giving us something similar to database normalization. Still, one could definitely argue that all algorithm code should be working with IBezierPoint or IPoint and thus the repeat-implementation is really a non-issue. So breaking the implementation hierarchy might be the least-skanky thing to do.
1a. It should be noted that we can make an implementation hierarchy and contain it in classes derived from the interfaces but not each other. This is a "bridge" pattern and will get us out of trouble in a lot of different design cases.
2. Don't inherit interface. In other words, IBezierPoint doesn't derive from IPoint. This solves the problem, but with a rather weird result: IBezierPoints might not be points at all.
You can argue that since these interfaces are meant to be run-time cast and checked anyway, this is okay, but I think we lose something here. It would be nice to require that all bezier points be points - that is what we mean, so having the compiler require it is nice. I'd have to categorize this fix as skanky.
(It should be noted that having independent interfaces is a good thing a lot of the time. Just maybe not when a very clear IS-A relationship exists.)
3. Make the interfaces virtual base classes. Do two C++ wrongs make a right? Probably not, but for the sake of argument, if all interfaces are virtual base classes then we can have the IPoint interface from two places at once.
One warning about this: if you make your interfaces virtual you need to subclass from them virtually in two places - both when one interface is derived from another and when a base implementation class derives from an interface. (The implementation derivation can be non-virtual - as long as all but one relationship is virtual, C++ will figure out what we mean, at least in CodeWarrior where I tested this.)
I suppose it should be noted that, as gross as using virtual base classes is, if you derive your C++ interfaces from some kind of common casting base (read: IUnknown) it has to be a virtual base class anyway.
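For what it's worth, here is a minimal sketch of option 3, with every interface derivation made virtual. The Point2 and Vector2 structs, the constructors, the internal_ctl member and the const qualifiers are my own assumptions to make the sketch self-contained; they are not the real WED declarations.

```cpp
struct Point2  { double x, y; };     // stand-in for the real Point2
struct Vector2 { double dx, dy; };   // stand-in for the real Vector2

class IPoint {
public:
    virtual ~IPoint() {}
    virtual Point2 Get() const = 0;
};

// Interface derived from interface: virtual.
class IBezierPoint : public virtual IPoint {
public:
    virtual Vector2 GetCtl() const = 0;
};

// Base implementation derived from interface: virtual as well.
class WED_Point : public virtual IPoint {
public:
    explicit WED_Point(Point2 p) : internal_pt(p) {}
    virtual Point2 Get() const { return internal_pt; }
private:
    Point2 internal_pt;
};

// There is exactly one IPoint subobject here, and WED_Point::Get()
// is its only overrider, so the class is concrete and unambiguous.
class WED_BezierPoint : public WED_Point, public IBezierPoint {
public:
    WED_BezierPoint(Point2 p, Vector2 c) : WED_Point(p), internal_ctl(c) {}
    virtual Vector2 GetCtl() const { return internal_ctl; }
private:
    Vector2 internal_ctl;   // hypothetical storage for the control handle
};
```

With this layout a WED_BezierPoint converts to IPoint* by either path and lands on the same subobject. Some compilers may still warn that Get() is inherited "via dominance"; in this sketch that is exactly the behavior we are asking for.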
Tuesday, February 20, 2007
Saturday, February 17, 2007
Prototypes - Your Annoying Friend
Depending on your C++ compiler, when you start a project, C++ may require a prototype declaration for functions or not.
You really want prototypes! Here's why:
According to some byzantine aspect of C/C++, if you call a function that hasn't been declared and your compiler isn't requiring prototypes, then the compiler will assume that the function (a) exists and (b) takes arguments that are exactly like what you are passing!
Now if the function doesn't exist, you don't need prototypes to help you - you'll get a link error later on saying "we never found this function". Prototypes are still better because it doesn't take a full build to find out.
But where things really get ugly is if the function does exist, but the arguments you passed in rely on an implicit conversion. For example, imagine if your function goes like this:
void some_happy_func(float n);
Now you call it like this:
some_happy_func(2.0);
Without prototypes, this probably will cause a lot of pain. Why? Well, 2.0 is a double-precision argument! If some_happy_func's prototype exists, the compiler will create an implicit conversion. But if it doesn't, the compiler will assume some_happy_func takes doubles, and generate an incorrect stack (with a double on it). When some_happy_func actually runs, it will interpret the low 32 bits of that double as a float and all hell will break loose.
In my experience, when prototype requirements are off it's really easy to lose track of whether you've included a needed header or not, causing possibly subtle bugs where functions fail sometimes based on the calling code's headers. If the failure causes a hard-to-detect bug it can take a while to unravel this.
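Standard C++ won't let you call an undeclared function, so the failure itself can't be reproduced directly in a portable example. What can be shown is the data the callee would be looking at in the scenario above. This is only a sketch: low_bits_as_float is a hypothetical helper, it assumes IEEE 754 floats and doubles, and what a real callee sees depends on the platform's calling convention.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes 64-bit IEEE 754 doubles");
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes 32-bit IEEE 754 floats");

// Reinterpret the low 32 bits of a double's bit pattern as a float,
// which is roughly what a callee expecting a float would do if the
// caller had pushed a double instead.
float low_bits_as_float(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);            // raw bit pattern of the double
    std::uint32_t low = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
    float f;
    std::memcpy(&f, &low, sizeof f);                // same bits, viewed as a float
    return f;
}
```

For 2.0 the low 32 bits of the double are all zero, so low_bits_as_float(2.0) is 0.0f: the function quietly receives zero instead of two, with no crash to point you at the missing header.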
Monday, January 29, 2007
Why Not Binary Blobs?
I am working on the data model for WorldEditor, the X-Plane graphical scenery editor. At risk of "life blogging" the development, the design decisions for WED illustrate some design ideas.
WED uses C++ objects to represent the user's data in memory (the "internal data model"). I'll comment at another time on why I made this decision. On disk, however, WED uses an SQLite database file. That's another blog post too.
So one must ask the question, why do we need an on-disk data-model at all? Why not just dump out the C++ object contents to a file?
One might say "because you can't write STL classes to disk verbatim due to their internal pointers and private structure." But...WED uses an object-based undo system that requires each object to know how to serialize itself to a buffer...this means that we've already written serialization code for all of those STL structures.
It would make development faster to just reuse the object serialization code, but the result would be a file format that is a side-effect of the implementation code. This isn't good if:
- You want to edit the data from another application without having code interdependencies or
- You want to refactor the code (which would cause object layout to change) or
- You want to read a subset of the data. (The in-memory structure is, well, in-memory, so it assumes you have access to everything.)
In the short term you save time on writing file I/O code, but as soon as you change the object format you must write new code to read the old file format, so you pay the "cost" of that code eventually -- but you must write this code against a file format that wasn't really "designed" at all.
In particular with WED, we want the file format to be stable and low-change over a long period of time, because the kind of data that might be in a WED file can be useful over a relatively long lifetime.
Given this, I am writing an explicit file format up-front rather than use the object serialization mechanism.
Thursday, January 25, 2007
Windows Vista - Stable API
I'm not a huge fan of Microsoft or Windows Vista, and I do all my primary development on a Mac. But...
The same X-Plane code that runs on Windows Vista will run on Windows 98 SE.
In that time Apple has changed the API (introducing Carbon), the ABI twice (moving from CFM to Mach-O, and GCC3 to 4*) and the instruction set (PPC to Intel), as well as the compiler twice (MPW to X-Code, or more likely Metrowerks to X-Code, and GCC 3 to 4).
I love Apple dearly, but if I write an app and burn the source code, I wouldn't be surprised if the binaries run on Windows 5 years from now. I wouldn't make that prediction for the Mac.
*Is it fair to call the upgrade from GCC 3 to 4 an ABI change or compiler change? Not really, they're not full ones, but besides the usual slight changes, the real problem is that GCC 4 generates run-time dependencies on shared libraries that were not available in older Mac Operating Systems. This is why X-Plane is not alone in requiring OS X 10.3.9 or higher - 10.3.9 is the oldest operating system that has the runtime for GCC 4. Applications that can run all the way back to OS X 10.2 are actually building their PowerPC executable code against a different deployment target, which requires extra makefile gymnastics.
Wednesday, January 24, 2007
Inheritance of Implementation is Evil
If you go to your first software engineering job interview fresh out of college, they might ask you: what are the three tenets of object-oriented programming? They're hoping you'll barf out "encapsulation, polymorphism, and inheritance" or something like that.
Of those, inheritance is probably the least important. They'll then ask why inheritance is a good thing, quite possibly hoping to hear those two dreaded words: "code reuse".
No!!!!!!!! Run for your lives! The sixty foot tall abominable derived classes are coming!
Inheritance of implementation happens any time you derive from a class that does things, and then try to change what that class does slightly by overriding part of its implementation. I would describe this as the code equivalent of an organ transplant - let's just rip out the pancreas, put a new one in (maybe it's not even the same species) and hope it all plays nice together.
I have come to the conclusion that inheritance of implementation is almost always a bad thing. I'm not saying never use it - there's no rule that always holds in software engineering. (Ha, sort that one out.) But in the case of inheritance of implementation, I think that, like virtual base classes, inheritance of implementation should be a red flag and cause for pause.
The problem with inheritance of implementation is that it makes it really easy to violate encapsulation, which is the most important thing in OOP. Parent-child classes make a difficult context to manage customization.
If you are going to do it, consider a known design pattern like "template method". Template method at least formalizes the relationship between parent and child class that can otherwise get out of hand.
The temptation for inheritance of implementation is strong - when you've got an is-a relationship and the behavior seems to be driven by that relationship, how can you not want to "inherit" the behavior? But is-a describes a publicly described interface - the implementation code may have seams that aren't related to the is-a relationship. Pulling out pieces of implementation along these lines is asking for trouble.
The cure for inheritance of implementation is a "bridge" pattern - and I would go as far as to say there's no need to worry about whether the implementation is in a class hierarchy (use it only if it's useful) - the important thing is to make sure the implementation is designed based on what makes sense and not based on how the interface presents itself.
(I am not surprised to see Java in the Wikipedia article on "bridge" above - Java doesn't provide for multiple inheritance of implementation. I'm not a huge Java fan but in this case I think they got it 100% right in requiring coders to write a few more lines of code to avoid spaghetti.)
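Here is a rough sketch of the containment flavor of "bridge" I have in mind, using the point classes from WED as the example. PointImpl and BezierImpl are hypothetical names, and the Point2/Vector2 structs are stand-ins. Note that the textbook bridge usually holds a pointer to an abstract implementor; this sketch simply contains the implementation by value.

```cpp
struct Point2  { double x, y; };     // stand-in types for the sketch
struct Vector2 { double dx, dy; };

// Interface side: shaped by how clients see the objects.
class IPoint {
public:
    virtual ~IPoint() {}
    virtual Point2 Get() const = 0;
};
class IBezierPoint : public IPoint {
public:
    virtual Vector2 GetCtl() const = 0;
};

// Implementation side: shaped by what the storage needs.
// Nothing here is virtual, so nothing can be overridden piecemeal.
class PointImpl {
public:
    explicit PointImpl(Point2 p) : pt(p) {}
    Point2 Get() const { return pt; }
private:
    Point2 pt;
};
class BezierImpl : public PointImpl {
public:
    BezierImpl(Point2 p, Vector2 c) : PointImpl(p), ctl(c) {}
    Vector2 GetCtl() const { return ctl; }
private:
    Vector2 ctl;
};

// The public classes derive from interfaces only and forward to
// the implementation they contain. They do not derive from each other.
class WED_Point : public IPoint {
public:
    explicit WED_Point(Point2 p) : impl(p) {}
    virtual Point2 Get() const { return impl.Get(); }
private:
    PointImpl impl;
};
class WED_BezierPoint : public IBezierPoint {
public:
    WED_BezierPoint(Point2 p, Vector2 c) : impl(p, c) {}
    virtual Point2 Get() const { return impl.Get(); }
    virtual Vector2 GetCtl() const { return impl.GetCtl(); }
private:
    BezierImpl impl;
};
```

The cost is the handful of one-line forwarding methods. In exchange, the implementation classes can be reorganized without touching the interfaces clients see.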
In my professional experience, inheritance of implementation happens a lot out of impatience - I have class A that almost does what I want, so I derive from it and make class B. The worst form of this involves declaring pieces of A that were not virtual to be virtual so that they can become overridden - changing the "API" of A after the fact.
The right thing to do would have been to refactor the code up-front; pull out from A the utility U that B wants to reuse, then A and B can both call U. Eventually someone is going to do that refactoring, but it would have been a lot easier to see and do first, when the implementation was all in A, than when the implementation has been split between A and B by random overriding of virtual methods.
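A tiny sketch of what that refactoring looks like. All of the names are made up for illustration: CsvRow plays the role of A, TabRow plays B, and JoinWithSeparator is the extracted utility U.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// U: the piece of behavior both classes need, pulled out on its own.
std::string JoinWithSeparator(const std::vector<std::string>& parts,
                              const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) out += sep;   // separator goes between items only
        out += parts[i];
    }
    return out;
}

// A: the original class, now calling U instead of owning the logic.
class CsvRow {
public:
    explicit CsvRow(const std::vector<std::string>& f) : fields(f) {}
    std::string Format() const { return JoinWithSeparator(fields, ","); }
private:
    std::vector<std::string> fields;
};

// B: the "almost the same" class. It does not derive from A and
// overrides nothing; it just calls U with different arguments.
class TabRow {
public:
    explicit TabRow(const std::vector<std::string>& f) : fields(f) {}
    std::string Format() const { return JoinWithSeparator(fields, "\t"); }
private:
    std::vector<std::string> fields;
};
```

Neither class needs a virtual method, and changing how A formats its rows can no longer break B by accident.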
Friday, January 12, 2007
Why Const is "Wicked Weird"
This might be surprising:
typedef char * c_string;
typedef const char * const_c_string;
const_c_string y = "hello";
const c_string x = y; // this is illegal!
In order to understand this you need to know a few things about C++.
First: a type specifier has parts:
- Some kind of typename, like int.
- Possibly a "cv" (const-volatile) qualifier like "const".
- Possibly the word typedef.
- Other stuff...
typedef const int number_t;
typedef int const number_t;
Those are both the same. But it gets a little weirder:
int typedef const number_t;
Basically int, typedef and const are all parsed together - they can trade places.
Pointers however, most certainly can't. So these things are the same:
const char * t;
char const * t;
But this one is very different:
char * const t;
If you read the C++ spec, the syntax for the pointer part of a type-spec is actually * [cv-qualifier] -- that is, if the word const follows the *, it acts on the pointer.
The way I think of this is: the word const can move around, but it can never cross a *.
So now we can understand why the above didn't work...
typedef char * c_string;
typedef char const * const_c_string;
c_string const x; // this is char * const
const_c_string y; // this is char const *
When we view the "expansion" of the typedef with const moved around, we can see how these are not the same.
(You might wonder, if we had written c_string const, whether it would be different from const c_string. Well, remember, the const can move around a named type without having any effect. Essentially the pointer has been "baked into" the type using typedef before we applied const. So const has to apply to the pointer, not to the data itself.)
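If you don't want to take my word for it, you can ask the compiler. This is a small sketch using static_assert and the standard type traits header, so it assumes a C++11 or later compiler; if any claim were wrong, the file would simply not compile.

```cpp
#include <type_traits>

typedef char * c_string;
typedef const char * const_c_string;

// const applied to the typedef lands on the pointer, not on the chars.
static_assert(std::is_same<const c_string, char * const>::value,
              "const c_string is a const pointer to mutable char");

// Moving const to the other side of the typedef name changes nothing.
static_assert(std::is_same<c_string const, const c_string>::value,
              "const can trade places around a named type");

// const_c_string is a mutable pointer to const char...
static_assert(std::is_same<const_c_string, const char *>::value,
              "const_c_string is a pointer to const char");

// ...so the two spellings really are different types.
static_assert(!std::is_same<const c_string, const_c_string>::value,
              "const c_string and const_c_string differ");

// A pointer-to-const cannot be used where a pointer-to-non-const is
// needed, while adding const to the pointee is always allowed.
static_assert(!std::is_convertible<const_c_string, const c_string>::value,
              "cannot drop const from the pointed-to chars");
static_assert(std::is_convertible<const c_string, const_c_string>::value,
              "adding const to the pointed-to chars is fine");
```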
Saturday, January 06, 2007
Ref Counting and Purging
X-Plane's memory management for graphic resources has changed over time.
Back in the old days, allocation was static - there was a fixed number of terrain textures, for example (600). Memory management was trivial, every resource went in its slot, and the code was really bullet-proof. On the other hand, it didn't scale well (meaning at all).
Parts of the code evolved to use a cache-and-retain system...each resource was loaded once the first time it was needed and retained forever. This was dynamic and good.
Unfortunately as the hardware is capable of blasting out more graphic data, scenery has gotten more complex, and now if we retain everything we tend to run out of virtual memory. So now X-Plane uses a reference-counting system to share resources but purge them when they're not used.
But the code isn't as simple as saying "if the ref count drops to zero, nuke the object". The problem is: consider two scenery files A and B that use three categories of objects:
1. Objects only in A.
2. Objects only in B.
3. Objects in both A and B.
If we load scenery B before purging A, and we use simple ref counting, we will have all three sets of objects loaded at once. That's bad, because our total virtual memory footprint will temporarily spike up, possibly exhausting supply.
If we purge A and then load B, the objects in category 3 are deleted and immediately reloaded. That's unnecessarily slow.
So instead X-Plane uses the following rules:
- Objects are retained when their reference count is zero.
- All objects of reference count zero are explicitly purged via a global call.
- Scenery A is purged before B is loaded.
- All new objects are loaded as lazily as possible, during flight.
So when we move from scenery A to scenery B, the sequence goes like this:
1. Purge scenery A. (Class 1 and 3 objects' ref counts go to zero.)
2. Load scenery B. (Class 3 objects' ref counts go back to 1. Class 2 objects' ref counts are 1 but they are not yet loaded. So far our memory usage for objects hasn't changed.)
3. Purge all unused objects. (Class 1 objects are now nuked. Memory goes down.)
4. Start lazily loading missing objects. (Class 2 objects are loaded. Memory goes back up.)
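To make the rules concrete, here is a stripped-down sketch of such a cache. This is not X-Plane's code: ResourceCache and its method names are invented for illustration, a bool stands in for the actual resource data, and LoadMissing loads everything in one call where the real thing would spread the work out during flight.

```cpp
#include <map>
#include <string>

class ResourceCache {
public:
    // Loading a scenery file takes references but loads nothing yet.
    void Retain(const std::string& name) { ++entries[name].ref_count; }

    // Purging a scenery file drops references. Hitting zero frees nothing.
    void Release(const std::string& name) {
        auto it = entries.find(name);
        if (it != entries.end() && it->second.ref_count > 0) --it->second.ref_count;
    }

    // The explicit global purge: only now do zero-count objects go away.
    void PurgeUnused() {
        for (auto it = entries.begin(); it != entries.end(); ) {
            if (it->second.ref_count == 0) it = entries.erase(it);
            else ++it;
        }
    }

    // Lazy loading stand-in: load whatever is referenced but missing.
    void LoadMissing() {
        for (auto& kv : entries)
            if (kv.second.ref_count > 0) kv.second.loaded = true;
    }

    bool IsLoaded(const std::string& name) const {
        auto it = entries.find(name);
        return it != entries.end() && it->second.loaded;
    }

    // Number of objects currently occupying memory.
    int LoadedCount() const {
        int n = 0;
        for (const auto& kv : entries)
            if (kv.second.loaded) ++n;
        return n;
    }

private:
    struct Entry {
        int ref_count = 0;
        bool loaded = false;   // stands in for the real texture/object data
    };
    std::map<std::string, Entry> entries;
};
```

Walking the four steps through this sketch with one object in each class, the number of loaded objects never exceeds two, and the shared object is never reloaded.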