Tuesday, February 27, 2007

C++ Objects Part 4: typeid and casts

Now that we have described how multiple inheritance works in CodeWarrior C++, we can look at how casting and typing work in C++. We needed to discuss multiple inheritance first; recall from part 2 that with single inheritance, the "front" of an object in memory is its base. Under that layout, a cast requires no work at all when we have only one base class.

For the purpose of this discussion, "full class" means the actual class of an object when it is instantiated - in other words, the "biggest" downcast we can apply - its real class.

Please also remember from our previous post that an object that inherits from two base classes has at least two vtables with different layouts. Clearly a pointer to a vtable cannot uniquely identify an object's class if one object can hold two vtable pointers with different values.

So instead CodeWarrior creates a separate static structure for each class that uniquely identifies it: a type_info structure. This shouldn't be surprising - this is the same structure that you can use with the typeid operator. That is, if you do this:

#include <typeinfo>   // must be included before typeid is used

D obj;
const std::type_info& t = typeid(obj);

"t" refers to the internal type_info structure for class D, which the runtime uses for all operations that require typing. (t is declared as a reference in the C++ code, but to the compiler a reference and a pointer are one and the same, so in truth t is a pointer.)

Every vtable starts with a pointer to a type_info structure, identifying the full type of an object. When an object has two vtables (as in our multiple inheritance case), both vtables point to the same type_info structure. No two classes will ever share a vtable (even if a derived class never overrides any member functions) so this is okay. Because both vtables point to the same type_info, we can get our full runtime class for an object no matter which base we have a pointer to.
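Here is a small sketch of that last point in standard C++. The classes B1, B2 and D are hypothetical stand-ins for the two-base case, and nothing here depends on CodeWarrior's internals: it only shows that typeid reports the same full class through either base pointer.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical classes for illustration. Each base has a virtual
// function so that objects carry run-time type information.
struct B1 { virtual ~B1() {} int a = 1; };
struct B2 { virtual ~B2() {} int b = 2; };
struct D : B1, B2 { int c = 3; };

// True when both base pointers report the same full class, and that
// class is D.
bool same_full_type(B1* p1, B2* p2) {
    return typeid(*p1) == typeid(*p2) && typeid(*p1) == typeid(D);
}

// Usage: one D object viewed through each of its bases.
void self_check() {
    D obj;
    assert(same_full_type(&obj, &obj));
}
```

Note that typeid is applied to the dereferenced pointer; typeid(p1) would give the static type "pointer to B1" instead.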

Static Casts

With this in mind, let's look at static casts. A static cast is a cast between C++ class types whose effects are fully known at compile time. That is, the compiler uses compile-time type information to change the type (and possibly the pointer value) for an object. The compiler looks at the layout in memory of the base class and the derived class and adjusts the pointer as much as is needed. (When we have only one base class, a static cast will not need to adjust the pointer at all, because the base class is always first in memory.)
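To make the pointer adjustment visible, here is a minimal sketch using hypothetical classes B1, B2 and D. The exact byte offsets depend on the compiler, so the code only checks that the two base subobjects live at different addresses and that the adjustment is reversible.

```cpp
#include <cassert>

// Hypothetical classes for illustration.
struct B1 { virtual ~B1() {} int a = 1; };
struct B2 { virtual ~B2() {} int b = 2; };
struct D : B1, B2 { int c = 3; };

// True when the two base subobjects sit at different addresses, which
// means at least one of the upcasts had to adjust the pointer.
bool bases_at_different_addresses(D* d) {
    B1* b1 = static_cast<B1*>(d);
    B2* b2 = static_cast<B2*>(d);
    return static_cast<void*>(b1) != static_cast<void*>(b2);
}

// Casting back down undoes the adjustment, again using only the
// layout known at compile time.
bool round_trip(D* d) {
    B2* b2 = static_cast<B2*>(d);
    return static_cast<D*>(b2) == d;
}

void self_check() {
    D obj;
    assert(bases_at_different_addresses(&obj));
    assert(round_trip(&obj));
}
```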

There are some casts that cannot be performed statically at all, but we will get to this later. Also, the compiler can't check at compile time whether a downcast is safe. If we have a pointer to X that we static-cast to type Y but the object is not actually of type Y, the compiler will not complain, and we may crash later.

Full Class and Offsets

When we first introduced the layout of a vtable we mentioned that there is an "offset" right after the type_info. But what is this offset?

The offset in a vtable is the number of bytes that a pointer to an object whose type uses this vtable must be adjusted to find the full class of the object.

So in our case where we have two bases, each embedded somewhere in a class, the vtable of each base contains the offset to recover the full object. Note that one vtable will probably have an offset of 0, and this vtable is the one we will use for the "full" class, so that we never adjust the pointer when we already have the full class.
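We can't read CodeWarrior's vtable from portable code, but we can observe the same adjustment from the outside. This sketch relies on a standard feature, dynamic_cast<void*>, which returns a pointer to the full object; the classes and the helper bytes_to_full_object are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes for illustration.
struct B1 { virtual ~B1() {} int a = 1; };
struct B2 { virtual ~B2() {} int b = 2; };
struct D : B1, B2 { int c = 3; };

// Number of bytes to add to a B2 pointer to reach the start of the
// full object it is embedded in.
std::ptrdiff_t bytes_to_full_object(B2* p) {
    char* sub = reinterpret_cast<char*>(p);
    char* full = static_cast<char*>(dynamic_cast<void*>(p));
    return full - sub;
}

void self_check() {
    D obj;
    B2* b2 = &obj;
    // Applying the adjustment to the B2 pointer lands on the D object.
    char* recovered = reinterpret_cast<char*>(b2) + bytes_to_full_object(b2);
    assert(recovered == reinterpret_cast<char*>(&obj));

    // A B2 that is its own full class needs no adjustment.
    B2 lone;
    assert(bytes_to_full_object(&lone) == 0);
}
```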

(As far as I can tell, CodeWarrior lays out all static vtables for a class consecutively, so when given a pointer to the first one, it can actually generate calls into any vtable, because their relative positioning is fixed.)

Dynamic Casts

A dynamic cast converts the type of an object using run-time information. The cast returns NULL if the object can't be viewed as the given type. (That is for pointer casts; a failed cast to a reference type throws std::bad_cast instead.) Since dynamic casts look at the object at run-time and can see the full class of the object, they can perform casts that we can't do using only compile-time information.
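As a quick illustration, the sketch below casts sideways from one base to another, which a static cast can't do because B1 and B2 are unrelated at compile time. The classes B1, B2, D and Other are hypothetical.

```cpp
#include <cassert>

// Hypothetical classes for illustration.
struct B1 { virtual ~B1() {} int a = 1; };
struct B2 { virtual ~B2() {} int b = 2; };
struct D : B1, B2 { int c = 3; };
struct Other : B1 { int d = 4; };  // has a B1 base but no B2 base

// Tries to view the object behind a B1 pointer as a B2.
// Returns a null pointer when the full class has no B2 base.
B2* as_b2(B1* p) {
    return dynamic_cast<B2*>(p);
}

void self_check() {
    D d;
    Other o;
    assert(as_b2(&d) == static_cast<B2*>(&d));  // full class is D: works
    assert(as_b2(&o) == nullptr);               // full class is Other: fails
}
```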

(Consider a DLL that returns an object of some unknown type, but as a pointer to a base type. Without run-time type information, the compiler has no idea what real type that object is, because it doesn't have the source code. In practice even if we did have the code, the compiler can't track that object's pointers through all of the strange things C++ can do to memory.)

A dynamic cast works by examining the type_info structures of an object and using that information to perform the cast. So to understand dynamic casts, we'll need to look at the type_info structure.

The type_info structure contains a pointer to a static string that names the class - all dynamic casts will use class string-name comparisons to find equivalent types. It then contains a pointer to a "null-terminated" variable-length array of info about base classes. Each item in this array contains a pointer to the type_info of a base class (a null pointer here marks the end of the array) and an offset indicating how much the pointer to the full object must be adjusted to cast to this type.

It should be noted that this list of base classes contains all bases, not just immediate bases! In other words, if we have a class Z that derives from Y and Y derives from X, the array of base classes for Z contains X and Y! This means that with a single linear search of the list, we can find any possible base. We do not have to search the entire tree.
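To picture this, here is a rough model of the structure described above. Every name in it (TypeInfoModel, BaseEntry, find_base) is made up for illustration, and the field order and zero offsets are assumptions, not CodeWarrior's actual layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model; names and field order are assumptions.
struct TypeInfoModel;
struct BaseEntry {
    const TypeInfoModel* type;  // null marks the end of the array
    std::ptrdiff_t offset;      // adjustment from the full object
};
struct TypeInfoModel {
    const char* name;           // static string naming the class
    const BaseEntry* bases;     // all bases, not just immediate ones
};

// Z derives from Y, and Y derives from X. With single inheritance
// every base sits at the front, so all offsets are assumed to be 0.
const BaseEntry x_bases[] = { {nullptr, 0} };
const TypeInfoModel ti_X = { "X", x_bases };
const BaseEntry y_bases[] = { {&ti_X, 0}, {nullptr, 0} };
const TypeInfoModel ti_Y = { "Y", y_bases };
const BaseEntry z_bases[] = { {&ti_Y, 0}, {&ti_X, 0}, {nullptr, 0} };
const TypeInfoModel ti_Z = { "Z", z_bases };

// One linear pass over the flattened list, comparing class names.
const BaseEntry* find_base(const TypeInfoModel& full, const char* wanted) {
    for (const BaseEntry* b = full.bases; b->type != nullptr; ++b) {
        if (std::strcmp(b->type->name, wanted) == 0) return b;
    }
    return nullptr;
}

void self_check() {
    // X is found directly in Z's list, without walking through Y.
    assert(find_base(ti_Z, "X") != nullptr);
    assert(find_base(ti_X, "Z") == nullptr);
}
```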

(As a side note, each type_info's array contains pointers to parents, so you can follow the chain of inheritance from type_info to type_info. As far as I can tell in CodeWarrior there is no recording of which bases are "immediate bases", as you never need this info in C++. Perhaps the order of the bases in the list would tell us but I am not sure.)

Dynamic cast is therefore a two-step process:

1. Given the vtable of a pointer to an object, use the offset to recover a pointer to the full class. (All adjustments will then be made from this pointer.) This is the equivalent of down-casting to the full class.

2. Search the type_info of the full class for the type we want - in other words, go through a list of all bases. If we find one, use the offset to adjust the pointer again. If the search in step 2 fails, return NULL.
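The two steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the struct names, the layout of D (B1 at byte 0, B2 at byte 8), the sign convention (the vtable offset is added to the pointer, so it is negative for the second base), and the extra check against the full class's own name. Objects are represented as plain byte addresses rather than real C++ objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the runtime data; not CodeWarrior's real layout.
struct TypeInfoModel;
struct BaseEntry {
    const TypeInfoModel* type;  // null marks the end of the list
    std::ptrdiff_t offset;      // added to the full-object pointer
};
struct TypeInfoModel {
    const char* name;
    const BaseEntry* bases;
};
struct VTableHeader {
    const TypeInfoModel* full_type;  // type_info of the full class
    std::ptrdiff_t to_full;          // added to reach the full object
};

// Assumed layout of D : B1, B2 with B1 at byte 0 and B2 at byte 8.
const BaseEntry kNoBases[] = { {nullptr, 0} };
const TypeInfoModel kB1 = { "B1", kNoBases };
const TypeInfoModel kB2 = { "B2", kNoBases };
const BaseEntry kDBases[] = { {&kB1, 0}, {&kB2, 8}, {nullptr, 0} };
const TypeInfoModel kD = { "D", kDBases };
const VTableHeader kVtD = { &kD, 0 };       // reached through a D* or B1*
const VTableHeader kVtDAsB2 = { &kD, -8 };  // reached through a B2*

char* model_dynamic_cast(char* p, const VTableHeader* vt, const char* wanted) {
    // Step 1: use the vtable offset to recover the full object.
    char* full = p + vt->to_full;
    const TypeInfoModel* ti = vt->full_type;
    if (std::strcmp(ti->name, wanted) == 0) return full;

    // Step 2: linear search of all bases, matching by class name.
    for (const BaseEntry* b = ti->bases; b->type != nullptr; ++b) {
        if (std::strcmp(b->type->name, wanted) == 0) return full + b->offset;
    }
    return nullptr;  // no such base: the cast fails
}

void self_check() {
    char storage[16] = {0};   // stands in for one D object
    char* as_d = storage;
    char* as_b2 = storage + 8;
    // Sideways cast from the B2 view to the B1 view of the same object.
    assert(model_dynamic_cast(as_b2, &kVtDAsB2, "B1") == as_d);
    assert(model_dynamic_cast(as_d, &kVtD, "B2") == as_b2);
}
```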

(I am leaving out ambiguous base classes - we'll cover that later!)

3 comments:

  1. How do you compare the result of typeid to a known class?

    if (typeid(R) == char)

    doesn't work.

  2. shagbark:

    Have you tried
    "if(typeid(R) == typeid(char))"?

    This works for me between two classes, I'm not sure if it will work with built-in types

  3. I've used this:
    R r;
    string rtype = typeid(r).name();

    It's clunky, very slow, and gives unpredictable results for some non-builtin types, though. For testing type char, I use if (sizeof(r) == 1), but again, clunky.
