Friday, September 07, 2007

How does COM know?

If you call a non-blocking method on a COM object that's in a different apartment and you need to receive callbacks, you need to sit in a message pump of some kind. That fact is written up in a few different blogs, but what I didn't understand until today is: why does that work? How is COM connected to the message pump?

First let's pull apart that statement with some basics:
  • A Single-Threaded Apartment (STA) basically puts boundaries around a set of objects, similar to the boundaries around a process. When calling between apartments (sets of COM objects) we use an RPC mechanism instead of a direct function call.
  • In the STA model there is a 1:1 correspondence between threads and apartments, and a 1:1 correspondence between threads and their message queues. Thus there is exactly one message queue for each apartment, and posting to that queue determines which thread gets the message.
  • RPCs are thus implemented by posting messages to the queue of the thread we're trying to reach. We then poll our own message queue until we get some kind of "reply" indicating the RPC is done. This looks like a blocking function call to client code.
In order for this to work, the thread in the apartment we are calling into must itself be waiting on its message queue. This would be the case if it is either (1) really bored and just polling the message queue, or (2) itself blocked on a method call into another apartment, and thus polling its queue to find out whether its own RPC is done.
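
The mechanism described above can be sketched as a toy model. This is not how COM is actually implemented (real STAs use Windows message queues, not Python queues); it is only a minimal simulation, assuming each apartment is one thread with one queue. The names Apartment, post, pump_until, call and run_demo are all made up for illustration.

```python
import queue
import threading


class Apartment:
    """Toy stand-in for an STA: one thread, one message queue (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.messages = queue.Queue()  # this apartment thread's message queue
        self.log = []

    def post(self, message):
        """Post a callable to this apartment's queue (may be called from any thread)."""
        self.messages.put(message)

    def pump_until(self, is_done):
        """Block on our own queue, dispatching messages until is_done() is true."""
        while not is_done():
            message = self.messages.get()
            message()

    def call(self, target, func, *args):
        """'Blocking' cross-apartment call, made from this apartment's thread."""
        reply = []

        def request():
            # Runs on the target apartment's thread.
            result = func(*args)
            # Post the reply back to the caller's queue.
            self.post(lambda: reply.append(result))

        target.post(request)
        # While waiting for the reply we keep dispatching our own queue,
        # so calls coming back into this apartment still get serviced.
        self.pump_until(lambda: reply)
        return reply[0]


def run_demo():
    client = Apartment("client")
    server = Apartment("server")

    def on_progress(percent):  # runs on the client thread
        client.log.append(percent)
        return "ack"

    def do_work():  # runs on the server thread, calls back into the client
        ack = server.call(client, on_progress, 50)
        return "done after " + ack

    stop = []
    server_thread = threading.Thread(
        target=server.pump_until, args=(lambda: stop,), daemon=True)
    server_thread.start()  # the "really bored" thread, just pumping its queue

    result = client.call(server, do_work)
    server.post(lambda: stop.append(True))  # ask the server loop to exit
    server_thread.join()
    return result, client.log


if __name__ == "__main__":
    print(run_demo())
```

Note that the client never explicitly listens for on_progress. The callback is serviced only because client.call is pumping the client's queue while it waits for its own reply.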

If this all seems like insanity to you, well, it is.

Now when I say "non-blocking" method call, what I really mean is: a method call that returns really fast but starts some work to be completed later.

Normally, when a thread is blocked because it made an RPC into another apartment, that apartment can call right back: the same message-queue polling that discovers the RPC is over also lets incoming method calls be dispatched. This simply means that the flow of code between COM objects can ignore STA when all method calls are "blocking".

But as soon as we have a non-blocking call, there is no guarantee that the client code is actually listening for method calls into its apartment. (By the rules of STA, if the thread is busy doing stuff, no calls CAN happen, because there is only one thread per apartment.)

Typically client code will make the async call, maybe make a few, and then block in some way until the work is done. For example, we might call WaitForMultipleObjects.

In this case the right thing to do is call MsgWaitForMultipleObjects instead (followed by GetMessage/DispatchMessage if we get woken up for a message). This way, while our thread is doing nothing, other apartments can call us back.
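
To make the difference concrete, here is a rough simulation of the two kinds of wait. It is a sketch only: a Python Event stands in for the handle we wait on, a Python queue stands in for the thread's message queue, and the polling loop in wait_and_pump is a crude stand-in for MsgWaitForMultipleObjects, not its real behavior. The names wait_and_pump and run_async_demo are hypothetical.

```python
import queue
import threading


def wait_and_pump(done, messages, poll_seconds=0.01):
    """Wait for `done` while still dispatching messages posted to this thread."""
    while not done.is_set():
        try:
            message = messages.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # nothing posted yet; go back and re-check the event
        message()  # service the incoming call on this thread


def run_async_demo(pump):
    messages = queue.Queue()  # the client thread's message queue
    done = threading.Event()  # signalled when the async work is complete
    received = []

    def worker():
        # The "other apartment": it calls back into the client and cannot
        # finish until the client has handled that callback.
        handled = threading.Event()

        def callback():  # must run on the client thread
            received.append("progress")
            handled.set()

        messages.put(callback)
        handled.wait()
        done.set()

    # The "non-blocking call": start the work and return immediately.
    threading.Thread(target=worker, daemon=True).start()

    if pump:
        wait_and_pump(done, messages)  # message-aware wait
        completed = True
    else:
        # Plain wait: nobody dispatches the callback, so the work never
        # completes and the wait can only time out.
        completed = done.wait(timeout=0.2)
    return completed, received


if __name__ == "__main__":
    print(run_async_demo(pump=True))
    print(run_async_demo(pump=False))
```

With pump=False the client and the worker are each waiting on the other, which is the hang you get in this situation when a thread that owes callbacks blocks in a plain wait.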

This works because the thread, message queue, and apartment are all in 1:1 relationships. So to say "this thread needs to be open to COM RPCs" all we need to say is "this thread needs to block on its own message queue", which is done with GetMessage.