The Hacks of Life: 11/01/2015

When Blender describes a rotation as an 'XYZ' Euler (with 3 angles), this is what they mean:

The Z axis is "up" (the Y axis is away from us to be right-handed).
Each rotation is around the named axis. So X is a rotation around the X axis (a "pitching up" rotation for pilots).
The rotations are done in the order listed, extrinsically. In other words, we rotate around each of these global axes.

The net result of this is that the X rotation is affected by the Y and Z (because they happen later). If we were rotating around the rotated Y (Y') or rotated Z (Z'') axis, then the X axis would be unaffected.

The net result is that (from an aviation-angles perspective) we do yaw first (global Z is unaffected), then roll (transformed Y), then pitch (transformed X). (It should be noted that with pitch last, this does not even remotely correspond to how pilots think about these angles.)

To match X-Plane's transform, instead of XYZ, we need (in Blender) YXZ, which puts Y (roll) at lowest priority.

How the 2.49 Exporter Goes to Crazy Town

Blender 2.75 lets you select orientations; 2.49 is always in XYZ mode. Since these are global axes, the correct order to apply them in an OBJ is:

ANIM_rotate 0 0 1
ANIM_rotate 0 1 0
ANIM_rotate 1 0 0

That is, apply Z first, since X-Plane only has local transforms. (That is, in X-Plane, the last animation is affected by the prior two.)

When the Blender 2.49 exporter decomposes rotations into Eulers, it goes in this order, but it does so in X-Plane coordinates. Thus while "yaw" is unchanged in XYZ animation in Blender, "roll" is unchanged in the export.

I'm trying to not clog up the X-Plane developer blog with tons of technical C++ details. There are a small number of developers who actually want to know those details, so I'm going to post them here. This post explains why SASL was crashing on plugin-unload on El Capitan (but not older operating systems).

Both SASL and Apple's OpenAL implementation are open source, so despite this being a bug that was totally not in the X-Plane code base, I was able to look at everyone involved and debug it myself. I am not particularly happy about having to do that, but the symptoms of the bug were:

Upgrade to El Capitan for free - why not, new things are shiny.
Run X-Plane - seems okay!
Run SASL plane - seems okay!
Switch from SASL plane to plane that ships with X-Plane. Oh noes - my sim crashed! Report a bug to Laminar Research.

The back-trace from the Apple crash reports were all very clear: X-Plane was unloading SASL, SASL was asking OpenAL to tear down its audio context, and OpenAL was throwing an uncaught exception.

So I got involved because users thought this was our bug, even though it wasn't.

Hrm - new crash in Apple's framework in a new OS. Blame Apple! Except, no other OpenAL code is crashing.

Apple's Bug

It turns out there is a bug in Apple's OpenAL. It's one that has been in there for a long time, but only shows up in El Capitan, and frankly doesn't matter in any real way. On OS X, if you call alcDestroyContext on a context that has (1) playing sounds and (2) is the only context for its device and (2) isn't using effects on those sounds then you get an uncaught exception on El Capitan.

The actual bug is subtle - the tear-down order of the underlying audio units that power Apple's OpenAL implementation isn't quite right in this case, resulting in AudioUnits returning an error code in a destructor. The code throws this and catches it in the underlying alcDestroyContext call.

From what I can tell, there was a tool chain change in El Capitan that causes this to terminate an app. I am not an expert, but I think that throwing an exception out of a destructor is undefined behavior, and now Clang is putting its foot down. When I compiled OpenAL from source, my built version simply caught the exception and returned it from alcDestroyContext.

For what it's worth, I don't consider this a severe bug or engineering failure by Apple. The OpenAL specification is a total disaster, and I don't blame anyone who misses a corner case (assuming deleting a playing context even is legal - with a spec like that, who knows). And no app in its right mind would just go kill the context without stopping audio first. Which brings us to SASL's bug.

SASL's Bug

SASL had a bug too. SASL uses a stack based C++ class to change the OpenAL audio context from X-Plane's context to its own to do audio work and then turn it back when done. This is a classic RAII way to manage state.

ContextChanger changer(sound->context);

Except the clean-up code in SASL had this:

ContextChanger(sound->context);

That is, of course, totally legal C++, and totally not useful. I look forward to the day when creating a temporary object in its own expression with a non-trivial destructor is a warning, because I've done this in my own code too.

Without a working context changer, SASL's cleanup code would attempt to clean up all of X-Plane's audio objects (not cool man, not cool!) and then kill its own context. Of course, its own context was still playing since no cleanup had happened.

To put it bluntly, this bug makes me pretty mad, and here's why:

This code has literally never worked right. Not once, not since day one.
The fact that this code was not working right was easily detectable just by checking the OpenAL error code. When SASL goes to delete sources in the wrong context, in most cases the source names are wrong and OpenAL returns an error code. During development and in debug mode, SASL should be checking the OpenAL error code, at least when it finishes its own work before returning control to X-Plane.

Unfortunately, before this bug was fixed, SASL contained only one bit of "error checking code":

ContextChanger(ALCcontext *context) {oldContext = alcGetCurrentContext();alcMakeContextCurrent(context);alGetError();};

If you don't speak OpenAL, basically that's SASL clearing the error code before beginning audio work, with no check of what is in there. This is not how to do error checking.

The Fix Is In

The good news is that the newest version of SASL (2.4 as of this writing) fixes the context changer bug, and also in some cases checks the OpenAL error code after issuing OpenAL commands. The error checking is not as complete as I'd like to see, and still will silence the error sometimes, but it's a step in the right direction.

Are there any teachable moments here? I think there are a few:

If an API provides return codes* for the purpose of determining program correctness (E.g. OpenAL returning "invalid source") it is absolutely to leverage those return codes to do debug assertion checking.
It is not good enough to run the code and observe expected behavior at the user level - you need to verify that the code is actually doing what you expect, or you don't know. (A very wise senior engineer once told that to me 21 (!) years ago when I was just an intern at Avid Technology...it's taken me about that long to deeply understand this in my gut.)
Any time the behavior of code isn't going to be directly user observable (which includes pretty much all resource cleanup code), you need to design the system for debug-ability, e.g. create test cases, attach the debugger, put logging in place, put assertions in place. Proving a program is correct and debugging it is a design requirement just like functionality.

* I don't want to use the term error codes for these returns because I think it is important to distinguish between mistakes in program correctness (you, the programmer, screwed up) and expected failures of hardware (e.g. a disk read error). Having a return enumeration from a function is a coding idiom that can be used for either of these cases. In the case of OpenAL and OpenGL, the returned code detects both programmer mistakes and underlying "errors", e.g. exhaustion of memory.

The Hacks of Life

Saturday, November 21, 2015

Blender Notepad - Eulers

How the 2.49 Exporter Goes to Crazy Town

Friday, November 20, 2015

SASL Crash on El Capitan - the Gory Details

Apple's Bug

SASL's Bug

The Fix Is In