Thursday, December 13, 2012

Static Libraries and Plugins: Global Pain

Years after the X-Plane plugin SDK was ported to operating systems that support Unix .a (archive) static libraries, I have finally come to understand what a mess global symbols can make with incorrect linker settings. The problem is that the incorrect linker settings are almost always the defaults. This blog post will explain what goes wrong with this kind of linker setup and how to fix it.  While this stuff might be obvious to those intimately familiar with Linux and Unix-style linking, it's a bit astonishing to anyone coming from the Windows and pre-OS X Mac world, where the assumptions about linkage are very different.

This post may also thoroughly slander Linux, and if I learn why I'm an idiot and the whole problem can be solved in a much better way, hey, that's great.  I'd much rather find out that there's a better way and I'm wrong than find out that things really are as broken as they seem.

Globally Symbols and Shared Libraries

Unix-style linkers (e.g. ld on both OS X and Linux) support a shared global namespace for symbols exported from shared libraries. Simply put, for any given symbol name, there can be only one 'real' implementation of that symbol, and the first dynamic library (or host app with dynamic linkage, which is basically all host apps these days) to introduce that symbol defines it for every dynamic library.

In other words, if you have five implementations of "void a()" in your dynamic libraries, the first one loaded is used by everyone.  It's a global namespace.

Note that if your symbol is not global, it will not be replaced by an earlier variant.  So if your symbol isn't global, other people having global symbols can't hose you.

The implications of this are clear: you should be very very careful and very very minimal about what gets exported into the global namespace, because of the risk of symbol collision.  I found a bug in an X-Plane plugin because the internal routine sasl_done (in a plugin called sasl) was global and the second instance loaded - sasl_done from libsasl2.dylib had already been loaded by the OS.  The results: a random call into a DLL when the plugin thought it was calling itself!

Unfortunately, the default for GCC is to put everything into the global namespace.  As gcc 3.x fades into history, more code is using -fvisibility=hidden and attributes more aggressively, but the defaults make it really easy to do the wrong thing and dump a whole lot of symbols into the flat namespace.

There is one exception to this global calling: if you use dlsym to resolve a symbol from a specific dynamic library (as returned by dlopen) finds it in that dynamic library, like you would expect.  Therefore if you have a plugin with an "official" entry point (like "PluginStart") you can load multiple plugins into the global namespace and find the "right" start function via dlsym.  (If a plugin called its own start routine, it might jump into the wrong plugin due to globla namespace issues.

What Am I Exporting?

On both OS X and Linux you can use "nm" to view your globally exported symbols:
nm my_plugin.dylib | grep "T "
The stuff with a capital T from nm are code symbols in the global namespace.  If you make a plugin DLL that has a lot of those, your code may not operate if there are other plugins already loaded.

Static Libraries: Not So Static

In the Unix world, the .a (static archive) format is basically a collection of .o files with some header info to optimize when the code is linked.  .o files retain the hidden/visible attribute information that is used by the linker to export symbols out of a dynamic library.

What this means is: under normal operation, the linker may export dynamic library symbols out of a static library you link against. In other words, if you link against libpng.a, you may end up having your DLL export all of the symbols of libpng!  If you aren't the first dynamic library to load, the version of libpng you get may not be the static one you asked for.

This behavior is astonishing at best, but unfortunately it is, again, the default: if the static library didn't specifically set its symbols to hidden, you get "leakage" of static library symbols out of the client shared library.  Unfortunately, from my experience this kind of leakage happens all of the time.  With X-Plane we statically link libcurl, libfreetype and libpng, and all three have their symbols marked globally by default.  These are ./configure based libraries and we don't want to start second-guessing their build decisions.  Unfortunately the code tends to be marked up to build the right API in "shared library" mode but not static mode.

You can see this behavior using nm -m on OS X or objdump -t on Linux.

Working Around Library Leak

Someday we may reach a point where all Unix static libraries keep their symbols "hidden" for dynamic library purposes, but until then there is something a DLL can do to work around this problem: use an explicit list of symbol exports.

Using an explicit list of symbol exports is often considered annoying when an API has a large set of public entry points; usually attributes marking specific functions are preferred.  The advantage of an "official list" at link time is that the linker hides everything except that list, and if any static libraries have globally visible symbols, their absence from the master fixes the problem.

(As an example of how to set this up for gcc on Linux and OS X, see here.)

Addendum: What About Namespacing?

Both Linux and OS X have over time developed ways to cope with the flat namespace problem.

On OS X, a dynamic library can be linked with a two-level namespace.  The symbol is resolved against both the name of the providing dylib and the symbol itself. The result is that symbols come only from the dylibs where you thought they would come from.  If at link time symbol A comes from library X, library X is the only place where it will be provided in the future.  (This is the semantics Windows developers are used to.)

On Linux, library APIs can contain version information; as far as I can tell this works by "decorating" symbols with a named library version (e.g. @@GLIBC_2.0).  When the ABI is changed, symbols cannot conflict between versions, and in theory this may also protect against cross-talk between libraries since the version symbol has some kind of short universal library identifier.  I have found almost no documentation on library versioning; if anyone has a good Linux link I'll add it to this post.

X-Plane's plugin system does not use either of these mechanisms because the plugin system is older than both of them. (Technically on OS X two-level namespaces are older than the plugin system, but the plugin system is older than @loader_path, which is a requirement for strict linking of a dylib in the SDK.)  Thus we are stuck with the global namespace and find ourselves trying to force people to keep their symbols to themselves.

Sunday, December 09, 2012

Integrating LuaJIT with X-Plane: 64-bit fun

X-Plane supports a plugin SDK; plugins are dynamic libraries loaded by the sim into it's process address space.  X-Plane plugins link against a host library for access to the sim via a controlled API.

A number of popular plugins (Gizmo, SASL, FlyLua) provide Lua scripting to X-Plane add-ons using LuaJIT 2.0 as their Lua runtime engine.

LuaJIT on an x86_64 machine has an odd quirk: it requires all of Lua's allocations to be in the first 2 GB of address space.  This requirement comes from its use of offset addresses, which apparently have a signed 32-bit range.

Normally on OS X 64-bit the only special work you have to do is customize the zero page region, as OS X defaults the zero page to the entire bottom 4 GB of memory.  (I can only speculate that having the entire 32-bit address range be off limits helps catch programming errors where pointers are truncated to 32-bits because any truncated pointer will point to an illegal page.)

With X-Plane, however, we hit a snag: Lua scripts are often loaded by plugins late in the programs operation, when the user changes airplanes.  By this time, the OS has already consumed all of the "Lua-friendly" address space below 2 GB.  Looking at vmmap dumps, it looks like both the system malloc and OpenGL driver stack like this region of memory.

Here's how we 'fixed' this for 64-bit X-Plane:
  • The host sim pre-allocates as much of the 2 GB region as it can early on, in fixed size chunks.  Currently we're using 32 MB chunks, but this may change.
  • The host provides a custom Lua realloc function that is implemented using a hacked version of dlmalloc; dlmalloc uses the pool of pre-grabbed 32 MB chunks to form its pools.
  • Plugins use lua_newstate to connect the host's "chunk" allocator to the Lua runtime.
This last step required a modification to LuaJIT itself; the shipping 2.0 code has the API for custom allocators disabled in 64-bits (probably under the assumption that client code wouldn't follow the "bottom 2 GB" requirement.)  Fortunately the stub-out is just a series of #defines and the full API functionality is easily restored.

The 1.6 GB of virtual address space the sim grabs up front don't become actual memory allocations until plugins request the memory via Lua, because OS X is lazy about committing memory. Because the address space is pre-grabbed, VM allocates and mmaps that happen later in the app's life cycle simply go to higher address spaces.

There is one limitation to the implementation as it runs now: allocations larger than 32 MB will fail because the 'direct' allocation API in dlmalloc is not functional.  In theory the host allocator could search for consecutive 32 GB blocks to form larger allocations, but I suspect that in practice we'll simply set the chunk size larger.  Typical X-Plane based Lua scripts don't need huge amounts of memory (or at least it appears this way so far) and the number of separate blocks needed is based on the number of simultaneous Lua plugins running, which is also typically quite low.*

* By quite low I mean only one plugin at a time; as far as I can tell Lua plugins often export the runtime with global external symbols, which causes complete chaos.  That's the subject of another blog post.