Saturday, December 19, 2015

The Dangers of Super Smart Compilers

For the first time today, I ran an optimized DSF render using RenderFarm (the internal tool we use to make the global scenery) compiled by Clang.
The result was a segfault, which was a little bit surprising (and very disheartening) because the non-optimized debug build worked perfectly, and the optimized build works perfectly when compiled by GCC. When -O0 revealed no bug (meaning the bug wasn’t some #if DEV code), it was time for a “what did the optimizer do this time” session.
After a lot of printf and trial and error, it became clear that the optimizer had simply skipped an entire block of code that went roughly like this:
for(vector<mesh_mash_vertex_t>::iterator pts = ioBorder.vertices.begin();
    pts != ioBorder.vertices.end(); ++pts)
if(pts->buddy == NULL)
{
   /* do really important stuff */
}
The really important stuff was being skipped, and as it turns out, it was really important.
So…WTF? Well, buddy isn’t a pointer - it’s a smart handle, so operator== isn’t a pointer compare; it’s code. We can go look at that code and see what’s in it.
The handle turns out to just be a wrapper around a pointer - its operator* returns *m_ptr. Operator== is defined out of line and has a case specifically designed to make comparison-with-null work:
  template < class DSC, bool Const >
  inline
  bool operator==(const CC_iterator<DSC, Const> &rhs,
                  Nullptr_t CGAL_assertion_code(n))
  {
    CGAL_assertion( n == NULL);
    return &*rhs == NULL;
  }
Of course, Clang is way smarter than I am, and it actually has commentary about this very line of code!
Reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to always evaluate to false.
Oh @#. Well, there’s our problem. This operator==, like plenty of other semi-legit code, is “unpacking” the handle wrapper by using &* to get a bare pointer to the thing being wrapped. In practice, the & and * cancel each other out and you get the bare pointer that is secretly inside whatever you’re working with.
Except that Clang is sooooo clever. It goes “hrm - if &*rhs == NULL then what was *rhs? It’s a NULL reference (because the pointer inside rhs is NULL and we dereferenced it). And since NULL references are illegal, this must never have happened - our code is in undefined behavior land as soon as *rhs runs.”
Since our code is in undefined behavior land (if and only if *rhs is a “null object” - if such a thing even existed, which it doesn’t), the compiler can do whatever it wants!
If *rhs is not a NULL object, &*rhs won’t ever equal NULL, and the result is false. So if one side of the case returns false and the other side is undefined, we can just rewrite the whole function.
  template < class DSC, bool Const >
  inline
  bool operator==(const CC_iterator<DSC, Const> &rhs,
                  Nullptr_t CGAL_assertion_code(n))
  {
    return false; /* there I fixed it! */
  }
and that is exactly what Clang does. Thus if(pts->buddy == NULL) turns into if(false) and my important stuff never runs.
The short term “fix” (and I use the term loosely) is to do this:
for(vector<mesh_mash_vertex_t>::iterator pts = ioBorder.vertices.begin();
    pts != ioBorder.vertices.end(); ++pts)
if(pts->buddy == CDT::Vertex_handle())
{
   /* do really important stuff */
}
Now we have a comparison between two handles:
  template < class DSC, bool Const1, bool Const2 >
  inline
  bool operator!=(const CC_iterator<DSC, Const1> &rhs,
                  const CC_iterator<DSC, Const2> &lhs)
  {
    return &*rhs != &*lhs;
  }
This one is also doing illegal undefined stuff (&* on a null ptr = bad) but Clang can’t tell in advance that this is bad, so the optimizer doesn’t hammer our code. Instead it shortens this to a pointer compare and we win.
Newer versions of CGAL* have fixed this by taking advantage of the fact that a custom operator->() returns the bare pointer underneath the iterator, avoiding the illegal null reference case. (This technique doesn’t work in the general case, but the CGAL template is specialized for a particular iterator.)
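In other words (and this is just my sketch of the idea, not the actual CGAL patch), the null comparison can unwrap the handle without ever forming a reference:
  template < class DSC, bool Const >
  inline
  bool operator==(const CC_iterator<DSC, Const> &rhs,
                  Nullptr_t CGAL_assertion_code(n))
  {
    CGAL_assertion( n == NULL);
    /* operator->() hands back the raw pointer inside the iterator, so we
       never dereference anything - no null reference, no undefined behavior,
       and the optimizer has to keep the comparison. */
    return rhs.operator->() == NULL;
  }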
In Clang’s defense, the execution time of the program was faster until it segfaulted!
  • You can make fun of me for not updating to the latest version of every library every time it comes out, but given the time it takes to update libraries on 3 or 4 compilers/build systems and then deal with the chain of dependencies if they don’t all work together, you’ll have to forgive me for choosing to get real work done instead.

Thursday, December 10, 2015

Source Control for Art Assets - This Must Exist

I've been thinking a lot lately about revision control for art assets. As X-Plane has grown, our art team has grown, and as the art team has grown, our strategy for dealing with art assets is coming under strain.

Currently we use GIT for source code and SVN for art assets in a single shared repo. No one likes SVN - it was selected as the least bad alternative:

  • Since it's centralized, it's much more in line with what artists expect for revision control - no explaining distributed source control to non-programmers.
  • It doesn't replicate the entire history of an art asset, which is too much data.
  • Parts of a tree can be checked out without paying for the entire tree.
  • There are decent GUIs for every platform.
  • It's scriptable for integration flexibility.
SVN still has some real problems:

  • It is just so slow. You can look at your wire speed and SVN's speed and you're just not getting a fast transfer. Update: this finding is wrong! SVN's speed at transferring binary files is about the same as your wire speed to the server. I'll write up a separate post on speed tests. Many of us are using GUI clients, and it is possible that some of them are adding a tax, but the command-line SVN client is similar in up/down transfer speed to GIT and rsync for basic data transfer.
  • SVN can't do an incremental update without a working copy, which means having a .svn directory even for the art assets you're not working on. That means at least 2x the disk space on the entire art asset pile, just to be able to get latest.

GIT's Not It

Since I am a programmer, my first thought was: well, clearly GIT can be made to do this, because GIT is the answer to all problems involving files. I spent some time trying to figure out how to shoe-horn GIT into this role and have concluded that it's not a good idea. GIT simply makes too many fundamental assumptions that are right for source trees and wrong for art asset piles. We'd be fighting GIT's behavior all of the time.

We Kind of Want Rsync

There are two parts of art asset version control: letting the guys who are doing the work make revisions, and letting the people not doing the work get those revisions. It's easy to overlook that second task, but any given artist working on X-Plane is not working on most of the airplanes, scenery packs, etc. And the programming team is working on none of them.

For the task of getting art without revision control, rsync would be just great.

  • It can work incrementally.
  • It only gets what you need.
  • It's reasonably fast.
  • It doesn't waste any disk space.
One of the main problems with SVN is performance - if I have to change a branch, having SVN take half an hour to get the new art asset pack I need is pretty painful. So it's at least interesting to look at the architecture rsync implies:

  • Files live on the server.
  • We fetch only the files we want.
  • We basically do a straight network transfer and we don't try anything too clever.
Hrm....I know another program like that.

We Kind of Want The X-Plane Installer/Updater

We solved the problem of getting the latest art assets for all of our users - it's called the X-Plane updater. In case you haven't spent your copious free time wire-sharking our updater, it's really, really simple:

  • All files live on an HTTP server, pre-compressed.
  • A manifest lives on the HTTP server.
  • The client downloads the manifest, compares what it has to what's on the server, then fetches the missing or newer files and decompresses them. (A rough sketch of that loop is below.)
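Sketched in code (this is not our actual updater - the names and types here are made up to show the shape of the algorithm):
  #include <map>
  #include <string>

  /* manifest: relative path -> content hash (or size/date stamp in the
     non-content-addressed case). */
  typedef std::map<std::string, std::string> manifest_t;

  void sync_to_server(const manifest_t& server, const manifest_t& local)
  {
      for(manifest_t::const_iterator f = server.begin(); f != server.end(); ++f)
      {
          manifest_t::const_iterator l = local.find(f->first);
          if(l == local.end() || l->second != f->second)
          {
              /* Missing or out of date: fetch the pre-compressed file over
                 HTTP and decompress it into place (stubbed out here). */
              // http_fetch_and_decompress(f->first);
          }
      }
      /* Deleting local files that aren't in the server manifest is the same
         idea in the other direction. */
  }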
Our installer is (sadly) not content-addressed (meaning a file's name would be derived from what's inside it - e.g. a hash of its contents - which naturally removes dupes). If I could redesign it now it would be, but in my defense, GIT wasn't around when we did the original design. (As a side note, it's way easier to debug server-side problems when you are not content-addressed. :-)

But imagine if it were: we wouldn't keep a fresh mirror of every version of X-Plane on the server - we'd just have a big pool of content-addressed files (a la GIT) and fetch the subset we need.

Let's Version Control the Manifest

So naively my thinking is that all we need to do is version control our file manifest and we have our art asset management solution.
  • Each atomic revision of a version-controlled art asset pack (at whatever granularity that is) creates a new manifest describing exactly what art assets we have.
  • Art assets are transferred from a loose file dump by syncing the manifest with the local machine. (A sketch of what a manifest entry might look like is below.)
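To make that a little more concrete, here's a sketch of what a manifest entry might look like (this is hypothetical - it isn't an existing format of ours):
  #include <string>

  struct manifest_entry {
      std::string path;  /* where the file goes in the art asset tree */
      std::string hash;  /* content hash of the file - which is also its name
                            in the server-side pool, so identical files dedupe
                            for free */
  };

A whole revision of an art pack is then just a list of these entries in a small text file.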
Here's what is interesting to me: we could use pretty much any source control system and get away with it, because the manifest files are going to be relatively small.

Does This Really Not Exist?

I feel like I must be missing something...does a tool like this not already exist?  Please point me in the right direction and call me an idiot in the comments section if someone has already done this!

Importance Sampling: Look Mom, No Weights

For anyone doing serious graphics work, this post will be totally "duh", but it took me a few minutes to get my head straight, so I figure it might be worth a note.

Fair and Balanced or Biased?

The idea of importance sampling is to sample a function in a biased way, where you intentionally bias your samples around where most of the information is. The result is better leverage from your sampling budget.

As an example, imagine that we want to sample a lighting function integrated over a hemisphere, and we know that the lighting function has a cosine term (i.e. it is multiplied by the dot product of the light direction and the normal).

What this means is that the contributing values of the integration will be largest in the direction of the normal and zero at 90 degrees.

We could sample equally all around the hemisphere to learn what this function does. But every sample around the outer rim (90 degrees off) of the hemisphere is a total waste; the sampled function is multiplied by cos(90) - in other words, zero - so we get no useful information. Spending a lot of our samples on that area is a real waste. Ideally we'd sample more where we know we'll get more information back (near the normal) and less at the base of the hemisphere.

One way we can do this is to produce a sample distribution over the hemisphere with weights. The weight will be inversely proportional to the sample density. We come up with a probability density function - that is, a function that tells us how likely it is that there is information in a given location, and we put more samples where it is high, but with lower weights.  In the high probability regions, we get the sum of lots of small-weight samples, for a really good, high quality sampling. In the low probability region, we put a few high weight samples, knowing that despite the high weight, the contribution will be small.
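Here's a toy sketch of that weighted scheme (my own example, not X-Plane code - the helper names are made up): estimate the cosine-weighted irradiance over the hemisphere by drawing cosine-distributed directions and giving each sample a weight of 1/(N * pdf):
  #include <cmath>
  #include <cstdlib>

  /* Stand-in for incoming radiance - in real life this is your light or
     environment lookup. */
  static float radiance(float cos_theta, float phi) { return 1.0f; }

  /* Uniform random number in [0, 1). */
  static float frand() { return rand() / (RAND_MAX + 1.0f); }

  /* Estimate the integral of radiance * cos(theta) over the hemisphere using
     cosine-distributed samples with explicit per-sample weights. */
  float estimate_irradiance(int n)
  {
      const float PI = 3.14159265f;
      float sum = 0.0f;
      for(int i = 0; i < n; ++i)
      {
          float u = frand(), v = frand();
          float cos_theta = sqrtf(1.0f - u);  /* cosine-distributed: pdf = cos(theta)/pi */
          float phi = 2.0f * PI * v;
          float pdf = cos_theta / PI;
          float weight = 1.0f / (n * pdf);    /* dense regions get small weights */
          sum += radiance(cos_theta, phi) * cos_theta * weight;
      }
      return sum;
  }

With radiance() equal to 1 everywhere this converges to pi (the integral of cos(theta) over the hemisphere), which is a handy sanity check.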

You can implement this by using a table of sample directions and weights and walking it, and you can get just about any sampling pattern you want.  Buuuuuut...

Lighting Functions - Kill the Middle Man

With this approach we end up with something slightly silly:
  1. We sample a lighting equation at a high density region (e.g. in the middle of a specular highlight).
  2. We end up with a "strong" lighting return, e.g. a high radiance value.
  3. We multiply this by a small weight.
  4. We do this a lot.
In the meantime:
  1. We sample a lighting equation in a low density region.
  2. We end up with a very low radiance value.
  3. We multiply it by a heavy weight.
  4. We do this once.
Note that the radiance result and the weight are always inverses, because the probability density function is designed to match the lighting function. The relative weight of the brightness thus comes from the number of samples (a lot at the specular highlight, very few elsewhere).

We can simplify this by (1) throwing out the weights completely and (2) removing from our lighting equation the math terms that are exactly the same as our probability density function.  Steps 2 and 3 go away, and we can sample a simpler equation with no weighting.
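Continuing the toy sketch from above (reusing the frand() and radiance() helpers): the pdf is cos(theta)/pi, so radiance * cos(theta) * 1/(N * pdf) collapses to pi * radiance / N - the cosine term and the weights disappear entirely:
  /* Same estimate as before, with the cosine term and the per-sample weights
     cancelled against the pdf: just average the radiance and scale by pi. */
  float estimate_irradiance_no_weights(int n)
  {
      const float PI = 3.14159265f;
      float sum = 0.0f;
      for(int i = 0; i < n; ++i)
      {
          float u = frand(), v = frand();
          float cos_theta = sqrtf(1.0f - u);  /* still cosine-distributed */
          float phi = 2.0f * PI * v;
          sum += radiance(cos_theta, phi);    /* no cos term, no weight */
      }
      return PI * sum / n;
  }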

Here's the key point: when you find a probability density function for some part of a lighting equation on the interwebs, the author will have already done this.

An Example

For example, if you go look up the GGX distribution equation, you'll find something like this:

GGX distribution:
/* NdotH = dot(N, H); alpha2 = a*a, the squared GGX roughness (the same a used in the sampling code below) */
float den = NdotH * NdotH * (alpha2 - 1.0f) + 1.0f;
return alpha2 / (PI * den * den);
That's the actual math for the distribution, used for analytic lights (meaning, like, the sun).  The probability density function will be something like this:
/* a is the same GGX roughness used in the distribution above (a*a == alpha2) */
float Phi = 2 * PI * Xi.x;
float CosTheta = sqrt( (1 - Xi.y) / ( 1 + (a*a - 1) * Xi.y ) );
float SinTheta = sqrt( 1 - CosTheta * CosTheta );
(In this form, theta of 0 points at your normal vector; Xi is a 2-d variable that uniformly samples from 0,0 to 1,1. The sample at y = 0 samples in the direction of your normal.)
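The step that usually comes next (not shown in the snippet above - this is the standard continuation, with the variable names assumed) is to turn phi/theta into an actual sample direction in tangent space, where +Z is the normal:
float Hx = SinTheta * cos(Phi);
float Hy = SinTheta * sin(Phi);
float Hz = CosTheta;  /* tangent space: +Z is along the surface normal */
/* (Hx, Hy, Hz) then gets rotated from tangent space into world space around
   the real surface normal before you use it as a sample direction. */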

Note that the probability density function contains no weights. That's because the sample density resulting from running this function over a hemisphere (you input a big pile of 0,0 to 1,1 and get out phi/theta for a hemisphere) replaces the distribution function itself.

Therefore you don't need to run that GGX distribution function at all when using this sampling. You simply sample your incoming irradiance at those locations, add them up, divide by the number of samples, and you are done.

Doing It The Silly Way

As a final note, it is totally possible to sample using a probability density function that is not related to your actual lighting equation - you'll need to have sample weights and you'll need to run your full lighting equation at every point.

Doing so is, however, woefully inefficient. While it is better than uniform sampling, it's still miles away from importance sampling with the real probability density function replacing the distribution itself.