Friday, December 26, 2014

OpenGL ES Performance: The iPhone 4 Performance Gap

Now that we've shipped X-Plane 10 Mobile, I can blog the occasional OpenGL ES performance note without revealing a stealth project.

X-Plane 10 Mobile requires iOS 8.  One reason why Chris set the OS requirement so high was to intentionally exclude the iPhone 4 from our device set.

For most of the project, we worked with the iPhone 4 as our minimum device, and for the entire project, it suffered performance problems that we didn't see in any of the newer devices.  The iPad 2 and iPhone 4S (the next hardware up) both performed significantly better.

I don't know what caused the gap, but it wasn't a "this phone is 20% slower" or "this iPad has 4x the shader units" kind of gap.  It was more like "this code runs fine on newer devices and we can just see the iPhone 4 trying to beat itself to death with its old-style dock connector so it doesn't have to run another frame of our main rendering loop".

I do not know what was going on under the hood, but I can share a few observations besides "If the 4 is having performance problems, it may be specific to the iPhone 4."

  • The iPhone 4 was the only device that would get bottlenecked on vertex count.  This was a real problem for us because we had models whose vertex count we couldn't cut without our artists spending a ton of time.  We had already LODed out everything expendable on the 4 and we were still getting jammed in the 3-D cockpit.
  • The iPhone 4 is very sensitive to the number of varyings and how they are packed!  I found that packing fog and emissive light level into a 2-component varying significantly improved performance compared to having them as individual scalars.  (Of course, cutting the number of varyings made the biggest improvement.)
  • The iPhone 4 seemed to be spending significant driver time recycling the pool of "orphaned" buffers - that is, VBOs that had been completely discarded and respecified on a per-frame basis.
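
For readers who haven't met the term, the "orphaning" idiom in the last bullet usually looks something like the sketch below: respecify the buffer's storage with no data, then fill it with this frame's vertices.  This is a minimal sketch and not X-Plane's actual code.  The names upload_with_orphaning and RecordingGl are made up for illustration, and the Gl type is an assumed thin wrapper whose two methods take the same arguments as the real glBufferData and glBufferSubData, so the example can run without a GL context.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Values of the real OpenGL ES 2.0 enums, repeated here so the sketch
// compiles without GL headers.
constexpr unsigned kArrayBuffer = 0x8892;  // GL_ARRAY_BUFFER
constexpr unsigned kStreamDraw  = 0x88E0;  // GL_STREAM_DRAW

// Hypothetical helper.  `Gl` is assumed to expose BufferData/BufferSubData
// with the same argument order as glBufferData/glBufferSubData.  The target
// VBO is assumed to be bound already (glBindBuffer in real code).
template <typename Gl>
void upload_with_orphaning(Gl& gl, const std::vector<float>& verts) {
    const std::ptrdiff_t bytes =
        static_cast<std::ptrdiff_t>(verts.size() * sizeof(float));
    // Orphan: respecify the store with a null pointer so the driver may hand
    // back fresh memory instead of waiting on last frame's draw calls.
    gl.BufferData(kArrayBuffer, bytes, nullptr, kStreamDraw);
    // Fill the new store with this frame's vertices.
    gl.BufferSubData(kArrayBuffer, 0, bytes, verts.data());
}

// Hypothetical stand-in for a GL context: it only logs which calls were made.
struct RecordingGl {
    std::vector<std::string> log;
    void BufferData(unsigned, std::ptrdiff_t, const void* data, unsigned) {
        log.push_back(data ? "BufferData(data)" : "BufferData(null)");
    }
    void BufferSubData(unsigned, std::ptrdiff_t, std::ptrdiff_t, const void*) {
        log.push_back("BufferSubData");
    }
};
```

Calling this once per frame leaves the driver with one discarded store per buffer per frame to recycle, which is the pool the bullet refers to.  In a real build, Gl would be a small adapter that forwards to the actual GL entry points.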

I can't say what was going on inside the driver, but I can say that all of these were differences in the kind of performance problem we were having, not just differences of degree.
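
To make the varying-packing observation above concrete, here is a rough sketch of the two shader variants as GLSL ES 1.00 vertex shader source held in C++ strings.  Our real shaders aren't shown in this post, so every name here (a_emissive, u_fog_density, v_fog_emissive, count_varyings) and the fog math itself are placeholders.  The little counting helper is only a crude sanity check of the kind you might drop into a unit test.

```cpp
#include <cstddef>
#include <string>

// Unpacked variant: fog and emissive level are two separate scalars.
const char* const kUnpackedVS = R"(
attribute vec4 a_position;
attribute float a_emissive;
uniform mat4 u_mvp;
uniform float u_fog_density;
varying float v_fog;
varying float v_emissive;
void main() {
    gl_Position = u_mvp * a_position;
    v_fog = clamp(gl_Position.w * u_fog_density, 0.0, 1.0);
    v_emissive = a_emissive;
}
)";

// Packed variant: the same two values share one vec2 (x = fog, y = emissive).
const char* const kPackedVS = R"(
attribute vec4 a_position;
attribute float a_emissive;
uniform mat4 u_mvp;
uniform float u_fog_density;
varying vec2 v_fog_emissive;
void main() {
    gl_Position = u_mvp * a_position;
    v_fog_emissive = vec2(clamp(gl_Position.w * u_fog_density, 0.0, 1.0),
                          a_emissive);
}
)";

// Crude check: count "varying " declarations in a shader source string.
// It does not parse GLSL, so comments containing that word would fool it.
int count_varyings(const std::string& src) {
    int n = 0;
    for (std::size_t pos = src.find("varying "); pos != std::string::npos;
         pos = src.find("varying ", pos + 1)) {
        ++n;
    }
    return n;
}
```

The fragment shader for the packed variant would declare the same vec2 and read v_fog_emissive.x and v_fog_emissive.y.  Whether this helps on any given GPU is something you have to measure; all I can report is that it helped on the iPhone 4.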

Once we cut the iPhone 4 and the 4S became the "runt of the litter," handling the low end became a lot easier.  The iPhone 4S is incredibly capable for an older-generation smartphone, and while it is the first to run out of CPU cycles or fill rate, the losses are proportional to spec, not "fall on your face and die."

I'm hoping to post a little bit more about performance tuning OpenGL ES in future posts, but from this point forward, any advice I have will apply to the 4S and above.  Having cut the iPhone 4 from our product, I no longer have time to figure out what makes it go fast.*

* One of the difficulties of OpenGL and OpenGL ES is that while the spec specifies what the APIs do, it doesn't specify how fast they do it.  Performance isn't guaranteed, deterministic, or often even published except in IHV presentations.  One of the big pluses of Metal (and possibly GLNext) is deterministic performance - an API that tells you what will be expensive and what won't be.