Tuesday, October 29, 2019

Klingons Do Not Release Software

Brent Simmons on ETAs for software releases:
This is all just to say that app-making is nothing like building a house. It’s more like building the first house ever in the history of houses, with a pile of rusty nails and your bare hands, in a non-stop tornado.
He put this at the end of the article, but I think it's the most important lesson here: it's hard to estimate how long a coding task will take because every coding task that you undertake that is worth something to your business is the very first time your business has made anything like that task.

This comes from the fundamentally reusable nature of software.  We see this with X-Plane.  We have decent estimates for how long aircraft will take to build because each aircraft, while different from the others in terms of its aerodynamics, look, texturing, modeling, is the same in terms of the kind of tasks: 3-d modeling, UV mapping, animation, etc.

On the software side, we do everything exactly once. Once we rewrite the entire rendering engine to run under Vulkan, we will never do that again, because we will have Vulkan support, and coding it twice would be silly.

Almost every software feature ends up like this - the feature is fundamentally novel compared to the last one because you don't need a second copy of code.

Brent finishes up the article with this:
The only reason anything ever ships is because people just keep working until it’s ready.
But this is only half-true. Code also ships when the scope of the project is reduced to the things you have already finished. And I think this is important to remember because reducing scope is the one thing that actually reduces ship times.

This reminded me of the old nerd joke:
Klingons do not release software. Klingon software escapes, leaving a bloody trail of design engineers and quality assurance testers in its wake.
On a big, unwieldy, messy release with seat-of-the-pants project management, that's a pretty good description of what happens - at some point the overwhelming business force of needing to ship dominates what's left in the todo pile and the software smashes through the wall, Kool-Aid style and gives the users a big "Oh Yeah!"

I think Brent's definition is right, as long as we recognize that "ready" is a dynamic term that can change based on market conditions and the better understanding of what you really need to ship that you get from implementation. If your app figures out it's ready before you do, you might have Klingon software on your hands.

Saturday, September 14, 2019

Hardware Performance: Old and New iPhones and PCs

I am looking at performance on our latest revision of X-Plane Mobile, and I've noticed that the performance bottlenecks have changed from past versions.

In the past, we have been limited by some of everything - vertex count, CPU-side costs (mostly in the non-graphics code, which is ported from desktop), and fill rate/shading costs. This time around, something has changed - performance is pretty much about shading costs, and that's the whole ballgame.

This is a huge relief! Reducing shading costs is a problem for which real-time graphics has a lot of good solutions. It's normally the problem, so pretty much all rendering techniques find ways to address it.

How did this shift happen? I think part of it is that Apple's mobile CPUs have just gotten really, really good. The iPhone X has a single-core GeekBench 4 score of 4245. By comparison, the 2019 iMac with an i5-8500 at 3 ghz gets 5187. That's just not a huge gap between a full-size gaming mid-range desktop computer and...a phone.

In particular, we've noticed that bottlenecks that used to show up only on mobile have more or less gone away. My take on this is that Apple's mobile chips can now cope with not-hand-tuned code as well Intel's can, e.g. less than perfect memory access patterns, data dependencies, etc. (A few years ago, code that used to be good enough that hand tuning wouldn't be a big win would need some TLC for mobile. It's a huge dev time win to have that not be the case anymore.)

If I can summarize performance advice for today's mobile devices in a single item, it would be: "don't blend." The devices easily have enough ALU and texture bandwidth to run complex algorithms (like full PBR) at high res as long as you're not shading the full screen multiple times. Because the GPUs are tiling, the GPU will eliminate virtually any triangles that are not visible in the final product as long as blending is off.

By comparison, on desktop GPUs we find that utilization is often the biggest problem - that is, the GPU has the physical ALU and bandwidth to over-draw the scene multiple times if blending is left on, but the frame is made up of multiple smaller draw calls, and the GPU can't keep the entire card fed given the higher frequency of small jobs.

(We did try simplifying the shading environment - a smaller number of shaders doing more work by moving what used to be shader changes into conditional logic on the GPU. It was not a win! I think the problem is that if any part of a batch is changed, it's hard for the GPU to keep a lot of work in flight, even if the batch is "less different" than before optimization. Since we couldn't get down to "these batches are identical, merge them" we ended up with similarly poor utilization and higher ALU costs while shading.)

Saturday, April 06, 2019

Keeping the Blue Side Up: Coordinate Conventions for OpenGL, Metal and Vulkan

OpenGL, Metal and Vulkan all have different ideas about which way is up - that is, where the origin is located and which way the Y axis goes for a framebuffer. This post explains the API differences and suggests a few ways to cope with them. I'm not going to cover the Z axis or Z-buffer here - perhaps that'll be a separate post.

Things We Can All Agree On

Let's start with some stuff that's the same for all three APIs: in all three APIs the origin of a framebuffer and the origin of a texture both represent the lowest byte in memory for the data backing that image. In other words, memory addressing starts at 0,0 and then increases as we go the right, then as we go to the next line.  Whether we are in texels in a texture or pixels in a framebuffer, this relationship holds up.

This means that your model's UV maps and textures will Just Work™ in all three APIs. When your artist puts 0,0 into that giant fighting robot's UV map, the intent is "the texels at the beginning of memory for that texture."  You can load the image into the API the same way on all platforms and the UV map will pull out the right texels and the robot will look shiny.

All three APIs also agree on the definition of a clockwise or counterclockwise polygon - this decision is made in framebuffer coordinates as a human would see it if presented to the screen.  This works out well - if your model robot is drawing the way you expect, the windings are the way your artist created them, and you can keep your front face definition consistent across APIs.

Refresher: Coordinate Systems

For the purpose of our APIs, we care about three coordinate systems:

  • Clip Coordinates: these are the coordinates that come out of your shader. It's often to think in terms of normalized device coordinates (NDC) - the post-clip, post-perspective divide coordinates - but you don't get to see them.
  • Framebuffer coordinates. These are the coordinates that are rasterized, after the NDC coordinates are transformed by the viewport transform.
  • Texture coordinates. These are the coordinates we feed into the samplers to read from textures. They're not that interesting because, per above, they work the same on all APIs.

OpenGL: Consistently Weird

OpenGL's conventions are different from approximately every other API ever, but at least they are self-consistent: every single origin in OpenGL is in the lower left corner of the image, so the +Y axis is always up. +Y is up in clip coordinates, NDC, and framebuffer coordinates.

What's weird about this is that every window manager ever uses +Y = down, so your OpenGL driver is prrrrrobably flipping the image for you when it sends it off to the compositor or whatever your OS has. But after 15+ years of writing OpenGL code, +Y=up now seems normal to me, and the consistency is nice. One rule works everywhere.*

In the OpenGL world, we render with the +Y axis being up, the high memory of the framebuffer is the top of the image, which is what the user sees, and if we render to texture, the higher texel coordinates are that top of the image, so everything is good. You basically can't mess this system up.

Metal: Up is Down and Down is Up

Metal's convention is to have +Y = up in clip coordinates (and NDC) but +Y = down in framebuffer coordinates, with the framebuffer origin in the upper left. While this is baffling to programmers coming from GL/GLES, it feels familiar to Direct3d programmers.  In Metal, the viewport transformation has a built-in Y flip that you can't control at the API level.

The window manager presents Metal framebuffers with the lowest byte in the upper left, so if you go in with a model that transforms with +Y = up (OpenGL style), your image will come out right side up and all is good. But be warned, chaos lurks beneath the surface.

Metal's viewport and scissors are defined in framebuffer coordinates, so they now run +Y=down, and will require parameter adjustment to match OpenGL.

Note also that our screenshot code that reads back the framebuffer will have to run differently on OpenGL and Metal; one (depending on your output image file format) will require an image flip, and one will not.

Render-to-Texture: Two Wrongs Make a "Ship It"

Here's the problem with Metal: let's say we draw a nice scene with blue sky up top and green grass on the bottom. We're going to use it as an environment input and sample it. Our OpenGL code expects that low texture coordinates (near 0) get us the green grass at the bottom of memory and high texture coordinates (near 1) get us the blue sky at the top of memory.

Unfortunately in the render-to-texture case, Metal's upper-left origin has been applied - the sky is now in low memory and the grass is in high memory, and our code that samples this image will show something upside-down and probably quite silly looking.

We have two options:
  1. Adjust the image at creation time by hacking the transform matrix or
  2. Adjusting the code that uses the image by adjusting the sampling coordinates.

For X-Plane, we picked door number 1 - intentionally render the image upside down (by Metal standards, or "the way it was meant to be" by OpenGL standards) so that the image is oriented as the samplers expect.

Why do it this way? Well, in our case, we often have shaders that sample both from images on disk and from rendered textures; if we flip our textures on disk (to match Metal's default framebuffer orientation) then we have to adjust every UV map that references a disk image, and that's a huge amount of code, because it covers all shaders and C++ code that generate UV maps. Focusing on render-to-texture is a smaller surface area to attack.

For Metal, we need to intentionally flip the Y coordinate by applying a Y-reverse to our transform stack - in our case this also meant ensuring that every shader used the transform stack; we had a few that were skipping it and had to be set up with identity transforms so the low level code could slip in the inversion.

We also need to change our front face's winding order, because winding orders are labeled in the API based on what a human would see if the image is presented to the screen. By mirroring our image to be upside down, we've also inverted all of our models' triangle windings, so we need to change our definition of what is in front.

Sampling with the gl_FragCoord or [[position]]: Three Wrongs Make a "How Did I Get Here"?

There's one more loose end with Metal: if you wrote a shader that uses gl_FragCoord to reconstruct some kind of coordinates based on the window rasterization position, they're going to be upside-down from what your shader did in OpenGL.  The upper left of your framebuffer will rasterize with 0,0 for its position, and if you pass this on to a texture sampler, you're going to pick off low memory.

Had we left well enough alone, this would have been fine, as Metal wants to put the upper left of an image in low memory when rasterizing. But since we intentionally flipped things, we're now...upside down again.

Here we have two options:

  1. Don't actually flip the framebuffer when rendering to texture. Maybe that was a dumb idea.
  2. Insert code to flip the window coordinates.
For X-Plane we do both: some render targets are intentionally rasterized at API orientation (and not X-Plane's canonical lower-left-origin orientation) specifically so they can be resampled using window positions.  For example, we render a buffer that is sampled to get per-pixel fog, and we leave it at API orientation to get correct fogging.

Flipping the window coordinate in the sampled code makes sense when the window position is going to be used to reconstitute some kind of world-space coordinate system.  Our skydome, for example, is drawn as a full screen quad that calculates the ray that would project through the point in question. It takes as inputs the four corners of the view frustum, and swapping those in C++ fixes our sampling to match our upside down^H^H^H^H^H^HOpenGL-and-perfect-just-the-way-it-is image.

What Have We Learned (About Metal)

So to summarize, with Metal:

  • If we're going to render to a texture for a model, we put a Y-flip into our transform stack and swap our front face winding direction.
  • If we're going to render to a texture for sampling via window coordinates, we don't.
  • If we're going to use window coordinates to reconstruct 3-d, we have to swap the reconstruction coefficients.

Vulkan: What Would Spock Do?


Apparently: headstands! Vulkan's default coordinate orientation is +Y=down, full stop. The upper left of the framebuffer is the origin, and there's no inversion of the Y axis. This is consistent, but it's also consistently different from every other 3-d API ever, in that the Y axis in clip coordinates is backward from OpenGL, Metal, and DX.

The good news is: with Vulkan 1.1 you can specify a negative viewport height, which gives you a Y axis swap. With this trick, Vulkan maches DX and Metal, and all you have to worry about it is all of the craziness listed above.


* a side effect of this is: when I built our table UI component, I defined the bottom-most row of the table as row 0. My co-workers incorrectly think this is a weird convention for UI, and one of them went as far as to write a table row flipping function called
correct_row_indexes_because_bens_table_api_was_intentionally_written_backwards_so_we_dont_ask_him_to_write_UI_controls_any_more.
The moral of the story is that +Y=up is both consistent and a great way to get out of being asked to maintain old and poorly thought out UI widgets.  I trust my co-worker will come around to my way of thinking in another fifteen years.