The Hacks of Life: 10/01/2006

Here's a video of a very very low visibility approach in X-Plane. Note how the "fog" (that is, the mixing of the runway and ground to gray) pulses in and out as we fly, and they don't do it at the same time. What's going on here?

What you're seeing is a defect in the fixed-function pipeline. The problem is two-fold:

OpenGL implementations are allowed to calculate fog colors at vertices and do a simple interpolation between the vertices.
The vertices that we interpolate between are not necessarily the corners of your triangle; they could be the vertices that OpenGL adds when it clips your triangle to the view frustum.

So we have two sets of artifacts at once. First consider the case of the ground and runways. Since the fogging "interval" (the distance between fog = 0% and fog = 100%) is quite small here, the same amount of fog is spread along the entirety of a runway triangle (about 50 meters deep) and a ground mesh triangle (at least 90 meters deep, but possibly up to 1 km deep). That means that we go from visible to fog much faster over the runway than over the ground.

As we fly, the actual size of the mesh triangles is changing, as part of each mesh triangle scrolls off screen. This in turn affects the gradient of how fast we fog and what the corner fog colors are.

The results are, well, that video: fog doesn't match between the runways and the ground, and the particular strange results vary as we fly.
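To put rough numbers on this, here is a small CPU-side sketch in Python. It is not X-Plane's code or what any particular driver does. It assumes linear fog between a made-up fog start of 100 m and fog end of 200 m, and it interpolates fog linearly in distance between a triangle's near and far edges (real hardware interpolates in screen space, so treat this as a simplification). The function names and all distances are invented for illustration.

```python
def fog_fraction(distance, fog_start, fog_end):
    """Linear fog: 0.0 = no fog, 1.0 = fully fogged."""
    t = (distance - fog_start) / (fog_end - fog_start)
    return min(1.0, max(0.0, t))


def per_vertex_fog(distance, near, far, fog_start, fog_end):
    """Fog at `distance` when fog is only computed at the triangle's
    near and far edges and then linearly interpolated between them."""
    fog_near = fog_fraction(near, fog_start, fog_end)
    fog_far = fog_fraction(far, fog_start, fog_end)
    t = (distance - near) / (far - near)
    return fog_near + t * (fog_far - fog_near)


# Hypothetical numbers: fog goes from 0% at 100 m to 100% at 200 m.
FOG_START, FOG_END = 100.0, 200.0
SAMPLE = 150.0  # the distance at which we compare runway and ground

true_fog = fog_fraction(SAMPLE, FOG_START, FOG_END)

# Runway triangle: 50 m deep. Ground triangle: 1 km deep.
runway_fog = per_vertex_fog(SAMPLE, 100.0, 150.0, FOG_START, FOG_END)
ground_fog = per_vertex_fog(SAMPLE, 100.0, 1100.0, FOG_START, FOG_END)

# As we fly, clipping keeps the near edge at the screen edge (100 m here)
# while the far edge of the ground triangle gets closer.
ground_fog_while_flying = [
    per_vertex_fog(SAMPLE, 100.0, far, FOG_START, FOG_END)
    for far in (1100.0, 600.0, 200.0)
]

print("true:", true_fog, "runway:", runway_fog, "ground:", ground_fog)
print("ground while flying:", ground_fog_while_flying)
```

With these made-up numbers the runway is 50% fogged at 150 m while the ground next to it is only 5% fogged, and the ground value drifts from 5% to 10% to 50% as the clipped triangle shrinks. That drift is the "pulsing" in the video.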

The solution is, like all things in life, to replace the fixed-function pipeline with a pixel shader. The pixel shader can then use a per-fragment value (like the depth value) to fog. This is more expensive (well, probably not really...we have the depth value around and it's the same number of DSP ops) but will produce consistent fog across the entire area.
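As a sketch of what the per-fragment version computes, here is the standard OpenGL linear fog equation evaluated once per fragment, written in Python rather than in a shading language so it can be run anywhere. The colors, distances, and function names are hypothetical; a real shader would get the distance from the fragment's depth or eye-space position.

```python
def gl_linear_fog_factor(eye_distance, fog_start, fog_end):
    """GL_LINEAR fog factor f: 1.0 = unfogged, 0.0 = fully fogged."""
    f = (fog_end - eye_distance) / (fog_end - fog_start)
    return min(1.0, max(0.0, f))


def shade_fragment(surface_rgb, fog_rgb, eye_distance, fog_start, fog_end):
    """Blend the surface color toward the fog color using this
    fragment's own distance, not values interpolated from vertices."""
    f = gl_linear_fog_factor(eye_distance, fog_start, fog_end)
    return tuple(f * s + (1.0 - f) * g for s, g in zip(surface_rgb, fog_rgb))


# Hypothetical colors and fog range.
FOG_RGB = (0.5, 0.5, 0.5)
RUNWAY_RGB = (0.2, 0.2, 0.2)
GROUND_RGB = (0.1, 0.4, 0.1)

# Two fragments at the same distance, one from a short runway triangle
# and one from a long ground triangle. Triangle size never enters into it.
runway_factor = gl_linear_fog_factor(150.0, 100.0, 200.0)
ground_factor = gl_linear_fog_factor(150.0, 100.0, 200.0)
runway_pixel = shade_fragment(RUNWAY_RGB, FOG_RGB, 150.0, 100.0, 200.0)
ground_pixel = shade_fragment(GROUND_RGB, FOG_RGB, 150.0, 100.0, 200.0)

print(runway_factor, ground_factor)
print(runway_pixel, ground_pixel)
```

Because the only input is the fragment's distance, the runway and the ground get the same fog factor wherever they sit at the same distance, no matter how the triangles were clipped.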

(The great is the enemy of the good - in preparing my 10-part series on why iostreams totally suck, I've been putting off blogging anything else.)

Apple has posted two very nice bits of sample code demonstrating PBOs and FBOs. Even though these are Mac-specific, as sample code they're good on any platform because PBOs and FBOs are OpenGL extensions, not windowing system extensions.

So what are all these objects? Here's the situation:

Vertex Buffer Objects (VBOs)

The VBO is the greatest thing to come to OpenGL since sliced bread. Basically it's a memory buffer containing geometry data that is managed by the driver. You can either tell OpenGL to copy data into the buffer, or temporarily memory map it and write to it. VBOs act as an "alternate memory space" - that is, you tell OpenGL to read out of the VBO rather than your process's virtual address space. (In practice the VBO may be in your process's address space too, but this is hidden from your app.)

VBOs rule because:

They allow OpenGL to act asynchronously. When you draw from client arrays, the driver has to copy all of your data immediately before the function returns. Because OpenGL owns the memory in a VBO it can schedule the VBO to be read later and be sure it won't be tampered with. (After all, the only way to "edit" the VBO is via OpenGL.)
VBOs can be in VRAM or at least placed in memory that's easy for the card to get at. This means potentially much faster drawing.

It's important to note that VBOs as objects are unformatted...that is they are just a big chunk of bytes. OpenGL doesn't know or care whether they contain float data, what the ordering is, etc. until the instant you say "draw". (In fact it is an absurd example, but you could draw the same VBO using the data with multiple interpretations.)
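Here is a tiny Python illustration of the "just a big chunk of bytes" idea. It does not touch OpenGL at all; it only shows that the meaning of a buffer comes from the format, offset, and stride you supply at read time, which is roughly the role the vertex pointer calls play at draw time. The interleaved layout (x, y, z, s, t as 32-bit floats) and the helper name are made up for the example.

```python
import struct

# Two interleaved vertices: x, y, z, s, t as little-endian 32-bit floats,
# so each vertex occupies 20 bytes. To the "driver" this is just 40 bytes.
buffer_bytes = struct.pack(
    "<10f",
    0.0, 1.0, 2.0, 0.25, 0.75,
    3.0, 4.0, 5.0, 0.5, 1.0,
)


def read_attribute(data, components, offset, stride, count):
    """Pull `count` float tuples out of raw bytes, the way a vertex
    pointer with a size, byte offset, and stride would describe them."""
    fmt = "<%df" % components
    return [
        struct.unpack_from(fmt, data, offset + i * stride)
        for i in range(count)
    ]


# Interpretation 1: positions start at byte 0, texture coordinates at byte 12.
positions = read_attribute(buffer_bytes, 3, 0, 20, 2)
texcoords = read_attribute(buffer_bytes, 2, 12, 20, 2)

# Interpretation 2 (the absurd one): the very same bytes as unsigned ints.
as_integers = struct.unpack("<10I", buffer_bytes)

print(positions)
print(texcoords)
print([hex(v) for v in as_integers[:3]])
```

Nothing about the buffer changed between the two interpretations; only the description handed over at read time did.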

PBOs (Pixel Buffer Objects):

The PBO extension doesn't really make a new kind of OpenGL object. It just says "we can use Buffer Objects for pixel operations, not just geometry operations." Normally a VBO can only be used to provide the data to something like glDrawArrays, that is, as vertex data. The PBO extension says you can also use a buffer object as the memory buffer for glReadPixels (copy from framebuffer to memory), glDrawPixels (copy from memory to framebuffer), glTexImage2D (copy from memory to a texture) and glGetTexImage (texture to memory).

There are some interesting applications for this (if the driver is clever):

Read-pixels can be very slow because we have to finish drawing before we can read the memory. Since the memory in a PBO is owned by the driver and isn't accessed by the application without OpenGL's knowledge, a read-pixels from the framebuffer to a PBO can be asynchronous; the driver will schedule it for once drawing is done. (But see comments on timing below.) This is an important case when OpenGL's output will go to something other than the screen, like making a movie file.
Texture upload can be slow; a PBO allows this to happen asynchronously, which could allow for some kind of fast threaded texture setup. This could also allow for more efficient processing when textures are changed every frame (e.g. playing a video file in an OpenGL scene).
The case that Apple shows in their demo: PBOs and VBOs are not different OpenGL objects - they're using "buffer objects" (that is, untyped blocks of memory) for pixel or vertex data. So you can draw something to the screen, read it (using glReadPixels) into a PBO and then draw it (using glDrawArrays) as a VBO. Since the buffer is never touched directly by the application, this "use an image as vertices" trick happens 100% on the graphics card and can be very fast, with no trips over the graphics bus.

One thing to note, however, is that a PBO is not a texture; you must copy from the PBO to a texture before you can draw with it. These are accelerated copies, but they are still copies.

Buffer Objects and Timing

Buffer objects can be used asynchronously until your application tries to use them. When you map them or try to read their data, OpenGL will have to block your thread until the execution of OpenGL commands has caught up. So in order to enjoy the benefits of asynchronous processing, you need to do "some other stuff" between a call to glReadPixels and a call to glGetBufferSubData or glMapBuffer.

(This is very similar to occlusion queries, where if you ask OpenGL how many pixels were drawn right after you draw them, you'll block until the pixels are really drawn. It's always best to assume OpenGL is lagging behind you. To get around this, X-Plane always asks about the number of pixels from the last frame before counting the number in this frame. This gives OpenGL an entire frame's worth of time to do the counting...by that point we can be sure the drawing has been done.)
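The "ask about last frame first" pattern can be sketched with a toy model in Python. This is not OpenGL and not X-Plane's code: FakeGpu is a hypothetical stand-in that finishes any query a fixed number of ticks after it is issued, and the tick counts are invented. The same shape applies to a glReadPixels into a PBO followed later by glMapBuffer.

```python
class FakeGpu:
    """Toy model: a query's result is ready `latency_ticks` after issue."""

    def __init__(self, latency_ticks):
        self.latency_ticks = latency_ticks
        self.clock = 0        # current time, in ticks
        self.stall_ticks = 0  # total time the CPU spent blocked

    def cpu_work(self, ticks):
        self.clock += ticks

    def issue_query(self, pixels_drawn):
        # Returns a handle: (time the result becomes ready, the result).
        return (self.clock + self.latency_ticks, pixels_drawn)

    def read_query(self, query):
        ready_at, pixels_drawn = query
        if ready_at > self.clock:
            # Result not ready yet: block until the GPU catches up.
            self.stall_ticks += ready_at - self.clock
            self.clock = ready_at
        return pixels_drawn


def read_immediately(gpu, frames, frame_ticks):
    results = []
    for pixels in frames:
        query = gpu.issue_query(pixels)
        results.append(gpu.read_query(query))  # blocks every frame
        gpu.cpu_work(frame_ticks)
    return results


def read_one_frame_late(gpu, frames, frame_ticks):
    results = []
    previous = None
    for pixels in frames:
        if previous is not None:
            results.append(gpu.read_query(previous))  # last frame's answer
        previous = gpu.issue_query(pixels)            # this frame's question
        gpu.cpu_work(frame_ticks)
    return results


FRAMES = [100, 200, 300]  # hypothetical pixel counts per frame

eager_gpu = FakeGpu(latency_ticks=5)
eager_results = read_immediately(eager_gpu, FRAMES, frame_ticks=16)

lazy_gpu = FakeGpu(latency_ticks=5)
lazy_results = read_one_frame_late(lazy_gpu, FRAMES, frame_ticks=16)

print("immediate:", eager_results, "stalled", eager_gpu.stall_ticks)
print("one frame late:", lazy_results, "stalled", lazy_gpu.stall_ticks)
```

In this model the immediate version stalls on every frame, while the deferred version never stalls. The price is that each answer arrives a frame late, and the last frame's query is still outstanding when the loop ends.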

FBOs (Framebuffer Objects) and, um, Renderbuffer Objects

Framebuffer objects represent places you can draw to other than the screen. This is the extension we've all been waiting for: with framebuffer objects you can draw directly into a texture.

An FBO is actually distinct from a texture - it represents all of the images you draw into at once. (Remember that when we draw we usually have an RGBA image buffer and a 32-bit depth buffer that is separate, and maybe other buffers too.) So the FBO lets us draw into an RGBA texture and a DEPTH-type texture at the same time, or even mix and match.

The FBO extension also lets us create buffers that are not textures for off-screen rendering.

FBOs represent one of the nicest ways to draw into a texture because they're simple and require no copying from the framebuffer to the texture. The FBO extension also has some functions to handle mipmap generation, so you can draw once and let OpenGL build the mipmap pyramid.
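For readers who haven't seen a mipmap pyramid built by hand, here is a rough Python sketch of the idea: each level is the previous one shrunk by half using a 2x2 box filter. This is only an assumption about the filtering; the extension leaves the actual filter up to the implementation. The sketch also assumes a square, power-of-two, single-channel image, and the function names are made up.

```python
def downsample(level):
    """Halve a square image by averaging each 2x2 block of texels."""
    size = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(size)
        ]
        for y in range(size)
    ]


def build_mip_pyramid(base):
    """Return [base, half-size, quarter-size, ..., 1x1]."""
    levels = [base]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


# A hypothetical 4x4 single-channel "texture" we just rendered into.
base_level = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [8, 8, 12, 12],
    [8, 8, 12, 12],
]

pyramid = build_mip_pyramid(base_level)
for level in pyramid:
    print(level)
```

The point of letting OpenGL do this is that the smaller levels get rebuilt on the card from whatever you just drew, without the image ever coming back to the application.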

In some ways FBOs make PBOs obsolete: the most modern cards allow vertex shaders to read textures. This means that you can use a texture as geometry data directly. And with FBOs we can draw right to a texture. So rather than the old way (draw to the screen, read to a PBO, use it as a VBO, which involves one copy) we can avoid the copy (draw to a texture, then draw again using that texture for vertex data).

On the other hand I'm not sure that FBOs will replace PBOs that fast; PBOs are an older extension and thus more available, and vertex-shader-based texture reading is only available in 7th generation cards (6th for nVidia). So using FBOs avoids a copy but cuts out a whole set of graphics cards, including the very capable 9700-type cards from ATI (we see a lot of these in-field). It seems that a driver could do a reasonable implementation of PBOs even on older hardware, and with the wide to-computer bandwidth of PCIe even a bus transfer isn't the killer it used to be.

One last musing: I don't know how good the performance of texture-based vertex processing will be vs. streaming a VBO. With X-Plane we do occasionally see a bottleneck trying to read texture memory...that is, the card can't get pixels from its own high-speed memory to its own high-speed shaders via its own high-speed bus fast enough. This is pretty amazing on its own because the memory controllers on a graphics card are extraordinarily fast. But the combination of high anisotropic filtering and full-screen anti-aliasing (combined with a healthy dose of overdraw by poorly behaved applications :-) can really stretch texture bandwidth. So I don't know whether using texture memory for vertex information will cause a further bottleneck.

(One guess is that cards that can read from texture memory to vertex shaders are very new and have GDDR3 memory, so texture memory is so fast that PCIe x16 bandwidth isn't that important.)

Tuesday, October 10, 2006

OpenGL Fogging Artifacts

Monday, October 09, 2006

VBOs, PBOs and FBOs