Apple has posted two very nice bits of sample code demonstrating PBOs and FBOs. Even though these are Mac-specific, as sample code they're good on any platoform because PBOs and FBOs are OpenGL extensions, not windowing system extensions.
So what are all these objects? Here's the situation:
Vertex Buffer Arrays (VBOs)
The VBO is the greatest thing to come to OpenGL since sliced bread. Basically it's a memory buffer containing geometry data that is managed by the driver. You can either tell OpenGL to copy data into the buffer, or temporarily memory map it and write to it. VBOs abstract as an "alternate memory space" - that is you tell OpenGL to read out of the VBO rather than your process's virtuall address space. (In practice the VBO may be in your process's address space too but this is hidden from your app.)
VBOs rule because:
- They allow OpenGL to act asynchronously. When you call client arrays, the driver has to copy all of your data immediately before the function returns. Because OpenGL owns the memory in a VBO it can schedule the VBO to be read later and be sure it won't be tampered with. (After all, the only way to "edit" the VBO is via OpenGL.)
- VBOs can be in VRAM or at least placed in memory that's easy for the card to get at. This means potentially much faster drawing.
PBOs (Pixel Buffer Objects):
The PBO extension doesn't really make a new kind of OpenGL object. It just says "we can use Buffer Objects for pixel operations, not just geometry operations." Normally a VBO can only be used to provide the data to something like glDrawArrays, that is, as vertex data. PBOs says you can also use a VBO as the memory buffer for glReadPixels (copy from framebuffer to memory) and glDrawPixels (copy from memory to framebuffer), glTexImage2d (copy from memory to a texture) and glGetTexImage (texture to memory).
There are some interesting applications for this (if the driver is clever):
- Read-pixels can be very slow because we have to finish drawing before we can read the memory. Since the memory in a PBO is owned by the driver and isn't accessed by the application without OpenGL's knowledge, a read-pixels from the framebuffer to a PBO can be asynchronous; the driver will schedule it for once drawing is done. (But see comments on timing below.) This is an important case when OpenGL's output will go to something other than the screen, like making a movie file.
- Texture upload can be slow; a PBO allows this to happen asynchronously, which could allow for some kind of fast threaded texture setup. This could also allow for more efficient processing when textures are changed every frame (e.g. playing a video file in an OpenGL scene).
- The case that Apple shows in their demo: PBOs and VBOs are not different OpenGL objects - they're using "buffer objects" (that is, untyped blocks of memory) for pixel or vertex data. So you can draw something to the screen, read it (using glReadPixels) into a PBO and then draw it (using glClientArrays) as a VBO. Since the buffer is never touched directly by the application, this "use an image as vertices" trick happens 100% on the graphics card and can be very fast, with no trips over the graphics bus.
Buffer Objects and Timing
Buffer objects can be used asynchronously until your application tries to use them. When you map them or try to read their data, OpenGL will have to block your thread until the execution of OpenGL commands has caught up. So in order to enjoy the benefits of asynchronous processing, you need to do "some other stuff" between a call to glReadPixels and a call to glGetBufferSubData or glMapBuffer.
(This is very similar to occlusion queries, where if you ask OpenGL how many pixels were drawn right after you draw them, you'll block until the pixels are really drawn. It's always best to assume OpenGL is lagging behind you. To get around this, X-Plane always asks about the number of pixels from the last frame before counting the number in this frame. This gives OpenGL an entire frame's worth of time to do the counting...by that point we can be sure the drawing has been done.)
FBOs (Framebuffer Objects) and, um, Renderbuffer Objects
Framebuffer objects represent places you can draw to other than the screen. This is the extension we've all been waiting for: with framebuffer objects you can draw directly into a texture.
An FBO is actually distinct from a texture - it represents all of the images you draw into at once. (Remember that when we draw we usually have an RGBA image buffer and a 32-bit depth buffer that is separate, and maybe other buffers too.) So the FBO lets us draw into an RGBA texture and a DEPTH-type texture at the same time, or even mix and match.
The FBO extension also lets us create buffers that are not textures for off-screen rendering.
FBOs represent one of the nicest ways to draw into a texture because they're simple and require no copying from the framebuffer to the texture. The FBO extension also has some functions to handle mipmap generation, so you can draw once and let OpenGL build the mipmap pyramid.
In some ways FBOs make PBOs obsolete: the most modern cards allow vertex shaders to read textures. This means that you can use a texture as geometry data directly. And with FBOs we can draw right to a texture. So rather than the old way (draw to the screen, read to a PBO, use it as a VBO, which involves one copy) we can avoid the copy (draw to a texture, then draw again using that texture for vertex data).
On the other hand I'm not sure that FBOs will replace PBOs that fast; PBOs are an older extension and thus more available, and vertex-shader-based texture reading is only available in 7th generation cards (6th for nVidia). So using FBOs avoids a copy but cuts out a whole set of graphics cards, including the very capable 9700-type cards from ATI (we see a lot of these in-field). It seems that a driver could do a reasonable implementation of PBOs even on older hardware, and with the wide to-computer bandwidth of PCIe even a bus transfer isn't the killer it used to be.
One last musing: I don't know how good the performance of texture-based vertex processing will be vs. streaming a VBO. With X-Plane we do occaisionally see a bottleneck trying to read texture memory...that is, the card can't get pixels from its own high-speed memory to its own high-speed shaders via its own high-speed bus fast enough. This is pretty amazing on its own because the memory controllers on a graphics card are extraordinarily fast. But the combination of high anisotropic filtering and full-screen anti-aliasing (combined with a healthy dose of overdraw by poorly behaved applications :-) can really stretch texture bandwidth. So I don't know whether using texture memory for vertex information will cause a further bottleneck.
(One guess is that for cards that can read from texture memory to vertex shaders, they're very new and have DDR3 memory, so texture memory is so fast that PCIe16x bandwidth isn't that important.)
No comments:
Post a Comment