The Hacks of Life

Tuesday, October 10, 2006

OpenGL Fogging Artifacts

Here's a video of a very very low visibility approach in X-Plane. Note how the "fog" (that is, the mixing of the runway and ground to gray) pulses in and out as we fly, and they don't do it at the same time. What's going on here?

What you're seeing is a defect in the fixed-function pipeline. The problem is two-fold:

OpenGL implementations are allowed to calculate fog colors at vertices and do a simple interpolation between the vertices.
The vertices that we interpolate between are not necessarily the corners of your triangle; they could be the vertices that OpenGL adds when it clips your triangle to the view frustum.

So we have two sets of artifacts at once. First consider the case of the ground and runways. Since the fogging "interval" (the distance between fog = 0% and fog = 100%) is quite small here, the same amount of fog is spread along the entirety of a runway triangle (about 50 meters deep) and a ground mesh triangle (at least 90 meters deep, but possibly up to 1 km deep). That means that we go from visible to fog much faster over the runway than over the ground.

As we fly, the actual size of the mesh triangles is changing, as part of each mesh triangle scrolls off screen. This in turn affects the gradient of how fast we fog and what the corner fog colors are.

The results are, well, that video: fog doesn't match between the runways and the ground, and the particular strange results vary as we fly.
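To make the artifact concrete, here is a rough numeric sketch of it. Everything in it is an assumption for illustration: the fog is treated as simple linear fog, the interpolation is taken as linear in distance, and the distances (a 100 m to 150 m fog interval, a clip point 50 m ahead) are invented rather than taken from X-Plane.

```python
def linear_fog(distance, fog_start, fog_end):
    """Fog amount in [0, 1]: 0 = clear, 1 = fully fogged (linear fog)."""
    t = (distance - fog_start) / (fog_end - fog_start)
    return min(1.0, max(0.0, t))


def per_vertex_fog(distance, near, far, fog_start, fog_end):
    """Fog at `distance` when fog is evaluated only at the two ends of the
    span [near, far] and linearly interpolated in between."""
    fog_near = linear_fog(near, fog_start, fog_end)
    fog_far = linear_fog(far, fog_start, fog_end)
    t = (distance - near) / (far - near)
    return fog_near + t * (fog_far - fog_near)


FOG_START, FOG_END = 100.0, 150.0   # hypothetical, very short fog interval
SAMPLE = 125.0                      # distance at which we compare results

# Runway triangle: 50 m deep. Ground triangle: 1 km deep. Same near edge.
runway = per_vertex_fog(SAMPLE, 100.0, 150.0, FOG_START, FOG_END)
ground = per_vertex_fog(SAMPLE, 100.0, 1100.0, FOG_START, FOG_END)
exact = linear_fog(SAMPLE, FOG_START, FOG_END)

# Clipping: the near end of the ground triangle is wherever the frustum cuts
# it (assumed 50 m ahead), while the far vertex gets closer as we fly.
CLIP_NEAR = 50.0
ground_while_flying = [
    per_vertex_fog(SAMPLE, CLIP_NEAR, far, FOG_START, FOG_END)
    for far in (1000.0, 600.0, 200.0)
]

print(runway, ground, exact)
print(ground_while_flying)
```

With these made-up numbers the runway span reports 50% fog at 125 m (which matches the exact value), the 1 km ground span reports only 2.5% at the same distance, and the clipped ground value climbs from about 8% to 50% as the far vertex approaches. That is the mismatch and the pulsing, in miniature.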

The solution is, like all things in life, to replace the fixed-function pipeline with a pixel shader. The pixel shader can then use a per-fragment value (like the depth value) to fog. This is more expensive (well, probably not really...we have the depth value around and it's the same number of DSP ops) but will produce consistent fog across the entire area.
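As a rough idea of what the shader would compute, here is the per-fragment math written out in Python rather than in a shading language. The colors, the linear fog formula and the function names are all assumptions for illustration; a real shader would do the same blend once per pixel using the interpolated depth.

```python
def fog_fraction(depth, fog_start, fog_end):
    """Linear fog fraction for one fragment: 0 = clear, 1 = fully fogged."""
    t = (depth - fog_start) / (fog_end - fog_start)
    return min(1.0, max(0.0, t))


def shade_fragment(surface_rgb, fog_rgb, depth, fog_start, fog_end):
    """Blend the surface color toward the fog color using this fragment's
    own depth, the way a fog pixel shader would."""
    f = fog_fraction(depth, fog_start, fog_end)
    return tuple((1.0 - f) * s + f * g for s, g in zip(surface_rgb, fog_rgb))


FOG_RGB = (0.5, 0.5, 0.5)        # gray fog (assumed)
RUNWAY_RGB = (0.2, 0.2, 0.2)     # dark asphalt (assumed)
GROUND_RGB = (0.1, 0.4, 0.1)     # green terrain (assumed)

for depth in (100.0, 125.0, 150.0):
    runway = shade_fragment(RUNWAY_RGB, FOG_RGB, depth, 100.0, 150.0)
    ground = shade_fragment(GROUND_RGB, FOG_RGB, depth, 100.0, 150.0)
    print(depth, runway, ground)
```

Because the fog fraction depends only on the fragment's depth, the runway and the ground reach full gray at the same distance no matter how big their triangles are or where they get clipped.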

Monday, October 09, 2006

VBOs, PBOs and FBOs

(The great is the enemy of the good - in preparing my 10-part series on why iostreams totally suck, I've been putting off blogging anything else.)

Apple has posted two very nice bits of sample code demonstrating PBOs and FBOs. Even though these are Mac-specific, as sample code they're good on any platform because PBOs and FBOs are OpenGL extensions, not windowing system extensions.

So what are all these objects? Here's the situation:

Vertex Buffer Objects (VBOs)

The VBO is the greatest thing to come to OpenGL since sliced bread. Basically it's a memory buffer containing geometry data that is managed by the driver. You can either tell OpenGL to copy data into the buffer, or temporarily memory map it and write to it. VBOs abstract as an "alternate memory space" - that is, you tell OpenGL to read out of the VBO rather than your process's virtual address space. (In practice the VBO may be in your process's address space too, but this is hidden from your app.)

VBOs rule because:

They allow OpenGL to act asynchronously. When you call client arrays, the driver has to copy all of your data immediately before the function returns. Because OpenGL owns the memory in a VBO it can schedule the VBO to be read later and be sure it won't be tampered with. (After all, the only way to "edit" the VBO is via OpenGL.)
VBOs can be in VRAM or at least placed in memory that's easy for the card to get at. This means potentially much faster drawing.

It's important to note that VBOs as objects are unformatted...that is they are just a big chunk of bytes. OpenGL doesn't know or care whether they contain float data, what the ordering is, etc. until the instant you say "draw". (In fact it is an absurd example, but you could draw the same VBO using the data with multiple interpretations.)
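The "just a big chunk of bytes" idea can be shown without any OpenGL at all. The sketch below uses Python's struct module on a plain byte string standing in for a buffer object's storage; the three interpretations are my own examples. In real OpenGL the interpretation would come from the type and stride you pass to calls like glVertexPointer or glColorPointer at draw time.

```python
import struct

# Twelve bytes standing in for a buffer object's storage (little-endian).
buffer = struct.pack("<3f", 1.0, 2.0, 3.0)

# The same bytes, read three different ways:
as_floats = struct.unpack("<3f", buffer)    # e.g. one XYZ float position
as_bytes = struct.unpack("<12B", buffer)    # e.g. three RGBA8 colors
as_shorts = struct.unpack("<6h", buffer)    # e.g. three 2D short coordinates

print(len(buffer), as_floats)
print(as_bytes)
print(as_shorts)
```

Nothing in the buffer itself says which reading is "right"; that decision is made only by whoever consumes the bytes.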

PBOs (Pixel Buffer Objects):

The PBO extension doesn't really make a new kind of OpenGL object. It just says "we can use Buffer Objects for pixel operations, not just geometry operations." Normally a VBO can only be used to provide the data to something like glDrawArrays, that is, as vertex data. The PBO extension says you can also use a buffer object as the memory buffer for glReadPixels (copy from framebuffer to memory), glDrawPixels (copy from memory to framebuffer), glTexImage2D (copy from memory to a texture) and glGetTexImage (texture to memory).

There are some interesting applications for this (if the driver is clever):

Read-pixels can be very slow because we have to finish drawing before we can read the memory. Since the memory in a PBO is owned by the driver and isn't accessed by the application without OpenGL's knowledge, a read-pixels from the framebuffer to a PBO can be asynchronous; the driver will schedule it for once drawing is done. (But see comments on timing below.) This is an important case when OpenGL's output will go to something other than the screen, like making a movie file.
Texture upload can be slow; a PBO allows this to happen asynchronously, which could allow for some kind of fast threaded texture setup. This could also allow for more efficient processing when textures are changed every frame (e.g. playing a video file in an OpenGL scene).
The case that Apple shows in their demo: PBOs and VBOs are not different OpenGL objects - they're using "buffer objects" (that is, untyped blocks of memory) for pixel or vertex data. So you can draw something to the screen, read it (using glReadPixels) into a PBO and then draw it (using glDrawArrays) as a VBO. Since the buffer is never touched directly by the application, this "use an image as vertices" trick happens 100% on the graphics card and can be very fast, with no trips over the graphics bus.

One thing to note, however, is that a PBO is not a texture; you must copy from the PBO to a texture before you can draw with it. These are accelerated copies, but they are still copies.

Buffer Objects and Timing

Buffer objects can be used asynchronously until your application tries to use them. When you map them or try to read their data, OpenGL will have to block your thread until the execution of OpenGL commands has caught up. So in order to enjoy the benefits of asynchronous processing, you need to do "some other stuff" between a call to glReadPixels and a call to glGetBufferSubData or glMapBuffer.

(This is very similar to occlusion queries, where if you ask OpenGL how many pixels were drawn right after you draw them, you'll block until the pixels are really drawn. It's always best to assume OpenGL is lagging behind you. To get around this, X-Plane always asks about the number of pixels from the last frame before counting the number in this frame. This gives OpenGL an entire frame's worth of time to do the counting...by that point we can be sure the drawing has been done.)
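Here is a toy model of that "ask about last frame first" pattern. It is not OpenGL and not X-Plane's code: ToyGpu, its methods and the sample counts are all hypothetical, and the only thing it models is that reading a result the "GPU" hasn't produced yet forces a stall.

```python
from collections import deque


class ToyGpu:
    """Hypothetical stand-in for a driver that runs behind the application."""

    def __init__(self):
        self.pending = deque()   # queries submitted but not yet finished
        self.results = {}        # query id -> sample count
        self.stalls = 0          # times the CPU had to wait for the GPU

    def submit_query(self, query_id, samples):
        self.pending.append((query_id, samples))

    def end_frame(self):
        # By the end of a frame, assume the GPU has caught up with its queue.
        while self.pending:
            query_id, samples = self.pending.popleft()
            self.results[query_id] = samples

    def get_result(self, query_id):
        # Reading a result that is still pending forces a flush (a stall).
        if query_id not in self.results:
            self.stalls += 1
            self.end_frame()
        return self.results.pop(query_id)


def run(frames, read_immediately):
    gpu = ToyGpu()
    seen = []
    for frame in range(frames):
        if not read_immediately and frame > 0:
            seen.append(gpu.get_result(frame - 1))  # last frame's answer
        gpu.submit_query(frame, samples=100 + frame)
        if read_immediately:
            seen.append(gpu.get_result(frame))      # blocks on this frame
        gpu.end_frame()
    return gpu.stalls, seen


print(run(4, read_immediately=True))
print(run(4, read_immediately=False))
```

In this model, reading right away stalls once per frame, while reading the previous frame's query never stalls; the price is that the answer is always one frame old.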

FBOs (Framebuffer Objects) and, um, Renderbuffer Objects

Framebuffer objects represent places you can draw to other than the screen. This is the extension we've all been waiting for: with framebuffer objects you can draw directly into a texture.

An FBO is actually distinct from a texture - it represents all of the images you draw into at once. (Remember that when we draw we usually have an RGBA image buffer and a 32-bit depth buffer that is separate, and maybe other buffers too.) So the FBO lets us draw into an RGBA texture and a DEPTH-type texture at the same time, or even mix and match.

The FBO extension also lets us create buffers that are not textures for off-screen rendering.

FBOs represent one of the nicest ways to draw into a texture because they're simple and require no copying from the framebuffer to the texture. The FBO extension also has some functions to handle mipmap generation, so you can draw once and let OpenGL build the mipmap pyramid.

In some ways FBOs make PBOs obsolete: the most modern cards allow vertex shaders to read textures. This means that you can use a texture as geometry data directly. And with FBOs we can draw right to a texture. So rather than the old way (draw to the screen, read to a PBO, use it as a VBO, which involves one copy) we can avoid the copy (draw to a texture, then draw again using that texture for vertex data).

On the other hand I'm not sure that FBOs will replace PBOs that fast; PBOs are an older extension and thus more available, and vertex-shader-based texture reading is only available in 7th generation cards (6th for nVidia). So using FBOs avoids a copy but cuts out a whole set of graphics cards, including the very capable 9700-type cards from ATI (we see a lot of these in-field). It seems that a driver could do a reasonable implementation of PBOs even on older hardware, and with the wide to-computer bandwidth of PCIe even a bus transfer isn't the killer it used to be.

One last musing: I don't know how good the performance of texture-based vertex processing will be vs. streaming a VBO. With X-Plane we do occasionally see a bottleneck trying to read texture memory...that is, the card can't get pixels from its own high-speed memory to its own high-speed shaders via its own high-speed bus fast enough. This is pretty amazing on its own because the memory controllers on a graphics card are extraordinarily fast. But the combination of high anisotropic filtering and full-screen anti-aliasing (combined with a healthy dose of overdraw by poorly behaved applications :-) can really stretch texture bandwidth. So I don't know whether using texture memory for vertex information will cause a further bottleneck.

(One guess is that for cards that can read from texture memory to vertex shaders, they're very new and have DDR3 memory, so texture memory is so fast that PCIe16x bandwidth isn't that important.)

Tuesday, September 19, 2006

Triple-Boot Mac (and MBR hell)

It only took about 8 hours of total time, but my MacBook Pro can now triple-boot into OS X 10.4.7, Windows XP Home SP2, and Ubuntu Linux (6.06 I think).

The install is actually not that bad once you've been through it once...the process is basically:

Use diskutil to dynamically build two new partitions. The Mac comes with one "EFI" partition for boot control and one HFS+ partition. MBR gives you four max, so you build a Linux one followed by a FAT32 one.
Install Windows XP. Tedious and annoying but not complicated. Windows takes longer to boot from CD than an entire OS X install from scratch.
Install Linux. This is the slightly dangerous part, as I found out.

So two things took a long time:

First, for some reason my brand new, genuine Windows XP CD-ROM doesn't work very well. Setup had at least two hangs, a failure to init a disk, a BSOD, and an assertion failure. Maybe all of those holograms on the disk play havoc with the optical drive? So the biggest single item was a 5 hour Windows install that mostly involved trying to boot setup over and over.

Windows XP asks questions during the install, so you have to sit there and watch it. Furthermore, if you are using rEFIt like I am, then you'll have to manually select Windows each time the installer reboots itself. The installer normally expects to reincarnate itself, so if your machine mysteriously reboots without warning be sure to keep reselecting to boot from the Windows install CD-ROM.

The other problem was that, almost certainly due to my own sleepiness this morning, I managed to splat LILO onto the boot block of the Windows partition. Oops. Most tutorials online state that the only fix for this is a complete reinstall of both Windows and Linux, but fortunately I discovered this is not true.

It turns out that you can very gently run fixmbr on your Windows partition without disturbing triple-boot goodness. It's pretty much that simple...just launch the Windows setup CD, wait 30 minutes for it to load every driver ever written, launch the recovery tool and use fixmbr on the appropriate drive. Linux and Mac continue to boot via rEFIt.

Now the only issue is: Windows can't understand Mac or Linux partitions, Linux can't understand Mac partitions, and while the Mac will figure anything out, it's not automounting.

Monday, September 11, 2006

Remote Debugging with WinDBG

This is not really a tutorial but rather a quick summary of how to do this so I can find this information quickly in the future:

On the "debugging server" (the PC with the problem)

Start WinDBG
Type .server npipe:pipe=pipename

On the "debugging client" (the one you're sitting at)

Start WinDBG
Go to File>Connect to Remote Session
For Connection String enter npipe:server=Server, pipe=PipeName [,password=Password] where Server is the hostname/IP and PipeName is the name that the server chose. The password section is optional.

Friday, August 18, 2006

MacBook Pro

This is just me blogging about getting a new toy...I just upgraded my work Mac laptop to one of the new 15" MacBook Pros. Some random musings...

They're really, really nice machines. Apple does a great job on the "appliance" aspect of the machines. The power cable attaches magnetically. The machine comes with a remote control that integrates with every media app that ships with the system. (All of which can share media from a remote server right out of the box. Digital house, anyone?) As always with Apple, power management is good - Apple was preaching power management 10 years ago when no one knew what that was about, so by this point it's down in all levels of the software.

The MacBook Pros cost a bloody fortune ... I wouldn't own one if it wasn't for work. But the pro models come with a Radeon X1600, a very nice graphics chip for flight sim. By comparison the regular MacBooks ship with Intel integrated graphics, which are just crap...I'll have to rant on that some other time. Bottom line is - the non-pro MacBooks really aren't usable for flight simming, let alone flight sim dev. But for non-flight-sim users, the regular MacBooks aren't a bad way to get a nice Apple media-friendly machine.

Performance is great. Compile and sim speed rival the G5, a full sized heavy-duty desktop, and should be even better when the second gig of mem comes in the mail.

One quirk that I've seen in web forums too: when the machine is on batteries the wireless network will hang up if left idle for too long. Leaving a ping running keeps the card alive, but email checks aren't frequent enough. I haven't found a setting to keep wireless alive all the time.

EDIT: there is one way the G5 hauls the MacBook Pro - obvious but - virtual memory performance from the 7200 RPM SATA drives in the G5 is light-years ahead of the 5400 RPM drive in the laptop. This is one case where a desktop comes through; the G5 is a "no-wait" machine because the paging performance is so good. But that's a form-factor issue.

Thursday, August 17, 2006

Decoding Error Codes

While I'm on the topic of Windows shortcuts, I'll talk about a command prompt tool that'll decode error codes for you quickly.

net helpmsg <code>

Will respond back with a sentence explaining the error. For example:

net helpmsg 17
The system cannot move the file to a different disk drive.

Very handy! The only catch is that the error code needs to be in decimal...not hex!
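Since error codes often show up in hex, a tiny helper can do the conversion for you. This is just a sketch; helpmsg_command is a made-up name, and all it does is build the command line as text (it doesn't run anything).

```python
def helpmsg_command(error_code):
    """Build a `net helpmsg` command line from a code given as an int or as
    a hex/decimal string such as "0x11" or "17"."""
    if isinstance(error_code, str):
        error_code = int(error_code, 0)   # base 0 accepts "0x11" and "17"
    return f"net helpmsg {error_code}"


print(helpmsg_command("0x11"))
print(helpmsg_command(17))
```

Both calls produce the same command, net helpmsg 17, which is the example above.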

Service Control - SC

Windows comes with a built-in tool known as Service Control (sc.exe), which gives the user access to some pretty interesting capabilities and information about services. One can Create, Stop, Start, Query or Delete any Windows service.

First the syntax:

SC [\\server] [command] [service_name] [Options]

Now the commands:

query [qryOpt]    Show status
queryEx [qryOpt]  Show extended info - pid, flags
GetDisplayName    Show the DisplayName
GetKeyName        Show the ServiceKeyName
EnumDepend        Show Dependencies
qc                Show config - dependencies, full path etc
start             START a service.
stop              STOP a service
pause             PAUSE a service.
continue          CONTINUE a service.
create            Create a service. (add it to the registry)
config            permanently change the service configuration
delete            Delete a service (from the registry)
control           Send a control to a service
interrogate       Send an INTERROGATE control request to a service
Qdescription      Query the description of a service
description       Change the description of a service
Qfailure          Query the actions taken by a service upon failure
failure           Change the actions taken by a service upon failure
sdShow            Display a service's security descriptor using SDDL
SdSet             Sets a service's security descriptor using SDDL

The output looks something like this:

SERVICE_NAME       : messenger
TYPE               : 20  WIN32_SHARE_PROCESS
STATE              : 4  RUNNING
                      (STOPPABLE,NOT_PAUSABLE,ACCEPTS_SHUTDOWN)
WIN32_EXIT_CODE    : 0  (0x0)
SERVICE_EXIT_CODE  : 0  (0x0)
CHECKPOINT         : 0x0
WAIT_HINT          : 0x0

It can be useful from time to time to get some more in-depth information about a service that the administrator panel doesn't provide.