In order to understand the problem of sampling and texture coordinates in OpenGL, we need to first define our terms. For the purpose of discussion, I will define two ways of dealing with raster data:
Area-oriented. In this scheme each "pixel" covers a rectangular area. The pixel has a real non-zero size, and thus it is meaningful to talk about the infinitely thin geometric lines that define the left/right and top/bottom of the pixel.
This is the way that 2-d raster APIs like QuickDraw define things. Typically an area-oriented system will define a standard way of referring to a pixel, e.g. a pixel is referred to by its lower-left corner. (Thus, the lower left pixel is number 0,0 and its top right corner is at point 1,1.)
Sample-oriented. In this scheme each data point is a measurement at a location in space - that is, a "sample". We don't know what goes on between samples (because we didn't check).
A raster set of geological soundings might fit this model, e.g. we drilled a hole every 90m and checked for toxic goo. In this scheme, a sample is represented by a point marking its location, e.g. "where we sampled". How big is a sample? Technically it is infinitely small, but we might care about the space between samples.
So is OpenGL area oriented or sample oriented? Delightfully, it is both!
Rasterization

Rasterization refers to how OpenGL geometric primitives (points, lines, polygons) turn into filled-in pixels on screen. Geometric primitives are, well, geometric: a "line" in its abstract form, as issued via GL_LINES, is infinitely thin during coordinate processing. Only at rasterization time does it gain thickness.
The pixels of the viewport are treated by area. OpenGL defines a pixel's center, whose location is 0.5 pixels up and to the right of the lower left corner of the pixel (which defines the pixel). In other words, pixel 0,0 goes from 0,0 to 1,1 and its sample-center is 0.5, 0.5.
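To make that concrete, here is the pixel-to-sample-center conversion as a tiny C sketch (px and py are just integer pixel indices):

    /* The sample center of pixel (px, py): half a pixel up and to the
       right of the lower left corner that names the pixel. */
    void pixel_center(int px, int py, float * cx, float * cy)
    {
        *cx = (float)px + 0.5f;
        *cy = (float)py + 0.5f;
    }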
This is an important link: it tells us that, to calculate the color to show in a pixel, we apply our shader at the center of the pixel.
For points, if the point size is odd, the point center is snapped to a grid based on floor(p) + 0.5; for even sizes, we use floor(p + 0.5). This basically ensures that the point, as a box of pixels, is aligned to "whole" pixels. Note that the texture coordinates for a point's fragments should be pretty much what you pass in.
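Here is that snapping rule as a C sketch (my paraphrase of the spec's behavior, not driver code):

    #include <math.h>

    /* Snap one coordinate of a point's center per the rule above:
       odd sizes center on a pixel, even sizes center on a pixel edge. */
    float snap_point_coord(float p, int point_size)
    {
        if (point_size % 2)
            return floorf(p) + 0.5f;    /* odd: a 3x3 box lands on whole pixels */
        else
            return floorf(p + 0.5f);    /* even: a 4x4 box lands on whole pixels */
    }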
Lines are sort of a nightmare - the spec allows GL implementations to "roll their own" with up to 1 pixel of weirdness relative to the spec algorithm. The only requirements are that lines not double-pixel along the major axis they follow, and that end points not be double-rasterized.
I wrote a quick test program to look at line rasterization and came to the conclusion that it's a fugly mess. If you want a particular set of pixels filled in, I wouldn't use lines. On my 8800, the "decision point" for slipping the grid appears to be different for different segments in a line loop, and the loop's shared end point will get lost under certain conditions.
Polygons are perhaps the simplest case in theory: any pixel whose sampling center is inside the mathematical boundaries of the transformed polygon is "in". On-the-line cases are implementation specific but must be deterministic - two adjacent polygons can't share pixels.
To take a simple polygon example: if we draw a rectangle from 0,0 to 8,8 then the first 8x8 pixels will be included. Why? Well, the sampling centers of that first row are: 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5. Those 8 pixels are all clearly inside the range (0-8).
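You can verify that coverage with a quick loop over sample centers (a sketch; real implementations also have tie-breaking rules for centers that land exactly on an edge):

    #include <stdio.h>

    /* Which pixel columns does a rectangle spanning x = 0..8 cover?
       A pixel is "in" when its sample center falls inside the rectangle. */
    int main(void)
    {
        for (int px = 0; px < 10; ++px)
        {
            float center = px + 0.5f;
            if (center > 0.0f && center < 8.0f)
                printf("pixel %d is in (center at %.1f)\n", px, center);
        }
        return 0;    /* pixels 0 through 7 are covered */
    }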
(Note that we use a different inclusion test than an area-oriented library like QuickDraw, which explicitly defines which pixels are inside a rectangle - but the results are the same, and we get the same sane result: an 8-pixel-wide rectangle fills in 8 pixels.)
Note that in the case of polygons, if your polygon edges run along integral grid lines, you unambiguously pick up the "contained" pixels, because the test is done at pixel centers.
Texture Coordinate Interpretation

The GL will sample the texture for each pixel filled in by our polygon. So what texture coordinates are used per pixel?
The texture coordinates (and all fragment information) are interpolated per fragment (pixel) at the pixel center. This has some interesting consequences. Let's make our lives easy and assume we are using a 4x4 texture whose ST coordinates run from 0,0 to 1,1.
Texture coordinate interpolation is barycentric and perspective-correct, but for our purposes it's good enough to note that it's affine for affine UV maps, and for a rectangle that means it's a linear interpolation. In plain English: the interpolation is not surprising.
But remember, we are sampling at 0.5, 1.5, 2.5, etc. So our texture coordinates are going to be: 0.0625, 0.1875, 0.3125, ... 0.9375.
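If you want to see where those numbers come from, each S coordinate is just the pixel center divided by the quad width (a sketch):

    #include <stdio.h>

    int main(void)
    {
        /* S at each pixel center across our 8-pixel-wide quad. */
        for (int px = 0; px < 8; ++px)
            printf("pixel %d samples s = %.4f\n", px, (px + 0.5f) / 8.0f);
        return 0;    /* prints 0.0625, 0.1875, ... 0.9375 */
    }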
Note that the texture is never actually sampled at its edges (0 or 1). This will turn out to be fortuitous later on.
This is also an interesting result for color and other such information. Basically, if you use integer input coordinates, the fragments at the corners of a rectangle do not receive the unmodified per-vertex input values! There will be a very slight modification to the values due to the need to sample the interior.
(By comparison, if you are using floating point coordinates or diagonal lines, some pixel centers could be 'on the edge' of a polygon. This is why when MSAA is in use, the "centroid" keyword is sometimes necessary - sometimes the pixel centers for sub-samples in the MSAA are actually outside your polygon, but inside the larger single pixel whose center is inside your polygon. I suggest extreme paranoia in your shaders.)
Texel Coordinates

The actual texels (texture pixels) are numbered similarly to the screen: by their lower left corner, that is, the lower left texel is 0,0. Thus the conversion to texel coordinates (I will call this 'UV') from texture coordinates (ST, that is, normalized coordinates) is a simple multiply.
Multiplying out that horrible sequence for our 4x4 texture on an 8x8 quad we get: 0.25, 0.75, 1.25, 1.75, 2.25, 2.75, 3.25, 3.75.
Nearest Neighbor

Nearest-neighbor texture filtering is a simple floor function on the UV coordinates. In the case of our quad, this will result in using the following U coordinates for texels: 0, 0, 1, 1, 2, 2, 3, 3. In other words, we get something that looks a lot like the square area of the texture.
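Putting the whole pipeline for one row into a few lines (a sketch of the arithmetic, not of any actual driver):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        for (int px = 0; px < 8; ++px)
        {
            float s = (px + 0.5f) / 8.0f;   /* ST at the pixel center */
            float u = s * 4.0f;             /* texel-space UV for a 4x4 texture */
            int   t = (int)floorf(u);       /* nearest-neighbor texel */
            printf("pixel %d: u = %.2f -> texel %d\n", px, u, t);
        }
        return 0;                           /* texels come out 0,0,1,1,2,2,3,3 */
    }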
What we have here is a huge cascade of changes from area to sample back to area that is perfectly balanced. The screen is area, the fragment is sampled, the texture coordinate is sampled, and then the nearest neighbor gives us the overlapping area.
If there's a rule of thumb, it is: given integral alignment of a texture's texels over screen pixels, nearest neighbor will give you a clean copy.
Linear Filtering

Linear interpolation gets a little bit ugly in this case. Linear interpolation attempts to treat the texture as "point" samples and treats the space in between samples as an area of interpolation.
The formula goes something like this:
- Subtract 0.5 from each of the UV coordinates. (This deals with the fact that texel numbers are area based but our UV is a sample.)
- Take the floor of our offset coordinates. The resulting pixel is the lower left corner of a group of 4 (well, 8 if you count the 3-d case, but let's ignore that).
- Take the fractional part of these coordinates. These give us our blend ratios, e.g. a fractional distance of 0,0 means take only the lower left, whereas 0.5, 0.5 means a 25% blend of all four pixels.
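Here is that recipe in one dimension (a sketch of the math only; the texel values and the clamp-to-edge fetch are stand-ins, not GL calls):

    #include <math.h>

    #define TEX_SIZE 4
    static const float tex[TEX_SIZE] = { 0.0f, 0.25f, 0.5f, 1.0f };  /* made-up texels */

    static float fetch(int i)    /* clamp-to-edge stands in for the real wrap mode */
    {
        if (i < 0)          i = 0;
        if (i >= TEX_SIZE)  i = TEX_SIZE - 1;
        return tex[i];
    }

    float sample_linear_1d(float u)
    {
        u -= 0.5f;                      /* texel numbers are areas; UV is a sample */
        int   i0 = (int)floorf(u);      /* lower texel of the blend pair */
        float f  = u - floorf(u);       /* fractional part = blend ratio */
        return fetch(i0) * (1.0f - f) + fetch(i0 + 1) * f;
    }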
Our first two U coordinates were 0.25 and 0.75. That first pixel's offset coordinate is -0.25, meaning the box filter spans texels -1 and 0, using 25% of the texel at -1 and 75% of the texel at 0. The second pixel's offset is 0.25, so we take texels 0 and 1, sampling 75% from texel 0 and 25% from texel 1.
What happened here? Well, because our texture is scaled up, the centers of the texels now land exactly on integral grid lines on the screen. But screen pixels are sampled at their centers. Thus we never get a direct hit on the texels. The entire texture is going to look a bit "smeared".
Furthermore, we went off the edge. When using linear filtering it's pretty likely that the leftmost pixels are going to blend a texel inside your texture with texels outside your texture. To address this, you need a border, wrapping, clamping, etc.
(If you really trust the math of your favorite GPU, you could add half a texel of ST coordinates to the left edge to assure that linear interpolation stays inside the lines.)
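A sketch of that half-texel inset (tex_w, the texture width in texels, is an invented name for illustration):

    /* Pull the ST range in by half a texel so the linear filter
       footprint never crosses the texture's edge. */
    void inset_st(float tex_w, float * s0, float * s1)
    {
        float half = 0.5f / tex_w;
        *s0 = 0.0f + half;    /* instead of 0.0 at the left edge  */
        *s1 = 1.0f - half;    /* instead of 1.0 at the right edge */
    }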
In theory we can offset our scaled up texture by half a texel to realign the texel centers with pixel centers, and we would get a different interpolation: every other pixel would be an unmolested texel, and the texels in between would be 50-50 blends.
Direct Hit With Linear?

What if we use an 8x8 texture? Now our texture UV coordinates turn into: 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5.
When we run this through our linear filtering, a funny thing happens: subtract out 0.5 and we get lower left texels of 0,1,2,3,4,5,6,7 and each one has...100% blend!
In other words, a linear filter provides a clean 1:1 copy of the texture as long as the texel centers are aligned with the fragment centers, and this happens when the texture scale is 1:1 and the pixels are on integer grid lines.
If you have ever scrolled a linear texture by sub-pixel amounts on the screen, you've seen this phenomenon: as the image moves into "alignment" the image becomes very sharp (since it is a direct copy - 100% of any given texel used just once) and then it will "blur out". At the half-way point we have 50% of two pixels, e.g. a simple box filter blur.
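The blend weights track the scroll offset directly; a quick sketch of the effect at 1:1 scale:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* Scroll a 1:1 texture by sub-pixel offset d; f is the weight
           of the neighboring texel in the blend. */
        for (float d = 0.0f; d <= 0.5f; d += 0.25f)
        {
            float u = 0.5f + d - 0.5f;   /* first pixel center, scrolled, minus half */
            float f = u - floorf(u);
            printf("offset %.2f: %3.0f%% / %2.0f%% blend\n", d, (1 - f) * 100, f * 100);
        }
        return 0;   /* 0.00 -> 100/0 (sharp), 0.25 -> 75/25, 0.50 -> 50/50 (box blur) */
    }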
Copying With Linear Filtering In X-Plane

To give an example of coping with the problem of on-post linear filtering: in X-Plane we use the following strategy to keep the panel looking as good as possible:
- All panel textures are linearly filtered. We don't go around changing the filter all the time.
- All screen-aligned, integral-size pixel rectangles are aligned with the screen grid*. This means they sample with no blurring. We do not bother with this alignment for elements rotated at odd angles, as they will be resampled (and thus blended) no matter what.
- When we are at a 1:1 overall ratio of panel to screen, the panel's scroll position is forced to be integral to preserve clean sampling where it is possible.
- When the panel is scaled (not 1:1) we let scrolling float freely; since we are already interpolating everywhere, scrolling makes no difference.
* The panel is defined in pixel coordinates, but it scrolls. So we align our pixel rectangles to the overall "panel" space, so that all pixels, instrument and background alike, are on the same theoretical pixel grid. Thus later, if we keep the whole panel aligned with the screen, all of the parts attached to it are aligned too.
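In code terms, the scroll policy boils down to something like this (a hypothetical sketch; constrain_scroll, panel_scale, and scroll are invented names, not X-Plane's actual code):

    #include <math.h>

    /* Snap scrolling to whole pixels only at 1:1 scale, where clean
       sampling is actually achievable. */
    float constrain_scroll(float scroll, float panel_scale)
    {
        if (panel_scale == 1.0f)
            return floorf(scroll + 0.5f);  /* round to the nearest whole pixel */
        return scroll;                     /* scaled: we're interpolating anyway */
    }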