Tuesday, July 07, 2009

GeoTIFF and Off-By-One

I fear I have stumbled into GeoTIFF off-by-one hell. There is surprisingly little written about this, considering that (1) GeoTIFF is somewhat widely used and (2) the whole point of GeoTIFF is that the file tells you where the data goes. So...here is a summary of what I've stumbled into.

GeoTIFF has a flag to indicate whether pixels are "pixel is area" or "pixel is point". To summarize:
• Pixel is area means that each pixel is a rectangle, with infinitely thin grid lines bounding all four sides. The sample data represents a summary of the entire area, or maybe a sample at the center of the area.
• Pixel is point means that each pixel is the infinitely tiny intersection of two infinitely thin grid lines and represents the data as sampled at that point.
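For reference, the flag is the GTRasterTypeGeoKey (key 1025), with value 1 for RasterPixelIsArea and 2 for RasterPixelIsPoint. Here is a minimal sketch of what the flag changes when you turn a column index into a longitude. The function name and the origin_lon and step parameters are my own inventions for illustration; a real reader would pull the origin and spacing from the file's tiepoint and pixel scale tags.

```python
# Key ID and values as defined by the GeoTIFF specification.
GT_RASTER_TYPE_GEO_KEY = 1025
RASTER_PIXEL_IS_AREA = 1
RASTER_PIXEL_IS_POINT = 2


def sample_longitude(column, origin_lon, step, raster_type):
    """Longitude that best represents the sample stored in `column`.

    origin_lon (hypothetical parameter) is the longitude tied to raster
    coordinate 0; step is the posting spacing in degrees.
    """
    if raster_type == RASTER_PIXEL_IS_AREA:
        # Raster coordinate 0 is the left edge of the first pixel,
        # so the sample sits half a pixel further in.
        return origin_lon + (column + 0.5) * step
    if raster_type == RASTER_PIXEL_IS_POINT:
        # Raster coordinate 0 is the first sample itself.
        return origin_lon + column * step
    raise ValueError(f"unknown raster type {raster_type}")
```

Same file contents, same origin, and the two interpretations disagree by half a pixel.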
Who cares? Well, it turns out it matters. Imagine we have "3 second" data, which means that there are 1200 samples per degree, each one approximately 90 meters apart. (This is geographic projection, so the postings vary with latitude - I mention 90 meters only for the "feel" of the data.)

Let's consider what we have to do if we want data that covers exactly one degree.

If we are using area pixels, that one degree needs 1200 samples, each one 1/1200th of a degree wide. The left edge of the left-most sample would be on a longitude line and the right edge of the right-most sample would be too. The next tile should have no data duplication from this tile.

If we are using point pixels, that one degree needs 1201 samples, each 1/1200th of a degree apart, but with no width of their own. The longitude line of the left edge has samples exactly on it, as does the right edge. And in fact, the right edge of our tile is a duplicate of the left edge of the next tile.

We could choose to exclude two of the four edges in the pixel-center (point pixel) case to avoid duplication, but when we're tiling with pixel-center data, it is sort of nice to have the entire tile.
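The 1200 versus 1201 arithmetic is easy to check with a few lines of Python. This is only a sketch of the counting argument above: the function names are made up, the tile at 72 degrees west is an arbitrary example, and I use Fraction so that the edge comparisons are exact instead of floating-point approximate.

```python
from fractions import Fraction

SAMPLES_PER_DEGREE = 1200  # "3 second" data: 3600 / 3


def area_pixel_centers(west, n=SAMPLES_PER_DEGREE):
    """Centers of the n area pixels that exactly cover [west, west + 1]."""
    step = Fraction(1, n)
    return [west + (i + Fraction(1, 2)) * step for i in range(n)]


def point_pixel_positions(west, n=SAMPLES_PER_DEGREE):
    """Positions of the n + 1 point pixels spanning [west, west + 1],
    with a sample on each bounding longitude line."""
    step = Fraction(1, n)
    return [west + i * step for i in range(n + 1)]


area = area_pixel_centers(-72)
point = point_pixel_positions(-72)
next_point = point_pixel_positions(-71)  # the tile to the east

print(len(area), len(point))       # 1200 1201
print(point[-1] == next_point[0])  # True: the shared edge is stored twice
print(area[-1] < -71)              # True: area pixels never reach the next tile
```

Dropping one column (and one row) from the point-pixel tile gets you back to 1200 with no duplication, at the price of a tile that no longer has samples on all four of its edges.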

SRTM

The SRTM is pixel-center, at least in the raw data that JPL provides; we can only assume that every derived product should be that way too.

When you download a seamless SRTM tile from the USGS you get 1201 pixels, tagged as area pixels, and a bounding box that goes 1.5 arc-seconds outside the bounds you might expect. This is a little bit strange (IMO) but more or less correct.
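To see where the 1.5 arc-seconds comes from: if you take 1201 point samples and relabel them as area pixels, each sample becomes the center of a cell, so the outer cells stick out half a posting past the tile boundary. A rough sketch, with a made-up function name and an arbitrary example tile:

```python
from fractions import Fraction


def area_bounds_for_point_samples(first, last, count):
    """Outer edges of `count` area pixels whose centers run from
    `first` to `last` (in degrees) at even spacing."""
    step = Fraction(last - first, count - 1)
    half = step / 2
    return first - half, last + half


# 1201 samples with the first on 72W and the last on 71W (example tile).
west, east = area_bounds_for_point_samples(-72, -71, 1201)

overhang_arcsec = (-72 - west) * 3600
print(overhang_arcsec)           # 3/2, i.e. 1.5 arc-seconds
print((east - west) * 1200)      # 1201, the width in pixels
```

So the odd-looking bounding box is exactly what keeps every sample on the longitude line it was measured at.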

The CGIAR SRTMs in GeoTIFF appear to be shifted by half a pixel: while the ARC ASCII files come in 6001x6001 blocks that are clearly point-pixel (and this is what we would expect from an SRTM-derived product), the GeoTIFF version provides 6000 postings, but is listed as area-pixel with bounds on tile boundaries. Visual inspection indicates that the south row and east column have been dropped to avoid duplication; this would imply that the bounding box should be inset on the south and east edges by 1.5 arc seconds (half a pixel) and outside on the other edges.
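Here is that implied bounding box worked out, assuming my visual inspection is right about which row and column were dropped. The function name and the tile corner are hypothetical; the only real inputs are the 3 arc-second spacing and the 6000 postings, which together mean a tile spans 5 degrees.

```python
from fractions import Fraction

STEP = Fraction(1, 1200)  # 3 arc-seconds, in degrees
HALF = STEP / 2


def bounds_after_dropping_south_and_east(west, south, east, north):
    """Area-pixel bounds (west, south, east, north) of a point-sampled
    tile after its southernmost row and easternmost column are removed.

    The arguments are the nominal tile boundaries, i.e. the lines on
    which the outermost original postings sat.
    """
    # West and north still have a posting on the boundary, so the cell
    # around it sticks out by half a pixel. South and east lost theirs,
    # so the last remaining cell stops half a pixel short.
    return (west - HALF, south + HALF, east - HALF, north + HALF)


nominal = (-75, 40, -70, 45)  # hypothetical 5-degree tile
actual = bounds_after_dropping_south_and_east(*nominal)

width_px = (actual[2] - actual[0]) / STEP
shift_arcsec = [(a - n) * 3600 for a, n in zip(actual, nominal)]
print(width_px)  # 6000
```

The box is still 6000 pixels wide; it has just slid 1.5 arc-seconds to the north-west relative to the tile boundaries that the file claims.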

Follow-Up 4/1/11: the v4 CGIAR SRTM files appear to have been recut at 6001x6001 for correct pixel-center formatting. The GeoTIFFs are marked as area pixels and the bounding box is expanded by 0.5 samples on all sides.

NED

NED appears to truly be area aligned (unlike SRTM); every form I have seen has been 3600 pixels centered on a lat-lon tile. In this case, tagging the data set as area pixels is appropriate.