I went digging into the Linux thread library source. First: pthread_getspecific (which returns the calling thread's value for a given key) works like this:
- It finds the thread's "identity", which is a pointer to a basic control block.
- Thread-local storage is simply an array at the end of that control block.
- Retrieving a specific value is then just an array access indexed by the key.
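To make the array-lookup step concrete, here's a minimal sketch in C. This is not the library's actual code: the struct layout, the KEYS_MAX limit, and the cb_getspecific/cb_setspecific names are all made up for illustration. It only shows the shape of the idea, a control block whose last member is an array of per-key slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot count; a real library has its own limit. */
#define KEYS_MAX 128

/* Hypothetical control block: bookkeeping first, per-key slots at the end. */
struct thread_cb {
    int   tid;                 /* stand-in for the real bookkeeping fields */
    void *specific[KEYS_MAX];  /* one slot per key */
};

/* Retrieval is a bounds check plus an array access. */
static void *cb_getspecific(const struct thread_cb *self, unsigned key)
{
    assert(self != NULL);
    return key < KEYS_MAX ? self->specific[key] : NULL;
}

/* Storing is the mirror image; returns 0 on success, -1 on a bad key. */
static int cb_setspecific(struct thread_cb *self, unsigned key, void *value)
{
    assert(self != NULL);
    if (key >= KEYS_MAX)
        return -1;
    self->specific[key] = value;
    return 0;
}
```

Once you have the pointer to the control block, there's no searching or locking: the key is the index.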
Finding the control block is the part I thought was quite clever: thread stacks are allocated on a fixed granularity (a multiple of the page size). The calling thread's stack pointer always points somewhere within its own stack, and the thread control block sits at the very beginning of that stack.
So all the calling code has to do is take the address of any stack-local variable and round it down to the thread stack granularity, and there's the control block. That's pretty quick! No system calls needed.
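The rounding trick can be sketched like this. Again, this is a simulation under assumptions and not the library's code: STACK_GRANULARITY is an arbitrary value I picked, struct demo_tcb and both function names are hypothetical, and a heap block from aligned_alloc (C11) stands in for a real thread stack, since you can't portably control where the real one lives.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed granularity for the demo (64 KiB); must be a power of two. */
#define STACK_GRANULARITY ((uintptr_t)1 << 16)

/* Hypothetical control block stored at the base of each stack block. */
struct demo_tcb {
    int tid;
};

/* Round any address inside a stack block down to the block's base. */
static struct demo_tcb *tcb_from_address(const void *addr_in_stack)
{
    assert(addr_in_stack != NULL);
    uintptr_t base = (uintptr_t)addr_in_stack & ~(STACK_GRANULARITY - 1);
    return (struct demo_tcb *)base;
}

/* Simulate one thread's stack: an aligned block with the control block
 * written at its lowest address. The caller releases it with free(). */
static void *make_stack_block(int tid)
{
    void *block = aligned_alloc(STACK_GRANULARITY, STACK_GRANULARITY);
    if (block != NULL)
        ((struct demo_tcb *)block)->tid = tid;
    return block;
}
```

Any pointer into the block, however deep, masks back to the same base, so the lookup costs a single AND instruction. In the real thing the address would come from a local variable on the thread's own stack instead of from a heap block.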
Edit: here's an even cheaper way.