Well, it turns out you need to be safe from four things, two in your compiler and two in hardware.
(As a side note, the Apple perf-optimization list is a great resource...I didn't know about sequence points until this post.)
An optimizing compiler will try to reorder code in ways that it thinks might be faster but won't change the meaning of execution. Since the compiler doesn't know or care what other threads are doing, this could be really bad.
C/C++ have the notion of sequence points - basically the optimizer isn't allowed to move code around a sequence point. There are a few types of sequence points but the most useful one is the function call. Call a function that the compiler can't see into, and you've divided your code in half.
In C/C++ volatile memory is memory that the compiler has to write to when you tell it to. Normally the compiler might analyze the side effects of a chunk of code and change exactly how it talks to memory to improve performance. Between sequence points, the compiler can do what it wants as long as the end result is the same.
But when a variable is volatile, the compiler must write to it once each time your code says to. The original classic use would be for I/O registers, where the write operation does something, and you need to do it exactly once or twice, or however many times the code says.
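The I/O register case can be sketched roughly like this. The names `fake_io_register`, `send_two_commands`, and `read_register` are made up for illustration, and an ordinary global stands in for a real device register, so treat it as a sketch rather than driver code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped I/O register. On real hardware
   this would live at a fixed address taken from the device documentation. */
volatile uint32_t fake_io_register;

/* Because the target is volatile, the compiler must emit both stores, in
   this order, even though the first value is immediately overwritten.
   Without volatile, the optimizer could legally drop the first store. */
void send_two_commands(volatile uint32_t *reg)
{
    *reg = 0x1u;  /* first write: must not be optimized away */
    *reg = 0x2u;  /* second write: must not be merged with the first */
}

/* Reading through a volatile pointer forces a real load every time,
   instead of reusing a value the compiler already has in a register. */
uint32_t read_register(const volatile uint32_t *reg)
{
    return *reg;
}
```

Note that volatile only constrains the compiler here; it says nothing about atomicity or about what the CPU does with the writes afterward.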
Generally you'll want the variables you use as atomics to be volatile, so that a lock gets written when you want it to be.
The next thing you need to be sure of is that your atomic operation really happens all at once. How this happens depends on the CPU, but fortunately you don't care - as an applications writer you use an operating system function like InterlockedExchangeAdd or a compiler builtin like __sync_add_and_fetch. All the major operating systems and compilers provide these functions.
For example, incrementing a variable would traditionally be three instructions: a load, an add, and a store. This isn't very atomic; something could happen between the load and store. Using some CPU-specific technology, the OS atomic operation guarantees that the operation happens all at once. If you have a system with multiple cores and multiple caches, the atomic operation makes sure that future code isn't fooled by old cached values.
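Here is a minimal sketch of the increment case, assuming GCC or Clang (for the `__sync_add_and_fetch` builtin) and POSIX threads. The thread count, loop count, and the names `shared_counter`, `counter_worker`, and `run_counter_demo` are all assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define COUNTER_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Shared by all threads; volatile so the compiler really touches memory. */
static volatile long shared_counter = 0;

static void *counter_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        /* Atomic read-modify-write: the load, add, and store happen as one
           indivisible step. A plain shared_counter++ could lose updates. */
        __sync_add_and_fetch(&shared_counter, 1);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_counter_demo(void)
{
    pthread_t threads[COUNTER_THREADS];

    shared_counter = 0;
    for (int i = 0; i < COUNTER_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, counter_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < COUNTER_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return shared_counter;
}
```

With the atomic builtin, `run_counter_demo()` should return exactly `COUNTER_THREADS * INCREMENTS_PER_THREAD`. Swap in a plain `shared_counter++` and the total will typically come up short on a multi-core machine.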
The final part of the puzzle is memory barriers. Basically some CPUs have some freedom to reorder the reading and writing to memory. Normally this is done in a way that still produces consistent results. But if you are going to write data, and then write "unlock", you don't want those things to be reversed - writing unlock could allow another thread to look at the data...it has to be there.
A memory barrier is a synchronization point in the stream of reads and writes...basically all the writing on one side of the barrier is done before the reading on the other.
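The "write data, then write unlock" pattern might look something like the sketch below. It assumes GCC or Clang, whose `__sync_synchronize` builtin issues a full memory barrier; the names `payload`, `ready`, `publish`, and `try_consume` are invented for the example.

```c
/* Data handed from one thread to another. */
static int payload;

/* Flag that tells the reader the payload is valid. */
static volatile int ready = 0;

/* Writer side: store the data, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* Full barrier: the payload store is visible before the flag store. */
    __sync_synchronize();
    ready = 1;
}

/* Reader side: returns 1 and fills *out if data has been published,
   otherwise returns 0 and leaves *out alone. */
int try_consume(int *out)
{
    if (!ready) {
        return 0;
    }
    /* Full barrier: the flag load completes before the payload load. */
    __sync_synchronize();
    *out = payload;
    return 1;
}
```

In real use one thread would call `publish` and another would poll `try_consume`. Without the two barriers, a CPU that reorders memory operations could let the reader see `ready == 1` while still reading a stale `payload`.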
Four In One
Fortunately operating systems typically give you all four properties in one place:
- The atomic operations are functions, so they are sequence points.
- They take inputs of type volatile T *.
- They are atomic in implementation.
- They have a barrier built in (or there is a variant that includes a barrier - use that one).
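To see the four properties working together, here is a rough sketch of a tiny spinlock built on the GCC/Clang builtins `__sync_lock_test_and_set` and `__sync_lock_release`, which GCC documents as acquire and release barriers respectively. The lock itself, the POSIX threads harness, and all the names (`lock_word`, `spin_lock`, `spin_unlock`, `run_spinlock_demo`) are assumptions made for illustration, not something a production program should copy as-is.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LOCK_THREADS 4
#define ADDS_PER_THREAD 50000

static volatile int lock_word = 0;   /* 0 = unlocked, 1 = locked */
static long protected_total = 0;     /* only touched while holding the lock */

static void spin_lock(volatile int *lock)
{
    /* Atomically writes 1 and returns the previous value. If the previous
       value was already 1, someone else holds the lock, so keep trying. */
    while (__sync_lock_test_and_set(lock, 1)) {
        /* spin */
    }
}

static void spin_unlock(volatile int *lock)
{
    /* Writes 0 with release semantics: earlier stores to the protected
       data are visible before the lock appears free. */
    __sync_lock_release(lock);
}

static void *lock_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ADDS_PER_THREAD; i++) {
        spin_lock(&lock_word);
        protected_total += 1;        /* plain, non-atomic update */
        spin_unlock(&lock_word);
    }
    return NULL;
}

/* Runs the workers to completion and returns the protected total. */
long run_spinlock_demo(void)
{
    pthread_t threads[LOCK_THREADS];

    protected_total = 0;
    for (int i = 0; i < LOCK_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, lock_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < LOCK_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return protected_total;
}
```

The lock word is volatile, the builtins are atomic and carry their own barriers, and the compiler won't move the protected update across them, so the plain `protected_total += 1` ends up safe even though it is not atomic by itself.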