In the past I've focused on time profiles, which tell you where CPU and thread time are being spent, letting you understand (1) why your app isn't fast and (2) why a thread isn't getting things done in a timely manner.
You can also do a system call trace. What's cool about a system call trace is that it picks up both the kernel traps necessary to talk to the OpenGL driver and all of the thread-synchronization primitives.
Below it, the yellow and purple bar is a forest extrusion. The purple segments are VM-based zero-fill calls, and the green icon on the end is a call into GL to buffer up a VBO; it turns out this causes zero fill. At the end of the bar there is a blue icon where the worker thread waits on the message queue for the next bit of work to process.
Apple provides another tool to view thread time: Thread Viewer.
On the right is a Thread Viewer capture of X-Plane on an 8-core Mac Pro. The bottom solid green line is the main thread, running all the time. The cascading green dots are asynchronous scenery generation (in this case forests are on maximum and the sim is running in a debug mode, which consumes more CPU). As far as I can tell, they cascade because X-Plane has built 8 worker threads, and each one gets the next work task in turn. Up top are threads built by the sound manager to feed the audio hardware.
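To make the worker-thread picture a bit more concrete, here is a rough sketch of a fixed pool of workers that block on a shared queue and each pick up the next task as it arrives. This is not X-Plane's code; `run_pool`, `NUM_WORKERS`, and the "forest tile" task names are made up for illustration, and the pool size of 8 is just an assumption to match the 8-core machine.

```python
import queue
import threading

NUM_WORKERS = 8      # assumption: one worker per core, as on the 8-core Mac Pro
_STOP = object()     # private sentinel that tells a worker to exit


def run_pool(tasks, num_workers=NUM_WORKERS):
    """Run tasks on a fixed pool; return a list of (task, worker_index) pairs."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(index):
        while True:
            task = work.get()        # blocks while the queue is empty
            if task is _STOP:
                break
            # A real worker would do the scenery work here; we only record
            # which worker picked the task up.
            with results_lock:
                results.append((task, index))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        work.put(task)
    for _ in threads:
        work.put(_STOP)              # one stop signal per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for task, index in run_pool([f"forest tile {n}" for n in range(16)]):
        print(f"worker {index} handled {task}")
```

The blocking `work.get()` is the kind of wait that shows up as the blue icon in the system call trace. Which worker ends up with which task depends on the scheduler, so the order printed here will vary from run to run.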