Wednesday, June 24, 2009

The True Power of NEDMalloc

I have dabbled with using NedMalloc in X-Plane; it is faster than the built-in allocator. But sometimes you have to find the right problem to see scalability issues.

I've been hacking at an architecture change (probably going into X-Plane 940) that allows X-Plane to build 3-d forests on an arbitrary number of processors.

(Currently X-Plane 930 can only do this on one processor, albeit not the processor you use to draw the world and fly the plane.)

Going to KBTV with the highest tree settings on an 8-core Mac Pro, we get these numbers for "preload" times (that is, the time to pre-generate all nearby forests, which is basically a memory-intensive operation, since the trees are so simple):
  • 9.30: 1 core, 20 seconds.
  • 9.40, OS allocator: 8 cores, 40 seconds.
  • 9.40, NedMalloc: 8 cores, 2 seconds.
I am not making that up. The 8x or better speed-up over 9.30 isn't surprising, but what is incredible is the 20x difference between the system allocator and NedMalloc with only 8 threads!
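For anyone who wants to see this kind of allocator contention on their own machine, here is a rough sketch of the sort of workload involved: several threads each making lots of small heap allocations at the same time. This is not X-Plane's forest code. The names alloc_worker and run_alloc_workload, the 64-byte block size, and the allocation counts are all made up for illustration, and it calls plain std::malloc/std::free, so comparing allocators would mean swapping those two calls for the allocator under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical stand-in for one worker building forest data: it makes many
// small heap blocks, then frees them. Returns how many allocations succeeded.
std::size_t alloc_worker(std::size_t allocs, std::size_t block_bytes) {
    std::vector<void*> blocks;
    blocks.reserve(allocs);
    for (std::size_t i = 0; i < allocs; ++i) {
        void* p = std::malloc(block_bytes);  // swap in another allocator here
        if (p != nullptr) blocks.push_back(p);
    }
    const std::size_t made = blocks.size();
    for (void* p : blocks) std::free(p);
    return made;
}

// Runs alloc_worker on `threads` threads at once and returns the total number
// of successful allocations. If elapsed_seconds is non-null, the wall-clock
// time for the whole run is written to it.
std::size_t run_alloc_workload(unsigned threads,
                               std::size_t allocs_per_thread,
                               std::size_t block_bytes = 64,
                               double* elapsed_seconds = nullptr) {
    assert(threads > 0);
    std::vector<std::size_t> counts(threads, 0);
    std::vector<std::thread> pool;
    pool.reserve(threads);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        // Each thread writes only to its own slot, so no locking is needed.
        pool.emplace_back([&counts, t, allocs_per_thread, block_bytes] {
            counts[t] = alloc_worker(allocs_per_thread, block_bytes);
        });
    }
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();

    if (elapsed_seconds != nullptr) {
        *elapsed_seconds = std::chrono::duration<double>(stop - start).count();
    }

    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

Calling run_alloc_workload(1, n, 64, &secs) and then run_alloc_workload(8, n, 64, &secs) with the same n gives a crude picture of how an allocator scales: if the 8-thread run takes much more than the 1-thread run, the threads are probably fighting over the heap. The timings will depend heavily on the OS, the allocator, and the hardware, so treat any numbers as a rough indication only.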

1 comment:

  1. Wow :-) This sounds like a really good thing :-) Now we only need "more memory" to hold all the forests. At least I know some areas (like southern Sweden) where X-Plane easily saturates the whole 32-bit address space.