As expected, SGI have used the SC09 show to launch their latest single system image NUMA machine – the Altix UV. The specs are impressive: SGI have dropped in Nehalem EX processors, as anticipated, and the improvements in core density and NUMAlink bandwidth are just as striking.
As with the previous Origin and Altix machines, the Altix UV is available in two models. The Altix UV 100 is the familiar 3U ‘building block’ system, allowing you to scale up as needed. It scales to 96 sockets (768 cores) and 6TB of shared memory in two racks, for up to a claimed 7.0 Tflops of performance.
Altix UV 1000 is the big daddy, scaling up to 256 sockets (giving 2,048 processor cores) and 16TB of shared memory in four racks, for up to a claimed 18.6 Tflops of performance. Interestingly, the 16TB memory limit is imposed by the Nehalem EX architecture.
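To put the two models side by side, here is a rough sketch that derives per-socket and per-core figures from the numbers quoted above. The `UVModel` class and its field names are made up for illustration, and the conversion assumes 1TB = 1024GB; the Tflops values are SGI's claimed peaks, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UVModel:
    """Headline specs as quoted in the article (hypothetical container)."""
    name: str
    sockets: int
    cores: int
    memory_tb: float
    peak_tflops: float  # SGI's claimed peak, not a benchmark result
    racks: int

    @property
    def cores_per_socket(self) -> float:
        return self.cores / self.sockets

    @property
    def gflops_per_core(self) -> float:
        # 1 Tflops = 1000 Gflops
        return self.peak_tflops * 1000 / self.cores

    @property
    def memory_gb_per_socket(self) -> float:
        # Assumption: 1TB = 1024GB
        return self.memory_tb * 1024 / self.sockets


MODELS = [
    UVModel("Altix UV 100", sockets=96, cores=768,
            memory_tb=6, peak_tflops=7.0, racks=2),
    UVModel("Altix UV 1000", sockets=256, cores=2048,
            memory_tb=16, peak_tflops=18.6, racks=4),
]


def summarize(model: UVModel) -> str:
    """Format the derived figures for one model on a single line."""
    return (f"{model.name}: {model.cores_per_socket:.0f} cores/socket, "
            f"{model.memory_gb_per_socket:.0f}GB/socket, "
            f"{model.gflops_per_core:.2f} Gflops/core")


if __name__ == "__main__":
    for m in MODELS:
        print(summarize(m))
```

Both models work out to 8 cores and 64GB per socket, and roughly 9.1 Gflops per core, which suggests the two differ in packaging and scale rather than in the underlying node.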
The next generation of NUMAlink offers a staggering 15 GB/sec transfer rate. The new hub chip has been designed to offload MPI communication. Instead of the CPUs having to handle the packaging and transmission of MPI messages, the UV hub now takes that load. This frees the CPUs to do pure number crunching, while still enabling the level of fast MPI communications that’s needed in such a large NUMA system.
Speaking of large systems, the UV design allows individual four-rack systems to be hooked together in an 8×8 torus. In theory, the UV hub could support systems of over 32,000 cores. The UV hub also has some FB-DIMMs to cache directory information, which not only speeds up operations but also helps with the scalability of the solution.
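For anyone unfamiliar with torus interconnects, the sketch below shows the generic geometry of an 8×8 2D torus: every position has four neighbours, and the edges wrap around. It says nothing about how SGI actually route traffic or what sits at each position; the `(x, y)` indexing and the function names are assumptions for illustration only.

```python
from typing import List, Tuple

SIDE = 8  # 8x8 torus, as quoted in the article

Coord = Tuple[int, int]


def torus_neighbors(x: int, y: int, side: int = SIDE) -> List[Coord]:
    """Return the four neighbours of (x, y), wrapping at the edges."""
    return [
        ((x + 1) % side, y),
        ((x - 1) % side, y),
        (x, (y + 1) % side),
        (x, (y - 1) % side),
    ]


def torus_hops(a: Coord, b: Coord, side: int = SIDE) -> int:
    """Minimum hop count between two positions on the torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # On each axis, going the 'short way round' may be quicker.
    return min(dx, side - dx) + min(dy, side - dy)


def diameter(side: int = SIDE) -> int:
    """Worst-case hop count; the torus is symmetric, so measure from (0, 0)."""
    return max(torus_hops((0, 0), (x, y), side)
               for x in range(side) for y in range(side))


if __name__ == "__main__":
    print(torus_neighbors(0, 0))
    print(torus_hops((0, 0), (7, 7)))
    print(diameter())
```

The wraparound links are the point of the design: opposite corners of the grid are only 2 hops apart, and the worst case across all 64 positions is 8 hops, against 14 for a plain 8×8 mesh.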
The design of the processor board is interesting. SGI have used Intel’s Boxboro chipset to handle I/O, with the UV hub plugged directly into both Nehalem CPUs via the QPI interconnect.
The I/O risers mean that, since it’s a single system image, any processor core can access any I/O device anywhere in the system. With the potential for so much I/O throughput, it would be interesting to see what a large Altix UV system packed with Tesla GPUs could achieve.
The Altix UV is an evolution, rather than a revolution, of the flexible NUMA design that first appeared in the Origin 3000. Despite all the press about clusters, big single system image machines still remain the most efficient for many problems. The problems that needed solutions like the original Origin 2000 – pre- and post-processing tasks, very large data problems, I/O and memory intensive apps – have, if anything, gotten more complex and demanding over time, and SGI still have the technology to solve them.
The Nehalem EX won’t be formally launched by Intel until Q2 2010, so SGI aren’t releasing any performance figures. However, SGI have announced four initial customers, who will be taking delivery once the processors start shipping in volume.
The customers announced at launch are the University of Tennessee (1024 cores, 4TB memory), the North German Supercomputing Alliance (HLRN) (two systems totalling 4352 cores, 18TB of memory, to plug into their existing ICE installation), CALcul en MIdi-Pyrénées/Computations in Midi-Pyrénées (CALMIP) based at the University of Toulouse in France (128 cores and 1 TB of memory), and the Institute of Low Temperature Science at Hokkaido University in Japan (180 cores, 360 GB of memory).
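One quick way to compare these orders is memory per core. The sketch below simply divides the quoted memory by the quoted core count for each customer; the dictionary layout and function name are hypothetical, the HLRN entry combines both of its systems because only the totals are given, and 1TB is assumed to be 1024GB.

```python
# (cores, memory in GB) for each launch customer, as quoted in the article.
# Assumption: 1TB = 1024GB.
CUSTOMERS = {
    "University of Tennessee": (1024, 4 * 1024),
    "HLRN (two systems combined)": (4352, 18 * 1024),
    "CALMIP": (128, 1 * 1024),
    "Hokkaido University": (180, 360),
}


def gb_per_core(cores: int, memory_gb: float) -> float:
    """Memory per core in GB."""
    return memory_gb / cores


if __name__ == "__main__":
    for name, (cores, memory_gb) in CUSTOMERS.items():
        print(f"{name}: {gb_per_core(cores, memory_gb):.2f} GB/core")
```

On these figures the configurations range from 2GB per core at Hokkaido to 8GB per core at CALMIP, with Tennessee and HLRN both at around 4GB per core.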
With the Altix UV no longer requiring customers to recompile for Itanium, SGI now have a real chance to push these machines – not just for HPC, but also in business data centres, where Sun and HP have been very successful selling large machines like the F25k and Superdome.