TACC's Ranger Cluster
TACC’s HPC Systems page says the new Ranger cluster will eventually have 52,608 cores (13,152 AMD “Barcelona” quad-core processors) across 3,288 nodes (mentioned here as four-socket blades), with a total of 105 TB of memory and 1.7 PB of disk. If the memory is evenly distributed, each blade has about 32 GB, which works out to 2 GB per core across its 16 cores. According to one article, the nodes are interconnected by a 3,456-port InfiniBand system (spread over three switches?) with a total bandwidth of 110 Tb/s (I wonder how this was calculated). As for the disk space, it was mentioned that some StorageTek disk was used. Is all 1.7 PB StorageTek disk, or does that figure also count any disk internal to each blade? And how is the StorageTek disk presented to the blades?
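For anyone who wants to check the per-blade numbers, here is a quick sketch in Python. The inputs are the figures quoted above; the even distribution of memory and the decimal conversion (1 TB = 1,000 GB) are my assumptions, and the function name is just something I made up for illustration.

```python
# Figures quoted from TACC's HPC Systems page.
PROCESSORS = 13_152          # AMD "Barcelona" quad-core processors
CORES_PER_PROCESSOR = 4
NODES = 3_288                # four-socket blades
TOTAL_MEMORY_TB = 105


def ranger_per_node_figures():
    """Derive per-node figures from the published totals.

    Assumptions (not confirmed by TACC): memory is spread evenly
    across all nodes, and 1 TB is counted as 1,000 GB.
    """
    total_cores = PROCESSORS * CORES_PER_PROCESSOR
    sockets_per_node = PROCESSORS // NODES
    cores_per_node = total_cores // NODES
    memory_per_node_gb = TOTAL_MEMORY_TB * 1000 / NODES
    memory_per_core_gb = memory_per_node_gb / cores_per_node
    return {
        "total_cores": total_cores,
        "sockets_per_node": sockets_per_node,
        "cores_per_node": cores_per_node,
        "memory_per_node_gb": memory_per_node_gb,
        "memory_per_core_gb": memory_per_core_gb,
    }


if __name__ == "__main__":
    for name, value in ranger_per_node_figures().items():
        print(f"{name}: {value}")
```

The memory per node comes out slightly under 32 GB because 105 TB is presumably a rounded total; 32 GB per blade is the figure that makes sense for real hardware.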
Some other things I am wondering:
- What Linux distribution are they using?
- Are these nodes diskless? If not, is the OS installed locally or are they network booted?
- Are they using Sun’s LOM for out-of-band management?
- At 48 blades per rack, I’d estimate about 69 racks were needed to house the blades (3,288 / 48 = 68.5, rounded up). How many more racks are used for storage and switches?
- How many staff are dedicated to system administration of this system?
- What batch system or job scheduler is used? MyCluster?
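The rack estimate in the list above can be sketched the same way. The 48 blades per rack is my own guess at the density, not a number from TACC, and the helper name is hypothetical.

```python
import math

NODES = 3_288            # blade count from TACC's HPC Systems page
BLADES_PER_RACK = 48     # assumption: my estimate, not a published figure


def racks_needed(nodes, blades_per_rack):
    """Return the number of racks needed, rounding up any partial rack."""
    return math.ceil(nodes / blades_per_rack)


def blades_in_last_rack(nodes, blades_per_rack):
    """Return how many blades sit in the final rack (a full rack if it divides evenly)."""
    remainder = nodes % blades_per_rack
    return remainder if remainder else blades_per_rack


if __name__ == "__main__":
    print("racks:", racks_needed(NODES, BLADES_PER_RACK))
    print("blades in last rack:", blades_in_last_rack(NODES, BLADES_PER_RACK))
```

Under that assumption the last rack would be only half full, so a different density per rack would shift the total by a few racks either way.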
In any case, the system sounds impressive and I look forward to hearing about the first production applications to run on it.
-
Gary Stiehr