TACC's Ranger Cluster

By Gary Stiehr, June 29, 2007 11:28 pm

TACC’s HPC Systems page says their new Ranger cluster will eventually have 52,608 cores (13,152 AMD “Barcelona” quad-core processors) across 3,288 nodes (described there as four-socket blades), with a total of 105 TB of memory and 1.7 PB of disk. If the memory is distributed evenly, each blade has about 32 GB (2 GB per core). According to one article, the nodes are interconnected with a 3,456-port InfiniBand system (spread over three switches?) with a total bandwidth of 110 Tb/s (I wonder how that figure was calculated). As for disk space, the article mentioned that some StorageTek disk was used. Is all 1.7 PB StorageTek disk, or does that figure also count any disk internal to the blades? And how is the StorageTek disk presented to the blades?
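The per-blade and per-core figures above come from simple division. Here is a minimal Python sketch of that arithmetic. It assumes the memory is spread evenly across blades and uses decimal units (1 TB = 1,000 GB); the variable names are just illustrative:

```python
# Back-of-the-envelope check of the Ranger numbers from TACC's page.
# Assumptions: 4 cores per "Barcelona" socket, 4 sockets per blade,
# memory distributed evenly, decimal units (1 TB = 1,000 GB).
CORES = 52_608
CORES_PER_SOCKET = 4
SOCKETS_PER_BLADE = 4
TOTAL_MEMORY_TB = 105

sockets = CORES // CORES_PER_SOCKET                      # 13,152 processors
blades = sockets // SOCKETS_PER_BLADE                    # 3,288 blades
cores_per_blade = CORES_PER_SOCKET * SOCKETS_PER_BLADE   # 16 cores

mem_per_blade_gb = TOTAL_MEMORY_TB * 1000 / blades       # roughly 31.9 GB
mem_per_core_gb = mem_per_blade_gb / cores_per_blade     # roughly 2.0 GB

print(f"{sockets:,} sockets, {blades:,} blades")
print(f"{mem_per_blade_gb:.1f} GB per blade, {mem_per_core_gb:.1f} GB per core")
```

The result is slightly under 32 GB per blade, which is consistent with 32 GB blades and some rounding in the 105 TB total.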

Some other things I am wondering:

  • What Linux distribution are they using?
  • Are these nodes diskless? If not, is the OS installed locally or are they network booted?
  • Are they using Sun’s LOM for out-of-band management?
  • I’d estimate that about 69 racks are needed to house the blades (at 48 blades per rack). How many racks are used for storage and switches?
  • How many staff are dedicated to system administration of this system?
  • What batch system or job scheduler is used? MyCluster?
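The rack estimate is the blade count divided by blades per rack, rounded up. Here is a small sketch, assuming 48 blades per rack (my own guess, not a published figure):

```python
import math

# Assumed values: blade count from TACC's page, 48 blades per rack is a guess.
BLADES = 3_288
BLADES_PER_RACK = 48

racks = math.ceil(BLADES / BLADES_PER_RACK)            # 68.5 rounds up to 69
last_rack = BLADES - (racks - 1) * BLADES_PER_RACK     # blades in the final rack

print(f"{racks} racks, last rack holds {last_rack} blades")
```

Under that assumption, the last rack would be only half full. That leaves room for some of the switch or storage gear, if it were packaged that way.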

In any case, the system sounds impressive and I look forward to hearing about the first production applications to run on it.

  • Hi Jay, thanks for the comments. I look forward to seeing the updated page.
  • I'm the director of TACC and the PI of the proposal. I apologize for the inconsistent information. Our plans changed (improved!) mid-stream, but we could not update them publicly until we received formal approval of the changes, and by then we, um... forgot to update that page. I will see that the Ranger page on our web site gets updated right away with the answers to all of the technical questions above.
  • Thanks for the info. It is interesting how much conflicting information there is across the various articles. Perhaps some of them were written before the final configuration was decided. Still, I'd expect TACC's description (13,152 processors) and Sun's description ("over 15,000" processors) to be the most accurate. Well, what's a couple thousand processors here and there?

    I'll have to watch for info on how they've configured the X4500s.
  • Sun has a blurb on their website about the TACC Ranger:


    The Ranger Cluster and TACC

    The first petascale implementation of the Sun Constellation System is the Ranger HPC cluster, which was developed jointly with the Texas Advanced Computing Center (TACC) of The University of Texas at Austin. When it is fully deployed on the TeraGrid national network of supercomputers in late 2007, it is expected to be one of the most powerful general purpose computing platforms in the world at over one-half petaFLOP peak performance.

    The Ranger cluster will deliver 1.7 petabytes of storage using the Sun Fire X4500 data servers, the highest density available. Once completed, the TACC installation will consist of over 80 Sun Constellation System racks of computing power totaling over 15,000 quad-core microprocessors, all connected by Sun's new high density, 3456-port InfiniBand switch. Sun Grid Engine will be used as a resource manager to dynamically allocate compute resources to applications.
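The following quick calculation compares TACC's processor count with Sun's. Sun's "over 15,000" is treated as a lower bound, and the 4 cores per processor comes from the quad-core Barcelona parts:

```python
# Compare TACC's stated processor count with Sun's "over 15,000" figure.
# Sun's number is a lower bound, so the gap computed here is a minimum.
TACC_SOCKETS = 13_152
SUN_SOCKETS_MIN = 15_000
CORES_PER_SOCKET = 4  # quad-core "Barcelona"

gap_sockets = SUN_SOCKETS_MIN - TACC_SOCKETS          # extra processors
gap_cores = gap_sockets * CORES_PER_SOCKET            # extra cores
sun_cores_min = SUN_SOCKETS_MIN * CORES_PER_SOCKET    # Sun's implied core count

print(f"{gap_sockets:,} processors ({gap_cores:,} cores)")
print(f"Sun's figure implies at least {sun_cores_min:,} cores")
```

So "a couple thousand processors here and there" works out to at least 1,848 processors, or roughly 7,400 cores.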