Penguin Computing POD: HPC in the Cloud
Penguin Computing is offering a service called “Penguin on Demand” (“POD”), in which customers can ssh into Penguin’s clusters and pay by the CPU-core-hour. Citing performance concerns, they are forgoing virtualization, which is often a key component of a cloud computing offering. As of mid-2009, I/O does take a hit in virtualized environments, but upcoming PCI-IOV standards may help; see HPC Trends and Virtualization starting at slide 11. Initially, their cluster will consist of a modest 1,000 cores, but at least some of them will have access to GPUs as well.
For delivering your data to the cluster, you have the option of overnighting “2 TB hot-swappable drives” to them if it is not feasible to transfer it over the Internet. It’s not clear whether they provide specific drives, in which case you would need a compatible system, or whether you can use your own eSATA or other external drive. While that sounds easy enough, I think loading data onto these drives and handling file systems between the source and destination OS can introduce plenty of delays, especially with a poorly chosen external drive interface. Of course, the feasibility of just transferring over the network should be investigated first.
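As a rough back-of-the-envelope check on that feasibility question, the sketch below estimates how long one 2 TB drive’s worth of data would take to move over the network. The link speeds and the 80% efficiency factor are my own illustrative assumptions, not figures from Penguin, and a terabyte is taken as 10^12 bytes.

```python
def transfer_hours(data_tb, link_mbps, efficiency=0.8):
    """Estimate hours needed to move data_tb terabytes over a network link.

    data_tb    -- data size in decimal terabytes (10**12 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of nominal speed actually achieved
                  (0.8 is an illustrative guess, not a measured value)
    """
    bits = data_tb * 1e12 * 8                      # total bits to send
    effective_bps = link_mbps * 1e6 * efficiency   # usable bits per second
    return bits / effective_bps / 3600             # seconds -> hours


if __name__ == "__main__":
    # Hypothetical link speeds, chosen only for illustration.
    for label, mbps in [("10 Mb/s", 10), ("100 Mb/s", 100),
                        ("1 Gb/s", 1000), ("10 Gb/s", 10000)]:
        print(f"2 TB over {label}: {transfer_hours(2, mbps):.1f} hours")
```

Under these assumptions, anything much slower than a sustained gigabit takes longer than an overnight courier, before counting the time to load and unload the drive at each end.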
From what I’ve been able to read, you’d have an account on Penguin’s cluster and use their queuing system. It’s also not clear how your OS would be deployed to the cluster (presumably you’d provide an image to them and they’d deploy it using functionality in Scyld’s ClusterWare software). At this point, it is not clear that you’d be doing much more than using someone else’s cluster and getting billed for it…something that several places have done for some time (e.g., NCSA’s Private Sector Program, R Systems, etc.).
What is needed is a programmatic interface to their cluster, with strong auth/authz technologies, that allows an organization to seamlessly flex its HPC infrastructure and manage jobs with local apps. For some disciplines, transferring large data sets may continue to be a barrier to seamless extension of their infrastructure…or perhaps not, with Darkstrand’s growing presence providing 10-40 Gb/s commercial connectivity to HPC resources. Amazon provides more web services that move closer to this kind of programmatic interface but, as Penguin points out, EC2 is not very well geared for many types of HPC applications (although it is for some).
Penguin does not yet give many details about the storage environment associated with their POD service, only that it is “high-speed”. NetApp is mentioned in one article about POD. It seems all nodes have Gigabit Ethernet and/or DDR InfiniBand network fabrics, but it is not clear how far a job can scale before its nodes are spread across multiple hops to reach each other for inter-process communication. Nor is it mentioned over which fabric the storage is accessible.
Have you read any additional articles providing any of the missing details?