Scientific research often benefits from open innovation. While there are many examples, I am particularly excited to see what happens in the area of cancer genomics. The Genome Center at Washington University published the results of sequencing the first cancer genome back in November 2008. Internally, collaboration between departments in the School of Medicine resulted in innovative analyses and led to more discoveries. Since then I’ve read and heard about a number of similar or follow-up projects at various institutions. As data is shared among researchers across the world, new collaborations will form, and the innovations that result will hopefully lead to better treatments for cancer.
We at The Genome Center at Washington University were happy to get official word that we will be adding 21 more Illumina Genome Analyzers to our portfolio of sequencing technology. That capacity lets us sequence the equivalent of an entire human genome per day (at 25x coverage). There is a lot of excitement about the potential such capacity brings. The Genome Center’s director had this to say:
“Our intention to substantially scale-up with this technology reflects our commitment to large-scale sequencing projects that aim to uncover the underlying genetic basis of various human diseases. With the rapid decline in the cost of whole-genome sequencing, we believe now is the time to embark on initiatives which were previously not possible,” said Richard K. Wilson, Ph.D., Professor of Genetics and Director of the Genome Center at Washington University. “We are confident that we can further reduce the cost and accelerate the rate of human genome sequencing.”
A scale-up in sequencing capacity brings a scale-up in IT capacity. We’ll be watching our internal network, disk, and HPC resources and scaling as appropriate. These sequencers alone will likely generate upwards of 20 TB of data per day, which needs further processing on The Genome Center’s computational resources. I’m excited about the possibilities this scale-up will bring!
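As a rough sketch of what that data rate means for storage planning, here is the back-of-envelope arithmetic: the 20 TB/day figure is from above, while the even split across the 21 instruments and the binary (1024-based) TB are my assumptions.

```python
# Back-of-envelope throughput estimate. The 20 TB/day total is from the
# post; splitting it evenly across instruments is an assumption.
total_tb_per_day = 20
instruments = 21

tb_per_instrument = total_tb_per_day / instruments
print(f"{tb_per_instrument:.2f} TB/day per instrument")  # ~0.95

# Sustained write rate needed to absorb the stream (binary TB assumed):
bytes_per_day = total_tb_per_day * 1024**4
mb_per_s = bytes_per_day / 86400 / 1024**2
print(f"{mb_per_s:.0f} MB/s sustained")  # ~243 MB/s
```

About a quarter of a gigabyte per second, around the clock, before any downstream analysis copies are made; numbers like these are why the network and disk monitoring mentioned above matters.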
IBM has won a contract to build a supercomputer, called Sequoia, for the DOE’s NNSA. It is expected to be installed and brought online in 2011 and 2012. It will have 1.6 million cores (from potentially 16-core chips) within 96 racks (in about 3,400 sq. ft.). It will have around 1.6 petabytes of memory and achieve about 20 petaflops. It will require about 6 million watts of power to operate, which works out to around 3.3 billion operations per second per watt, a very impressive figure. I wonder if that includes the power needed for the cooling system. And is that when the processors are at 100% or when the system is idle?
At 1.6 PB of memory for 1.6 million cores, that works out to about 1 GB per core, which is a relatively low amount of memory per core. If the memory is doubled, for example, the system may require a few more megawatts of power. This is based on very rough estimates of power needed per GB of memory, drawn from some recent commodity clusters. Do you have any hard numbers on power per GB of memory today? Any information on the type of memory that might be used in Sequoia?
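The arithmetic behind these estimates can be sketched as follows; the peak flops, power, core count, and memory figures are from the announcement above, while the watts-per-GB number is my assumption based on commodity DDR-era DIMMs, not a published Sequoia spec.

```python
# Rough arithmetic from the Sequoia figures in the post. The
# watts-per-GB of memory is an ASSUMED commodity-cluster figure,
# not an official specification.
peak_flops = 20e15   # 20 petaflops
power_w = 6e6        # 6 megawatts
cores = 1.6e6        # 1.6 million cores
memory_gb = 1.6e6    # 1.6 PB ~= 1.6 million GB

flops_per_watt = peak_flops / power_w
print(f"{flops_per_watt / 1e9:.2f} Gflops per watt")  # ~3.33

gb_per_core = memory_gb / cores
print(f"{gb_per_core:.0f} GB of memory per core")     # ~1

# Extra power if memory were doubled, at an assumed ~1 W per GB:
assumed_w_per_gb = 1.0
extra_mw = memory_gb * assumed_w_per_gb / 1e6
print(f"~{extra_mw:.1f} MW additional power")         # ~1.6 MW
```

Even at 1 W/GB, doubling the memory adds on the order of a quarter of the machine’s entire power budget, which is why the memory-per-core ratio is such a deliberate design trade-off.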
For more information, see IBM to send blazing fast supercomputer to Energy Dept. and/or U.S. taps IBM for 20 petaflops computer.
From Blue Data Center Will Be Powered by the Tides (found via @tkunau/@ecogeek):
At first, tidal power will only cover one-fifth of the data center’s needs, but Atlantis hopes that if the first phase is successful, they can expand the tidal array to make up the remaining wattage.