At The Genome Center at Washington University, we are seeing an ever-increasing need to align against various reference sequences. In many cases, hundreds of nodes at a time need to access the same input file (e.g., the appropriate reference sequence database). The size of the file varies depending on the organism and the aligner being used, but, in aggregate for hundreds of copies, a terabyte or more might be requested at the same instant. At startup, all of the jobs grab the same input file at once, which can take a significant toll on our NFS servers and on the other, unrelated jobs also using them. In some instances, we wanted to copy the input dataset permanently to the local disk on the computational nodes, but we cannot do that for all possible inputs.
In the past, I had used a tool called rgang (which no longer seems to be available for download) to distribute files using a distribution tree (e.g., one node would transfer to five others, each of which would in turn transfer to five more, and so on). Alternatives were other peer-to-peer distribution methods that could ease the burden on the centralized NFS servers while better leveraging the bandwidth available in the cluster's network switches.
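To make the fan-out idea concrete, here is a minimal sketch of how such a distribution tree can be scheduled (this is an illustration, not rgang itself; the hostnames, the fan-out of five, and the file path are hypothetical):

```python
from collections import deque

def distribution_tree(hosts, fanout=5):
    """Pair each host with the parent it should copy from, breadth-first.

    The first host is the seed (it already has the file); every node that
    has received a copy passes it on to up to `fanout` others, so the
    number of nodes holding the file grows geometrically per round.
    """
    pairs = []
    ready = deque(hosts[:1])    # nodes that already hold the file
    waiting = deque(hosts[1:])  # nodes still needing it
    while waiting:
        parent = ready.popleft()
        for _ in range(fanout):
            if not waiting:
                break
            child = waiting.popleft()
            pairs.append((parent, child))
            ready.append(child)
    return pairs

# Hypothetical example: 31 nodes, fan-out 5 (1 seed + 5 + 25 = 31).
hosts = [f"node{i:02d}" for i in range(31)]
for parent, child in distribution_tree(hosts):
    print(f"{parent}: scp /local/ref.fa {child}:/local/ref.fa")
```

Each printed line represents one transfer; in practice a tool like rgang runs the transfers of a given round in parallel, so the wall-clock time grows with the tree depth rather than with the number of nodes.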
When hearing "peer-to-peer," many people thought of the bittorrent protocol, so I decided to take a look and see whether anyone had applied it to staging large datasets on many compute nodes. I found that this has been studied in several contexts for some years; see some of the bittorrent links I ran across, especially the ones related to data distribution in clusters. While I had seen bittorrent used in some versions of ROCKS and SystemImager for OS deployment to cluster nodes, I hadn't seen it used directly for distributing large datasets to compute nodes. We'll continue to look into using bittorrent to see whether we can decrease the I/O wait time associated with many nodes needing the same input file at the same time.
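A rough back-of-the-envelope model shows why peer-to-peer staging should help (the numbers and the model are purely illustrative assumptions, not measurements from our cluster): with a central NFS server, the server's uplink is shared by every node, so staging time grows linearly with node count; in a bittorrent-style swarm, every node that holds data can also upload it, so aggregate bandwidth grows with the swarm.

```python
import math

def nfs_stage_time(file_gb, nodes, server_gbps):
    """All nodes pull from one server, so its uplink is the bottleneck."""
    return file_gb * 8 * nodes / server_gbps   # seconds

def p2p_stage_time(file_gb, nodes, node_gbps, fanout=5):
    """Idealized swarm: after roughly log_fanout(nodes) hops every node is
    both downloading and uploading, so each sees about its own link rate."""
    hops = math.ceil(math.log(max(nodes, 2), fanout))
    return file_gb * 8 / node_gbps * hops      # seconds

# Hypothetical numbers: 5 GB reference, 200 nodes,
# a 10 Gb/s NFS server uplink, 1 Gb/s per compute node.
print(nfs_stage_time(5, 200, 10))  # central server: grows with node count
print(p2p_stage_time(5, 200, 1))   # swarm: grows only logarithmically
```

The crossover favors the swarm more and more as the node count grows, which matches the "hundreds of nodes grabbing the same file at once" situation described above.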