
Duty Cycle of SATA Drives

By Gary Stiehr, September 4, 2009 12:24 am

One definition of duty cycle specific to disk drives comes from this excerpt of a patent description (the patented design is meant to enforce certain duty cycles):

The disk drive duty cycle can be expressed as a ratio=Ta/t for a given time period t where Ta is the amount of time the disk drive is actively processing read/write commands during the time period. For example, if during a time period t=60 seconds, the disk drive was actively processing read/write commands for a collective total of Ta=15 seconds, then the average duty cycle for that time period would be 25%.
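As a quick check of the definition above, here is a minimal sketch of the ratio Ta/t. The function name and its argument names are my own, not anything from the patent.

```python
def duty_cycle(active_seconds: float, period_seconds: float) -> float:
    """Return Ta / t: the fraction of the period spent on read/write commands."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if not 0 <= active_seconds <= period_seconds:
        raise ValueError("active_seconds must lie within the period")
    return active_seconds / period_seconds


# The example from the excerpt: Ta = 15 seconds out of t = 60 seconds.
print(f"{duty_cycle(15, 60):.0%}")  # 25%
```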

Some SATA drives are rated for a 100% duty cycle (24×7 usage), but others are rated for less. What happens if you exceed the rated duty cycle? Drives fail more often than the stated MTBF/MTTF would predict. But at what rate?

How can you tell, though, to what extent you are using your drives in terms of a duty cycle? For example, what if you have a general-purpose HPC cluster? That cluster accesses data through a clustered file system across multiple arrays of hundreds of disks. I/O patterns will vary. Some weeks you may pull 2 GB/s from your disk array while other weeks you may only pull 750 MB/s. On the other hand, some weeks you may be pulling 40,000 IOPS from your disk arrays while other weeks you are at 20,000 IOPS.

So to determine your actual utilization, would you add up the maximum IOPS of the SATA drives in your disk array and then track the IOPS actually performed by that array? You could take the duty cycle ratio to be the IOPS performed divided by the maximum possible IOPS, and then check whether that ratio is within the threshold for the type of SATA drives in your array. Perhaps you take a monthly average? Or would you need to consider the amount of time spent reading and writing, i.e., would a single IO that reads or writes more data sequentially count more against the duty cycle than a smaller single IO?
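To make the IOPS-based idea concrete, here is a rough sketch of that estimate. Everything in it is an assumption for illustration: the function names, the drive count of 800, and the 75 IOPS per drive are hypothetical, and only the 40,000 and 20,000 IOPS samples come from the example above. It also ignores IO size, so it does not answer whether large sequential IOs should count for more.

```python
def iops_duty_cycle(observed_iops: float, drive_count: int,
                    max_iops_per_drive: float) -> float:
    """Estimate duty cycle as observed array IOPS over the array's maximum IOPS."""
    max_array_iops = drive_count * max_iops_per_drive
    if max_array_iops <= 0:
        raise ValueError("array must have a positive maximum IOPS")
    return observed_iops / max_array_iops


def mean_duty_cycle(samples, drive_count: int, max_iops_per_drive: float) -> float:
    """Average the per-sample estimates, e.g. weekly samples over a month."""
    ratios = [iops_duty_cycle(s, drive_count, max_iops_per_drive) for s in samples]
    return sum(ratios) / len(ratios)


# Hypothetical array: 800 drives at an assumed 75 IOPS each (60,000 IOPS max).
weekly_iops = [40_000, 20_000, 20_000, 40_000]
monthly = mean_duty_cycle(weekly_iops, drive_count=800, max_iops_per_drive=75)
print(f"estimated monthly duty cycle: {monthly:.0%}")
```

Comparing the result against the drive's rated duty cycle is then a one-line check, but whether an IOPS ratio is the right proxy for "time spent actively processing commands" is exactly the open question.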

On the other hand, some sources have indicated that differences in drive reliability (à la FC vs. SATA) are due not so much to duty cycles related to drive mechanics as to the possibility of bit errors during rebuilds of RAID sets. Some FC drives take more measures to correct errors than SATA drives do, making bit errors perhaps 100 times more likely on SATA drives than on FC drives (1 in 10^14 bits vs. 1 in 10^16 bits).
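To get a feel for what those two error rates mean during a rebuild, here is a small sketch. It assumes bit errors are independent and occur at exactly the quoted rate, which is a simplification, and the 10 TB rebuild size is a hypothetical figure chosen only for illustration.

```python
import math


def p_read_error(bytes_read: float, bit_error_rate: float) -> float:
    """Probability of at least one bit error while reading bytes_read bytes.

    Assumes independent errors: P = 1 - (1 - rate) ** bits, computed with
    log1p/expm1 so that very small rates do not lose precision.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


rebuild_bytes = 10e12  # hypothetical: 10 TB read to rebuild a RAID set

for label, rate in [("SATA, 1 in 10^14", 1e-14), ("FC, 1 in 10^16", 1e-16)]:
    print(f"{label}: {p_read_error(rebuild_bytes, rate):.2%}")
```

Under these assumptions the chance of hitting at least one bit error while reading 10 TB is roughly 55% at the SATA rate and under 1% at the FC rate, which is why rebuilds on large SATA arrays get so much attention.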

Here is some other info I’m taking a look at in regard to drive reliability issues:

Scalable Staging of Large Datasets to Many Compute Nodes

By Gary Stiehr, April 9, 2009 11:46 pm

At The Genome Center at Washington University, we are seeing an ever-increasing need to align against various reference sequences. In many cases, hundreds of nodes at a time need to access the same input file (e.g., the appropriate reference sequence database). The size of the file varies depending on the organism and the aligner being used but, in aggregate for hundreds of copies, a terabyte or more might be requested at the same instant. At startup, all of the jobs grab the same input file at once, which can take a significant toll on our NFS servers and on the unrelated jobs that are also using those servers. In some instances, we wanted to copy the input dataset permanently to the local disk on the computational nodes. However, we cannot do that for all possible inputs.

In the past, I had used a tool called rgang (which doesn’t seem to be available for download anymore) to distribute files using a distribution tree (e.g., one node would transfer to five others, each of which would in turn transfer to five more, and so on). Alternatives to that were other peer-to-peer distribution methods that could ease the burden on the centralized NFS servers while better leveraging the bandwidth available in the cluster’s network switches.
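The appeal of a distribution tree is how few levels it takes to reach hundreds of nodes. The sketch below counts those levels for a given fanout. It is only a model of the tree shape described above, not rgang's actual implementation, and the function name is my own.

```python
def tree_levels(target_nodes: int, fanout: int = 5) -> int:
    """Levels below the source needed for the tree to reach target_nodes nodes.

    Level 1 holds `fanout` nodes, level 2 holds fanout**2, and so on.
    """
    if target_nodes < 0 or fanout < 1:
        raise ValueError("target_nodes must be >= 0 and fanout must be >= 1")
    levels = 0
    reached = 0
    width = 1  # the single source node at level 0
    while reached < target_nodes:
        width *= fanout   # each node in the previous level feeds `fanout` more
        reached += width
        levels += 1
    return levels


# With a fanout of five, 300 nodes are covered in four levels
# (5 + 25 + 125 + 625 = 780 nodes reachable).
print(tree_levels(300))  # 4
```

If each level takes about one file-transfer time, 300 nodes get the file in roughly four transfer times, and no single server has to serve more than five copies.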

On hearing “peer-to-peer,” many people thought of the BitTorrent protocol, so I decided to take a look to see if anyone had applied it to staging large datasets to many compute nodes. I found that this had been studied in several cases for some years. See some of the BitTorrent links I ran across, especially ones related to data distribution in clusters. While I had seen BitTorrent used in some versions of ROCKS and SystemImager for OS deployment to cluster nodes, I hadn’t seen it used directly for distributing large datasets to compute nodes. We’ll continue to look into using BitTorrent to see if we might be able to decrease the I/O wait time associated with many nodes needing the same input file at the same time.

2 PB in one 9kW rack with pureSilicon 1TB 2.5-Inch SSD?

By Gary Stiehr, January 13, 2009 3:11 am

Four of these drives deliver 4TB in the same space as a standard 3.5-inch HDD.

via HPCwire: pureSilicon Debuts 1TB 2.5-Inch SSD.

Is that right?  Four 2.5-inch drives in the place of one 3.5-inch drive?

If so, could systems that hold 48 3.5-inch drives fit 192 of these 2.5-inch, 1 TB drives? If those 48-drive systems fit in 4U of rack space and we put 10 of them in one rack, we could get 1,920 TB in one rack. That’s incredible density.

According to the stats in the article above, this rack would require about 9.2 kW of power when active and only 192 Watts (yes, Watts) when idle. Of course this considers only the drives’ power consumption.
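The arithmetic behind those rack numbers can be written out as below. The per-drive power figures are not taken from the article; they are back-computed from the rack totals quoted in this post, so treat them as implied values only.

```python
DRIVES_PER_35_INCH_BAY = 4   # four 2.5-inch SSDs per 3.5-inch HDD space
BAYS_PER_SYSTEM = 48         # 3.5-inch bays in one 4U system
SYSTEMS_PER_RACK = 10
TB_PER_DRIVE = 1

RACK_ACTIVE_WATTS = 9200     # "about 9.2 kW" from this post
RACK_IDLE_WATTS = 192

drives_per_system = DRIVES_PER_35_INCH_BAY * BAYS_PER_SYSTEM
drives_per_rack = drives_per_system * SYSTEMS_PER_RACK
capacity_tb = drives_per_rack * TB_PER_DRIVE

# Implied per-drive power, derived from the rack totals (drives only).
active_watts_per_drive = RACK_ACTIVE_WATTS / drives_per_rack
idle_watts_per_drive = RACK_IDLE_WATTS / drives_per_rack

print(f"{drives_per_system} drives per system, {drives_per_rack} per rack")
print(f"{capacity_tb} TB per rack")
print(f"~{active_watts_per_drive:.1f} W active, "
      f"~{idle_watts_per_drive:.1f} W idle per drive")
```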

At 240 MB/s read and 215 MB/s write per drive, we’d have incredible I/O rates per 192-drive system.  Imagine the performance of such systems for large OLTP databases, for example.

So what are the challenges with such a system (besides price, I’d imagine)? With one drive potentially coming close to saturating the theoretical SATA II bus capability, how could we take advantage of so many drives?
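To put rough numbers on that, the sketch below compares one drive's read rate with the SATA II link and totals the raw throughput of a 192-drive system. It assumes the usual SATA II figures of a 3 Gbit/s line rate with 8b/10b encoding, and it assumes every drive has its own link and can run at full speed at once, which a real controller and backplane may not allow.

```python
SATA2_LINE_RATE_BITS_PER_S = 3_000_000_000  # SATA II line rate: 3 Gbit/s
READ_MB_PER_S = 240                          # per-drive figures from this post
WRITE_MB_PER_S = 215
DRIVES_PER_SYSTEM = 192

# 8b/10b encoding carries 8 data bits per 10 line bits; 8 bits per byte.
sata2_payload_mb_per_s = SATA2_LINE_RATE_BITS_PER_S * 8 // 10 // 8 // 1_000_000

link_utilization = READ_MB_PER_S / sata2_payload_mb_per_s

# Idealized aggregate: every drive streaming at its rated speed.
aggregate_read_mb_per_s = READ_MB_PER_S * DRIVES_PER_SYSTEM
aggregate_write_mb_per_s = WRITE_MB_PER_S * DRIVES_PER_SYSTEM

print(f"SATA II payload rate: {sata2_payload_mb_per_s} MB/s")
print(f"one drive uses {link_utilization:.0%} of its link when reading")
print(f"aggregate read:  {aggregate_read_mb_per_s / 1000:.2f} GB/s")
print(f"aggregate write: {aggregate_write_mb_per_s / 1000:.2f} GB/s")
```

One drive reading at 240 MB/s fills about 80% of a 300 MB/s SATA II link, so any design that shares a link among several drives gives up most of the aggregate figure.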

Instead of a 192-drive system in 4U, then, what about 48 drives in a 1U system? Are the same technical challenges there as far as getting more of the I/O potential out of these drives?
