Kevin Burton wrote a sort-of-reply to my call to action to get LSI to open source their CLI tool for the LSI MegaRAID SAS (aka Dell PERC 5/i), in which he asserted that “RAID is dying”. I’d like to assert otherwise. In my world, RAID is quite alive and well. Why?
- RAID is cheap. Contrary to popular opinion, RAID isn’t really that expensive. The controller is cheap (only $299 for Dell’s PERC 5/i, with BBWC, if you pay full retail). The “2x” disk usage in RAID 10 is really quite debatable, since those disks aren’t just wasting space, they are also improving read (and subsequently write) performance.
- Latency. The battery-backed write cache is a necessity. If you want to safely store data quickly, you need a place to stash it that is reliable[1]. This is one of the main reasons (or only reasons, even) for using hardware RAID controllers.
- Disks fail. Often. If anything, we should have learned that from Google. Automatic RAID rebuild is a proven and effective way to manage this without sinking a huge amount of time and/or resources into managing disk failures. RAID turns a disk failure into a non-event instead of a crisis.
- Hot swap ability. If you forgo hardware RAID, but make use of multiple disks in the machine, there’s a very good chance you will not be able to hot swap a failed disk. Most hot-swappable disk controllers are RAID controllers. So, if you want to hot-swap your disks, you likely end up paying the cost for the controller anyway.
I don’t think it’s fair for anyone to say “Google doesn’t use RAID”. For a few reasons:
- I would be willing to bet there are a number of hardware RAIDs spread across Google (feel free to correct me if I’m wrong, Googlers, but I very much doubt I am). Google has many applications. Many applications with different needs.
- As pointed out by a commenter on Kevin’s entry, Google is, in many ways, its own RAID. So even in applications where they don’t use real RAID, they are sort of a special case.
In the latter half of his entry, Kevin mentions some crazy examples using single disks running multiple MySQL daemons, etc., to avoid RAID. He seems fixated on “performance” and talks about MBps, which is, in most databases, just about the least important aspect of “performance”. What his solution does not address, and in fact where it makes matters worse, is latency. Running four MySQL servers against four disks individually is going to make absolutely terrible use of those disks in the normal case.
One of the biggest concerns that our customers, and many other companies, have is power consumption. I like to think of hardware in terms of “critical” and “overhead” components. Most database servers are bottlenecked on disk IO, specifically on latency (seeks). This means that their CPUs, power supplies, etc., are all “overhead”: components necessary to support the “critical” component, the disk spindles. The less overhead you have in your overall system, the better, obviously. This means you want to make the best possible use of your disks (in terms of seek capacity), and minimize downtime, in order to make the best use of the immutable overhead.
RAID 10 helps in this case by making the best use of the available spindles, spreading IO across the disks so that as long as there is work to be done, in theory, no disk is underutilized. This is exactly something you cannot accomplish using single disks and crazy multiple-daemon setups. In addition, in your crazy setup, you will waste untold amounts of memory and CPU by handling the same logical connection multiple times. Again, more overhead.
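To put rough numbers on the utilization argument, here is a minimal sketch in Python. It is a toy capacity model, not a benchmark: the per-disk figure of 150 random IOPS and the four demand values are made-up assumptions, the function name `served_iops` is hypothetical, and the model only looks at random reads (in RAID 10 a write costs an operation on both halves of a mirror, which this ignores).

```python
def served_iops(demands, per_disk_iops, pooled):
    """Return the total random IOPS served under a simple capacity model.

    demands       -- requested IOPS per workload (one workload per disk)
    per_disk_iops -- assumed seek capacity of a single spindle
    pooled        -- True for one striped array, False for one daemon per disk
    """
    if pooled:
        # One striped array: any request can land on any spindle, so the
        # only cap is the aggregate capacity of all the disks.
        return min(sum(demands), per_disk_iops * len(demands))
    # One daemon per disk: each workload is capped by its own disk, and
    # idle capacity on the other disks cannot help a busy one.
    return sum(min(d, per_disk_iops) for d in demands)


PER_DISK_IOPS = 150            # assumed figure, for illustration only
demands = [400, 50, 30, 20]    # assumed uneven load across four daemons

isolated = served_iops(demands, PER_DISK_IOPS, pooled=False)
striped = served_iops(demands, PER_DISK_IOPS, pooled=True)

print(f"four single disks: {isolated} of {sum(demands)} IOPS served")
print(f"one striped array: {striped} of {sum(demands)} IOPS served")
```

With these assumed numbers, the one busy daemon saturates its disk while the other three spindles sit mostly idle. The striped array has the same four disks and serves the whole load.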
What do I think is the future, if RAID is not dying? Better RAID, faster disks (20k anyone? 30k? Bring it on!), bigger battery-backed write caches, and non-spinning storage, such as flash.
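As a back-of-the-envelope check on what faster spindles would buy, the sketch below computes average rotational latency, which is the time for half a revolution. The 20k and 30k entries are the hypothetical speeds wished for above, not products I know to exist, and the function name is my own. Seek time is not modeled at all, so this is only one part of total latency.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return 0.5 * ms_per_revolution


# 7200, 10000 and 15000 are common spindle speeds; 20000 and 30000 are
# the hypothetical speeds from the text.
latencies = {rpm: avg_rotational_latency_ms(rpm)
             for rpm in (7200, 10000, 15000, 20000, 30000)}

for rpm, ms in latencies.items():
    print(f"{rpm:>6} RPM: {ms:.2f} ms average rotational latency")
```

In this simple model, going from 15k to 30k would halve the rotational part of each random IO, from 2 ms to 1 ms.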
[1] There’s a lot to be said for treating the network as “reliable”, for instance with Google’s semi-synchronous replication, but that is not available at this time, and isn’t really a viable option for most applications. Nonetheless, I would still assert that RAID is cheap compared to the cost (in terms of time, wasted effort, blips, etc.) of rebuilding an entire machine/daemon due to a single failed disk.