You might have noticed that there’s been quite a (mostly civil, I think) debate about RAID and scaling going on recently:
- I originally wrote a call for help in Help convince Dell to leverage LSI to Open Source MegaCli
- Kevin Burton followed up asserting that “RAID is dying” in MySQL and The Death of RAID
- I responded to Kevin touting the benefits of RAID and giving my side in RAID: Alive and well in the real world
- Kevin responded, taking on a few of my points, in Yes Jeremy, RAID Really Is Dying
- Brian Moon responded to the exchange also puzzled by the assertion that “RAID is dying” in RAID is dying?
- Kevin followed up Brian’s comments asserting that “Only scaling out works.” in RAID and Scaling Out vs Scaling Up
I’d like to address some of the (in my opinion) misconceptions about “scaling out” that I’ve seen many times recently, and provide some of my experience and opinions.
It’s all about compromise.
Human time is expensive. Having operations, engineering, etc. deal with tasks (such as re-imaging a machine) to fix a problem that could have been a 30-second disk swap is an inefficient use of human resources. Don’t cut corners where it doesn’t make sense. This calls back to Brian’s comments about the real cost of your failed $200 part.
Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (which they’ve often only read about in outdated conference presentations) to quite an extreme. They think scaling out means buying a ton of cheap, desktop-class machines. That model doesn’t work, and it’s hell to maintain in the long term.
Compromise. One of the key points in the scale-out model: size the physical hardware reasonably to achieve the best compromise between scaling out and scaling UP. This is the main reason that I assert RAID is not going anywhere… it is often simply the best and cheapest way to achieve the performance and reliability that you need in each physical machine in order to make the scale out model work.
Use commodity hardware. You often hear the term “commodity hardware” in reference to scale out. While crappy hardware is also commodity, what this really means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of, say, $5k machines. That doesn’t mean a $1k single-disk crappy machine, as described above. What does it mean for a machine to be “commodity”? It means that the components are standardized, common, and the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.
Use data partitioning (sharding). I haven’t talked much about this in my previous posts, because it’s sort of a given. My participation in the HiveDB project and my recent talks on “Scaling and High Availability Architectures” at the MySQL Conference and Expo should say enough about my feelings on this subject. Nonetheless I’ll repeat a few points from my talk: data partitioning is the only game in town, cache everything, and use MySQL replication for high availability and redundancy.
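To make the partitioning idea concrete, here is a minimal sketch of hash-based sharding. This is not the HiveDB implementation; the shard host names and the `user_id` key are made-up examples, and real systems need to handle re-sharding, lookups, and failover on top of this:

```python
import hashlib

# Hypothetical shard hosts; in practice this mapping would live in
# a directory service or configuration store, not a hard-coded list.
SHARDS = ["db01", "db02", "db03", "db04"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user id to one of the shard hosts."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query for a given user goes to the same machine, so each
# shard holds a manageable slice of the total data set.
host = shard_for(12345)
```

The hash keeps the key-to-shard mapping stable without any central lookup, at the cost of making it painful to change the number of shards later; directory-based schemes trade the opposite way.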
Nonetheless, RAID is cheap. I’ve said it several times already, just to be sure you heard me correctly: RAID is a cheap and efficient way to get both performance and reliability out of your commodity hardware. For most systems, the engineering time, operations time, etc. needed to get the same sort of reliability out of a non-RAID partitioned system will cost far more than simply using RAID. Yes, other components will fail, but in a sufficiently large data-centric system with server-class hardware, disks will fail 10:1 or more over anything else.
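A back-of-the-envelope calculation shows why mirroring the most failure-prone component pays off. The 5% annual failure rate below is an assumed, illustrative number, not vendor data, and the model deliberately ignores rebuild windows and correlated failures, which matter in practice:

```python
# Assumed annual failure rate of a single disk (illustrative only).
afr = 0.05

# Chance a single-disk machine loses its data in a year:
p_single = afr

# A RAID-1 mirror loses data only if *both* disks fail in the
# same year (independence assumed, rebuild time ignored):
p_mirror = afr * afr

print(f"single disk: {p_single:.2%}, RAID-1 mirror: {p_mirror:.2%}")
```

Under these assumptions the mirror is roughly twenty times less likely to lose data, for the price of one extra commodity disk — far cheaper than the human time spent rebuilding a failed machine.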
That is all, carry on.