You might have noticed that there’s been quite a (mostly civil, I think) debate about RAID and scaling going on recently:
- I originally wrote a call for help in Help convince Dell to leverage LSI to Open Source MegaCli
- Kevin Burton followed up asserting that “RAID is dying” in MySQL and The Death of RAID
- I responded to Kevin touting the benefits of RAID and giving my side in RAID: Alive and well in the real world
- Kevin responded, taking on a few of my points, in Yes Jeremy, RAID Really Is Dying
- Brian Moon responded to the exchange also puzzled by the assertion that “RAID is dying” in RAID is dying?
- Kevin followed up Brian’s comments asserting that “Only scaling out works.” in RAID and Scaling Out vs Scaling Up
I’d like to address some of the (in my opinion) misconceptions about “scaling out” that I’ve seen many times recently, and provide some of my experience and opinions.
It’s all about compromise.
Human time is expensive. Having operations, engineering, etc. deal with tasks (such as re-imaging a machine) to fix a problem that could have been a 30-second disk swap is an inefficient use of human resources. Don’t cut corners where it doesn’t make sense. This calls back to Brian’s comments about the real cost of your failed $200 part.
Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (which they’ve often only read about in outdated conference presentations) to quite an extreme. They think scaling out means buying a ton of cheap, desktop-class machines. That model doesn’t work, and it’s hell to maintain in the long term.
Compromise. One of the key points in the scale-out model: size the physical hardware reasonably to achieve the best compromise between scaling out and scaling UP. This is the main reason that I assert RAID is not going anywhere… it is often simply the best and cheapest way to achieve the performance and reliability that you need in each physical machine in order to make the scale out model work.
Use commodity hardware. You often hear the term “commodity hardware” in reference to scale out. While crappy hardware is also commodity, what this really means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of, say, $5k machines. That doesn’t mean a $1k single-disk crappy machine, as described above. What does it mean for a machine to be “commodity”? It means that the components are standardized, common, and the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.
Use data partitioning (sharding). I haven’t talked much about this in my previous posts, because it’s sort of a given. My participation in the HiveDB project and my recent talks on “Scaling and High Availability Architectures” at the MySQL Conference and Expo should say enough about my feelings on this subject. Nonetheless I’ll repeat a few points from my talk: data partitioning is the only game in town, cache everything, and use MySQL replication for high availability and redundancy.
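To make the partitioning idea concrete, here is a minimal sketch of hash-based sharding. This is not the HiveDB implementation; the shard host names and the `user_id` key are made-up examples, and real systems need to handle re-sharding, lookups, and failover on top of this:

```python
import hashlib

# Hypothetical shard hosts; in practice this mapping would live in
# a directory service or configuration store, not a hard-coded list.
SHARDS = ["db01", "db02", "db03", "db04"]

def shard_for(user_id: int) -> str:
    """Deterministically map a user id to one of the shard hosts."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query for a given user goes to the same machine, so each
# shard holds a manageable slice of the total data set.
host = shard_for(12345)
```

The hash keeps the key-to-shard mapping stable without any central lookup, at the cost of making it painful to change the number of shards later; directory-based schemes trade the opposite way.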
Nonetheless, RAID is cheap. I’ve said it several times already, just to be sure you heard me correctly: RAID is a cheap and efficient way to get both performance and reliability out of your commodity hardware. For most systems, the engineering time, operations time, etc. needed to get the same sort of reliability out of a non-RAID partitioned system will cost far more than simply using RAID. Yes, other components will fail, but in a sufficiently large data-centric system with server-class hardware, disks will fail 10:1 or more over anything else.
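A back-of-the-envelope calculation shows why mirroring the most failure-prone component pays off. The 5% annual failure rate below is an assumed, illustrative number, not vendor data, and the model deliberately ignores rebuild windows and correlated failures, which matter in practice:

```python
# Assumed annual failure rate of a single disk (illustrative only).
afr = 0.05

# Chance a single-disk machine loses its data in a year:
p_single = afr

# A RAID-1 mirror loses data only if *both* disks fail in the
# same year (independence assumed, rebuild time ignored):
p_mirror = afr * afr

print(f"single disk: {p_single:.2%}, RAID-1 mirror: {p_mirror:.2%}")
```

Under these assumptions the mirror is roughly twenty times less likely to lose data, for the price of one extra commodity disk — far cheaper than the human time spent rebuilding a failed machine.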
That is all, carry on.