Cloud Elasticity

This article is part of a series on the defining value propositions of cloud computing platforms. The whole series includes:

The Cloud – IaaS
The Cloud – Managed Services (coming soon)
The Cloud – Elasticity
The Cloud – Integrated Development Platforms (coming soon)

The Stretchy Cloud

Another core tenet of cloud computing is its inherent elasticity, i.e. the ability to consume more or fewer resources on demand while paying only for what you use.

Elasticity is a critical enough differentiator for cloud computing that Amazon even named their IaaS offering “EC2”, for “Elastic Compute Cloud”.

Autoscaling

There are two ways to take advantage of elastic computing resources.

1. Exploit the ability to rapidly provision new resources whenever you notice demand getting high, expect an increase in capacity needs, or have a one-off task that would benefit from increased horsepower.
2. Get a computer to do #1 for you based on some predetermined thresholds, for example CPU load rising above 80% on your current servers.

Amazon Web Services, Rackspace, Microsoft Azure and others all offer #2, known as autoscaling.
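
To make the threshold idea concrete, here is a minimal sketch of the kind of rule an autoscaler evaluates. It is not any provider's actual API. The function name, the 30% scale-down threshold, and the server limits are assumptions made up for illustration; only the 80% CPU figure comes from the example above.

```python
def desired_server_count(current_servers, avg_cpu_percent,
                         scale_up_at=80.0, scale_down_at=30.0,
                         min_servers=1, max_servers=20):
    """Return the server count a simple threshold rule would ask for.

    Hypothetical helper: scale_down_at, min_servers and max_servers
    are illustrative assumptions, not values from any real platform.
    """
    if avg_cpu_percent > scale_up_at:
        target = current_servers + 1   # load is high: add one server
    elif avg_cpu_percent < scale_down_at:
        target = current_servers - 1   # load is low: remove one server
    else:
        target = current_servers       # inside the comfort band: do nothing
    # Never go below the floor or above the ceiling.
    return max(min_servers, min(max_servers, target))


print(desired_server_count(current_servers=4, avg_cpu_percent=85.0))
print(desired_server_count(current_servers=4, avg_cpu_percent=20.0))
```

Even a rule this small hides assumptions (how CPU is averaged, how long to wait between changes, how fast new servers boot), which is part of why autoscaling is harder in practice than it looks.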

I’ve found the real usefulness of autoscaling to be limited, and I know very few companies that do it in practice. A now-ancient (2008!) article by George Reese sums up some of the arguments against autoscaling, and they still ring true.

If you have very specific application services with predictable load patterns, or if you have so many servers that it’s worth a few engineers’ worth of salary to manage the complexities and assumptions of an autoscaling cluster, go for it. Until then, stick to scaling your infrastructure on the fly, but manually, to take advantage of your elastic infrastructure.

An Underutilized Feature

The elastic nature of the cloud may be its single most under-utilized feature. Most companies move to the cloud to save on infrastructure costs or to take advantage of managed services. And many use the elastic nature of the cloud to easily provision servers. But disappointingly few automatically scale their resources up and down to meet demand, something I suspect will change as the tooling for cloud platforms matures.

The best dollar value for cloud computing comes when you can exploit the elasticity to pay only for the resources you need, when you need them. You win because you aren’t paying for what you don’t need, and IaaS providers win because they can rent the same resources to someone else during the hours you aren’t using them.

Case Study: NYTimes

Although elasticity is ignored by many tech companies, who just want the cloud because it’s a low-cost way to deploy an application, there are still hundreds (maybe thousands?) of companies making liberal use of the elasticity of Amazon and other big cloud providers.

The New York Times technology team has long been at the cutting edge of both front-end and architectural strategies. When it came time for them to put all 11 million of their public domain articles online as PDFs, they took advantage of EC2 by spinning up 100 machines for 24 hours to crunch all the data. Let’s say they used m1.large instance types (just guessing), at a cost of $0.24/hour. That’s only $576 for 100 instances to run a whole day. A veritable supercomputer for the cost of a low-end iPad!
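
The arithmetic behind that figure is simple enough to sketch. The function name is made up for illustration, and the $0.24 hourly rate is the same guess used above, not a confirmed price.

```python
def burst_cost(instances, hours, hourly_rate):
    """Total on-demand cost of running `instances` machines for `hours` hours."""
    return instances * hours * hourly_rate


# 100 machines for 24 hours at the guessed rate of $0.24 per hour.
cost = burst_cost(instances=100, hours=24, hourly_rate=0.24)
print(f"${cost:.2f}")  # prints $576.00
```

Plugging in your own instance count, duration, and rate gives a quick estimate of what a one-off burst would cost.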

I went to a talk a number of years back where the New York Times discussed the process of updating data on election night as poll results are uploaded to the Associated Press FTP server for consumption by all the media outlets. New poll results flow in once per hour, and time is of the essence when reporting breaking election results to your readers. So the NYT would spin up lots of cloud instances on election night to parallelize the process of slurping down election results every hour and updating them on their site. This was part of a much larger strategy for serving high traffic on election night, all made possible by the elasticity of the cloud.

Case Study: You

Take a look at how your business uses infrastructure, and ask two questions.

1. How variable is my utilization? How much money could I save if I turned machines on and off in sync with demand?
2. What cool stuff could I do if cost were no barrier to having hundreds of machines at my disposal for quick bursts? Is there a way to leapfrog my competitors by throwing computing horsepower at the problem?
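
For the first question, a rough back-of-the-envelope comparison can help. The sketch below assumes you know your demand for each hour of a day and roughly how much load one server can handle. The function name, the demand profile, and the per-server capacity are all hypothetical, and the $0.24 hourly rate is reused from the guess in the NYTimes example.

```python
import math


def fixed_vs_elastic_cost(hourly_demand, capacity_per_server, hourly_rate):
    """Compare provisioning for peak all day with following demand hour by hour.

    Returns (fixed_cost, elastic_cost) for the period covered by hourly_demand.
    """
    # Servers needed each hour, keeping at least one machine running.
    servers_needed = [max(1, math.ceil(d / capacity_per_server))
                      for d in hourly_demand]
    # Fixed: run enough servers for the peak hour, every hour.
    fixed = max(servers_needed) * len(hourly_demand) * hourly_rate
    # Elastic: pay only for the servers each hour actually needs.
    elastic = sum(servers_needed) * hourly_rate
    return fixed, elastic


# Hypothetical day: quiet night, busy daytime, moderate evening (requests/sec).
demand = [100] * 8 + [900] * 8 + [300] * 8
fixed, elastic = fixed_vs_elastic_cost(demand, capacity_per_server=100,
                                       hourly_rate=0.24)
print(f"fixed: ${fixed:.2f}, elastic: ${elastic:.2f}")
```

The gap between the two numbers is the most you could hope to save. The real figure will be smaller once you account for boot times, safety margins, and the engineering effort needed to scale reliably.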

Keep in mind that it’ll take a bunch of work for your ops team to figure out how to best exploit elasticity on your particular deployment. Make sure it’s a big enough lever for you before you send them on a wild goose chase. But know that the capability is there, and is one of the great benefits of the Cloud.