Amazon and the cloud rules of engagement.

Amazon’s outage in third day: debate over cloud computing’s future begins

As Amazon’s web services outage passed its third day, the debate on the future of cloud computing is underway. The outage is costing web sites such as Reddit and Quora considerable losses as users turn elsewhere to get their social media needs met.

Amazon’s Elastic Compute Cloud service hosts thousands of major web sites that rely on it to serve pages to users. And users rely on these services to store their personal accounts and data remotely. So when the EC2 service goes down, so do the web sites, and that means users can’t log in to access their data. It’s a big hiccup for an industry that is supposed to grow to $55 billion by 2014, according to market researcher IDC.

The duration of the outage has surprised many, since Amazon has a lot of backup computing infrastructure. If Amazon can’t safeguard the cloud, how can we rely on it? So the debate begins on the future of cloud computing and what to do to make users and companies put their trust in cloud vendors such as Amazon.


I love the romantic notion of cloud computing, with computing power on tap, minimal outlay, and majestic, infinitely reliable availability. This story feels like every science fiction novel of my childhood, complete with aliens, robots, and rocket-ships.

Tragically, this is still a fiction, a dream. But it is a dream deeply believed, apparently, by many. Why else would we see such outrage over an outcome that was predictable, and arguably a good thing?

Cloud environments are very large machines, with large-scale components (warehouses) and large numbers of self-similar sub-components (servers, virtual machines, processes, etc.). This is made more complex by explosive growth alongside the march of progress in servers, appliances, and other components. Cloud environments are extremely valuable and powerful, but we should not expect perfect robustness.
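To put a rough number on why perfect robustness is unrealistic at this scale, here is a minimal back-of-envelope sketch. It assumes every sub-component fails independently with the same probability over some period, which is a simplification (real failures are often correlated), and the function name and figures are purely illustrative.

```python
def prob_any_failure(n_components: int, p_fail: float) -> float:
    """Chance that at least one of n independent components fails.

    Assumes identical, independent failure probabilities (an assumption
    made for illustration, not a model of any real provider).
    """
    return 1.0 - (1.0 - p_fail) ** n_components


if __name__ == "__main__":
    # Hypothetical figures: 10,000 servers, each with a 0.1% chance of failing.
    print(f"{prob_any_failure(10_000, 0.001):.4f}")
```

Even with very reliable individual parts, the chance that something is broken somewhere approaches certainty as the number of parts grows, so the system has to be designed to tolerate it.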

This is not at all to say that the ‘cloud’ should be avoided. Rather, the risks need to be understood and managed.

So occasional outages like Amazon’s are healthy. We need signals that tell us to architect our cloud-reliant systems robustly, to avoid failure scenarios. Without seeing some ‘outages’ along the way, we are much more likely to end up in something of a “computing sub-prime” crisis, with blind over-commitment to a resource that doesn’t reach our ‘better than real’ expectations. (This is human nature.)
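One simple form of architecting robustly is to never depend on a single provider or location for a critical call. The sketch below is only an illustration of that idea: `call_with_failover` and the provider callables are hypothetical names, not part of any Amazon API, and a real system would also need timeouts, health checks, and data replication.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Iterable[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result.

    Each provider is a zero-argument callable (hypothetical), for example
    a request to the same service hosted in a different region.
    """
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # remember the failure, try the next one
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

The caller passes providers in order of preference, for instance a primary region followed by a backup, and only sees an error if every one of them is down.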

These failures, and the outrage and disbelief that follow, are part of cloud computing’s transition from a fairy tale into a hard-nosed, down-to-earth resource, into something that builds our future. This is an IT cultural transition: collectively learning the upsides, downsides, and rules of engagement for the IT ‘cloud’.

Cloud computing as paradigm shift (notes on awesome talk by @swardley via @cloudbook)

I like the broad sweep view that Simon takes here. It’s a nice meta view on the notion that cloud is a heavily overloaded term – so much so that it means everything and nothing.

A proof point for cloud being like the industrial revolution and centralization of power generation would be to revisit writings from these eras to see how they were talked about at the time, and whether there were similar buzzwords used to describe ‘the age of’ … or whether these things are only visible in retrospect. On the other hand, as William Gibson says, “The future is already here – it is just unevenly distributed” — perhaps we’re already at the point of reviewing changes that have occurred in pockets and the (start of the) revolution is over.

Simon summarizes the work of Nicholas Carr on ‘IT doesn’t matter’ (more recently ‘the Big Switch’) with a single table showing how technologies commoditize and that the biggest change for IT is that it has become a cost of doing business rather than an innovation driver. The implication is that standardization, simplification, and cost reduction become the imperative, which in turn becomes one of the many drivers for “the cloud”. This is undoubtedly true.

Implications for Citrix: for a long time, Citrix has provided tools to turn IT into a utility. The Citrix Cloud Center will provide the same functionality in a world where IT assets live both inside and outside the datacenter. This forms part of the armory for IT organizations (and companies in general) to continue the inexorable cost reduction forced by IT becoming ‘just’ a cost of business.

‘Cloud’ is best understood as a shorthand for the changes wrought by the internet and by IT surpassing the core needs of business. It is a label for a range of new technologies, capabilities, and uses of the same.

Cloud is NOT a technology nor an end in itself.


[post updated to reflect twitter conv with Simon Wardley]

Here’s a blog post from Simon on the issue:

Warehouse-Scale Computers – today’s extreme of IT standardization and factoryization

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

Synthesis Lectures on Computer Architecture

2009, 108 pages, (doi:10.2200/S00193ED1V01Y200905CAC006)
Luiz André Barroso

Google Inc.

Urs Hölzle

Google Inc.


As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board.

Looks to be an excellent overview paper of the approaches taken by Google, Amazon and the like.