The main reason cloud computing exists is to provide an elastic platform that scales both up and down to meet service demand. Providing such a capability demands some pretty impressive infrastructure and orchestration services! Cloud infrastructure and platform vendors (IaaS and PaaS) place the onus on customers to build their applications in a fault-tolerant way, so that problems and outages in an underlying service do not visibly affect the people using the application. (This does not apply to SaaS, where the responsibility rests with the vendor.)
Cloud vendors often set the expectation that failure is possible, or in some cases a common occurrence! Yet they still build hugely resilient infrastructure on a scale most corporate users cannot afford; few companies can do more than dream of a dedicated multi-datacentre capability. With cloud computing, though, they can buy this capability out of the box with little or no capital investment. For some businesses, then, the real attraction may be the disaster recovery capability rather than the workload flexibility.
In fact, resilience and elasticity are closely related when it comes to deploying an application on a cloud platform. Being elastic means a service must scale both up and down. Scaling down is very closely related to coping with a failure. For example, elastic scaling rules may dictate that a working application instance is removed when load decreases; to the rest of the service, an instance retired on purpose disappears just as abruptly as one that crashes. Scaling up is akin to dealing with the aftermath of a failure: if an application instance fails, a new one needs to spin up seamlessly to take its place. Applications therefore need to be built to handle failure, and its consequences, in a controlled way. When deploying a solution, then, we need to consider how the failure of each component would affect our service and how we will handle such events.
So whether we are deploying a solution that scales from ten to one hundred servers and back again, or simply ensuring that ten servers are running at all times, our applications must be able to cope with instances disappearing and new ones appearing.
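To make the link between scaling and failure concrete, here is a minimal Python sketch of a reconciliation loop. It assumes a pool of identical, stateless instances and a simple load-based scaling rule. `Pool`, `Instance`, `capacity_per_instance` and `reconcile` are hypothetical names chosen for illustration, not any vendor's API. The point is that a single routine covers every case: a failed instance is dropped and replaced, rising load adds instances, and falling load retires them. The fixed ten-server case is the same code with `min_size` and `max_size` both set to 10.

```python
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One application instance; `healthy` stands in for a real health check."""
    instance_id: int
    healthy: bool = True


@dataclass
class Pool:
    """A hypothetical pool of identical, stateless application instances."""
    min_size: int
    max_size: int
    capacity_per_instance: float  # assumed load a single instance can absorb
    instances: list[Instance] = field(default_factory=list)
    _next_id: itertools.count = field(default_factory=itertools.count)

    def desired_size(self, load: float) -> int:
        # Scaling rule: enough instances for the load, clamped to [min, max].
        needed = math.ceil(load / self.capacity_per_instance)
        return max(self.min_size, min(self.max_size, needed))

    def reconcile(self, load: float) -> int:
        # A failed instance is treated like an unplanned scale-down: drop it.
        self.instances = [i for i in self.instances if i.healthy]
        target = self.desired_size(load)
        # Scale up; this also replaces anything lost to failure.
        while len(self.instances) < target:
            self.instances.append(Instance(next(self._next_id)))
        # Planned scale-down: retire surplus instances (newest first here).
        while len(self.instances) > target:
            self.instances.pop()
        return len(self.instances)


# Elastic case: scale between ten and one hundred servers and back again.
elastic = Pool(min_size=10, max_size=100, capacity_per_instance=50.0)
print(elastic.reconcile(load=0))      # 10
print(elastic.reconcile(load=4_000))  # 80
print(elastic.reconcile(load=9_000))  # 100 (clamped to max_size)
print(elastic.reconcile(load=100))    # 10

# Fixed case: keep ten servers running, even after three of them fail.
fixed = Pool(min_size=10, max_size=10, capacity_per_instance=50.0)
fixed.reconcile(load=0)
for inst in fixed.instances[:3]:
    inst.healthy = False  # simulate three failures
print(fixed.reconcile(load=0))                      # 10
print(all(inst.healthy for inst in fixed.instances))  # True
```

Retiring the newest instance first is an arbitrary simplification. Whichever instance a real platform removes, the application has to tolerate it going away, which is exactly the resilience requirement described above.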