
Tuesday, 22 January 2013

The Cost of Keeping your Application Running During Cloud Outages

Get a handle on costs for a range of outage proofing strategies for your cloud applications

Cloud outages happen. Over the last year, we continued to see highly visible cloud applications falter during the inevitable cloud outages. Cloud experts (including RightScale) and cloud providers continue to advise cloud users on strategies to outage-proof cloud applications. RightScale customers use our cloud management platform to automate the process of outage-proofing their cloud applications.

Cost is one of the factors that companies consider when choosing an outage-proofing strategy. However, cloud users often find it difficult to accurately forecast the costs across the entire range of options, from cold to warm to hot disaster recovery architectures. PlanForCloud is a free cloud cost forecasting tool from RightScale that helps our customers budget for different cloud usage scenarios, including outage-proofing. This blog post covers the costs of different outage-proofing strategies and shows you how to forecast the cost of outage-proofing your own cloud applications.

This post will focus on disaster recovery (DR), the process, policies and procedures related to restoring critical systems back to normal after a catastrophic event such as a cloud outage. Later posts will focus on the costs of high availability (HA) architectures.

When thinking about your outage-proofing strategies, there are a few things you need to keep in mind:

  • Large scale failures in the cloud are rare but do happen and will continue to happen.
  • The application owner is ultimately responsible for availability and recoverability.
  • There is complexity associated with different strategies, so you need to get the balance right between the effort and cost required and the risks you are willing to bear.
  • Cloud infrastructure has made DR and HA architectures more affordable, but there are still costs associated with them. We will talk about this in more detail below.

Types of Outage Proofing
There are a number of options when looking at DR. Let's look through some of them:


Multi-Region Cold DR (most common)
In this scenario, you configure a parallel deployment in a different region but do not keep it running. (Side note: This is quite simple if you are using RightScale. You could also develop scripts to do this; however, be sure to test them over and over, since you won’t want anything to go wrong when you are suffering an outage.)
You would only switch this deployment on if something were to happen to your primary deployment. The time-consuming parts of this option are moving the database and some application data over to the secondary deployment, starting the servers and perhaps doing some testing before enabling traffic to it. If you cannot access the master or slave DB in your primary deployment, your application may suffer data loss. This option suits non-core business applications, since your application could be down for over an hour. This option may look something like this:

[Diagram: Multi-Region Cold Disaster Recovery cloud application setup]
The cost of a cold DR is quite simple: since you are not running the secondary deployment, you are not paying for it, and therefore the additional cost of DR is near zero (unless you require EBS volumes to be provisioned, in which case you will need to pay for them). This assumes you have spent time designing your DR strategy (using Server Templates and RightScale deployments, for example).

Multi-Region Warm DR (recommended)
This is when you have a parallel deployment configured in a different region with some core components, typically the database, running, while the app servers and load balancers are not running (again, this is quite simple if you are using RightScale). The difference between this and the above scenario is that less time is required to get the deployment functional, because the core components already exist. All that is required is for the slave DB to be promoted, the app servers and load balancers to be switched on, and your DNS to be re-configured to redirect traffic. Usually this can be done within the hour.
[Diagram: Multi-Region Warm Disaster Recovery cloud application setup]
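
The warm DR failover sequence described above (promote the slave DB, switch on the app servers and load balancers, re-configure DNS) could be captured as an ordered runbook. The following is only a minimal sketch: every function name in it is a hypothetical placeholder, not a RightScale or AWS API, and each stub would need to be replaced with whatever your own tooling does for that step.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run the steps in order and halt at the first one that fails,
    so that later steps never run on top of a broken earlier step."""
    completed = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"failover halted at step: {name}")
        completed.append(name)
    return completed


# Hypothetical placeholder actions (assumptions, not real API calls).
def promote_slave_db() -> bool:
    return True  # e.g. promote the replica in the secondary region


def start_app_servers() -> bool:
    return True  # e.g. launch the pre-configured app servers


def start_load_balancers() -> bool:
    return True  # e.g. launch the pre-configured load balancers


def repoint_dns() -> bool:
    return True  # e.g. redirect traffic to the secondary region


WARM_DR_RUNBOOK: List[Step] = [
    ("promote slave DB", promote_slave_db),
    ("start app servers", start_app_servers),
    ("start load balancers", start_load_balancers),
    ("re-configure DNS", repoint_dns),
]

for step_name in run_failover(WARM_DR_RUNBOOK):
    print("done:", step_name)
```

Keeping the order explicit in one place makes it easier to rehearse the failover regularly, which matters because you will not want to discover a missing step during a real outage.
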
The cost of a warm DR strategy depends on how much of your system you would like to keep on 'stand-by'. We will look at the costs associated with this option in more detail a little further down.

Multi-Region Hot DR (least common)
This is when you have a parallel deployment running your application, but the primary deployment takes all the user traffic. You would only switch to sending traffic to the secondary deployment if something were to happen to the primary. This may apply to banking, or to a core component of your business where downtime would be seriously damaging. The application can be back up and running within 5 minutes.
[Diagram: Multi-Region Hot Disaster Recovery cloud application setup]
The cost of running a Hot DR strategy is also quite simple. Since you will be replicating more or less all of the components of your system in another region, the costs will be more or less double those of your normal running system. Another way to look at it is that the increase in running costs for a Cold DR is near 0% and the increase for a Hot DR is about 100%, with Warm DR somewhere in between.
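
To make the 0% to 100% range concrete, here is a rough sketch of how the extra running cost could be estimated from the set of components kept on stand-by. The component names and yearly prices are invented for illustration (they are not PlanForCloud figures), and the model makes the simplifying assumption that a stand-by component costs the same as its counterpart in the primary deployment.

```python
from typing import Dict, Set


def dr_overhead_pct(primary_costs: Dict[str, float], standby: Set[str]) -> float:
    """Extra running cost of DR as a percentage of the primary deployment cost.

    Assumes each stand-by component costs as much as its primary counterpart.
    """
    total = sum(primary_costs.values())
    extra = sum(cost for name, cost in primary_costs.items() if name in standby)
    return 100.0 * extra / total


# Hypothetical yearly costs in dollars, for illustration only.
costs = {"load_balancers": 2000, "app_servers": 12000, "database": 6000}

cold = dr_overhead_pct(costs, set())          # nothing running: 0.0
warm = dr_overhead_pct(costs, {"database"})   # database only: 30.0
hot = dr_overhead_pct(costs, set(costs))      # everything running: 100.0

print(f"cold: {cold:.0f}%  warm: {warm:.0f}%  hot: {hot:.0f}%")
```

In a real deployment the stand-by components are often smaller or fewer than the primary ones, so treat the result as an upper-bound estimate rather than a forecast.
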


The middle ground: Cost of Multi-Region Warm DR
Due to the costs and complexity associated with HA and DR, it is no surprise that many of our customers have chosen to use Multi-Region Warm DR. In this middle ground, the user is able to pick and mix which components of their system should be replicated (on stand-by) and which components of the system should not. This is the reason it is our recommended DR strategy.
We have used PlanForCloud, which holds the latest prices from the cloud providers, to simulate a few different scenarios to see how much these options cost. Here is our basic deployment setup and cost:
PlanForCloud screenshot - no DR

From the PlanForCloud screenshot, we can see that our basic setup costs $38,735 per year. Now let's look at scenarios where we apply a warm DR strategy to the deployment:

At a minimum, we recommend you set up a master and slave database in one region (with Multi-Zone HA), while replicating from your master to another slave in another region. This means that if the primary region were to go down, you would need to start up your load balancers and application servers in the secondary region and have them talk to the already running slave, which saves a lot of time. Our deployment setup and costs:
PlanForCloud screenshot - Minimum Warm DR
We can see that our minimum Warm DR strategy along with our deployment costs $54,771 per year.

If you wanted to, you could also set up a fuller Warm DR strategy. This is when you have the very minimum required to run your full application (albeit at a degraded level) in another region. In our example, this means we would have a single load balancer and a single app server running along with our slave database. Our deployment setup and costs:

PlanForCloud screenshot - Fuller Warm DR
We can see that our fuller Warm DR strategy along with our deployment costs $61,603 per year.
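
As a quick check on what these figures mean, the short sketch below works out the extra yearly cost of each warm DR flavour relative to the basic deployment. The dollar amounts are the ones quoted above; the function and variable names are just illustrative.

```python
BASELINE = 38_735  # basic deployment, dollars per year (from the post)

scenarios = {
    "Minimum Warm DR": 54_771,
    "Fuller Warm DR": 61_603,
}


def overhead(total: int, baseline: int = BASELINE):
    """Return (extra dollars per year, percentage increase over the baseline)."""
    extra = total - baseline
    return extra, 100.0 * extra / baseline


for name, total in scenarios.items():
    extra, pct = overhead(total)
    print(f"{name}: +${extra:,} per year ({pct:.1f}% over the basic deployment)")
# Minimum Warm DR: +$16,036 per year (41.4% over the basic deployment)
# Fuller Warm DR: +$22,868 per year (59.0% over the basic deployment)
```

By the rule of thumb given earlier, a Hot DR setup would roughly double the basic cost, so both warm flavours sit between the cold (near 0%) and hot (about 100%) extremes.
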

Keep in Mind
In AWS, when there is an outage which forces many people to start to move applications over to another region, you need to know two things:
1. AWS throttles its APIs so that the other regions do not get overwhelmed. This means that if your applications are not prepared and you are not following some best practices, your application could be down for longer than you expect.
2. If there is not enough capacity in the other region, you will not be able to get resources. We therefore recommend that you look at buying Reserved Instances in your backup region. Buying Reserved Instances (Light, Medium or Heavy utilisation) means that you will have that capacity reserved, no matter who else uses that region.
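
One commonly used way to cope with throttled APIs is to retry with exponential backoff and jitter. The post does not say which best practices it has in mind, so treat the following as one possible example rather than the recommended approach. `ThrottledError` is a hypothetical stand-in for whatever throttling error your cloud client raises, and the default delays are arbitrary.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's 'request throttled' error."""


def call_with_backoff(call, max_attempts=6, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call `call()` and retry on ThrottledError with exponential backoff.

    The wait before retry n is a random fraction ("full jitter") of
    min(max_delay, base_delay * 2**n), so that many clients failing over
    at once do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist only so the behaviour can be tested without real waiting; in normal use you would leave them at their defaults and wrap each launch or promote call, for example `call_with_backoff(lambda: start_server("app-1"))`, where `start_server` is your own function.
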


Summary
There are different options for outage-proofing your cloud application using High Availability and Disaster Recovery techniques. Each comes with its own complexity and cost; however, using a cloud management system such as RightScale (free edition signup) can help you put these procedures in place using standard configurations. Ultimately, it is up to the application owner to get the balance right between the effort and cost of setting up HA and DR and the amount of risk they are willing to bear.

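For our example deployment above, we ran a few simulations to see how much the different flavours of Warm DR would cost. Here is a summary:

  • No DR (basic deployment): $38,735 per year
  • Minimum Warm DR: $54,771 per year
  • Fuller Warm DR: $61,603 per year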


Depending on what your applications look like, the cost of setting up an HA and DR strategy will differ. In our example deployment, we can see that moving from a cold DR (currently the most common type of DR) to a Warm DR increases the cost because of running another slave DB (roughly a 40% increase in the yearly cost of our example deployment). PlanForCloud is completely free: just log in as a guest and use the tool to calculate how much your HA and DR strategies will cost.

We are more than happy to answer your questions about outage-proofing your cloud applications and to give you a free live demo of how this can be achieved: contact form


-- Hassan Hosseini
Product Manager at PlanForCloud

1 comment:

  1. This is a very good piece of work. Thank you so much for sharing.
