AWS has definitely changed the way we all approach infrastructures these days, especially here — at Mind Candy.
We’re finally not limited by the amount of available hardware, so we can get whatever amount of resources (well, nearly) we need, whenever we need, plus we get CloudFormations.
However, as exciting as spawning 100+ servers can be, as with many things, if you’re not cautious and smart, it can cost you a lot of money.
One way to save a bit of money on your AWS bill (and “a bit” is a serious understatement) is by utilising Spot Instances.
“Spot Instances allow you to name your own price for Amazon EC2 computing capacity. You simply bid on spare Amazon EC2 instances and run them whenever your bid exceeds the current Spot Price, which varies in real-time based on supply and demand.” – http://aws.amazon.com/ec2/purchasing-options/spot-instances/
How much can you save? Well, the c3.large instances which we use across the board for our application tier in on-demand pricing cost $0.12 per hour. When we use the same instance type with spot pricing we get them most of the time for around $0.02. That’s 6x cheaper compared to on-demand.
So what’s the trade-off? Well, if for some reason the spot-instance price exceeds your bid price, your spot reservations will get cancelled and your spot-instances will be killed. In short — your instances can and will die at random times and it’s not 100% guaranteed that you’ll get them when you want them.
That’s not good. Even if you use CloudFormations and auto-scaling as you could end up without instances when the spot price becomes too high – that could be almost the same as an AZ failure if you’re not prepared for it.
However, there’s a way to overcome that risk. In a single CloudFormation, you can create two launch configurations — one for on-demand instances and another one for spot-instances. With carefully tweaked scaling thresholds, you can make your spot-instances be preferred over on-demand instances, but still ensure on-demand takes over should spot-instances no longer be available at your bid price.
This way, if you can get spot-instances, your stack will be pretty much fully built using spot-instances. If (and when) the price goes over your bid price, spot instances will start getting killed and your on-demand instances will start booting up instead to cover the increased price. When the spot-price return beneath your bid price, spot instances will start booting up, slowly phasing out on-demand instances.
After few weeks of tests we managed to come up with a set of thresholds which work pretty well for us and keeps our stacks stable around the clock.
With on-demand, we always have a single instance running by setting the minimum to 1. Scale-up event happens when our average CPU usage exceeds 80% for a 5 minute period and we increase the on-demand autoscale group by 2 instances. We then scale down 1 instance at a time if the average CPU usage is less then 65% for a period of 5 minutes, and we ensure that a scale-down event only happens once in a 15 minute period.
With spot-instances, we also request a minimum of 1 instances but we set ourselves a bid price of $0.12 – remember, the bid price is not the price you pay, it’s the maximum you are willing to pay. Most of the time we have a spot-price cost of just $0.02!
As with on-demand we scale on average CPU in the spot-price autoscale group. However, we scale-up whenever we reach 50% (instead of 80%), and we also add 2 instances. We scale down and cancel our spot instances when we dip below 30% CPU usage.
The result is probably best as a picture from Ice (Ice is a great tool from Netflix that helps manage AWS costs). Below is the hourly cost of one of our app tiers before and after we started utilising spot instances.
For us, in the case of this specific stack, spot instances gave us savings up to 60%. Bear in mind the size of this specific stack is quite small (up to 10-12 instances at peak); so the bigger the stack, the more savings you’ll see!
To wrap up, I just wanted to share few tips and tricks we picked up along the way, that should help you:
- bake AMIs; tools like Packer will greatly help you do this; this will let you minimise time required to boot up a new instances; it’ll give you much more, but the time is crucial when it comes to scale-up events, especially when spot-instances are being killed and you want on-demand instances to fill out the empty spaces ASAP. We managed to get time required to boot up a new instances down to around 75 seconds
- use EBS based instances; they cost a fraction more (and yeah, EBS can be painful) but their boot time is significantly faster then the ephemeral-storage based instances
- bid price = on-demand instance price; this way in worst case you’ll pay what you’d normally pay for on-demand instance
- utilise reserved instances for the “base” on-demand instances in on-demand stacks
- did I mention Ice from Netflix? Use it!
<shamelessplug> Obviously, the most important requirement is having an awesome application that is cloud-friendly. If you’re interested in building cloud-native applications and awesome infrastructures, we’d love to hear from you! ;-) </ shamelessplug>
That’ll be all folks! Happy spot-instancing!