We love using AWS CloudFormation, here, at Mind Candy. Last year we moved all our cloud-based products application stacks to CloudFormations. We have learned, sometimes the hard way, how to design and use them in the best possible way for us. In this post I’m trying to summarize how we build and operate CloudFormations and what are the DOs and DON’Ts when using this technology. Throughout this post I will refer to CloudFormation as CF, to save some precious typing time.
First of all, you need to get to know cloud formation templates. This are just blocks of JSON, and as such are not really nice to edit (remember – no comments allowed in JSON). Because of that we use a helper technology – a templating tool to build CF templates. We decided to use tuxpiper’s cloudcast library (we are a python shop). You can take a peek or download it here https://github.com/tuxpiper/cloudcast. If your primary language is different than python you can easily find or write your own templating tool – it was pointed to me by a former colleague that CFNDSL is a good starting point for rubyists (https://github.com/howech/cfndsl). So lesson one is – don’t use plain JSON to write your CF templates. You will save yourself a lot of tedious time.
Once you have your first stack up and running you’ll realise how easy it is to modify and use it. But wait, what about testing the changes? That’s one of the biggest flaws of the CF technology. There is no other way to test your template than to apply it. CF does not give you a second chance – you can easily terminate/recreate your whole stack by changing of single line in your template. The good practice we try to adhere to is to test every single change in the template using different AWS account (we use separate AWS accounts for our development, integration, staging and production environments) or region, i.e. launch identical stack first in another AWS location and then perform the change on it to test if we end up in the desired state.
To make it possible to launch identical stacks in different accounts or regions one can leverage CF mappings and parameters. We don’t use parameters yet, but we use mapping heavily. That allows us to use a single CF template file to create multiple stacks in different environments. All you have to do is to define environment-specific properties within a global mapping on top of our template and then use CF’s “Fn::FindInMap” intrinsic function (actually, cloudcast does it for you). Also, use CF Outputs – they will allow you to programmatically access the resources created in your CF.
Next one is a set of more generic hints for those who work with AWS, still 100% valid for CF. First, use IAM roles to launch your stacks/instances. Let me quote AWS IAM official documentation here:
“A role is an entity that has its own set of permissions, but that isn’t a user or group. Roles also don’t have their own permanent set of credentials the way IAM users do. Instead, a role is assumed by other entities. Credentials are then either associated with the assuming identity, or IAM dynamically provides temporary credentials (in the case of Amazon EC2)“.
That will make your environment more secure and save you misery of maintaining IAM users and keys. Bear in mind that once the instance is created you cannot assign it to an IAM role, so if you’re not using IAM roles yet you should create IAM role with an “empty” policy now and use it for all your resources until you’re ready to benefit from full-fat IAM roles.
Secondly, use a minimalistic user data – make it identical for your whole estate. Delegate environment/application specific settings to your configuration management system. This will just make your life easier. Get familiar with and start using auto-scaling groups, even if you’re launching a single instance (in that case you can have an auto-scaling group with minimum and maximum number of instances equal to 1). You’ll benefit from that approach later, once your service starts to scale up.
Finally, use AWS tags to tag your AWS resources. Tags allow you to do a lot of funky stuff with your AWS resources (let me only mention grouping, accounting, monitoring and reporting here).
Now, a few DON’Ts for your CF:
- Don’t mix VPC and non-VPC regions in your mappings – CF uses different set of properties for EC2-VPC resources than for EC2-classic resources
- Don’t ever specify resource name properties in your CF template. Using auto-generated names makes your stack easily portable. Thus, you can copy your existing stack to another environment or launch a completely new stack (say your canary stack) using the same template. Also some of AWS resource names need to be globally/regionally unique, so defining a name in your stack is not such a good idea. Finally, virtually any resource which allows you to set its name will require replacement on update – just imagine your whole stack relaunching from scratch when someone comes with a clever idea to rename resources in line with a new naming convention or a new product name?
- Don’t use existing (non-CF built) AWS objects in your stack, if you can. Using existing resources also makes your stack non-portable. A lot here depends on the use case (i.e. we have a couple of security groups which we use in our stacks, but even then we provide their names/ids in the mappings or parameters, rather than using them directly in resource declaration).
Know your limits – CF is great orchestration tool, but it has its limits. You cannot create or update some AWS resources (e.g. EC2 keypairs). You cannot self-reference security groups in their definitions, which sucks (how do I open all my cassandra nodes for inter-node communication on port 7001 within the CF?). Stacks are difficult to maintain, as there are no incremental changes. For the above and other, obvious, reasons – don’t forget to source control your CF stacks (we have a dedicated git repository for that).
Finally, the last, and maybe most important, point – separate your applications into multiple CF stacks. One can easily get excited about CF and create a single stack for the whole application (network, databases, application servers, load balancers, caches, queues and so one). That’s not a good idea – you don’t want your database servers to relaunch when you decide to modify the properties of the auto-scaling group for you application layer. The solution is simple – create multiple CF stacks for your four application stack. Make your database layer a separate CF stack, then your distribution (app server auto-scaling groups and ELBs) a second CF stack and so on. This will give you the flexibility of CF without taking a risk of unwanted service disruption, due to CF stack update (been there, done that…). It’s very tempting to create very sophisticated CF stack, with many inter-dependent components, but I cannot stress enough how important is not to do it.
We are all the time looking to improve our stacks and processes, so definitely we are only at the beginning of our CF journey. One of my colleagues is looking at another CF templating library (https://github.com/cloudtools/troposphere) to help us automate our processes of CF creation even more. We will very likely start to protect our CF resources in production using stack policies soon. We will start working with CF parameters and dependencies more to make our templates 100% independent of our account/regional settings. Finally, we need to research if Custom Resources are fit for our purposes.