London Python Dojo Returns to Mind Candy

Last night saw the return of the London Python Dojo to the Mind Candy office. For those who are unfamiliar, the dojo is a monthly meetup for Python enthusiasts that covers the full range from “What’s Python?” to “I’ve been using Python for 24 years”. We meet and come up with ideas, scenarios or problems that can be solved with Python in a couple of hours (usually after some refreshments).

Ideas are written on a whiteboard and then voted on to decide which project to take on for the evening. We then break off into teams and code for about an hour or so, after which each team presents what they’ve done and the ideas behind what they were trying to accomplish (even if they don’t have working code).

(Last night’s board)

Last night’s task was to implement battleship logic/strategy which then could be played against other teams. You can see the (unofficial) winning team’s code here.

I am really biased as I was in fact a part of Team 1.

Overall, a very good evening of creative coding in good company (as it usually is).

For more information on the London Python Dojo check out http://ldnpydojo.org.uk/ and follow @ldnpydojo on twitter.

You can also join the Python UK mailing list here.

A DevOps Journey

Over the past few years Mind Candy has gone through a DevOps transformation. We did this because we knew that we had to improve the delivery of our products and we knew that where we wanted to be involved having the following three things in place.

1. Shared goals and practices by aligning our different teams.
2. Unified tool sets: again, we needed to align around a common set of tools.
3. Collaborative learning – knowledge sharing was and remains vitally important to us.

Obviously, achieving something like this cannot happen overnight. It had to be an iterative process just as software development is, and its starting point required changing the mindset of people across the teams so that we began to do DevOps.

These are some of the practical things we did on that journey.

Familiarity doesn’t breed contempt

In Aesop’s fable of the Fox and the Lion, we’re taught the moral that familiarity breeds contempt. However, as an organisation trying to transform towards a DevOps way of thinking, we turned the fable on its head, acknowledging that it is not familiarity that breeds contempt but separation, in the form of silos.

For us this didn’t mean that we needed everyone to know or be familiar with everything about everything. Unicorns don’t exist. What it meant was making our physical working proximity closer. It’s pretty amazing how much more readily different teams, from Dev through to QA and on to Ops, interact and collaborate organically when they can hear each other.

We found that technical decision-making became a much more shared process. Closer working environments encourage greater mutual support between teams.

It’s good to talk

Email is a wonderful thing. Instant messaging and relay chats are even better once you’re in a good DevOps place. However, if you’re trying to shift attitudes and thinking, email is not a substitute for getting up and talking to someone or having a phone/video call.

It might not always be possible across timezones, but it doesn’t take a genius to realise that intonation can easily be lost in the written word, even if someone uses an emoticon.

The slowest and most problematic IT organisations I’ve known have tended to be ones where everyone hides behind email, resulting in bubbling tensions, and often leading to escalation and wars over who can CC the most senior people in. Change can be effected, but only by whoever has the loudest shout or the most clout.

Meanwhile, the best and least problematic IT organisations tend to be the ones where different functional teams not only sit physically close to each other but also walk across the office to talk to each other instead of sending snippets of easily misinterpreted text over the Internet. Obviously when you have no choice you have to use electronic communications, but when you don’t need to you probably shouldn’t.

Investment in knowledge pays the best interest

When you look up a typical DevOps venn diagram online, it will be one where DevOps sits as the joined intersection of Dev, QA and Ops. Acknowledging this intersection is crucial in moving an organisation’s mindset. The intersection represents all the things that you do that have a shared interest and investment in them. This is the place that you need to align across teams.

Take code deployment as the classic example.

During any software cycle, each team will deploy to different environments and it’s highly likely that there may be differences in the process due to the scale of environments, whether they operate under SLA, or under any internal governance controls like change management.

The tools used to deploy, and the process followed are an excellent starting point in any DevOps transformation. They not only encourage collaboration between teams, but also enable you to unify your toolset under known standards, something we have done at Mind Candy that I blogged about previously.

This has empowered tech teams to collaborate on a shared interest and shared investment, whilst also carrying a shared responsibility for its maintenance. The tool is as much a “product” as the product that it ships.

The net result of this investment is that code deployment becomes so trivial that it widens the scope of who can “push to live” to pretty much anyone. This shouldn’t be taken to mean that anyone should (or does) deploy to live. That would be silly. Rather, it means that a robust deployment process can stop the lone rock star engineer from being a single point of failure.

As Mazz Mosley said at Monki Gras 2013 when talking about how GDS built gov.uk, “rockstars are not webscale”.

This approach doesn’t negate strict change control and governance in the organisation (if you have it). It simply removes blockers from your delivery pipeline. That’s a win for the business as much as it is a win for those who have shared and gained knowledge through collaboration.

Devs as Ops and Ops as Devs

Once we had shared ownership and responsibility of tooling like deployment spanning across teams in the organisation, it was clear that the reality of the DevOps intersection is one where Devs are Ops and Ops are Devs.

This doesn’t mean that either team does the other’s job. This is not the full stack unicorn. Sysadmins are not dead and nor are developers. It just means that where the things they do have alignment they can learn from each other.

Take the traditional sysadmin position. They will often be quick to tell you that they’re not a developer. They may even say it with a sense of disgust that you even dared to ask the question. The sad truth is that they’re actually in denial.

They might not like it, but when writing short scripts, or declaring something in a configuration management system, they are developing and, as the saying goes, they’re doing “infrastructure as code”.

The only difference really is that frequently they have made life hard for themselves by lovingly hacking systems and creating the snowflake server. It’s great for job security of course, but it’s terrible for the business – rock star ninja single points of failure again.

At the very least they need to be using some sort of version control for the infrastructure, and what is version control if not a development tool? However, it’s not just in the tools that your Ops can be more like Devs. There’s the working practices too.

The Ops team had already been using Kanban to prioritise work weekly. Whilst this worked to a degree the team still had an ever growing backlog of tickets and requests, and what went on the Kanban board each week still contained a considerable amount of reactive work.

We decided, as a team, that we would take our workflow a step further and apply more development principles to the management of our ticket queue. We decided we should align ourselves with our colleagues and move towards a greater form of Agile along scrum lines. We would start using sprints, planning, backlog pruning and prioritisation.

We began to work through our backlog by opting for two week sprints. We introduced sprint planning, and started to commit to a certain number of story points (issues) for the sprint, and, barring any major issues or emergencies (which we left slack for) we would stick to the committed work and do nothing else.

The impact of what was a pretty small change was huge. It took a few sprints but, as our different product teams (who were all also doing sprints, obviously) became aware that we were working in the same way as them, emergency work and high priority issues out of the blue gradually declined.

Obviously it’s not always like that when you’re supporting live services as well, but, by aligning our working practices with our primary internal customers, there became a greater appreciation of how our backlog could be impacted just as theirs could be by altering the scope of the sprint.

This was indirect collaboration born on the back of working in a more aligned way with our peers. Our backlog went from over 100 tickets to less than 40.

Meanwhile, as we in Ops were being more like Devs, we started to share some of our Ops roles with Devs, with a little help from a friend called Canbot.

ChatOps sets you free

Candy Bot, or Canbot for short, is our in-house name for GitHub’s Hubot. It sits in our dedicated Slack channel #chatops and, when not providing us with amusing animated cat images, he/she does things for the Devs and for Ops.

Canbot can tell us where servers are. This is vital as we use AWS so the environment can be fluid and dynamic. Canbot can deploy config changes for the Devs to each environment, including to live and it’s all totally transparent.

If someone changes the code base in our Puppet infrastructure then Canbot will tell #chatops about the commit and who did it. We also opened up the Puppet repository to the Devs and some of them change it every now and then. Shared responsibility after all.

Canbot can also execute commands on our infrastructure, but when it does it is never in secret. Transparency is the key feature here. What Canbot can do is also open across the teams for development. Primarily it is Ops that play with him, but there is nothing stopping a pull request from others internally.

Canbot has allowed our Devs to be a bit more like Ops. They can orchestrate production without having to have ssh access and it can be audited. No more tickets asking for information about production.

Embrace failure

Failure is an opportunity to learn; it is not an opportunity to point a finger of blame and start shouting at someone. DevOps mindsets should see each failure in these terms. Iterate on the failure and eliminate it with better tooling, better documentation or better gated processes.

When we celebrate failure we do it with KrispyKreme donuts!

Encourage Tech Culture

Most of the people that work in tech love tech. Few of us see our jobs as a mere means to an end. If you encourage your technical teams to collaborate with learning sessions too you can create a greater sense of being “one team of many disciplines” rather than single teams doing only one thing.

At Mind Candy we hold regular weekly book clubs open to whoever wishes to join, where we go through a particular book on a technology matter. We also have Guilds where we present and share what we’re working on between teams.

Additionally we use our office as a host location for MeetUps across tech businesses. Next month we’re hosting a London Virtual Reality meetup. Sharing should not always just be in-house after all.

Wrapping things up

Obviously the list and experiences above are not exhaustive. There are so many little things that an organisation can do when adopting a DevOps approach. What’s important is to realise that you change the mindsets first and then you iterate and encourage greater collaboration. Once an IT organisation realises that it relies on mutual support to sustain itself change can come about quite rapidly.

Utilising AWS Lambda to migrate an S3 bucket of 25,000,000+ images

When AWS announced AWS Lambda at last year’s re:Invent, we were really excited about it here at Mind Candy. The concept of a zero-administration compute platform, that is very scalable, cheap and so easy to use AND at the same time integrates with so many AWS services through triggers is pretty exciting and potentially – very powerful.

Since then, we started using AWS Lambda in some of our products – PopJam being one of them. We use it to near-instantly generate thumbnails of all the amazing creations users of PopJam share through the app.

Recently, a quite interesting story surfaced on our sprint – we were to migrate one of the AWS S3 buckets PopJam uses, from US to EU (to bring it closer to the backend and users) without any downtime for users.

Now, you’ll think – “why would that be interesting?”

The answer is – 25,000,000+ – the scale of this task.

The aforementioned AWS S3 bucket stores over 25,000,000 files (mostly images) and this number is growing faster every single day. Just running ‘s3cmd du’ on the bucket took almost a day. When I tried to perform ‘s3cmd ls’ to count the number of keys in the bucket, I got bored before it finished (I had to write a simple Python script that utilises multi-processing and splits the process of counting into 256 threads; only then would it finish within a few minutes).
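
The counting script isn’t reproduced in this post, but a minimal sketch of the idea might look like the following. It assumes keys are spread across two-character hex prefixes (one way of arriving at 256 independent listings; the real key layout may well differ), and the names hex_prefixes, count_keys and s3_prefix_counter are made up for illustration. Threads are used here because listing is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def hex_prefixes() -> List[str]:
    """Return the 256 two-character hex prefixes '00' .. 'ff' (assumed layout)."""
    return [f"{i:02x}" for i in range(256)]


def count_keys(count_prefix: Callable[[str], int],
               prefixes: Iterable[str],
               workers: int = 32) -> int:
    """Sum the per-prefix counts, running the listings concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_prefix, prefixes))


def s3_prefix_counter(bucket: str) -> Callable[[str], int]:
    """Build a per-prefix counter backed by S3 (needs boto3 and credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")

    def count_prefix(prefix: str) -> int:
        # Each page of a list_objects_v2 response reports its own KeyCount.
        pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
        return sum(page.get("KeyCount", 0) for page in pages)

    return count_prefix
```

Something like count_keys(s3_prefix_counter("some-bucket"), hex_prefixes()) would then produce the total, where "some-bucket" is a placeholder.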

Obviously, any existing CLI command like s3cmd sync or the AWS CLI s3 commands is out of the question: before it finishes (after many, many hours), the source bucket will have tens of thousands of new files which haven’t been copied across, and we’d have to re-run it, which would lead to the same situation.

As I mentioned, AWS Lambda functions can be triggered by other AWS services, one of them being AWS S3. Essentially, we can configure an AWS S3 bucket to invoke a Lambda function whenever a new object (key) is created.

Given this, we could create a Lambda function on the old bucket that will be triggered whenever a new key is created (ObjectCreated event) that would copy over new keys to the new bucket. Then, we’d have to only sync the old bucket to the new one without having to worry about missing some keys on the way.
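
For illustration, wiring up such a trigger with boto3 could look roughly like this. The function names and any ARN passed in are placeholders, not the setup actually used for PopJam. The Lambda function also has to grant S3 permission to invoke it, which is not shown here.

```python
def object_created_notification(lambda_arn: str) -> dict:
    """Build a bucket notification config that fires on every new object."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


def attach_copy_trigger(bucket: str, lambda_arn: str) -> None:
    """Apply the config to the source bucket (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=object_created_notification(lambda_arn),
    )
```

Note that this call replaces the bucket’s whole notification configuration, so any existing triggers would need to be included in the same dictionary.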

The proposed plan looked like this:

  1. Create new S3 bucket in EU
  2. Set up AWS Lambda Copy function and configure it to be triggered whenever a new key is added
  3. Run aws s3 sync command in background
  4. Wait, wait, wait…
  5. Reconfigure CDN to use the new bucket as origin
  6. Switch backend application to upload all images from now on, to the new S3 bucket in EU

This plan also meant there should be zero downtime during the whole migration. Everyone likes zero downtime migrations, right?

The actual implementation, while not very painful, did uncover a few issues with the plan that had to be dealt with. These issues resulted in some learnings which I wanted to share here.

AWS Lambda copy object function

The Lambda function code to perform the copy happens to be pretty trivial.

var AWS = require('aws-sdk');

exports.handler = function(event, context) {
        // The client targets the region of the destination bucket
        var s3 = new AWS.S3({region: 'eu-west-1'});
        var record = event.Records[0].s3;

        var params = {
                Bucket: 'popjam-new-bucket',
                CopySource: record.bucket.name + '/' + record.object.key,
                Key: record.object.key,
                ACL: 'public-read'
        };

        s3.copyObject(params, function(err, data) {
                if (err) {
                        console.log(err, err.stack);  // an error occurred
                        context.done(err);            // report the failure to Lambda
                } else {
                        context.done();               // successful response
                }
        });
};

It just works, but there’s one small catch…

… what happens to S3 object ACLs should they be changed in the meantime?

We needed ACLs for particular objects to be in-sync (for various reasons, one of them being moderation).

Given the AWS Lambda function is triggered on the ObjectCreated event (there sadly isn’t a way to trigger it on ObjectModify), should you need to change an ACL there’s no way to do it through AWS Lambda at this stage.

We worked around this problem by writing a Python script that basically iterates through the S3 buckets, compares ACLs and tweaks them if there’s a need (as before, we had to parallelise it otherwise it’d take ages).
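
That script isn’t shown here either, but the per-object comparison could be sketched along these lines. The helper names grant_set and sync_acl are hypothetical, s3 is assumed to be a boto3 S3 client, and the sketch assumes both buckets belong to the same account so that the destination’s owner can be reused as-is.

```python
def grant_set(acl: dict) -> set:
    """Reduce an ACL response to a comparable set of (type, grantee, permission)."""
    result = set()
    for grant in acl.get("Grants", []):
        grantee = grant["Grantee"]
        # Canonical users carry an ID, groups a URI, email grantees an address.
        who = grantee.get("ID") or grantee.get("URI") or grantee.get("EmailAddress")
        result.add((grantee["Type"], who, grant["Permission"]))
    return result


def sync_acl(s3, src_bucket: str, dst_bucket: str, key: str) -> bool:
    """Copy the source object's grants to the destination if they differ.

    Returns True when the destination ACL was rewritten.
    """
    src = s3.get_object_acl(Bucket=src_bucket, Key=key)
    dst = s3.get_object_acl(Bucket=dst_bucket, Key=key)
    if grant_set(src) == grant_set(dst):
        return False
    s3.put_object_acl(
        Bucket=dst_bucket,
        Key=key,
        AccessControlPolicy={"Grants": src["Grants"], "Owner": dst["Owner"]},
    )
    return True
```

Running sync_acl over every key is the slow part, so it would need the same kind of parallel fan-out as the key counting.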

Beware of AWS Lambda limits!

While being pretty scalable, AWS Lambda has got some limits. We were bitten by the “Concurrent requests per account” and “Requests per second per account” limits a few times (fortunately we did just enough with AWS Lambda to get the attention of AWS Lambda product team and they kindly raised these limits for us).

For most of the use cases those limits should be fine, but in our case, when on top of the AWS Lambda copy function we were also triggering a series of functions to generate thumbnails, we hit these limits pretty quickly and had to temporarily throttle our migration scripts.
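
How the scripts were throttled isn’t described here. One simple approach, shown purely as a sketch, is to cap the number of operations started per second. The Throttle class and its parameters are assumptions for illustration; the clock and sleep functions are injectable only to make it easy to test.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval
```

A migration loop would call throttle.wait() before each copy, and the rate can be raised again once the limits have been lifted.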

AWS Lambda is still pretty bleeding edge technology

AWS Lambda will work great for you most of the time. However, when it fails, troubleshooting can be quite … inconvenient to say the least.

Remember you can now view all AWS Lambda logs through CloudWatch – make use of them and don’t shy away from placing debug statements in your code.

The deployment of AWS Lambda is pretty tricky, too. While there are some options, it’s still at an early stage and it feels like even AWS is still trying to figure it out (potentially through feedback from customers – if you use AWS Lambda do make sure to give feedback to AWS).

The most interesting tool that I found to support deployment and integration with AWS Lambda in general is kappa.

And all of this for what?

Let the graph speak for itself…

(the graph represents upload time to S3 bucket in US – green line, and S3 bucket in EU – orange line – after migration)

ReaderT 101

This blog post is about dependency injection (d.i.) using the Reader monad in Scala. I won’t explain what a monad is nor will I explore any category theory (mostly because I don’t know how to explain any of that). In this post I just want to show the mental model I have when using monadic style with ReaderT.

Note: this post turned out to be quite big. It’s not very dense though! Especially if you’re familiar with Scala you should be able to whisk through most of it.

Dependency injection

Code needs other code. That’s what d.i. is for me. We write separate pieces of code. Often one bit needs to use the other. I’ll use the following example source for this post:

case class Hero(name: String)

// imagine this is a database
case class Ooo(importantSettings: Unit) {
  private[this] val finn = Hero("Finn")
  private[this] val jake = Hero("Jake")

  def findHero(name: String): Hero = {
    // Imagine all kinds of database processing here
    finn
  }

  def friendsRegistry(): Map[String, Hero] = {
    // Moar processing
    Map(finn.name -> jake)
  }

  def evalAdventure(hero1: Hero, hero2: Hero): String = {
    // Jake always saves the day, he's a magic dog!
    if (hero1 == jake || hero2 == jake) "awesome" else "disappointing"
  }
}

// The instance of Ooo we want to inject everywhere
val ooo = Ooo(()) 

// This is a piece of 'business' logic
object AdventureTime {
  def getHero(name: String): Hero = ooo.findHero(name)

  def getBestFriend(hero: Hero): Hero = ooo.friendsRegistry()(hero.name)

  def goOnAdventure(hero1: Hero, hero2: Hero): String = {
    val result = ooo.evalAdventure(hero1, hero2)
    s"Adventure time with ${hero1.name} and ${hero2.name} was $result!"
  }
}

Adventure Time – Land of Ooo

Instead of a stuffy real-world example I’m using Adventure Time. Think of Ooo as a database repository and AdventureTime as some piece of business logic. I assume this code is relatively simple and understandable. The problem is this: how does AdventureTime get a reference to Ooo? In other words, we want to inject Ooo into AdventureTime and possibly other parts of the code.

First, an example of how one could have an adventure:

import AdventureTime._

val hero1 = getHero("Finn")
val hero2 = getBestFriend(hero1)
val result = goOnAdventure(hero1, hero2) 

// result -> "Adventure time with Finn and Jake was awesome!"

A global variable and/or the Singleton

The example above illustrates one of the easiest ways of doing this: use a global variable and refer to that. This works great for small programs but when your program gets a bit larger, or your codebase is a bit older, this becomes very painful. Globals are difficult to maintain, they’re not very flexible, and they make code difficult to unit-test. You can also see in the example that the dependency is kind of hidden.

DI frameworks

Thankfully the industry has moved on from globals (right?) and frameworks like Spring and Guice have been invented to help. I won’t go into details about how they work, but they’re usually similar to constructor injection.

Constructor injection

In OO languages we can use the constructor of an object to provide it with the needed dependency. The AdventureTime object is now a class.

class AdventureTime(ooo: Ooo) {
  def getHero(name: String): Hero = ooo.findHero(name)

  def getBestFriend(hero: Hero): Hero = ooo.friendsRegistry()(hero.name)

  def goOnAdventure(hero1: Hero, hero2: Hero): String = {
    val result = ooo.evalAdventure(hero1, hero2)
    s"Adventure time with ${hero1.name} and ${hero2.name} was $result!"
  }
}

val at = new AdventureTime(ooo)

val hero1 = at.getHero("Finn")
val hero2 = at.getBestFriend(hero1)
val result = at.goOnAdventure(hero1, hero2)

This is a bit better than using global variables. Note that we still need some way to actually get ooo to where we create our at object, but in this post I want to focus on where the dependency is used. You can see that AdventureTime now has an explicit dependency on Ooo.

One caveat of this approach is that your class file should not become too large, otherwise you’re basically back to using a global variable! Constructor injection is not bad, it’s been used to create large systems. It’s fairly flexible, although you usually can’t change the dependency after it’s set. In order to test this you’d need to create a mock implementation or use a mocking library to mock the dependency.

What we actually want

We actually would like to pass the dependency as a parameter to every function that might need it.

object AdventureTime {
  def getHero(ooo: Ooo, name: String): Hero = ooo.findHero(name)

  def getBestFriend(ooo: Ooo, hero: Hero): Hero = {
    ooo.friendsRegistry()(hero.name)
  }

  def goOnAdventure(ooo: Ooo, hero1: Hero, hero2: Hero): String = {
    val result = ooo.evalAdventure(hero1, hero2)
    s"Adventure time with ${hero1.name} and ${hero2.name} was $result!"
  }
}
import AdventureTime._
val ooo = Ooo(())

val hero1 = getHero(ooo, "Finn")
val hero2 = getBestFriend(ooo, hero1)
val result = goOnAdventure(ooo, hero1, hero2)

This is a very flexible approach, we could change the dependency with each function call. We don’t need an instance variable to hold the dependency which makes this approach very suitable for, well, functions. We obviously see a pattern in these functions, but we can’t really abstract over it to remove the repetition.

Monads

Let’s see how we can use some functional programming and the Reader monad to improve this. Before we do that though, let’s quickly refresh how monads work. We use an all time favourite, the Option monad. Feel free to skip this explanation if you’re familiar with it.

The example code is actually not very null-safe.

val hero1 = getHero(ooo, "Finn") // <- hero1 could be null
// which would probably make getBestFriend throw an NPE
val hero2 = getBestFriend(ooo, hero1)
// hero2 can also be null...
val result = goOnAdventure(ooo, hero1, hero2)

One way to handle this would be something like:

val hero1 = getHero(ooo, "Finn")
if (hero1 != null) {
  val hero2 = getBestFriend(ooo, hero1)
  if (hero2 != null) {
    val result = goOnAdventure(ooo, hero1, hero2)
  } else {
    println("No adventure today")
  }
} else {
  println("No adventure today")
}

This kind of clutters things up and distracts from what the code is actually trying to do. The Option monad represents the possibility that something can be null. We can encode this optional behaviour into the types. The monad then lets us concentrate on the actual happy path of the code while handling the boilerplate around null-checking for us.

case class Ooo(importantSettings: Unit) {

  // It's possible the hero can't be found, so it's optional
  def findHero(name: String): Option[Hero] = {
    Some(finn)
  }

  def friendsRegistry(): Map[String, Hero] = {/* same as before */}

  def evalAdventure(hero1: Hero, hero2: Hero): String = {
    /* same as before */
  }
}

object AdventureTime {
  // Another Option here.
  def getHero(ooo: Ooo, name: String): Option[Hero] = ooo.findHero(name)

  // Yet another one. Types tend to ripple through a codebase
  def getBestFriend(ooo: Ooo, hero: Hero): Option[Hero] = {
    ooo.friendsRegistry().get(hero.name)
  }

  def goOnAdventure(ooo: Ooo, hero1: Hero, hero2: Hero): String = {
    /* same as before */
  }
}
import AdventureTime._
val ooo = Ooo(())

val result: Option[String] = for {
  hero1 <- getHero(ooo, "Finn")
  hero2 <- getBestFriend(ooo, hero1)
} yield goOnAdventure(ooo, hero1, hero2)

println(result.getOrElse("There was no adventure :("))

The Option monad does exactly what we want. If there are no nulls, everything works as before. If there is a null somewhere in the process, it kind of ‘sticks’. I.e., no subsequent code is executed and a None is returned. It’s not exactly ‘as before’, we’ve obviously switched to a for comprehension.

We’ve enhanced the return types of our functions to deal with a kind of ‘secondary’ logic so we can focus on the main functionality that we’d like to express. That sounds familiar. What if we could encode our dependency into the return type as well?

Enter the Reader

The Reader monad basically encodes a simple function. Its type definition is:

type Reader[E, A] = ReaderT[Id, E, A]

Let’s forget the right hand side of that type alias for now. Reader just expresses a function that takes a parameter of type E and returns a value of type A. Think of it as:

def func(e: E): A = {
  // create some A using e
}
// or
val func = (e: E) => {
  new A(e.foo())
}

You see how we could use that to express a dependency. The first type parameter E stands for ‘environment’. In our code E is Ooo and A is whatever our functions return, e.g., an Option[Hero] or a String. The type signature of getHero would become def getHero(name: String): Reader[Ooo, Option[Hero]]. Read: “getHero is a function that returns a function. When the returned function is supplied an Ooo it will return an Option of Hero”.

Let’s add this to our example. Note that all the functions in AdventureTime have the same dependency, so we make a little type alias for it. I’m assuming the reader is familiar with the various ways of creating lambda functions in Scala.

// Warning: this is not the final example, don't write code like this!
type OooReader[X] = Reader[Ooo, X]
object AdventureTime {

  def getHero(name: String): OooReader[Option[Hero]] = Reader{
    (ooo: Ooo) => ooo.findHero(name)
  }

  def getBestFriend(hero: Hero): OooReader[Option[Hero]] = Reader{
    _.friendsRegistry().get(hero.name)
  }

  def goOnAdventure(h1: Hero, h2: Hero): OooReader[String] = Reader{
  (ooo: Ooo) =>
    val resultOfAdventure = ooo.evalAdventure(h1, h2)
    s"Adventure time with ${h1.name} and ${h2.name} was $resultOfAdventure!"
  }
}
import AdventureTime._

val res = for {
  hero1 <- getHero("Finn")
  hero2 <- getBestFriend(hero1.get) // .get !? ick...
  result <- goOnAdventure(hero1.get, hero2.get)
} yield result

This looks similar to before, but we’ve managed to remove all the ooo parameters. Hang on, where are we injecting ooo now? Well, we’re not. This code seems to not do anything. If you inspect the type of res you’ll see it’s scalaz.Kleisli[scalaz.Id.Id,Ooo,String]. 😱

Remember that getHero returns an OooReader, i.e., a function taking an Ooo and returning an Option[Hero]. getBestFriend actually has the same signature. Just like Option, using Reader in a for comprehension sequences the monads into a ‘bigger’ one. For Option this means combining potentially absent values. For Reader it just means: “keep passing the dependency to the next function”. We’ve basically combined all three function calls into one big Reader.

If we want to execute the code we need to supply it with an Ooo using the run function of Reader.

res.run(Ooo(()))
// --> scalaz.Id.Id[String] = Adventure time with Finn and Jake was awesome!

Monad Transformer

We’ve run into a problem though. We had to resort to the evil get function for unwrapping our Options. So the Reader basically undid all the Option monad goodness. Ideally the code should handle both monads at once. Fortunately there is a monad transformer for Reader called ReaderT.

What was that weird type signature and what is this Id stuff? Remember the right hand side of the Reader type alias? It was ReaderT[Id, E, A]. It turns out that instead of working with functions of type E => A, we usually work with functions like E => M[A], where M is some kind of monad. ReaderT expresses just that. Reader is actually an alias for ReaderT where M is the Id monad. I see Id as the ‘does nothing’ monad.
ReaderT looks like this:

type ReaderT[F[_], E, A] = Kleisli[F, E, A]

What? Another type alias? Yes, ReaderT is actually equivalent to Kleisli, which is what scalaz uses. Kleisli also adds many convenience functions for combining Kleislis.

Let’s rewrite our example using Kleisli instead:

object AdventureTime {
  // Kleisli[Option, Ooo, Hero] 'represents' Ooo => Option[Hero]
  def getHero(name: String) = kleisli[Option, Ooo, Hero](_.findHero(name))

  def getBestFriend(hero: Hero) = kleisli[Option, Ooo, Hero]{
    _.friendsRegistry().get(hero.name)
  }

  def goOnAdventure(h1: Hero, h2: Hero) = kleisli[Option, Ooo, String]{
  (ooo: Ooo) => 
    val resultOfAdventure = ooo.evalAdventure(h1, h2)
    Some(s"Adventure time with ${h1.name} and ${h2.name} " +
         s"was $resultOfAdventure!")
  }
}
import AdventureTime._

val res = for {
  hero1 <- getHero("Finn")
  hero2 <- getBestFriend(hero1)
  result <- goOnAdventure(hero1, hero2)
} yield result

res.run(Ooo(()))

Before we had Reader just wrapping a function that matches the desired type. There is no such constructor for ReaderT, probably just because kleisli already does exactly the same. In other words, one can create a ReaderT using the kleisli function. The type parameters in order are: the monad of the return value, the environment of the function, and the type of the return value.

The Future

This all looks nice but we might not be convinced yet. Sit tight, I’ll show you a great advantage of using Reader. We’ll have to go even more functional though.

Our for comprehension should belong in some function in the logic layer of our program. We’ve abstracted the dependency on Ooo through the Reader but the sample code still strongly couples to AdventureTime. Let’s remove that by passing the necessary functions as parameters instead!

object SomeFancyLogic {
  def startEpicAdventure(
    getHero: (String) => ReaderT[Option, Ooo, Hero],
    getBestFriend: (Hero) => ReaderT[Option, Ooo, Hero],
    goOnAdventure: (Hero, Hero) => ReaderT[Option, Ooo, String])
   (name: String): ReaderT[Option, Ooo, String] = {
    for {
      hero1 <- getHero(name)
      hero2 <- getBestFriend(hero1)
      result <- goOnAdventure(hero1, hero2)
    } yield result
  }
}

// We usually 'wire up' the parameter group containing the
// functions first
val startEpicAdventureWired = SomeFancyLogic.startEpicAdventure(
                                          AdventureTime.getHero _,
                                          AdventureTime.getBestFriend _,
                                          AdventureTime.goOnAdventure _) _

startEpicAdventureWired("Finn").run(Ooo(()))

Let’s also make our ‘database’ a bit more realistic. In the server world we like to avoid blocking, so APIs for external services usually return Futures.

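import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
// map on a Future needs an implicit ExecutionContext in scope
import scala.concurrent.ExecutionContext.Implicits.global

// The land of Ooo of the future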
case class Ooo(importantSettings: Unit) {

  // findHero now returns a Future
  // for simplicity I'm ignoring the Option stuff.
  def findHero(name: String): Future[Hero] = {
    Future.successful(finn) // again, just simulating here..
  }

  def friendsRegistry(): Future[Map[String, Hero]] = {
    Future.successful(Map(finn.name -> jake))
  }
  
  def evalAdventure(hero1: Hero, hero2: Hero): Future[String] = {
    Future.successful{
      if (hero1 == jake || hero2 == jake) "awesome" else "disappointing"
    }
  }
}

// The rest of the code stays almost the same!
// Just change the Monad type parameter from Option to Future

object AdventureTime {
  def getHero(name: String) = kleisli[Future, Ooo, Hero](_.findHero(name))

  def getBestFriend(hero: Hero) = kleisli[Future, Ooo, Hero]{
    _.friendsRegistry().map(_(hero.name))
  }

  def goOnAdventure(h1: Hero, h2: Hero) = kleisli[Future, Ooo, String]{
  (ooo: Ooo) =>
    ooo.evalAdventure(h1, h2).map{result =>
      s"Adventure time with ${h1.name} and ${h2.name} was $result!"
    }
  }
}

object SomeFancyLogic {
  def startEpicAdventure(
    getHero: (String) => ReaderT[Future, Ooo, Hero],
    getBestFriend: (Hero) => ReaderT[Future, Ooo, Hero],
    goOnAdventure: (Hero, Hero) => ReaderT[Future, Ooo, String]
  )(name: String): ReaderT[Future, Ooo, String] = {
    for {
      hero1 <- getHero(name)
      hero2 <- getBestFriend(hero1)
      result <- goOnAdventure(hero1, hero2)
    } yield result
  }
}

/* wiring as before, snipped for brevity o_O */

val future = startEpicAdventureWired("Finn").run(Ooo(()))
Await.result(future, 2.seconds)

A pattern is emerging here! We can actually abstract out the monad! We can also abstract away the dependency on Ooo. It looks like this:

object SomeFancyLogic {
  def startEpicAdventure[M[_]: Monad, E](
    getHero: (String) => ReaderT[M, E, Hero],
    getBestFriend: (Hero) => ReaderT[M, E, Hero],
    goOnAdventure: (Hero, Hero) => ReaderT[M, E, String]
  )(name: String): ReaderT[M, E, String] = {
    for {
      hero1 <- getHero(name)
      hero2 <- getBestFriend(hero1)
      result <- goOnAdventure(hero1, hero2)
    } yield result
  }
}

E is now the generic type for the dependency. M[_] is a type constructor: think of it as a type with a hole that needs another type to become whole. Option and Future are examples; filling the hole gives Option[String] or Future[Hero]. The context bound M[_]: Monad additionally requires that an implementation of the Monad type class exists for M.

The cherry on top

Wildberry is not a cherry but she is pretty.

Testing this piece of logic now becomes pretty easy. Of course the logic is really simple here.

A unit test should only test the code-under-test. With our new function parameters this means we can easily control what our test sees without using any mock libraries. We test PopJam with ScalaCheck, which lets us do extensive property-based testing. Also note that while the database is using Futures, we don’t actually want to test the asynchronous behaviour of the code, just the logic. Moreover, creating tests with concurrency in them usually leads to brittle, time-dependent tests.

Here’s how we could test our logic:

def testEpicAdventure() = {
  // our 'mocked' functions. Usually we would make them return
  // more useful results obviously
  val getHero = (name: String) => kleisli[Id, Unit, Hero]{
    _ => Hero(name)
  }
  val getBestFriend = (h: Hero) => kleisli[Id, Unit, Hero]{
    _ => Hero("Jake")
  }
  val goOnAdventure = (h1: Hero, h2: Hero) => kleisli[Id, Unit, String]{
    _ => "Test adventure"
  }
  
  val wired = startEpicAdventure(getHero, getBestFriend, goOnAdventure) _
  val result = wired("Finn").run(())

  result aka "how did the adventure test go" should equal("Test adventure")
}

We can just use Id for our monad and Unit for the database. I’ve found this way of testing to be a lot more fun than setting up complicated mock, stub, or spy objects.

There are a lot more things we can do with scalaz and ReaderT, such as using ask from MonadReader. I encourage you to go on that adventure yourself!

Testing with Amazon SQS

We all know how great Amazon SQS is, and here at Mind Candy we use it extensively in our projects.

Quite recently, we started making some changes to our Data Pipeline in order to speed up our Event Processing, and we found ourselves with the following problem: how can we generate thousands of messages (events) to benchmark it? The first solution that came into our minds was to use the AWS Command Line Interface, which is a very nifty tool and works great.

The AWS Command Line Interface SQS module comes with the ability to send out messages in batches, with a maximum of 10 messages per batch, so we said: “right, let’s write a bash script to send out some batches”, and so we did.

Problem

It worked alright, but it had some problems:

  • It was slow; because messages were being sent in batches of up to 10 messages and not in parallel
  • The JSON payload had to contain some metadata along with the same message repeated 10 times (1 for each message entry)
  • If you needed to send 15 messages, you would have to have 1 message batch with 10 entries and another one with 5 entries (2 JSON files)
  • Bash scripts are not the best thing in the world for maintenance

So, what did we do to solve it? We wrote our own command line program, of course!

Solution: meet sqs-postman

Writing command line applications in Node.js is very easy with the aid of the good old Commander.js. Luckily, AWS has an SDK for Node.js, which means that we don’t need to worry about AWS authentication, SQS API design, etc. Convenient? Absolutely!

Sqs-postman was designed with the following features out of the box:

  • Sends messages in batches of up to 10 messages at a time (AWS limit)
  • Batches are sent out in parallel using a default of 10 producers, which can be configured using the --concurrent-producers option
  • A single message is read from disk, and expanded into the total number of messages that need to be sent out
  • It supports AWS configuration and profiles

In order to solve the “messages in parallel” problem, we used the async library. We basically split the messages into batches and we then use eachLimit to determine how many batches can be executed in parallel, which starts with a default value of 10 but can be configured with an option.
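
The batching and bounded-parallelism idea can be sketched as follows. This is a Python illustration, not sqs-postman’s actual Node.js source: `fake_send_batch` is a hypothetical stand-in for the real SQS batch call, and the thread pool’s `max_workers` plays the role that the limit argument of eachLimit plays in the async library.

```python
import json
from concurrent.futures import ThreadPoolExecutor

SQS_MAX_BATCH = 10  # SQS accepts at most 10 entries per batch request


def make_batches(message, total, batch_size=SQS_MAX_BATCH):
    """Expand one message body into `total` entries, grouped into batches."""
    body = json.dumps(message)
    entries = [{"Id": str(i), "MessageBody": body} for i in range(total)]
    return [entries[i:i + batch_size] for i in range(0, total, batch_size)]


def send_all(batches, send_batch, concurrent_producers=10):
    """Send batches with at most `concurrent_producers` in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrent_producers) as pool:
        # map() returns results in the same order as the input batches
        return list(pool.map(send_batch, batches))


def fake_send_batch(batch):
    """Hypothetical sender: a real one would call the SQS batch API here."""
    return len(batch)


if __name__ == "__main__":
    batches = make_batches({"message": "hello from sqs-postman"}, total=15)
    sent = send_all(batches, fake_send_batch, concurrent_producers=2)
    print(len(batches), sent)
```

Asking for 15 messages yields one batch of 10 and one of 5, which is exactly the split that had to be prepared by hand as two JSON files in the bash version.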

Can I see it in action?

Of course you can! sqs-postman has been published to npm, so you can install it by running:

 npm install -g sqs-postman

Once installed, just follow these simple steps:

  • Make sure to configure AWS
  • Create a file containing the message, e.g. message.json with some dummy content
    {
       "message": "hello from sqs-postman"
    }
  • Run it
    $ postman message my-test-queue --message-source ./message.json --concurrent-producers 100 --total 1000

If you would like to see more information, the debug mode can be enabled by prepending DEBUG=sqs-postman postman…

Text is boring, show me some numbers!

You are absolutely right! If we don’t share some numbers, it will be hard to determine how good sqs-postman is.

Messages    aws-cli        sqs-postman
100         0m 4.956s      0m 0.90s
1000        2m 31.457s     0m 4.18s
10000       8m 30.715s     0m 30.83s

As you can appreciate, the difference in performance between aws-cli and sqs-postman is huge! Because of sqs-postman’s ability to process batches in parallel (async), the execution time can be reduced quite considerably.

These tests were performed on a MacBook Pro 15-inch, Mid 2012, with a 2.6 GHz Intel Core i7 processor and 16 GB of 1600 MHz DDR3 RAM. Time was measured using the Unix time command.

Conclusion

Writing this Node.js module was very easy (and fun). It clearly shows the power of Node.js for writing command line applications and how extensive the module library is when it comes to reusing existing modules/components (e.g. AWS SDK).

The module has been open sourced and can be found here. Full documentation can be found in there too.

As usual, feel free to raise issues or better yet contribute (we love contributors!).

Office Music

Office music, some love it and some hate it. While I’m in the camp that’s for office music I can completely understand why some might not be in favour of it.

We here at Mind Candy find music in the workplace to be a mood enhancement, and in a way a bonding process. You find similarities between yourself and your peers and generate links that weren’t there previously. Music helps reduce those awkward silences filled with keyboard tapping, mouse clicking and the odd coughing fits, and introduces an atmosphere which is conducive to the culture we look to nurture and promote. There are a few great articles out there which go into greater detail about whether music in the workplace is a good or bad thing; some can be found here.

Last year we started looking into a solution for playing music in the area where our team sits. After some search engine fu we found Mopidy. Mopidy is an extensible MPD and HTTP server written in Python. Mopidy plays music from your local disk and radio streams and, with the help of extensions, can also play music from cloud services such as Spotify, SoundCloud and Google Play Music.

As we already have a few Spotify accounts we thought we’d toy with the idea of using Mopidy to play music from Spotify. In order to use Spotify you also need to use the Mopidy-Spotify extension.

Once we had both Mopidy and the Spotify extension working we then needed something to interact with it all. After looking through the Mopidy documentation we came across the web extensions section which suggests various web interfaces to interact with the HTTP side of the Mopidy server.

Initially we used Apollo Player. Apollo Player’s great as it allows anyone to log in using their Google Apps or Twitter credentials and then add music to a one time playlist meaning anyone can choose what music is playing. There is also a bombing feature so any music that’s been added can be skipped if bombed by three people. When no music has been selected it will default back to a playlist set in config.js which is found in the root directory of Apollo. The problem there is that once the default playlist has been played for the umpteenth time it can get pretty tedious and only people with access to the app’s root directory can change this. This led us to Mopify.

Mopify gives you much of the functionality that the Spotify client gives you e.g. Browse, Featured Playlists, New Releases, Playlists and Stations. You can log in with your own Spotify account or use the account that Mopidy-Spotify is utilising and use the playlists associated with either account. It gives you greater functionality and options than Apollo but then you lose the collaboration and unmanaged element you had with Apollo.

Finally, we needed to actually run Mopidy on something, as it was no good having it run from my local machine. We decided to use a Raspberry Pi and plugged it into some speakers running along the cable trays above our heads. The Raspberry Pi is running Raspbian with Mopidy, Mopidy-Spotify and whichever web extension we’ve chosen. Another Raspberry Pi with Mopidy has been set up as a jukebox in our chillout/games area, which works really well with mobile devices because most of the web extensions are built with Bootstrap. This gives employees the flexibility to easily play whatever music they feel like when they are in the communal area.

In our eyes, while music in the office isn’t a necessity, it is definitely beneficial, and it’s fantastic that all these open source tools and products give us the ability to do this.

And let’s be honest, who can resist an impromptu sing along to Bohemian Rhapsody!

 

Mopidy – Extensible music server written in Python

Mopidy-Spotify – Mopidy extension for playing music from Spotify

Apollo Player – Mopidy web extension

Mopify –  Mopidy web extension

Raspberry Pi – ARM based computer running under GNU/Linux

 

Pi-tomation

Screen Automation – Selenium (and some other stuff), meets Raspberry Pi

Let’s set the scene: you need to display some stuff on a screen so everyone in the office can see it. Easy. You mount a couple of TVs on the wall, get a DVI splitter and an old Mac mini you had in the store room on the top shelf behind a roll of Cat5 cable.

Set everything up, get the Mac mini to auto login and mount a shared drive, then run a little script that uses Selenium to open a browser and show pre-determined images of the stuff you want to display, all stored on the same shared drive. Done….

Fast forward a couple of years and you now have a lot more to display on a lot more screens, but what are you going to do? It’s impractical – and expensive – to buy a bunch of Mac minis just to run a script that opens a web browser. The end goal of all this is to have dashboards that are easily manageable by their respective teams.

 

Challenge Accepted

 

Have you heard of this new Raspberry Pi thing? It’s a small ARM PC that’s the size of a credit card, and they’re cheap. What, they’re also USB powered? Bonus: now we can just power them from the TV itself, and when the TV comes on the Pi comes on. Now we just replace the Mac mini with the Pi, run the same script when it boots, and we’re all done.

Wait, not so fast. The share isn’t public, so we need credentials to connect. That’s OK, we can store them in a file locally and use fstab to connect. Yeah, that works, but we want to display different things on different screens, so now I have to create different scripts and manually tell each Pi which one to use. OK, that’s not too bad: the first time you set up each one, just point it to the script it needs to run, and then you can just update the script and reboot the Pi.

So far it’s shaky, but it works, sometimes. One of the problems was that sometimes it would try to run the script on the network share before it was mounted properly. Also, running a script (or multiple scripts at this point) over the network on a device with the processing power of about 7.4 hamsters isn’t really going to cut it. I’m getting tired of crowbarring fixes into something that wasn’t really designed for this use and troubleshooting seemingly random issues.

What do I actually want to accomplish here and how am I going to do it??

  1. Have the script run locally, it’s only managing a web browser after all.
  2. Config easily changeable and centrally managed.
  3. Get the pi to check for new config on startup.

Done, yes that’s it pretty simple, so here’s what I did.

Ingredients

  • bash script
  • json file. Lists the pages that the web browser should visit. Could also be local files loaded into the browser images etc.
  • python script. Loads the json ‘config’ and specifies how long each page should be displayed etc and does a bit of error checking.
  • Git (or other) repository

Method

Edit your rc.local to run a bash script that lives somewhere locally on the Pi, e.g. /opt/scripts/. The bash script downloads Selenium, Firefox (actually Iceweasel on Debian) and facter (so we can get info really quickly).

(I did consider using Puppet for this whole thing at one point, but that was a bit of overkill, plus it had its own complications at the time when trying to run on an ARM processor.)

The bash script also uses facter to determine the MAC address of the Pi and removes the colons. (I must admit that facter may be a bit overkill here as well but hey, I’ve gotten used to having it around.) It then searches your web server (or other location) for files carrying its MAC address as a name; I have a set of defaults that it uses if none are found. Have your web server run a cron job that pulls the repository of all your files. You could have each device pull the repository directly, but the more screens you have the more inefficient that will be, as you’ll be storing a whole repo on the Pi just to get at one or two files. You could also have a web hook that only updates the web server when there are changes to the repo, but I didn’t think it was worth it at this point. The JSON is self-explanatory.
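
A minimal sketch of that lookup-with-fallback logic, written in Python for illustration. The real project does this part in bash with facter; the file naming scheme (`<mac>.json`, `default.json`) and the JSON fields (`pages`, `url`, `seconds`) are illustrative assumptions, not taken from the pi-screens repository.

```python
import json


def mac_to_filename(mac):
    """Turn 'B8:27:EB:12:34:56' into 'b827eb123456.json' (colons removed)."""
    return mac.replace(":", "").lower() + ".json"


def pick_config(mac, available, default="default.json"):
    """Use the file named after this Pi's MAC address, else the default."""
    name = mac_to_filename(mac)
    return name if name in available else default


def load_pages(raw_json, default_seconds=30):
    """Parse the config into (url, seconds) pairs, skipping bad entries."""
    pages = []
    for entry in json.loads(raw_json).get("pages", []):
        if isinstance(entry, dict) and entry.get("url"):
            seconds = int(entry.get("seconds", default_seconds))
            pages.append((entry["url"], seconds))
    return pages


if __name__ == "__main__":
    files_on_server = {"b827eb123456.json", "default.json"}
    print(pick_config("B8:27:EB:12:34:56", files_on_server))
    print(pick_config("B8:27:EB:00:00:00", files_on_server))
    sample = '{"pages": [{"url": "http://example.com/a", "seconds": 10}]}'
    print(load_pages(sample))
```

Keying the config on the MAC address means a freshly imaged Pi needs no per-device setup: it shows the defaults until someone commits a file with its name.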

You can take a look at the principle here.

https://github.com/mindcandy/pi-screens.git

Plans for the future of this project include a self-service dashboard that will take the ingredients and mix them with the right config without the user necessarily having any coding knowledge.

HTML5 games in Mind Candy

HTML5 games have always been a bit of a grey area. Even with the decline of the Flash Platform, it still felt like web technologies were lagging behind what the Flash and Unity players could do in the browser.

Over the last year or two this has all changed. Since Steve Jobs declared war on Flash it’s been a bit of a bumpy ride, but with companies such as Google, Mozilla and Microsoft all getting behind HTML5, and the W3C finally declaring the standard ‘complete’, it suddenly feels like the technology has grown up.

HTML games have also grown up, with Nintendo partnering with Unity and ImpactJS for their Web Framework, as well as the BBC and Nickelodeon investing a lot of money into converting their existing Flash games to create new and exciting experiences for users on a wider range of devices.

Here at Mind Candy we always want to push things and try new technologies, however we also feel like whatever we do try has to work in a real world scenario and while HTML5 has been around for a while, we’ve never felt it a good fit for us until now.

Why HTML5?

With PopJam growing as a platform we always wanted to deliver games to our audience. However, App Store submission times rule out releasing content frequently, so making the games natively within the app was completely out of the window. We also have to support multiple platforms, so we needed something that was write once and deploy everywhere. This is where HTML5 came in for us.

Cross Platform

One of the huge benefits of using HTML5 was that it is truly cross platform, and while the performance of native will always be far greater, porting the games over to each platform would’ve destroyed us as a team.

When starting out with HTML5 we instantly noticed that even though we were cross platform, there were still hoops we needed to jump through to make things work in the way that we wanted, the main pain point being audio.

As we were targeting a mass of devices we needed to make sure that our games worked on all resolutions and inputs worked as expected, however it felt that once we’d broken this barrier we’d be okay.

iOS provided a UIWebView we could use out of the box, however we decided to use the Crosswalk Project for Android as it allowed us greater control than the one that comes built in to Android.

Fast Iterations

Using HTML5 means we are not bound by the App Store restrictions, so we can push new games and updates out incredibly fast. It’s not only deployments that are faster either: one of the most powerful things about making HTML5 games is that all it takes is a link to a page and the game can be played.

With JavaScript there is no compilation time, and you can debug in real time within the browser. This also meant we could develop some pretty cool in house tools that would speed up the development of our games and systems, whilst running within PopJam.

Starting out

One of the things with making HTML5 games is that there are so many things that need to be considered, such as asset loading, memory management, input, physics, 3D, 2D, animations and many more. We had to decide on the best way to deliver our games.

On top of all of these decisions there are also multiple ways to render content within the browser:

  1. Canvas: Started as an experiment by Apple, it is now possibly the most widely supported standard for generating graphics on the web. Using canvas also eliminates a lot of cross compatibility issues that other methods may have. Performance tests on both iOS and Android worked out quite well for us.
  2. WebGL: WebGL offers hardware accelerated graphics within the browser, but on mobile it is really still early days. While iOS implemented full support, it still comes with some very interesting edge cases. Android support for WebGL is a very different world, as we found out when targeting low-end devices.
  3. Divs / CSS Transitions: The method of updating divs that are rendered on the page is an interesting one as it allows for nice effects using CSS3 transitions. However, the lack of support across mobile browsers and different versions of mobile operating systems was a problem.

We tried all of the above methods and ultimately we ended up utilising all of them; it really came down to the content that was being presented to the user. We used WebGL where we could, and for anything that didn’t support it we fell back to Canvas.

Anything that had relatively simple content we ended up manipulating divs and using various methods for transitioning elements to fix cross compatibility issues.

Choosing a Game Engine

One of the things that stood out when looking for a game engine is that there are a lot of them. And not only engines: there are also products out there known as ‘Game Makers’ that allow you to make games with little to no code, such as Construct, Game Maker and Game Salad. If you’re looking for something to try I can highly recommend this website.

We actually tried a couple of different engines, as well as allowing people who weren’t developers to use the ‘Game Makers’ to prototype ideas and test performance.

After evaluating our choices we decided to use Pixi.js from Goodboy Digital, an incredibly lightweight engine that offers an ActionScript like API as well as many other features such as:

  • Asset management
  • Multi Touch for Mobile
  • Sprite Sheet support
  • Full Scene Graphs
  • Third Party Libraries (Spine, Tiling)

It also allowed us to toggle effortlessly between Canvas and WebGL to allow for support on lower end devices.

Another thing that Pixi has is thorough tutorials, incredible documentation and a very active community which goes a long way when choosing something like an engine to use be it for games or software in general.

At the time of writing this article, Pixi have just announced v3 of the engine, and have provided a benchmark test to show off the performance. I would strongly urge you go check it out, even on a low end device it’s pretty impressive.

Tools

One of the things that came as a breath of fresh air when venturing in to the world of JavaScript is how far it’s come since I last used it, for the last few years I’ve had my head firmly planted in AS3 and Unity with C#.

With tools such as:

  1. Yeoman: Yeoman allows you to start new projects by choosing from hundreds of generators that have been created for it, so you can scaffold new projects quickly whilst prescribing best practices and tools.
  2. Bower: Along with NPM, this is one of the most lightweight package managers I’ve used in my career. It allowed us to manage dependencies across projects effectively and to keep our repositories incredibly small.
  3. Grunt: Using Grunt as our build system was one of the best decisions we made. It allowed us to move incredibly fast when building our games and to automate a lot of tasks that would’ve been incredibly laborious if done manually.

We were able to create a solid work flow from starting a project to releasing our content on to PopJam.

As well as using these tools, the JavaScript community is an incredibly talented community with a lot of libraries out there to use that help in every day web development.

It’s not all rosy

As amazing as things have been making games over the past few months, it has not been without its headaches and hair pulling moments, but this is why we love what we do, right? If it wasn’t a challenge then it would be boring.

Targeting multiple platforms comes with its own problems, but some of the biggest problems we had were with the hardware on Android. There are a lot of cheaper low-end devices that are prime for parents to buy for their children, and we encountered devices that claimed to support certain features but, when running in the browser, would crash PopJam instantly, leaving us in a state of flux with no logs to go on. We found a lot of this came down to the chipsets that the cheaper devices use.

It wasn’t only Android that caused us problems either. The iPod Touch 4G is one of the most used devices amongst children, and some only support iOS 6. This left us unable to push performance as much as we wanted, and the iOS 6 UIWebView implementation is very temperamental about which standards it supports.

The one thing that caused us the most headaches out of everything, though, was audio. HTML5 Audio is still very limited, and even more so on some of the cheaper devices: some only support the WAV format, which means larger file sizes, and any other format would cause the whole application to crash as no other codec was available. It is recommended to use the

Conclusion

We’ve had some amazing fun creating some interesting games for PopJam using HTML5, not only because we got to make games but we also got to build some awesome internal technology and tools, create a pipeline from concept to production in just a few months, and most importantly we got to create some engaging experiences for our PopJam users.

 

 

 

Deployments using All the Things!

As we’ve mentioned in previous posts, we use AWS services extensively at Mind Candy. One of the services that we’ve blogged about before is CloudFormation. CloudFormation (CF) lets us template multiple AWS resources for a given product into a single file which can be easily version controlled in our internal Git implementation.

Our standard setup for production is to use CF to create Autoscaling Groups for all EC2 instances where, as Bart posted a while back, we mix and match our usage of on-demand instances and spot priced instances to get the maximum compute power for our money.

During load testing of the backend services of our games we did, however, notice a flaw in the way we’re doing things. Essentially, this was the speed with which we could scale up under rapid traffic surges, such as those generated by feature placement in mobile app stores.

The core problem for us was that our process started with a base Amazon Machine Image (AMI); after initial boot it would then call into Puppet to configure it from the ground up. This meant that a scaling up event could take many minutes to occur – even with SSD-backed instances – which isn’t ideal.

Not only could this take a long time – when it worked – but we were also dependent on third-party repositories being available, or praying that Ruby gem installations actually worked. If a third party was not available then the instances would not even come up, which is a worse position to be in than it just being slow.

The obvious answer to this problem is to cut an AMI of the whole system and use that for scaling up. However, this poses another problem: you now make your AMI a cliff edge that sits outside of your configuration management system.

This is not a particularly new problem or conundrum of course. I can personally recall quite heated debates in previous companies about the merits of using AMIs versus a configuration management system alone.

We thought about this ourselves too and came to the conclusion that instead of accepting this binary choice we’d split the difference and use both. We achieved this by modularising our deployment process for production and using a number of different tools.

The Tools

TeamCity – we were already using our continuous integration system as the initiator of our non-production deployments, so we decided to leverage all the good stuff we already had there. Crucially, we could let our different product teams deploy their own builds to production and we would just support the process.

Fabric – we’ve been using Fabric for deployments for quite some time already. Thanks to the excellent support for AWS through the Boto library we were easily able to utilise the Amazon API to programmatically determine our environments and services within our Fabric scripts.

Puppet – when you just have one server for a product, using a push deploy method makes sense as it’s quick. However, this doesn’t scale. Bart created a custom Puppet provider that could retrieve a versioned deployment from S3 (pushed via Fabric) so we could pull our code deploys on to remote hosts.

Packer – we opted to use Packer to build our AMIs. With Packer, we could version control our environments and then build a stable image of a fully puppetized host which would also have the latest release of code running at boot, but could still run Puppet as normal as well. This meant we could remove the cliff edge with an AMI, because, at the very worst we would bring up the AMI and then gain anything that was missing but do so quickly as it was “pre-puppetized”.

CloudFormation – Once we had a working AMI we could then update our version controlled templates and poke the Amazon API to update them in CloudFormation. All scaling events would then occur using the new AMI containing the released version of code.

The Process – when you hit “Run” in TeamCity

  1. Check out the Fabric repo, the Packer repo and the CloudFormation repo from Git.
  2. Pass a config file to Fabric, which runs a task to query the Amazon API and discover our current live infrastructure for a given application/service.
  3. Administratively disable Puppet on the current live infrastructure so Puppet doesn’t deploy code from S3 outside of the deployment process.
  4. Push our new version of code to S3.
  5. Initiate a Packer build, launching an instance and deploying the new code release.
  6. Run some smoke tests on the Packer instance to confirm and validate deployment.
  7. Cut the AMI and capture its ID from the API when it’s complete.
  8. Re-enable and run Puppet on our running infrastructure, thus deploying the new code.
  9. Update our CloudFormation template with the new AMI and push the updated template to the CloudFormation API.
  10. Check in the template change to Git.
  11. Update our Packer configuration file to use the latest AMI as its base image for the next deploy.
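
The template update in step 9 boils down to rewriting one value in a version-controlled file. Here is a minimal Python sketch of that half, assuming a JSON CloudFormation template that exposes the image as a parameter. The parameter name `AmiId` is hypothetical, and pushing the result to the CloudFormation API is left out.

```python
import json


def set_ami(template_text, new_ami, parameter="AmiId"):
    """Return the template with the AMI parameter's Default replaced.

    `parameter` is a hypothetical name; use whatever your template defines.
    """
    template = json.loads(template_text)
    params = template.get("Parameters", {})
    if parameter not in params:
        raise KeyError("template has no parameter named %r" % parameter)
    params[parameter]["Default"] = new_ami
    # Stable key order keeps the Git diff limited to the changed AMI ID
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    old = json.dumps({
        "Parameters": {"AmiId": {"Type": "String", "Default": "ami-11111111"}},
        "Resources": {},
    })
    print(set_ami(old, "ami-22222222"))
```

Failing loudly when the parameter is missing is deliberate: a deploy that silently leaves the old AMI in the template would bring back the slow, Puppet-from-scratch scaling path without anyone noticing.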

What we’ve found with this set-up is, for the most part, a robust means of using Puppet to deploy our code in a controlled manner, and being able to take advantage of all the gains you get when autoscaling from baked AMI images.

Obviously we do run the risk of having a scaling event occur during deployment, however, by linking the AMI cutting process with Puppet we’re yet to experience this edge case, plus all our code deploys are (and should be) backwards compatible, so the edge case doesn’t pose that much of a risk in our set-up.