Circles Of Causality

The Misnomer of ‘Move Fast and Break Things’

image06For a while I’ve wanted to visualise the pain points of the development cycle so I could better explain to product & business owners why their new features take time to deliver. Also to squash the misnomer of the overused “Move fast, break things” mantra of Facebook.

 

So some of you may know that recently Facebook realised this wasn’t actually sustainable once you have your product stable and established. It might work in the early conception of a product, but later on it will come back to haunt you.

At the F8 Developers conference in 2014, Facebook announced they are now embracing the motto “Move Fast With Stable Infra.”

“We used to have this famous mantra … and the idea here is that as developers, moving quickly is so important that we were even willing to tolerate a few bugs in order to do it,” Zuckerberg said. “What we realized over time is that it wasn’t helping us to move faster because we had to slow down to fix these bugs was slowing us down and not improving our speed.”

I’ve recently been reading Peter Senge’s “The Fifth Discipline: The Art and Practice of the Learning Organization” and thought circles of causality would be a good way to express this subject. Using system thinking and looking at the whole picture can really help you stand back and see what’s going on.

circles of causality1Be it growing user acquisition or sales, a good place to start is what are you trying to achieve and how to exponentially increase it. In this case lets say that as a business you want to drive user engagement and grow your DAU.  One possible way is to add new features to your product.

So following the circle in this diagram we assume that by delivering new features, we increase user engagement which in turn leads to growing your DAU/Virality.

Lets assume the product has been soft launched, you’re acquiring users, A/B testing initial features and have begun growing your list of improvements and new features. Lets create a ‘Backlog’ of all this work, prioritise them, plan how we deliver those quickly using an Agile Scrum framework.

We want to deliver a MVP as quickly as possible, so lets do two week ‘Sprints’. The product team have lots of great ideas but far too many to put into one sprint and some of the new features require several sprints. Product owners & other business leaders debate the product roadmap and you have your sprint planning….simple right ?….Well to begin with yes

 

circles of causality2So lets look at what the size of the backlog does to delivery. In this diagram you see that the size of backlog directly affects how soon improvements/features are made. Why? Well because the backlog is also made up of bug fixes and technical debt, often inherited from your prototyping phase and deploying your MVP.

You’d love to tell the business that you’ll just work on new stuff; but hey worse case we deliver something every two weeks, but some features could take months to appear.

So with a relatively small backlog we are ok. Yes some business leaders are a bit frustrated their feature won’t get deployed in this sprint but the short term roadmap is clear right ?

Dev team gets their head down and get on with the sprint….but in the background product/business owners have moved on from yesterday’s must have feature to yet another new shiny idea or potential crisis; the backlog grows and the roadmap is changed. Features get pushed further down the priority list.

So the situation is we have X amount of resource and over time the business is getting frustrated at the pace of delivering changes to the product. Weeks have passed and only 25% of these ideas/features are shipped.

The Symptomatic Solution

So there could be two potential fixes for this…move faster and drop quality so we can ship stuff quicker or throw more people at it. Lets look at what happens with the “Move fast, break things” mantra. So to increase delivery time we cut corners, drop some testing, code reviews, developers pushed to make hacky solutions etc etc

circles of causality3As you see in this diagram, as you do this you create more bugs and the QA process takes longer. Any initial advantages are lost as this builds up.

 

 

Now we have also added a ‘side effect’. More bugs increase the size of the backlog creating the opposite effect you intended in the first place.

 

 

 

So lets put in more man hours (overtime) to get those bugs down and reduce this growing backlog. More overtime increases fatigue & the quality of the work. Developers get burnt out, they make more mistakes, quality of work suffers and again more bugs and are even more demoralised.

Lets look at the result of this on staff & the complexity of their work. In this diagram we see that by reducing quality we also increase code complexity which generates technical debt, which again slows down development. Tech debt is pretty demoralising, as usually no one is invested in fixing it and in most cases you just work around it.

Adding more developers has a different outcome with equally diminishing results. Big teams in an Agile framework, isn’t always a great idea. The typical strategy is to organize your larger team into a collection of smaller teams, and the most effective way to do so is around the architecture of your system.

The harder you push, the harder the system pushes back

When you look at the whole system, each part has a cause and effect. The harder you push one part, other parts are affected. Each one of these parts needs to be balanced against each other so that the system runs efficiently. It’s also important to step back and make sure you are solving the actual problem not trying to fix a symptom.

Balance

In this example the perceived view is that the team is moving slowly, whereas in fact they are moving at a pace that balances the system. Move fast, with stable infra is the sensible option. Use system diagrams like this to seek out counter balance to reinforcing circles.

Reference – “The Fifth Discipline: The Art and Practice of the Learning Organization” Peter Senge – Chapter 5 “A Shift Of Mind”

 

Dynamic-DynamoDB has been summoned!

Back in July, we posted about a tool for Amazon DynamoDB called Dynamic-DynamoDB that gives you the ability to dynamically increase and decrease your provisioned throughput in DynamoDB according to your current usage.

What you gain from using a tool like this, if you implement it correctly, is provisioned throughput in Dynamo that will ensure you have capacity when you need it.

Last week, we released our latest game, World of Warriors on the App Store, and, without going into the back-end systems in too much detail, we have a use case for DynamoDB in the game.

This is why we put the time into writing a Puppet module for DynamicDynamoDB which you can get from our Github account.

Since launch World of Warriors has performed incredibly. We received Editor’s Choice from Apple, and had over 2 million download in the first weekend alone, which made the game crack the top 50 grossing games in the US and become #1 role playing game for iPad in 80 countries.

As you can imagine, with statistics and rapid uptake like that you need have a good scaling. So did DynamicDynamoDB “do the business” for us over an extremely busy weekend?

We’ll let the following graph speak for itself. It shows our usage and scaling of DynamoDB from launch to yesterday.

scalingup

As you can see, our provisioned capacity compared to consumed capacity took a cautious approach.

We did this because what we didn’t want to happen was for us to not scale up fast enough and then get throttled by Amazon. Obviously this has a trade-off with cost, but for us this was acceptable thanks to what we had learned prior to hard launch.

So what had we learned?

We’d observed that altering throughput on DynamoDB tables is in no way immediate. This led us to the conclusion that to get the most out of dynamic scaling and avoid throttling you need to scale up early and do so by a significant amount.

In our case, we initiate a scaling up event when our consumed capacity reaches 60% of our provisioned capacity, and we scale back when we go below 30%. Each of these scaling events either increases or decreases provisioned capacity by 25%.

As the graph above shows, this strategy meant that at all times we had significant capacity for sudden bursts in traffic, whether due to virality, marketing, or any other sort of active promotional event that might occur.

Don’t forget to download World of Warriors on the AppStore, we’ll see you in Wildlands! Let battle commence!

Warriors_Map_Wildlandssmall

Cutting the AWS bill with spot instances

AWS has definitely changed the way we all approach infrastructures these days, especially here — at Mind Candy.

We’re finally not limited by the amount of available hardware, so we can get whatever amount of resources (well, nearly) we need, whenever we need, plus we get CloudFormations.

However, as exciting as spawning 100+ servers can be, as with many things, if you’re not cautious and smart, it can cost you a lot of money.

One way to save a bit of money on your AWS bill (and “a bit” is a serious understatement) is by utilising Spot Instances.

“Spot Instances allow you to name your own price for Amazon EC2 computing capacity. You simply bid on spare Amazon EC2 instances and run them whenever your bid exceeds the current Spot Price, which varies in real-time based on supply and demand.” – http://aws.amazon.com/ec2/purchasing-options/spot-instances/

How much can you save? Well, the c3.large instances which we use across the board for our application tier in on-demand pricing cost $0.12 per hour. When we use the same instance type with spot pricing we get them most of the time for around $0.02. That’s 6x cheaper compared to on-demand.

So what’s the trade-off? Well, if for some reason the spot-instance price exceeds your bid price, your spot reservations will get cancelled and your spot-instances will be killed. In short — your instances can and will die at random times and it’s not 100% guaranteed that you’ll get them when you want them.

That’s not good. Even if you use CloudFormations and auto-scaling as you could end up without instances when the spot price becomes too high – that could be almost the same as an AZ failure if you’re not prepared for it.

However, there’s a way to overcome that risk. In a single CloudFormation, you can create two launch configurations — one for on-demand instances and another one for spot-instances. With carefully tweaked scaling thresholds, you can make your spot-instances be preferred over on-demand instances, but still ensure on-demand takes over should spot-instances no longer be available at your bid price.

This way, if you can get spot-instances, your stack will be pretty much fully built using spot-instances. If (and when) the price goes over your bid price, spot instances will start getting killed and your on-demand instances will start booting up instead to cover the increased price. When the spot-price return beneath your bid price, spot instances will start booting up, slowly phasing out on-demand instances.

After few weeks of tests we managed to come up with a set of thresholds which work pretty well for us and keeps our stacks stable around the clock.

With on-demand, we always have a single instance running by setting the minimum to 1. Scale-up event happens when our average CPU usage exceeds 80% for a 5 minute period and we increase the on-demand autoscale group by 2 instances. We then scale down 1 instance at a time if the average CPU usage is less then 65% for a period of 5 minutes, and we ensure that a scale-down event only happens once in a 15 minute period.

With spot-instances, we also request a minimum of 1 instances but we set ourselves a bid price of $0.12 – remember, the bid price is not the price you pay, it’s the maximum you are willing to pay. Most of the time we have a spot-price cost of just $0.02!

As with on-demand we scale on average CPU in the spot-price autoscale group. However, we scale-up whenever we reach 50% (instead of 80%), and we also add 2 instances. We scale down and cancel our spot instances when we dip below 30% CPU usage.

The result is probably best as a picture from Ice (Ice is a great tool from Netflix that helps manage AWS costs). Below is the hourly cost of one of our app tiers before and after we started utilising spot instances.

Screen Shot 2014-10-15 at 11.45.23

For us, in the case of this specific stack, spot instances gave us savings up to 60%. Bear in mind the size of this specific stack is quite small (up to 10-12 instances at peak); so the bigger the stack, the more savings you’ll see!

To wrap up, I just wanted to share few tips and tricks we picked up along the way, that should help you:

  • bake AMIs; tools like Packer will greatly help you do this; this will let you minimise time required to boot up a new instances; it’ll give you much more, but the time is crucial when it comes to scale-up events, especially when spot-instances are being killed and you want on-demand instances to fill out the empty spaces ASAP. We managed to get time required to boot up a new instances down to around 75 seconds
  • use EBS based instances; they cost a fraction more (and yeah, EBS can be painful) but they’re boot time is significantly faster then the ephemeral-storage based instances
  • bid price = on-demand instance price; this way in worst case you’ll pay what you’d normally pay for on-demand instance
  • utilise reserved instances for the “base” on-demand instances in on-demand stacks
  • did I mention Ice from Netflix? Use it!

<shamelessplug> Obviously, the most important requirement is having an awesome application that is cloud-friendly. If you’re interested in building cloud-native applications and awesome infrastructures, we’d love to hear from you! ;-) </ shamelessplug>

That’ll be all folks! Happy spot-instancing!

Mindcandy is looking for an exceptional Android Engineer..

Mind Candy is looking for an exceptional Android Engineer to join the team building the worlds most exciting kids social networking app! The team moves rapidly, especially during the experimental phase of the project, and so will you. You’ll work closely with frontend designers and server engineers and members of the senior management team to design, implement and A/B test new features to make it fun for kids to connect with other kids. Culture is important to us – you’ll be humble but have huge aspirations. You’ll have a great work ethic, will thrive in a fast-paced environment and you’ll enjoy both autonomy and responsibility.

Responsibilities

Design and implement high quality, visually rich Android applications that work on a wide range of devices

Integrate with 1st and 3rd party online services for in-app purchases, analytics, a/b testing and contacts

Collaborate with other mobile engineers to identify best practices

Requirements BS degree in Computer Science or equivalent experience Professional experience building Android applications

Advantages

Experience working in an Agile environment.

Experience with static analysis tools.

Understanding of Gradle build system.

Apply Now

A Puppet module for Dynamic DynamoDB

As my colleagues have said in other posts, we make an extensive use of Amazon Web Services at Mind Candy. Recently we decided to use the AWS NoSQL offering DynamoDB for a specific use case in one of our products.

Whilst DynamoDB provides us with a highly distributed NoSQL solution, it works based on telling Amazon what read and write capacity you require via their API. If you find that you go over either of these value you begin to, potentially at least, lose queries if you have not factored in some sort of caching layer using, for example, Amazon SQS.

In the ideal world, Amazon would offer auto scaling features for DynamoDB, however at time of writing they don’t. Instead they advise people to use an independently developed tool called Dynamic DynamoDB written by Sebastian Dahlgren.

Dynamic DynamoDB is a tool written in Python that allows us to effectively auto scale our provisioned reads and writes. It use CloudWatch metrics to establish current usage and then based on the configuration option either scales up or down your provisioned capacity on a per table basis.

As I’ve posted before here, we use Puppet at Mind Candy, so the first point of call whenever a new tool comes along is to see if anyone has written, or started to write, a Puppet module for it. Sadly it didn’t look like anyone had, so we quickly wrote up our own, which is available on Github here.

Event Processing at Mind Candy

At Mind Candy we want to build great games that are fun and that captivate our audience. We gather a great deal of data from all of our products and analyse it to determine how our players interact with our games, and to find out how we can improve. The vast majority of this data consists of ‘events’; a blob of json that is fired by the client or server in response to an interesting action happening in the game.

This blog post is about the approach that we have taken at Mind Candy to gather and process these events, and scale the systems into the cloud using fluentd, akka, SQS, Redshift and other AWS Web Services.

What is an event?

From our point of view, an event is any arbitrary valid json that is fired (sent) to our Eventing service via a simple REST api.

When an event is received, it is enriched with some additional fields which includes a ‘fired_ts’ of when the event was received, a unique uuid and, importantly, the game name, version, and event name taken from the endpoint. These three together form what we call the ‘event key’.

This service is extremely lean, and does not itself expect or enforce a rigid taxonomy. It simply writes the enriched events to disk. As a result, the service is incredibly easy to scale and to achieve high availability.

Validation and processing

We then use fluentd, an open source data collector and aggregator, to take the enriched data written to disk and place it onto an SQS queue. Currently, we use a single queue (per environment) which receives data from many different eventing servers.

Once all that data is arriving on the queue, we need to do something useful with it! This is where our home grown event processing engine, Whirlpool, comes into play.

Whirlpool is a scala and akka based service which retrieves messages from SQS and processes and validates them accordingly. It uses a variant of the akka work-pull pattern with dedicated workers for pre-fetching, processing, and writing events, communicating with a master worker. The number of workers and other parameters can be tweaked for maximum throughput.

Where does the metadata for processing come from? We have a shared ‘data model’ which contains information on what an event should look like for a specific game and version. This is essentially a scala library that reads from a backing Postgres store.

The structure of that schema is (simplified):

Screen Shot 2014-07-25 at 15.49.50

An event field is a single field to be found in the json of the sent event. It has a number of different properties, for example whether it is mandatory or not, and whether it should be expanded (exploded out into multiple events), and the json path to where that field should be expected. The point of the eventversion table is to provide a history, so that all changes to all events are recorded over time so we have a rollback, as well as an audit trail for free.

An event destination configures where an event should end up in our warehouse. It can be copied to any number of schemas and tables as we require.

Whirlpool retrieves the metadata for an event based on the extracted event key. It then passes the event through a series of validation steps. If it fails at any level, the reason why is recorded. If it completes all validations, the event can be processed as expected.

The processMessage function looks like this:

Screen Shot 2014-07-25 at 16.49.28

We use Argonaut as our JSON processing library. It is a fully functional library written in Scala that is very nice to work with, as well as having the added benefit that our resident Mind Candy, Sean, is a contributor!

After our events have been validated, they are either a successful event for a particular game and version, or a failure. At this point we make use of fluentd again with a modified version of the Redshift plugin to load them into our Redshift data warehouse. Here they are available for querying by our data scientists and data analysts. Typically, the period from an event being received to being queryable within the data warehouse is measured in seconds, and in any case within a couple of minutes in normal cases.

Configuring events

To actually setup the metadata for what constitutes an event, we have created a simple GUI that can be accessed by all game teams. Any changes are picked up within a few minutes by Whirlpool, and those events will start to flow through our pipeline.

We also needed to solve one large problem with the configuration, namely: “How do you avoid having to create a mapping for every single game version when the events haven’t changed, and how do you accommodate for changes when they do occur?”

It took us a while to find a nice balance for solving this, but what we have now is a mapping from any POSIX regex which is matched against an incoming game version, to a specific version that should be used for retrieving the metadata (this is the purpose of the ‘configmapping’ table in the schema). So, when we release 1.0 of our game, we can create metadata that applies to “1.x”. If in version 1.5 we introduce a new event, we can create a new config at that point to apply to all later versions, while still having versions 1.0-1.4 processed correctly.

Handling Failure

Events can fail for a large variety of reasons. Currently there are 17 specific types of these, with a couple being:

  • The event is malformed; it does not contain the fields that we expect
  • The event is unknown

A failure is captured by the following class:

Screen Shot 2014-07-25 at 16.49.46

The FailureType here is another case class corresponding to the specific failure that was generated, and the fields contain some additional attributes which may or may not be extracted from the failure.

We treat failures separately from processed events, but they still make their way into Redshift in a separate schema. Each failure contains enough information to identity the problem with the event, which can then be fixed in most cases in the metadata; typically, event failures occur during development, and are a rare occurrence in production.

Scaling our infrastructure

We make heavy use of AWS at Mind Candy, and the eventing pipeline is no exception. All the eventing servers are described via Cloud Formation, and setup in an autoscale group fronted by an ELB. As a result, the number of servers deployed scales up and down in response to rising and waning demand.

The use of SQS also separates out our event gathering and event processing infrastructure. This means that Whirlpool instances do not have to scale as aggressively, as the queue provides a natural buffer to iron out fluctuations in the event stream due to peaks of traffic. For Redshift, we have a 6XL node cluster which we can scale up when required, thanks to the awesome features provided by Amazon.

Performance

We’ve benchmarked each of our eventing servers comfortably processing 5k events/sec, on m1.medium instances.

Whirlpool does a little more work, but we are currently running a configuration offering a sustained rate of just over 3k events/sec per instance, on c1.medium instances, with a quick ramp up time.

Instances of both Eventing and Whirlpool operate independently, so we scale horizontally as required.

Screen Shot 2014-07-25 at 16.24.54

The Future

We have real-time dashboards that run aggregations against our event data and display it on screens around the office. It’s very useful, but is only the first incarnation. Currently we’re working on streaming processed events from Whirlpool into Spark via Kafka, to complete our lambda architecture and greatly reduce the load on our Redshift cluster. We’re also improving the structure of how we store events in Redshift, based on our learnings over the last year or so! At some point when we have more time, we would also like to open-source Whirlpool into the community.

 

Advanced Scala Meetup at Mind Candy

We’re very happy to announce the Advanced Scala Meetup in collaboration with the London Scala Users’ Group. This is a new regular meet up for proficient Scala developers to share out their problems, solutions and experience to make for better code.

At Mind Candy, all our new games and products use Scala, so we’re very interested in Scala and happy to host the meetup!

For more details and to sign up see the event page: http://www.meetup.com/london-scala/events/195004482/

If you’d like to learn more about joining us at Mind Candy to work on highly scalable servers for our new wave of games, check out our Careers page for current vacancies. We’re hiring!

How I Learned to Stop Worrying and Love AWS CloudFormation

We love using AWS CloudFormation, here, at Mind Candy. Last year we moved all our cloud-based products application stacks to CloudFormations. We have learned, sometimes the hard way, how to design and use them in the best possible way for us. In this post I’m trying to summarize how we build and operate CloudFormations and what are the DOs and DON’Ts when using this technology. Throughout this post I will refer to CloudFormation as CF, to save some precious typing time.

First of all, you need to get to know cloud formation templates. This are just blocks of JSON, and as such are not really nice to edit (remember – no comments allowed in JSON). Because of that we use a helper technology – a templating tool to build CF templates. We decided to use tuxpiper’s cloudcast library (we are a python shop). You can take a peek or download it here https://github.com/tuxpiper/cloudcast. If your primary language is different than python you can easily find or write your own templating tool – it was pointed to me by a former colleague that CFNDSL is a good starting point for rubyists (https://github.com/howech/cfndsl). So lesson one is – don’t use plain JSON to write your CF templates. You will save yourself a lot of tedious time.

Once you have your first stack up and running you’ll realise how easy it is to modify and use it. But wait, what about testing the changes? That’s one of the biggest flaws of the CF technology. There is no other way to test your template than to apply it. CF does not give you a second chance – you can easily terminate/recreate your whole stack by changing of single line in your template. The good practice we try to adhere to is to test every single change in the template using different AWS account (we use separate AWS accounts for our development, integration, staging and production environments) or region, i.e. launch identical stack first in another AWS location and then perform the change on it to test if we end up in the desired state.

To make it possible to launch identical stacks in different accounts or regions one can leverage CF mappings and parameters. We don’t use parameters yet, but we use mapping heavily. That allows us to use a single CF template file to create multiple stacks in different environments. All you have to do is to define environment-specific properties within a global mapping on top of our template and then use CF’s “Fn::FindInMap” intrinsic function (actually, cloudcast does it for you). Also, use CF Outputs – they will allow you to programmatically access the resources created in your CF.

Next one is a set of more generic hints for those who work with AWS, still 100% valid for CF. First, use IAM roles to launch your stacks/instances. Let me quote AWS IAM official documentation here:

A role is an entity that has its own set of permissions, but that isn’t a user or group. Roles also don’t have their own permanent set of credentials the way IAM users do. Instead, a role is assumed by other entities. Credentials are then either associated with the assuming identity, or IAM dynamically provides temporary credentials (in the case of Amazon EC2)“.

That will make your environment more secure and save you misery of maintaining IAM users and keys. Bear in mind that once the instance is created you cannot assign it to an IAM role, so if you’re not using IAM roles yet you should create IAM role with an “empty” policy now and use it for all your resources until you’re ready to benefit from full-fat IAM roles.

Secondly, use a minimalistic user data – make it identical for your whole estate. Delegate environment/application specific settings to your configuration management system. This will just make your life easier. Get familiar with and start using auto-scaling groups, even if you’re launching a single instance (in that case you can have an auto-scaling group with minimum and maximum number of instances equal to 1). You’ll benefit from that approach later, once your service starts to scale up.

Finally, use AWS tags to tag your AWS resources. Tags allow you to do a lot of funky stuff with your AWS resources (let me only mention grouping, accounting, monitoring and reporting here).

Now, a few DON’Ts for your CF:

  • Don’t mix VPC and non-VPC regions in your mappings – CF uses different set of properties for EC2-VPC resources than for EC2-classic resources
  • Don’t ever specify resource name properties in your CF template. Using auto-generated names makes your stack easily portable. Thus, you can copy your existing stack to another environment or launch a completely new stack (say your canary stack) using the same template. Also some of AWS resource names need to be globally/regionally unique, so defining a name in your stack is not such a good idea. Finally, virtually any resource which allows you to set its name will require replacement on update – just imagine your whole stack relaunching from scratch when someone comes with a clever idea to rename resources in line with a new naming convention or a new product name?
  • Don’t use existing (non-CF built) AWS objects in your stack, if you can. Using existing resources also makes your stack non-portable. A lot here depends on the use case (i.e. we have a couple of security groups which we use in our stacks, but even then we provide their names/ids in the mappings or parameters, rather than using them directly in resource declaration).

Know your limits – CF is great orchestration tool, but it has its limits. You cannot create or update some AWS resources (e.g. EC2 keypairs). You cannot self-reference security groups in their definitions, which sucks (how do I open all my cassandra nodes for inter-node communication on port 7001 within the CF?). Stacks are difficult to maintain, as there are no incremental changes. For the above and other, obvious, reasons – don’t forget to source control your CF stacks (we have a dedicated git repository for that).

Finally, the last, and maybe most important, point – separate your applications into multiple CF stacks. One can easily get excited about CF and create a single stack for the whole application (network, databases, application servers, load balancers, caches, queues and so one). That’s not a good idea – you don’t want your database servers to relaunch when you decide to modify the properties of the auto-scaling group for you application layer. The solution is simple – create multiple CF stacks for your four application stack. Make your database layer a separate CF stack, then your distribution (app server auto-scaling groups and ELBs) a second CF stack and so on. This will give you the flexibility of CF without taking a risk of unwanted service disruption, due to CF stack update (been there, done that…). It’s very tempting to create very sophisticated CF stack, with many inter-dependent components, but I cannot stress enough how important is not to do it.

What’s next?

We are all the time looking to improve our stacks and processes, so definitely we are only at the beginning of our CF journey. One of my colleagues is looking at another CF templating library (https://github.com/cloudtools/troposphere) to help us automate our processes of CF creation even more. We will very likely start to protect our CF resources in production using stack policies soon. We will start working with CF parameters and dependencies more to make our templates 100% independent of our account/regional settings. Finally, we need to research if Custom Resources are fit for our purposes.

Great British Summer Game Jam

https://www.facebook.com/GBSGameJam/timeline

Ready… Steady… JAM!
Autodesk and Mind Candy are combining forces to host an epic GAME JAM at Mindcandy’s famous central London HQ.
Keep a very British stiff upper lip whilst you get to grips with the best Game Developer technology in the business, such as Autodesk MAYA LT and Scaleform for Unity, and be ready to get your next game discovered as you put your skills to the test in this year’s most exciting game jam!

Tell your friends, form your team, learn the tools- then compete to win!
The full agenda of speakers is still to be announced, but make sure you save the date and stay tuned to Twitter (@GBSGameJam) and right here on Facebook, as the #GBSGameJam is not something to be missed!

Working from home in AWS (with access to everything)

Ever since we started moving parts of our services into EC2, we’ve been faced with a growing problem. It’s important that our team can access nodes directly in a troubleshooting situation, even at 3am from Poland if necessary. With resources in AWS, this can mean that before you log into a node, you first have to log into the console (with two-factor auth), find a relevant security group, find your public IP, and then give yourself access via SSH. This can make troubleshooting among nodes in Amazon take much longer than it should. So, we’ve been trying out a different approach. We already have a mechanism for authenticating our employees when working from home – our VPN. We can be reasonably confident that Cisco have done a good job at making this secure, and we’re able to assume that anyone who successfully logs in is probably worthy of access to other resources off-site, in EC2. So, we developed a tool which checks the VPN session database every minute, by connecting to one of our Cisco ASAs and extracting the username, tunnel-group and IP address of all logged-in users. In this way, it’s possible to manage security-groups in EC2, such that you can automatically give users access to the resources they need, based upon their tunnel-group on the Cisco. Essentially, we now no longer need to think about remote access; it just works. There are two components to this system; the first polls the ASA for information every minute, and creates a hash containing user information. It then sends this to an HTTP endpoint, where the correct security groups are updated. The latter part of this is embedded in our ‘mission control’ system, but is really quite basic, in that it simply uses boto to create security groups based on tunnel-group names, and keeps track of when the user was last seen on the VPN in a simple database, so that inactive users can be removed from the groups. The Cisco part is perhaps a little trickier, so we’ve put this on GitHub, in case it comes in handy. You can find it here:

http://github.com/mindcandy/gatekeeper