My First Puppet Manifest

A few months back I was tasked with deploying Uptime, a remote monitoring application built with Node.js, MongoDB and Twitter Bootstrap. We wanted Uptime so that we could gather more data on the availability of our internal infrastructure and systems and review it retrospectively.

The great thing about this task was that it gave me the opportunity to build it all via Puppet and really understand the workings of Puppet and the best practices around it.

 

Implementation

We started off with the essentials: creating a node manifest for the Uptime server and then looking on Puppet Forge for relevant modules that we didn’t already have in our repo. Once we had found the necessary modules (such as MongoDB and NVM) we included them and then grabbed the Uptime repo from GitHub.

With all the necessaries installed we could then configure the app via a YAML file and drop it in. To further automate the application we took a pre-existing service script and modified it to start the app on boot. At this point we had a working app that was accessible and usable; however, we wanted some form of authentication in front of it.

For this layer of authentication we decided to use Apache along with its LDAP, proxy and SSL modules, so that we could utilise our existing LDAP and provide encryption. We installed Apache and configured an Uptime vhost with a proxy to the box locally, as we had set Uptime to be accessible via localhost only. Once the config file was dropped onto the box and Apache was running, all requests were being passed to the app via localhost over SSL, with users being prompted for their credentials before reaching the Uptime dashboard.

 

Learnings

While the above is terribly simple, it was a fantastic opportunity for me to learn about Puppet, automation and best practices. There are a few things I learnt, or ideas that were reinforced, during the whole process, so take a step back guys and girls as I’m about to drop some knowledge:

 

NIHS

Not invented here syndrome is an issue we’ve all faced before, whether in ourselves or in a colleague. We’ve all been there, sucking air through our gritted teeth while muttering “I wouldn’t have done it that way”. While there may be a case for completely re-doing something, in the majority of situations it’s just not necessary.

For this particular project NIHS came up around module usage, as we have plenty of modules (self-made and community-made) at our disposal. Instead of creating my own or making a rather convoluted manifest/module, I decided to go with ones we already use or popular community-written ones, which helped speed everything up and will hopefully lessen any potential tech debt.

Automation

While you want to automate as much as possible, not everything is worth ripping your hair out over and wasting time on. Starting from bringing up a new VM and Puppetising it, the current node manifest automates the entire installation and setup of the application, except for the installation of the app dependencies. Even though I tried to automate this, it became a blocker, and while it is most definitely a solvable problem I decided not to concentrate on it.

My reasoning is that it only has to be run once, during the initial installation of the app. There is no reason to spend large amounts of time on a one-off task, and in the event of a rebuild, the hours spent solving this simply don’t stack up against the 5 minutes it takes to do by hand. Automation is there to remedy laborious tasks and free up time, not soak up even more of it.

Documentation

Aside from the automation aspect of Puppet, there’s also the documentation you get from it. By writing out my node manifest accompanied by notes, any member of the team can look at my code and figure out exactly how it’s configured and what each bit does. This also helps with debugging and, if you look back at it in future, with remembering how you pieced it together.

Simplicity

When things are over-engineered they become harder to pick apart when problems arise, and they are typically more prone to going wrong (in my personal experience). By keeping your manifests simple and modular you can chop and change bits without breaking the entire thing. My personal opinion is that simpler is better, as there is less to go wrong. That is not to say that complexity is always avoidable, but try to keep it to a minimum.

Re-use Code

There are plenty of modules and manifests that have been written by my colleagues which I tend to delve into to re-use snippets of code. As mentioned previously, NIHS just isn’t necessary as these previously written working bits of code can be used for whatever you’re doing. It will save you time and stress, as someone’s done the legwork for you. Don’t be too proud.

Vagrant

When I was writing the manifest I used Vagrant to allow me to test my changes locally without constantly pushing to live. It gave me the opportunity to trash and rebuild the box within minutes and test the automation side of the manifest. I was able to make quick and drastic changes without any risk of upsetting the live Puppet repo. Any changes I needed to make, I could verify within minutes. For me it’s an invaluable tool and it’s my go-to software for testing safely.

 

Reflections

Over the past few years I have been exposed to Puppet, but never really delved that deeply into it. Here are some top-level pros and cons I’ve cobbled together out of my experience so far:

 

Pros

  • Puppet allows for automation of your nodes
  • Puppet has a large community thanks to its market share and customer base, and the level of documentation and support is vast.
  • The ability to gain documentation through configuration is fantastic.
  • Securely store and transfer passwords via Hiera
  • Puppet supports many platforms out of the box
  • Open source
  • Supports both the Puppet language and pure Ruby when writing your manifests and modules.
  • Clear and understandable errors for debugging when a Puppet run fails.

Cons

  • Introducing a new OS can be a nuisance if you didn’t write your manifests/modules with OS agnosticism in mind.
  • Repos can become messy over time, but that comes down to housekeeping more than Puppet
  • Shouldn’t use Puppet for large file transfers
  • Mismatch of Ruby and Gem versions can be a colossal pain to fix especially in terms of Mac OS X agents.
  • Without version control you cannot see what was previously applied to the box.

 

Even though my time with Puppet and configuration management/automation software has been limited, I’ve now caught the bug and want to automate all the things. If you haven’t looked into Puppet or its cousins I’d definitely recommend it. Puppet won’t solve all your problems but it’s a good start.

 

Open source is the best source

As always this wouldn’t have been possible without the fantastic open source community.

 

London Python Dojo Returns to Mind Candy

Last night saw the return of the London Python Dojo to the Mind Candy office. For those who are unfamiliar, the dojo is a monthly meetup for Python enthusiasts that covers the full range from “What’s Python?” to “I’ve been using Python for 24 years”. We meet and come up with ideas/scenarios/problems that can be solved with Python in a couple of hours (usually after some refreshments).

Ideas are written on a whiteboard and then voted on to decide which project to take on for the evening. We then break off into teams and code for about an hour or so, after which each team presents what they’ve done and the ideas behind what they were trying to accomplish (even if they don’t have working code).

(Last night’s board)

Last night’s task was to implement battleship logic/strategy which then could be played against other teams. You can see the (unofficial) winning team’s code here.

I am really biased as I was in fact a part of Team 1.

Overall, a very good evening of creative coding in good company (as it usually is).

For more information on the London Python Dojo check out http://ldnpydojo.org.uk/ and follow @ldnpydojo on Twitter.

You can also join the Python UK mailing list here.

A DevOps Journey

Over the past few years Mind Candy has gone through a DevOps transformation. We did this because we knew that we had to improve the delivery of our products, and we knew that getting to where we wanted to be involved having the following three things in place:

1. Shared goals and practices by aligning our different teams.
2. Unified tool sets: again, we needed to align around a common set of tools.
3. Collaborative learning – knowledge sharing was and remains vitally important to us.

Obviously, achieving something like this cannot happen overnight. It had to be an iterative process just as software development is, and its starting point required changing the mindset of people across the teams so that we began to do DevOps.

These are some of the practical things we did on that journey.

Familiarity doesn’t breed contempt

In Aesop’s fable of the Fox and the Lion, we’re taught the moral that familiarity breeds contempt. However, in an organisation trying to transform towards a DevOps way of thinking, we turned the fable on its head, acknowledging that it’s not familiarity that breeds contempt but separation, in the form of silos.

For us this didn’t mean that we needed everyone to know or be familiar with everything about everything. Unicorns don’t exist. What it meant was bringing teams physically closer together. It’s pretty amazing how much more readily different teams interact and collaborate organically when they can hear each other, from Dev through to QA and on to Ops.

We found that technical decision-making became a much more shared process. Closer working environments encourage greater mutual support between teams.

It’s good to talk

Email is a wonderful thing. Instant messaging and relay chats are even better once you’re in a good DevOps place. However, if you’re trying to shift attitudes and thinking, email is not a substitute for getting up and talking to someone or having a phone/video call.

It might not always be possible across timezones, but it doesn’t take a genius to realise that intonation can easily be lost in the written word, even if someone uses an emoticon.

The slowest and most problematic IT organisations I’ve known have tended to be ones where everyone hides behind email, resulting in bubbling tensions and often leading to escalation and wars over who can CC in the most senior people. Change can be effected, but only by whoever has the loudest shout or the most clout.

Meanwhile, the best and least problematic IT organisations tend to be the ones where different functional teams not only sit physically close to each other but also walk across the office to talk to each other instead of sending a snippet of easily misinterpreted text over the Internet. Obviously when you have no choice you have to use electronic communications, but when you don’t need to you probably shouldn’t.

Investment in knowledge pays the best interest

When you look up a typical DevOps Venn diagram online, it will be one where DevOps sits at the intersection of Dev, QA and Ops. Acknowledging this intersection is crucial in moving an organisation’s mindset. The intersection represents all the things you do in which you have a shared interest and investment. This is the place where you need to align across teams.

Take code deployment as the classic example.

During any software cycle, each team will deploy to different environments and it’s highly likely that there may be differences in the process due to the scale of environments, whether they operate under SLA, or under any internal governance controls like change management.

The tools used to deploy and the process followed are an excellent starting point in any DevOps transformation. They not only encourage collaboration between teams, but also enable you to unify your toolset under known standards, something we have done at Mind Candy that I blogged about previously.

This has empowered tech teams to collaborate on a shared interest and shared investment, whilst also carrying a shared responsibility for its maintenance. The tool is as much a “product” as the product that it ships.

The net result of this investment is that code deployment becomes so trivial that it widens the scope of who can “push to live” to pretty much anyone. This shouldn’t be mistaken for meaning that anyone should (or does) deploy to live. That would be silly. Rather, it should be seen in these terms: a robust deployment process can eliminate the lone rock star engineer being a single point of failure.

As Mazz Mosley said at Monki Gras 2013 when talking about how GDS built gov.uk, “rockstars are not webscale”.

This approach doesn’t negate strict change control and governance in the organisation (if you have it). It simply removes blockers from your delivery pipeline. That’s a win for the business as much as it is a win for those who have shared and gained knowledge through collaboration.

Devs as Ops and Ops as Devs

Once we had shared ownership and responsibility for tooling like deployment spanning teams across the organisation, it was clear that the reality of the DevOps intersection is one where Devs are Ops and Ops are Devs.

This doesn’t mean that either team does the other’s job. This is not the full stack unicorn. Sysadmins are not dead and nor are developers. It just means that where the things they do align, they can learn from each other.

Take the traditional sysadmin position. They will often be quick to tell you that they’re not a developer. They may even say it with a sense of disgust that you even dared to ask the question. The sad truth is that they’re actually in denial.

They might not like it, but when writing short scripts, or declaring something in a configuration management system, they are developing and, as the saying goes, they’re doing “infrastructure as code”.

The only difference really is that frequently they have made life hard for themselves by lovingly hacking systems and creating the snowflake server. It’s great for job security of course, but it’s terrible for the business – rock star ninja single points of failure again.

At the very least they need to be using some sort of version control for the infrastructure, and what is version control if not a development tool? However, it’s not just in the tools that your Ops can be more like Devs. There’s the working practices too.

The Ops team had already been using Kanban to prioritise work weekly. Whilst this worked to a degree the team still had an ever growing backlog of tickets and requests, and what went on the Kanban board each week still contained a considerable amount of reactive work.

We decided, as a team, that we would take our workflow a step further and apply more development principles to the management of our ticket queue. We decided we should align ourselves with our colleagues and move towards a greater form of Agile along scrum lines. We would start using sprints, planning, backlog pruning and prioritisation.

We began to work through our backlog by opting for two week sprints. We introduced sprint planning, and started to commit to a certain number of story points (issues) for the sprint, and, barring any major issues or emergencies (which we left slack for) we would stick to the committed work and do nothing else.

The impact of what was a pretty small change was huge. It took a few sprints, but as our different product teams (who were all also doing sprints, obviously) became aware that we were working in the same way as them, emergency work and out-of-the-blue high priority issues gradually declined.

Obviously it’s not always like that when you’re supporting live services as well, but by aligning our working practices with those of our primary internal customers, a greater appreciation developed of how our backlog, just like theirs, could be impacted by altering the scope of the sprint.

This was indirect collaboration born on the back of working in a more aligned way with our peers. Our backlog went from over 100 tickets to less than 40.

Meanwhile, as we in Ops were being more like Devs, we started to share some of our Ops roles with Devs, with a little help from a friend called Canbot.

ChatOps sets you free

Candy Bot, or Canbot for short, is our in-house name for GitHub’s Hubot. It sits in our dedicated Slack channel #chatops and, when not providing us with amusing animated cat images, he/she does things for the Devs and for Ops.

Canbot can tell us where servers are. This is vital as we use AWS so the environment can be fluid and dynamic. Canbot can deploy config changes for the Devs to each environment, including to live and it’s all totally transparent.

If someone changes the code base in our Puppet infrastructure then Canbot will tell #chatops about the commit and who did it. We also opened up the Puppet repository to the Devs and some of them change it every now and then. Shared responsibility after all.

Canbot can also execute commands on our infrastructure, but when it does it is never in secret. Transparency is the key feature here. What Canbot can do is also open across the teams for development. Primarily it is Ops that play with him, but there is nothing stopping a pull request from others internally.

Canbot has allowed our Devs to be a bit more like Ops. They can orchestrate production without having to have ssh access and it can be audited. No more tickets asking for information about production.
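To make the transparency idea concrete, here is a minimal sketch in Python of a chat command dispatcher that announces every command it runs. It is not Hubot's API (Hubot scripts are written in CoffeeScript/JavaScript) and it is not Canbot's actual code; the ChatBot class, the announce callback and the "where" command are all hypothetical names chosen for illustration.

```python
class ChatBot:
    """Tiny command dispatcher that announces everything it runs."""

    def __init__(self, announce):
        # announce is assumed to post a message to the shared chat channel
        self.announce = announce
        self.commands = {}

    def register(self, name, handler):
        """Make a new command available; handler takes the argument string."""
        self.commands[name] = handler

    def handle(self, user, message):
        """Run the command named by the first word of message."""
        name, _, argument = message.partition(" ")
        handler = self.commands.get(name)
        if handler is None:
            self.announce(f"{user}: unknown command '{name}'")
            return None
        # Announce before running, so nothing is ever executed in secret.
        self.announce(f"{user} ran '{message}'")
        result = handler(argument)
        self.announce(f"result: {result}")
        return result


if __name__ == "__main__":
    bot = ChatBot(print)
    bot.register("where", lambda host: f"{host} is in eu-west-1")
    bot.handle("alice", "where web01")
```

Because every invocation goes through the same announce call, the channel history doubles as an audit log, which is the property described above.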

Embrace failure

Failure is an opportunity to learn; it is not an opportunity to point a finger of blame and start shouting at someone. DevOps mindsets should see each failure in these terms. Iterate on the failure and eliminate it with better tooling, better documentation or better gated processes.

When we celebrate failure we do it with KrispyKreme donuts!

Encourage Tech Culture

Most of the people that work in tech love tech. Few of us see our jobs as a mere means to an end. If you encourage your technical teams to collaborate with learning sessions too you can create a greater sense of being “one team of many disciplines” rather than single teams doing only one thing.

At Mind Candy we hold regular weekly book clubs, open to whoever wishes to join, where we go through a particular book on a technology matter. We also have Guilds where we present and share what we’re working on between teams.

Additionally we use our office as a host location for MeetUps across tech businesses. Next month we’re hosting a London Virtual Reality meetup. Sharing should not always just be in-house after all.

Wrapping things up

Obviously the list and experiences above are not exhaustive. There are so many little things that an organisation can do when adopting a DevOps approach. What’s important is to realise that you change the mindsets first and then you iterate and encourage greater collaboration. Once an IT organisation realises that it relies on mutual support to sustain itself, change can come about quite rapidly.

Utilising AWS Lambda to migrate a 25,000,000+ image S3 bucket

When AWS announced AWS Lambda at last year’s re:Invent, we were really excited about it here at Mind Candy. The concept of a zero-administration compute platform, that is very scalable, cheap and so easy to use AND at the same time integrates with so many AWS services through triggers is pretty exciting and potentially – very powerful.

Since then, we started using AWS Lambda in some of our products – PopJam being one of them. We use it to near-instantly generate thumbnails of all the amazing creations users of PopJam share through the app.

Recently, a quite interesting story surfaced on our sprint – we were to migrate one of the AWS S3 buckets PopJam uses, from US to EU (to bring it closer to the backend and users) without any downtime for users.

Now, you’ll think: “why would that be interesting?”

The answer is the scale of the task: 25,000,000+.

The aforementioned AWS S3 bucket stores over 25,000,000 files (mostly images) and this number is growing faster every single day. Just running ‘s3cmd du’ on the bucket took almost a day. When I tried to perform ‘s3cmd ls’ to count the number of keys in the bucket, I got bored before it finished (I had to write a simple Python script that utilises multi-processing and splits the counting into 256 threads; only then would it finish within a few minutes).
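A minimal sketch of that kind of parallel count might look like the following. The post does not say how the work was divided, so this assumes the keys start with hex characters and splits the keyspace into the 256 two-character prefixes '00' to 'ff'; it also uses a thread pool rather than separate processes. The list_keys argument is a hypothetical caller-supplied function that yields every key under a prefix (for S3 it would wrap a paginated listing call).

```python
from concurrent.futures import ThreadPoolExecutor


def hex_prefixes():
    """Return the 256 two-character hex prefixes '00'..'ff'."""
    return [format(i, "02x") for i in range(256)]


def count_keys(list_keys, prefixes, workers=32):
    """Count keys across all prefixes in parallel.

    list_keys(prefix) is supplied by the caller and must yield every key
    stored under that prefix.
    """
    def count_one(prefix):
        return sum(1 for _ in list_keys(prefix))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_one, prefixes))


if __name__ == "__main__":
    # Stand-in for a real bucket so the sketch runs without AWS access.
    fake_bucket = ["00a.png", "00b.png", "ff1.png"]

    def fake_list(prefix):
        return (key for key in fake_bucket if key.startswith(prefix))

    print(count_keys(fake_list, hex_prefixes()))
```

Listing is dominated by network waits, so threads are enough to overlap the requests; the important part is that each worker only ever walks its own slice of the keyspace.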

Obviously, any existing CLI command like s3cmd sync or the AWS CLI s3 commands is out of the question: by the time it finishes (after many, many hours), the source bucket will have tens of thousands of new files which haven’t been copied across, and we’d have to re-run it, which would lead to the same situation.

As I mentioned, AWS Lambda functions can be triggered by other AWS services, one of them being AWS S3. Essentially, we can configure an AWS S3 bucket to invoke a Lambda function whenever a new object (key) is created.

Given this, we could create a Lambda function on the old bucket that will be triggered whenever a new key is created (ObjectCreated event) that would copy over new keys to the new bucket. Then, we’d have to only sync the old bucket to the new one without having to worry about missing some keys on the way.

The proposed plan looked like this:

  1. Create new S3 bucket in EU
  2. Set up AWS Lambda Copy function and configure it to be triggered whenever a new key is added
  3. Run aws s3 sync command in background
  4. Wait, wait, wait…
  5. Reconfigure CDN to use the new bucket as origin
  6. Switch backend application to upload all images from now on, to the new S3 bucket in EU

This plan also meant there should be zero downtime during the whole migration. Everyone likes zero downtime migrations, right?

The actual implementation, while not very painful, did uncover a few issues with the plan that had to be dealt with. These issues resulted in some learnings which I wanted to share here.

AWS Lambda copy object function

The Lambda function code to perform the copy happens to be pretty trivial.

var AWS = require('aws-sdk');

exports.handler = function(event, context) {
        var s3 = new AWS.S3({region: 'eu-west-1'});
        var record = event.Records[0].s3;

        var params = {
                Bucket: 'popjam-new-bucket',
                // source key is passed on exactly as it arrives in the event
                CopySource: record.bucket.name + '/' + record.object.key,
                // keys in S3 events are URL-encoded (spaces arrive as '+'), so decode for the destination
                Key: decodeURIComponent(record.object.key.replace(/\+/g, ' ')),
                ACL: 'public-read'
        };

        s3.copyObject(params, function(err, data) {
                if (err) {
                        console.log(err, err.stack);  // an error occurred
                        context.done(err);            // report the failure rather than leaving the invocation hanging
                } else {
                        context.done();               // successful response
                }
        });
};

It just works, but there’s one small catch…

… what happens to S3 object ACLs should they be changed in the meantime?

We needed ACLs for particular objects to be in-sync (for various reasons, one of them being moderation).

Given that the AWS Lambda function is triggered on the ObjectCreated event (there sadly isn’t a way to trigger it on ObjectModify), should you need to change an ACL there’s no way to do it through AWS Lambda at this stage.

We worked around this problem by writing a Python script that basically iterates through the S3 buckets, compares ACLs and tweaks them if there’s a need (as before, we had to parallelise it otherwise it’d take ages).
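The comparison at the heart of such a script can be sketched as below. This is an illustration rather than the script we ran: the grant dictionaries are assumed to follow the shape of the Grants list that boto3's get_object_acl returns, and get_grants_old, get_grants_new and fix_acl are hypothetical callables standing in for the real S3 calls so that the logic can run on its own.

```python
def normalise_grants(grants):
    """Reduce a list of S3-style grants to a set of (grantee, permission)."""
    result = set()
    for grant in grants:
        grantee = grant["Grantee"]
        # A grantee is identified by a group URI or by a canonical user ID.
        identity = grantee.get("URI") or grantee.get("ID")
        result.add((identity, grant["Permission"]))
    return result


def acls_match(source_grants, target_grants):
    """True when both objects grant the same permissions to the same grantees."""
    return normalise_grants(source_grants) == normalise_grants(target_grants)


def sync_acls(keys, get_grants_old, get_grants_new, fix_acl):
    """Compare ACLs key by key and call fix_acl(key, grants) where they differ.

    Returns the list of keys that had to be fixed.
    """
    fixed = []
    for key in keys:
        wanted = get_grants_old(key)
        if not acls_match(wanted, get_grants_new(key)):
            fix_acl(key, wanted)
            fixed.append(key)
    return fixed
```

Comparing sets of (grantee, permission) pairs means the order in which S3 returns the grants does not trigger needless updates. The key list can be split across workers in the same way as the counting job.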

Beware of AWS Lambda limits!

While being pretty scalable, AWS Lambda has got some limits. We were bitten by the “Concurrent requests per account” and “Requests per second per account” limits a few times (fortunately we did just enough with AWS Lambda to get the attention of AWS Lambda product team and they kindly raised these limits for us).

For most of the use cases those limits should be fine, but in our case, when on top of the AWS Lambda copy function we were also triggering a series of functions to generate thumbnails, we hit these limits pretty quickly and had to temporarily throttle our migration scripts.
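One simple way to throttle a migration script is to space out the calls that trigger Lambda. The sketch below is an assumption about how that could be done, not the code we used; the Throttle class and its rate parameter are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


if __name__ == "__main__":
    throttle = Throttle(rate=50)  # 50 per second is an arbitrary example value
    for key in ["a.png", "b.png", "c.png"]:
        throttle.wait()
        print("would copy", key)
```

Calling wait() before each upload or copy keeps the request rate under a chosen ceiling, which leaves headroom for the other functions fired by the same events.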

AWS Lambda is still pretty bleeding edge technology

AWS Lambda will work great for you most of the time. However, when it fails, troubleshooting can be quite … inconvenient to say the least.

Remember you can now view all AWS Lambda logs through CloudWatch – make use of them and don’t shy away from placing debug statements in your code.

The deployment of AWS Lambda is pretty tricky, too. While there are some options, it’s still at an early stage and it feels like even AWS is still trying to figure it out (potentially through feedback from customers; if you use AWS Lambda, do make sure to give feedback to AWS).

The most interesting tool I found to support deploying and integrating with AWS Lambda in general is kappa.

And all of this for what?

Let the graph speak for itself…

(the graph represents upload time to S3 bucket in US – green line, and S3 bucket in EU – orange line – after migration)

Office Music

Office music, some love it and some hate it. While I’m in the camp that’s for office music I can completely understand why some might not be in favour of it.

We here at Mind Candy find music in the workplace to be a mood enhancer and, in a way, a bonding process. You find similarities between yourself and your peers and generate links that weren’t there previously. Music helps reduce those awkward silences filled with keyboard tapping, mouse clicking and the odd coughing fit, and introduces an atmosphere which is conducive to the culture we look to nurture and promote. There are a few great articles out there which go into greater detail about whether music in the workplace is a good or bad thing; some can be found here.

Last year we started looking into a solution for playing music in the area where our team sits, and after some search engine fu we found Mopidy. Mopidy is an extensible MPD and HTTP server written in Python. Mopidy plays music from your local disk and radio streams, while with the help of extensions you can also play music from cloud services such as Spotify, SoundCloud and Google Play Music.

As we already have a few Spotify accounts we thought we’d toy with the idea of using Mopidy to play music from Spotify. In order to use Spotify you also need to use the Mopidy-Spotify extension.

Once we had both Mopidy and the Spotify extension working we then needed something to interact with it all. After looking through the Mopidy documentation we came across the web extensions section which suggests various web interfaces to interact with the HTTP side of the Mopidy server.

Initially we used Apollo Player. Apollo Player’s great as it allows anyone to log in using their Google Apps or Twitter credentials and then add music to a one time playlist meaning anyone can choose what music is playing. There is also a bombing feature so any music that’s been added can be skipped if bombed by three people. When no music has been selected it will default back to a playlist set in config.js which is found in the root directory of Apollo. The problem there is that once the default playlist has been played for the umpteenth time it can get pretty tedious and only people with access to the app’s root directory can change this. This led us to Mopify.

Mopify gives you much of the functionality that the Spotify client gives you e.g. Browse, Featured Playlists, New Releases, Playlists and Stations. You can log in with your own Spotify account or use the account that Mopidy-Spotify is utilising and use the playlists associated with either account. It gives you greater functionality and options than Apollo but then you lose the collaboration and unmanaged element you had with Apollo.

Finally we needed to actually run Mopidy on something, as it was no good having it run from my local machine. We decided to use a Raspberry Pi and plugged it into some speakers running along the cable trays above our heads. The Raspberry Pi is running Raspbian with Mopidy, Mopidy-Spotify and whichever web extension we’ve chosen. Another Raspberry Pi with Mopidy has been set up as a jukebox in our chillout/games area, which works really well with mobile devices due to most of the web extensions being bootstrapped. This gives employees the flexibility to easily play whatever music they feel like when they are in the communal area.

In our eyes, while music in the office isn’t a necessity, it is definitely beneficial, and it’s fantastic that all these open source tools and products give us the ability to do this.

And let’s be honest, who can resist an impromptu sing-along to Bohemian Rhapsody!

 

Mopidy – Extensible music server written in Python

Mopidy-Spotify – Mopidy extension for playing music from Spotify

Apollo Player – Mopidy web extension

Mopify –  Mopidy web extension

Raspberry Pi – ARM based computer running under GNU/Linux

 

Pi-tomation

Screen Automation – Selenium (and some other stuff), meets Raspberry Pi

Let’s set the scene: you need to display some stuff on a screen so everyone in the office can see it. Easy, you mount a couple of TVs on the wall, get a DVI splitter and an old Mac mini you had in the store room on the top shelf behind a roll of Cat5 cable.

Set everything up, get the Mac mini to auto-login and mount a shared drive, then run a little script that uses Selenium to open a browser and show pre-determined images of the stuff you want to display, all stored on the same shared drive. Done…

Fast forward a couple of years and you now have a lot more to display on a lot more screens, but what are you going to do? It’s impractical – and expensive – to buy a bunch of Mac minis just to run a script that opens a web browser. The end goal of all this is to have dashboards that are easily manageable by their respective teams.

 

Challenge Accepted

 

Have you heard of this new Raspberry Pi thing? It’s a small ARM PC that’s the size of a credit card, and they’re cheap. What, they’re also USB powered? Bonus, now we can just power them from the TV itself, and when the TV comes on the Pi comes on. Now we just replace the Mac mini with the Pi, run the same script when it boots and we’re all done. Wait, not so fast: the share isn’t public so we need credentials to connect. That’s OK, we can store them in a file locally and use fstab to connect. Yeah, that works, but we want to display different things on different screens, so now I have to create different scripts and manually tell each Pi which one to use. OK, that’s not too bad: the first time you set up each one, just point it to the script it needs to run, and then you can just update the script and reboot the Pi. So far it’s shaky but it works, sometimes. One of the problems was that sometimes it would try to run the script on the network share before it was mounted properly. Also, running a script (or multiple at this point) over the network on a device with the processing power of about 7.4 hamsters isn’t really going to cut it. I’m getting tired of crowbarring fixes into something that wasn’t really designed for this use and troubleshooting seemingly random issues.

What do I actually want to accomplish here and how am I going to do it??

  1. Have the script run locally, it’s only managing a web browser after all.
  2. Config easily changeable and centrally managed.
  3. Get the pi to check for new config on startup.

Done. Yes, that’s it, pretty simple, so here’s what I did.

Ingredients

  • bash script
  • json file. Lists the pages that the web browser should visit. These could also be local files loaded into the browser (images etc.).
  • python script. Loads the json ‘config’, specifies how long each page should be displayed and does a bit of error checking.
  • Git (or other) repository
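
As a rough illustration of the python ingredient, a minimal sketch of loading and checking such a json config could look like the following. The field names `url` and `seconds`, the 30 second default and the example URLs are assumptions made for this example; they are not taken from the actual repository.

```python
import json

DEFAULT_SECONDS = 30  # assumed default display time, not from the original project


def load_pages(raw_json):
    """Parse the json 'config' and return a list of (url, seconds) tuples.

    Entries without a usable url are skipped; a missing or invalid
    duration falls back to DEFAULT_SECONDS.
    """
    pages = []
    for entry in json.loads(raw_json):
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str) or not url:
            continue  # basic error checking: ignore malformed entries
        seconds = entry.get("seconds", DEFAULT_SECONDS)
        is_number = isinstance(seconds, (int, float)) and not isinstance(seconds, bool)
        if not is_number or seconds <= 0:
            seconds = DEFAULT_SECONDS
        pages.append((url, seconds))
    return pages


if __name__ == "__main__":
    sample = (
        '[{"url": "http://dashboard.example/ops", "seconds": 60},'
        ' {"url": "file:///opt/scripts/logo.png"}]'
    )
    for url, seconds in load_pages(sample):
        print(url, seconds)
```

The browser-driving part (selenium) is left out on purpose; the point here is only the shape of the config and the kind of error checking the script might do.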

Method

Edit your rc.local to run a bash script that lives somewhere locally on the Pi, e.g. /opt/scripts/. The bash script downloads selenium, firefox (actually iceweasel on debian) and facter (so we can get info really quickly).

(I did consider using Puppet for this whole thing at one point, but that was a bit of overkill, plus it had its own complications at the time when trying to run on an ARM processor.)

The bash script also uses facter to determine the MAC address of the Pi and removes the colons. (I must admit that facter may be a bit of overkill here as well but hey, I’ve gotten used to having it around.) It then searches your webserver (or other location) for files carrying its MAC address as a name (I have a set of defaults that it uses if none are found). Have your webserver run a cron job that pulls the repository of all your files. You could have each device pull the repository directly, but the more screens you have the more inefficient that will be, as you’ll be storing a whole repo on the Pi just to get at 1 or 2 files. You could also have a web hook that only updates the web server when there are changes to the repo, but I didn’t think it was worth it at this point. The json is self-explanatory.
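
The lookup described above is done in bash in the real project; purely to illustrate the idea, here is a small Python sketch of it. The name `default` for the fallback config and the lowercasing of the address are assumptions for this example.

```python
DEFAULT_NAME = "default"  # hypothetical name of the fallback config


def mac_to_name(mac):
    """Turn a MAC address like 'b8:27:eb:12:34:56' into 'b827eb123456'."""
    return mac.replace(":", "").lower()


def pick_config(mac, available):
    """Return the config name for this Pi, or the default if none matches.

    `available` is the collection of config names found on the webserver.
    """
    name = mac_to_name(mac)
    return name if name in available else DEFAULT_NAME


if __name__ == "__main__":
    on_server = {"b827eb123456", "default"}
    print(pick_config("b8:27:eb:12:34:56", on_server))
    print(pick_config("aa:bb:cc:dd:ee:ff", on_server))
```

Keying the files on the MAC address means a new screen needs no manual setup: it picks up the defaults until someone adds a file named after it to the repository.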

You can take a look at the principle here.

https://github.com/mindcandy/pi-screens.git

Plans for the future of this project include a self-service dashboard that will take the ingredients and mix them with the right config without the user necessarily having any coding knowledge.

HTML5 games in Mind Candy

HTML5 games have always been a bit of a grey area. Even with the decline of the Flash Platform, it still felt like web technologies were lagging behind what the Flash and Unity Players could do in the browser.

Over the last year or two this has all changed. Since Steve Jobs declared war on Flash it’s been a bit of a bumpy ride, but with companies such as Google, Mozilla, and Microsoft all getting behind HTML5, and the W3C finally declaring the standard ‘complete’, it suddenly feels like the technology has grown up.

HTML games have also grown up, with Nintendo partnering with Unity and ImpactJS for their Web Framework, as well as the BBC and Nickelodeon investing a lot of money into converting their existing Flash games to create new and exciting experiences for users on a wider range of devices.

Here at Mind Candy we always want to push things and try new technologies, however we also feel like whatever we do try has to work in a real world scenario and while HTML5 has been around for a while, we’ve never felt it a good fit for us until now.

Why HTML5?

With PopJam growing as a platform we always wanted to deliver games to our audience. However, with App Store submission times getting in the way of releasing content frequently, making the games natively within the app was completely out of the window. We also had to support multiple platforms, so we needed something that was write once and deploy across all. This is where HTML5 came in for us.

Cross Platform

One of the huge benefits of using HTML5 was that it is truly cross platform, and while the performance of native will always be far greater, porting the games over to each platform would’ve destroyed us as a team.

When starting out with HTML5 we instantly noticed that even though we were cross platform, there were still hoops we needed to jump through to make things work in the way that we wanted, the main pain point being audio.

As we were targeting a mass of devices we needed to make sure that our games worked on all resolutions and inputs worked as expected, however it felt that once we’d broken this barrier we’d be okay.

iOS provided a UIWebView we could use out of the box, however we decided to use the Crosswalk Project for Android as it allowed us greater control than the one that comes built in to Android.

Fast Iterations

Using HTML5 means we are not bound by the App Store restrictions, meaning we can push new games and updates out incredibly fast. It’s not only deployments that are faster either: one of the most powerful things about making HTML5 games is that all it takes is a link to a page and the game can be played.

With JavaScript there is no compilation time, and you can debug in real time within the browser. This also meant we could develop some pretty cool in house tools that would speed up the development of our games and systems, whilst running within PopJam.

Starting out

One of the things with making HTML5 games is that there are so many things that need to be considered, such as asset loading, memory management, input, physics, 3D, 2D, animations and many more. We had to decide on the best way to deliver our games in the most optimal way possible.

On top of all of these decisions there are also multiple ways to render content within the browser:

  1. Canvas – Started as an experiment by Apple, it is now possibly the most widely supported standard for generating graphics on the web. Using canvas also eliminates a lot of cross compatibility issues that other methods may have. Performance tests on both iOS and Android worked out quite well for us.
  2. WebGL – WebGL offers hardware accelerated graphics within the browser, and on mobile it is really still early days. While iOS implemented full support, it still comes with some very interesting edge cases. Android support for WebGL is a very different world, as we found out when targeting low end devices.
  3. Divs / CSS Transitions – The method of updating divs that are rendered on the page is an interesting one as it allows for nice effects using CSS3 transitions, however the lack of support across mobile browsers and different versions of mobile operating systems was a problem.

We tried all of the above methods and ultimately we ended up utilising all of them, it really came down to the content that was being presented to the user. We used WebGL where we could, and anything that didn’t support it we fell back to Canvas.

Anything that had relatively simple content we ended up manipulating divs and using various methods for transitioning elements to fix cross compatibility issues.

Choosing a Game Engine

One of the things that stood out when looking for a game engine is that there are a lot of them. And not only engines: there are also products out there known as ‘Game Makers’, such as Construct, Game Maker, and Game Salad, that allow you to make games with little to no code. If you’re looking for something to try I can highly recommend this website.

We actually tried a couple of different engines, as well as allowing people who weren’t developers to use the ‘Game Makers’ to prototype ideas and test performance.

After evaluating our choices we decided to use Pixi.js from Goodboy Digital, an incredibly lightweight engine that offers an ActionScript like API as well as many other features such as:

  • Asset management
  • Multi Touch for Mobile
  • Sprite Sheet support
  • Full Scene Graphs
  • Third Party Libraries (Spine, Tiling)

It also allowed us to toggle effortlessly between Canvas and WebGL to allow for support on lower end devices.

Another thing that Pixi has is thorough tutorials, incredible documentation and a very active community, which goes a long way when choosing something like an engine to use, be it for games or software in general.

At the time of writing this article, Pixi have just announced v3 of the engine, and have provided a benchmark test to show off the performance. I would strongly urge you go check it out, even on a low end device it’s pretty impressive.

Tools

One of the things that came as a breath of fresh air when venturing into the world of JavaScript is how far it’s come since I last used it; for the last few years I’ve had my head firmly planted in AS3 and Unity with C#.

With tools such as:

  1. Yeoman – Yeoman allows you to start new projects by choosing from hundreds of generators that have been created for it, so you are able to scaffold new projects quickly whilst prescribing best practices and tools.
  2. Bower – This is one of the most lightweight package managers I’ve used in my career, along with NPM. It allowed us to manage dependencies across projects effectively and to keep our repositories incredibly small.
  3. Grunt – Using Grunt as our build system was one of the best decisions we made, allowing us to move incredibly fast when building our games and to automate a lot of tasks that, done manually, would’ve been incredibly laborious.

We were able to create a solid work flow from starting a project to releasing our content on to PopJam.

As well as using these tools, the JavaScript community is an incredibly talented community with a lot of libraries out there to use that help in every day web development.

It’s not all rosy

As amazing as things have been making games over the past few months, it has not been without its headaches and hair pulling moments, but this is why we love what we do, right? If it wasn’t a challenge then it would be boring.

Targeting multiple platforms comes with its own problems, and some of the biggest problems we had were with the hardware on Android. There are a lot of cheaper low end devices that are prime for parents to buy for their children, and we encountered devices that claimed to support certain features but, when running in the browser, would crash PopJam instantly, leaving us in a state of flux with no logs to go on. We found a lot of this came down to the chipsets that the cheaper devices use.

It wasn’t only Android that caused us problems either. With the iPod Touch 4G being one of the most used devices amongst children, and some only supporting iOS6, we were not able to push performance as much as we wanted, and the iOS6 UIWebView implementation was very temperamental about what standards it supported.

The one thing that caused us the most headaches out of everything though was audio. HTML5 Audio is still very limited, and even more so on some of the cheaper devices, with some only supporting the WAV format, which means larger file sizes; any other format used would cause the whole application to crash as no other codec was available. It is recommended to use the

Conclusion

We’ve had some amazing fun creating some interesting games for PopJam using HTML5, not only because we got to make games but we also got to build some awesome internal technology and tools, create a pipeline from concept to production in just a few months, and most importantly we got to create some engaging experiences for our PopJam users.

 

 

 

Deployments using All the Things!

As we’ve mentioned in previous posts, we use AWS services extensively at Mind Candy. One of the services that we’ve blogged about before is CloudFormation. CloudFormation (CF) lets us template multiple AWS resources for a given product into a single file which can be easily version controlled in our internal Git implementation.

Our standard setup for production is to use CF to create Autoscaling Groups for all EC2 instances where, as Bart posted a while back, we mix and match our usage of on-demand instances and spot priced instances to get the maximum compute power for our money.

During load testing of the backend services of our games we did, however, notice a flaw in the way we’re doing things. Essentially, this was the speed with which we could scale up under rapid traffic surges, such as those generated by feature placement in mobile app stores.

The core problem for us was that our process started with a base Amazon Machine Image (AMI); after initial boot it would then call into Puppet to configure it from the ground up. This meant that a scaling up event could take many minutes to occur – even with SSD-backed instances – which isn’t ideal.

Not only could this take a long time – when it worked – but we were also dependent on third-party repositories being available, or praying that Ruby gem installations actually worked. If a third party was not available then the instances would not even come up, which is a worse position to be in than it just being slow.

The obvious answer to this problem is to cut an AMI of the whole system and use that for scaling up. However, this also poses another problem: you now make your AMI a cliff edge that sits outside of your configuration management system.

This is not a particularly new problem or conundrum of course. I can personally recall quite heated debates in previous companies about the merits of using AMIs versus a configuration management system alone.

We thought about this ourselves too and came to the conclusion that instead of accepting this binary choice we’d split the difference and use both. We achieved this by modularising our deployment process for production and using a number of different tools.

The Tools

TeamCity – we were already using our continuous integration system as the initiator of our non-production deployments, so we decided to leverage all the good stuff we already had there. Crucially, we could let our different product teams deploy their own builds to production and we would just support the process.

Fabric – we’ve been using Fabric for deployments for quite some time already. Thanks to the excellent support for AWS through the Boto library we were easily able to utilise the Amazon API to programmatically determine our environments and services within our Fabric scripts.

Puppet – when you just have one server for a product, using a push deploy method makes sense as it’s quick. However, this doesn’t scale. Bart created a custom Puppet provider that could retrieve a versioned deployment from S3 (pushed via Fabric) so we could pull our code deploys on to remote hosts.

Packer – we opted to use Packer to build our AMIs. With Packer, we could version control our environments and then build a stable image of a fully puppetized host which would also have the latest release of code running at boot, but could still run Puppet as normal as well. This meant we could remove the cliff edge with an AMI because, at the very worst, we would bring up the AMI and then gain anything that was missing, but do so quickly as it was “pre-puppetized”.

CloudFormation – once we had a working AMI we could then update our version controlled templates and poke the Amazon API to update them in CloudFormation. All scaling events would then occur using the new AMI containing the released version of code.
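
To give a feel for the discovery part that Fabric handles, here is a minimal sketch of picking out the running instances of one service. It works on a dictionary shaped like the response of `describe_instances` in boto3 (a newer library than the Boto version mentioned above), so it runs without AWS access. The tag key `service` and the instance IDs are made up for this example; our real Fabric tasks are not shown here.

```python
def running_instance_ids(response, tag_key, tag_value):
    """Return IDs of running instances whose tag `tag_key` equals `tag_value`.

    `response` is expected to look like an EC2 describe_instances result:
    {"Reservations": [{"Instances": [{"InstanceId": ..., "State": ..., "Tags": ...}]}]}
    """
    ids = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") != "running":
                continue  # ignore stopped or terminating hosts
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(tag_key) == tag_value:
                ids.append(instance["InstanceId"])
    return ids


if __name__ == "__main__":
    # Hand-written sample data standing in for a real API response.
    sample = {
        "Reservations": [
            {"Instances": [
                {"InstanceId": "i-0001", "State": {"Name": "running"},
                 "Tags": [{"Key": "service", "Value": "game-backend"}]},
                {"InstanceId": "i-0002", "State": {"Name": "terminated"},
                 "Tags": [{"Key": "service", "Value": "game-backend"}]},
            ]}
        ]
    }
    print(running_instance_ids(sample, "service", "game-backend"))
```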

The Process – when you hit “Run” in TeamCity

  1. Checkout from git the Fabric repo, the Packer repo and the CloudFormation repo.
  2. Pass a config file to Fabric, which runs a task to query the Amazon API and discover our current live infrastructure for a given application/service.
  3. Administratively disable Puppet on the current live infrastructure so Puppet doesn’t deploy code from S3 outside of the deployment process.
  4. Push our new version of code to S3.
  5. Initiate a Packer build, launching an instance and deploying the new code release.
  6. Run some smoke tests on the Packer instance to confirm and validate deployment.
  7. Cut the AMI and capture its ID from the API when it’s complete.
  8. Re-enable and run Puppet on our running infrastructure thus deploying the new code.
  9. Update our CloudFormation template with the new AMI and push the updated template to the CloudFormation API.
  10. Check-in the template change to Git.
  11. Update our Packer configuration file to use the latest AMI as its base image for the next deploy.
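
Steps 9 and 11 both come down to writing the new AMI ID into a JSON file. A minimal sketch of that, assuming the template exposes the AMI through a parameter (called `AmiId` here, a hypothetical name) and that the Packer config uses the `amazon-ebs` builder with its `source_ami` field:

```python
import copy


def set_cf_ami(template, parameter, ami_id):
    """Return a copy of a CloudFormation template with a new parameter default."""
    updated = copy.deepcopy(template)
    updated["Parameters"][parameter]["Default"] = ami_id
    return updated


def set_packer_source_ami(packer_config, ami_id):
    """Return a copy of a Packer config whose amazon-ebs builders start from ami_id."""
    updated = copy.deepcopy(packer_config)
    for builder in updated.get("builders", []):
        if builder.get("type") == "amazon-ebs":
            builder["source_ami"] = ami_id
    return updated


if __name__ == "__main__":
    # Placeholder documents and AMI IDs, not our real templates.
    template = {"Parameters": {"AmiId": {"Type": "String", "Default": "ami-old"}}}
    packer = {"builders": [{"type": "amazon-ebs", "source_ami": "ami-old"}]}
    print(set_cf_ami(template, "AmiId", "ami-new"))
    print(set_packer_source_ami(packer, "ami-new"))
```

Working on copies keeps the originals untouched until the new files are written out and checked in to Git, which makes it easier to bail out if a later step fails.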

What we’ve found with this set-up is, for the most part, a robust means of using Puppet to deploy our code in a controlled manner, and being able to take advantage of all the gains you get when autoscaling from baked AMI images.

Obviously we do run the risk of having a scaling event occur during deployment, however, by linking the AMI cutting process with Puppet we’re yet to experience this edge case, plus all our code deploys are (and should be) backwards compatible, so the edge case doesn’t pose that much of a risk in our set-up.

 

Replicating from AWS RDS MySQL to an external slave (without downtime)

We’ve recently needed to create an external copy of a large database running on Amazon RDS, with minimal or no downtime. The database is a backend to a busy site and our goal was to create a replica in our data centre without causing any disruptions to our users. Amazon adding support for MySQL 5.6 meant that we were able to access the binary logs from an external location, which wasn’t possible before.

As MySQL replication only works from a lower version to an equal or higher version, we had to ensure that both our databases were on MySQL 5.6. This was simple with regards to the external slave but not as easy with the RDS instance, which was on MySQL 5.1. Upgrading the RDS instance would require a reboot after upgrading to each version, i.e. 5.1 -> 5.5 -> 5.6. As per the recommendation in the Amazon upgrade guide we created a read replica and upgraded it to 5.6. With the replica synced up, we needed to enable automated backups before it was in a state where it could be used as a replication source.

Creating an initial database dump proved tricky, as the actual time to create the backup was around 40-50 minutes. The import time into the external slave was around 3-4 hours and, with the site being as active as it is, the binary log file and position change pretty quickly. The best option would be to stop the RDS slave while the backup is happening. Due to the permissions given to the ‘master’ user by Amazon, running a STOP SLAVE command would return an error:

ERROR 1045 (28000): Access denied for user 'admin'@'%' (using password: YES)

Luckily there’s a stored procedure which can be used to stop replication: mysql.rds_stop_replication

mysql> CALL mysql.rds_stop_replication;
+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.08 sec)

Query OK, 0 rows affected (1.08 sec)

With replication on the RDS slave stopped, we can start creating the backup assured that no changes will be made during the process and any locking of tables won’t affect any users browsing the website.

Once the backup completes, we’d want to start up replication again, but before doing this we can get the binlog file name and position:

mysql> show master status;
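+----------------------------+----------+--------------+------------------+-------------------+
| File                       | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+----------------------------+----------+--------------+------------------+-------------------+
| mysql-bin-changelog.074036 | 11653042 |              |                  |                   |
+----------------------------+----------+--------------+------------------+-------------------+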

This will be required when setting up the external slave later on. Now that we have the relevant information we can start the replication. Again, we need to use the RDS equivalent of START SLAVE:

mysql> CALL mysql.rds_start_replication;
+-------------------------+
| Message                 |
+-------------------------+
| Slave running normally. |
+-------------------------+

 

Once the dump has been imported we can set the new master on the external slave with the values previously recorded:

CHANGE MASTER TO MASTER_HOST='AWS_RDS_SLAVE', MASTER_PASSWORD='SOMEPASS', MASTER_USER='REPL_USER', MASTER_LOG_FILE='mysql-bin-changelog.074036', MASTER_LOG_POS=11653042;
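
If you would rather not copy the file name and position by hand, the recorded `show master status` output can be turned into the statement above with a few lines of Python. This is only a sketch: the function names are made up, and the values are placed into the statement without any escaping, so it is only suitable for values you typed in yourself.

```python
def parse_master_status(table):
    """Extract (file, position) from the text of a `show master status` table."""
    rows = [line.strip() for line in table.splitlines() if line.strip().startswith("|")]
    # rows[0] is the header row, rows[1] is the single data row
    cells = [cell.strip() for cell in rows[1].strip("|").split("|")]
    return cells[0], int(cells[1])


def change_master_sql(host, user, password, log_file, log_pos):
    """Build the CHANGE MASTER TO statement for the external slave."""
    return (
        "CHANGE MASTER TO MASTER_HOST='{0}', MASTER_USER='{1}', "
        "MASTER_PASSWORD='{2}', MASTER_LOG_FILE='{3}', MASTER_LOG_POS={4};"
    ).format(host, user, password, log_file, log_pos)


if __name__ == "__main__":
    recorded = """
    +------+----------+
    | File | Position |
    +------+----------+
    | mysql-bin-changelog.074036 | 11653042 |
    +------+----------+
    """
    log_file, log_pos = parse_master_status(recorded)
    print(change_master_sql("AWS_RDS_SLAVE", "REPL_USER", "SOMEPASS", log_file, log_pos))
```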

Before we start the replication, we need to add a few more settings to the external slave’s my.cnf:

  • a unique server-id, i.e. one that’s not being used by any of the other MySQL DBs
  • the database(s) you want to replicate, set with replicate-do-db. This stops the slave trying to replicate the mysql database and other potential RDS related stuff. Thanks to Phil for picking that up.

So something like:

server-id = 739472
replicate-do-db=myreplicateddb
# only if more than one db needs to be replicated
replicate-do-db=mysecondreplicateddb

Start up replication on the external slave: START SLAVE;

This should start updating the slave, which you can monitor via:

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Log_File: mysql-bin-changelog.075782
Read_Master_Log_Pos: 113973
Relay_Log_File: relaylog.014354
Relay_Log_Pos: 114089
Relay_Master_Log_File: mysql-bin-changelog.074036
Slave_SQL_Running_State: updating

The above values are the most important from the sea of information that the command returns. You’ll be waiting for Master_Log_File and Relay_Master_Log_File to be identical and for Slave_SQL_Running_State to have a status of “Slave has read all relay log; waiting for the slave I/O thread to update it”.
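
Rather than eyeballing the output, the same check can be scripted. The sketch below parses the text of `show slave status\G` and applies the two conditions just described; the function names are made up for this example, and how you get the output text (for instance by piping it from the mysql client) is up to you.

```python
def parse_slave_status(output):
    """Turn `show slave status\\G` text into a dict of field name to value."""
    fields = {}
    for line in output.splitlines():
        if ":" not in line:
            continue  # skips blank lines and the '*** 1. row ***' separator
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_caught_up(fields):
    """True when both log file names match and the SQL thread has drained the relay log."""
    master_file = fields.get("Master_Log_File")
    same_file = master_file is not None and master_file == fields.get("Relay_Master_Log_File")
    state = fields.get("Slave_SQL_Running_State", "")
    return same_file and state.startswith("Slave has read all relay log")


if __name__ == "__main__":
    sample = """
    Master_Log_File: mysql-bin-changelog.075782
    Relay_Master_Log_File: mysql-bin-changelog.074036
    Slave_SQL_Running_State: updating
    """
    print(is_caught_up(parse_slave_status(sample)))
```

With the sample values shown earlier the slave is still catching up, so the check reports that it is not yet in sync.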

Once that syncs up, an external replica has been created with zero downtime!