Office music, some love it and some hate it. While I’m in the camp that’s for office music I can completely understand why some might not be in favour of it.
We here at Mind Candy find music in the workplace to be a mood enhancement and, in a way, a bonding process. You find similarities between yourself and your peers and generate links that weren’t there previously. Music helps reduce those awkward silences filled with keyboard tapping, mouse clicking and the odd coughing fit, and introduces an atmosphere which is conducive to the culture we look to nurture and promote. There are a few great articles out there which go into greater detail about whether music in the workplace is a good or bad thing; some can be found here.
Last year we started looking into a solution for playing music in the area where our team sits, and after some search engine fu we found Mopidy. Mopidy is an extensible MPD and HTTP server written in Python. It plays music from your local disk and radio streams and, with the help of extensions, can also play music from cloud services such as Spotify, SoundCloud and Google Play Music.
As we already have a few Spotify accounts we thought we’d toy with the idea of using Mopidy to play music from Spotify. In order to use Spotify you also need to use the Mopidy-Spotify extension.
Once we had both Mopidy and the Spotify extension working we then needed something to interact with it all. After looking through the Mopidy documentation we came across the web extensions section which suggests various web interfaces to interact with the HTTP side of the Mopidy server.
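For anyone curious what those web interfaces are doing under the hood, here is a minimal sketch of talking to the HTTP side of Mopidy from Python using its JSON-RPC endpoint. It assumes the HTTP frontend is enabled on Mopidy's default port 6680; the hostname musicpi.local and the helper names are made up for illustration and are not part of our setup.

```python
import json
import urllib.request

# Assumed address of the Mopidy server; 6680 is Mopidy's default HTTP port.
MOPIDY_RPC_URL = "http://musicpi.local:6680/mopidy/rpc"


def build_rpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 payload for a Mopidy core API method."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload


def call_mopidy(method, params=None, url=MOPIDY_RPC_URL):
    """POST a JSON-RPC call to Mopidy and return the 'result' field."""
    body = json.dumps(build_rpc_request(method, params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["result"]
```

With a server running, call_mopidy("core.playback.get_state") should report whether anything is playing, and call_mopidy("core.playback.next") skips to the next track.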
Initially we used Apollo Player. Apollo Player’s great as it allows anyone to log in using their Google Apps or Twitter credentials and then add music to a one time playlist meaning anyone can choose what music is playing. There is also a bombing feature so any music that’s been added can be skipped if bombed by three people. When no music has been selected it will default back to a playlist set in config.js which is found in the root directory of Apollo. The problem there is that once the default playlist has been played for the umpteenth time it can get pretty tedious and only people with access to the app’s root directory can change this. This led us to Mopify.
Mopify gives you much of the functionality that the Spotify client gives you e.g. Browse, Featured Playlists, New Releases, Playlists and Stations. You can log in with your own Spotify account or use the account that Mopidy-Spotify is utilising and use the playlists associated with either account. It gives you greater functionality and options than Apollo but then you lose the collaboration and unmanaged element you had with Apollo.
Finally, we needed to actually run Mopidy on something, as it was no good having it run from my local machine. We decided to use a Raspberry Pi and plugged it into some speakers running along the cable trays above our heads. The Raspberry Pi is running Raspbian with Mopidy, Mopidy-Spotify and whichever web extension we’ve chosen. Another Raspberry Pi with Mopidy has been set up as a jukebox in our chillout/games area, which works really well with mobile devices due to most of the web extensions being bootstrapped. This gives employees the flexibility to easily play whatever music they feel like when they are in the communal area.
In our eyes, while music in the office isn’t a necessity, it is definitely beneficial, and it’s fantastic that all these open source tools and products give us the ability to do this.
And let’s be honest, who can resist an impromptu sing-along to Bohemian Rhapsody!
Mopidy - Extensible music server written in Python
Screen Automation – Selenium (and some other stuff), meets Raspberry Pi
Let’s set the scene: you need to display some stuff on a screen so everyone in the office can see it. Easy: you mount a couple of TVs on the wall, get a DVI splitter and an old Mac mini you had in the store room on the top shelf behind a roll of Cat5 cable.
Set everything up, get the Mac mini to auto login and mount a shared drive, then run a little script that uses Selenium to open a browser and show pre-determined images of the stuff you want to display, all stored on the same shared drive. Done…
Fast forward a couple of years and you now have a lot more to display on a lot more screens, but what are you going to do? It’s impractical – and expensive – to buy a bunch of Mac minis just to run a script that opens a web browser. The end goal of all this is to have dashboards that are easily manageable by their respective teams.
Have you heard of this new Raspberry Pi thing? It’s a small ARM PC that’s the size of a credit card, and they’re cheap. What, they’re also USB powered? Bonus: now we can just power them from the TV itself, and when the TV comes on the Pi comes on. Now we just replace the Mac mini with the Pi, run the same script when it boots and we’re all done. Wait, not so fast: the share isn’t public, so we need credentials to connect. That’s OK, we can store them in a file locally and use fstab to connect. Yeah, that works, but we want to display different things on different screens, so now I have to create different scripts and manually tell each Pi which one to use. OK, that’s not too bad: the first time you set up each one, just point it to the script it needs to run, and then you can just update the script and reboot the Pi. So far it’s shaky but it works, sometimes. One of the problems was that sometimes it would try to run the script on the network share before it was mounted properly, and running a script (or multiple at this point) over the network on a device with the processing power of about 7.4 hamsters isn’t really going to cut it. I’m getting tired of crowbarring fixes into something that wasn’t really designed for this use and troubleshooting seemingly random issues.
What do I actually want to accomplish here and how am I going to do it??
Have the script run locally; it’s only managing a web browser after all.
Config easily changeable and centrally managed.
Get the pi to check for new config on startup.
Done, yes that’s it pretty simple, so here’s what I did.
json file. Lists the pages that the web browser should visit. Could also be local files loaded into the browser images etc.
python script. Loads the json ‘config’ and specifies how long each page should be displayed etc and does a bit of error checking.
Git (or other) repository
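To give a feel for the first two pieces, here is a minimal sketch of a JSON config and the Python that loads it. The key names ("pages", "url", "duration"), the 30 second default and the example URLs are assumptions for illustration rather than the exact format I used.

```python
import json

DEFAULT_DURATION = 30  # seconds per page; an assumed default


def load_pages(config_text):
    """Parse the JSON config and return a list of (url, seconds) tuples."""
    config = json.loads(config_text)
    pages = []
    for entry in config.get("pages", []):
        url = entry.get("url")
        if not url:
            # Skip malformed entries rather than crash the display.
            continue
        duration = entry.get("duration", DEFAULT_DURATION)
        if not isinstance(duration, (int, float)) or duration <= 0:
            duration = DEFAULT_DURATION
        pages.append((url, duration))
    if not pages:
        raise ValueError("config contains no usable pages")
    return pages


# Hypothetical config: one web page, one local image, one broken entry.
EXAMPLE_CONFIG = """
{
  "pages": [
    {"url": "http://dashboard.example/builds", "duration": 60},
    {"url": "file:///opt/scripts/images/rota.png"},
    {"duration": 10}
  ]
}
"""
```

The display loop is then just a matter of cycling over the result of load_pages() with a Selenium driver, calling driver.get(url) and sleeping for the given number of seconds before moving on.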
Edit your rc.local to run a bash script that lives somewhere locally on the Pi, e.g. /opt/scripts/. The bash script downloads Selenium, Firefox (actually Iceweasel on Debian) and Facter (so we can get info really quickly).
(I did consider using Puppet for this whole thing at one point, but that was a bit of overkill, plus it had its own complications at the time trying to run on an ARM processor.)
The bash script also uses Facter to determine the MAC address of the Pi and remove the colons. (I must admit that Facter may be a bit overkill here as well but hey, I’ve gotten used to having it around.) It then searches your web server (or other location) for files carrying its MAC address as a name (I have a set of defaults that it uses if none are found). Have your web server run a cron job that pulls the repository of all your files. You could have each device pull the repository directly, but the more screens you have the more inefficient that will be, as you’ll be storing a whole repo on the Pi just to get at one or two files. You could also have a web hook that only updates the web server when there are changes to the repo, but I didn’t think it was worth it at this point. The JSON is self-explanatory.
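The MAC address lookup with a fallback to defaults can be sketched in a few lines. My version is a bash script using Facter, so treat this Python as a stand-in: the base URL, the ".json" suffix and the "default" file name are all assumptions for illustration.

```python
import uuid

# Assumed base URL of the web server that serves the checked-out repo.
CONFIG_BASE_URL = "http://config.example/screens"
DEFAULT_NAME = "default"  # assumed name of the fallback config


def mac_to_name(mac):
    """Turn 'B8:27:EB:12:34:56' into 'b827eb123456'."""
    return mac.replace(":", "").replace("-", "").lower()


def local_mac():
    """Best-effort MAC address of this machine, formatted with colons."""
    node = uuid.getnode()
    return ":".join(
        "{:02x}".format((node >> shift) & 0xFF) for shift in range(40, -8, -8)
    )


def candidate_urls(mac, base_url=CONFIG_BASE_URL):
    """URLs to try in order: the device-specific file, then the default."""
    return [
        "{}/{}.json".format(base_url, mac_to_name(mac)),
        "{}/{}.json".format(base_url, DEFAULT_NAME),
    ]


def pick_config(mac, available, base_url=CONFIG_BASE_URL):
    """Return the first candidate URL present in `available`, or None."""
    for url in candidate_urls(mac, base_url):
        if url in available:
            return url
    return None
```

In real use the `available` set would be replaced by an HTTP request for each candidate, taking the first one that does not come back as a 404.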
HTML5 games have always been a bit of a grey area: with the decline of the Flash Platform, it still felt like web technologies were lagging behind what the Flash and Unity players could do in the browser.
Over the last year or two this has all changed. Since Steve Jobs declared war on Flash it’s been a bit of a bumpy ride, but with companies such as Google, Mozilla and Microsoft all getting behind HTML5, and the W3C finally declaring the standard ‘complete’, it suddenly feels like the technology has grown up.
HTML Games have also grown up, with Nintendo partnering with Unity and ImpactJS for their Web Framework, as well as the BBC and Nickelodeon investing a lot of money in to converting their existing Flash games to create new and exciting experiences for users on a wider range of devices.
Here at Mind Candy we always want to push things and try new technologies, however we also feel like whatever we do try has to work in a real world scenario and while HTML5 has been around for a while, we’ve never felt it a good fit for us until now.
With PopJam growing as a platform we always wanted to deliver games to our audience. However, with App Store submission times being what they are, releasing content frequently by making the games natively within the app was completely out of the window. We also had to support multiple platforms, so we needed something that was write once and deploy across all. This is where HTML5 came in for us.
One of the huge benefits of using HTML5 was that it is truly cross platform, and while the performance of native will always be far greater, porting the games over to each platform would’ve destroyed us as a team.
When starting out with HTML5 we instantly noticed that even though we were cross platform, there were still hoops we needed to jump through to make things work in the way that we wanted, the main pain point being audio.
As we were targeting a mass of devices we needed to make sure that our games worked on all resolutions and inputs worked as expected, however it felt that once we’d broken this barrier we’d be okay.
iOS provided a UIWebView we could use out of the box, however we decided to use the Crosswalk Project for Android as it allowed us greater control than the one that comes built in to Android.
Using HTML5 means we were not bound by the App Store restrictions, meaning we can push new games and updates out incredibly fast. It’s not only deployments that are faster either: one of the most powerful things about making HTML5 games is that a game is just a link to a page, and it can be played.
One of the things with making HTML5 games is that there are so many things that need to be considered, such as asset loading, memory management, input, physics, 3D, 2D, animations and many more, we had to decide on the best way to deliver our games in the most optimal way possible.
On top of all of these decisions there are also multiple ways to render content within the browser:
Canvas – Started as an experiment by Apple, it is now possibly the most widely supported standard for generating graphics on the web. Using canvas also eliminates a lot of cross compatibility issues that other methods may have. Performance tests on both iOS and Android worked out quite well for us.
WebGL – WebGL offers hardware accelerated graphics within the browser, and on mobile it is really still early days. While iOS implemented full support, it still comes with some very interesting edge cases. Android support for WebGL is a very different world, as we found out when targeting low end devices.
Divs / CSS Transitions – The method of updating divs that are rendered on the page is an interesting one as it allows for nice effects using CSS3 transitions; however, the lack of support across mobile browsers and different versions of mobile operating systems was a problem.
We tried all of the above methods and ultimately we ended up utilising all of them, it really came down to the content that was being presented to the user. We used WebGL where we could, and anything that didn’t support it we fell back to Canvas.
Anything that had relatively simple content we ended up manipulating divs and using various methods for transitioning elements to fix cross compatibility issues.
Choosing a Game Engine
One of the things that stood out when looking for a game engine is that there are a lot of them, and not only engines: there are also products out there known as ‘Game Makers’ that allow you to make games with little to no code, such as Construct, Game Maker and Game Salad. If you’re looking for something to try I can highly recommend this website.
We actually tried a couple of different engines, as well as allowing people who weren’t developers to use the ‘Game Makers’ to prototype ideas and test performance.
After evaluating our choices we decided to use Pixi.js from Goodboy Digital, an incredibly lightweight engine that offers an ActionScript like API as well as many other features such as:
Multi Touch for Mobile
Sprite Sheet support
Full Scene Graphs
Third Party Libraries (Spine, Tiling)
It also allowed us to toggle effortlessly between Canvas and WebGL to allow for support on lower end devices.
Another thing that Pixi has is thorough tutorials, incredible documentation and a very active community which goes a long way when choosing something like an engine to use be it for games or software in general.
At the time of writing this article, Pixi have just announced v3 of the engine, and have provided a benchmark test to show off the performance. I would strongly urge you go check it out, even on a low end device it’s pretty impressive.
With tools such as:
Yeoman – Yeoman allows you to start new projects by choosing from hundreds of generators that have been created for it, so you are able to scaffold new projects quickly whilst prescribing best practices and tools.
Bower – This is one of the most lightweight package managers, along with NPM, that I’ve used in my career. It allowed us to manage dependencies across projects effectively and to keep our repositories incredibly small.
Grunt – Using Grunt as our build system was one of the best decisions we made, allowing us to move incredibly fast when building our games and to automate a lot of tasks that, done manually, would’ve been incredibly laborious.
We were able to create a solid work flow from starting a project to releasing our content on to PopJam.
It’s not all rosy
As amazing as things have been making games over the past few months, it has not been without its headaches and hair pulling moments, but this is why we love what we do, right? If it wasn’t a challenge then it would be boring.
Targeting multiple platforms comes with its own problems, and some of the biggest problems we had were with the hardware on Android. As there are a lot of cheaper low end devices that are prime for parents to buy for their children, we encountered devices claiming they supported certain features which, when running in the browser, would crash PopJam instantly, leaving us in a state of flux with no logs to go on. We found a lot of this came down to the chipsets that the cheaper devices use.
It wasn’t only Android that caused us problems either. With the iPod Touch 4G being one of the most used devices amongst children, and some only supporting iOS 6, we were not able to push performance as much as we wanted, and the iOS 6 UIWebView implementation was very temperamental about what standards it supported.
The one thing that caused us the most headaches out of everything, though, was audio. HTML5 Audio is still very limited, even more so on some of the cheaper devices, with some only supporting the WAV format, which means larger file sizes; any other format would cause the whole application to crash as no other codec was available. It is recommended to use the
We’ve had some amazing fun creating some interesting games for PopJam using HTML5, not only because we got to make games but we also got to build some awesome internal technology and tools, create a pipeline from concept to production in just a few months, and most importantly we got to create some engaging experiences for our PopJam users.
As we’ve mentioned in previous posts, we use AWS services extensively at Mind Candy. One of the services that we’ve blogged about before is CloudFormation. CloudFormation (CF) lets us template multiple AWS resources for a given product into a single file which can be easily version controlled in our internal Git implementation.
Our standard setup for production is to use CF to create Autoscaling Groups for all EC2 instances where, as Bart posted a while back, we mix and match our usage of on-demand instances and spot priced instances to get the maximum compute power for our money.
During load testing of the backend services of our games we did, however, notice a flaw in the way we’re doing things. Essentially, this was the speed with which we could scale up under rapid traffic surges, such as those generated by feature placement in mobile app stores.
The core problem for us was that our process started with a base Amazon Machine Image (AMI); after initial boot it would then call into Puppet to configure the instance from the ground up. This meant that a scaling up event could take many minutes to occur – even with SSD-backed instances – which isn’t ideal.
Not only could this take a long time – when it worked – but we were also dependent on third-party repositories being available, or praying that Ruby gem installations actually worked. If a third party was not available then the instances would not even come up, which is a worse position to be in than it just being slow.
The obvious answer to this problem is to cut an AMI of the whole system and use that for scaling up. However, this poses another problem: you now make your AMI a cliff edge that sits outside of your configuration management system.
This is not a particularly new problem or conundrum of course. I can personally recall quite heated debates in previous companies about the merits of using AMIs versus a configuration management system alone.
We thought about this ourselves too and came to the conclusion that instead of accepting this binary choice we’d split the difference and use both. We achieved this by modularising our deployment process for production and using a number of different tools.
Teamcity – we were already using our continuous integration system as the initiator of our non-production deployments so we decided to leverage all the good stuff we already had there and, crucially, we could let our different product teams deploy their own builds to production and we would just support the process.
Fabric – we’ve been using Fabric for deployments for quite some time already. Thanks to the excellent support for AWS through the Boto library we were easily able to utilise the Amazon API to programmatically determine our environments and services within our Fabric scripts.
Puppet – when you just have one server for a product, using a push deploy method makes sense as it’s quick. However, this doesn’t scale. Bart created a custom Puppet provider that could retrieve a versioned deployment from S3 (pushed via Fabric) so we could pull our code deploys on to remote hosts.
Packer – we opted to use Packer to build our AMIs. With Packer, we could version control our environments and then build a stable image of a fully puppetized host which would also have the latest release of code running at boot, but could still run Puppet as normal as well. This meant we could remove the cliff edge with an AMI, because, at the very worst we would bring up the AMI and then gain anything that was missing but do so quickly as it was “pre-puppetized”.
Cloudformation – Once we had a working AMI we could then update our version controlled templates and poke the Amazon API to update them in CloudFormation. All scaling events would then occur using the new AMI containing the released version of code.
The Process – when you hit “Run” in Teamcity
Checkout from git the Fabric repo, the Packer repo and the Cloudformation repo.
Using a config file passed to Fabric, run a task to query the Amazon API and discover our current live infrastructure for a given application/service.
Administratively disable Puppet on the current live infrastructure so Puppet doesn’t deploy code from S3 outside of the deployment process.
Push our new version of code to S3.
Initiate a Packer build, launching an instance and deploying the new code release.
Run some smoke tests on the Packer instance to confirm and validate deployment.
Cut the AMI and capture its ID from the API when it’s complete.
Re-enable and run Puppet on our running infrastructure thus deploying the new code.
Update our Cloudformation template with the new AMI and push the updated template to the CloudFormation API.
Check-in the template change to Git.
Update our Packer configuration file to use the latest AMI as its base image for the next deploy.
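The two "update with the new AMI" steps near the end of the process are easy to picture in code. The following is a minimal sketch, not our actual Fabric tasks: it assumes the CloudFormation template is JSON with the AMI set directly as the ImageId of an AWS::AutoScaling::LaunchConfiguration resource, and that the Packer config is a JSON file using the amazon-ebs builder with a source_ami key. The function names are invented for the example.

```python
import copy

LAUNCH_CONFIG_TYPE = "AWS::AutoScaling::LaunchConfiguration"


def update_template_ami(template, new_ami):
    """Return a copy of a CloudFormation template (as a dict) with every
    launch configuration pointed at new_ami."""
    updated = copy.deepcopy(template)
    for resource in updated.get("Resources", {}).values():
        if resource.get("Type") == LAUNCH_CONFIG_TYPE:
            resource.setdefault("Properties", {})["ImageId"] = new_ami
    return updated


def update_packer_base(packer_config, new_ami):
    """Return a copy of a Packer config (as a dict) whose amazon-ebs
    builders use new_ami as the base image for the next build."""
    updated = copy.deepcopy(packer_config)
    for builder in updated.get("builders", []):
        if builder.get("type") == "amazon-ebs":
            builder["source_ami"] = new_ami
    return updated
```

Both functions return new dicts rather than editing in place, so the result can be dumped back to disk with json.dumps and checked in to Git as a clean diff. If your template passes the AMI in as a parameter instead, the same idea applies to the parameter's default value.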
What we’ve found with this set-up is, for the most part, a robust means of using Puppet to deploy our code in a controlled manner, and being able to take advantage of all the gains you get when autoscaling from baked AMI images.
Obviously we do run the risk of having a scaling event occur during deployment, however, by linking the AMI cutting process with Puppet we’re yet to experience this edge case, plus all our code deploys are (and should be) backwards compatible, so the edge case doesn’t pose that much of a risk in our set-up.
We’ve recently needed to create an external copy of a large database running on Amazon RDS, with minimal or no downtime. The database is a backend to a busy site and our goal was to create a replica in our data centre without causing any disruptions to our users. With Amazon adding support for MySQL 5.6 this meant that we’re able to access the binary logs from an external location, which wasn’t possible before.
As MySQL replication only works from a lower version to an equal or higher version, we had to ensure that both our databases were on MySQL 5.6. This was simple with regards to the external slave but not as easy with the RDS instance, which was on MySQL 5.1. Upgrading the RDS instance would require a reboot after upgrading to each version i.e. 5.1 -> 5.5 -> 5.6. As per the recommendation in the Amazon upgrade guide we created a read replica and upgraded it to 5.6. With the replica synced up, we needed to enable automated backups before it was in a state where it could be used as a replication source.
Creating an initial database dump proved tricky, as the actual time to create the backup was around 40-50 minutes. The import time into the external slave was around 3-4 hours and, with the site being as active as it is, the binary log file and position change pretty quickly. The best option would be to stop the RDS slave while the backup is happening. Due to the permissions given to the ‘master’ user by Amazon, running a STOP SLAVE command returns an access denied error:
ERROR 1045 (28000): Access denied for user 'admin'@'%' (using password: YES)
Amazon provides a stored procedure on RDS that can be called to stop replication instead:
mysql> CALL mysql.rds_stop_replication;
+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.08 sec)

Query OK, 0 rows affected (1.08 sec)
With replication on the RDS slave stopped, we can start creating the backup assured that no changes will be made during the process and any locking of tables won’t affect any users browsing the website.
Once the backup completes we want to start replication up again, but before doing this we can get the binlog file and position:
Once the dump has been imported we can set the new master on the external slave with the values previously recorded:
CHANGE MASTER TO MASTER_HOST='AWS_RDS_SLAVE', MASTER_PASSWORD='SOMEPASS', MASTER_USER='REPL_USER', MASTER_LOG_FILE='mysql-bin-changelog.074036', MASTER_LOG_POS=11653042;
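If you are scripting this rather than pasting it into a MySQL prompt, it can help to build the statement from the recorded values so the quoting is always plain single quotes (curly quotes from a copy and paste will be rejected by MySQL). This is a small sketch of the idea; the helper names are invented here and the values are the placeholders from the statement above.

```python
def quote(value):
    """Quote a string literal for MySQL using plain single quotes."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def change_master_sql(host, user, password, log_file, log_pos):
    """Build a CHANGE MASTER TO statement from the recorded binlog values."""
    return (
        "CHANGE MASTER TO "
        "MASTER_HOST={}, MASTER_USER={}, MASTER_PASSWORD={}, "
        "MASTER_LOG_FILE={}, MASTER_LOG_POS={};"
    ).format(
        quote(host), quote(user), quote(password), quote(log_file), int(log_pos)
    )
```

Casting the position with int() also means a stray space or a non-numeric value fails loudly before anything is sent to the slave.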
Before we start the replication, we need to add a few more settings to the external slave’s my.cnf:
a unique server-id i.e. one that’s not being used by any of the other mysql DBs
the database(s) you want to replicate with replicate-do-db. This stops the slave trying to replicate the mysql table and other potential RDS related stuff. Thanks to Phil for picking that up.
So something like:
server-id = 739472
replicate-do-db=mysecondreplicateddb (if more than one db needs to be replicated)
Start up replication on the external slave – START SLAVE;
This should start updating the slave, which you can monitor via
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
The above values are the most important from the sea of information that the command returns. You’ll be waiting for Master_Log_File and Relay_Master_Log_File to be identical, and for Slave_SQL_Running_State to have a status of “Slave has read all relay log; waiting for the slave I/O thread to update it”.
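Rather than eyeballing the output every few minutes, the same check can be written down as a small function. This is only a sketch of the two conditions described above: it parses the vertical "Key: value" output of the status command as text, and the function names are made up for the example.

```python
CAUGHT_UP_STATE = (
    "Slave has read all relay log; waiting for the slave I/O thread to update it"
)


def parse_slave_status(text):
    """Parse the 'Key: value' lines of the vertical slave status output."""
    status = {}
    for line in text.splitlines():
        # Skip the '*** 1. row ***' separator and anything without a key.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def is_caught_up(status):
    """True once the log files match and the SQL thread has drained the relay log."""
    master_file = status.get("Master_Log_File")
    return (
        master_file is not None
        and master_file == status.get("Relay_Master_Log_File")
        and status.get("Slave_SQL_Running_State") == CAUGHT_UP_STATE
    )
```

Feed it the output of the mysql client run with the status command and poll until is_caught_up() returns True.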
Once that syncs up, an external replica has been created with zero downtime!
In the beginning Clonezilla was used for imaging the Macs and for asset management we solely used GLPI. Having come from a Windows background with a little bit of Debian and Red Hat experience, this lack of control and visibility of the machines you were supporting was completely alien to me. Soon after joining I was given the task of reviewing our imaging process and looking to see what other technologies were available. Along came DeployStudio.
DeployStudio is a fantastic deployment tool which allows you to image and deploy Macs and PCs. As opposed to just creating and deploying images it allows you to create advanced workflows and fully automate your deployment from start to finish. In order to run DeployStudio you will need OS X Server for NetBoot deployments. We initially installed DeployStudio on a Mac Mini, which ran fine, but then set up another DeployStudio server on a second Mac Mini as a replica for redundancy purposes.
We didn’t want to get into the habit of creating multiple images for different models of machines but instead have one vanilla image with varying workflows depending on the purpose of the Mac it will be deployed to. In order to create a vanilla image you download the latest Mac OS X installer via the App Store and then use AutoDMG to build a system image ready for use with the latest updates. Once we have our base image we can then look to create a workflow. This workflow will then apply all the changes needed over the top of the vanilla image such as localising the Mac, setting the computer name, creating the local users, installing necessary packages, running scripts and lastly it then runs software update.
We were then left with the problem of how to preload and manage software on all our Macs. DeployStudio can install packages, but they quickly become out of date and you aren’t able to manage the software already installed on the machines on your network.
Munki is a package management system which allows you to distribute software via the use of a web server and client side tools. Using catalogs and manifests you can install and remove software on your clients depending on what manifest has been applied to what machine. Munki also rather handily collects information on the clients and stores it on the Munki server each time Munki runs such as the user, hostname, IP, failed Munki runs, installed software etc.
The problem we found is that while Munki works well, the software uploaded to the Munki server will soon become out of date and managing the repository of software becomes a task in itself. Here’s where AutoPkg and Jenkins come in. AutoPkg is an automation framework for OS X software packaging and distribution and Jenkins is a continuous integration server. AutoPkg uses recipes to download the latest versions of software and create the relevant metadata ready for importing to the Munki server via Munki tools. Jenkins is then used to schedule the AutoPkg recipes and Munki import as builds every day in the early hours of the morning. The end product is Munki containing the most up-to-date software that you’d expect on all of your clients e.g. Chrome, Firefox, Munki tools, Antivirus, Mac Office.
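As a rough illustration of what the nightly Jenkins build boils down to, here is a sketch that assembles the commands to run. The recipe names and the repo path are assumptions for the example, and it only builds the command lists; it leaves actually running them (e.g. via subprocess) to the build itself.

```python
MAKECATALOGS = "/usr/local/munki/makecatalogs"  # where Munki tools install it


def autopkg_command(recipes):
    """Command line that runs the given AutoPkg recipes in one go."""
    recipes = list(recipes)
    if not recipes:
        raise ValueError("at least one recipe is required")
    return ["autopkg", "run"] + recipes


def nightly_commands(recipes, munki_repo):
    """Commands for one nightly build: run the recipes, then rebuild the
    Munki catalogs so clients see the newly imported versions."""
    return [autopkg_command(recipes), [MAKECATALOGS, munki_repo]]
```

Keeping the recipe list in version control alongside this makes it obvious which software is being kept up to date automatically.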
After playing with Munki and being really impressed with it, I decided to see what else I could find that would potentially hook into it. After some searching I found Sal, which is the creation of a guy called Graham Gilbert. Graham is an active contributor in the Mac management world and I personally am a big advocate of his; you can find his GitHub page here. Sal is a Munki and Puppet reporting tool which gives you not only a broad overview of your Macs via a dashboard but also an in-depth look at each client via the stats collected by Munki and Facter.
Once Sal has been configured on the client, each time Munki runs it triggers a sal submit script which gathers information and stores it on the Sal server. The highly customisable Sal dashboard uses plugins to display high level counters to then display for example which machine is running on which version of OS X or when the clients last checked in. There are custom plugins available on GitHub or you can even create your own. If you wanted to you could rearrange the order of the plugins and hide certain plugins if you had multiple dashboards. Due to the use of Munki and Facter it gives you the ability to drill down into an individual client and retrieve hourly status updates as well as static information about the machine.
In terms of Asset management not much has changed as GLPI is fit for purpose so we’ve had no need to replace it however, we do now use Meraki in conjunction with it.
Meraki is an MDM tool which we initially used for our mobile devices, but as Meraki grew their Mac and Windows management tools improved. By default all machines are now managed by Meraki as well as the other tools mentioned previously. With Meraki you can use their extremely powerful tools such as location tracking, remote desktop, viewing network stats, fetching process lists, sending notifications etc. While on a daily basis they aren’t necessarily needed, they do come in handy for those odd situations.
Via all of the technologies above we now have an imaging process which takes up to 15 minutes with only 30 seconds of actual input. In that imaging process you have a DeployStudio workflow which images the Mac, configures it for the individual user, runs software update and installs Munki/Sal. Once booted Munki then runs and installs all the up-to-date software (thanks to AutoPKG & Jenkins) that the client’s manifest has specified. Going forward you then have monitoring (Sal) and management (Munki & Meraki) of that Mac which then gives you control and visibility of the machine.
The best part is this hasn’t cost us a penny! This is thanks to the ever passionate and generous open source community. See below for the fantastic folks who created the amazing tools I mentioned in the post.
AutoDMG – Builds a system image, suitable for deployment with DeployStudio
The London PostgreSQL Group meetup is an unofficial PostgreSQL community event happening quarterly. The meetup agenda is very relaxed but it always involves a lot of good PostgreSQL discussions over some pizza and beer.
The event is always open to everyone and usually announced well in advance through meetup.com website — http://www.meetup.com/London-PostgreSQL-Meetup-Group
Mind Candy had the pleasure of hosting the meetup on 21 January 2015. We actually had a record attendance, which was awesome; thank you to everyone who came!
We had two really good talks. The first was a joint talk by Howard Rolph & Giovanni Ciolli about key features of the recently released PostgreSQL 9.4, followed by an awesome talk by Rachid Belaid about full-text search capabilities (with a proper deep-dive into technical details and how to do it) in PostgreSQL. Apparently you don’t really need to build a totally separate Elasticsearch cluster if you want to store documents and perform most usual operations on them; Postgres will do just as well! Who knew!
Howard talking about new key features in PostgreSQL 9.4
Again, thanks everyone for coming and especially to the great speakers and see you all next time!
For a while I’ve wanted to visualise the pain points of the development cycle so I could better explain to product & business owners why their new features take time to deliver, and also to squash the misconception behind the overused “Move fast, break things” mantra of Facebook.
So some of you may know that recently Facebook realised this wasn’t actually sustainable once you have your product stable and established. It might work in the early conception of a product, but later on it will come back to haunt you.
At the F8 Developers conference in 2014, Facebook announced they are now embracing the motto “Move Fast With Stable Infra.”
“We used to have this famous mantra … and the idea here is that as developers, moving quickly is so important that we were even willing to tolerate a few bugs in order to do it,” Zuckerberg said. “What we realized over time is that it wasn’t helping us to move faster because we had to slow down to fix these bugs was slowing us down and not improving our speed.”
I’ve recently been reading Peter Senge’s “The Fifth Discipline: The Art and Practice of the Learning Organization” and thought circles of causality would be a good way to express this subject. Using systems thinking and looking at the whole picture can really help you stand back and see what’s going on.
Be it growing user acquisition or sales, a good place to start is to ask what you are trying to achieve and how to increase it exponentially. In this case let’s say that as a business you want to drive user engagement and grow your DAU. One possible way is to add new features to your product.
So following the circle in this diagram we assume that by delivering new features, we increase user engagement which in turn leads to growing your DAU/Virality.
Let’s assume the product has been soft launched, you’re acquiring users, A/B testing initial features and have begun growing your list of improvements and new features. Let’s create a ‘Backlog’ of all this work, prioritise it, and plan how to deliver it quickly using an Agile Scrum framework.
We want to deliver an MVP as quickly as possible, so let’s do two-week ‘Sprints’. The product team have lots of great ideas, but far too many to put into one sprint, and some of the new features require several sprints. Product owners & other business leaders debate the product roadmap and you have your sprint planning... simple, right? Well, to begin with, yes.
So let’s look at what the size of the backlog does to delivery. In this diagram you see that the size of the backlog directly affects how soon improvements/features are made. Why? Well, because the backlog is also made up of bug fixes and technical debt, often inherited from your prototyping phase and from deploying your MVP.
You’d love to tell the business that you’ll just work on new stuff, but you can’t. In the worst case we still deliver something every two weeks, but some features could take months to appear.
So with a relatively small backlog we are OK. Yes, some business leaders are a bit frustrated that their feature won’t get deployed in this sprint, but the short-term roadmap is clear, right?
The dev team get their heads down and get on with the sprint... but in the background product/business owners have moved on from yesterday’s must-have feature to yet another shiny new idea or potential crisis; the backlog grows and the roadmap is changed. Features get pushed further down the priority list.
So the situation is we have X amount of resource and over time the business is getting frustrated at the pace of delivering changes to the product. Weeks have passed and only 25% of these ideas/features are shipped.
The Symptomatic Solution
So there could be two potential fixes for this: move faster and drop quality so we can ship stuff quicker, or throw more people at it. Let’s look at what happens with the “Move fast, break things” mantra. To speed up delivery we cut corners: drop some testing and code reviews, push developers into hacky solutions, and so on.
As you see in this diagram, as you do this you create more bugs and the QA process takes longer. Any initial advantages are lost as this builds up.
Now we have also added a ‘side effect’. More bugs increase the size of the backlog, creating the opposite of the effect you intended in the first place.
So let’s put in more man hours (overtime) to get those bugs down and reduce this growing backlog. More overtime increases fatigue and reduces the quality of the work. Developers get burnt out, they make more mistakes, the quality of work suffers, and again there are more bugs and the team is even more demoralised.
Let’s look at the result of this on staff & the complexity of their work. In this diagram we see that by reducing quality we also increase code complexity, which generates technical debt, which again slows down development. Tech debt is pretty demoralising, as usually no one is invested in fixing it and in most cases you just work around it.
Adding more developers has a different outcome with equally diminishing results. Big teams in an Agile framework aren’t always a great idea. The typical strategy is to organise your larger team into a collection of smaller teams, and the most effective way to do so is around the architecture of your system.
The harder you push, the harder the system pushes back
When you look at the whole system, each part has a cause and effect. The harder you push one part, other parts are affected. Each one of these parts needs to be balanced against each other so that the system runs efficiently. It’s also important to step back and make sure you are solving the actual problem not trying to fix a symptom.
In this example the perceived view is that the team is moving slowly, whereas in fact they are moving at a pace that balances the system. Move fast, with stable infra is the sensible option. Use system diagrams like this to seek out counter balance to reinforcing circles.
Back in July, we posted about a tool for Amazon DynamoDB called Dynamic-DynamoDB that gives you the ability to dynamically increase and decrease your provisioned throughput in DynamoDB according to your current usage.
What you gain from using a tool like this, if you implement it correctly, is provisioned throughput in Dynamo that will ensure you have capacity when you need it.
Last week, we released our latest game, World of Warriors on the App Store, and, without going into the back-end systems in too much detail, we have a use case for DynamoDB in the game.
This is why we put the time into writing a Puppet module for DynamicDynamoDB, which you can get from our GitHub account.
Since launch World of Warriors has performed incredibly. We received Editor’s Choice from Apple, and had over 2 million downloads in the first weekend alone, which made the game crack the top 50 grossing games in the US and become the #1 role playing game for iPad in 80 countries.
As you can imagine, with statistics and rapid uptake like that you need to have good scaling. So did DynamicDynamoDB “do the business” for us over an extremely busy weekend?
We’ll let the following graph speak for itself. It shows our usage and scaling of DynamoDB from launch to yesterday.
As you can see, we took a cautious approach, keeping our provisioned capacity well above our consumed capacity.
We did this because we didn’t want to scale up too slowly and then get throttled by Amazon. Obviously this has a trade-off with cost, but for us this was acceptable thanks to what we had learned prior to hard launch.
So what had we learned?
We’d observed that altering throughput on DynamoDB tables is in no way immediate. This led us to the conclusion that to get the most out of dynamic scaling and avoid throttling you need to scale up early and do so by a significant amount.
In our case, we initiate a scaling up event when our consumed capacity reaches 60% of our provisioned capacity, and we scale back when we go below 30%. Each of these scaling events either increases or decreases provisioned capacity by 25%.
As the graph above shows, this strategy meant that at all times we had significant capacity for sudden bursts in traffic, whether due to virality, marketing, or any other sort of active promotional event that might occur.
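To make those numbers concrete, here is a minimal sketch of the arithmetic behind a single scaling decision. It is an illustration only, not how Dynamic-DynamoDB is implemented or configured; the function name `next_provisioned` and the choice to round to whole capacity units are our own assumptions.

```python
import math

# Thresholds described in this post.
SCALE_UP_AT = 0.60    # scale up once consumed reaches 60% of provisioned
SCALE_DOWN_AT = 0.30  # scale down once consumed drops below 30%
STEP = 0.25           # each event changes provisioned capacity by 25%


def next_provisioned(consumed: float, provisioned: int) -> int:
    """Return the provisioned capacity after one scaling decision.

    Hypothetical helper: rounding to whole units and the floor of
    1 unit are assumptions made for this sketch.
    """
    utilisation = consumed / provisioned
    if utilisation >= SCALE_UP_AT:
        return math.ceil(provisioned * (1 + STEP))
    if utilisation < SCALE_DOWN_AT:
        return max(1, math.floor(provisioned * (1 - STEP)))
    return provisioned


if __name__ == "__main__":
    for consumed in (200, 450, 700):
        print(consumed, "->", next_provisioned(consumed, 1000))
```

Note the gap between the two thresholds: anything between 30% and 60% leaves the table alone, which stops it flapping up and down on small changes in traffic.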
Don’t forget to download World of Warriors on the App Store. We’ll see you in Wildlands! Let battle commence!
AWS has definitely changed the way we all approach infrastructures these days, especially here at Mind Candy.
We’re finally not limited by the amount of available hardware, so we can get whatever amount of resources (well, nearly) we need, whenever we need, plus we get CloudFormation.
However, as exciting as spawning 100+ servers can be, as with many things, if you’re not cautious and smart, it can cost you a lot of money.
One way to save a bit of money on your AWS bill (and “a bit” is a serious understatement) is by utilising Spot Instances.
“Spot Instances allow you to name your own price for Amazon EC2 computing capacity. You simply bid on spare Amazon EC2 instances and run them whenever your bid exceeds the current Spot Price, which varies in real-time based on supply and demand.” – http://aws.amazon.com/ec2/purchasing-options/spot-instances/
How much can you save? Well, the c3.large instances which we use across the board for our application tier in on-demand pricing cost $0.12 per hour. When we use the same instance type with spot pricing we get them most of the time for around $0.02. That’s 6x cheaper compared to on-demand.
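As a quick sanity check on those numbers, here is a small sketch of the arithmetic. The two prices come from this post; the idea of a "spot fraction" (the share of instance-hours served by spot instances) and both function names are assumptions introduced purely for illustration.

```python
ON_DEMAND_HOURLY = 0.12     # c3.large on-demand price quoted in this post
TYPICAL_SPOT_HOURLY = 0.02  # spot price we see most of the time


def blended_hourly_cost(instances: int, spot_fraction: float) -> float:
    """Hourly cost of a tier where `spot_fraction` of instances are spot."""
    spot = instances * spot_fraction * TYPICAL_SPOT_HOURLY
    on_demand = instances * (1 - spot_fraction) * ON_DEMAND_HOURLY
    return spot + on_demand


def saving_vs_on_demand(spot_fraction: float) -> float:
    """Fraction of the all-on-demand bill saved for a given spot fraction."""
    return spot_fraction * (1 - TYPICAL_SPOT_HOURLY / ON_DEMAND_HOURLY)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        cost = blended_hourly_cost(10, fraction)
        saving = saving_vs_on_demand(fraction)
        print(f"spot fraction {fraction:.0%}: ${cost:.2f}/hour, saving {saving:.0%}")
```

In practice the spot price moves around, so treat this as a rough model rather than a billing forecast.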
So what’s the trade-off? Well, if for some reason the spot-instance price exceeds your bid price, your spot reservations will get cancelled and your spot-instances will be killed. In short: your instances can and will die at random times, and it’s not 100% guaranteed that you’ll get them when you want them.
That’s not good, even if you use CloudFormation and auto-scaling, as you could end up without instances when the spot price becomes too high. That could be almost the same as an AZ failure if you’re not prepared for it.
However, there’s a way to overcome that risk. In a single CloudFormation stack, you can create two launch configurations: one for on-demand instances and another one for spot-instances. With carefully tweaked scaling thresholds, you can make your spot-instances be preferred over on-demand instances, but still ensure on-demand takes over should spot-instances no longer be available at your bid price.
This way, if you can get spot-instances, your stack will be pretty much fully built using spot-instances. If (and when) the price goes over your bid price, spot instances will start getting killed and your on-demand instances will start booting up instead to cover the shortfall. When the spot price returns beneath your bid price, spot instances will start booting up, slowly phasing out on-demand instances.
After a few weeks of tests we managed to come up with a set of thresholds which work pretty well for us and keep our stacks stable around the clock.
With on-demand, we always have a single instance running by setting the minimum to 1. A scale-up event happens when our average CPU usage exceeds 80% for a 5 minute period, and we increase the on-demand autoscale group by 2 instances. We then scale down 1 instance at a time if the average CPU usage is less than 65% for a period of 5 minutes, and we ensure that a scale-down event only happens once in a 15 minute period.
With spot-instances, we also request a minimum of 1 instance, but we set ourselves a bid price of $0.12. Remember, the bid price is not the price you pay; it’s the maximum you are willing to pay. Most of the time we have a spot-price cost of just $0.02!
As with on-demand we scale on average CPU in the spot-price autoscale group. However, we scale-up whenever we reach 50% (instead of 80%), and we also add 2 instances. We scale down and cancel our spot instances when we dip below 30% CPU usage.
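The interplay between the two groups is easier to see in code. The sketch below models only the thresholds described above; it is not our CloudFormation template and it ignores the 5 minute evaluation periods and the 15 minute cooldown. The names `ScalingPolicy` and `desired_change` are made up for this example, and scaling the spot group down by 1 instance at a time is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical container for one autoscale group's thresholds."""
    name: str
    min_instances: int
    scale_up_cpu: float    # scale up when average CPU % goes above this
    scale_up_by: int
    scale_down_cpu: float  # scale down when average CPU % drops below this
    scale_down_by: int


ON_DEMAND = ScalingPolicy("on-demand", 1, 80.0, 2, 65.0, 1)
# Scaling spot down by 1 is an assumption; the post only gives the threshold.
SPOT = ScalingPolicy("spot", 1, 50.0, 2, 30.0, 1)


def desired_change(policy: ScalingPolicy, avg_cpu: float, current: int) -> int:
    """Return how many instances to add (positive) or remove (negative)."""
    if avg_cpu > policy.scale_up_cpu:
        return policy.scale_up_by
    if avg_cpu < policy.scale_down_cpu and current > policy.min_instances:
        return -min(policy.scale_down_by, current - policy.min_instances)
    return 0


if __name__ == "__main__":
    # At 60% CPU the spot group grows while the on-demand group shrinks.
    for policy in (SPOT, ON_DEMAND):
        print(policy.name, desired_change(policy, avg_cpu=60.0, current=3))
```

Because the spot group reacts at 50% and the on-demand group only at 80%, spot capacity normally absorbs the load first. On-demand only grows when spot requests cannot be fulfilled and CPU keeps climbing.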
The result is probably best as a picture from Ice (Ice is a great tool from Netflix that helps manage AWS costs). Below is the hourly cost of one of our app tiers before and after we started utilising spot instances.
For us, in the case of this specific stack, spot instances gave us savings of up to 60%. Bear in mind the size of this specific stack is quite small (up to 10-12 instances at peak), so the bigger the stack, the more savings you’ll see!
To wrap up, I just wanted to share a few tips and tricks we picked up along the way that should help you:
bake AMIs; tools like Packer will greatly help you do this; this will let you minimise the time required to boot up a new instance; it’ll give you much more, but the time is crucial when it comes to scale-up events, especially when spot-instances are being killed and you want on-demand instances to fill the empty spaces ASAP. We managed to get the time required to boot up a new instance down to around 75 seconds
use EBS based instances; they cost a fraction more (and yeah, EBS can be painful) but their boot time is significantly faster than the ephemeral-storage based instances
bid price = on-demand instance price; this way in the worst case you’ll pay what you’d normally pay for an on-demand instance
<shamelessplug> Obviously, the most important requirement is having an awesome application that is cloud-friendly. If you’re interested in building cloud-native applications and awesome infrastructures, we’d love to hear from you! ;-) </ shamelessplug>