Testing with Amazon SQS

We all know how great Amazon SQS is, and here at Mind Candy we use it extensively in our projects.

Quite recently, we started making some changes to our Data Pipeline in order to speed up our Event Processing, and we found ourselves with the following problem: how can we generate thousands of messages (events) to benchmark it? The first solution that came into our minds was to use the AWS Command Line Interface, which is a very nifty tool and works great.

The AWS Command Line Interface SQS module comes with the ability to send out messages in batches, with a maximum of 10 messages per batch, so we said: “right, let’s write a bash script to send out some batches”, and so we did.

Problem

It worked alright, but it had some problems:

  • It was slow; because messages were being sent in batches of up to 10 messages and not in parallel
  • The JSON payload had to contain some metadata along with the same message repeated 10 times (1 for each message entry)
  • If you needed to send 15 messages, you would have to have 1 message batch with 10 entries and another one with 5 entries (2 JSON files)
  • Bash scripts are not the best thing in the world for maintenance

So, what did we do to solve it? We wrote our own command line program, of course!

Solution: meet sqs-postman

Writing command line applications in Node.js is very very easy, with the aid of the good old Commander.js. Luckily, AWS has an SDK for Node.js, so that means that we don’t need to worry about: AWS authentication, SQS API design, etc. Convenient? Absolutely!

Sqs-postman was designed with the following features out of the box:

  • Sends messages in batches of up to 10 messages at a time (AWS limit)
  • Batches are sent out in parallel using a default of 10 producers, which can be configured using the –concurrent-producers option
  • A single message is read from disk, and expanded into the total number of messages that need to be sent out
  • It supports AWS configuration and profiles

In order to solve the “messages in parallel” problem, we used the async library. We basically split the messages into batches and we then use eachLimit to determine how many batches can be executed in parallel, which starts with a default value of 10 but can be configured with an option.

Can I see it in action?

Of course you can! sqs-postman has been published to npm, so you can install it by running:

 npm install -g sqs-postman

Once installed, just follow these simple steps:

  • Make sure to configure AWS
  • Create a file containing the message, i.e. message.json with a dummy content
    {
       "message": "hello from sqs-postman"
    }
  • Run it
    $ postman message my-test-queue --message-source ./message.json --concurrent-producers 100 --total 1000

If you would like to see more information, the debug mode can be enabled by prepending DEBUG=sqs-postman postman…

Text is boring, show me some numbers!

You are absolutely right! If we don’t share some numbers, it will be hard to determine how good sqs-postman is.

Messages aws-cli sqs-postman
100 0m 4.956s 0m 0.90s
1000 2m 31.457s 0m 4.18s
10000 8m 30.715s 0m 30.83s

As you can appreciate, the difference in performance between aws-cli and sqs-postman is huge! Because of sqs-postman’s ability to process batches in parallel (async), the execution time can be reduced quite considerably.

These tests were performed on a Macbook Pro 15-inch, Mid 2012 with a 2.6 GHz Intel Core i7 Processor and 16 GB 1600 MHz DDR3 of RAM. And time was measured using Unix time.

Conclusion

Writing this Node.js module was very easy (and fun). It clearly shows the power of Node.js for writing command line applications and how extensive the module library is when it comes to reusing existing modules/components (e.g. AWS SDK).

The module has been open sourced and can be found here. Full documentation can be found in there too.

As usual, feel free to raise issues or better yet contribute (we love contributors!).

Building testing tools against REST APIs with node.js and Coffeescript

During the dev cycle here at Mind Candy it is useful to have tools to help automate some of the more repetitive tasks, for example, registering a user. The great thing about test tools is that they are a great excuse to try out new technologies too! In this blog post, I’ll be telling you about a tool we wrote to register new users. It usesĀ node.js, the non-blocking I/O server side JavaScript platform that is powered by Google’s V8 VM.

The reasons why I chose node.js for this project are as follows:

  1. It makes working with I/O operations an absolute breeze. Every I/O operation is required to be non-blocking, so its perfectly valid to keep a request open from a client such as a web browser while you make a bunch of (sometimes concurrent) calls off to REST APIs, databases, external systems etc, wait for them to call back to your code and then return a response. All of this happens without occupying a thread per request, unlike your standard java servlet. In fact, node.js is single threaded and uses an event loop, so as long as your expensive I/O operations are non blocking, the event loop just keeps ticking over and your system stays responsive.
  2. While node.js is a comparatively new technology, it has a HUGE and vibrant community behind it. There are so many 3rd party modules that if you need to interface with any other thing, there is most likely a module already written! Node also has an awesome package manager in npm, which makes declaring and downloading modules super easy.
  3. The holy nirvana – the same language running on the server and the client. Since the app is going to be deployed for the web, and therefore going to be powered by javascript, there are no problems with things such as serialising and deserialising objects between client and server (they all natively speak JSON), or different paradigms between languages giving you an impedance mismatch.

The whole project is actually written in a language called Coffescript. For those who haven’t heard of it, Coffeescript is a language that compiles down to javascript. It abstracts away some of the more grizzly parts of javascript, has a nice, clear and concise syntax and has a bunch of useful syntax features built in, such as loop comprehensions, conditional and multiple assignment and a simple class system. It’s like a bunch of best practices for javascript!

So let’s have a look at some of the code. For example, here is how we talk to the Adoption endpoint:

request = require('request')
xmlbuilder = require('xmlbuilder')

class Adoption
    constructor: (@host) ->

    start: (username, password, email, cb) ->
        adoption = xmlbuilder.create()

        adoption.begin('adoption')
            .ele('email').txt(email).up()
            .ele('password').txt(password).up()
            .ele('username').txt(username).up()

        adoptionXml = adoption.toString({pretty: true})

        request({
            method: 'POST',
            uri: "http://#{@host}/my/rest/service",
            body: adoptionXml,
            headers: {
                'Content-type': 'application/xml'
            }
        }, (err, resp, body) ->
            cb(err) if err

            if(body.search(/<error name="username" rule="is_not_unique"\/>/) > 0)
                usernameError = {
                    message: 'username is already taken'
                }

            cb(usernameError)
        )

module.exports = Adoption

First thing to note is indenting and whitespace is important! You MUST use spaces instead of tabs here, otherwise the compiler complains! Here we’re creating a ‘class’ called Adoption. Javascript doesn’t really have classes, but Coffeescript translates that into some javascript that gives you class-like behaviour. At line 5, we declare the class constructor function. In Coffeescript, functions are a very small declaration: just the function arguments in brackets and then an arrow. Anything after the arrow is the function body. The constructor is very simple, all it’s doing is setting the arguments of the function as member variables of that object. Looking at the javascript generated from the Coffeescript compiler illustrates this:

function Adoption(host) {
    this.host = host;
}

The start function (line 7) takes a bunch of parameters of data from the user and a callback function as the last argument. In node.js, if we are doing any async operation such as calling a REST endpoint, we cannot return the response data from that function since that would block the event loop. Instead, we are provided with a callback function which we can then call with the response once the server responds.

On line 10 we build up the xml payload to the Moshi Monsters REST endpoint, using a module called xmlbuilder. It would be a lot simpler if the endpoint accepted JSON! Next (line 17) we send the request to the endpoint itself. Here, we use the excellent request module. If you are familiar with how you perform ajax requests with jQuery, this should look quite familiar to you! Its another example of how node.js makes full stack development a lot less trouble for your developers as so many of the techniques used on the client side can be applied to your server side code.

The request function takes an object with the options for that request, and a callback that gets called upon error or success. The convention in node.js is to always expect the first argument to your callback function as a possible error, since if the async operation fails, you can check that parameter for the exception. On line 25 there is an example of Coffeescript’s postfix form of the if operator.

We then check for some response xml with a regular expression (line 27) and call the callback with the possible error object if the regex matches. Notice we do not have to declare variables before we use them. If we did this in raw javascript, they would end up becoming global variables, but Coffeescript handily declares them up front for us.
The last line is how we expose our class to the rest of the program. Node.js uses the CommonJS module system, so every file loaded is self contained as a module. We can expose our class by assigning it to the module.exports variable. This allows us to instantiate an Adoption object in another file:

Adoption = require('./path/to/adoption.coffee')

I used the brilliant express http server to serve this webapp. It has the concept of ‘middleware’ – effectively a bunch of functions that every request and response pass through. This means it is super easy to add functionality to express like caching, static file serving, and even asset pipelining as we will see later! We can set up a handler for our adoption request like so:

express = require('express')
Adoption = require('./adoption') #note that the file extension is optional!
app = express.createServer()

app.set('view engine', 'jade')

app.use(express.bodyParser())

adoption = new Adoption('www.moshimonsters.com')

app.post('/adopt', (req, res) ->
    username = req.body.username
    password = req.body.password
    email = req.body.email
    adoption.start(username, password, email, (err) ->
        if(err)
            res.json(err.message, 500)
        else
            res.send()
   )
)

app.listen(3000)

We’re using a template language called jade for the client side HTML markup. Jade is a simple, lightweight alternative to html. Here’s an example straight from their website:

doctype 5
html(lang="en")
  head
    title= pageTitle
    script(type='text/javascript')
      if (foo) {
         bar()
      }
  body
    h1 Jade - node template engine
    #container
      if youAreUsingJade
        p You are amazing
      else
        p Get on it!

Jade is nice and lightweight, but you can also do more hardcore stuff like scripting in the templates if you wish. It supports layouts and partials and all kinds of other nice stuff. Check out the website to learn more about it!

The client side javascript for this tool is written in Coffeescript too! But how does the browser understand it? The answer is that it doesn’t – we have to compile it first into javascript. You could do this as part of the build, but we have a better solution available to us.

There is a middleware module for express called connect-assets. This middleware adds asset pipelining to connect, so that you can write your code in Coffeescript and it will compile it on the fly and serve it to the browser, without you having to do anything! It can even minify the resulting javascript. You add it like this:

connectAssets = require('connect-assets')

...

# set build to true to minify the js
app.use(connectAssets({build: false}))

…and then we add a macro into our jade template:

doctype 5
html(lang="en")
  head
    // add macro in the head of your html document
    != js('adoption')

    // rest of your markup below

…passing in the name of your Coffeescript file (minus the .coffee) extension.

Obviously this is not the whole of the source code of the tool, but hopefully it has been a taster of how awesome modern javascript development can be! In the last few years, javascript has gone from being an unliked toy language into something a lot more powerful and expressive. Here at Mind Candy we hope to leverage amazing new tools like node.js and coffeescript in our future work to allow us to become a more happy and productive development team!