Scala at Mind Candy

Having read some of the recent negative commentary about Scala with interest, I thought it would be good to share our own experiences with the language.

The Good.

Scala is an expressive language – It often results in a lot less code getting in the way of what you want to achieve. A simple example:

case class Person(id: Int, name: String)

// Token implementation for the example.
def lookupPerson(id: Int): Person = Person(id, "Test" + id)

val ids = Seq(1, 10, 100)
// Keep the ids below 50, then look up a Person for each one.
ids.filter(id => id < 50).map(lookupPerson)

Writing this in Java would require a whole load of boilerplate: the stock class methods such as toString (which the case class generates for you), multiple lines to create the collection, and more again to filter and transform it.

More powerful ways to write tests – This can fall under the Spiderman grouping of power admittedly, but libraries like specs2 and ScalaCheck made it easier to create the kind of tests we wanted. ScalaCheck is the stand-out example: a generator for creating the Person objects from above is as easy as this:

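import org.scalacheck.Arbitrary.arbitrary
import org.scalacheck.Gen
import org.scalacheck.Gen.alphaStr

object TestGenerators {
  // Builds a Person from an arbitrary Int and a random alphabetic String.
  val personGenerator: Gen[Person] = for {
    id <- arbitrary[Int]
    name <- alphaStr
  } yield Person(id, name)
}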

That’s all it takes and that object can be imported into all of your test code that needs to generate Person objects.

Less magic – A lot of libraries like Spring and Hibernate need to use bytecode modification or reflection to achieve what they do, which means that when they fail it can be really hard to diagnose or fix the problems. We’ve even seen some of these behave non-deterministically, which has caused hours of bemusement. By contrast, Scala libraries tend to use the type system to achieve the same ends, which means that in our case we catch problems at compile-time, not at run-time.

The REPL – The idea scratchpad and general utility will be your friend in times of need. Use it as an education tool to step through an idea with someone else. Use it to test some code interactively if you want to confirm something but you’re not quite sure how to code it or what results you’ll get. Use it to solve those gnarly Project Euler problems without having to create a whole new build for each one.

SBT – Controversial one this may be, but it manages to give you the sensible build model and plugin system that Maven has while allowing you to easily create custom tasks. If nothing else being able to run a command, for example the ‘test’ task, on each save is the most useful thing I’ve seen in a while.

POWAH! – There’s an elegance that comes with time when using Scala, in much the same way that it does with a lot of languages, that means code slots together so cleanly and with little friction. For me personally the Option class was the beginning of this change in thinking, where I realised that representing the possible lack of something without using a null made a lot more sense.

The Bad.

SBT – It’s a double-edged sword, in that you’ll need to understand a bit of Scala to be able to do non-trivial configuration in it. Although the documentation has improved massively in recent times, it can still be somewhat impenetrable, especially to someone new to Scala.

Somewhat idiosyncratic libraries – Databinder Dispatch is a good example of this: writing a custom handler to parse an HTTP response is just unnecessarily puzzling. As with all libraries, how easy they are to use and extend should be evaluated, so don’t be blinded by libraries just because they’re written in Scala. It’s better to pimp a stock Java library that already works well than to use one that is badly written in Scala.

Binary compatibility – This is the stock issue that is often complained about. Fortunately, SBT does notice when two versions of the same library, built against two different Scala versions, are pulled into the dependencies. Others have presented this as a major pain point, but it’s generally only as much of an issue as it is with Maven dependencies, with a little more granularity. Also, if you’re using SBT it’s possible to declare dependencies that are tied to the Scala version in use automatically (the %% operator).

Knowledge – There are a couple of aspects to this. The first is that Scala is a “new” language, and as such there is a learning curve covering the language, SBT, the libraries and how to use them all effectively. Beyond this, some functional programming concepts are foreign to a lot of programmers, and this can be a wall that isn’t scalable in a short period of time for a lot of people. Hopefully with time this will become less of an issue, but at the moment there aren’t a lot of Scala developers who can hit the ground running.

The Ugly?

As with all new things, there is a learning curve with Scala, which can be problematic, but the benefit of the design is that it’s possible to do something the “wrong way”, as the language is very flexible. People with a history in languages like Java can start out writing code that doesn’t look much different but still get benefits like better collections. Then, with time, they can progress to using the more powerful features of the language, like pattern matching and implicits. For the foreseeable future Scala is a tool we intend to keep using, as it’s been of great benefit to us (this week I parsed a 37GB log file with a couple of lines of code in the REPL). Maybe you should too…

99 Bottles of JMeter on the wall

I’ve recently had to do some performance testing on a couple of our new web services. I know of a few handy open source tools available for this. Tsung, Grinder and JMeter spring to mind. I find that I can get up and running in JMeter quicker than I can with the other tools and it tends to be the one I use most. One of the web services I wanted to test required some dynamic content to be generated in the body of the HTTP POST. Normally I’ve been able to get away with using the standard counters and config variables provided by JMeter, but this time I needed a bit more control. Luckily, JMeter allows you to do some scripted preprocessing of HTTP requests and I was able to use this to generate dynamic content within JMeter. I found the JMeter documentation to be a bit lacking in this area, so I created this blog post to give a quick explanation of how I managed to do this.

I’ve created a demo project that you can follow along with to see how it all works. Download the source code here: https://github.com/groodt/99bottles-jmeter and follow the README on GitHub to get everything set up and running. All you need is git, Python and JMeter. Open up the file “Test Plan.jmx” in JMeter to follow along.

The demo
The demo project is a simple web service that parses JSON payloads and prints a modified version of the “99 Bottles of beer on the wall” song onto the console. The JSON payload looks something like this:

{"drink":"beer", "bottles":"99", "date":"1321024778956", "thread":"4"}

The server then parses these payloads and prints them out to the console.

JMeter then aggregates the response times in the summary report.

The HTTP POST
If you navigate to the “HTTP Request” node in the example you can see the JSON POST body being constructed:

The variables ${drink}, ${bottles}, ${date} and ${thread} are generated dynamically by a script that JMeter executes for each request.

The BSF PreProcessor
The BSF PreProcessor is run before each HTTP request to generate the dynamic content mentioned earlier. The BSF PreProcessor allows you to run JavaScript, Python, Tcl and a few other languages inside JMeter. I decided to write my script in JavaScript.

If you navigate to the “BSF PreProcessor” node in the example you can see the script that is used:

The JavaScript
The script simply places 4 variables in scope, which are then available to JMeter.

// Calculate number
var count=vars.get("count");
var bottles=99-count;
vars.put("bottles",bottles);

// Calculate drink
var random=new Packages.java.util.Random();
var number = random.nextInt(4);
var drink = vars.get("drink"+number);
vars.put("drink", drink);

// Calculate date
var date=new Packages.java.util.Date().getTime();
vars.put("date",date);

// Calculate thread
var thread=ctx.getThreadNum();
vars.put("thread",thread);
  • In lines 1 to 4, the counter is read from JMeter and subtracted from 99, and the result is placed into scope under the name “bottles”.
  • In lines 6 to 10, java.util.Random is used to generate a random integer from 0 to 3, which is then used as a lookup into the names of drinks (beer, wine, mead, cider) defined in the JMeter General Variables. The chosen value is stored in a variable named “drink”.
  • In lines 12 to 14, java.util.Date is used to generate a timestamp in milliseconds. This value is stored in a variable named “date”.
  • In lines 16 to 18, the JMeter thread number is read from the JMeter context and stored in a variable named “thread”.
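
To try the same four steps outside JMeter, here is a minimal sketch in plain JavaScript. It assumes a stand-in for JMeter's vars object (anything with get and put), and it takes the thread number and the random source as parameters instead of reading them from the JMeter context. The names makeVars and preprocess are hypothetical and are not part of JMeter or the demo project.

```javascript
// Hypothetical stand-in for JMeter's `vars` object: string keys, string values.
function makeVars(initial) {
  const store = new Map(Object.entries(initial));
  return {
    get: (key) => store.get(key),
    put: (key, value) => store.set(key, String(value)),
  };
}

// Same four steps as the BSF script, with the JMeter context passed in explicitly.
function preprocess(vars, threadNum, randomInt) {
  // Bottles remaining: the counter is subtracted from 99.
  vars.put("bottles", 99 - Number(vars.get("count")));

  // Pick one of drink0..drink3 at random.
  vars.put("drink", vars.get("drink" + randomInt(4)));

  // Timestamp in milliseconds since the epoch.
  vars.put("date", Date.now());

  // Thread number, as supplied by the caller.
  vars.put("thread", threadNum);
}

const vars = makeVars({
  count: "3", drink0: "beer", drink1: "wine", drink2: "mead", drink3: "cider",
});
// A fixed "random" source keeps the example deterministic.
preprocess(vars, 4, () => 2);
console.log(vars.get("bottles"), vars.get("drink"), vars.get("thread")); // 96 mead 4
```

Passing the random source in as a function is only a convenience for this sketch: it makes the drink lookup repeatable when checking the logic by hand.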

Executing Java libraries within the scripts
As you may have noticed in the script above, the Java libraries are exposed in JMeter under Packages.*. This allows you to call any of the Java standard libraries, or any Java code on the JMeter classpath. I think you can also write your own Java code, place it on the JMeter classpath and expose it in the same way.

Putting it all together
Putting all of that together gives you a handy way of doing reasonably complex performance testing in JMeter and I hope you find it useful.

NOSQL Exchange

This is a quick run-through of the NOSQL Exchange that Ciaran & I attended on Nov 2 at SkillsMatter. It featured 8 speakers, and links to all talks are included.

A lot of people were asking which NOSQL solution to use.

This was the advice given by the speakers: there is no silver bullet. Is there a need for reading/writing lots of big data? Think about the shape of the data and how you are going to query it, to help understand which NOSQL solution fits best. Also understand the trade-offs when you choose your solution. Finally, at the talks there was a lot of evidence of people using NOSQL solutions when a SQL solution would have sufficed.

1) THE STATE OF NOSQL TODAY by Emil Eifrem
This was the best talk of the day and anyone interested in NOSQL should watch the talk.

NOSQL stands for Not Only SQL.

Main types of NOSQL:

  1. Key-value: originated from Amazon’s paper on Dynamo, e.g. Riak, Voldemort (used at LinkedIn)
  2. Column family, e.g. Cassandra, HBase, Hypertable
  3. Document databases (most popular): descended from Lotus Notes, e.g. CouchDB & MongoDB
  4. Graph databases (nodes with properties): originated from Euler and graph theory, e.g. InfiniteGraph, Neo4j

Documents are a superset of key-values. Graphs are a superset of documents, and thus of all the others. Does this imply you should use graph NOSQL solutions for all your NOSQL concerns? The graph NOSQL advocates think so.

Trends:

  • Acidity is increasing, e.g. MongoDB adding durable logging storage, Cassandra adding stronger consistency
  • More query languages: Cassandra (CQL), CouchDB (UnQL), Neo4j (Cypher), MongoDB.
  • Potentially more schemas?

NOSQL challenges:

  • Mindshare
  • Tool support
  • Middleware support

Oracle is now adopting NOSQL with a key-value solution, despite debunking NOSQL in May this year. NOSQL seems to be following similar historical trends to SQL, which had many vendors to begin with and over time consolidated into 4 large vendors. Could NOSQL end up in a similar situation in the near future?

2) HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS by Russell Brown
Key quote from this talk: “Large systems are always in some degree of failure”

Problem: According to CAP (Consistency, Availability & Partition tolerance) you can’t have all 3. You have to compromise by picking 2.
PACELC:
In the case of a partition (P), trade off availability (A) against consistency (C)
Else (E), trade off latency (L) against consistency (C)

Riak is inspired by Dynamo and built in Erlang/OTP. It has features such as MapReduce, links and full text search. It uses vector clocks, not timestamps. Statebox can be used to automate conflict resolution.
It uses a ring for storing clustered data.
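
As a rough illustration of why vector clocks can detect conflicts that timestamps would hide, here is a minimal sketch in JavaScript. It is not Riak's implementation; the function names and the plain-object representation (node name mapped to an update counter) are assumptions made for this example.

```javascript
// A vector clock here is a plain object mapping node name -> update counter.
function increment(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Returns "before", "after", "equal" or "concurrent" for clock a relative to b.
function compare(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[node] || 0;
    const y = b[node] || 0;
    if (x > y) aAhead = true;
    if (x < y) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // a conflict that needs resolving
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

const base = increment({}, "nodeA");    // { nodeA: 1 }
const left = increment(base, "nodeA");  // { nodeA: 2 }
const right = increment(base, "nodeB"); // { nodeA: 1, nodeB: 1 }

console.log(compare(base, left));  // before: left descends from base
console.log(compare(left, right)); // concurrent: sibling updates
```

Two writes with "concurrent" clocks are exactly the case where a last-write-wins timestamp would silently drop one of them.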

3) MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL by Brendan McAdams (creator of Casbah)

MongoDB is not suited to highly transactional applications or ad-hoc intelligence that requires SQL support. MongoDB revolves around memory-mapped files. Mongo has an autosharding system.

Things to remember:
The datastore is a servant to the application not vice-versa
Don’t frankenshard

4) REAL LIFE CASSANDRA by Dave Gardner (from Cassandra user group)

  • Elastic – read/write throughput increases as you scale horizontally.
  • Decentralised – no master node.
  • Based on Amazon’s Dynamo paper
  • Rich data set model
  • Tunable
  • High write performance

If your requirements are big data, high availability and a high number of writes, then use Cassandra.
When data modelling, start from your queries and work backwards.
Has expiring columns.
Avoid read before write & locking by safely mutating individual columns
Avoid super columns, instead use composite columns
Use Brisk (which uses Hadoop) for analysing data directly from a Cassandra cluster.

5) DOCTOR WHO AND NEO4J by Ian Robinson
Although it was a fairly slick presentation it seemed to focus too much on modelling Doctor Who and his universe as a working example of graphs & Neo4J. Could this be to hide some shortcomings in Neo4J?

  • Neo4J is a fully ACID replacement for MySQL/Oracle.
  • Neo4j is a NOSQL solution that tries to sell itself as the most enterprise ready solution.
  • Has master/slave nodes.
  • Has 3 licenses: Community/Advanced/Enterprise.

With mentions of 2 phase commits, other than the advantage of modelling relationships such as social networks, there seemed little benefit from moving away from a relational database.
Having spoken to the Neo4J guys afterwards, it seems that the DB loses its ACIDity once you cluster it, and becomes another eventually-consistent store!

6) BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT by Aleksa Vukotic

CouchDb:

  • Written in Erlang; has Lift support (Lift is a Scala framework)
  • Exposes REST/JSON endpoints
  • Eventually consistent
  • Versioning: updates are append-only
  • MapReduce for querying using views

7) ROBERT REES ON POLYGLOT PERSISTENCE
A muddled presentation about mixing a graph NOSQL solution with a document-based one.

8) THE FUTURE OF NOSQL AND BIG DATA STORAGE by Tom Wilkie
Rather than using the out-of-the-box storage engines for NOSQL solutions, there can be dramatic throughput gains from using alternative storage engines such as Tokutek and Acunu (Castle).

Flash on the Beach 2011

Last month I went to Flash on the Beach in Brighton. It was the first time I’ve been to this particular event and I really enjoyed it a lot. For the few of you that don’t know, FOTB is an annual conference full of code, design and inspiration.

The three days in Brighton were full of inspiring stuff, like Carlos Ulloa’s new WebGL project, Lights; Seb Lee-Delisle’s live experiment mixing OpenFrameworks, HTML5 and phones; and lots of new people in the Elevator Pitches. Just check out the impressive main titles video created by Gmunk, creator of some of the FX for Tron Legacy.

Adobe showed some of their new stuff like EDGE, a Flash-like productivity tool for creating HTML5 canvas animations, and talked about Molehill/Stage 3D. They also talked about a new hardware-accelerated framework for 2D animation in Flash, recently released under the name of Starling. But all that doesn’t mean that the “Flash is over” controversy wasn’t in the air: just check out the slide of Adobe’s official position on the question “When must I use Flash and when not?”.

Adobe’s position on the Flash-HTML debate

Remy Sharp talked about this question too in “HTML5: Where Flash isn’t needed anymore”. He spoke about the fallback solutions when some features don’t work (usually using Flash), support of different parts of the specification in different browsers/platforms and the pitfalls that we will find on our path embracing this new standard.

In the closing speech John Davey, the creator and main organiser of the event, revealed that this is going to be the last Flash on the Beach! But don’t panic yet! It’s just a change of title and (maybe) format, so that the conference is no longer tied to a single technology (i.e. the Flash platform) and is based more on design and creative coding (maybe something like the move of Flash Belt to The Eyeo Festival?).

Of course I’m leaving out a lot of things, like Jon Howard’s interesting talk about playability in kids’ games, or the demos of the new Away 3D using the new Stage 3D API, so don’t forget to check out the list of speakers and their individual sites.

Unite11 – Exporting to Flash from Unity

Last week, four of the Mind Candy tech team ventured over to San Francisco for the Unite conference. One of the things I was most keen to learn about is the upcoming Flash export.

On the Tuesday, we attended the Flash afternoon where Lucas Meijer and Ralph Hauwert detailed the latest progress on Unity’s export to Flash. On first impressions, the games they’ve been working on look fantastic! One of the reasons I’ve been keen on working with Unity over Flash for 3D games dev is the awesome toolset, and these guys have really nailed the Flash export.

As long as your code isn’t doing anything too “exotic”, Unity will convert your code (C#, JavaScript (strict) and Boo) to ActionScript. It will then compile this, along with your assets, into a swf using the mxmlc compiler. One thing to note here is that literally everything is compiled into that one swf. There was mention of them using asset bundles at a later date to generate individual swfs for content such as scenes. Since everything is in one swf, file sizes could get pretty large really quickly. One of the demo apps generating simple spheres compiled to a ~1.4MB (iirc) swf, which is pretty chunky for something so simple.

The generated AS3 code is pretty readable although since there’s no method overloading in Flash, they’ve had to compromise on function names somewhat! That said, it should be pretty easy to debug your exported app.

The features currently supported in Flash are:

  • PhysX
  • Particle system
  • Custom and standard shaders

Things that won’t work in the first version include:

  • Anything needing depth textures
  • Advanced audio such as doppler and reverb
  • Dynamic shadows
  • Mouse lock
  • Unity networking (you’ll have to use a Flash networking solution for now)

Looking at the performance of the games, they seemed to run pretty well. They mentioned trying to push anything you can to the GPU rather than the CPU. An example given was to animate textures in a shader rather than scrolling them in a script, as well as the usual light baking and occlusion culling. Basically, as Lucas put it: “The best way to make your game go faster, is to have it not do stuff”.

So to summarise – from a first glance, the Flash export looks pretty awesome, and the Unity guys have done a great job getting it working. There are still a fair few features missing, but there should be enough there for most simple Unity games to be exported. Unfortunately, both the release date and the price were listed as “TBA”, so I guess you’ll have to wait a little longer before you can try it out.

Learning to K.I.S.S (part 2)

In the last post we discussed the importance of immutable objects in a code base and took a critical look at some common techniques for creating immutable objects.  In this post we’ll be looking at some design patterns/ idioms that try to get around those problems whilst still maintaining immutability.  I highly suggest reading the previous post, found here before continuing.

The Parameter/ Data/ Value Object (aka The Object of many names)

In the previous post we saw some problems with passing multiple arguments through a constructor: it is difficult to read and prone to error.  That code can be refactored using ‘Introduce Parameter Object’; you can find more information on this in Martin Fowler’s refactoring guide.

  1. Create a value object which temporarily holds all the values required to initialize your object.
  2. Pass this object into the constructor of your object
  3. Set private fields using the data object.
  4. Don’t keep a reference to the data object.

Note step 4: a common mistake would be to hold a reference to the value/ parameter object. This would void your immutability, because an external class could set values on the parameter object and make returned values unpredictable.

This technique is considered fairly expensive because with every new immutable object you must also create a value/ parameter object.  In addition, this method does not hide away the implementation details like a factory would; however, it can be used nicely in conjunction with a factory.  This pattern does encourage code reuse, as VOs can be pooled if they are frequently used, or shared using something like the flyweight pattern.

/**
* An example showing incorrect usage of the Value Object pattern
*/
public class MonsterController {

    private var monster : MonsterVO;

    public function MonsterController (monster : MonsterVO)
    {
        this.monster = monster;
    }
}

public class GameController {

    public function GameController()
    {
        const monster : MonsterVO = new MonsterVO();
        monster.setIsScary(true);
        monster.setName("trulyScaryMonster");

        new MonsterController(monster);
        monster.setIsScary(false); //not immutable, we've just set scary after construction!
    }
}

/**
* An example showing correct usage of the Value Object pattern
*/
public class MonsterController {

    private var name : String;
    private var isScary : Boolean;

    public function MonsterController (monster : MonsterVO)
    {
        // Copy the values out rather than keeping a reference to the VO.
        name = monster.getName();
        isScary = monster.getIsScary();
        //aha any changes to the monster object will not affect this class!
    }
}

The builder

The builder by name is designed for construction only and uses a fluent interface to build an object.  It abstracts complex construction logic and produces an immutable object once built. The pattern adds a layer of security to your code; input values may be validated before an object is constructed. In my opinion this is a better solution than a value object because of the incredibly descriptive interface and validation.

This pattern does not need to add superfluous code to the codebase; since the builder and its object are often tightly coupled, it is fine to create an internal class. The code below is based on the builder pattern described by Joshua Bloch in his book Effective Java. Note that ActionScript does not support nested classes (there are workarounds, see this interesting post!). We can, however, add classes within the same file, and internal fields will be accessible only by classes within the same file.

package com.mindcandy.example.immutability
{
    public class Monster
    {
        private var name : String;
        private var isScary : Boolean;

        public function Monster(builder : MonsterBuilder)
        {
            name = builder.name;
            isScary = builder.isScary;
        }

        public static function getBuilder() : MonsterBuilder
        {
            return new MonsterBuilder();
        }

        public function toString() : String
        {
            return " test Monster: " + this.name + " is scary : " + this.isScary;
        }
    }
}

import com.mindcandy.util.preconditions.checkNotNull;
import com.mindcandy.example.immutability.Monster;

class MonsterBuilder {
    internal var name : String;
    internal var isScary : Boolean;

    public function setName(name : String) : MonsterBuilder
    {
        this.name = name;
        return this;
    }

    public function setIsScary(scary : Boolean) : MonsterBuilder
    {
        this.isScary = scary;
        return this;
    }

    // Validate the inputs before the immutable Monster is constructed.
    private function validate() : void
    {
        checkNotNull("name must be set for monster", name);
    }

    public function build() : Monster
    {
        validate();
        return new Monster(this);
    }
}

Another nice builder style pattern is the Immutem pattern which also delays instantiation of an object until initialization parameters have been set.  In my opinion it is inferior to the builder as it lacks the fluent interface (a major advantage) of a builder.

The immutem pattern cannot be implemented in ActionScript because the language does not support nested static classes.  A workaround that includes a private class in the same file (see example above) would still not work, because the ‘nested’ class would not have access to the parent’s private fields. This pattern highlights some of the limitations of ActionScript in comparison with older languages such as Java.

A criticism of these patterns is that they are great for code-generator tools but not so great for humans. I do, however, tend to disagree with this statement, as they add a great deal of value in terms of readability, which far outweighs the additional cost of maintenance (providing the construction is complex enough to warrant a builder).

Immutability via Bubble Wrap

A number of patterns can be used to wrap up your object in a protective bubble-wrapped blanket of immutability.  These patterns are generally more flexible and can be coded to allow different access levels and permissions for the object.

The immutable wrapper interface wraps an object in an interface which provides only getters, thus the object appears immutable.  I say appears because the fundamental weakness of this pattern is that objects may be downcast to their mutable type and then mutated.

public interface ImmutableInterface
{
    function getGrowl() : GrowlType;
}

public class MutableObject implements ImmutableInterface
{
    private var growl : GrowlType;

    public function getGrowl() : GrowlType
    {
        return growl;
    }

    public function setGrowl(growl : GrowlType) : void
    {
        this.growl = growl;
    }
}
//abuse of the interface: downcast to the mutable type, then mutate
const immutableObject : ImmutableInterface = getWrappedObject();
(immutableObject as MutableObject).setGrowl(GrowlType.CHEEKY);

Patterns that use composition to filter access do not suffer from the same weakness as the immutable wrapper.  Both the proxy and the decorator can be used to wrap a mutable object and limit access to its fields; they are not immutable patterns as such, but they control data access.  By storing a reference to the mutable object, the proxy or decorator exposes an API which can decide which method calls to forward to the object.  Note that typically the decorator pattern would add responsibilities to an object, whereas the proxy would control access.

These particular patterns become a preferred choice when we are trying to limit mutability by controlling various access levels or mutating according to state.  The protection proxy pattern could control access by an agreed list of friends or by the current state of the application.

Consider a document where users have different access privileges, e.g. read-only, read-write, etc.

public final class DocumentProxy
{
    private var mutableDocument : Mutable;
    private var currentState : StateType = StateType.MUTABLE;

    public function saveText(text : String) : void
    {
        // Only forward the call while the proxy is in the mutable state.
        if (currentState == StateType.MUTABLE)
        {
            mutableDocument.addText(text);
        }
        else
        {
            application.displayWarning("you do not have write privileges");
        }
    }
}

At this point we are starting to delve into access patterns, which control access to mutable objects and thus limit mutability; the objects themselves are not immutable. Limiting mutability is still a good idea and can lead to high encapsulation, predictable state and cohesiveness.  It’s beyond the scope of this post to delve into those patterns, but examples include the proxy pattern, friendship and locking patterns, where conditions prevent or grant access to an object, perhaps depending on a key or predefined friendship.

Frozen code

Popsicle immutability describes an object which may continue to mutate until it is set to a frozen state; at this point the object is closed to further mutations.  The object may have fields mutated multiple times until this state has been set, and may be useful in scenarios where we are building up a complex object involving multistage calculations (although usually here a builder would do). Popsicle immutability could be achieved using the proxy pattern with the following steps:

  1. Create a mutable object
  2. Mutate the values until required state is reached
  3. Create a proxy for the object
  4. Make the object available only via the proxy and throw away references to the original object
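
The four steps above can be sketched as follows. This is a minimal illustration in JavaScript rather than ActionScript, and the names createMonster, readOnlyProxy and buildFrozenMonster are hypothetical.

```javascript
// Step 1: create a mutable object.
function createMonster() {
  return { name: null, isScary: false };
}

// Step 3: a proxy that holds the object by composition and forwards reads only.
function readOnlyProxy(target) {
  return Object.freeze({
    getName: () => target.name,
    getIsScary: () => target.isScary,
  });
}

function buildFrozenMonster() {
  const monster = createMonster();
  // Step 2: mutate until the required state is reached.
  monster.name = "trulyScaryMonster";
  monster.isScary = true;
  // Step 4: only the proxy escapes; the mutable reference is dropped on return.
  return readOnlyProxy(monster);
}

const frozenMonster = buildFrozenMonster();
console.log(frozenMonster.getName());    // trulyScaryMonster
console.log(frozenMonster.getIsScary()); // true
```

Because the proxy wraps the object by composition instead of sharing its type, there is no mutable type for a caller to cast back to.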

Another popular pattern explicitly sets a locked state and performs validation checks at mutating methods.  This code can be quite messy as the multiple checks can add a lot of noise to the code, although this could be abstracted to a helper method.  The following code is an example of how this check might be achieved, although for the purposes of brevity the null/ undefined checks are incomplete.  Other examples are more explicit in their checks and simply set a ‘frozen’ boolean field.

final class Monster
{
    private var growl : GrowlType;

    public function getGrowl() : GrowlType
    {
        return growl;
    }

    public function setGrowl(value : GrowlType) : void
    {
        // The field may only be set while it is still null.
        Popsical.checkUnsetOrThrow(growl, "growl");
        growl = value;
    }
}

//Immutable object helper class
public class Popsical
{
    // A helper cannot assign the caller's field for it (the reference is
    // passed by value), so it only validates; the caller does the assignment.
    public static function checkUnsetOrThrow(current : Object, name : String) : void
    {
        if (current != null)
        {
            throw new Error("could not set " + name + ": value has already been set");
        }
    }
}
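
For comparison, the more explicit variant mentioned above, with a ‘frozen’ boolean field, might look like the following. This is a minimal sketch in JavaScript rather than ActionScript; the class and method names are made up for illustration.

```javascript
class FreezableMonster {
  #growl = null;
  #frozen = false;

  getGrowl() {
    return this.#growl;
  }

  setGrowl(growl) {
    // Every mutating method has to repeat this check.
    this.#assertNotFrozen();
    this.#growl = growl;
  }

  // After this call, further mutation attempts throw.
  freeze() {
    this.#frozen = true;
  }

  #assertNotFrozen() {
    if (this.#frozen) {
      throw new Error("FreezableMonster is frozen and can no longer be mutated");
    }
  }
}

const monster = new FreezableMonster();
monster.setGrowl("CHEEKY");
monster.setGrowl("GRUMPY"); // still allowed: not frozen yet
monster.freeze();

let rejected = false;
try {
  monster.setGrowl("SLEEPY");
} catch (e) {
  rejected = true;
}
console.log(monster.getGrowl(), rejected); // GRUMPY true
```

Note that nothing forces a caller to invoke freeze(), so the object stays mutable if that call is forgotten.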

Personally I don’t think this is a great way to enforce immutability, and any of the aforementioned patterns would be a better substitute.  This method would need very clear documentation and is likely to confuse an API; how is the programmer to know that this object can be set only once? The implementation of this pattern could lead to either:

  • incomplete construction of objects, via the example above
  • or, using the freeze method, objects which are never frozen due to a forgetful programmer, creating a very difficult to trace bug.

Documentation

Revisiting the reason why we are striving for immutability: we are trying to communicate, secure and protect the code by limiting state.  If for some reason immutability is unreasonable to implement or refactor, then documentation can provide a fail-safe.  Well documented code with descriptive methods can at the very least signal some author intent and warn against mutating an object.

/**
* @param growl this should really only be set once because ...
* TODO : refactor this class to be immutable
*/
public function setGrowl(growl : GrowlType) : void

Your documentation may not get read, but at least you tried!

Summary

Immutable objects won’t save you from every pitfall, but they’re a good place to start. Not all objects should be immutable: view objects being the first that spring to mind.  However, thinking about the implementation of immutable objects should at the very least get you thinking about object encapsulation and SRP.  Code should feel simpler, cleaner and easier to read.

Our dev team prides itself on the simplicity of its code; any level of programmer can jump onto an area of the codebase and quickly start working.  In my opinion this is a worthy ongoing goal : let’s K.I.S.S!

The topics discussed in this blog post are an amalgamation of techniques and ideas discussed by various industry experts, which I cannot take credit for. I heartily recommend the books listed below!

Accessing the Twitter Streaming API with OAuth

I was doing some investigation into a few datastores for a project here at Mind Candy. I wanted to do a small realistic test so that I could assess the performance, ease of use and suitability of the datastores for the project. I needed some sample data to use for these tests and in the end I decided that it would be fun to use the Twitter Streaming API to give me some random data with a realistic payload. I would have loved to use the full Twitter Firehose, but they aren’t handing out access to that anymore. In the end I settled on using the sample stream that gives you a random sample of public statuses. This blog post describes the steps required to access the Twitter Streaming API, along with a sample script that shows how to do it.

If you have not used the Twitter Streaming API before, there are 2 ways it can be accessed. There is the easy way (HTTP Basic), and the hard way (3-legged OAuth). This post is about the hard way, because the hard way is more fun! Also, Twitter is apparently going to stop supporting HTTP Basic at some point, so it’s worth knowing how to access their APIs with OAuth (Open Authorization).

3-legged OAuth in a nutshell
OAuth is a delegated authorization protocol. This blog post describes 3-legged OAuth 1.0a and explains it with a simple script. The 3 ‘legs’ are the User, the Consumer and the Service Provider. Using the script in this blog post as an example: the User is you, the Consumer is the script itself and the Service Provider is Twitter. OAuth enables the script (the Consumer) to access Twitter resources on your behalf (only if you give it consent), without you giving the script your Twitter credentials. This is very powerful and is the protocol that enables you to link 3rd-party applications to your Twitter account in a safe way. The User is given complete control in the authorization flow and is able to revoke access as well. If you want to find out more about OAuth there are some comprehensive docs here: The Authoritative Guide to OAuth 1.0. There is also a new OAuth 2.0 specification brewing that is already being used by Facebook and others, although the specification is still in draft. The concepts and approach used in OAuth 2.0 are similar to those in OAuth 1.0, so you will be prepared for this new specification when it is more widely adopted.

Please! Show me the easy way, OAuth sounds scary!
OAuth is not scary, it’s badly explained. But ok, if you only want to know the quick and easy way to access the Streaming API with HTTP Basic, here it is:

curl http://user:pass@stream.twitter.com/1/statuses/sample.json

Not very satisfying, is it? You also wouldn’t be able to use this method in an App or Web App, since nobody should be giving out their Twitter username and password to you or a 3rd party. Let’s take a look at how to do this using OAuth.

Following along
I have created a small script on github that demonstrates how to access the streaming API using 3-legged OAuth. The source can be found on github: https://github.com/mindcandy/twitter-oauth-streaming

To get the script working, you need to do the following:

  • Clone the git repository
  • Set up the Python dependencies described in the README
  • Set up a Twitter application as described in the README
  • Run the script and authorize it with your Twitter credentials
  • You should now see a stream of tweets flying by on your console
  • Use a green-on-black console theme and watch out for glitches in the Matrix!

I hope it’s not too tricky to get the script running. If it works, you should be seeing tweets fly by on your console. I can try to help you get it working if you run into trouble. I’ve got it working on OSX Snow Leopard and Ubuntu 11.04.

Let’s look at the code
To follow along, I’m going to assume that you have cloned the repo on github and have followed the steps in the README to get the code running. I’ve added lots of comments and debug output so that hopefully it’s easy to understand what the script is doing.

Entry point of the script
This code snippet is where the execution starts.

if __name__ == '__main__':
    # Check if we have saved an access token before.
    try:
        # We only need to know whether the file exists, so close it straight away.
        open(ACCESS_TOKEN_FILE).close()
    except IOError:
        # No saved access token. Do the 3-legged OAuth dance and fetch one.
        (access_token_key, access_token_secret) = fetch_access_token()
        # Save the access token for next time.
        save_access_token(access_token_key, access_token_secret)

    # Load access token from disk.
    access_token = load_access_token()
  • Line 123 – A simple piece of code that attempts to open a local file to read a saved access token.
  • Line 126 – If the file does not exist, it means the script has not been authorized and the OAuth authorization flow needs to happen. ‘fetch_access_token’ is the important method and we will look at it later.
  • Line 128 – The access token is saved to a file for future use.
  • Line 131 – The access token is loaded from disk. The ‘access_token’ is the important piece in the puzzle and it is used to sign requests so that the application can access Twitter on the User’s behalf.
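
The ‘save_access_token’ and ‘load_access_token’ helpers are not shown in this post. As a rough idea of the save-then-load pattern, here is a minimal Python 3 sketch. The two-line file format, the extra ‘path’ argument and the ‘get_access_token’ wrapper are assumptions made for illustration; the script on github may well do it differently (its loader appears to return a token object rather than a tuple, since later code reads ‘access_token.key’).

```python
import os


def save_access_token(path, key, secret):
    """Write the token key and secret to `path`, one per line."""
    with open(path, "w") as f:
        f.write("%s\n%s\n" % (key, secret))


def load_access_token(path):
    """Read back the (key, secret) pair written by save_access_token."""
    with open(path) as f:
        key, secret = f.read().splitlines()
    return key, secret


def get_access_token(path, fetch):
    """Return the saved token, calling fetch() only if nothing is saved yet."""
    if not os.path.exists(path):
        key, secret = fetch()
        save_access_token(path, key, secret)
    return load_access_token(path)
```

The point of the pattern is that the slow, interactive authorization step (‘fetch’) only runs the first time; every later run goes straight to the file.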

Getting User consent to access Twitter on their behalf
This code snippet is the most important function in the script. It shows the steps needed for 3-legged OAuth.

def fetch_access_token():
    client = oauth.Client(CONSUMER)

    # Step 1: Get a request token.
    resp, content = client.request(TWITTER_REQUEST_TOKEN_URL, "GET")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])
    request_token = dict(urlparse.parse_qsl(content))
    print "Request Token:"
    print " oauth_token = %s" % request_token['oauth_token']
    print " oauth_token_secret = %s" % request_token['oauth_token_secret']

    # Step 2: User must authorize application.
    auth_url = "%s?oauth_token=%s" % (TWITTER_AUTHORIZE_URL, request_token['oauth_token'])
    webbrowser.open_new_tab(auth_url)
    print "Go to the following link in your browser:"
    print auth_url
    pin = raw_input('What is the PIN? ')
    token = oauth.Token(request_token['oauth_token'],request_token['oauth_token_secret'])
    token.set_verifier(pin)

    # Step 3: Get access token.
    client = oauth.Client(CONSUMER, token)
    resp, content = client.request(TWITTER_ACCESS_TOKEN_URL, "POST")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])
    access_token = dict(urlparse.parse_qsl(content))
    print "Access Token:"
    print " oauth_token = %s" % access_token['oauth_token']
    print " oauth_token_secret = %s" % access_token['oauth_token_secret']
    return (access_token['oauth_token'], access_token['oauth_token_secret'])
  • Line 69 – Set up an OAuth client. This is an OAuth-aware HTTP client from the oauth2 module, used later to make the various requests needed for the OAuth authorization flow.
  • Line 72 through 75 – The first step in the 3-legged OAuth flow. A request token is retrieved from Twitter. This is an unauthorized request token.
  • Line 81 through 84 – The second step in the 3-legged OAuth flow. The User must authorize the script with Twitter. The request token from step 1 is sent along to Twitter so that it can be authorized.
  • Line 85 – Capture the pin that Twitter provides. This pin is used to authorize the request token from step 1. If this was a web application, a callback url would be used and this manual step would not be needed.
  • Line 86 through 87 – The request token from step 1 is authorized with the verification pin from step 2. The request token is now authorized.
  • Line 90 – The third step in the 3-legged OAuth flow. An OAuth client is set up with the authorized request token from step 2.
  • Line 91 through 94 – The authorized request token from step 2 is exchanged for an access token. This token is what the script needs to access Twitter resources on behalf of the user.
  • Line 98 – Return a tuple containing the access token key and secret. We are now ready to stream!
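
Steps 1 and 3 both get back a form-encoded body, which is why the script runs the response through ‘parse_qsl’. Here is a small Python 3 sketch of just that parsing step. The response body and token values are made-up placeholders, not real Twitter output, and ‘parse_token_response’ is a name I have invented for the example.

```python
from urllib.parse import parse_qsl  # Python 3 home of urlparse.parse_qsl


def parse_token_response(content):
    """Turn 'oauth_token=...&oauth_token_secret=...' into a dict."""
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    return dict(parse_qsl(content))


# Placeholder body in the shape of a request token response.
body = b"oauth_token=abc123&oauth_token_secret=s3cr3t&oauth_callback_confirmed=true"
token = parse_token_response(body)
print(token["oauth_token"], token["oauth_token_secret"])
```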

Creating an Authorization header to access Twitter
Now that we have the access token, we can access Twitter on behalf of the User. If we were not accessing the streaming API, we could use the access token and the client from the oauth2 module to make synchronous HTTP requests to User resources. Unfortunately we can’t do this here: the streaming API keeps the HTTP connection open as the Tweets stream by, so a synchronous request would never return and we would not see any output. To get around this, we will use the Twisted event-driven framework and sign the HTTP request ourselves: we sign the request with the access token and then capture the Authorization header we need to send to Twitter.

def build_authorization_header(access_token):
    url = "http://%s%s" % (TWITTER_STREAM_API_HOST, TWITTER_STREAM_API_PATH)
    params = {
        'oauth_version': "1.0",
        'oauth_nonce': oauth.generate_nonce(),
        'oauth_timestamp': int(time.time()),
        'oauth_token': access_token.key,
        'oauth_consumer_key': CONSUMER.key
    }

    # Sign the request.
    req = oauth.Request(method="GET", url=url, parameters=params)
    req.sign_request(oauth.SignatureMethod_HMAC_SHA1(), CONSUMER, access_token)

    # Grab the Authorization header
    header = req.to_header()['Authorization'].encode('utf-8')
    print "Authorization header:"
    print " header = %s" % header
    return header
  • Line 111 through 112 – Create and sign our request using our Consumer keys and access tokens. This indicates to Twitter what application is accessing the API and which User authorized the access.
  • Line 115 – Convert the signed request to an Authorization header. We will use this header to access the streaming API.
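
If you are curious what ‘sign_request’ and ‘to_header’ are doing on our behalf, here is a rough Python 3 sketch of OAuth 1.0a HMAC-SHA1 signing using only the standard library. It is not the oauth2 module’s code, just my reading of the signing rules, and it ignores details such as query strings in the URL. The function names, keys, secrets, nonce and timestamp are all made up for the example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode the OAuth way: only A-Z a-z 0-9 - . _ ~ stay as-is."""
    return quote(str(value), safe="-._~")


def build_header(method, url, params, consumer_secret, token_secret):
    """Sign the request and return the value for the Authorization header."""
    signed = dict(params, oauth_signature_method="HMAC-SHA1")
    # 1. Normalise the parameters: encode, sort, join as key=value pairs.
    pairs = sorted((pct(k), pct(v)) for k, v in signed.items())
    normalised = "&".join("%s=%s" % pair for pair in pairs)
    # 2. Signature base string: METHOD&url&parameters, each part encoded.
    base = "&".join([method.upper(), pct(url), pct(normalised)])
    # 3. The signing key is both secrets joined by '&'.
    key = "%s&%s" % (pct(consumer_secret), pct(token_secret))
    digest = hmac.new(key.encode("ascii"), base.encode("ascii"), hashlib.sha1).digest()
    signed["oauth_signature"] = base64.b64encode(digest).decode("ascii")
    # 4. Every oauth_* parameter goes into one comma-separated header.
    fields = ", ".join('%s="%s"' % (pct(k), pct(v)) for k, v in sorted(signed.items()))
    return "OAuth " + fields


# Hypothetical credentials; nonce and timestamp are fixed so the output is repeatable.
params = {
    "oauth_version": "1.0",
    "oauth_nonce": "fixed-nonce-for-the-example",
    "oauth_timestamp": 1300000000,
    "oauth_token": "hypothetical-token-key",
    "oauth_consumer_key": "hypothetical-consumer-key",
}
header = build_header("GET", "http://stream.twitter.com/1/statuses/sample.json",
                      params, "consumer-secret", "token-secret")
print(header)
```

Neither secret ever appears in the header itself. Only the signature derived from them is sent, which is what lets Twitter verify both the application and the User without the secrets travelling with each request.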

Stream the tweets
We now have everything we need to stream Tweets from Twitter. This code is all Twisted asynchronous code. I’m not going to explain Twisted in this post because this post is getting quite long. Also, it’s not OAuth-specific. You can take a look at the code on github, but all it is doing is using the Authorization header we created earlier to connect to Twitter. It then prints Tweets line-by-line as they are streamed from Twitter.

    # Twitter stream using the Authorization header.
    twsf = TwitterStreamerFactory(auth_header)
    reactor.connectTCP(TWITTER_STREAM_API_HOST, 80, twsf)
    reactor.run()
  • Line 137 – Create a TwitterStreamerFactory using our Authorization header. Twisted now has everything it needs to access Twitter.
  • Line 138 through 139 – Start the Twisted reactor to print the Tweets out to the console.
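
The “line-by-line” part deserves a quick illustration, because data from a long-lived connection arrives in arbitrary chunks that do not line up with Tweets. The sketch below is plain Python 3 rather than the Twisted code from the script. ‘LineBuffer’ is a hypothetical helper, and it assumes that each message ends with \r\n and that blank lines are just keep-alives.

```python
class LineBuffer:
    """Collects raw chunks and hands back only the complete lines."""

    def __init__(self, delimiter=b"\r\n"):
        self.delimiter = delimiter
        self.pending = b""

    def feed(self, chunk):
        """Add a chunk; return the list of lines it completed."""
        self.pending += chunk
        # Everything after the last delimiter is an unfinished line.
        *lines, self.pending = self.pending.split(self.delimiter)
        # Skip blank keep-alive lines.
        return [line for line in lines if line]


buf = LineBuffer()
for chunk in (b'{"text": "he', b'llo"}\r\n{"te', b'xt": "bye"}\r\n\r\n'):
    for line in buf.feed(chunk):
        print(line.decode("utf-8"))
```

A Tweet split across two chunks is only printed once its closing delimiter has arrived.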

That’s it!
I hope you have found this post interesting and useful. Please feel free to fork the project on github and play around with it. You could tweak it to make it a lot more interesting. Add the ability to filter or track specific keywords or geotagged Tweets. You can even use the same OAuth code to access the regular Twitter REST APIs, so you can write apps to access Users, Trends, Timelines etc. There’s also a whole bunch of other OAuth APIs for you to explore if you are interested e.g. LinkedIn, SimpleGeo, SoundCloud etc. Have fun!

Learning to K.I.S.S (part 1)

State management is something that all developers must tackle, from building complex multi-player games to RIAs to micro games. State is constantly changing, and the way we deal with that could determine the success or failure of a project, and certainly the amount of firefighting (bug fixing) needed to get the project into a releasable state. Immutability is a tool for managing state within an application; immutable objects are objects that do not change state beyond initial construction. The title of this post, ‘Learning to K.I.S.S’ (keep it simple, stupid), describes an important benefit of immutable objects: they will greatly simplify your application.

  • Objects which are immutable are predictable: programmers can make assumptions about an object because its state is encapsulated and final.
  • An immutable object clearly shows the intent of the programmer. An immutable Tween object in a tweening API, for example, describes a clear and direct transfer of information to the tweening engine.
  • Objects which are mutable require more thought and design decisions relating to access rights and friendship amongst objects, and so add complexity to systems.
  • A codebase which utilises the power of immutable objects will be safer; inexperienced programmers are protected against bad programming decisions simply because they are unable to make them.

In this two-part post I will be discussing how we implement immutability for a variety of scenarios in ActionScript, using various techniques. I hope that the examples will convince you of the benefits of striving for immutability in your objects where it is possible.

Starting at the beginning

So you’ve identified a need, and created a class, but you must give this class some state and set some data. Unfortunately, unlike our brothers and sisters in the Java world, we are missing the ‘final’ keyword for variable declarations. In Java the final keyword allows you to assign a variable once, and only once. Unlike constants in ActionScript, final fields can be assigned after their declaration, in the constructor, which is incredibly useful in creating immutable objects.


/**
 *  An example illustrating how to create
 *  immutable objects in Java using the 'final' keyword
*/
public class Monster {

    final String name;
    final boolean isScary;

    public Monster(String name, boolean isScary) {
       this.name = name;
       this.isScary = isScary;
    }
}

These variables cannot be accidentally overwritten once they have been set; any attempt to reassign one of them results in a compile-time error. In ActionScript we can declare our immutable field as constant or private, however each has its limitations. Constants are great if you know the value ahead of time, and this method of creating enumerated types in ActionScript is a great way of utilising that power. However, in most cases you’ll simply want to declare your field private and set its state within the constructor. Private fields in ActionScript cannot be accessed or overridden by subclasses, so in many ways this provides some security; you are, however, not protected from yourself. If your class accidentally sets a private field twice, it will not be obvious.

Managing state with Getters and Setters

It is commonplace to use setters to mutate private fields; however, unless you are using the data transfer / value object pattern (discussed further down), this is generally considered a bad idea. The problem with objects constructed with getters and setters is that they do not have to be fully initialized. Their state can be changed at any point in the application and is therefore unpredictable. The broken encapsulation via public setters means that the object is open to abuse by friend objects. Objects that still want to provide this functionality but maintain some immutability use side-effect-free methods: the returned value will always be the same given the same input, and the object itself is never modified. A setter will return a new object with the new value and cloned values from the parent. A good example of this technique is an immutable list. Calling a method add(value : *) : * on an immutable list will return a new list including the newly added object. The code might look something like:


public function add(value : *) : *
{
	checkNotNull(value, "cannot add a null value to the list");
	return new ImmutableList(value, this);
}

This idiom comes with a warning: setting up an object this way results in superfluous object creation and consequently a strain on the garbage collector. However, it is worth noting that this type of object creation would usually be part of setup and therefore not a big part of the application runtime.
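
To make the idea concrete outside ActionScript, here is a small Python 3 sketch of a list whose add method returns a new list and leaves the original untouched. It mirrors the ‘new ImmutableList(value, this)’ shape above by keeping a reference to the parent list; everything beyond that (the ‘__setattr__’ guard, the iteration order, treating a list with no parent as empty) is my own assumption about how such a class might be filled in.

```python
class ImmutableList:
    """A linked list whose add() never modifies the list it is called on."""

    __slots__ = ("_value", "_parent")

    def __init__(self, value=None, parent=None):
        # object.__setattr__ bypasses the guard below during construction.
        object.__setattr__(self, "_value", value)
        object.__setattr__(self, "_parent", parent)

    def __setattr__(self, name, value):
        raise AttributeError("ImmutableList cannot be modified")

    def add(self, value):
        if value is None:
            raise ValueError("cannot add a null value to the list")
        # The new list shares the old one instead of copying it.
        return ImmutableList(value, self)

    def __iter__(self):
        # Walk back from the newest item to the empty root list,
        # then yield the items in the order they were added.
        items = []
        node = self
        while node._parent is not None:
            items.append(node._value)
            node = node._parent
        return iter(reversed(items))


empty = ImmutableList()
one = empty.add("a")
two = one.add("b")
print(list(empty), list(one), list(two))
```

Because each new list only wraps its parent, the extra object creation is one small object per add rather than a full copy.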

A word on namespaces

One use for namespaces is to describe access rights between classes, in particular when those classes are in different packages and access cannot be described using the existing namespaces. Although namespaces cannot be used to create strict immutability, they can be used to give contextual immutability:

Consider a parser and a model; the parser reads some data, then creates the model and stores the data in it. The parser is only used when more data is requested by a service. The model contains a number of setters designed for use by the parser, which allow the model to be built up over time (perhaps the data cannot be set at once, or there is too much data for the constructor). Defining a custom namespace between the parser and the model will describe an access intent and relationship between the parser and the model’s setter methods.

/**
* An example illustrating contextual immutability between a model and parser
*/
public class Monster {

    namespace SetMonster;

    private var name : String;
    private var isScary : Boolean;

    SetMonster function setName(name : String) : void
    {
        this.name = name;
    }
    //...etc
}

public class MonsterParser {

    /**note the :: syntax indicating we are using a namespace, for more info see
    *adobes http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/Namespace.html
    *returns a Monster object which is added to a list of Monsters stored in the app model
    **/
    public function extractData(data) : Monster
    {
        const monster : Monster = new Monster();
        monster.SetMonster::setIsScary(data.isScary);
        monster.SetMonster::setName(data.name);
        return monster;
    }
}

In this context a Monster is intended to be immutable, because only the parser has access to the setters and a new Monster will be instantiated each time data is extracted; in another context you are just defining access points. A gotcha with this pattern is that anyone can use a namespace as long as it’s known about or documented. If your API is packaged up, chances are this namespace will remain unknown, and it would take effort on behalf of a user of the API to find and abuse it.

Object construction

One of the most basic ways we can both pass state into our object and ensure its immutability is by using the constructor to pass arguments into our object. These arguments are used to initialize the state of the object by setting various private fields. This approach is fine given two or even three fields, but after that it quickly becomes dirty. Multi-parameter constructors are difficult to read once coded and make for a confusing API prone to error:

Imagine a constructor passing in several strings and ints: reading code which instantiates this object is difficult without the help of documentation or an intelligent code hinting system in the IDE. “Which string is which? I can’t remember”, says the confused developer.

const testObj : TestObject = new TestObject('target1', 12, 'target2', 34, 'target3', 23, 'target4', 43, 0, 16);

A good rule of thumb is set by Robert Martin: “When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own”. Some APIs, such as TweenMax and Away3D, try to get around the multi-parameter constructor by accepting an object containing optional parameters. Constructing a sphere in Away3D may look like:

var sphere:Object3D = new Sphere({material:"red#", name:"sphere", x: 300, y:160, z: 300, radius:150, segmentsW:12, segmentsH:9});

In my opinion this latter approach is no better than the former. You may have named parameters which make object construction more descriptive, but the programmer will get no code completion from the IDE and is forced to continually refer to documentation (unless this programmer is a robot who has memorized the API by heart!).
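
As a rough illustration of Robert Martin’s suggestion, here is a Python 3 sketch that wraps the repeated name/number pairs from the TestObject example into a small class of their own. The class names and the meaning I have given to the trailing 0 and 16 (‘start’ and ‘end’) are pure guesses for the sake of the example, since the original constructor does not say what its arguments mean.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Target:
    """One name/number pair that used to be two loose arguments."""
    name: str
    value: int


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for TestObject."""
    targets: Tuple[Target, ...]
    start: int = 0
    end: int = 16


settings = Settings(
    targets=(Target("target1", 12), Target("target2", 34),
             Target("target3", 23), Target("target4", 43)),
    start=0,
    end=16,
)
print(settings.targets[1].name, settings.end)
```

Each argument now reads as what it is at the call site, and because the fields are declared on a class an IDE can still offer code completion, which the anonymous object approach cannot.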

Summary

In this post we have looked at some of the limitations of the language and  some of the most common object construction techniques.  In the next post we’ll be looking at other slightly more complex design patterns and idioms that attempt to create immutable objects whilst getting around some of the limitations of other techniques.