Accessing the Twitter Streaming API with OAuth

I was doing some investigation into a few datastores for a project here at Mind Candy. I wanted to do a small realistic test so that I could assess the performance, ease of use and suitability of the datastores for the project. I needed some sample data to use for these tests and in the end I decided that it would be fun to use the Twitter Streaming API to give me some random data with a realistic payload. I would have loved to use the full Twitter Firehose, but they aren’t handing out access to that anymore. In the end I settled on using the sample stream that gives you a random sample of public statuses. This blog post describes the steps required to access the Twitter Streaming API, along with a sample script that shows how to do it.

If you have not used the Twitter Streaming API before, there are two ways to access it: the easy way (HTTP Basic) and the hard way (3-legged OAuth). This post is about the hard way, because the hard way is more fun! Also, Twitter is apparently going to stop supporting HTTP Basic at some point, so it’s worth knowing how to access their APIs with OAuth (Open Authorization).

3-legged OAuth in a nutshell
OAuth is a delegated authorization protocol. This blog post describes 3-legged OAuth 1.0a and explains it with a simple script. The 3 ‘legs’ are the User, the Consumer and the Service Provider. Using the script in this blog post as an example: the User is you, the Consumer is the script itself and the Service Provider is Twitter. OAuth enables the script (the Consumer) to access Twitter resources on your behalf (only if you give it consent), without you giving the script your Twitter credentials. This is very powerful and is the protocol that enables you to link 3rd-party applications to your Twitter account in a safe way. The User has complete control over the authorization flow and is able to revoke access as well. If you want to find out more about OAuth, there are some comprehensive docs here: The Authoritative Guide to OAuth 1.0. There is also a new OAuth 2.0 specification brewing that is already being used by Facebook and others, although the specification is still in draft. The concepts and approach used in OAuth 2.0 are similar to OAuth 1.0, so you will be prepared for this new specification when it is more widely adopted.

Please! Show me the easy way, OAuth sounds scary!
OAuth is not scary; it’s just badly explained. But OK, if you only want to know the quick and easy way to access the Streaming API with HTTP Basic, here it is:


Not very satisfying, is it? You also wouldn’t be able to use this method in an App or Web App, since nobody should be giving out their Twitter username and password to you or a 3rd party. Let’s take a look at how to do this using OAuth.
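Before moving on, it helps to see why sharing a password is a problem. With HTTP Basic, the username and password travel with every request, and they are only base64-encoded, not encrypted. Here is a minimal sketch of how such a header is built, using only the standard library. The credentials are placeholders and the `basic_auth_header` helper is hypothetical, not part of the script:

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    # Basic auth joins the credentials with a colon and base64-encodes them.
    # Base64 is an encoding, not encryption: anyone who sees the header can
    # recover the password.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    header = basic_auth_header("alice", "not-a-real-password")
    print(header)
    # Decoding the header gives the original credentials straight back.
    print(base64.b64decode(header.split(" ", 1)[1]).decode("utf-8"))
```

Any application that sends this header has to hold your real password, which is exactly what OAuth avoids.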

Following along
I have created a small script on github that demonstrates how to access the streaming API using 3-legged OAuth. The source can be found on github:

To get the script working, you need to do the following:

  • Clone the git repository
  • Set up the Python dependencies described in the README
  • Set up a Twitter application as described in the README
  • Run the script and authorize it with your Twitter credentials
  • You should now see a stream of tweets flying by on your console
  • Use a green-on-black console theme and watch out for glitches in the Matrix!

I hope it’s not too tricky to get the script running. If it works, you should see tweets fly by on your console. I can try to help you get it working if you run into trouble. I’ve got it working on OS X Snow Leopard and Ubuntu 11.04.

Let’s look at the code
To follow along, I’m going to assume that you have cloned the repo on github and have followed the steps in the README to get the code running. I’ve added lots of comments and debug output so that hopefully it’s easy to understand what the script is doing.

Entry point of the script
This code snippet is where the execution starts.

if __name__ == '__main__':
    # Check if we have saved an access token before.
    try:
        f = open(ACCESS_TOKEN_FILE)
    except IOError:
        # No saved access token. Do the 3-legged OAuth dance and fetch one.
        (access_token_key, access_token_secret) = fetch_access_token()
        # Save the access token for next time.
        save_access_token(access_token_key, access_token_secret)

    # Load access token from disk.
    access_token = load_access_token()
  • Line 123 – A simple piece of code that attempts to open a local file to read a saved access token.
  • Line 126 – If the file does not exist, it means the script has not been authorized and the OAuth authorization flow needs to happen. ‘fetch_access_token’ is the important method and we will look at it later.
  • Line 128 – The access token is saved to a file for future use.
  • Line 131 – The access token is loaded from disk. The ‘access_token’ is the important piece in the puzzle and it is used to sign requests so that the application can access Twitter on the User’s behalf.
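The `save_access_token` and `load_access_token` helpers are not shown in this post. Here is a minimal sketch of how such helpers could work, assuming the token is stored as two lines in a plain-text file. The file format, the `path` parameter and the tuple return value are assumptions for illustration and may differ from the script on github:

```python
import os
import tempfile


def save_access_token(path, key, secret):
    """Write the token key and secret to `path`, one per line."""
    with open(path, "w") as f:
        f.write("%s\n%s\n" % (key, secret))
    # The secret is as sensitive as a password, so restrict the file to its owner.
    os.chmod(path, 0o600)


def load_access_token(path):
    """Return (key, secret) from `path`, or None if no token was saved."""
    try:
        with open(path) as f:
            key, secret = f.read().split()
    except IOError:
        return None
    return key, secret


if __name__ == "__main__":
    # Throwaway location so the example does not touch any real token file.
    token_file = os.path.join(tempfile.mkdtemp(), "access_token")
    print(load_access_token(token_file))  # nothing saved yet
    save_access_token(token_file, "example-key", "example-secret")
    print(load_access_token(token_file))
```

Caching the token this way means the User only has to go through the authorization flow once, until they revoke access.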

Getting User consent to access Twitter on their behalf
This code snippet is the most important function in the script. It shows the steps needed for 3-legged OAuth.

def fetch_access_token():
    client = oauth.Client(CONSUMER)

    # Step 1: Get a request token.
    resp, content = client.request(TWITTER_REQUEST_TOKEN_URL, "GET")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])
    request_token = dict(urlparse.parse_qsl(content))
    print "Request Token:"
    print " oauth_token = %s" % request_token['oauth_token']
    print " oauth_token_secret = %s" % request_token['oauth_token_secret']

    # Step 2: User must authorize application.
    auth_url = "%s?oauth_token=%s" % (TWITTER_AUTHORIZE_URL, request_token['oauth_token'])
    print "Go to the following link in your browser:"
    print auth_url
    pin = raw_input('What is the PIN? ')
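    token = oauth.Token(request_token['oauth_token'], request_token['oauth_token_secret'])
    # Attach the PIN as the OAuth verifier so the request token counts as authorized.
    token.set_verifier(pin)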

    # Step 3: Get access token.
    client = oauth.Client(CONSUMER, token)
    resp, content = client.request(TWITTER_ACCESS_TOKEN_URL, "POST")
    if resp['status'] != '200':
        raise Exception("Invalid response %s." % resp['status'])
    access_token = dict(urlparse.parse_qsl(content))
    print "Access Token:"
    print " oauth_token = %s" % access_token['oauth_token']
    print " oauth_token_secret = %s" % access_token['oauth_token_secret']
    return (access_token['oauth_token'], access_token['oauth_token_secret'])
  • Line 69 – Set up an OAuth client. This is an OAuth-aware HTTP client from the oauth2 module. It is used later to make the various requests needed for the OAuth authorization flow.
  • Line 72 through 75 – The first step in the 3-legged OAuth flow. A request token is retrieved from Twitter. This is an unauthorized request token.
  • Line 81 through 84 – The second step in the 3-legged OAuth flow. The User must authorize the script with Twitter. The request token from step 1 is sent along to Twitter so that it can be authorized.
  • Line 85 – Capture the PIN that Twitter provides. This PIN is used to authorize the request token from step 1. If this were a web application, a callback URL would be used and this manual step would not be needed.
  • Line 86 through 87 – The request token from step 1 is authorized with the verification PIN from step 2. The request token is now authorized.
  • Line 90 – The third step in the 3-legged OAuth flow. An OAuth client is set up with the authorized request token from step 2.
  • Line 91 through 94 – The authorized request token from step 2 is exchanged for an access token. This token is what the script needs to access Twitter resources on behalf of the user.
  • Line 98 – Return a tuple containing the access token key and secret. We are now ready to stream!
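Steps 1 and 3 both return the token as a form-encoded string in the response body, which is why the script runs `parse_qsl` over `content`. Here is a small sketch of that parsing and of building the authorization link for step 2, using only the standard library. The response body, the authorize URL and both helper names are made-up placeholders. The import falls back to `urlparse` so the sketch runs on the Python 2 the script targets as well as on Python 3:

```python
try:
    from urllib.parse import parse_qsl, urlencode  # Python 3
except ImportError:
    from urlparse import parse_qsl                 # Python 2
    from urllib import urlencode

# Placeholder URL for illustration; the script uses its own TWITTER_AUTHORIZE_URL.
EXAMPLE_AUTHORIZE_URL = "https://provider.example/oauth/authorize"


def parse_token_response(content):
    """Turn 'oauth_token=...&oauth_token_secret=...' into a dict."""
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    token = dict(parse_qsl(content))
    # Fail early with a clear message instead of a KeyError later on.
    for field in ("oauth_token", "oauth_token_secret"):
        if field not in token:
            raise ValueError("response is missing %s" % field)
    return token


def build_authorize_url(base_url, request_token):
    """Build the link the User opens to approve the request token."""
    query = urlencode({"oauth_token": request_token["oauth_token"]})
    return "%s?%s" % (base_url, query)


if __name__ == "__main__":
    body = b"oauth_token=abc123&oauth_token_secret=s3cr3t"
    request_token = parse_token_response(body)
    print(request_token["oauth_token"])
    print(build_authorize_url(EXAMPLE_AUTHORIZE_URL, request_token))
```

Using `urlencode` rather than plain string formatting keeps the link valid even if a token ever contains characters that need escaping.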

Creating an Authorization header to access Twitter
Now that we have the access token, we can access Twitter on behalf of the User. If we were not accessing the streaming API, we could use the access token and the client from the oauth2 module to make synchronous HTTP requests to User resources. Unfortunately that does not work here: the streaming API keeps the HTTP connection open as the Tweets stream by, so a synchronous client would wait for a response that never finishes and we would not see any output. To get around this, we will use the Twisted event-driven framework and sign the HTTP request ourselves: we sign the request with the access token and then capture the Authorization header that needs to be sent to Twitter.

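def build_authorization_header(access_token):
    # The URL being signed must match the URL of the stream request.
    # TWITTER_STREAM_API_PATH is an assumed constant name; use the path the script requests.
    url = "http://%s%s" % (TWITTER_STREAM_API_HOST, TWITTER_STREAM_API_PATH)
    params = {
        'oauth_version': "1.0",
        'oauth_nonce': oauth.generate_nonce(),
        'oauth_timestamp': int(time.time()),
        'oauth_token': access_token.key,
        'oauth_consumer_key': CONSUMER.key
    }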

    # Sign the request.
    req = oauth.Request(method="GET", url=url, parameters=params)
    req.sign_request(oauth.SignatureMethod_HMAC_SHA1(), CONSUMER, access_token)

    # Grab the Authorization header
    header = req.to_header()['Authorization'].encode('utf-8')
    print "Authorization header:"
    print " header = %s" % header
    return header
  • Line 111 through 112 – Create and sign our request using our Consumer keys and access tokens. This indicates to Twitter what application is accessing the API and which User authorized the access.
  • Line 115 – Convert the signed request to an Authorization header. We will use this header to access the streaming API.
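`sign_request` hides the interesting part. As a rough sketch of what HMAC-SHA1 signing involves in OAuth 1.0, the request method, URL and sorted parameters are joined into a signature base string, which is then signed with a key built from the consumer secret and the token secret. The oauth2 module does the real work and covers cases this sketch ignores, such as repeated parameters and query strings in the URL. The helper names are hypothetical, and all keys, secrets and URLs below are made-up example values:

```python
import base64
import hashlib
import hmac

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2


def percent_encode(value):
    """Percent-encode for OAuth 1.0: only A-Z a-z 0-9 - . _ ~ stay as-is."""
    return quote(str(value), safe="-._~")


def signature_base_string(method, url, params):
    """Join method, URL and the sorted, encoded parameters with '&'."""
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join("%s=%s" % pair for pair in encoded)
    return "&".join([method.upper(), percent_encode(url), percent_encode(normalized)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret):
    """Sign the base string with 'consumer_secret&token_secret' as the key."""
    key = "%s&%s" % (percent_encode(consumer_secret), percent_encode(token_secret))
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(params, signature):
    """Format oauth_* parameters plus the signature as an 'OAuth ...' header value."""
    signed = dict(params, oauth_signature=signature)
    pairs = ['%s="%s"' % (percent_encode(k), percent_encode(v)) for k, v in sorted(signed.items())]
    return "OAuth " + ", ".join(pairs)


if __name__ == "__main__":
    # Fixed nonce and timestamp keep the example repeatable; real requests need fresh ones.
    params = {
        "oauth_consumer_key": "example-consumer-key",
        "oauth_token": "example-access-token",
        "oauth_signature_method": "HMAC-SHA1",  # sign_request adds this for you
        "oauth_timestamp": 1300000000,
        "oauth_nonce": "example-nonce",
        "oauth_version": "1.0",
    }
    base = signature_base_string("GET", "http://stream.example/sample.json", params)
    signature = hmac_sha1_signature(base, "example-consumer-secret", "example-token-secret")
    print(authorization_header(params, signature))
```

Because both secrets go into the signing key but never into the request itself, Twitter can verify which application and which User are behind a request without either secret crossing the wire.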

Stream the tweets
We now have everything we need to stream Tweets from Twitter. This code is all Twisted asynchronous code. I’m not going to explain Twisted in this post because this post is getting quite long. Also, it’s not OAuth specific. You can take a look at the code on github, but all it is doing is using the Authorization header we created earlier to connect to Twitter. It then prints Tweets line-by-line as they are streamed from Twitter.

    # Twitter stream using the Authorization header.
    twsf = TwitterStreamerFactory(auth_header)
    reactor.connectTCP(TWITTER_STREAM_API_HOST, 80, twsf)
    # Start the event loop; this call blocks while Tweets are printed.
    reactor.run()
  • Line 137 – Create a TwitterStreamerFactory using our Authorization header. Twisted now has everything it needs to access Twitter.
  • Line 138 through 139 – Start the Twisted reactor to print the Tweets out to the console.
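The Twisted code is on github, but the line-by-line handling is worth a quick illustration. Data from a streaming connection arrives in arbitrary chunks, so a Tweet can be split across two reads. Here is a minimal sketch of the buffering involved, assuming each status arrives as one JSON object terminated by a carriage return and line feed, and that blank keep-alive lines should be skipped. The `LineBuffer` class and `tweets_from_chunks` function are hypothetical and are not taken from the script’s `TwitterStreamerFactory`:

```python
import json


class LineBuffer(object):
    """Collect raw chunks and hand back complete lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add a chunk of bytes; return the complete, non-blank lines it finished."""
        self._pending += chunk
        lines = self._pending.split(b"\r\n")
        # The last element is an incomplete line (or b""); keep it for next time.
        self._pending = lines.pop()
        return [line for line in lines if line.strip()]


def tweets_from_chunks(chunks):
    """Yield one decoded JSON object per complete line in the chunk stream."""
    buf = LineBuffer()
    for chunk in chunks:
        for line in buf.feed(chunk):
            yield json.loads(line.decode("utf-8"))


if __name__ == "__main__":
    # Two statuses and a blank keep-alive line, split at awkward points.
    chunks = [b'{"text": "hel', b'lo"}\r\n\r\n{"text"', b': "world"}\r\n']
    for tweet in tweets_from_chunks(chunks):
        print(tweet["text"])
```

In an event-driven framework the same idea applies: each data callback feeds the buffer, and only complete lines are parsed and printed.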

That’s it!
I hope you have found this post interesting and useful. Please feel free to fork the project on github and play around with it. You could tweak it to make it a lot more interesting, for example by adding the ability to filter or track specific keywords or geotagged Tweets. You can even use the same OAuth code to access the regular Twitter REST APIs, so you can write apps to access Users, Trends, Timelines etc. There’s also a whole bunch of other OAuth APIs for you to explore if you are interested, e.g. LinkedIn, SimpleGeo, SoundCloud etc. Have fun!