It’s been way too long since my last programming-focused blog post, so let’s try to rectify this situation:
A couple of months ago, Twitter made their Streaming API available. This provides developers with a very efficient way to tap into the public Twitter stream. All you need to do is open and maintain a single HTTP connection, passing in a few filter parameters. Twitter then keeps streaming matching tweets to you. You have the option of either sampling the entire public stream, or passing in a list of keywords and/or user IDs to track. In this post I will focus on the latter, but the basic usage remains the same.
My interest was piqued when I came across the excellent TweetStream library, which makes it trivially easy to write a Ruby client application for the Twitter Streaming API. I decided to take this opportunity to play with some other technologies and write a simple web app that displays a subset of tweets, along the lines of cursebird or twistori.
The app I came up with is Twatcher, so go check it out to get an idea of what I’m talking about. It (admittedly very crudely) identifies funny tweets by looking for tweets that contain the word “lol”. It then renders matching tweets using a simple UI not unlike that of twitter.com itself, and visually highlights the word “lol” in each tweet for emphasis. Perhaps most importantly, the app uses AJAX to periodically (currently every 10 seconds) pull in new tweets.
In the remainder of this post, I will describe the architecture of Twatcher, along with the rationale behind it. I will also share some code snippets that should allow you to follow along and build your own Twitter filter app.
Given that the connection to the Twitter Streaming API has to happen in its own application (let’s call it our filter app), outside of the actual web app, we need a way for it to pass tweets to the web app. There are a couple of options here. Obviously we could store the tweets in a database like MySQL, and have the web app read them from there. But given the small schema (only tweets; we don’t care about users or any other relational data) and the ephemeral nature of the Twatcher app (at any given time we really only care about the N most recent tweets), this seems like overkill and leads to an unnecessarily write-heavy app.
Instead, one of the various in-memory key/value stores seems like a much better fit. I first thought of memcached, but while it would be entirely possible to build this type of app on top of memcached, it’s not ideal. When a new tweet comes in from the Streaming API, we need to append it to our in-memory list of tweets. Memcached is a very low-level data store and only supports string values, so we would have to implement lists by serializing them as YAML, JSON, or binary Ruby objects. Whichever format we pick, instead of writing just the latest tweet, we would always have to rewrite the entire list of tweets. Similarly, on the web app side, we would always have to read the entire list, even though we may only be interested in the 5 most recent tweets (say in our AJAX action). Combined, this would lead to a fair amount of overhead on both the networking side and the Ruby processing side.
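To make that overhead concrete, here is a minimal sketch of the memcached-style approach. It is not the Twatcher code: a plain Ruby Hash stands in for the memcached client so the example runs on its own, and the key name, the helper names, and the MAX_TWEETS cap are all hypothetical.

```ruby
require 'json'

# Assumption: a plain Hash stands in for a memcached client. Like memcached,
# it only ever holds string values in this sketch.
CACHE = {}
MAX_TWEETS = 100 # hypothetical cap on the number of tweets we keep around

# Appending a single tweet means reading, parsing, and rewriting the whole list.
def append_tweet(cache, tweet)
  tweets = JSON.parse(cache['tweets'] || '[]')
  tweets.unshift(tweet) # newest first
  cache['tweets'] = JSON.generate(tweets.first(MAX_TWEETS))
end

# Reading only the newest few tweets still deserializes the entire list.
def recent_tweets(cache, count)
  JSON.parse(cache['tweets'] || '[]').first(count)
end

append_tweet(CACHE, { 'text' => 'first lol' })
append_tweet(CACHE, { 'text' => 'second lol' })
newest = recent_tweets(CACHE, 1)
```

Every call touches the full serialized list, which is exactly the cost a store with native list operations lets us avoid.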
Luckily, there’s another data store that is perfect for this type of app: Redis. On the most basic level it can be thought of as a key/value store like memcached, so it can act pretty much like a drop-in replacement for it. But it also has first-class support for basic data structures, such as sets and lists. This means that instead of reading and writing entire lists of tweets, we can append a single tweet at a time, and we can efficiently retrieve the exact number of tweets that we need on the web app side (e.g. 20 for a full page view, and 5 for an AJAX request). Redis is stable, highly performant, and has a solid, extremely easy-to-use Ruby library. It also supports basic persistence, although we won’t need this for our app.
With the filter app and data store out of the way, that leaves the actual web app. Our requirements are very humble: We will only have two actions (one for the full page of tweets and one for AJAX updates), and perhaps a few more trivial actions in the future for things like help pages. Since we’re not using a relational database, we don’t need any sort of ORM layer. While we could use Ruby on Rails, this would mean shooting sparrows with cannons. For our purposes the Sinatra micro-framework seems like a much better fit.
I’m a big fan of HAML, so we’ll use this for our views. Of course there’s nothing HAML specific about our app, so you’re welcome to use ERB or your template language of choice instead.
I won’t go much into the deployment side of things, but twatcher.com relies on the usual suspects: Nginx (Apache would work fine as well), Passenger (you’re welcome to use Mongrel, Thin, etc.), Capistrano, and God (to start and monitor Redis and our filter app, though I may end up giving Bluepill a try). All of this runs very smoothly on a 256MB VPS slice on Webbynode (and I’m sure just as well on Slicehost or Linode). If necessary, we could easily scale up this app by bringing up additional Sinatra slices and adding HAProxy to the mix (or perhaps even just relying on DNS round robin).
Now that the architectural overview is done, let’s take a look at some of the code. This isn’t the complete code base that I’m using on the site, but it’s a fully functional subset and hopefully enough to demonstrate the overall approach and get you started. Alternatively, you can grab the code from the twatcher-lite Github repository. I will eventually make the complete project (which includes configuration options, RSpec specs, etc.) available on Github as well.
But first a couple of prerequisites: We need to install a bunch of gems. For a production app, I would typically unpack these into the vendor directory, but for now let’s just install them system-wide:
You also need to install and start Redis. It’s easy enough, but beyond the scope of this blog post. Simply follow these instructions (I highly recommend the entire Redis article series by the way), but make sure to use the latest Redis release from the official website (currently 1.02) rather than the 1.0 version mentioned in the article.
This is the standalone filter app that mainly relies on the TweetStream library to retrieve tweets and then pushes them to Redis. In our final app we would want to use the Daemons library to run this app as a proper daemon, but for now you should be able to simply run it directly from the command line. Note that it relies on two additional files below. Simply place all of these into the same folder.
Make sure to set USERNAME and PASSWORD to your actual Twitter credentials. A word of caution: Apparently Twitter only allows a single Streaming API connection for standard accounts, and they will disconnect or blacklist you if you attempt to start multiple connections. I’m using a dedicated Twitter account for production, and my regular Twitter account during development. The actual version of this file that I’m using reads the (environment specific) credentials from a YAML file, but I didn’t want to distract from the core functionality for the purpose of this tutorial.
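For the curious, reading environment-specific credentials from YAML could look roughly like the sketch below. This is not the actual file used on the site: the config layout, the account names, and the twitter_credentials helper are assumptions for illustration, and the YAML is inlined as a string so the example is self-contained (in practice it would come from a file).

```ruby
require 'yaml'

# Hypothetical config layout: one section per environment. In a real app this
# text would be read from a file rather than inlined here.
CONFIG_YAML = <<~YAML
  development:
    username: my_personal_account
    password: secret
  production:
    username: my_dedicated_account
    password: also_secret
YAML

# Returns [username, password] for the given environment name.
# fetch raises a KeyError if the environment or a field is missing.
def twitter_credentials(yaml, env)
  config = YAML.safe_load(yaml).fetch(env)
  [config.fetch('username'), config.fetch('password')]
end

username, password = twitter_credentials(CONFIG_YAML, 'development')
```

Using fetch instead of plain hash lookups means a typo in the environment name fails loudly at startup rather than sending empty credentials to Twitter.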
This is a thin abstraction layer on top of Redis that encapsulates both pushing and retrieving tweets. This allows us to keep Redis-specific persistence code out of the filter and web apps. It also comes in handy for testing (which I’m not getting into in this post), as we can easily swap it out for a mock implementation.
Note how we’re using the push_head operation to push a single tweet to Redis, and list_range to retrieve the N most recent tweets.
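As an illustration of the mock implementation mentioned above, here is a rough in-memory stand-in that mirrors the two list operations. The class name, method names, size cap, and the choice of JSON for serialization are all assumptions made for this sketch, not the store used on the site.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the Redis-backed store. All names here
# are illustrative assumptions.
class InMemoryTweetStore
  def initialize(max_size = 100)
    @max_size = max_size
    @list = []
  end

  # Mirrors pushing to the head of a Redis list: the newest tweet goes first.
  # Tweets are stored as JSON strings, since Redis list entries are strings.
  def push(tweet_hash)
    @list.unshift(JSON.generate(tweet_hash))
    @list.pop while @list.size > @max_size # drop the oldest entries
    self
  end

  # Mirrors a list range query: returns up to `count` most recent tweets.
  def latest(count)
    @list.first(count).map { |json| JSON.parse(json) }
  end
end

store = InMemoryTweetStore.new(3)
(1..5).each { |i| store.push({ 'id' => i, 'text' => "tweet #{i} lol" }) }
newest_ids = store.latest(2).map { |tweet| tweet['id'] }
```

Because both the filter app and the web app would only talk to push and latest, swapping this object in for the real store during tests requires no changes elsewhere.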
The Tweet class wraps an individual tweet’s data hash and allows us to access the data using method call syntax (tweet.username) rather than hash element references (tweet['username']). It also contains some tweet-related functionality, such as generating Twitter user links, highlighting the word “lol”, and making URLs clickable.
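A wrapper along these lines could be sketched as follows. Treat it as an approximation rather than the real class: the field names ('username', 'text'), the method names, and the generated markup are assumptions, and the HTML escaping is a deliberately small hand-rolled helper.

```ruby
# Hypothetical sketch of a Tweet wrapper; field names and markup are
# assumptions for illustration.
class Tweet
  HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

  def initialize(data)
    @data = data
  end

  # Expose hash keys as methods, so tweet.username returns data['username'].
  def method_missing(name, *args)
    key = name.to_s
    @data.key?(key) ? @data[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s) || super
  end

  def user_link
    name = escape_html(username)
    %(<a href="http://twitter.com/#{name}">#{name}</a>)
  end

  # Walks the text token by token: URLs become links, and the word "lol" is
  # highlighted everywhere else (so a URL containing "lol" stays intact).
  def formatted_text
    text.split(/(\s+)/).map do |token|
      escaped = escape_html(token)
      if token =~ %r{\Ahttps?://\S+\z}
        %(<a href="#{escaped}">#{escaped}</a>)
      else
        escaped.gsub(/\blol\b/i) { |word| %(<strong>#{word}</strong>) }
      end
    end.join
  end

  private

  def escape_html(string)
    string.gsub(/[&<>"]/, HTML_ESCAPES)
  end
end

tweet = Tweet.new({ 'username' => 'bob', 'text' => 'LOL see http://example.com/lol <b>' })
html = tweet.formatted_text
```

Handling URLs and highlighting in a single pass avoids the classic bug where a second substitution rewrites text inside an already generated link.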
This is the actual Sinatra web app. This is the entire app (minus the views), so perhaps now you can see why we’re using Sinatra instead of a full-blown Rails app. The views follow below.
Note that our two actions both return tweets. The main difference is that the /latest action (which is used by AJAX requests) only returns up to 5 tweets, and only if they’re newer than the specified date. It also omits the layout and specifies a special CSS class named
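The "up to 5 tweets, and only if they’re newer" rule can be sketched as a small helper, independent of Sinatra. This is an illustration under assumptions rather than the app’s actual code: the helper name, the 'created_at' field, and the ISO 8601 timestamp format are all hypothetical.

```ruby
require 'time'

# Hypothetical helper. Assumes tweets are hashes with an ISO 8601 'created_at'
# string and that the list is ordered newest first.
def tweets_since(tweets, since, limit = 5)
  cutoff = Time.iso8601(since)
  tweets.select { |tweet| Time.iso8601(tweet['created_at']) > cutoff }.first(limit)
end

# Eight sample tweets, one per second, newest first.
base = Time.utc(2009, 11, 1, 12, 0, 0)
tweets = (0...8).map { |i| { 'id' => i, 'created_at' => (base + i).iso8601 } }.reverse

newer_ids = tweets_since(tweets, base.iso8601).map { |tweet| tweet['id'] }
```

Here seven tweets are newer than the cutoff, but only the five most recent are returned, which keeps each AJAX response small even after a long gap between polls.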
A pretty simple layout.
The actual HTML content is pretty minimal: A heading and a list of tweets, which is included from a separate file (below) so we can reuse it for the AJAX action. We’re also inlining some jQuery code to refresh the tweets every 10 seconds. We insert the new tweets at the beginning, but remember that we’re using a CSS class to initially hide them. We then call slideDown to make them visible using a nice slide effect. We also trim the list of tweets at 50 to prevent the page from getting too long.
Simply renders a list item for each tweet, with some basic CSS for styling purposes. I wouldn’t normally hardcode width for an img tag (and instead let CSS handle this), but for the purpose of this tutorial I wanted the page to render decently without a style sheet, and the Twitter profile pictures can be pretty large, making it look weird.
The stylesheet is pretty basic, but since this blog post is already way too long, I’m not going to reproduce it here. Simply grab the live one instead.
Putting it all together
You should now have a bunch of Ruby files in the same folder, and three HAML files in a views subdirectory. Make sure you have started Redis according to the instructions above. Then open two shells:
In your first shell, start the filter app:
The app should start and continue running until you hit CTRL+C.
In your second shell, start the web app. You could simply start it using:
However, assuming you’ve installed the Shotgun gem according to the instructions above, I recommend using the following command instead:
This will cause Sinatra to automatically reload modified files during development, similar to the default behavior in Rails.
I hope you can appreciate how little code it took us to implement a full Twitter filter web application, complete with AJAX updates. I count around 150 lines of code, and this includes plenty of comments and whitespace (granted, including the stylesheet it would be closer to 300 lines).
I also hope I’ve managed to pique your curiosity about Redis, Sinatra, and the TweetStream library. Many of us (myself included) tend to stick with the tools we’re familiar with, such as Rails and MySQL. But often, surprisingly elegant solutions emerge when using better-suited (and often simpler) tools.
Personally, I am excited about adding Redis and Sinatra to my standard toolset. I am also curious about what other types of applications might be able to get away with simple, ephemeral solutions like this. Definitely something worth exploring…