Subscribe   |   @sriramk   |   About

Cacheman - a fast distributed hashtable for Windows


Cacheman is a fast, distributed hashtable for Windows, implemented purely in managed code. This is a personal side-project which I started and abandoned several months ago and picked up a few weekends ago. If you just want the binaries, scroll to the bottom of the post to get the raw, early bits (and I do mean raw and early!)

It all started with my fascination with memcached and how well it works for several heavy-traffic sites. I set off to create a bare bones hashtable with a set of requirements in mind

Note that memcached meets a lot of the requirements above. My biggest reason for starting from scratch was to just see ‘whether I could do it’ :-).

I’ll walk you through a little demonstration before digging into how it works and why certain design decisions were made.

###A quick tour

If you grab the binaries below, you’ll see 3 binaries - the server, a console/client and a library that you can link to your own apps.



Dare Obasanjo has written extensively on how typical code using memcached in .NET looks like and the same pattern is applicable here too. Here’s some sample code using the client library. Ignore the IPEndPoint stuff - the next release will let you write this in a nice config file and never have to deal with IP addresses and port in code.


###How stuff works

The disadvantage of this approach is that when you add a new server node or remove a server node, the cache needs to get repopulated. The fix for this is a consistent hashing algorithm which I haven’t gotten around to implementing (and from the little I know, there are not too many of these algorithms out there)

The other choice is to use a central ‘master cache server’ which stores a lookup table of server nodes at which the clients can constantly poll. I’m not sure whether I like this too much as it seems to be a lot of added complexity and brings its own set of problems.

The networking layer also takes care to not do small writes. This, along with the protocol design lets me get a perf win by setting TCP_NODELAY on my sockets.

The model I have now is built around NT’s IO Completion ports. Unlike the previous model, there is a M*N relationship between sockets and threads (rather than a 1:1 relationship).

When data is queued to the completion port (from the client), one of the waiting threads is woken up and it goes to work on the data. Once it is done processing (either handling the request or deciding that it needs more data), it goes back to waiting by calling GetQueuedCompletionStatus internally. This lack of thread affinity for client sockets lets me multiplex several clients to a few active threads.

On my home machine (2.4GHz Intel Core 2 with 2GB RAM), I can push the server to around 16000 requests per second (clients and server running on the same machine). I believe I can get it to around 25K requests on the same box with a bit of work but anything above that is going to mean some serious work.

Cacheman comes with a nice set of perf counters for you to get at these numbers anytime



However, my code isn’t entirely async. When the server sends data back to the client, it blocks on the send. This was a deliberate choice - whenever I did perf tests with async sends, I saw a noticeable dip in speed (as measured in requests processed per second). My theory here is that the hit is due to the context switch between the ‘request-processing’ thread and the ‘socket send’ thread.

###The Bits

There is a lot of work left to be done (a better client, wrapping the server into a NT service, making things more ops friendly, lock-free internal data structures,etc) but I wanted to try the ‘release-early-release often’ approach for once. Things are quite busy at work (some kick-a** Popfly features in the pipeline as usual) so only expect bug fixes over the weekend :)

Be warned - these are really early, really raw bits. Stuff will crash or not work. The next version will change everything. Demons will be pulled out of your nose. Use at your own risk! Have fun and send feedback through the comments or to or


I don’t usually have an ‘acknowledgements’ section but I just had to have one this time. A shout out to my friends who saw very little of me these past weekends. :) And to Brad Fitzpatrick and the rest of the Memcached folks for their awesome work.

Subscribe to my email newsletter here