Cacheman is a fast, distributed hashtable for Windows, implemented purely in managed code. This is a personal side-project which I started and abandoned several months ago and picked up a few weekends ago. If you just want the binaries, scroll to the bottom of the post to get the raw, early bits (and I do mean raw and early!)
It all started with my fascination with memcached and how well it works for several heavy-traffic sites. I set off to create a bare bones hashtable with a set of requirements in mind
Note that memcached meets a lot of the requirements above. My biggest reason for starting from scratch was to just see ‘whether I could do it’ :-).
I’ll walk you through a little demonstration before digging into how it works and why certain design decisions were made.
###A quick tour
If you grab the binaries below, you’ll see 3 binaries - the server, a console/client and a library that you can link to your own apps.
Dare Obasanjo has written extensively on how typical code using memcached in .NET looks like and the same pattern is applicable here too. Here’s some sample code using the client library. Ignore the IPEndPoint stuff - the next release will let you write this in a nice config file and never have to deal with IP addresses and port in code.
###How stuff works
In terms of general architecture, Cacheman is similar to memcached as opposed to other distributed hash tables (for e.g, EHCache from the Java world is an in-memory cache with async replication to the server). The server listens on a socket, parses commands from the client and responds appropriately. The client itself is pretty dumb and is just a thin layer over some networking code. This means that Cacheman is neither a read-through nor a write-through cache at the moment.
When you do a GET/SET/DELETE operation for a specified key, the client first needs to figure out which Cacheman server instance to talk to. To do that, the client does a quick FNV hash of the key and then mods that with the number of servers to get the server to talk to.
The disadvantage of this approach is that when you add a new server node or remove a server node, the cache needs to get repopulated. The fix for this is a consistent hashing algorithm which I haven’t gotten around to implementing (and from the little I know, there are not too many of these algorithms out there)
The other choice is to use a central ‘master cache server’ which stores a lookup table of server nodes at which the clients can constantly poll. I’m not sure whether I like this too much as it seems to be a lot of added complexity and brings its own set of problems.
The networking layer also takes care to not do small writes. This, along with the protocol design lets me get a perf win by setting TCP_NODELAY on my sockets.
The model I have now is built around NT’s IO Completion ports. Unlike the previous model, there is a M*N relationship between sockets and threads (rather than a 1:1 relationship).
When data is queued to the completion port (from the client), one of the waiting threads is woken up and it goes to work on the data. Once it is done processing (either handling the request or deciding that it needs more data), it goes back to waiting by calling GetQueuedCompletionStatus internally. This lack of thread affinity for client sockets lets me multiplex several clients to a few active threads.
On my home machine (2.4GHz Intel Core 2 with 2GB RAM), I can push the server to around 16000 requests per second (clients and server running on the same machine). I believe I can get it to around 25K requests on the same box with a bit of work but anything above that is going to mean some serious work.
Cacheman comes with a nice set of perf counters for you to get at these numbers anytime
However, my code isn’t entirely async. When the server sends data back to the client, it blocks on the send. This was a deliberate choice - whenever I did perf tests with async sends, I saw a noticeable dip in speed (as measured in requests processed per second). My theory here is that the hit is due to the context switch between the ‘request-processing’ thread and the ‘socket send’ thread.
There is a lot of work left to be done (a better client, wrapping the server into a NT service, making things more ops friendly, lock-free internal data structures,etc) but I wanted to try the ‘release-early-release often’ approach for once. Things are quite busy at work (some kick-a** Popfly features in the pipeline as usual) so only expect bug fixes over the weekend :)
Be warned - these are really early, really raw bits. Stuff will crash or not work. The next version will change everything. Demons will be pulled out of your nose. Use at your own risk! Have fun and send feedback through the comments or to email@example.com or firstname.lastname@example.org
I don’t usually have an ‘acknowledgements’ section but I just had to have one this time. A shout out to my friends who saw very little of me these past weekends. :) And to Brad Fitzpatrick and the rest of the Memcached folks for their awesome work.