2010/02/21

PECL-Memcache compatible consistent hashing loadbalancer for Nginx

During the past year we often saved a lot of resources by adding caches into all kinds of layers of our infrastructure. Some of the most important ones are Memcaches which are getting filled by PHP and read by Nginx.

Now, as long as you have only one Memcache this works pretty well. The PHP stores values and the Nginx reads values, very simple... But what if you need to grow your Memcache cluster? You will run into the problem that the Nginx doesn't know where in the cluster the PHP stored values.
Our first solution to this problem was the Moxi Memcache proxy, which is a really great, recommandable and worth to be mentioned piece of software:
http://code.google.com/p/moxi/

The only solution which might be even more optimal than the Moxi would be if Nginx itself could predict where PHP would store a certain key/value. That kind of predictions are easy to implement if you use consistent hashing on the Memcache keys, a method that is already implement in the PECL-Memcache php module, but not activated by default.

Understanding consistent hashing

So the only thing that I had to do was to read through the code of the PHP Memcache module to check the details of their implementation and then implement the same as upstream balancer for Nginx.
Unfortunately there is some minor flaw in the implementation of the Nginx upstream module which makes it impossible for an upstream loadbalancer to detect if the parameter to the "server" directive in the Nginx configuration file is a DNS name or an IP. In the case of the consistent hashing this is important because the PECL-Memcache implementation uses exactly the given string to build the hashring, which means that the results differ based on if you use DNS names or IPs in your PHP codes.
I personally really hate to have to patch the Nginx upstream module, so i created two branches on github:
  1. The "master" branch only works on environments where the PHP uses IP's to connect to the Memcache servers, this branch doesn't require you to patch the Nginx.
  2. The "dns" branch requires you to patch the Nginx, but it also works if you use DNS names in your PHP code.

As far as I know this module is in prod on multiple websites and I've heard many feedbacks that its working well.

And that's where you get it from:

http://wiki.nginx.org/NginxHttpUpstreamConsistentHash
http://github.com/replay/ngx_http_consistent_hash

I have to warn you tough. The PECL Memcache implementation of the consistent hashing can add considerable overhead to the PHP servers. This overhead comes from the Memcache->addServer method which is rebuilding its whole internal hash ring structure every time when you add a server to the Memcache object. If you have to execute the addServer method very often you might want to consider caching the Memcache objects after adding the servers somehow (don't ask me how).