2010/02/23

Variables from PHP sessions in Nginx config

Recently I found this really cool Nginx module, mod_eval: http://github.com/vkholodkov/nginx-eval-module
It lets you store responses from Nginx upstreams (backends) in variables that can be reused inside the Nginx configuration. This opens up a whole lot of new possibilities, so it made me think about what I could use it for... One problem we often face is that we would like to move more logic from PHP into Nginx, since Nginx is simply waaay more efficient than our PHP framework. After considering a lot of possible enhancements this module could give us, I came to the conclusion that it would be really useful if Nginx knew details about the users that usually only PHP knows about.
Nginx itself can already extract cookie values and store them in $cookie_* variables, and there is already a Memcache module for Nginx that I can use in combination with mod_eval to retrieve Memcache values. Combined, these modules let me do the following:
  • Get the user's session id out of their session_id cookie using Nginx's internal header parsing
  • Retrieve the user's serialized PHP session from Memcache using the Nginx Memcache module
  • Store the serialized PHP session in an Nginx variable using mod_eval
Now the only piece missing is a parser for the serialized PHP sessions, so that's what I wrote. Unfortunately the Nginx configuration syntax doesn't support multidimensional array structures, only simple variables, so I couldn't implement this in a way that really represents the whole session. Instead I had to implement it as a kind of string scanner that takes a search path and extracts the matching value from the serialized multidimensional session array. I guess that sounds quite complicated now... Well, I can't say it isn't, but an example should help:
In PHP I stored this structure in my session:
$_SESSION['symfony/user/sfUser/attributes'] => Array
  (
    [users_dynamic] => Array
      (
        [get_last_online_state] =>
        [update_counter_time] => 1266041164
      )
    [subscriber] => Array
      (
        [user_actual_culture] => de
        [lastURI] => http://dev.poppen.lab/frontend_dev.php/home
        [invisibility] => 0
        [getGender] => m
      )
  )
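My goal is to extract the user's gender, so I specify the search path symfony/user/sfUser/attributes|s:10:"subscriber";s:9:"getGender". The part before the | is the session key, and after it come the array keys to descend through, each written in PHP serialize syntax. The return value also uses the PHP serialize syntax, like s:1:"m", which means it is of type string, has length 1, and has the value m.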
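To make the search path idea more concrete, here is a rough Python sketch of such a lookup. It is not the module's actual C code, only an illustration under a few assumptions: the session uses PHP's default "key|value" session format, only the types N, b, i, d, s and a occur, and the sample session is a shortened, hand-serialized version of the structure above. The names value_end and extract are made up for this example.

```python
def value_end(data: bytes, pos: int) -> int:
    """Return the index just past the serialized value that starts at pos."""
    tag = data[pos:pos + 1]
    if tag == b"N":                          # N;
        return pos + 2
    if tag in (b"b", b"i", b"d"):            # b:1;  i:42;  d:0.5;
        return data.index(b";", pos) + 1
    if tag == b"s":                          # s:<len>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])
        return colon + 2 + length + 2        # skip :" then payload then ";
    if tag == b"a":                          # a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                      # skip :{
        for _ in range(2 * count):           # every entry is a key plus a value
            pos = value_end(data, pos)
        return pos + 1                       # skip }
    raise ValueError(f"unsupported type tag {tag!r} at offset {pos}")


def extract(session: bytes, path: bytes) -> bytes:
    """Follow a search path and return the serialized value it points to."""
    session_key, _, rest = path.partition(b"|")

    # Split the array keys of the path into tokens such as s:10:"subscriber";
    rest += b";"
    keys, p = [], 0
    while p < len(rest):
        end = value_end(rest, p)
        keys.append(rest[p:end])
        p = end

    # Find the top-level session entry (raises ValueError if it is missing).
    pos = 0
    while True:
        bar = session.index(b"|", pos)
        name, pos = session[pos:bar], bar + 1
        if name == session_key:
            break
        pos = value_end(session, pos)

    # Descend one array level per key in the path.
    for key in keys:
        if session[pos:pos + 1] != b"a":
            raise KeyError(key)
        colon = session.index(b":", pos + 2)
        count = int(session[pos + 2:colon])
        pos = colon + 2
        for _ in range(count):
            key_end = value_end(session, pos)
            if session[pos:key_end] == key:
                pos = key_end                # the value follows its key
                break
            pos = value_end(session, key_end)
        else:
            raise KeyError(key)

    return session[pos:value_end(session, pos)].rstrip(b";")


# Shortened sample session (an assumption, serialized by hand for this example).
SESSION = (
    b'symfony/user/sfUser/attributes|a:2:{'
    b's:13:"users_dynamic";a:1:{s:19:"update_counter_time";i:1266041164;}'
    b's:10:"subscriber";a:2:{s:19:"user_actual_culture";s:2:"de";'
    b's:9:"getGender";s:1:"m";}}'
)
PATH = b'symfony/user/sfUser/attributes|s:10:"subscriber";s:9:"getGender"'

print(extract(SESSION, PATH).decode())       # prints s:1:"m"
```

Because strings carry their byte length, the scanner can jump over values it is not interested in without looking at their content, which is why a simple string scanner is enough here.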
And now the same thing in Nginx:
location / {
  eval $session {
    # store the retrieved memcache value into the variable $session

    set $memcached_key $cookie_session_id;
    # extract the value of the cookie "session_id" and use it as memcache key

    memcached_pass 1.2.3.4:11211;
    # get the serialized session from memcache
  }

  php_session_parse $result $session "symfony/user/sfUser/attributes|s:10:\"subscriber\";s:9:\"getGender\"";
  # extract the gender from the serialized session in $session and store the return value into $result

  if ($result = "s:1:\"w\"")
  {
    # for girls
  }

  if ($result = "s:1:\"m\"")
  {
    # for boys
  }

  # for logged out users
}
This whole thing seems to be working quite well for me and I can't find problems with it at the moment, but I have to admit that we don't have it running in any production environment yet, with emphasis on yet. Once it's running in some prod env I will post again about what we used it for, and if it works or not, not necessarily in that order.
For the ones who dare to take the risk already, I'd be glad about comments, use cases and murder threats from admins who got fired because I killed their production sites: http://github.com/replay/ngx_http_php_session

2010/02/21

PECL-Memcache compatible consistent hashing loadbalancer for Nginx

During the past year we have often saved a lot of resources by adding caches to all kinds of layers of our infrastructure. Some of the most important ones are Memcache instances that are filled by PHP and read by Nginx.

Now, as long as you have only one Memcache this works pretty well. PHP stores values and Nginx reads values, very simple... But what if you need to grow your Memcache cluster? You will run into the problem that Nginx doesn't know where in the cluster PHP stored the values.
Our first solution to this problem was the Moxi Memcache proxy, which is a really great piece of software that I can recommend and that deserves a mention:
http://code.google.com/p/moxi/

The only solution that might be even better than Moxi would be if Nginx itself could predict where PHP would store a certain key/value pair. That kind of prediction is easy to implement if you use consistent hashing on the Memcache keys, a method that is already implemented in the PECL-Memcache PHP module, but not activated by default.
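For readers who haven't seen consistent hashing before, here is a minimal Python sketch of the general idea. It is not byte-compatible with PECL-Memcache: the CRC32 hash, the 160 points per server, the "server-i" point naming and the HashRing class itself are all assumptions made for this illustration.

```python
import bisect
import zlib

POINTS_PER_SERVER = 160  # assumed value, chosen only for this illustration


class HashRing:
    """Toy consistent-hash ring: each server owns many points on a circle."""

    def __init__(self, servers, points=POINTS_PER_SERVER):
        ring = []
        for server in servers:
            for i in range(points):
                # The point position depends on the literal server string.
                ring.append((zlib.crc32(f"{server}-{i}".encode()), server))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._servers = [s for _, s in ring]

    def lookup(self, key: str) -> str:
        """Return the server owning the first point at or after the key's hash."""
        h = zlib.crc32(key.encode())
        idx = bisect.bisect_left(self._hashes, h) % len(self._hashes)  # wrap around
        return self._servers[idx]


old_servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
new_server = "10.0.0.4:11211"

three = HashRing(old_servers)
four = HashRing(old_servers + [new_server])

keys = [f"user:{i}" for i in range(1000)]
moved = sum(three.lookup(k) != four.lookup(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

The point is that any two programs that build the same ring from the same server strings will pick the same server for a given key, and that growing the cluster only moves keys onto the new server instead of reshuffling everything.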

Understanding consistent hashing

So the only thing I had to do was read through the code of the PHP Memcache module to check the details of its implementation and then implement the same thing as an upstream balancer for Nginx.
Unfortunately there is a minor flaw in the implementation of the Nginx upstream module which makes it impossible for an upstream load balancer to detect whether the parameter to the "server" directive in the Nginx configuration file is a DNS name or an IP. For consistent hashing this is important, because the PECL-Memcache implementation uses exactly the given string to build the hash ring, which means that the results differ depending on whether you use DNS names or IPs in your PHP code.
I personally really hate having to patch the Nginx upstream module, so I created two branches on github:
  1. The "master" branch only works in environments where PHP uses IPs to connect to the Memcache servers. This branch doesn't require you to patch Nginx.
  2. The "dns" branch requires you to patch Nginx, but it also works if you use DNS names in your PHP code.
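A tiny Python illustration of why the literal string matters. The hashing scheme (CRC32 over "server-i"), the host name and the ring_points helper are assumptions for this example, not the exact PECL-Memcache scheme.

```python
import zlib


def ring_points(server: str, points: int = 3):
    """Hash the literal server string into a few ring positions."""
    return [zlib.crc32(f"{server}-{i}".encode()) for i in range(points)]


# The same machine, once addressed by IP and once by a (hypothetical) DNS name.
by_ip = ring_points("10.0.0.1:11211")
by_name = ring_points("memcache1.example:11211")

print(by_ip)
print(by_name)
```

The two lists have nothing in common, so a ring built from IPs on one side and from DNS names on the other would send the same key to different servers.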

As far as I know this module is in prod on multiple websites, and I've heard a lot of feedback that it's working well.

And this is where you can get it:

http://wiki.nginx.org/NginxHttpUpstreamConsistentHash
http://github.com/replay/ngx_http_consistent_hash

I have to warn you though. The PECL Memcache implementation of consistent hashing can add considerable overhead to the PHP servers. This overhead comes from the Memcache->addServer method, which rebuilds its whole internal hash ring structure every time you add a server to the Memcache object. If you have to execute the addServer method very often, you might want to consider somehow caching the Memcache objects after adding the servers (don't ask me how).