In part one, I went into a little bit of detail about how I’ve come to view process supervisor tools as a really useful layer of middleware — they abstract away virtually all of the logic of forking, nohup-ing, stdout & stderr redirection, and boot-time initialization that comes with writing your own init/rc scripts, and they provide tremendous functionality beyond the standard init script.
This time around I want to focus on Supervisor, which is one of the more popular and well-documented process supervisors in use in the Ops space. At a high level, I’m going over how I set up a proof-of-concept Graphite/StatsD server, and the Graphiti front-end that I’m using to visualize the metrics I’m collecting. I like this example because it’s a complete hodge-podge of technologies: Graphite & Carbon are written in Python, Etsy’s StatsD [1] is written in Node, and Graphiti is a Sinatra application. Those technologies are all over the “how do I start this at boot-time?” map, because that’s the world we live in these days.
Here’s how I set up a Graphite stack using Supervisor
Supervisor uses an ini-style config format. This makes it very, very easy to understand, and very, very easy to tease apart, which is one of the big things that made Supervisor appealing to me. Supervisor also understands the idea of defining an include path for config file snippets, so your configuration can be broken up into logical units at the filesystem level — this should be immediately appealing to anyone using a configuration management system (Puppet, Chef, CFEngine, some awful-but-functional home-brewed shell scripts you’ve begged the IT director or CIO to let you retire).
The how-tos and why-fors of how I installed Supervisor are less important than how it’s configured [2], but what’s notable for this example is that I installed it using vendor packages, so it’s not name-spaced into a location like /opt or /usr/local. This is strictly a matter of taste, as long as Supervisor knows where to find its dependencies.
Here’s my global Supervisor config file. It’s pretty standard, with virtually nothing changed from the vendor-provided config file:
```ini
[unix_http_server]
file=/var/run/supervisor.sock   ; (the path to the socket file)
chmod=0700                      ; socket file mode (default 0700)

[supervisord]
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
pidfile=/var/run/supervisord.pid            ; (supervisord pidfile;default supervisord.pid)
childlogdir=/var/log/supervisor             ; ('AUTO' child log dir, default $TEMP)

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket

; The [include] section can just contain the "files" setting. This
; setting can list multiple files (separated by whitespace or
; newlines). It can also contain wildcards. The filenames are
; interpreted as relative to this file. Included files *cannot*
; include files themselves.
[include]
files = /etc/supervisor/conf.d/*.conf
```
The important line there is files = /etc/supervisor/conf.d/*.conf: that line tells Supervisor where to find additional configuration files, and it means that we can cleanly separate our global configuration from our individual service configurations. Trust me when I say THAT’S SOMETHING YOU WANT TO DO.
To get my Graphite stack running, I just had to create a file named something like /etc/supervisor/conf.d/graphite.conf and start or restart supervisord. Here’s the configuration for my entire Graphite stack:
```ini
[program:gunicorn-graphite]
command=/usr/bin/gunicorn_django -u www-data -g www-data -b unix:/var/tmp/graphite.sock --log-file=/opt/graphite/storage/log/webapp/gunicorn.log /opt/graphite/webapp/graphite/settings.py
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT
user=www-data

[program:carbon-cache]
command=python /opt/graphite/bin/carbon-cache.py --debug start
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT

[program:statsd]
command=/usr/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT
user=www-data

[program:unicorn-graphiti]
directory=/opt/graphiti
command=/local/rvm/bin/rvm ruby-1.9.3-p286 do bundle exec unicorn -c config/unicorn.rb -E production
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT
stopasgroup=true
stopwaitsecs=5
killasgroup=true
user=www-data
```
For those watching, I didn’t lump these into a process group or break the config file into smaller logical units. Both would have been good ideas, but for the sake of this proof-of-concept I admit that I was a little lazy about best practices. In actual production you should have a good reason to ignore best practices, and “I didn’t bother” or “I’m lazy” are never good reasons.
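For reference, grouping the four programs is a one-section affair. The sketch below uses real Supervisor [group:x] syntax, but the group name "graphite-stack" is my own invention, and the program names assume the config above:

```ini
; Collect the four programs into a single named group
; (the group name graphite-stack is hypothetical)
[group:graphite-stack]
programs=gunicorn-graphite,carbon-cache,statsd,unicorn-graphiti
```

With a group defined, supervisorctl restart graphite-stack:* bounces the entire stack with one command instead of four.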
How do I control this thing once it’s started?
Meet supervisorctl, the general purpose interface to everything that supervisord manages. You run it as root (or via sudo), and when invoked interactively it looks a little bit like this:
```
ryan@ubuntu1204-knockaround:~# sudo supervisorctl
carbon-cache                     RUNNING    pid 1672, uptime 19 days, 6:53:57
gunicorn-graphite                RUNNING    pid 1674, uptime 19 days, 6:53:57
statsd                           RUNNING    pid 1671, uptime 19 days, 6:53:57
unicorn-graphiti                 RUNNING    pid 1673, uptime 19 days, 6:53:57
supervisor> help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version

supervisor>
```
So, at a glance, that’s a quick dashboard with the status of whatever you’re running, and a prompt that I used to run the help command. help lists all of the available commands when invoked without an argument, and it provides details on any individual command when you pass that command as an argument. One of the things I like about supervisorctl is that it also works non-interactively — you can pass any of those commands to it on invocation, and it will simply run the command.
```
ryan@ubuntu1204-knockaround:~# supervisorctl help fg
fg <process>    Connect to a process in foreground mode
                Press Ctrl+C to exit foreground

ryan@ubuntu1204-knockaround:~# supervisorctl help restart
restart <name>          Restart a process
restart <gname>:*       Restart all processes in a group
restart <name> <name>   Restart multiple processes or groups
restart all             Restart all processes
ryan@ubuntu1204-knockaround:~#
```
Why would I do this instead of writing init scripts?
Well, that’s up to you, but if I were to make the argument, I’d base it on the strength and flexibility of the simple configuration options available for any defined program. The ability to define process groups, start standard executable processes, or manage FastCGI processes, plus the robust tools provided for managing those processes, makes using Supervisor a veritable no-brainer on any stack where I have to deploy home-brewed daemons.
Alternatively, a case could be made that the simple ini-style config format makes it easier to programmatically generate valid configurations for new daemons (lowering the barrier to implementation further still), with Supervisor providing XML-RPC and HTTP interfaces for service management. If you’re into empowering your users, then empowering them to restart services without requiring you to do it for them should have you salivating at the thought of your decreased operational-task load.
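To sketch that point about programmatic generation: because the format is plain ini, Python’s standard library can emit a valid program section with no templating machinery at all. The daemon name, command, and helper function below are invented for illustration, but the section layout matches what Supervisor expects:

```python
import configparser
import io

def program_section(name, command, user="www-data"):
    """Render a [program:<name>] snippet suitable for /etc/supervisor/conf.d/."""
    # interpolation=None so configparser leaves Supervisor's own
    # %(program_name)s expansion syntax untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser["program:" + name] = {
        "command": command,
        "process_name": "%(program_name)s",
        "autostart": "true",
        "autorestart": "true",
        "stopsignal": "QUIT",
        "user": user,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()

print(program_section("my-daemon", "/usr/local/bin/my-daemon --foreground"))
```

Drop the output into a new file under /etc/supervisor/conf.d/ and Supervisor can pick it up — no init-script boilerplate required.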
There are also a number of third-party plugins for Supervisor that can extend event-listener functionality, integrate Supervisor with Nagios, or scale the number of spawned processes to match the number of CPU cores on a server. That’s pretty POWERFUL stuff to mix into your init system right there, and traditional init scripts don’t offer any of that functionality out of the box.
So what’s my subjective opinion?
I’ve been running this configuration for my Graphite stack since earlier this summer and it’s been absolutely rock solid. There are definitely some quirks and caveats to keep in mind when using Supervisor:
- If a daemon takes a while to start or stop and you don’t specify a timeout value, Supervisor will wind up stuck in a loop, trying to spin up the same process repeatedly until it hits the restart threshold, without ever killing the previous attempts first. This causes a process pile-up. The configuration directives are very fine-grained, so you can tune the startup and shutdown of each process you spawn, but there’s no way around the fact that a one-size-fits-all configuration may not work with every program you want to give Supervisor responsibility for.
- Some programs just seem to stubbornly resist being passed off to Supervisor. The one that has given me the most trouble so far has been Resque (I am not alone in this). I still don’t have a perfectly functioning Resque configuration, but I will. And when I do, I’m going to post that thing errywhere.
- Supervisor is a daemon in its own right, and as such it consumes resources of its own. They’re negligible, but it’s still something to keep in mind if you’re trying to absolutely maximize the resource usage of a given server.
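On the timeout caveat above: the per-program directives I mean are startsecs, startretries, and stopwaitsecs, which are real Supervisor options. A sketch, with an invented daemon name and values you’d need to tune per program:

```ini
[program:slow-daemon]
command=/usr/local/bin/slow-daemon
startsecs=30      ; process must stay up this long for the start to count as successful
startretries=3    ; give up (mark FATAL) after this many failed starts
stopwaitsecs=60   ; wait this long after stopsignal before escalating to SIGKILL
```

Tuning those three per program is what keeps a slow starter from triggering the restart-loop pile-up described above.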
Are these deal-breakers? Not for me — I love Supervisor. I wish I’d known about it before I pulled a 48-hour marathon session of refactoring a number of home-brew init scripts at work. I like Supervisor so much that I actually never finished my bake-off with Monit because Supervisor solved my problems the first time around (that is bad science, by the way, and the correct put-down here would be ‘my experimental rigor has been found lacking’). I think that the nicest thing I can say about a tool is this: it solved the problem I had at the time and let me work on solving new problems instead. And that’s exactly what Supervisor has done.
1. StatsD has actually been implemented in a number of languages. Here’s a great post on ServerZone detailing some of the more popular implementations. ↩
2. If you care, I used RPMs from EPEL on the CentOS systems I needed to set this up on, and Canonical-provided DPKGs for the Ubuntu systems. If you’re not using a package manager, I’m not going to explain compiling software or installing Python packages to you, because however you’re managing your installed software is probably wrong. ↩