Daemon-ize your processes on the cheap, part two: Supervisor

The process supervisor so good I didn't bother evaluating other options!

In part one, I went into a little bit of detail about how I’ve come to view process supervisor tools as a really useful layer of middleware: they abstract out virtually all of the logic of forking, nohup-ing, stdout and stderr redirection, and boot-time initialization that comes with writing your own init/rc scripts, and they provide tremendous functionality beyond the standard init script.

This time around I want to focus on Supervisor, which is one of the more popular and well-documented process supervisors in use in the Ops space. At a high level, I’m going over how I set up a proof-of-concept Graphite/StatsD server, and the Graphiti front-end that I’m using to visualize the metrics I’m collecting. I like this example because it’s a complete hodge-podge of technologies: Graphite & Carbon are written in Python, Etsy’s StatsD[1] is written in Node, and Graphiti is a Sinatra application. Those technologies are all over the “how do I start this at boot-time?” map, because that’s the world we live in these days.

Here’s how I set up a Graphite stack using Supervisor

Supervisor uses an ini-style config format. This makes it very, very easy to understand, and very, very easy to tease apart, which is one of the big things that made Supervisor appealing to me. Supervisor also understands the idea of defining an include path for config file snippets, so your configuration can be broken up into logical units at the filesystem level — this should be immediately appealing to anyone using a configuration management system (Puppet, Chef, CFEngine, some awful-but-functional home-brewed shell scripts you’ve begged the IT director or CIO to let you retire).

The hows and whys of my Supervisor install are less important than how it’s configured[2], but what’s notable for this example is that I installed it using vendor packages, so it’s not name-spaced into a location like /opt or /usr/local. This is strictly a matter of taste, as long as Supervisor knows where to find its dependencies.

Here’s my global Supervisor config file. It’s pretty standard, with virtually nothing changed from the vendor-provided config file:

  [unix_http_server]
  file=/var/run/supervisor.sock   ; (the path to the socket file)
  chmod=0700                       ; socket file mode (default 0700)

  [supervisord]
  logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
  pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
  childlogdir=/var/log/supervisor            ; ('AUTO' child log dir, default $TEMP)

  ; the below section must remain in the config file for RPC
  ; (supervisorctl/web interface) to work, additional interfaces may be
  ; added by defining them in separate rpcinterface: sections
  [rpcinterface:supervisor]
  supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

  [supervisorctl]
  serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL  for a unix socket

  ; The [include] section can just contain the "files" setting.  This
  ; setting can list multiple files (separated by whitespace or
  ; newlines).  It can also contain wildcards.  The filenames are
  ; interpreted as relative to this file.  Included files *cannot*
  ; include files themselves.

  [include]
  files = /etc/supervisor/conf.d/*.conf

The important line there is files = /etc/supervisor/conf.d/*.conf: that line tells Supervisor where to find additional configuration files, and it means that we can cleanly separate our global configuration from our individual service configurations. Trust me when I say THAT’S SOMETHING YOU WANT TO DO.

To get my Graphite stack running, I just had to create a file named something like /etc/supervisor/conf.d/graphite.conf and start or restart supervisord. Here’s the configuration for my entire Graphite stack:

  [program:gunicorn-graphite]
  command=/usr/bin/gunicorn_django -u www-data -g www-data -b unix:/var/tmp/graphite.sock --log-file=/opt/graphite/storage/log/webapp/gunicorn.log /opt/graphite/webapp/graphite/settings.py

  [program:carbon-cache]
  command=python /opt/graphite/bin/carbon-cache.py --debug start

  [program:statsd]
  command=/usr/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js

  [program:unicorn-graphiti]
  command=/local/rvm/bin/rvm ruby-1.9.3-p286 do bundle exec unicorn -c config/unicorn.rb -E production

For those watching, I didn’t lump these into a process group or break the config file into smaller logical units. Both would have been good ideas, but for the sake of this proof-of-concept I admit that I was a little lazy with best-practice. In actual production, you should probably have a good reason to ignore best-practice, and “I didn’t bother” or “I’m lazy” are never good reasons.
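For what it’s worth, a process group only takes a few extra lines. Here is a minimal sketch, assuming the four programs are named as they appear in supervisorctl’s status listing (carbon-cache, gunicorn-graphite, statsd, unicorn-graphiti). The group name graphite is a hypothetical choice for illustration, not something from my actual config:

```ini
; appended to /etc/supervisor/conf.d/graphite.conf, after the [program:...] sections
[group:graphite]
programs=carbon-cache,gunicorn-graphite,statsd,unicorn-graphiti
priority=999 ; 999 is the default; shown only to make the knob visible
```

Once a group is defined, its members are addressed as graphite:carbon-cache and so on, and something like supervisorctl restart graphite:* bounces the whole stack in one go.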

How do I control this thing once it’s started?

Meet supervisorctl, the general purpose interface to everything that supervisord manages. You run it as root (or via sudo), and when invoked interactively it looks a little bit like this:

ryan@ubuntu1204-knockaround:~# sudo supervisorctl
carbon-cache                     RUNNING    pid 1672, uptime 19 days, 6:53:57
gunicorn-graphite                RUNNING    pid 1674, uptime 19 days, 6:53:57
statsd                           RUNNING    pid 1671, uptime 19 days, 6:53:57
unicorn-graphiti                 RUNNING    pid 1673, uptime 19 days, 6:53:57
supervisor> help

default commands (type help <topic>):
add    clear  fg        open  quit    remove  restart   start   stop  update 
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version


So, at a glance, that’s a quick dashboard with the status of whatever you’re running, and a prompt that I used to run the help command. Invoked without an argument, help lists all of the other available commands; give it a command name as an argument and it prints the details for that command. One of the things I like about supervisorctl is that it also works non-interactively: you can pass any of those commands to it on invocation, and it will simply run the command.

ryan@ubuntu1204-knockaround:~# supervisorctl help fg
fg <process>  Connect to a process in foreground mode
Press Ctrl+C to exit foreground
ryan@ubuntu1204-knockaround:~# supervisorctl help restart
restart <name>        Restart a process
restart <gname>:* Restart all processes in a group
restart <name> <name>   Restart multiple processes or groups
restart all     Restart all processes

Why would I do this instead of writing init scripts?

Well, that’s up to you, but if I were to make the argument, I’d base it on the strength and flexibility of the simple configuration options available for any defined program. The ability to define process groups, start standard executable processes, or manage FastCGI processes, plus the robust tools provided for managing those processes, makes using Supervisor a veritable no-brainer on any stack that I have to deploy home-brew daemons on.

Alternatively, a case could be made that the simple ini-style config format makes it easier to programmatically generate valid configurations for new daemons (lowering the barrier to implementation further still), with Supervisor providing XML-RPC and HTTP interfaces for service management. If you’re into empowering your users, then empowering them to restart services without requiring you to do it for them should have you salivating at the thought of your decreased operational-task load.
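The global config shown earlier only exposes a UNIX socket, so the HTTP side has to be switched on separately. This is a minimal sketch, assuming you only want it reachable from localhost. The port, username, and password are placeholders invented for illustration, not values from my setup:

```ini
[inet_http_server]
port=127.0.0.1:9001   ; listen on localhost only; 9001 is just a commonly used port
username=ops          ; hypothetical account name
password=change-me    ; hypothetical; use a real secret
```

The same listener serves both the built-in web interface and the XML-RPC API (at the /RPC2 path), so treat access to it as access to every process Supervisor manages.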

There are also a number of 3rd party plugins for Supervisor that can extend listener functionality, integrate Supervisor with Nagios, or scale the number of processes spawned to match the number of CPU cores on any server. That’s pretty POWERFUL stuff to mix into your init system right there, and traditional init scripts don’t offer any of that functionality out of the box.

So what’s my subjective opinion?

I’ve been running this configuration for my Graphite stack since earlier this summer and it’s been absolutely rock solid. There are definitely some quirks and caveats to keep in mind when using Supervisor:

  • If a daemon takes a while to start or stop and you don’t specify a timeout value, then Supervisor will wind up stuck in a bounded loop wherein it tries to spin up the same process repeatedly until it hits the restart threshold, always without killing previous attempts first. This causes a process pile-up. The configuration directives are very fine-grained, so you can tune the initialization and setup of each process you’re spawning, but one-size-fits-all configurations simply won’t work with every program you want to give Supervisor responsibility for.

  • Some programs just seem to stubbornly resist being passed off to Supervisor. The one that has given me the most trouble so far has been Resque (I am not alone in this). I still don’t have a perfectly functioning Resque configuration, but I will. And when I do, I’m going to post that thing errywhere.

  • Supervisor is a daemon in its own right, and as such it will consume resources of its own. They’re negligible but it’s still something you should keep in mind if you’re trying to absolutely maximize the resource usage of any given server.
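For the slow-starter problem in the first caveat, these are the per-program directives most worth tuning. This is a sketch only: the program name and command are hypothetical, and the numbers are guesses you would adjust per daemon:

```ini
[program:slow-daemon]
command=/path/to/slow-daemon   ; hypothetical executable
startsecs=30       ; process must stay up this long to count as started (default 1)
startretries=3     ; failed start attempts before Supervisor gives up (default 3)
stopwaitsecs=60    ; grace period after the stop signal before SIGKILL (default 10)
stopsignal=TERM    ; signal used to request a stop
```

The idea is to give the daemon enough time to come up and shut down cleanly, so Supervisor doesn’t decide an attempt has failed while the process is still initializing.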

Are these deal-breakers? Not for me. I love Supervisor. I wish I’d known about it before I pulled a 48-hour marathon session of refactoring a number of home-brew init scripts at work. I like Supervisor so much that I actually never finished my bake-off with Monit because Supervisor solved my problems the first time around (that is bad science, by the way, and the correct put-down here would be ‘my experimental rigor has been found lacking’). I think that the nicest thing I can say about a tool is this: it solved the problem I had at the time and let me work on solving new problems instead. And that’s exactly what Supervisor has done.

  1. StatsD has actually been implemented in a number of languages. Here’s a great post on ServerZone detailing some of the more popular implementations. 
  2. If you care, I used RPMs from EPEL on the CentOS systems I needed to set this up on, and Canonical-provided DPKGs for the Ubuntu systems. If you’re not using a package manager, I’m not going to explain compiling software or installing Python packages to you, because however you’re managing your installed software is probably wrong.

Daemon-ize your processes on the cheap!

Or, Death to /etc/rc.local!

TL;DR, or The Executive Summary: Init scripts are hard. Here’s a bunch of UNIX background backing up an argument for using stand-alone Process Supervisors whenever you need a new instance of a custom daemon spun up.

The setup

Picture, if you will, a pile of code.

Yeah, that’s good. The software equivalent of THAT mess.

If you write software, good odds that you’ve written at least one of these steaming sadness piles. If you work in operations, there’s better odds that you’ve been handed at least one failure-pile (this week). You or someone a lot like you needed a message consumer, some new hotness message bus, a stand-alone process that just listens for specific connections and commands on some random port or socket, or just some long-running process that runs non-interactively in the background (this being the very definition of daemon, by the way) and now this thing needs to run in production. In the wild. In the world at large, with other, nicer or bigger daemons grinding up page-to-page with it.

The communication or documentation attached to this pig-pile of meadow-muffin-code almost always looks like this:

“Oh, just execute this and Bob’s Your Uncle, we’re all set! Just stick it in /etc/rc.local!”

    LD_LIBRARY_PATH=/some/crazy/nonstandard/path \
    /path/to/some/executable \
      OPTION OPTION OPTION OPTION OPTION >> /dev/null 2>&1 &

Say whaaaaaaaaaat?

Yeah… we’re not doing that. Why? For so, so many good reasons — redefining global variable scope, no PID tracking, no management, it gives me hives, because I said so, etc. ad infinitum.
