
Working. Busy. Read these instead.

So many posts in the “draft” status, and so little time to make any of them presentable. But a ton of interesting things have come down the pipe over the last few weeks, and I wanted to hype a few that caught my attention.

  • Ruby 1.8.7 EOL in 90 days
    The only reason to have any Ruby 1.8 installations floating around is that it’s what
    Puppet Labs distributes if you use the packages they provide.
    Embrace change.
  • Marc Gauthier | Please Keep a Changelog For Your Open Source Lib
    This is probably tied with “no examples or sample code” for
    “biggest open source pet peeve” and Marc makes an excellent argument for why
    changelogs rule.
  • Chronos
    airbnb needed a distributed task scheduling solution, so they wrote one.
    At ${DAYJOB} I was once involved in a project to find a cron replacement
    which was distributed and provided a web front-end. The project failed horribly
    and we instead built nothing. This would have been fast-tracked as
    “must build proof of concept” if it had existed at the time, no questions asked.
  • Rob Bell | A Beginner’s Guide to Big O Notation
    For reasons I’ve never really been able to fathom, Google and almost anyone who
    has ever worked for Google insists on asking admins about algorithms because they
    deeply, truly believe that an admin should have a computer science background.
    Never mind that this industry has spent 30 years abstracting things to the point
    where even CS majors don’t really have CS backgrounds anymore. I don’t do so hot
    on those tests, but in my eternal quest to hold my own whenever possible I found
    Rob’s introduction to be tremendously useful. Now, if anyone ever asks how long
    a bubble sort takes (… again), you and I can tell them O(N^2) and
    we won’t mix it up with O(2^N). Let’s be armchair computer
    scientists together!

And finally, because this project is near and dear to me:

  • Lack AV Rack
    I’ve built a couple of small Lack Racks using shelving brackets, but nothing
    that looks as clean and professional as these. If you’re an audio-gear slut who is
    handy and thrifty, you owe it to yourself to go through sparced’s
    build-out documentation.

See you all after Monitorama!

Daemon-ize your processes on the cheap, part two: Supervisor

The process supervisor so good I didn't bother evaluating other options!

In part one, I went into a little bit of detail about how I’ve come to view process supervisor tools as a really useful layer of middleware: they abstract out virtually all of the logic of forking, nohuping, stdout & stderr redirection, and boot-time initialization that comes with writing your own init/rc scripts, and they provide tremendous functionality beyond the standard init script.

This time around I want to focus on Supervisor, which is one of the more popular and well documented process supervisors in use in the Ops space. At a high level, I’m going over how I set up a proof-of-concept Graphite/StatsD server, and the Graphiti front-end that I’m using to visualize the metrics I’m collecting. I like this example because it’s a complete hodge-podge of technologies: Graphite & Carbon are written in Python, Etsy’s StatsD[1] is written in Node, and Graphiti is a Sinatra application. Those technologies are all over the “how do I start this at boot-time?” map, because that’s the world we live in these days.

Here’s how I set up a Graphite stack using Supervisor

Supervisor uses an ini-style config format. This makes it very, very easy to understand, and very, very easy to tease apart, which is one of the big things that made Supervisor appealing to me. Supervisor also understands the idea of defining an include path for config file snippets, so your configuration can be broken up into logical units at the filesystem level — this should be immediately appealing to anyone using a configuration management system (Puppet, Chef, CFEngine, some awful-but-functional home-brewed shell scripts you’ve begged the IT director or CIO to let you retire).

The hows and whys of how I installed Supervisor are less important than how it’s configured[2], but what’s notable for this example is that I installed it using vendor packages, so it’s not name-spaced into a location like /opt or /usr/local. This is strictly a matter of taste, as long as Supervisor knows where to find its dependencies.

Here’s my global Supervisor config file. It’s pretty standard, with virtually nothing changed from the vendor-provided config file:

[unix_http_server]
  file=/var/run/supervisor.sock   ; (the path to the socket file)
  chmod=0700                       ; socket file mode (default 0700)

[supervisord]
  logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
  pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
  childlogdir=/var/log/supervisor            ; ('AUTO' child log dir, default $TEMP)

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
  supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
  serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL  for a unix socket

; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.

[include]
  files = /etc/supervisor/conf.d/*.conf

The important line there is files = /etc/supervisor/conf.d/*.conf: that line tells Supervisor where to find additional configuration files, and it means that we can cleanly separate our global configuration from our individual service configurations. Trust me when I say THAT’S SOMETHING YOU WANT TO DO.
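
As the comment in that config notes, files takes a whitespace-separated list and understands wildcards, so you aren’t limited to a single directory. A minimal sketch, assuming a hypothetical second location where each application ships its own snippet (the /opt/apps path is made up for illustration):

```ini
[include]
; the first glob is the one from the config above; the second is a hypothetical extra location
files = /etc/supervisor/conf.d/*.conf /opt/apps/*/supervisor.conf
```

After dropping a new snippet into place, supervisorctl reread followed by supervisorctl update should pick it up without bouncing supervisord itself.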

To get my Graphite stack running, I just had to create a file named something like /etc/supervisor/conf.d/graphite.conf and start or restart supervisord. Here’s the configuration for my entire Graphite stack:

[program:gunicorn-graphite]
  command=/usr/bin/gunicorn_django -u www-data -g www-data -b unix:/var/tmp/graphite.sock --log-file=/opt/graphite/storage/log/webapp/gunicorn.log /opt/graphite/webapp/graphite/settings.py
  process_name=%(program_name)s
  autostart=true
  autorestart=true
  stopsignal=QUIT
  user=www-data

[program:carbon-cache]
  command=python /opt/graphite/bin/carbon-cache.py --debug start
  process_name=%(program_name)s
  autostart=true
  autorestart=true
  stopsignal=QUIT

[program:statsd]
  command=/usr/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js
  process_name=%(program_name)s
  autostart=true
  autorestart=true
  stopsignal=QUIT
  user=www-data

[program:unicorn-graphiti]
  directory=/opt/graphiti
  command=/local/rvm/bin/rvm ruby-1.9.3-p286 do bundle exec unicorn -c config/unicorn.rb -E production
  process_name=%(program_name)s
  autostart=true
  autorestart=true
  stopsignal=QUIT
  stopasgroup=true
  stopwaitsecs=5
  killasgroup=true
  user=www-data

For those watching, I didn’t lump these into a process group or break the config file into smaller logical units. Both would have been good ideas, but for the sake of this proof-of-concept I admit that I was a little lazy with best-practice. In actual production, you should probably have a good reason to ignore best-practice, and “I didn’t bother” or “I’m lazy” are never good reasons.
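
For the curious, here is roughly what the process group I skipped could look like. Treat it as a sketch rather than part of the running config: the group name graphite is an assumption, and the program names are the ones defined above.

```ini
; hypothetical group name; the members are the [program:x] sections defined above
[group:graphite]
programs=gunicorn-graphite,carbon-cache,statsd,unicorn-graphiti
```

Once programs belong to a group, supervisorctl addresses them as group:name (graphite:statsd, for example), and graphite:* acts on the whole stack at once.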

How do I control this thing once it’s started?

Meet supervisorctl, the general purpose interface to everything that supervisord manages. You run it as root (or via sudo), and when invoked interactively it looks a little bit like this:

ryan@ubuntu1204-knockaround:~# sudo supervisorctl
carbon-cache                     RUNNING    pid 1672, uptime 19 days, 6:53:57
gunicorn-graphite                RUNNING    pid 1674, uptime 19 days, 6:53:57
statsd                           RUNNING    pid 1671, uptime 19 days, 6:53:57
unicorn-graphiti                 RUNNING    pid 1673, uptime 19 days, 6:53:57
supervisor> help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update 
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version

supervisor> 

So, at a glance, that’s a quick dashboard with the status of whatever you’re running, and a prompt that I used to run the help command. Invoked without an argument, help lists all of the other commands available; give it a command name as an argument and it prints the details for that command. One of the things I like about supervisorctl is that it also works non-interactively: you can pass any of those commands to it on invocation, and it will simply run the command.

ryan@ubuntu1204-knockaround:~# supervisorctl help fg
fg <process>  Connect to a process in foreground mode
Press Ctrl+C to exit foreground
ryan@ubuntu1204-knockaround:~# supervisorctl help restart
restart <name>          Restart a process
restart <gname>:*       Restart all processes in a group
restart <name> <name>   Restart multiple processes or groups
restart all             Restart all processes
ryan@ubuntu1204-knockaround:~# 

Why would I do this instead of writing init scripts?

Well, that’s up to you, but if I were to make the argument, I’d base it on the strength and flexibility of the simple configuration options available for any defined program. The ability to define process groups, start standard executable processes, or manage FastCGI processes, and the robust tools provided for managing those processes, make using Supervisor a veritable no-brainer on any stack that I have to deploy home-brew daemons on.

Alternatively, a case could be made that the simple ini-style config format makes it easier to programmatically generate valid configurations for new daemons (lowering the barrier to implementation further still), with Supervisor providing XML-RPC and HTTP interfaces for service management. If you’re into empowering your users, then empowering them to restart services without requiring you to do it for them should have you salivating at the thought of your decreased operational-task load.
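
The HTTP interface only exists if you ask for it. A minimal sketch of switching it on, assuming you only want it reachable from localhost; the port is the one Supervisor’s own sample config uses, and the username and password are placeholders:

```ini
[inet_http_server]
port=127.0.0.1:9001   ; listen on localhost only
username=ops          ; placeholder
password=change-me    ; placeholder, pick a real one
```

With that in place the web UI should be at http://127.0.0.1:9001/, and supervisorctl -s http://127.0.0.1:9001 -u ops -p change-me talks to the same endpoint over XML-RPC.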

There are also a number of 3rd party plugins for Supervisor that can extend listener functionality, integrate Supervisor with Nagios, or scale the number of processes spawned to match the number of CPU cores on any server. That’s pretty POWERFUL stuff to mix into your init system right there, and traditional init scripts don’t offer any of that functionality out of the box.

So what’s my subjective opinion?

I’ve been running this configuration for my Graphite stack since earlier this summer and it’s been absolutely rock solid. That said, there are definitely some quirks and caveats to keep in mind when using Supervisor:

  • If a daemon takes a while to start or stop and you don’t specify a timeout value, then Supervisor can wind up stuck in a loop wherein it tries to spin up the same process repeatedly until it hits the restart threshold, always without killing previous attempts first. This causes a process pile-up. The configuration directives are very fine-grained, so you can tune the startup and shutdown of each process you’re spawning, but there’s no way around the fact that a one-size-fits-all configuration won’t work with every program you want to give Supervisor responsibility for.
  • Some programs just seem to stubbornly resist being passed off to Supervisor. The one that has given me the most trouble so far has been Resque (I am not alone in this). I still don’t have a perfectly functioning Resque configuration, but I will. And when I do, I’m going to post that thing errywhere.

  • Supervisor is a daemon in its own right, and as such it will consume resources of its own. They’re negligible but it’s still something you should keep in mind if you’re trying to absolutely maximize the resource usage of any given server.
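
The start and stop timeouts from the first caveat are per-program settings. A sketch for a slow daemon, assuming a made-up program name and command; the numbers are illustrative, not recommendations:

```ini
[program:slow-daemon]
command=/usr/local/bin/slow-daemon --foreground   ; hypothetical command
startsecs=30       ; must stay up this long to count as started (default 1)
startretries=3     ; failed start attempts allowed before giving up (default 3)
stopwaitsecs=60    ; wait this long after stopsignal before sending SIGKILL (default 10)
stopasgroup=true   ; send the stop signal to the whole process group
killasgroup=true   ; same for the SIGKILL, so children are not left behind
```

stopasgroup and killasgroup are the same directives used in the unicorn-graphiti section above, and they depend on your Supervisor version supporting them.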

Are these deal-breakers? Not for me. I love Supervisor. I wish I’d known about it before I pulled a 48-hour marathon session of refactoring a number of home-brew init scripts at work. I like Supervisor so much that I actually never finished my bake-off with Monit because Supervisor solved my problems the first time around (that is bad science, by the way, and the correct put-down here would be ‘my experimental rigor has been found lacking’). I think that the nicest thing I can say about a tool is this: it solved the problem I had at the time and let me work on solving new problems instead. And that’s exactly what Supervisor has done.


  1. StatsD has actually been implemented in a number of languages. Here’s a great post on ServerZone detailing some of the more popular implementations. 
  2. If you care, I used RPMs from EPEL on the CentOS systems I needed to set this up on, and Canonical-provided DPKGs for the Ubuntu systems. If you’re not using a package manager, I’m not going to explain compiling software or installing Python packages to you, because however you’re managing your installed software is probably wrong.