Daemon-ize your processes on the cheap!

Or, Death to /etc/rc.local!

TL;DR, or The Executive Summary: Init scripts are hard. Here’s a bunch of UNIX background backing up an argument for using stand-alone Process Supervisors whenever you need a new instance of a custom daemon spun up.

The setup

Picture, if you will, a pile of code.

Yeah, that’s good. The software equivalent of THAT mess.

If you write software, odds are good that you’ve written at least one of these steaming sadness piles. If you work in operations, the odds are even better that you’ve been handed at least one failure-pile (this week). You or someone a lot like you needed a message consumer, some new hotness message bus, a stand-alone process that just listens for specific connections and commands on some random port or socket, or just some long-running process that runs non-interactively in the background (this being the very definition of daemon, by the way), and now this thing needs to run in production. In the wild. In the world at large, with other, nicer or bigger daemons grinding up page-to-page with it.

The communication or documentation attached to this pig-pile of meadow-muffin-code almost always looks like this:

“Oh, just execute this and Bob’s Your Uncle, we’re all set! Just stick it in /etc/rc.local!”

    LD_LIBRARY_PATH=/some/crazy/nonstandard/path \
    /path/to/some/executable \
      OPTION OPTION OPTION OPTION OPTION >> /dev/null 2>&1 &

Say whaaaaaaaaaat?

Yeah… we’re not doing that. Why? For so, so many good reasons — redefining global variable scope, no PID tracking, no management, it gives me hives, because I said so, etc. ad infinitum.

Ignore the insane number of positional arguments[1] that were handed down in the instructions for running our newly minted buffalo chip and consider instead this nugget of hard truth: you can pretty much plan on every process it starts being killed on reboot by having QUIT, TERM, or KILL sent to it unceremoniously. That may or may not be a deal breaker right there, because this application may well not have any exit handlers defined, and QUIT might well leave it in an inconsistent state since it means “quit and dump core”. The one-liner also doesn’t provide any mechanism for tracking the PID beyond judicious grep-ing, and it doesn’t provide a simple status lookup mechanism.
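For contrast, here’s roughly the least you’d have to bolt on to get PID tracking and a status check out of that one-liner. Treat it as a minimal sketch, not a recommendation: the function names start_tracked and is_running are made up for illustration, and it ignores stale PID files, PID reuse, and locking entirely.

```sh
#!/bin/sh
# Hypothetical helpers; the names and any paths passed in are illustrative only.

# start_tracked PIDFILE COMMAND [ARGS...]
# Launch COMMAND in the background, discard its output, and record its PID.
start_tracked() {
    pidfile=$1
    shift
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# is_running PIDFILE
# Succeed only if PIDFILE exists and names a process we are able to signal.
# Signal 0 checks for the process without actually sending it anything.
is_running() {
    [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}
```

Something like `start_tracked /tmp/example.pid /path/to/some/executable OPTION` followed later by `is_running /tmp/example.pid && echo up` is the idea. Note that this still does nothing about restarts or clean shutdown.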

What about rc.local?

I think that use of rc.local these days is also sort of sloppy; it’s basically an admission that you can’t be bothered to implement any sort of process management framework for daemons that you consider important enough to start every time the system is booted.

Well, what are my options?

At this point, you can either write an init script, daemonize your code[2], or set up a Process Supervisor. Daemonizing is interesting if you’re the developer who wrote the Cleveland Steamer, but it’s out of scope if you’re just the poor bastard running Ops. I’ve beaten the init script drum for a long time but after years of fixing substandard, poorly written init scripts I’ve realized that writing a good init script is hard:

  • You can only do so much (sanely and safely) in Shell.

    Shell functions suck. True story.
    Bourne Shell and bash functions can only return an exit status (an integer
    from 0 to 255, so basically 0 or not 0). If you want to get some sort of data
    from them you’re echo-ing to STDOUT and capturing the output.

    That’s weak sauce. tcsh doesn’t do functions and zsh probably isn’t installed
    on your machine unless you specifically put it there. So look forward to limited
    string and integer handling abilities and a lot of roll-your-own string construction.
    Using ksh? You’re adorable.

    Fun fact: writing an init script in something other than a shell language is
    frowned upon — once in a while you come across an init system that forces its
    dependent scripts to run through /bin/sh. Not often, but sometimes. So remember,
    just because you can do something doesn’t always mean you should.

  • Shoving a process to the background (sanely and safely) is tricky.

    If an application doesn’t properly detach and background itself then you’re relying
    on the good old ampersand and reliable old nohup to get your cow pie off the console
    and keep it running when the TTY hangs up.

  • Dropping privileges (sanely, safely, and correctly) is also tricky.

    I’ve seen a lot of easy-to-diagnose-in-hindsight mistakes made with
    sudo and su in home-brew init scripts… dropping privileges incorrectly is
    probably the most common error I see after output redirection errors (which are RAMPANT).

    If you’re looking for more detail, here’s a great
    Stack Overflow conversation about that.

  • Multiple instances are not tricky — they’re just plain hard or stupid.

    If you need to run the same code base in four separate instances with a different
    argument passed to each instance, you’re basically rolling your own loop around
    reading the contents of a .d directory someplace in /etc and writing configuration
    snippets (my preferred method), or duplicating the same init script four times
    and changing that value by hand in each instance. I am not really a fan of this
    method — every time we have to roll our own anything or edit something in place we
    introduce room for a whole host of new bugs that QE probably won’t catch.
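To make a couple of those bullets concrete, here’s a rough sketch of the “loop over a .d directory” approach, which also shows the echo-and-capture dance you need to get data out of a shell function. Everything named here is an assumption for illustration: the /etc/mydaemon.d directory, the .conf suffix, the ARGS variable inside each snippet, and the function names.

```sh
#!/bin/sh
# Hypothetical layout: one snippet per instance in $CONF_DIR, for example
#   /etc/mydaemon.d/alpha.conf   containing   ARGS="--queue alpha"
CONF_DIR="${CONF_DIR:-/etc/mydaemon.d}"

# Functions can only return an exit status, so data comes back on STDOUT
# and the caller captures it with $(...).
instance_name() {
    basename "$1" .conf
}

# Print one "name<TAB>args" line per snippet found in $CONF_DIR.
list_instances() {
    for conf in "$CONF_DIR"/*.conf; do
        [ -f "$conf" ] || continue    # the glob matched nothing
        (
            # Source the snippet in a subshell so its variables don't leak
            # from one instance into the next.
            ARGS=""
            . "$conf"
            printf '%s\t%s\n' "$(instance_name "$conf")" "$ARGS"
        )
    done
}
```

A real init script would feed each line of `list_instances` into its start, stop, and status logic, and that is exactly the hand-rolled plumbing where the bugs creep in. Sourcing the snippets also means trusting whatever is in them.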

Those are all bummers

I know! That’s what brought me around to the Process Supervisor school of thought! A Process Supervisor is a very robust (and usually very small) daemon that gets started as part of the standard init process but then manages its own defined list of dependent applications from that point on. It’s a bit like xinetd or daemontools, except that Process Supervisors are usually a little more robust and flexible than xinetd (because they don’t sit around waiting for incoming connections to initialize and spin up dependents) and they’re usually not littering my man 7 hier with superfluous bits and bobs (…they’re probably also still supported). A Process Supervisor like God, Supervisor, or Monit will also handle local process monitoring for you. That means you get PID tracking, frozen process management, process reaping, process spawning, and service restarting for free. And it’ll do it all without you having to write brittle init scripts and cronjobs to check process status every 5 minutes. We hates fragile, brittle cronjobs so much.
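If the idea of a supervisor feels abstract, the core loop is small enough to sketch in shell. To be clear, this is a toy under stated assumptions and not how God, Supervisor, or Monit are actually implemented: the supervise function, its arguments, and the restart cap are all invented here, and real supervisors add backoff, logging, signal forwarding, and a control interface on top.

```sh
#!/bin/sh
# supervise MAX_RESTARTS PIDFILE COMMAND [ARGS...]
# Hypothetical toy supervisor: runs COMMAND as a child, records its PID,
# and restarts it after a non-zero exit, up to MAX_RESTARTS times.
# Prints the total number of times COMMAND was started.
supervise() {
    max_restarts=$1
    pidfile=$2
    shift 2
    starts=0
    while [ "$starts" -le "$max_restarts" ]; do
        "$@" &
        child=$!
        echo "$child" > "$pidfile"    # PID tracking
        starts=$((starts + 1))
        # wait reaps the child and hands back its exit status;
        # a clean exit (status 0) means there is nothing to restart.
        if wait "$child"; then
            break
        fi
    done
    rm -f "$pidfile"
    echo "$starts"
}
```

Because the supervisor is the parent of the thing it runs, it learns about a crash straight from `wait` instead of polling from a cronjob.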

I’ve also begun to see tremendous maintenance and support value in divorcing the init system from the sun-baked mud-bricks of PID 1 entirely: Process Supervisor dependent configurations will migrate almost seamlessly if you’re siloing your application stack (interpreters, libraries, application code). This means that they can be packaged and repackaged easily… and that offers a very reasonable migration path between operating system variants, and that means that they can insulate you from some of the more political decisions made at the OS level (Upstart vs. SystemD vs. SysV[3] being the one we’re most concerned with here). What a beautiful, logical chain of run-on sentences!

Now that we’ve powered through some background, pain points, and philosophy, let’s take a break. I’m working on part two, where I take a high-level look at what I consider to be the two most service-agnostic, highest-quality options available: Supervisor and Monit. That should hopefully have some graphs. Nerds love graphs!

  1. There’s a ton of thought behind argument parsing and using named parameters vs. positional parameters (XKCD forum, Peter Szilagyi, Greg Wooledge, SHELLdorado). You should maybe read some of it if you’re writing a shell script that people (read: you and anyone who isn’t you) will have to use. There are some excellent libraries available for almost any language, but if you’re passing more than a handful of arguments (positional or otherwise) to a shell script, please consider rewriting your tool in a more robust language.
  2. Daemons, Daemon-kit, and Dante are sort of the go-to examples in Ruby land. Python has python-daemon, YapDi, and a well-written overview of standard daemon behaviors. Full disclosure: I’ve contributed a few patches to Dante. If you really, really care about this sort of stuff you should read Jesse Storimer’s Working With Unix Processes.
  3. Pop quiz: what init system does your production stack use? SysV Init (Pretty much everything before 2006, current Debian stable builds)? Upstart (Ubuntu since like 2006, RHEL/CentOS 6)? SystemD (Fedora, future Debian stable builds, and probably RHEL/CentOS 7 whenever they come out)? InitNG or RunIt (Maybe if you’re using Gentoo, Arch, or Linux from Scratch)? Accepted best practice is to target plain old vanilla SysV if you need portability, since Upstart and SystemD both provide compatibility wrappers (albeit to different degrees). 
