TL;DR, or The Executive Summary: Init scripts are hard. Here’s a bunch of UNIX background backing up an argument for using stand-alone Process Supervisors whenever you need a new instance of a custom daemon spun up.
Picture, if you will, a pile of code.
Yeah, that’s good. The software equivalent of THAT mess.
If you write software, good odds that you’ve written at least one of these steaming sadness piles. If you work in operations, there’s better odds that you’ve been handed at least one failure-pile (this week). You or someone a lot like you needed a message consumer, some new hotness message bus, a stand-alone process that just listens for specific connections and commands on some random port or socket, or just some long-running process that runs non-interactively in the background (this being the very definition of daemon, by the way) and now this thing needs to run in production. In the wild. In the world at large, with other, nicer or bigger daemons grinding up page-to-page with it.
Invariably, the communication or documentation attached to this pig-pile of meadow-muffin-code looks like this:
“Oh, just execute this and Bob’s Your Uncle, we’re all set! Just stick it in /etc/rc.local!”
LD_LIBRARY_PATH=/some/crazy/nonstandard/path \
/path/to/some/executable \
OPTION OPTION OPTION OPTION OPTION >> /dev/null 2>&1 &
Yeah… we’re not doing that. Why? For so, so many good reasons — redefining global variable scope, no PID tracking, no management, it gives me hives, because I said so, etc. ad infinitum.
Ignore the insane number of positional arguments[1] that were handed down in the instructions for running our newly minted buffalo chip and consider instead this nugget of hard truth: you can pretty much plan on every process it starts being killed on reboot by having KILL sent to it unceremoniously. That may or may not be a deal breaker right there, because this application may well not have any exit handlers defined, and QUIT might well leave it in an inconsistent state since it means “quit and dump core”. The one-liner also doesn’t provide any mechanism for tracking the PID beyond judicious grep-ing, and it doesn’t provide a simple status lookup mechanism.
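For contrast, here is a minimal sketch of the bookkeeping that one-liner skips: remembering the PID at launch and answering “is it running?” without grep. The helper names (start_tracked, is_running) and the pid-file convention are made up for illustration. A real init script would also have to worry about stale or reused PIDs, locking, and privileges.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any standard library).

start_tracked() {
    # usage: start_tracked PIDFILE COMMAND [ARGS...]
    # Launch COMMAND in the background and record its PID in PIDFILE.
    pidfile=$1
    shift
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

is_running() {
    # usage: is_running PIDFILE
    # Succeeds (status 0) only if PIDFILE exists and the PID it holds
    # belongs to a live process that we are allowed to signal.
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

Something like `start_tracked /tmp/consumer.pid /path/to/some/executable OPTION` followed later by `is_running /tmp/consumer.pid && echo running` is the whole idea, and it is already more than the rc.local approach gives you.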
What about rc.local?
I think that use of rc.local these days is also sort of sloppy; it’s basically an admission that you can’t be bothered to implement any sort of process management framework for daemons that you consider important enough to start every time the system is booted.
Well, what are my options?
At this point, you can either write an init script, daemonize your code[2], or set up a Process Supervisor. Daemonizing is interesting if you’re the developer who wrote the Cleveland Steamer, but it’s out of scope if you’re just the poor bastard running Ops. I’ve beaten the init script drum for a long time, but after years of fixing substandard, poorly written init scripts I’ve realized that writing a good init script is hard:
- You can only do so much (sanely and safely) in Shell.
Shell functions suck. True story. Bourne Shell and bash functions can only return a process status (so basically an integer from 0 to 255). If you want to get some sort of data from them you’re writing to STDOUT and capturing the output. That’s weak sauce. tcsh doesn’t do functions and zsh probably isn’t installed on your machine unless you specifically put it there. So look forward to limited string and integer handling abilities and a lot of roll-your-own string construction. ksh? You’re adorable.
Fun fact: writing an init script in something other than a shell language is frowned upon, and once in a while you come across an init system that forces its dependent scripts to run through /bin/sh. Not often, but sometimes. So remember, just because you can do something doesn’t always mean you should.
- Shoving a process to the background (sanely and safely) is tricky.
If an application doesn’t properly detach and background itself then you’re relying on the good old ampersand and nohup to get your cow pie off the console and running when the TTY hangs up.
- Dropping privileges (sanely, safely, and correctly) is also tricky.
I’ve seen a lot of easy-to-diagnose-in-hindsight mistakes made with init scripts… dropping privileges incorrectly is probably the most common error I see after output redirection errors (which are RAMPANT). If you’re looking for more detail, here’s a great Stack Overflow conversation about that.
- Multiple instances are not tricky; they’re just plain hard or stupid.
If you need to run the same code base in four separate instances with a different argument passed to each instance, you’re basically either rolling your own loop around reading the contents of a .d directory someplace in /etc and writing configuration snippets (my preferred method), or duplicating the same init script four times and changing that value by hand in each instance. I am not really a fan of either method: every time we have to roll our own anything or edit something in place we introduce room for a whole host of new bugs that QE probably won’t catch.
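The configuration-snippet approach from that last point could look roughly like the sketch below. Everything named in it is hypothetical: the /etc/mydaemon.d directory, the .conf suffix, and the convention that each snippet holds a single line of arguments. It also happens to show the STDOUT-capture workaround from the first point, since the function can only hand data back by printing it.

```shell
#!/bin/sh
# Sketch: one daemon instance per snippet in a .d directory.
# CONF_DIR, the *.conf suffix, and the one-line-of-arguments format are
# assumptions made for this example, not any kind of standard.

CONF_DIR=${CONF_DIR:-/etc/mydaemon.d}

list_instances() {
    # Print "NAME ARGS" for each snippet found in $CONF_DIR.
    # The caller has to capture STDOUT, because a shell function can
    # only *return* an exit status.
    for conf in "$CONF_DIR"/*.conf; do
        [ -f "$conf" ] || continue      # the glob matched nothing
        name=$(basename "$conf" .conf)
        # Read the first line; skip the snippet if it is empty.
        IFS= read -r args < "$conf" || [ -n "$args" ] || continue
        printf '%s %s\n' "$name" "$args"
    done
}
```

A caller would then do something like `list_instances | while read -r name args; do ...; done` and start one process per line. Note how much of this is plumbing that has nothing to do with the daemon itself, and that every line of it is a place for a bug to hide.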
Those are all bummers
I know! That’s what brought me around to the Process Supervisor school of thought! A Process Supervisor is a very robust (and usually very small) daemon that gets started as part of the standard init process but then manages its own defined list of dependent applications from that point on. It’s a bit like daemontools, except that supervisors are usually a little more robust and flexible than xinetd (because they don’t sit around waiting for incoming connections to initialize and spin up dependents) and they’re usually not littering my filesystem hierarchy (man 7 hier) with superfluous bits and bobs (…they’re probably also still supported). A Process Supervisor like God, Supervisor, or Monit will also handle local process monitoring for you. That means you get PID tracking, frozen process management, process reaping, process spawning, and service restarting for free. And it’ll do it all without having to write brittle init scripts and cron jobs to check process status every 5 minutes. We hates fragile, brittle cron jobs so much.
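To make “process spawning, process reaping, and service restarting” a little more concrete, here is a toy restart loop. It is emphatically not how God, Supervisor, or Monit are implemented, and it is nothing you should run in production; the supervise function, its arguments, and the cap on restarts are invented for illustration. Real supervisors layer back-off, logging, signal handling, and status reporting on top of this basic idea.

```shell
#!/bin/sh
# Toy illustration of the core supervisor loop: spawn a child, wait for
# it to exit, then start it again. All names here are hypothetical.

supervise() {
    # usage: supervise MAX_STARTS COMMAND [ARGS...]
    # Run COMMAND up to MAX_STARTS times, restarting it whenever it exits.
    max_starts=$1
    shift
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        starts=$((starts + 1))
        "$@" &                          # spawn the child
        child=$!                        # the PID is known, no grep needed
        status=0
        wait "$child" || status=$?      # reap it and keep its exit status
        echo "start $starts: pid $child exited with status $status"
    done
}
```

Running `supervise 3 /path/to/some/executable OPTION` would start the executable, notice when it dies, and start it again, three times in total. The point of a real Process Supervisor is that somebody else has already written, tested, and hardened the grown-up version of this loop.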
I’ve also begun to see tremendous maintenance and support value in divorcing the init system from the sun-baked mud-bricks of PID 1 entirely: Process Supervisor dependent configurations will migrate almost seamlessly if you’re siloing your application stack (interpreters, libraries, application code). This means that they can be packaged and repackaged easily… and that offers a very reasonable migration path between operating system variants, and that means that they can insulate you from some of the more political decisions made at the OS level (Upstart vs. SystemD vs. SysV[3] being the one we’re most concerned with here). What a beautiful, logical chain of run-on sentences!
Now that we’ve powered through some background, pain points, and philosophy, let’s take a break. I’m working on part two, where I take a high-level look at what I consider to be the two most service-agnostic, highest-quality options available: Supervisor and Monit. That should hopefully have some graphs. Nerds love graphs!
- There’s a ton of thought behind argument parsing and using named parameters vs. positional parameters (XKCD forum, Peter Szilagyi, Greg Wooledge, SHELLdorado). You should maybe read some of it if you’re writing a shell script that people (read: you and anyone who isn’t you) will have to use. There are some excellent libraries available for almost any language, but if you’re passing more than a handful of arguments (positional or otherwise) to a shell script, please consider rewriting your tool in a more robust language. ↩
- Daemons, Daemon-kit, and Dante are sort of the go-to examples in Ruby land. Python has python-daemon, YapDi and a well-written overview of standard daemon behaviors. Full disclosure: I’ve contributed a few patches to Dante. If you really, really care about this sort of stuff you should read Jesse Storimer’s Working With Unix Processes. ↩
- Pop quiz: what init system does your production stack use? SysV Init (Pretty much everything before 2006, current Debian stable builds)? Upstart (Ubuntu since like 2006, RHEL/CentOS 6)? SystemD (Fedora, future Debian stable builds, and probably RHEL/CentOS 7 whenever they come out)? InitNG or RunIt (Maybe if you’re using Gentoo, Arch, or Linux from Scratch)? Accepted best practice is to target plain old vanilla SysV if you need portability, since Upstart and SystemD both provide compatibility wrappers (albeit to different degrees). ↩