February 20th, 2019
"Style is a set of different repeated microdecisions, each made the same way whenever it arises, even though the context may be different. [...] I believe that consistency underlies all principles of quality." -- Frederick P. Brooks, The Design of Design
With the above quote in mind, I strive to be eventually consistent in writing simple tools: to support, implement, and consider the same interfaces, options, features, and behavioral traits wherever reasonably possible. Not surprisingly, these properties align with the Unix Philosophy, and some of them I've covered before. However, perhaps as a reminder to myself, here are my primary guidelines:
Most of my commands support the same command-line options with the same, consistent meaning.
I almost always only use short options, because I'm lazy and don't like to type long options. If long options are needed, then two dashes are required; standard getopt(3) (or equivalent) behavior applies.
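As a minimal sketch of the kind of option parsing this implies — the specific flags shown here (-j, -p, -u, -v) follow the conventions described later in this post, and in the shell getopts(1) stands in for getopt(3):

```shell
#!/bin/sh
# Sketch: short-option parsing via the shell's getopts builtin.
set -- -j -u alice host1 host2   # demo arguments; a real tool gets these from its caller
json=0 verbose=0 user="" pass=""
while getopts "ju:p:v" opt; do
	case "${opt}" in
		j) json=1 ;;
		u) user="${OPTARG}" ;;
		p) pass="${OPTARG}" ;;
		v) verbose=1 ;;
		*) echo "Usage: tool [-jv] [-p pass] [-u user] [host ...]" >&2; exit 1 ;;
	esac
done
shift $((OPTIND - 1))
# "$@" now holds only the remaining (host) arguments.
```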
My tools frequently need to access different resources that require some form of authentication via a secret, a token, or a password. Passing passwords on the command-line has its own set of considerations and pitfalls, so my tools will generally support a number of different methods as arguments to the -p flag.
If no password is provided via -p, then my tool will prompt the user for a password on the tty, so as to allow input from stdin to continue to be processed (see below).
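A sketch of that prompt; 'read_password' is an illustrative helper name of my own, and the point is reading from /dev/tty with echo disabled so that stdin stays free for the actual input:

```shell
# Sketch: prompt on the controlling terminal, not stdin, so piped
# input remains available to the main loop.
read_password() {
	printf "Password: " > /dev/tty
	stty -echo < /dev/tty
	IFS= read -r pass < /dev/tty
	stty echo < /dev/tty
	printf "\n" > /dev/tty
}
```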
Specifying the user to authenticate as is done via the -u flag. If -u is not specified, the tool will use the $USER environment variable or, if that is not set, getlogin(2).
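In the shell, that fallback chain might look like this, with logname(1) standing in as the command-line counterpart of getlogin(2):

```shell
# Sketch: default the username when -u was not given, preferring
# $USER and falling back to logname(1).
user="${user:-}"
if [ -z "${user}" ]; then
	user="${USER:-$(logname 2>/dev/null)}"
fi
```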
Most of my tools tend to operate on large sets of input, such as a long list of hostnames pushing the full command well beyond ARG_MAX. As such, my tools will accept input from stdin, but allow iteration over optional arguments:
$ <file-with-many-hostnames tool
$ tool hostname1 hostname2
$ grep something file | tool hostname1 hostname2 hostname3
If, for some reason, I need to support reading input from a file via a command-line option (e.g., -f, or -c for a configuration file), then my tools will always also support an argument of - to read from stdin.
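Put together, the dispatch between arguments and stdin can be sketched as follows; 'process' and 'run' are illustrative helper names, not from my actual tools:

```shell
# Sketch: operate on command-line arguments if any were given,
# otherwise read one entry per line from stdin.
process() { printf 'frobbing %s\n' "$1"; }   # stand-in for the real work

run() {
	if [ $# -gt 0 ]; then
		for host in "$@"; do
			process "${host}"
		done
	else
		while IFS= read -r host; do
			process "${host}"
		done
	fi
}
```

Invoked as `run host1 host2`, it iterates over its arguments; invoked with none, as in `printf 'host1\n' | run`, it consumes stdin.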
Whenever possible, I try to iterate over input as a stream and begin processing it as it comes in, one line at a time. That is, I try to avoid reading all input into memory and only then working it off. This allows my tool to act as a true filter rather than a blocking process.
Errors encountered should not lead to the program aborting, but should instead generate a suitable error message and move on to the next entry.
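A sketch of such a streaming loop, with 'frob' standing in (my name, not from the post's tools) for whatever per-entry work occasionally fails:

```shell
# Sketch: process input as a stream, one line at a time, emitting a
# diagnostic and moving on when an entry fails rather than aborting.
frob() { [ "$1" != "bad" ]; }   # stand-in: pretend entries named "bad" fail

filter() {
	while IFS= read -r entry; do
		if ! frob "${entry}"; then
			printf "tool: unable to frob '%s', skipping\n" "${entry}" >&2
			continue
		fi
		printf '%s ok\n' "${entry}"
	done
}
```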
One of my favorite signals (besides SIGSNOW) is SIGINFO. Given that my tools frequently have to operate on large input sets, I equally frequently would like to know how far along they are. Passing -v to generate verbose output may work, but often I want no regular output while still being able to see which record the tool is currently working on.
For this, I install a signal handler to catch SIGINFO and spit out some statistic or diagnostic message. This way, I can run the tool in non-verbose mode, then hit Ctrl+T, et voilà, I can haz info!
$ <input tool
[... time elapses ...]
^T
load: 1.23 cmd: tool 57943 running 0.11u 0.05s
Resolving 'hostname33' (33/120).
[... time elapses ...]
^T
load: 1.23 cmd: tool 57943 running 0.14u 0.07s
Frobbing hobknobbin on 'hostname18' (18/120).
[...]
$
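In the shell, the handler boils down to a trap. SIGINFO only exists on BSD-derived systems (where Ctrl+T delivers it), so this sketch falls back to SIGUSR1 elsewhere; 'show_progress', 'current', 'count', and 'total' are names I made up for the illustration:

```shell
# Sketch: progress reporting from a signal handler.  Ctrl+T sends
# SIGINFO on BSD-derived systems; Linux lacks SIGINFO, so fall back
# to SIGUSR1 there.
current="" count=0 total=0
show_progress() {
	printf "Frobbing '%s' (%d/%d).\n" "${current}" "${count}" "${total}" >&2
}
trap show_progress INFO 2>/dev/null || trap show_progress USR1
```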
My tools generally produce line-based output on stdout, as you would expect. If the output consists of multiple fields, it is either space- or comma-separated. (I have not yet gone so far as to develop a need to standardize on inspection of the OFS environment variable.)
If the output generated is more complex, then many of my tools support JSON as an alternative output format via the -j command-line option. This is particularly useful, since I can then more easily post-process the results via jq(1) and continue on with my pipe.
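A sketch of such an output switch; 'report' is my illustrative name, and note that a real implementation must also JSON-escape the field values, which a bare printf does not:

```shell
# Sketch: emit space-separated fields by default, JSON with -j.
# Assumes 'json' was set during option parsing; printf does no JSON
# escaping, so a real tool needs a proper encoder for the values.
report() {
	if [ "${json:-0}" -eq 1 ]; then
		printf '{"host": "%s", "address": "%s"}\n' "$1" "$2"
	else
		printf '%s %s\n' "$1" "$2"
	fi
}
```

With -j in effect, the result then pipes straight into jq(1), e.g. `tool -j ... | jq -r '.address'`.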
As noted above, output is generally generated on stdout. Input is read from stdin. That is, I try to avoid any and all file I/O wherever possible. However, sometimes it cannot be avoided. To ensure I do not fall victim to the many pitfalls of temporary files, my tools will generally take a number of precautions when creating them.
This blog post covers the handling of temporary files in more detail.
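The usual precautions can be sketched in the shell as follows: honor $TMPDIR, create the file with an unpredictable name via mktemp(1) (the shell counterpart of mkstemp(3)), and remove it again on exit or interrupt:

```shell
# Sketch: safe temporary-file handling in the shell.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/tool.XXXXXX") || exit 1
trap 'rm -f "${tmpfile}"' EXIT INT TERM
printf 'scratch data\n' > "${tmpfile}"
```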
Obviously not all of my tools always support all of the above options or behaviors. However, I've often found that with time I end up going back and adding them (or at least wish I had added them). That is, I'm striving for consistency, for my tools to behave more or less the same, and to consistently fit into the Unix ecosystem.
"Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency." -- Richard P. Gabriel, The Rise of Worse is Better
May my tools always be worse, they're better that way.