Project Post-Mortem: Bash was the Wrong Choice

In one of my recent projects, the goal was to extend the functionality of a set of UNIX shell scripts and to reduce their code duplication. In this blog post I discuss the errors that I recognized only after completing the project and the points at which I had opportunities to spot them earlier.

The shell scripts were used to manage the build configuration and dependencies of several pieces of software. They also contained logic to work around idiosyncrasies of the build environment, e.g., the OpenBLAS development package libopenblas-dev in Ubuntu Trusty ships without header files (compare the Trusty file list to the Xenial file list). The scripts were expected to work on any Linux distribution and on OS X. Note that there are many UNIX shells but some of their features have been standardized in POSIX.

The existing build scripts consisted of about 2000 lines of mostly POSIX-compliant shell script. Initially, I wondered whether I should continue to use shell scripts and restrict myself to POSIX shell features, continue to use shell scripts but rely on the more powerful Bash features, or choose a different programming language altogether, e.g., Python. The scripts have to call many different programs, and in this regard writing POSIX-conforming shell scripts would be a natural approach offering the best portability of the three options. A Bash script would be an equally natural approach, and all Linux distributions known to me ship with Bash, though it is not always the default shell. For example, current Debian-based distributions use the much faster Debian Almquist Shell (Dash) as the default /bin/sh, and the Debian live system grml uses Z shell (zsh) by default. The major drawback of shell scripts was the lack of data structures and language features. Specifically, I was looking for

    • arrays or lists,
    • associative arrays, and
    • scoped variables.

When I began the project I knew that I was not familiar with all Bash-specific features. Therefore, I decided to take a closer look at Bash because it was more portable than, say, Python: if Bash offered all of the features above, then I would use it for the project. It turns out Bash has always supported arrays, associative arrays have been built in since release 4.0 (see the maintainer's release notes), and scoped variables are supported with the local built-in. Hence, I decided to use Bash. At this point I had the first opportunity to decide against Bash: I knew that OS X ships with old releases of UNIX command line tools. As of February 2018, the latest OS X release still ships with Bash 3.2, and in the end I was saved only by the package manager Brew.
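
Below is a minimal sketch of the three features I was looking for; the variable and function names are made up for illustration, and the associative array requires Bash 4.0 or later.

    #!/usr/bin/env bash
    # Associative arrays (declare -A) require Bash >= 4.0.

    # Indexed array.
    compilers=(gcc clang)

    # Associative array mapping compilers to their flags.
    declare -A cflags
    cflags[gcc]='-O2 -Wall'
    cflags[clang]='-O2 -Weverything'

    print_flags() {
        # "local" limits the scope of the loop variable to this function.
        local compiler
        for compiler in "${compilers[@]}"; do
            printf '%s: %s\n' "$compiler" "${cflags[$compiler]}"
        done
    }

    print_flags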

The first issue I ran into was passing associative arrays as arguments to functions and returning them. Returning arbitrary values from functions is not possible in Bash because the return value is an integer that is used for the exit status. Usually I prefer a functional programming style where I pass read-only arguments to a function and use the return value for the function outputs, but this was clearly not possible in Bash with associative arrays. Instead of reviewing my choice of Bash scripts, I partially switched to an imperative programming style where a function would modify associative arrays given as arguments.
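
The following sketch illustrates the limitation; the function names are hypothetical. The return built-in only sets an integer exit status, and the usual workaround of writing to standard output and capturing it with command substitution works for strings but not for whole associative arrays.

    #!/usr/bin/env bash

    get_status() {
        # "return" accepts only an integer between 0 and 255,
        # which becomes the exit status of the function.
        return 3
    }

    get_status
    echo "exit status: $?"    # prints "exit status: 3"

    get_name() {
        # Common workaround: write the result to stdout ...
        printf 'openblas'
    }

    # ... and capture it with command substitution. This works for plain
    # strings, but there is no comparable way to hand an entire
    # associative array back to the caller.
    name="$(get_name)"
    echo "name: $name"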

Passing associative arrays to functions is not directly possible in Bash (see this discussion) and instead you can

  • pass the list of keys and the list of values as two arguments and reconstruct the associative array from them, or
  • make the associative array global, pass the variable name to the function, and "dereference" it with the declare built-in (declare -n); a sketch of this approach follows below.
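
Here is a minimal sketch of the second approach using a name reference; note that declare -n and local -n require Bash 4.3 or later, and the array and function names are again made up.

    #!/usr/bin/env bash
    # Name references (declare -n / local -n) require Bash >= 4.3.

    declare -A build_flags=([gcc]='-O2' [clang]='-O2')

    add_debug_flags() {
        # $1 is the *name* of an associative array; the name reference
        # makes "flags" an alias for that array.
        local -n flags="$1"
        local key
        for key in "${!flags[@]}"; do
            flags[$key]+=' -g'
        done
    }

    add_debug_flags build_flags
    declare -p build_flags    # shows the modified array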

Since I only needed two associative arrays, I decided to make both of them global. This was in direct violation of my own requirements and another opportunity to review the choice of Bash scripts.
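
In effect, the scripts ended up with the imperative style sketched below, where a function mutates a global associative array in place; the names and values are illustrative, not the actual ones from the project.

    #!/usr/bin/env bash
    # The associative array is global; functions modify it in place
    # instead of returning a value.

    declare -A dependencies=()

    register_dependency() {
        dependencies["$1"]="$2"
    }

    register_dependency openblas system
    register_dependency lapack bundled
    declare -p dependencies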

Finally, error handling proved to be annoying. Error handling is challenging, especially if one wants to recover from an error, but luckily it is in many cases sufficient to print an informative error message and terminate the program. In practice, this is not possible with Bash: functions can end script execution at any time by calling the exit built-in, but functions may be executed in a subshell, and calling exit in a subshell does not terminate the calling shell. This problem was also noticed by the Bash developers, and by calling set -e one can ask the interpreter to abort execution as soon as an error occurs. Unfortunately, this is purely heuristic because the exit status of programs is used during the evaluation of conditionals, e.g., executing set -e; false terminates the current shell with exit status 1, whereas set -e; if false; then echo 'true'; fi; echo 'still alive' does not terminate and prints "still alive" because the failing false is part of a conditional. Similarly, in pipelines the exit statuses of all programs except the last are ignored, and statements of the form local var="$(program)" swallow the exit status, too, because local has an exit status of its own. I was aware that set -e works purely heuristically and this should have been a warning as well.
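
The following sketch reproduces these pitfalls with the standard true and false utilities; it is meant as an illustration of the behaviour described above, not of the actual project scripts.

    #!/usr/bin/env bash
    set -e

    # 1. A failing command inside a conditional does not trigger set -e;
    #    execution simply continues.
    if false; then
        echo 'then branch'
    fi
    echo 'still alive after "if false"'

    # 2. In a pipeline only the exit status of the last command counts,
    #    so the failing "false" on the left is ignored
    #    (set -o pipefail would change this).
    false | true
    echo 'still alive after "false | true"'

    # 3. local var="$(command)" hides the failure of the command because
    #    set -e only sees the exit status of the local built-in, which is 0.
    swallow() {
        local var="$(false)"
        echo 'still alive inside the function'
    }
    swallow

    # 4. A bare failing command does trigger set -e: the script terminates
    #    here with exit status 1 and the final echo is never reached.
    false
    echo 'not reached'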

In conclusion, I ignored the following warning signs that Bash might be the wrong language for the project:

  • I did not check if Bash actually fulfilled all project requirements. In particular I did not check if Bash 4.0 is available on all target platforms.
  • I assumed that data structures could be passed to and returned from functions.
  • When I found out that data structures cannot be returned from functions or easily passed as arguments, I did not reconsider my choice of programming language.
  • I introduced global variables contrary to my personal preferences.
  • I assumed that I could stop script execution at any time.