Bash shell script tips
Monday, 23 February 2015
Do you write shell scripts using bash? If so, do you know what the "-e" option does? How about "-x"?
Every bash user should know about these invaluable options. They are off by default, and you should turn them on whenever possible when writing or maintaining scripts. They can be turned on when a script is started, and they can be turned on and off while a script is running.
Here is what they do:
- "-e" means "Exit immediately if a command exits with a non-zero status."
- "-x" means "Print commands and their arguments as they are executed."
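As a rough sketch of the two ways to switch these options on: they can be passed to bash when the script starts (on the "#!" line or on the command line), or toggled inside the script with the "set" builtin, where "-" turns an option on and "+" turns it off. The snippet below runs a tiny inline script in a child bash so that the trace can be captured; the echoed strings are only placeholders.

```shell
# Start a child bash and toggle tracing part-way through the script.
# Only the commands between "set -x" and "set +x" are traced.
trace=$(bash -c '
echo "before"      # not traced: -x is still off
set -x             # turn tracing on
echo "traced"
set +x             # turn tracing off again
echo "after"       # not traced
' 2>&1 >/dev/null)   # keep standard error (the trace), discard standard output

printf '%s\n' "$trace"
```

The same options can be given at start-up instead, for example "bash -ex ./myscript.sh" (a hypothetical script name). Options on the "#!" line only take effect when the script is executed directly; they are ignored when it is run as "bash ./myscript.sh", which is one reason to consider "set -ex" as the first command instead.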
"-x" is useful because it prints an execution trace of your bash script. The trace is printed to standard error. You can redirect it to a file, and then look through it to see where your script went wrong. This is a primitive but effective form of debugging which requires no special tools. To illustrate the feature, here is a short bash script. Notice that -x has been placed on the first line:
#!/bin/bash -x
for i in 1 2 3
do
echo $i
done
When executed, this script prints the following to standard output:
1
2
3
and the following to standard error:
+ for i in 1 2 3
+ echo 1
+ for i in 1 2 3
+ echo 2
+ for i in 1 2 3
+ echo 3
The standard error output is a trace of each command that was executed. Variables ($i) have been expanded.
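To make the redirection concrete, here is a small sketch that saves the loop script to a temporary file and sends its trace to a separate log. The file names (loop.sh, out.txt, trace.log) are made up for illustration.

```shell
# Work in a scratch directory so nothing of yours is touched.
workdir=$(mktemp -d)

# Save the example script (same commands as in the text).
cat > "$workdir/loop.sh" <<'EOF'
#!/bin/bash -x
for i in 1 2 3
do
    echo $i
done
EOF

# The script is run through "bash -x" explicitly, because the "#!" line
# is only used when the file is executed directly.
# Normal output goes to out.txt, the -x trace (standard error) to trace.log.
bash -x "$workdir/loop.sh" > "$workdir/out.txt" 2> "$workdir/trace.log"

echo "--- standard output ---"
cat "$workdir/out.txt"
echo "--- trace ---"
cat "$workdir/trace.log"
```

Keeping the trace in its own file means the script's normal output stays readable, and the log can be searched afterwards for the last command that ran.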
"-e" is even more useful. It causes the bash script to exit immediately if one of the commands exits with a non-zero exit status. Almost all of the Unix commands exit with a non-zero code if something goes wrong. Generally, non-zero means "error". For instance,
- "cp" will exit non-zero if the copy couldn't be carried out,
- "tar" will exit non-zero if there was an error unpacking an archive,
- "cmp" will exit non-zero if the two files given to "cmp" were not identical,
- "gcc" and "make" will exit non-zero if compilation failed (e.g. syntax error in your code),
- "cd" will exit non-zero if the target directory is not accessible.
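A quick way to see these exit statuses for yourself is the special variable "$?", which holds the status of the most recent command. The sketch below tries "cmp" and "cd" on scratch files; the file and directory names are invented.

```shell
workdir=$(mktemp -d)
printf 'apple\n'  > "$workdir/a.txt"
printf 'banana\n' > "$workdir/b.txt"

# Identical files: cmp exits with 0.
cmp -s "$workdir/a.txt" "$workdir/a.txt"
same_status=$?

# Different files: cmp exits non-zero (-s suppresses its message).
cmp -s "$workdir/a.txt" "$workdir/b.txt"
diff_status=$?

# Missing directory: cd exits non-zero. The parentheses run it in a
# subshell so the current directory of this shell is left alone.
(cd "$workdir/no_such_directory") 2>/dev/null
cd_status=$?

echo "cmp, same files:      $same_status"
echo "cmp, different files: $diff_status"
echo "cd, missing dir:      $cd_status"
```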
The great benefit of "-e" is that your script will stop instantly if any command fails. Without "-e", your bash script blunders onwards, blindly ignoring the earlier failure. Maybe this will just waste your time: the build script failed to patch a source file in an early step, and that caused the final link to fail. Or maybe it will be more harmful. To see the potential for disaster, consider the following bash script:
#!/bin/bash -x
cd /home/jack
cd build_directory
rm -rf -- *
The intention of the script is to delete all the files and directories in /home/jack/build_directory. But what if /home/jack/build_directory does not exist? Thanks to "-x", we can see the disaster unfold:
+ cd /home/jack
+ cd build_directory
./s: line 3: cd: build_directory: No such file or directory
+ rm -rf -- file1 file2 file3 ...
Oh dear. The script deletes all files in /home/jack! Hopefully, jack has a backup. Disaster is averted if we use "-e":
#!/bin/bash -ex
cd /home/jack
cd build_directory
rm -rf -- *
This time:
+ cd /home/jack
+ cd build_directory
./s: line 3: cd: build_directory: No such file or directory
The script now exits (with a non-zero exit code) before "rm" is reached. No files are deleted.
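If you want to try this without risking a real home directory, the following sketch replays both versions inside a throwaway directory created with mktemp. The sandbox layout (a "jack" directory holding file1 and file2, with no build_directory) is an assumption made for the demonstration, and the "|| exit 1" guard on the first "cd" is an extra safety net that is not part of the original script.

```shell
sandbox=$(mktemp -d) || exit 1
mkdir "$sandbox/jack"

# The script from the text, with "$1/jack" standing in for /home/jack.
# The "|| exit 1" on the first line is a guard for this demo only.
script='
cd "$1/jack" || exit 1
cd build_directory
rm -rf -- *
'

# Without -e: the failed cd is ignored, so rm empties the wrong directory.
touch "$sandbox/jack/file1" "$sandbox/jack/file2"
bash -c "$script" demo "$sandbox" 2>/dev/null
without_e=$(ls "$sandbox/jack" | wc -l)

# With -e: bash stops at the failed cd and rm never runs.
touch "$sandbox/jack/file1" "$sandbox/jack/file2"
status=0
bash -e -c "$script" demo "$sandbox" 2>/dev/null || status=$?
with_e=$(ls "$sandbox/jack" | wc -l)

echo "files left without -e: $without_e"
echo "files left with -e:    $with_e (exit status $status)"
```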
In some cases, "-e" carries a slight disadvantage, because you may be quite happy for a command to fail. For instance, suppose you want to try to delete a file called "foobar", but you want the script to continue running even if "foobar" can't be deleted. You might write the following script:
#!/bin/bash -ex
cd /tmp
rm -f foobar
echo hello
Alas, "/tmp/foobar" exists and is owned by "root", so you see:
+ cd /tmp
+ rm -f foobar
rm: cannot remove `foobar': Operation not permitted
The script stops running here. In this case, you should tell bash to ignore errors for that command only. Here is how I would do it:
#!/bin/bash -ex
cd /tmp
rm -f foobar || true
echo hello
The "||" is a short-circuit OR operation, just like "||" in the C language, and it means that if the command on the left-hand side fails, then the command on the right-hand side should be used instead. The "true" program always exits with a zero. So, while the "rm" command fails, the statement succeeds. The output is:
+ cd /tmp
+ rm -f foobar
rm: cannot remove `foobar': Operation not permitted
+ true
+ echo hello
hello
The "echo" command is also a nice replacement for "true", because you can use it to print an error message, like:
rm -f foobar || echo "Unable to delete foobar!!"
You can even force an exit with a specific exit code:
rm -f foobar || exit 123
If you want to disable error-checking for more than one command - potentially the rest of the script - you can do so with the command "set +e". You can re-enable it with "set -e". However, don't. This prevents "-e" helping you.
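The three "||" variants can be compared side by side. In this sketch the always-failing "false" command stands in for the "rm" that could not delete foobar, so it can be run anywhere; the messages are illustrative.

```shell
# Each child bash runs with -e. "false" always exits non-zero.

# "|| true" and "|| echo ..." absorb the failure, so the script carries on.
out=$(bash -e -c '
false || true
false || echo "Unable to delete foobar!!"
echo hello
')

# "|| exit 123" stops the script with a status of your choosing.
status=0
bash -e -c '
false || exit 123
echo "not reached"
' || status=$?

printf '%s\n' "$out"
echo "exit status: $status"
```

Another option with the same effect is to test the command in an "if", since bash does not apply "-e" to a command whose status is being tested: "if ! rm -f foobar; then echo "Unable to delete foobar!!"; fi".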
Shell scripting errors turn up all the time, if you are unlucky or looking for them. One common mistake is to try to use a variable that does not exist. Here is a high-profile example of that mistake. Many programming languages would exit with an error if you tried to use an undefined variable, but by default Unix shell scripts do not. Not even with "-e". However, "-e" and "-x" will help to track down such mistakes. For example, a command like "cp $SRC $DEST" will fail with an error if either (or both) of $SRC and $DEST is undefined, because "cp" is then run with too few arguments. You will see the mistake in the "-x" output, just before the script exits.
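To see what this looks like in practice, the sketch below runs "cp $SRC $DEST" under "-ex" with both variables unset. It also tries bash's "-u" option, which this post does not cover but which makes bash itself treat an unset variable as an error; treat that part as a suggested extra rather than something recommended above.

```shell
# SRC and DEST are deliberately unset inside each child shell.

# With -ex: the variables expand to nothing, cp is run with no arguments,
# fails, and -e stops the script. The trace shows the bare "cp".
status_e=0
trace=$(bash -ex -c '
unset SRC DEST
cp $SRC $DEST
echo "not reached"
' 2>&1) || status_e=$?

# With -u (an addition, not from the post): bash reports the unset
# variable itself and stops before running the command.
status_u=0
bash -u -c '
unset SRC
echo "copying from $SRC"
echo "not reached"
' 2>/dev/null || status_u=$?

printf '%s\n' "$trace"
echo "status with -e: $status_e, with -u: $status_u"
```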
Failing to check error return codes is a general programming problem. It is one of the reasons why languages began to adopt exceptions as a mechanism for error handling. Lazy programmers ignore return codes, and this is a great source of bugs. A program that ignores return codes will probably crash eventually, but the crash might not happen at the point where the mistake was made, which is good fun for the maintenance programmer trying to trace the bug. Here's a real example. Unlike return codes, exceptions will propagate if they are not explicitly caught, so the programmer is forced to handle errors or allow them to be propagated to some other handler, higher up the call stack. Of course, the lazy programmer still has a way to catch all exceptions, so this does not always help.
Conclusion: use "-x" and "-e" in your bash scripts. It will save you time. It will save other people time. (Think about the maintenance programmer!)
Use "-e", at least, in all new scripts. If you are maintaining an old script, it may not be easy to add "-e", because the old script may depend on not exiting on error somewhere. In this case, think carefully about error handling. Consider adding code like my "||" example to explicitly detect errors for each line that might fail, calling "exit" on error, or otherwise doing something helpful. Consider using "set -e" to temporarily enable error checking.
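As one possible shape for that last suggestion, an old script can leave its legacy part alone and switch "-e" on only around the steps where a failure must not be ignored. The commands below are placeholders: "false" stands in for a legacy command whose failure is tolerated, and the missing directory name is invented.

```shell
status=0
out=$(bash -c '
false                          # legacy step: failure is ignored, as before
echo "legacy part finished"

set -e                         # critical section: stop on the first failure
cd /no/such/build_directory
echo "not reached"
set +e                         # back to the old behaviour
' 2>/dev/null) || status=$?

printf '%s\n' "$out"
echo "exit status: $status"
```

The legacy step fails without consequence, while the failed "cd" inside the critical section ends the script with a non-zero status before anything after it can run.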