Optimizing shell scripts
Introduction
I consider myself a fairly competent shell scripter. I prefer to program for readability, but in the end I tend to write terse code.
Readability matters because other people (including my future self) will need to read the code and figure out what is going on.
I suspect that because I wrote perl for a long time, I gravitate towards fairly terse code. This is not something to be proud of.
The other day, I was writing a small shell script to generate a markdown menu of links from a simple, specially formatted text file.
Input Data
See input:
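The actual input file is not reproduced here. Judging from the parsing code later in this post, which splits fields on >, a hypothetical input could look like this:

Home > https://example.com/
Docs > https://example.com/docs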
Desired Output
See output:
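Again, the actual output is not reproduced; for the hypothetical input above, the generated markdown menu would look something like:

- [Home](https://example.com/)
- [Docs](https://example.com/docs)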
Approach
I initially wrote it on my PC, which has a comparatively fast CPU. The response time on my PC was about one second. When I transferred the script to my web server (with a small CPU), the response time was 15 seconds to generate the menu. This surprised me, as I did not expect the performance difference between my PC and my web server to be so massive.
I figured that this needed to be optimized. The simplest way to do that is to add caching, which I did very quickly. But in reality, I needed to improve the code.
The first thing you need to do when optimizing code is to measure the effect of your changes. In order to do that, I created a simple test harness.
The options to the harness are:
-1 or --no-count
: run the code once, essentially to check that the output is correct.
--count=int
: run the code the specified number of times.
1 or 2
: run implementation 1 or implementation 2.
Run this with the time built-in to measure running times.
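The harness code itself is not shown above; a minimal sketch of what it could look like (the impl1 and impl2 function names are my assumption, standing in for the two implementations being compared):

#!/bin/sh
# th.cgi -- hypothetical sketch of the test harness
impl1() { : ; }   # implementation 1 goes here
impl2() { : ; }   # implementation 2 goes here

count=1
case "$1" in
-1|--no-count) count=1 ; shift ;;
--count=*) count="${1#--count=}" ; shift ;;
esac

i=0
while [ "$i" -lt "$count" ] ; do
case "$1" in
1) impl1 ;;
2) impl2 ;;
esac
i=$((i+1))
done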
So running:
sh th.cgi -1 1
or
sh th.cgi -1 2
can be used to make sure that implementations 1 and 2 are correct.
Then, you can run:
time sh th.cgi --count=100 1
or
time sh th.cgi --count=100 2
This will show how fast (or slow) the code performs. For testing, I would also test with busybox sh, as that is the shell I have on my web server. On my PC, sh is bash.
So the general approach is to measure the original performance, then make some changes and measure if the performance improves (or not).
Original Script
The original un-optimized code is:
Running this on my PC takes almost 10 minutes.
Optimized Script
The optimized version is:
This version executes in 16 seconds.
Overall, that is a pretty good performance improvement. The approach I took was first to isolate which part of the code was the slowest. The script essentially runs in a loop with two halves: the first half scans lines, the second half outputs markdown.
Commenting out the different parts of the code, I was able to determine that most of the time was spent in the scanning half.
The next step was to review the code and refactor it into faster versions:
Moving invariant code out of loops
Move invariant code outside the loop. Assigning elem2o and elem2c was originally done in the loop; this was moved outside the loop. It did not yield much improvement, but in general this is always a good optimization.
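As a rough illustration (only the variable names elem2o and elem2c come from the script; the values and the loop body are made up):

# before: assigned on every iteration
while read -r line ; do
elem2o="<li>"    # invariant: the same on every pass
elem2c="</li>"
# ... use $elem2o and $elem2c ...
done

# after: assigned once, before the loop
elem2o="<li>"
elem2c="</li>"
while read -r line ; do
# ... use $elem2o and $elem2c ...
done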
Replacing if/then/else with case
I was doing:
if (echo "$ARGS" | grep -q '>') ; then
...
else
...
fi
This actually forks and execs two additional commands and a subshell. This was replaced with:
case "$ARGS" in
*\>*)
...
;;
*)
...
;;
esac
Since this runs entirely inside the shell, it avoids spawning external commands and forking subshells.
Using IFS for parsing
I replaced:
local i=1
# Peel one '>'-separated field per iteration, spawning echo, cut,
# xargs, and expr every time around the loop.
while [ -n "$(echo "$ARGS" | cut -d'>' -f$i-)" ] ; do
set - "$@" "$(echo "$ARGS" | cut -d'>' -f$i | xargs)"   # xargs trims the field
i=$(expr $i + 1)
done
This is a terrible way to parse a line. It was replaced with:
Adding -e 's/[ ]*>[ ]*/>/g' to the sed command at the top, in front of the loop, and then...
IFS=">" ; set - $ARGS; IFS="$oIFS"
The performance difference here is massive.
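For a self-contained picture of the pattern (the input line here is hypothetical, and oIFS is assumed to hold the saved original IFS):

#!/bin/sh
ARGS='Home > https://example.com/ > Example site'   # hypothetical input line
ARGS=$(echo "$ARGS" | sed -e 's/[ ]*>[ ]*/>/g')     # strip spaces around '>'
oIFS="$IFS"                                         # save the original IFS
IFS=">" ; set - $ARGS ; IFS="$oIFS"                 # split on '>' into $1, $2, ...
echo "1: $1"   # Home
echo "2: $2"   # https://example.com/
echo "3: $3"   # Example site

The split itself happens entirely inside the shell; the per-field echo/cut/xargs pipelines are gone.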
Conclusion
So there you have it. It turns out that my scripting skills are poor. On the other hand, it could be that I am so used to writing shell scripts that I am not using the best tools for the job. In this particular example, I shouldn't have started with a shell script, but used something more suitable for text manipulation, such as perl or awk. Most scripting languages come with useful string-processing capabilities and would have done the job well.