

Processing Command Line Options

Most utilities on POSIX compatible systems take options or "switches" on the command line that can be used to change the way a program behaves. awk is an example of such a program (see section Command Line Options). Often, options take arguments: data that the program needs in order to obey the command line option correctly. For example, awk's `-F' option requires a string to use as the field separator. The first occurrence on the command line of either `--' or a string that does not begin with `-' ends the options.

Most Unix systems provide a C function named getopt for processing command line arguments. The programmer provides a string describing the one letter options. If an option requires an argument, its letter is followed in the string by a colon. getopt is also passed the count and values of the command line arguments, and is called in a loop. Each time around the loop, it returns a single character representing the next option letter that it found, or `?' if it found an invalid option. When it returns -1, there are no options left on the command line.

When using getopt, options that do not take arguments can be grouped together. Furthermore, options that take arguments require that the argument be present. The argument can immediately follow the option letter, or it can be a separate command line argument.

Given a hypothetical program that takes three command line options, `-a', `-b', and `-c', where `-b' requires an argument, all of the following are valid ways of invoking the program:

prog -a -b foo -c data1 data2 data3
prog -ac -bfoo -- data1 data2 data3
prog -acbfoo data1 data2 data3

Notice that when the argument is grouped with its option, the rest of the command line argument is considered to be the option's argument. In the above example, `-acbfoo' indicates that all of the `-a', `-b', and `-c' options were supplied, and that `foo' is the argument to the `-b' option.
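
The equivalence of the three invocations can be checked with a short C sketch. It assumes a POSIX getopt declared in <unistd.h>; the names parse_prog_options and struct prog_opts are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>    /* getopt, optarg, optind, opterr */

/* Hypothetical record of what was seen on the command line of `prog'. */
struct prog_opts {
    int a_seen;           /* non-zero if -a was given */
    int c_seen;           /* non-zero if -c was given */
    const char *b_arg;    /* argument to -b, or NULL if -b was absent */
    int first_operand;    /* index in argv of the first non-option argument */
    int bad;              /* number of times getopt returned `?' */
};

/* Parse -a, -b ARG, and -c.  Returns 0 on success, -1 if getopt
   reported any invalid option or missing argument. */
int
parse_prog_options(int argc, char *argv[], struct prog_opts *out)
{
    int c;

    assert(out != NULL);
    out->a_seen = out->c_seen = out->bad = 0;
    out->b_arg = NULL;

    opterr = 0;    /* keep getopt quiet; the caller reports errors */
    optind = 1;    /* restart the scan (assumption: some BSD systems
                      also need optreset = 1 before a second scan) */

    while ((c = getopt(argc, argv, "ab:c")) != -1) {
        switch (c) {
        case 'a':
            out->a_seen = 1;
            break;
        case 'b':
            out->b_arg = optarg;    /* points into argv, not a copy */
            break;
        case 'c':
            out->c_seen = 1;
            break;
        default:    /* `?' */
            out->bad++;
            break;
        }
    }
    out->first_operand = optind;
    return out->bad ? -1 : 0;
}
```

Called with the argument vectors for any of the three command lines above, this function should record the same options and the same `-b' argument; only first_operand differs, since the options occupy a different number of argv elements in each case.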

getopt provides four external variables that the programmer can use.

optind
The index in the argument value array (argv) where the first non-option command line argument can be found.
optarg
The string value of the argument to an option.
opterr
Usually getopt prints an error message when it finds an invalid option. Setting opterr to zero disables this feature. (An application might wish to print its own error message.)
optopt
The letter representing the command line option; when getopt returns `?', this is the letter that caused the error. While not usually documented, most versions supply this variable.

The following C fragment shows how getopt might process command line arguments for awk.

int
main(int argc, char *argv[])
{
    ...
    /* print our own message */
    opterr = 0;
    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
        switch (c) {
        case 'f':    /* file */
            ...
            break;
        case 'F':    /* field separator */
            ...
            break;
        case 'v':    /* variable assignment */
            ...
            break;
        case 'W':    /* extension */
            ...
            break;
        case '?':
        default:
            usage();
            break;
        }
    }
    ...
}
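
The fragment above leaves its details elided. As a separate, self-contained illustration of opterr, optopt, and the `?' return value, here is a minimal sketch, again assuming a POSIX getopt. The function name first_invalid_option and the wording of its message are hypothetical, not taken from awk's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>    /* getopt, optind, opterr, optopt */

/* Scan argv against optstring.  Returns the letter of the first option
   that getopt rejects, or 0 if every option is acceptable.  Because
   optstring is passed through unchanged, an option whose required
   argument is missing is also reported here (getopt returns `?' for
   that case too unless optstring begins with a colon). */
int
first_invalid_option(int argc, char *argv[], const char *optstring)
{
    int c;

    assert(optstring != NULL);
    opterr = 0;    /* print our own message */
    optind = 1;    /* restart the scan (assumption: some BSD systems
                      also need optreset = 1 before a second scan) */

    while ((c = getopt(argc, argv, optstring)) != -1) {
        if (c == '?') {
            fprintf(stderr, "%s: bad option -%c\n", argv[0], optopt);
            return optopt;
        }
    }
    return 0;
}
```

With the option string "ab:cd" and the arguments `-a -x -- xyz abc', the function should return `x', the value getopt left in optopt.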

As a side point, gawk actually uses the GNU getopt_long function to process both normal and GNU-style long options (see section Command Line Options).

The abstraction provided by getopt is very useful, and would be quite handy in awk programs as well. Here is an awk version of getopt. This function highlights one of the greatest weaknesses in awk, which is that it is very poor at manipulating single characters. Repeated calls to substr are necessary for accessing individual characters (see section Built-in Functions for String Manipulation).

The discussion walks through the code a bit at a time.

# getopt --- do C library getopt(3) function in awk
#
# [email protected]
# Public domain
#
# Initial version: March, 1991
# Revised: May, 1993

# External variables:
#    Optind -- index of ARGV for first non-option argument
#    Optarg -- string value of argument to current option
#    Opterr -- if non-zero, print our own diagnostic
#    Optopt -- current option letter

# Returns
#    -1     at end of options
#    ?      for unrecognized option
#    <c>    a character representing the current option

# Private Data
#    _opti  index in multi-flag option, e.g., -abc

The function starts out with some documentation: who wrote the code and when it was revised, followed by a list of the global variables it uses, the return values and what they mean, and any global variables that are "private" to this library function. Such documentation is essential for any program, and particularly for library functions.

function getopt(argc, argv, options,    optl, thisopt, i)
{
    optl = length(options)
    if (optl == 0)        # no options given
        return -1

    if (argv[Optind] == "--") {  # all done
        Optind++
        _opti = 0
        return -1
    } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) {
        _opti = 0
        return -1
    }

The function first checks that it was indeed called with a string of options (the options parameter). If options has a zero length, getopt immediately returns -1.

The next thing to check for is the end of the options. A `--' ends the command line options, as does any command line argument that does not begin with a `-'. Optind is used to step through the array of command line arguments; it retains its value across calls to getopt, since it is a global variable.

The regexp used, /^-[^: \t\n\f\r\v\b]/, is perhaps a bit of overkill; it checks for a `-' followed by anything that is not whitespace and not a colon. If the current command line argument does not match this pattern, it is not an option, and it ends option processing.
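
For comparison, the same test can be written in C without a regexp. This is only a sketch of the check the pattern performs; the function name looks_like_option is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns non-zero if arg would match /^-[^: \t\n\f\r\v\b]/, that is,
   a `-' followed by a character that is neither a colon nor one of the
   listed whitespace or control characters. */
int
looks_like_option(const char *arg)
{
    assert(arg != NULL);
    if (arg[0] != '-' || arg[1] == '\0')
        return 0;    /* no leading `-', or a lone `-' */
    /* arg[1] is known to be non-zero here, so strchr cannot match
       the terminating null byte of the set string. */
    return strchr(": \t\n\f\r\v\b", arg[1]) == NULL;
}
```

Note that `--' passes this test, just as it matches the regexp. The awk code never reaches the regexp for `--', because the preceding comparison handles that case first.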

    if (_opti == 0)
        _opti = 2
    thisopt = substr(argv[Optind], _opti, 1)
    Optopt = thisopt
    i = index(options, thisopt)
    if (i == 0) {
        if (Opterr)
            printf("%c -- invalid option\n",
                                  thisopt) > "/dev/stderr"
        if (_opti >= length(argv[Optind])) {
            Optind++
            _opti = 0
        } else
            _opti++
        return "?"
    }

The _opti variable tracks the position in the current command line argument (argv[Optind]). In the case that multiple options were grouped together with one `-' (e.g., `-abx'), it is necessary to return them to the user one at a time.

If _opti is equal to zero, it is set to two, the index in the string of the next character to look at (we skip the `-', which is at position one). The variable thisopt holds the character, obtained with substr. It is saved in Optopt for the main program to use.

If thisopt is not in the options string, then it is an invalid option. If Opterr is non-zero, getopt prints an error message on the standard error that is similar to the message from the C version of getopt.

Since the option is invalid, it is necessary to skip it and move on to the next option character. If _opti is greater than or equal to the length of the current command line argument, then it is necessary to move on to the next one, so Optind is incremented and _opti is reset to zero. Otherwise, Optind is left alone and _opti is merely incremented.

In any case, since the option was invalid, getopt returns `?'. The main program can examine Optopt if it needs to know what the invalid option letter actually was.

    if (substr(options, i + 1, 1) == ":") {
        # get option argument
        if (length(substr(argv[Optind], _opti + 1)) > 0)
            Optarg = substr(argv[Optind], _opti + 1)
        else
            Optarg = argv[++Optind]
        _opti = 0
    } else
        Optarg = ""

If the option requires an argument, the option letter is followed by a colon in the options string. If there are remaining characters in the current command line argument (argv[Optind]), then the rest of that string is assigned to Optarg. Otherwise, the next command line argument is used (`-xFOO' vs. `-x FOO'). In either case, _opti is reset to zero, since there are no more characters left to examine in the current command line argument.
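
The choice between the two forms can be sketched in C as follows. This mirrors the awk logic rather than any particular C library's implementation; the name option_argument is hypothetical. Positions here are zero-based, unlike the one-based positions used by substr in awk.

```c
#include <assert.h>
#include <stddef.h>

/* Given that argv[*ind][pos] is an option letter that takes an
   argument, return that argument.  If characters follow the letter in
   the same string (`-xFOO'), they are the argument and *ind is left
   alone.  Otherwise (`-x FOO') *ind is advanced and the next element
   of argv is returned, which is NULL if the argument is missing.
   Assumes pos is a valid index into argv[*ind] and that argv ends
   with a NULL pointer, as the argv passed to main does. */
const char *
option_argument(char *argv[], int *ind, size_t pos)
{
    const char *rest;

    assert(argv != NULL && ind != NULL);
    rest = argv[*ind] + pos + 1;
    if (*rest != '\0')
        return rest;      /* argument attached to the option */
    *ind += 1;
    return argv[*ind];    /* argument is the next element */
}
```

In both cases the caller still has to step past the element that *ind names afterwards, which is what the final part of the awk function does when it increments Optind.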

    if (_opti == 0 || _opti >= length(argv[Optind])) {
        Optind++
        _opti = 0
    } else
        _opti++
    return thisopt
}

Finally, if _opti is either zero or greater than or equal to the length of the current command line argument, it means this element in argv is through being processed, so Optind is incremented to point to the next element in argv. If neither condition is true, then only _opti is incremented, so that the next option letter can be processed on the next call to getopt.

BEGIN {
    Opterr = 1    # default is to diagnose
    Optind = 1    # skip ARGV[0]

    # test program
    if (_getopt_test) {
        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
            printf("c = <%c>, optarg = <%s>\n",
                                       _go_c, Optarg)
        printf("non-option arguments:\n")
        for (; Optind < ARGC; Optind++)
            printf("\tARGV[%d] = <%s>\n",
                                    Optind, ARGV[Optind])
    }
}

The BEGIN rule initializes both Opterr and Optind to one. Opterr is set to one, since the default behavior is for getopt to print a diagnostic message upon seeing an invalid option. Optind is set to one, since there's no reason to look at the program name, which is in ARGV[0].

The rest of the BEGIN rule is a simple test program. Here is the result of two sample runs of the test program.

$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
-| c = <a>, optarg = <>
-| c = <c>, optarg = <>
-| c = <b>, optarg = <ARG>
-| non-option arguments:
-|         ARGV[3] = <bax>
-|         ARGV[4] = <-x>

$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
-| c = <a>, optarg = <>
error--> x -- invalid option
-| c = <?>, optarg = <>
-| non-option arguments:
-|         ARGV[4] = <xyz>
-|         ARGV[5] = <abc>

The first `--' terminates the arguments to awk, so that it does not try to interpret the `-a' etc. as its own options.

Several of the sample programs presented in section Practical awk Programs use getopt to process their arguments.

