Additional Regexp Operators Only in gawk
GNU software that deals with regular expressions provides a number of
additional regexp operators. These operators are described in this
section, and are specific to gawk; they are not available in other
awk implementations.
Most of the additional operators are for dealing with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (`_').
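\w
This operator matches any word-constituent character, that is, any letter, digit, or underscore. Think of it as shorthand for [[:alnum:]_].
\W
This operator matches any character that is not word-constituent. Think of it as shorthand for [^[:alnum:]_].
\<
This operator matches the empty string at the beginning of a word. For example, /\<away/ matches `away', but not `stowaway'.
\>
This operator matches the empty string at the end of a word. For example, /stow\>/ matches `stow', but not `stowaway'.
\y
This operator matches the empty string at either the beginning or the end of a word (the word boundary). For example, `\yballs?\y' matches either `ball' or `balls' as a separate word.
\B
This operator matches the empty string within a word, that is, between two word-constituent characters. For example, /\Brat\B/ matches `crate', but it does not match `dirty rat'. `\B' is essentially the opposite of `\y'.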
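The word operators can be tried directly from a shell. The following is a minimal sketch, assuming gawk is installed and run in its default mode; the input strings are the ones used in this section, plus one made-up line (`foo_bar-baz 42') for `\W'.

```shell
# A pattern with no action prints the input line only when the regexp matches.
echo 'away'      | gawk '/\<away/'      # prints: away
echo 'stowaway'  | gawk '/\<away/'      # prints nothing: "away" is not at a word start
echo 'stow'      | gawk '/stow\>/'      # prints: stow
echo 'stowaway'  | gawk '/stow\>/'      # prints nothing: "stow" is not at a word end
echo 'crate'     | gawk '/\Brat\B/'     # prints: crate
echo 'dirty rat' | gawk '/\Brat\B/'     # prints nothing: "rat" is a whole word here

# \W matches any character that is not a letter, digit, or underscore.
echo 'foo_bar-baz 42' | gawk '{ gsub(/\W/, " "); print }'   # prints: foo_bar baz 42
```

The regexps are written as regexp constants inside single quotes so that neither the shell nor awk string processing removes the backslashes.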
There are two other operators that work on buffers. In Emacs, a
buffer is, naturally, an Emacs buffer. For other programs, the
regexp library routines that gawk uses consider the entire
string to be matched as the buffer.
For awk, since `^' and `$' always work in terms
of the beginning and end of strings, these operators don't add any
new capabilities. They are provided for compatibility with other GNU
software.
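\`
This operator matches the empty string at the beginning of the buffer.
\'
This operator matches the empty string at the end of the buffer.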
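As a rough illustration that the buffer operators behave like `^' and `$' in awk, the sketch below runs both forms against the same string. It assumes gawk is installed; the temporary program file and the printed messages are made up for the example. The program is written to a file only so that the shell does not interpret the backquote and single-quote characters.

```shell
# Write the awk program to a temporary file via a quoted here-document,
# so the shell passes \` and \' through untouched.
prog=$(mktemp)
cat > "$prog" <<'EOF'
/\`stow/ { print "buffer-start operator matched" }
/^stow/  { print "caret matched" }
/away\'/ { print "buffer-end operator matched" }
/away$/  { print "dollar matched" }
EOF

# All four rules fire for this input, one message per line.
echo 'stowaway' | gawk -f "$prog"
rm -f "$prog"
```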
In other GNU software, the word boundary operator is `\b'. However,
that conflicts with the awk language's definition of `\b'
as backspace, so gawk uses a different letter.
An alternative method would have been to require two backslashes in the GNU operators, but this was deemed to be too confusing, and the current method of using `\y' for the GNU `\b' appears to be the lesser of two evils.
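The difference between `\y' and `\b' can be seen with two short commands. This is a sketch assuming gawk and a POSIX printf are available; the message text is invented for the example.

```shell
# \y is the word boundary: only the standalone words are replaced.
echo 'ball balls football' | gawk '{ gsub(/\yballs?\y/, "X"); print }'
# prints: X X football

# \b is a backspace character, so this rule fires only when the input
# really contains one between "a" and "b".
printf 'a\bb\n' | gawk '/a\bb/ { print "found a backspace" }'
echo 'a b'      | gawk '/a\bb/ { print "found a backspace" }'   # prints nothing
```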
The various command line options
(see section Command Line Options)
control how gawk interprets characters in regexps.
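No options
In the default case, gawk provides all the facilities of
POSIX regexps and the GNU regexp operators described
above.
However, interval expressions are not supported.
--posix
Only POSIX regexps are supported; the GNU operators are not special
(e.g., `\w' matches a literal `w'). Interval expressions are allowed.
--traditional
Traditional Unix awk regexps are matched. The GNU operators
are not special, interval expressions are not available, and neither
are the POSIX character classes ([[:alnum:]] and so on).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
--re-interval
Interval expressions are allowed in regexps, even if `--traditional'
has been provided.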
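The effect of these options can be checked with a few one-liners. This is a sketch, assuming gawk is installed. The defaults listed above are those of the gawk version this section documents, and other releases may differ, so the interval example passes `--re-interval' explicitly rather than relying on the default. Some versions may also print a warning on standard error when a GNU operator such as `\w' is used in traditional mode.

```shell
# Default mode: \w is the GNU word-constituent operator.
echo 'a' | gawk '/^\w$/'                   # prints: a

# Traditional mode: \w is not special and stands for a literal "w".
echo 'w' | gawk --traditional '/^\w$/'     # prints: w
echo 'a' | gawk --traditional '/^\w$/'     # prints nothing

# Interval expressions, enabled explicitly.
echo 'aaa' | gawk --re-interval '/^a{3}$/' # prints: aaa
echo 'aa'  | gawk --re-interval '/^a{3}$/' # prints nothing
```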