gawk-Specific Regexp Operators
GNU software that deals with regular expressions provides a number of
additional regexp operators. These operators are described in this
section and are specific to gawk; they are not available in other awk
implementations.
Most of the additional operators deal with word matching.
For our purposes, a word is a sequence of one or more letters, digits,
or underscores (`_'):
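\w
Matches any word-constituent character, that is, any letter, digit,
or underscore. Think of it as shorthand for [[:alnum:]_].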
\W
Matches any character that is not word-constituent.
Think of it as shorthand for [^[:alnum:]_].
\<
Matches the empty string at the beginning of a word.
For example, /\<away/ matches `away' but not `stowaway'.
\>
Matches the empty string at the end of a word.
For example, /stow\>/ matches `stow' but not `stowaway'.
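\y
Matches the empty string at either the beginning or the end of a word
(that is, at a word boundary).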
\B
Matches the empty string that occurs between two word-constituent
characters. For example, /\Brat\B/ matches `crate' but it does not
match `dirty rat'. `\B' is essentially the opposite of `\y'.
There are two other operators that work on buffers. In Emacs, a
buffer is, naturally, an Emacs buffer. For other programs,
gawk's regexp library routines consider the entire
string to match as the buffer.
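\`
Matches the empty string at the beginning of a buffer (string).
\'
Matches the empty string at the end of a buffer (string).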
Because `^' and `$' always work in terms of the beginning
and end of strings, these operators don't add any new capabilities
for awk. They are provided for compatibility with other
GNU software.
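A short sketch of that equivalence follows, assuming gawk is installed. The program is written to a temporary file only to avoid shell quoting problems with the backquote and apostrophe; the variable names and the test string are illustrative.

```shell
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN {
    s = "hello world"
    # The buffer is the whole string, so \` pairs with ^ and \' with $.
    gnu_start = (s ~ /\`hello/); awk_start = (s ~ /^hello/)
    gnu_end   = (s ~ /world\'/); awk_end   = (s ~ /world$/)
    gnu_miss  = (s ~ /\`world/); awk_miss  = (s ~ /^world/)
    print gnu_start, awk_start   # prints 1 1
    print gnu_end, awk_end       # prints 1 1
    print gnu_miss, awk_miss     # prints 0 0
}
EOF
gawk -f "$prog"
rm -f "$prog"
```

In each pair the GNU buffer operator and the ordinary anchor give the same result.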
In other GNU software, the word-boundary operator is `\b'. However,
that conflicts with the awk language's definition of `\b'
as backspace, so gawk uses a different letter.
An alternative method would have been to require two backslashes in the
GNU operators, but this was deemed too confusing. The current
method of using `\y' for the GNU `\b' appears to be the
lesser of two evils.
The various command-line options
(see section Command-Line Options)
control how gawk interprets characters in regexps:
No options
In the default case, gawk provides all the facilities of
POSIX regexps and the previously described GNU regexp operators.
However, interval expressions are not supported.
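--posix
Only POSIX regexps are supported; the GNU operators are not special
(for example, `\w' matches a literal `w').
Interval expressions are allowed.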
--traditional
Traditional Unix awk regexps are matched. The GNU operators
are not special, interval expressions are not available, nor
are the POSIX character classes ([[:alnum:]] and so on).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
--re-interval
Allow interval expressions in regexps, even if `--traditional'
has been provided.