The sed utility is a "stream editor," a program that reads a
stream of data, makes changes to it, and passes the modified data on.
It is often used to make global changes to a large file, or to a stream
of data generated by a pipeline of commands.
While sed is a complicated program in its own right, its most common
use is to perform global substitutions in the middle of a pipeline:
command1 < orig.data | sed 's/old/new/g' | command2 > result
Here, the `s/old/new/g' tells sed to look for the regexp `old' on each
input line and replace it with the text `new', globally (i.e., every
occurrence on a line). This is similar to awk's gsub function
(see section Built-in Functions for String Manipulation).
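As a small illustration of that comparison, the following shell sketch runs the same global substitution once through sed and once through awk's gsub. The helper names with_sed and with_awk and the sample input are made up for this example; they are not part of the original program.

```shell
# Hypothetical helpers: the same global substitution, done two ways.
with_sed() { sed 's/old/new/g'; }
with_awk() { awk '{ gsub(/old/, "new"); print }'; }

# Both commands print:
#   new gnew
#   bnew
printf 'old gold\nbold\n' | with_sed
printf 'old gold\nbold\n' | with_awk
```

Note that both tools replace every match on a line, including matches inside longer words such as "gold".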
The following program, `awksed.awk', accepts at least two command line arguments: the pattern to look for and the text to replace it with. Any additional arguments are treated as data file names to process. If none are provided, the standard input is used.
# awksed.awk --- do s/foo/bar/g using just print
#    Thanks to Michael Brennan for the idea

# Arnold Robbins, [email protected], Public Domain
# August 1995

function usage()
{
    print "usage: awksed pat repl [files...]" > "/dev/stderr"
    exit 1
}

BEGIN {
    # validate arguments
    if (ARGC < 3)
        usage()

    RS = ARGV[1]
    ORS = ARGV[2]

    # don't use arguments as files
    ARGV[1] = ARGV[2] = ""
}

# look ma, no hands!
{
    if (RT == "")
        printf "%s", $0
    else
        print
}
The program relies on gawk's ability to have RS be a regexp
and on the setting of RT to the actual text that terminated the
record (see section How Input is Split into Records).
The idea is to have RS be the pattern to look for. gawk
will automatically set $0 to the text between matches of the pattern.
This is text that we wish to keep, unmodified. Then, by setting ORS
to the replacement text, a simple print statement will output the
text we wish to keep, followed by the replacement text.
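The effect of pairing RS with ORS can be seen in a minimal sketch like the one below. It assumes gawk is installed; the helper name digits_to_dash, the pattern `[0-9]+', and the replacement `-' are illustrative choices, not taken from the original program.

```shell
# Hypothetical helper: every run of digits ends a record (RS), and
# print appends "-" (ORS) after each record, so the digits are
# effectively replaced by "-".
digits_to_dash() {
    gawk 'BEGIN { RS = "[0-9]+"; ORS = "-" } { print }'
}

# The input ends with a match for RS, so every record is followed by
# the replacement text. Prints: a-b-c-
printf 'a1b22c3' | digits_to_dash
```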
There is one wrinkle to this scheme: what to do if the last record
doesn't end with text that matches RS. Using a print statement
unconditionally would print the replacement text after that record,
which is not correct. However, if the file did not end in text that
matches RS, RT will be set to the null string. In this case, we can
print $0 using printf, which does not append ORS
(see section Using printf Statements for Fancier Printing).
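To make the role of RT concrete, here is a short sketch, again assuming gawk is available, that prints each record next to the text that terminated it. The helper name show_rt and the sample input are hypothetical.

```shell
# Hypothetical helper: show each record and the RT value gawk set for it.
show_rt() {
    gawk 'BEGIN { RS = "[0-9]+" } { printf "record=<%s> RT=<%s>\n", $0, RT }'
}

# The input does not end with a match for RS, so the last record has
# an empty RT. Prints:
#   record=<a> RT=<1>
#   record=<b> RT=<22>
#   record=<c> RT=<>
printf 'a1b22c' | show_rt
```

The empty RT on the final record is exactly the condition the program tests before choosing printf over print.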
The BEGIN rule handles the setup, checking for the right number
of arguments, and calling usage if there is a problem. Then it sets
RS and ORS from the command line arguments, and sets
ARGV[1] and ARGV[2] to the null string, so that they will
not be treated as file names
(see section Using ARGC and ARGV).
The usage function prints an error message and exits.
Finally, the single rule handles the printing scheme outlined above,
using print or printf as appropriate, depending upon the
value of RT.