Several kinds of tasks occur repeatedly
when working with text files.
You might want to extract certain lines and discard the rest.
Or you may need to make changes wherever certain patterns appear,
but leave the rest of the file alone.
Writing single-use programs for these tasks in languages such as C, C++ or Pascal
is time-consuming and inconvenient.
Such jobs are often easier with awk.
The awk utility interprets a special-purpose programming language
that makes it easy to handle simple data-reformatting jobs.
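As a rough illustration of the two tasks mentioned above (extracting certain lines, and changing text only where a pattern appears), here is a minimal sketch run from the shell. The sample input and the variable names `input`, `selected`, and `changed` are made up for this example and do not come from the rest of this Web page.

```shell
# Sample input, invented for this sketch.
input='ok: start
error: disk full
ok: done'

# Task 1: keep only the lines that match a pattern; discard the rest.
# A pattern with no action prints every matching line.
selected=$(printf '%s\n' "$input" | awk '/error/')
printf '%s\n' "$selected"

# Task 2: change text wherever a pattern appears, but leave other lines alone.
# sub() edits the current line in place; print then writes every line out.
changed=$(printf '%s\n' "$input" | awk '{ sub(/^ok/, "OK"); print }')
printf '%s\n' "$changed"
```

Both commands use only features that should be present in any POSIX awk, so they are not specific to one implementation.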
The GNU implementation of awk is called gawk; it is fully
compatible with the System V Release 4 version of awk.
gawk is also compatible with the POSIX
specification of the awk language. This means that all
properly written awk programs should work with gawk.
Thus, we usually don't distinguish between gawk and other
awk implementations.
In addition, gawk provides facilities that make it easy to:
This Web page teaches you about the awk language and
how you can use it effectively. You should already be familiar with basic
system commands, such as cat and ls,(1) as well as basic shell
facilities, such as Input/Output (I/O) redirection and pipes.
Implementations of the awk language are available for many
different computing environments. This Web page, while describing
the awk language in general, also describes the particular
implementation of awk called gawk (which stands for
"GNU awk"). gawk runs on a broad range of Unix systems,
from 80386 PC-based computers up through large-scale systems
such as Crays. gawk has also been ported to Mac OS X,
MS-DOS, Microsoft Windows (all versions) and OS/2 PCs, Atari and Amiga
microcomputers, BeOS, Tandem D20, and VMS.
History of awk and gawk: The history of gawk and awk.
1.0 A Rose by Any Other Name: What name to use to find awk.
1.1 Using This Book: Using this Web page. Includes sample input files that you can use.
1.2 Typographical Conventions
The GNU Project and This Book: Brief history of the GNU project and this Web page.
How to Contribute: Helping to save the world.
Acknowledgments
History of awk and gawk
1 part egrep    1 part snobol
2 parts ed      3 parts C
Blend all parts well using lex and yacc. Document minimally and release.
After eight years, add another part egrep and two more parts C. Document very well and release.
The name awk comes from the initials of its designers: Alfred V.
Aho, Peter J. Weinberger and Brian W. Kernighan. The original version of
awk was written in 1977 at AT&T Bell Laboratories.
In 1985, a new version made the programming
language more powerful, introducing user-defined functions, multiple input
streams, and computed regular expressions.
This new version became widely available with Unix System V
Release 3.1 (SVR3.1).
The version in SVR4 added some new features and cleaned
up the behavior in some of the "dark corners" of the language.
The specification for awk in the POSIX Command Language
and Utilities standard further clarified the language.
Both the gawk designers and the original Bell Laboratories awk
designers provided feedback for the POSIX specification.
Paul Rubin wrote the GNU implementation, gawk, in 1986.
Jay Fenlason completed it, with advice from Richard Stallman. John Woods
contributed parts of the code as well. In 1988 and 1989, David Trueman, with
help from me, thoroughly reworked gawk for compatibility
with the newer awk.
Circa 1995, I became the primary maintainer.
Current development focuses on bug fixes,
performance improvements, standards compliance, and occasionally, new features.
In May of 1997, Jürgen Kahrs felt the need for network access
from awk, and with a little help from me, set about adding
features to do this for gawk. At that time, he also
wrote the bulk of
TCP/IP Internetworking with gawk
(a separate document, available as part of the gawk distribution).
His code finally became part of the main gawk distribution
with gawk version 3.1.
See section Major Contributors to gawk,
for a complete list of those who made important contributions to gawk.
The awk language has evolved over the years. Full details are
provided in The Evolution of the awk Language.
The language described in this Web page
is often referred to as "new awk" (nawk).
Because of this, many systems have multiple
versions of awk.
Some systems have an awk utility that implements the
original version of the awk language and a nawk utility
for the new version.
Others have an oawk for the "old awk"
language and plain awk for the new one. Still others only
have one version, which is usually the new one.(2)
All in all, this makes it difficult for you to know which version of
awk you should run when writing your programs. The best advice
I can give here is to check your local documentation. Look for awk,
oawk, and nawk, as well as for gawk.
It is likely that you already
have some version of new awk on your system, which is what
you should use when running your programs. (Of course, if you're reading
this Web page, chances are good that you have gawk!)
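Alongside the local documentation, one quick way to see which of these names exist on a given system is to ask the shell. The following is only a sketch: it assumes a POSIX shell with the standard `command -v` builtin, and the variable names `report`, `name`, and `location` are invented for the example.

```shell
# Report which of the usual awk command names are on the search path.
report=$(
    for name in awk oawk nawk gawk; do
        if location=$(command -v "$name" 2>/dev/null); then
            printf '%s: %s\n' "$name" "$location"
        else
            printf '%s: not found\n' "$name"
        fi
    done
)
printf '%s\n' "$report"
```

Finding a name on the path says nothing about which version of the language it implements. On a system where gawk is installed, `gawk --version` prints its version; the other names may not accept that option, so the local documentation remains the better guide.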
Throughout this Web page, whenever we refer to a language feature
that should be available in any complete implementation of POSIX awk,
we simply use the term awk. When referring to a feature that is
specific to the GNU implementation, we use the term gawk.
Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.
Dick Brandon
The term awk refers to a particular program as well as to the language you
use to tell this program what to do. When we need to be careful, we call
the program "the awk utility" and the language "the awk language."
This Web page explains
both the awk language and how to run the awk utility.
The term awk program refers to a program written by you in
the awk programming language.
Primarily, this Web page explains the features of awk,
as defined in the POSIX standard. It does so in the context of the
gawk implementation. While doing so, it also
attempts to describe important differences between gawk
and other awk implementations.(3) Finally, any gawk features that are not in
the POSIX standard for awk are noted.
This Web page has the difficult task of being both a tutorial and a reference. If you are a novice, feel free to skip over details that seem too complex. You should also ignore the many cross references; they are for the expert user and for the online Info version of the document.
There are subsections labelled as Advanced Notes scattered throughout the Web page. They add a more complete explanation of points that are relevant, but not likely to be of interest on first reading. All appear in the index, under the heading "advanced notes."
Most of the time, the examples use complete awk programs.
In some of the more advanced sections, only the part of the awk
program that illustrates the concept currently being described is shown.
While this Web page is aimed principally at people who have not been
exposed to awk, there is a lot of information here that even the awk
expert should find useful. In particular, the description of POSIX
awk and the example programs in
A Library of awk Functions, and in
Practical awk Programs,
should be of interest.
Getting Started with awk,
provides the essentials you need to know to begin using awk.
Regular Expressions,
introduces regular expressions in general, and in particular the flavors
supported by POSIX awk and gawk.
Reading Input Files,
describes how awk reads your data.
It introduces the concepts of records and fields, as well
as the getline command.
I/O redirection is first described here.
Printing Output,
describes how awk programs can produce output with
print and printf.
6. Expressions, describes expressions, which are the basic building blocks for getting most things done in a program.
Patterns, Actions, and Variables,
describes how to write patterns for matching records, actions for
doing something when a record is matched, and the built-in variables
awk and gawk use.
Arrays in awk,
covers awk's one-and-only data structure: associative arrays.
Deleting array elements and whole arrays is also described, as well as
sorting arrays in gawk.
9. Functions,
describes the built-in functions awk and
gawk provide for you, as well as how to define
your own functions.
Internationalization with gawk,
describes special features in gawk for translating program
messages into different languages at runtime.
Advanced Features of gawk,
describes a number of gawk-specific advanced features.
Of particular note
are the abilities to have two-way communications with another process,
perform TCP/IP networking, and
profile your awk programs.
Running awk and gawk,
describes how to run gawk, the meaning of its
command-line options, and how it finds awk
program source files.
A Library of awk Functions, and
Practical awk Programs,
provide many sample awk programs.
Reading them allows you to see awk being used
for solving real problems.
The Evolution of the awk Language,
describes how the awk language has evolved from its
first release to the present. It also describes how gawk
has acquired features over time.
Installing gawk,
describes how to get gawk, how to compile it
under Unix, and how to compile and use it on different
non-Unix systems. It also describes how to report bugs
in gawk and where to get three other freely
available implementations of awk.
Implementation Notes,
describes how to disable gawk's extensions, as
well as how to contribute new code to gawk,
how to write extension libraries, and some possible
future directions for gawk development.
Basic Programming Concepts, provides some very cursory background material for those who are completely unfamiliar with computer programming. Also centralized there is a discussion of some of the issues involved in using floating-point numbers.
The Glossary, defines most, if not all, the significant terms used throughout the book. If you find terms that you aren't familiar with, try looking them up.
GNU General Public License, and
GNU Free Documentation License,
present the licenses that cover the gawk source code,
and this Web page, respectively.
This Web page is written using Texinfo, the GNU documentation formatting language. A single Texinfo source file is used to produce both the printed and online versions of the documentation. This section briefly documents the typographical conventions used in Texinfo.
Examples you would type at the command-line are preceded by the common shell primary and secondary prompts, `$' and `>'. Output from the command is preceded by the glyph "-|". This typically represents the command's standard output. Error messages, and other output on the command's standard error, are preceded by the glyph "error-->". For example:
    $ echo hi on stdout
    -| hi on stdout
    $ echo hello on stderr 1>&2
    error--> hello on stderr
Characters that you type at the keyboard look like this. In particular, there are special characters called "control characters." These are characters that you type by holding down both the CONTROL key and another key, at the same time. For example, a Ctrl-d is typed by first pressing and holding the CONTROL key, next pressing the d key and finally releasing both keys.
Dark corners are basically fractal -- no matter how much you illuminate, there's always a smaller but darker one.
Brian Kernighan
Until the POSIX standard (and The Gawk Manual),
many features of awk were either poorly documented or not
documented at all. Descriptions of such features
(often called "dark corners") are noted in this Web page with
"(d.c.)".
They also appear in the index under the heading "dark corner."
As noted by the opening quote, though, any coverage of dark corners is, by definition, something that is incomplete.
Software is like sex: it's better when it's free.
Linus Torvalds
The Free Software Foundation (FSF) is a non-profit organization dedicated to the production and distribution of freely distributable software. It was founded by Richard M. Stallman, the author of the original Emacs editor. GNU Emacs is the most widely used version of Emacs today.
The GNU(4) Project is an ongoing effort on the part of the Free Software
Foundation to create a complete, freely distributable, POSIX-compliant
computing environment.
The FSF uses the "GNU General Public License" (GPL) to ensure that
their software's
source code is always available to the end user. A
copy of the GPL is included
in this Web page
for your reference
(see section GNU General Public License).
The GPL applies to the C language source code for gawk.
To find out more about the FSF and the GNU Project online,
see the GNU Project's home page.
This Web page may also be read from
their web site.
A shell, an editor (Emacs), highly portable optimizing C, C++, and
Objective-C compilers, a symbolic debugger and dozens of large and
small utilities (such as gawk), have all been completed and are
freely available. The GNU operating
system kernel (the HURD), has been released but is still in an early
stage of development.
Until the GNU operating system is more fully developed, you should
consider using GNU/Linux, a freely distributable, Unix-like operating
system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other
systems.(5)
There are many books on GNU/Linux. One that is freely available is Linux
Installation and Getting Started, by Matt Welsh.
GNU/Linux distributions are often available in computer stores or
bundled on CD-ROMs with books about Linux.
(There are three other freely available, Unix-like operating systems for
80386 and other systems: NetBSD, FreeBSD, and OpenBSD. All are based on the
4.4-Lite Berkeley Software Distribution, and they use recent versions
of gawk for their versions of awk.)
The Web page you are reading now is actually free--at least, the
information in it is free to anyone. The machine readable
source code for the Web page comes with gawk; anyone
may take this Web page to a copying machine and make as many
copies of it as they like. (Take a moment to check the Free Documentation
License; see GNU Free Documentation License.)
Although you could just print it out yourself, bound books are much easier to read and use. Furthermore, the proceeds from sales of this book go back to the FSF to help fund development of more free software.
The Web page itself has gone through a number of previous editions.
Paul Rubin wrote the very first draft of The GAWK Manual;
it was around 40 pages in size.
Diane Close and Richard Stallman improved it, yielding a
version that was
around 90 pages long and barely described the original, "old"
version of awk.
I started working with that version in the fall of 1988.
As work on it progressed,
the FSF published several preliminary versions (numbered 0.x).
In 1996, Edition 1.0 was released with gawk 3.0.0.
The FSF published the first two editions under
the title The GNU Awk User's Guide.
This edition maintains the basic structure of Edition 1.0,
but with significant additional material, reflecting the host of new features
in gawk version 3.1.
Of particular note is
Sorting Array Values and Indices with gawk,
as well as
Using gawk's Bit Manipulation Functions,
Internationalization with gawk,
and also
Advanced Features of gawk,
and
Adding New Built-in Functions to gawk.
GAWK: Effective AWK Programming will undoubtedly continue to evolve.
An electronic version
comes with the gawk distribution from the FSF.
If you find an error in this Web page, please report it!
See section Reporting Problems and Bugs, for information on submitting
problem reports electronically, or write to me in care of the publisher.
As the maintainer of GNU awk,
I am starting a collection of publicly available awk programs.
For more information,
see ftp://ftp.freefriends.org/arnold/Awkstuff.
If you have written an interesting awk program, or have written a
gawk extension that you would like to
share with the rest of the world, please contact me ([email protected]).
Making things available on the Internet helps keep the
gawk distribution down to manageable size.
The initial draft of The GAWK Manual had the following acknowledgments:
Many people need to be thanked for their assistance in producing this manual. Jay Fenlason contributed many ideas and sample programs. Richard Mlynarik and Robert Chassell gave helpful comments on drafts of this manual. The paper A Supplemental Document for awk by John W. Pierce of the Chemistry Department at UC San Diego, pinpointed several issues relevant both to awk implementation and to this manual, that would otherwise have escaped us.
I would like to acknowledge Richard M. Stallman, for his vision of a better world and for his courage in founding the FSF and starting the GNU project.
The following people (in alphabetical order) provided helpful comments on various versions of this book, up to and including this edition. Rick Adams, Nelson H.F. Beebe, Karl Berry, Dr. Michael Brennan, Rich Burridge, Claire Coutier, Diane Close, Scott Deifik, Christopher ("Topher") Eliot, Jeffrey Friedl, Dr. Darrel Hankerson, Michal Jaegermann, Dr. Richard J. LeBlanc, Michael Lijewski, Pat Rankin, Miriam Robbins, Mary Sheehan, and Chuck Toporek.
Robert J. Chassell provided much valuable advice on the use of Texinfo. He also deserves special thanks for convincing me not to title this Web page How To Gawk Politely. Karl Berry helped significantly with the TeX part of Texinfo.
I would like to thank Marshall and Elaine Hartholz of Seattle and
Dr. Bert and Rita Schreiber of Detroit for large amounts of quiet vacation
time in their homes, which allowed me to make significant progress on
this Web page and on gawk itself.
Phil Hughes of SSC contributed in a very important way by loaning me his laptop GNU/Linux system, not once, but twice, which allowed me to do a lot of work while away from home.
David Trueman deserves special credit; he has done a yeoman job
of evolving gawk so that it performs well and without bugs.
Although he is no longer involved with gawk,
working with him on this project was a significant pleasure.
The intrepid members of the GNITS mailing list, and most notably Ulrich Drepper, provided invaluable help and feedback for the design of the internationalization features.
Nelson Beebe, Martin Brown, Scott Deifik, Darrel Hankerson, Michal Jaegermann,
Jürgen Kahrs, Pat Rankin, Kai Uwe Rommel, and Eli Zaretskii
(in alphabetical order) are long-time members of the
gawk "crack portability team." Without their hard work and
help, gawk would not be nearly the fine program it is today. It
has been and continues to be a pleasure working with this team of fine
people.
David and I would like to thank Brian Kernighan of Bell Laboratories for
invaluable assistance during the testing and debugging of gawk, and for
help in clarifying numerous points about the language. We could not have
done nearly as good a job on either gawk or its documentation without
his help.
Chuck Toporek, Mary Sheehan, and Claire Coutier of O'Reilly & Associates contributed
significant editorial help for this Web page for the
3.1 release of gawk.
I must thank my wonderful wife, Miriam, for her patience through
the many versions of this project, for her proof-reading,
and for sharing me with the computer.
I would like to thank my parents for their love, and for the grace with
which they raised and educated me.
Finally, I also must acknowledge my gratitude to G-d, for the many opportunities
He has sent my way, as well as for the gifts He has given me with which to
take advantage of those opportunities.
Arnold Robbins
Nof Ayalon
ISRAEL
March, 2001