The systime function built in to gawk returns the current time of day as
a timestamp in "seconds since the Epoch." This timestamp can be converted
into a printable date of almost infinitely variable format using the
built-in strftime function. (For more information on systime and strftime,
see section Functions for Dealing with Time Stamps.)
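As a small illustration of the two functions, here is a sketch that prints the current timestamp and a few renderings of it. It needs gawk, since both functions are gawk extensions, and the format strings are only examples chosen here.

```awk
# Requires gawk: systime() and strftime() are gawk extensions.
BEGIN {
    now = systime()                  # seconds since the Epoch
    print "timestamp:", now
    print "default:  ", strftime()   # current time, default format
    print "numeric:  ", strftime("%Y-%m-%d %H:%M:%S", now)
    print "verbose:  ", strftime("%A, %B %d, %Y", now)
}
```

The output depends on when and where the program is run, so none is shown.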
An interesting but difficult problem is to convert a readable representation
of a date back into a timestamp. The ANSI C library provides a mktime
function that does the basic job, converting a canonical representation of a
date into a timestamp.
It would appear at first glance that gawk would have to supply a mktime
built-in function that was simply a "hook" to the C language version.
In fact though, mktime can be implemented entirely in awk.
Here is a version of mktime for awk. It takes a simple representation of
the date and time, and converts it into a timestamp.
The code is presented here intermixed with explanatory prose. In section Extracting Programs from Texinfo Source Files, you will see how the Texinfo source file for this book can be processed to extract the code into a single source file.
The program begins with a descriptive comment and a BEGIN rule that
initializes a table _tm_months. This table is a two-dimensional array that
holds the lengths of the months. The first index is zero for regular years,
and one for leap years. The values are the same in both kinds of years for
every month except February; hence the use of multiple assignment.
    # mktime.awk --- convert a canonical date representation
    #                into a timestamp
    # Arnold Robbins, [email protected], Public Domain
    # May 1993

    BEGIN    \
    {
        # Initialize table of month lengths
        _tm_months[0,1] = _tm_months[1,1] = 31
        _tm_months[0,2] = 28; _tm_months[1,2] = 29
        _tm_months[0,3] = _tm_months[1,3] = 31
        _tm_months[0,4] = _tm_months[1,4] = 30
        _tm_months[0,5] = _tm_months[1,5] = 31
        _tm_months[0,6] = _tm_months[1,6] = 30
        _tm_months[0,7] = _tm_months[1,7] = 31
        _tm_months[0,8] = _tm_months[1,8] = 31
        _tm_months[0,9] = _tm_months[1,9] = 30
        _tm_months[0,10] = _tm_months[1,10] = 31
        _tm_months[0,11] = _tm_months[1,11] = 30
        _tm_months[0,12] = _tm_months[1,12] = 31
    }
The benefit of merging multiple BEGIN rules (see section The BEGIN and END
Special Patterns) is particularly clear when writing library files.
Functions in library files can cleanly initialize their own private data
and also provide clean-up actions in private END rules.
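The following sketch illustrates the merging of BEGIN rules and doubles as a sanity check of the month table. The first rule plays the part of a library's private initialization; the second, which runs after it, stands in for the user's own program. The split-based initialization and the helper total_days are assumptions made here for brevity; they are not part of mktime.awk.

```awk
# First BEGIN rule: the "library" fills in its private table.
# Same values as _tm_months in mktime.awk, built with split()
# only to keep this sketch short.
BEGIN {
    n = split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
    for (m = 1; m <= n; m++) {
        _tm_months[0, m] = len[m]
        _tm_months[1, m] = (m == 2) ? 29 : len[m]
    }
}

# total_days --- hypothetical helper: sum one row of the table
function total_days(leap,    m, sum)
{
    for (m = 1; m <= 12; m++)
        sum += _tm_months[leap, m]
    return sum
}

# Second BEGIN rule: the "user program"; the table is already set up.
BEGIN {
    printf "regular year: %d days\n", total_days(0)
    printf "leap year:    %d days\n", total_days(1)
}
```

The two rules run in the order in which they appear, so the second one can rely on the table; it should report 365 and 366 days.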
The next function is a simple one that computes whether a given year is a leap year. A year is a leap year if it is evenly divisible by four but not by 100, or if it is evenly divisible by 400. Thus, 1904 and 2000 are leap years, but 1900 is not.
    # decide if a year is a leap year
    function _tm_isleap(year,    ret)
    {
        ret = (year % 4 == 0 && year % 100 != 0) ||
                (year % 400 == 0)

        return ret
    }
This function is only used a few times in this file, and its computation could have been written in-line (at the point where it's used). Making it a separate function made the original development easier, and also avoids the possibility of typing errors when duplicating the code in multiple places.
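To see the rule in action, the sketch below repeats the function (so that it runs on its own) and applies it to a few sample years. The list of years is chosen here purely for illustration.

```awk
# decide if a year is a leap year (same test as in mktime.awk)
function _tm_isleap(year,    ret)
{
    ret = (year % 4 == 0 && year % 100 != 0) ||
            (year % 400 == 0)

    return ret
}

BEGIN {
    n = split("1900 1904 1993 2000", years, " ")
    for (k = 1; k <= n; k++)
        printf "%d: %s\n", years[k],
               (_tm_isleap(years[k]) ? "leap" : "not leap")
}
```

Of the four years, only 1904 and 2000 should be reported as leap years.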
The next function is more interesting. It does most of the work of
generating a timestamp, which is converting a date and time into some number
of seconds since the Epoch. The caller passes an array (rather
imaginatively named a) containing six values: the year including century,
the month as a number between one and 12, the day of the month, the hour as
a number between zero and 23, the minute in the hour, and the seconds
within the minute.
The function uses several local variables to precompute the number of seconds in an hour, seconds in a day, and seconds in a year. Often, similar C code simply writes out the expression in-line, expecting the compiler to do constant folding. E.g., most C compilers would turn `60 * 60' into `3600' at compile time, instead of recomputing it every time at run time. Precomputing these values makes the function more efficient.
    # convert a date into seconds
    function _tm_addup(a,    total, yearsecs, daysecs,
                             hoursecs, i, j)
    {
        hoursecs = 60 * 60
        daysecs = 24 * hoursecs
        yearsecs = 365 * daysecs

        total = (a[1] - 1970) * yearsecs

        # extra day for leap years
        for (i = 1970; i < a[1]; i++)
            if (_tm_isleap(i))
                total += daysecs

        j = _tm_isleap(a[1])
        for (i = 1; i < a[2]; i++)
            total += _tm_months[j, i] * daysecs

        total += (a[3] - 1) * daysecs
        total += a[4] * hoursecs
        total += a[5] * 60
        total += a[6]

        return total
    }
The function starts with a first approximation of all the seconds between midnight, January 1, 1970, and the beginning of the current year. It then goes through all those years, and for every leap year, adds an additional day's worth of seconds.
The variable j holds one if the current year is a leap year, and zero if
it is not.
For every month in the current year prior to the current month, it adds
the number of seconds in the month, using the appropriate entry in the
_tm_months array.
Finally, it adds in the seconds for the number of days prior to the current day, and the number of hours, minutes, and seconds in the current day.
The result is a count of seconds since January 1, 1970. This value is not yet what is needed though. The reason why is described shortly.
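The arithmetic can be followed by hand for a sample date, May 23, 1993, at 15:35:10. The sketch below is a compact restatement of the same steps that counts whole days first; the function names isleap, days_before, and utc_seconds are hypothetical and are not part of mktime.awk.

```awk
function isleap(y)
{
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0)
}

# days_before --- whole days from January 1, 1970 up to the given date
function days_before(year, month, day,    len, y, m, days)
{
    split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
    for (y = 1970; y < year; y++)       # complete years
        days += 365 + isleap(y)
    for (m = 1; m < month; m++)         # complete months this year
        days += len[m] + (m == 2 && isleap(year))
    return days + (day - 1)             # complete days this month
}

# utc_seconds --- seconds since the Epoch, taking the date as UTC
function utc_seconds(year, month, day, hour, min, sec)
{
    return days_before(year, month, day) * 86400 \
           + hour * 3600 + min * 60 + sec
}

BEGIN {
    # 23 years of 365 days + 6 leap days + 120 days (Jan-Apr) + 22 days
    printf "days:    %d\n", days_before(1993, 5, 23)              # 8543
    printf "seconds: %d\n", utc_seconds(1993, 5, 23, 15, 35, 10)  # 738171310
}
```

The six leap days come from 1972, 1976, 1980, 1984, 1988, and 1992. As in _tm_addup, the result treats the date as UTC and ignores the local timezone.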
The main mktime function takes a single character string argument.
This string is a representation of a date and time in a "canonical"
(fixed) form. This string should be "year month day hour minute second".
    # mktime --- convert a date into seconds,
    #            compensate for time zone

    function mktime(str,    res1, res2, a, b, i, j, t, diff)
    {
        i = split(str, a, " ")    # don't rely on FS

        if (i != 6)
            return -1

        # force numeric
        for (j in a)
            a[j] += 0

        # validate
        if (a[1] < 1970 ||
            a[2] < 1 || a[2] > 12 ||
            a[3] < 1 || a[3] > 31 ||
            a[4] < 0 || a[4] > 23 ||
            a[5] < 0 || a[5] > 59 ||
            a[6] < 0 || a[6] > 60 )
                return -1

        res1 = _tm_addup(a)
        t = strftime("%Y %m %d %H %M %S", res1)

        if (_tm_debug)
            printf("(%s) -> (%s)\n", str, t) > "/dev/stderr"

        split(t, b, " ")
        res2 = _tm_addup(b)

        diff = res1 - res2

        if (_tm_debug)
            printf("diff = %d seconds\n", diff) > "/dev/stderr"

        res1 += diff

        return res1
    }
The function first splits the string into an array, using spaces and tabs as separators. If there are not six elements in the array, it returns an error, signaled as the value -1. Next, it forces each element of the array to be numeric, by adding zero to it. The following `if' statement then makes sure that each element is within an allowable range. (This checking could be extended further, e.g., to make sure that the day of the month is within the correct range for the particular month supplied.) All of this is essentially preliminary set-up and error checking.
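One way the range checking might be extended, as suggested above, is to test the day against the length of the particular month. The following is only a sketch: valid_day and isleap are hypothetical helpers, not part of mktime.awk, and the arguments are assumed to be whole numbers.

```awk
function isleap(y)
{
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0)
}

# valid_day --- return 1 if the day exists in that month and year
function valid_day(year, month, day,    len, last)
{
    if (month < 1 || month > 12)
        return 0

    split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
    last = len[month] + 0
    if (month == 2 && isleap(year))
        last = 29

    return (day >= 1 && day <= last)
}

BEGIN {
    print valid_day(1993, 2, 29)    # 0: 1993 is not a leap year
    print valid_day(1992, 2, 29)    # 1: 1992 is a leap year
    print valid_day(1993, 4, 31)    # 0: April has 30 days
    print valid_day(1993, 5, 31)    # 1
}
```

A check like this could be called right after the existing range test, returning -1 when it fails.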
Recall that _tm_addup generated a value in seconds since midnight,
January 1, 1970. This value is not directly usable as the result we want,
since the calculation does not account for the local timezone. In other
words, the value represents the count in seconds since the Epoch, but only
for UTC (Coordinated Universal Time). If the local timezone is east or west
of UTC, then some number of hours should be either added to, or subtracted
from, the resulting timestamp.
For example, Atlanta, Georgia (USA), is normally five hours west of
(behind) UTC. It is only four hours behind UTC if daylight saving time is
in effect.
If you are calling mktime in Atlanta, with the argument
"1993 5 23 18 23 12", the result from _tm_addup will be for 6:23 p.m. UTC,
which is only 2:23 p.m. in Atlanta. It is necessary to add another four
hours' worth of seconds to the result.
How can mktime determine how far away it is from UTC? This is
surprisingly easy. The returned timestamp represents the time passed to
mktime as UTC. This timestamp can be fed back to strftime, which will
format it as a local time; i.e., as if it already had the UTC difference
added in to it. This is done by giving "%Y %m %d %H %M %S" to strftime as
the format argument. It returns the computed timestamp in the original
string format. The result represents a time that accounts for the UTC
difference. When the new time is converted back to a timestamp, the
difference between the two timestamps is the difference (in seconds)
between the local timezone and UTC. This difference is then added back
to the original result. An example demonstrating this is presented below.
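The compensation step can be isolated in a few lines. The sketch below needs gawk for strftime; the names isleap, as_utc, and local_to_epoch are hypothetical, and they deliberately avoid the name mktime, which newer versions of gawk provide as a built-in function. It does no validation and assumes the UTC difference does not change between the two timestamps involved.

```awk
function isleap(y)
{
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0)
}

# as_utc --- seconds since the Epoch for the six fields in a[],
#            taking them as UTC (the job _tm_addup does)
function as_utc(a,    len, y, m, days)
{
    split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
    for (y = 1970; y < a[1]; y++)
        days += 365 + isleap(y)
    for (m = 1; m < a[2]; m++)
        days += len[m] + (m == 2 && isleap(a[1]))
    days += a[3] - 1
    return days * 86400 + a[4] * 3600 + a[5] * 60 + a[6]
}

# local_to_epoch --- treat str as local time and return its timestamp
function local_to_epoch(str,    a, b, guess, shown)
{
    split(str, a, " ")
    guess = as_utc(a)               # pretend the input was UTC

    # strftime() renders the guess as local time; reading that
    # rendering back as UTC exposes the UTC difference.
    split(strftime("%Y %m %d %H %M %S", guess), b, " ")
    shown = as_utc(b)

    return guess + (guess - shown)  # add the difference back
}

BEGIN {
    ts = local_to_epoch("1993 5 23 15 35 10")
    print "timestamp: ", ts
    print "round trip:", strftime("%Y %m %d %H %M %S", ts)
}
```

The timestamp printed depends on the local timezone, but the round trip line should show the entered date and time again.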
Finally, there is a "main" program for testing the function.
    BEGIN  {
        if (_tm_test) {
            printf "Enter date as yyyy mm dd hh mm ss: "
            getline _tm_test_date

            t = mktime(_tm_test_date)
            r = strftime("%Y %m %d %H %M %S", t)
            printf "Got back (%s)\n", r
        }
    }
The entire program uses two variables that can be set on the command
line to control debugging output and to enable the test in the final
BEGIN rule. Here is the result of a test run. (Note that debugging
output is to standard error, and test output is to standard output.)
    $ gawk -f mktime.awk -v _tm_test=1 -v _tm_debug=1
    -| Enter date as yyyy mm dd hh mm ss: 1993 5 23 15 35 10
    error--> (1993 5 23 15 35 10) -> (1993 05 23 11 35 10)
    error--> diff = 14400 seconds
    -| Got back (1993 05 23 15 35 10)
The time entered was 3:35 p.m. (15:35 on a 24-hour clock), on May 23, 1993. The first line of debugging output shows the entered time taken as UTC and then formatted as local time, which is four hours behind UTC. The second line shows that the difference is 14400 seconds, which is four hours. (The difference is only four hours, since daylight saving time is in effect during May.) The final line of test output shows that the timezone compensation algorithm works; the returned time is the same as the entered time.
This program does not solve the general problem of turning an arbitrary date
representation into a timestamp. That problem is very involved. However,
the mktime function provides a foundation upon which to build. Other
software can convert month names into numeric months, and AM/PM times into
24-hour clocks, to generate the "canonical" format that mktime requires.
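As one example of such a front end, the sketch below turns a date written like "May 23 1993 3:35:10 PM" into the canonical form. The helper to_canonical is hypothetical, and the input layout is an assumption made here: a capitalized English month name, the day, the year, a time with seconds, and AM or PM. Anything else yields an empty string.

```awk
# to_canonical --- convert "Mon dd yyyy hh:mm:ss AM|PM" into
#                  "year month day hour minute second"
function to_canonical(str,    f, t, pos, month, hour, ampm)
{
    if (split(str, f, " ") != 5 || split(f[4], t, ":") != 3)
        return ""
    if (length(f[1]) < 3)
        return ""

    # Find the three-letter abbreviation; it must start at 1, 4, 7, ...
    pos = index("JanFebMarAprMayJunJulAugSepOctNovDec", substr(f[1], 1, 3))
    if (pos == 0 || (pos - 1) % 3 != 0)
        return ""
    month = (pos + 2) / 3

    ampm = toupper(f[5])
    if (ampm != "AM" && ampm != "PM")
        return ""
    hour = t[1] % 12            # 12 AM becomes 0; 12 PM becomes 0 + 12
    if (ampm == "PM")
        hour += 12

    return (f[3] + 0) " " month " " (f[2] + 0) " " hour " " \
           (t[2] + 0) " " (t[3] + 0)
}

BEGIN {
    print to_canonical("May 23 1993 3:35:10 PM")    # 1993 5 23 15 35 10
    print to_canonical("Dec 31 1999 12:00:00 AM")   # 1999 12 31 0 0 0
}
```

The resulting string can then be handed to mktime, which still performs its own range checks.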