The GNU gettext toolset helps programmers and translators produce, update and use translation files, mainly PO files, which are textual, editable files. This chapter concentrates on the format of PO files and contains a PO mode starter. The description of PO mode is spread over this manual rather than concentrated in one place; this chapter presents only the basics of PO mode.
gettext Installation
Once you have received, unpacked, configured and compiled the GNU gettext distribution, the `make install' command puts in place the programs xgettext, msgfmt, gettext, and tupdate, as well as their available message catalogs. To complete a comfortable installation, you might also want to make PO mode available to your GNU Emacs users.
To finish the installation of PO mode, you might want to modify your file `.emacs', once and for all, so it contains a few lines looking like:
(setq auto-mode-alist
      (cons '("\\.pox?\\'" . po-mode) auto-mode-alist))
(autoload 'po-mode "po-mode")
Later, whenever you edit some `.po' or `.pox' file, Emacs loads `po-mode.elc' (or `po-mode.el') as needed, and automatically activates PO mode commands for the associated buffer. The string PO appears in the mode line for any buffer for which PO mode is active. Many PO files may be active at once in a single Emacs session.
A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:
white-space
#  translator-comments
#. automatic-comments
#: reference...
msgid untranslated-string
msgstr translated-string
The general structure of a PO file should be well understood by the translator. When using PO mode, very little has to be known about the format details, as PO mode takes care of them for her.
Entries begin with some optional white space. Usually, when generated through GNU gettext tools, there is exactly one blank line between entries. Then comments follow, on lines all starting with the character #. There are two kinds of comments: those which have some white space immediately following the #, which are created and maintained exclusively by the translator, and those which have some non-white character just after the #, which are created and maintained automatically by GNU gettext tools. All comments, of any kind, are optional.
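The distinction between the two kinds of comments depends only on the character following the #, so it can be sketched in a few lines of Python. This is only an illustration: the function name comment_kind is hypothetical, and treating a bare # with nothing after it as a translator comment is an assumption, since the text above does not cover that case.

```python
def comment_kind(line):
    """Classify one PO file line by the character that follows '#'.

    Returns "translator", "automatic", or None when the line is not a
    comment at all.
    """
    if not line.startswith("#"):
        return None
    rest = line[1:]
    # Assumption: a bare "#" is treated like a translator comment.
    if rest == "" or rest[0].isspace():
        return "translator"
    return "automatic"
```

For instance, a line such as `# check this wording` would be classified as a translator comment, while `#. some note` and `#: reference` lines would both count as automatic.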
After white space and comments, entries show two strings, giving first the untranslated string as it appears in the original program sources, and then the translation of this string. The original string is introduced by the keyword msgid, and the translation by msgstr. The two strings, untranslated and translated, are quoted in various ways in the PO file, using " delimiters and \ escapes, but the translator does not really have to pay attention to the precise quoting format, as PO mode fully intends to take care of quoting for her.
The msgid strings, as well as automatic comments, are produced and managed by other GNU gettext tools, and PO mode does not provide means for the translator to alter these. The most she can do is delete them, and only by deleting the whole entry. On the other hand, the msgstr string, as well as translator comments, are really meant for the translator, and PO mode gives her the full control she needs.
It happens that some lines, usually whitespace or comments, follow the very last entry of a PO file. Such lines are not part of any entry, and PO mode is unable to take action on those lines. By using the PO mode function M-x po-normalize, the translator may get rid of those spurious lines. See section Normalizing Strings in Entries.
The remainder of this section may be safely skipped by those using PO mode, yet it may be interesting for everybody to have a better idea of the precise format of a PO file. On the other hand, those not having GNU Emacs handy should continue reading carefully.
Each of untranslated-string and translated-string respects the C syntax for a character string, including the surrounding quotes and embedded backslashed escape sequences. When the time comes to write multi-line strings, one should not use escaped newlines. Instead, a closing quote should follow the last character on the line to be continued, and an opening quote should resume the string at the beginning of the following PO file line. For example:
msgid ""
"Here is an example of how one might continue a very long string\n"
"for the common case the string represents multi-line output.\n"
In this example, the empty string is used on the first line, to allow better alignment of the H from the word `Here' over the f from the word `for'. The msgid keyword is followed by three strings, which are meant to be concatenated. Concatenating the empty string does not change the resulting overall string, but it is a way for us to comply with the requirement that msgid be followed by a string on the same line, while keeping the multi-line presentation left-justified, as we find this to be a cleaner disposition. The empty string could have been omitted, but only if the string starting with `Here' was promoted to the first line, right after msgid. It was not really necessary either to switch between the two last quoted strings immediately after the newline `\n'; the switch could have occurred after any other character. We just did it this way because it is neater.
One should carefully distinguish between end of lines marked as `\n' inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no effect on the represented string.
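To make the concatenation rule concrete, here is a minimal Python sketch that rebuilds the represented string from the PO file lines of one field. The names unquote and read_field are hypothetical, and only a few escape sequences (`\n', `\t', `\"' and `\\') are decoded, which is a simplification of the full C string syntax.

```python
import re

# Assumption: only these escapes are decoded; real C syntax has more.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(piece):
    """Decode one double-quoted piece taken from a PO file line."""
    body = piece.strip()
    if len(body) < 2 or body[0] != '"' or body[-1] != '"':
        raise ValueError("not a quoted string: %r" % piece)
    return re.sub(r"\\(.)",
                  lambda m: _ESCAPES.get(m.group(1), m.group(1)),
                  body[1:-1])


def read_field(lines):
    """Return (keyword, string) for the lines of one msgid or msgstr field.

    The first line carries the keyword and the first quoted piece; any
    following lines carry further pieces, which are simply concatenated.
    Line ends of the PO file itself contribute nothing to the result.
    """
    keyword, _, first = lines[0].partition(" ")
    return keyword, "".join(unquote(p) for p in [first, *lines[1:]])
```

Applied to the three lines of the example above, read_field would yield the keyword msgid and a single string made of two lines of text, each ending with a newline character coming from a `\n' escape.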
Outside strings, white lines and comments may be used freely. Comments start at the beginning of a line with `#' and extend until the end of the PO file line. Comments written by translators should have the initial `#' immediately followed by some white space. If the `#' is not immediately followed by white space, this comment is most likely generated and managed by specialized GNU tools, and might disappear or be replaced unexpectedly when the PO file is given to tupdate.
When Emacs finds a PO file in a window, PO mode is activated for that window. This makes the window read-only and establishes a po-mode-map. PO mode is a genuine Emacs mode, in the sense that it is not derived from text mode in any way.
The main PO commands are those which do not fit in the other categories described in subsequent sections. They allow for quitting PO mode or managing windows in special ways.
The command u (po-undo) interfaces to the GNU Emacs undo facility. See section `Undoing Changes' in The Emacs Editor. Each time u is typed, modifications the translator made to the PO file are undone a little more. For the purpose of undoing, each PO mode command is atomic. This is especially true for the RET command: all the editing done through a single use of this command is undone at once, even if the editing itself involved several actions. However, while in the editing window, one can undo the editing work quite parsimoniously.
The command q (po-quit) is used when the translator is done with the PO file. If the file has been modified, it is saved on disk first. However, prior to all this, the command checks whether some untranslated message remains in the PO file and, if so, the translator is asked whether she really wants to stop working with this PO file. This is the preferred way of getting rid of an Emacs PO file buffer. Merely killing it through the usual command C-x k (kill-buffer), say, has the unpleasant effect of leaving a PO internal work buffer behind.
The command o (po-other-window) is another, softer way to leave PO mode, temporarily. It just moves the cursor to some other Emacs window, and pops one up if necessary. For example, if the translator just got PO mode to show some source context in some other window, she might discover some apparent bug in the program source that needs correction. This command allows the translator to change roles, become a programmer, and have the cursor right in the window containing the program she wants to modify.
By later getting the cursor back in the PO file window, or by
asking Emacs to edit this file once again, PO mode is then recovered.
The command h (po-help) displays a summary of all available PO mode commands. The translator should then type any character to resume normal PO mode operations. The command ? has the same effect as h.
The command = (po-statistics) computes the total number of entries in the PO file, the ordinal of the current entry (counted from 1), the number of untranslated entries, the number of obsolete entries, and displays all these numbers.
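The four numbers reported by this command can be pictured with a small Python sketch over an in-memory list of entries. Everything here is illustrative: the Entry class, its obsolete flag and the statistics function are hypothetical names, and counting an entry as untranslated when its msgstr is empty and it is not obsolete is an assumption rather than a description of what PO mode does internally.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    msgid: str
    msgstr: str
    obsolete: bool = False  # assumption: obsolescence is known per entry


def statistics(entries, current_index):
    """Return the numbers shown by a po-statistics style summary."""
    return {
        "total": len(entries),
        # The ordinal of the current entry is counted from 1.
        "current": current_index + 1,
        "untranslated": sum(1 for e in entries
                            if not e.obsolete and e.msgstr == ""),
        "obsolete": sum(1 for e in entries if e.obsolete),
    }
```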
The command v (po-validate) launches msgfmt in verbose mode over the current PO file. This command first offers to save the current PO file on disk. The msgfmt tool, from GNU gettext, has the purpose of creating an MO file out of a PO file, and PO mode uses the features of this program for checking the overall format of a PO file, as well as all individual entries.
The program msgfmt runs asynchronously with Emacs, so the translator regains control immediately while her PO file is being studied. Error output is collected in the GNU Emacs `*compilation*' buffer, displayed in another window. The regular GNU Emacs command C-x ` (next-error), as well as other usual compile commands, allow the translator to reposition quickly to the offending parts of the PO file. Once the cursor is on the line in error, the translator may choose any PO mode action which would help correct the error.
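Outside Emacs, a similar check can be approximated by starting msgfmt in verbose mode as a background process and reading its diagnostics later. The Python sketch below is an assumption about how one might do this, not a transcription of what PO mode runs: the function names are hypothetical, and sending the compiled catalog to /dev/null (so that only the diagnostics matter) is a choice made for this illustration.

```python
import subprocess


def validate_command(po_path):
    """Build an msgfmt invocation that checks a PO file verbosely.

    Assumption: the MO output is discarded by writing it to /dev/null.
    """
    return ["msgfmt", "--verbose", "--output-file=/dev/null", po_path]


def start_validation(po_path):
    """Start msgfmt without waiting for it, much as PO mode runs it
    asynchronously; the caller collects diagnostics afterwards with
    process.communicate()."""
    return subprocess.Popen(validate_command(po_path),
                            stderr=subprocess.PIPE, text=True)
```

A caller would typically keep working after start_validation returns and only later call communicate() on the returned process to obtain the error output.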
The cursor in a PO file window is almost always part of an entry. The only exceptions are the special case when the cursor is after the last entry in the file, or when the PO file is empty. The entry where the cursor is found is said to be the current entry. Many PO mode commands operate on the current entry, so moving the cursor does more than allow the translator to browse the PO file: it also selects the entry on which commands operate.
Some PO mode commands alter the position of the cursor in a specialized way. A few of those special purpose positioning commands are described here; the others are described in following sections.
Any GNU Emacs command able to reposition the cursor may be used to select the current entry in PO mode, including commands which move by characters, lines, paragraphs, screens or pages, and search commands. However, there is a kind of standard way to display the current entry in PO mode, which usual GNU Emacs commands moving the cursor do not especially try to enforce. The command . (po-current-entry) has the sole purpose of redisplaying the current entry properly, after the current entry has been changed by means external to PO mode, or the Emacs screen otherwise altered.
It is yet to be decided whether PO mode would help the translator, or otherwise irritate her, by forcing a more fixed window disposition while she is doing her work. We originally had quite precise ideas about how windows should behave, but on the other hand, anyone used to GNU Emacs is often happy to keep full control. Maybe a fixed window disposition might be offered as a PO mode option that the translator might activate or deactivate at will, so it could be offered on an experimental basis. If nobody feels a real need for using it, or a compulsion for writing it, we might as well drop this whole idea. The incentive for doing it should come from translators rather than programmers, as opinions from an experienced translator are surely worth more to me than opinions from programmers thinking about how others should do translation.
The commands n (po-next-entry) and p (po-previous-entry) move the cursor to the entry following, or preceding, the current one. If n is given while the cursor is on the last entry of the PO file, or if p is given while the cursor is on the first entry, no move is done. SPC and DEL are alternate keys for n and p, respectively.
The commands < (po-first-entry) and > (po-last-entry) move the cursor to the first entry, or last entry, of the PO file. When the cursor is located past the last entry in a PO file, most PO mode commands will return an error saying `After last entry'. However, the commands < and > have the special property of being able to work even when the cursor is not inside some PO file entry, and you may use them for nicely correcting this situation. But even these commands will fail on a truly empty PO file. There are development plans for PO mode to interactively fill an empty PO file from sources. See section Marking Translatable Strings.
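The behavior of these four movement commands at the edges of the file can be summarized with a short Python sketch working on entry indices counted from 0. The function names and the use of LookupError for an empty file are hypothetical choices made for the illustration; PO mode itself works on buffer positions rather than indices.

```python
def next_entry(index, count):
    """Like n: move forward, but stay put on the last entry."""
    return min(index + 1, count - 1)


def previous_entry(index):
    """Like p: move backward, but stay put on the first entry."""
    return max(index - 1, 0)


def first_entry(count):
    """Like <: works from anywhere, but fails on an empty file."""
    if count == 0:
        raise LookupError("PO file has no entries")
    return 0


def last_entry(count):
    """Like >: works from anywhere, but fails on an empty file."""
    if count == 0:
        raise LookupError("PO file has no entries")
    return count - 1
```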
The translator may decide, before working on the translation of a particular entry, that she needs to browse the remainder of the PO file, maybe to find the terminology or phraseology used in related entries. She can of course use the standard Emacs idioms for saving the current cursor location in some register and using that register to get back, or else use the location ring.
PO mode offers another approach, by which cursor locations may be saved onto a special stack. The command m (po-push-location) merely adds the location of the current entry to the stack, pushing the already saved locations under the new one. The command l (po-pop-location) consumes the top stack element and repositions the cursor to the entry associated with that top element. This position is then lost, for the next l will move the cursor to the previously saved location, and so on until no locations remain on the stack.
If the translator wants the position to be kept on the location stack, maybe for taking a mere look at the entry associated with the top element, then go elsewhere with the intent of getting back later, she ought to use m immediately after l.
The command x (po-exchange-location) simultaneously repositions the cursor to the entry associated with the top element of the stack of saved locations, and replaces that top element with the location of the current entry before the move. Consequently, repeating the x command toggles alternately between two entries. To achieve this, the translator will position the cursor on the first entry, use m, then position to the second entry, and merely use x for making the switch.
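The three location commands amount to ordinary stack operations, which the following Python sketch tries to make explicit. The class LocationStack and its method names are hypothetical, locations are represented by plain entry numbers, and raising IndexError on an empty stack is an assumption about error handling.

```python
class LocationStack:
    """A stack of saved entry locations, in the spirit of m, l and x."""

    def __init__(self):
        self._saved = []

    def push(self, current):
        """Like m: save the location of the current entry."""
        self._saved.append(current)

    def pop(self):
        """Like l: consume the top location and return it as the new
        cursor location."""
        if not self._saved:
            raise IndexError("location stack is empty")
        return self._saved.pop()

    def exchange(self, current):
        """Like x: return the top location as the new cursor location
        and put the location held before the move in its place."""
        if not self._saved:
            raise IndexError("location stack is empty")
        target = self._saved[-1]
        self._saved[-1] = current
        return target
```

With one location saved, calling exchange repeatedly with the cursor location of the moment alternates between the same two entries, which is the toggling effect described above. Calling push right after pop keeps the popped location on the stack.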
There are many different ways of encoding a particular string into a PO file entry, because there are so many different ways to split and quote multi-line strings, and even to represent special characters by backslashed escape sequences. Some features of PO mode rely on the ability of PO mode to scan an already existing PO file for a particular string encoded into the msgid field of some entry. Even if PO mode has internally all the built-in machinery for implementing this recognition easily, doing it fast is technically difficult. To facilitate a solution to this efficiency problem, we decided on a canonical representation for strings.
A conventional representation of strings in a PO file is currently under discussion, and PO mode experiments with a canonical representation. Having both xgettext and PO mode converging towards a uniform way of representing equivalent strings would be useful, as the internal normalization needed by PO mode could be automatically satisfied when using xgettext from GNU gettext. An explicit PO mode normalization should then be necessary only for PO files imported from elsewhere, or when the convention itself evolves.
So, for achieving normalization of at least the strings of a given PO file needing a canonical representation, the following PO mode command is available:
The special command M-x po-normalize, which has no associated keys, revises all entries, ensuring that strings of both original and translated entries use uniform internal quoting in the PO file. It also removes any crumb after the last entry. This command may be useful for PO files freshly imported from elsewhere, or if we ever improve on the canonical quoting format we use. This canonical format is not only meant for getting cleaner PO files, but also for greatly speeding up msgid string lookup for some other PO mode commands.
M-x po-normalize presently makes three passes over the entries. The first implements heuristics for converting PO files for GNU gettext 0.6 and earlier, in which msgid and msgstr fields were using K&R style C string syntax for multi-line strings. These heuristics may fail for comments not related to obsolete entries and ending with a backslash; they also depend on subsequent passes for finalizing the proper commenting of continued lines for obsolete entries. This first pass might disappear once all oldish PO files have been adjusted. The second and third passes normalize all msgid and msgstr strings respectively. They also clean out those trailing backslashes used by XView's msgfmt for continued lines.
Having such an explicit normalizing command allows for importing PO files from other sources, but also eases the evolution of the current convention, an evolution driven mostly by aesthetic concerns as of now. It is quite easy to make suggested adjustments at a later time, as the normalizing command and, eventually, other GNU gettext tools should greatly automate conformance. A description of the canonical string format is given below, for the particular benefit of those not having GNU Emacs handy, and who would nevertheless want to handcraft their PO files in nice ways.
Right now, in PO mode, strings are single line or multi-line. A string goes multi-line if and only if it has embedded newlines, that is, if it matches `[^\n]\n+[^\n]'. So, we would have:
msgstr "\n\nHello, world!\n\n\n"
but, replacing the space by a newline, this becomes:
msgstr ""
"\n"
"\n"
"Hello,\n"
"world!\n"
"\n"
"\n"
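The rule stated above can be tried out with a minimal Python sketch that writes one field either on a single line or in multi-line form. The names escape and format_field are hypothetical, only backslash, double quote and newline are escaped, and the sketch follows the rule as worded here rather than the actual PO mode implementation.

```python
import re


def escape(text):
    """Escape a piece for use between double quotes (simplified)."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def format_field(keyword, value):
    """Format one msgid or msgstr field.

    The string goes multi-line only if a newline run sits between two
    non-newline characters; each piece then ends after one newline.
    """
    if not re.search(r"[^\n]\n+[^\n]", value):
        return '%s "%s"' % (keyword, escape(value))
    pieces = re.findall(r"[^\n]*\n|[^\n]+", value)
    lines = ['%s ""' % keyword]
    lines.extend('"%s"' % escape(piece) for piece in pieces)
    return "\n".join(lines)
```

Given the two strings discussed above, the first stays on one line because its newlines are all leading or trailing, while the second is written as an empty string followed by six pieces.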
We are deliberately using a caricatural example, here, to make the point clearer. Usually, multi-lines are not that bad looking. It is probable that we will implement the following suggestion. We might lump together all initial newlines into the empty string, and also all newlines introducing empty lines (that is, for n > 1, the n-1'th last newlines would go together on a separate string), so making the previous example appear:
msgstr "\n\n"
"Hello,\n"
"world!\n"
"\n\n"
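Since this lumping is only a suggestion at this point, the following Python sketch should be read as one possible interpretation of it, not as settled behavior. The names escape and format_field_lumped are hypothetical: initial newlines are placed in the first string (which stays empty when there are none), and each run of newlines introducing empty lines becomes a separate string.

```python
import re


def escape(text):
    """Escape a piece for use between double quotes (simplified)."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def format_field_lumped(keyword, value):
    """Format one field using the suggested lumping of newlines."""
    if not re.search(r"[^\n]\n+[^\n]", value):
        return '%s "%s"' % (keyword, escape(value))
    # Initial newlines share the first line with the keyword.
    lead = re.match(r"\n*", value).group()
    rest = value[len(lead):]
    # A text line keeps its own newline; extra newlines are grouped.
    pieces = re.findall(r"[^\n]+\n?|\n+", rest)
    lines = ['%s "%s"' % (keyword, escape(lead))]
    lines.extend('"%s"' % escape(piece) for piece in pieces)
    return "\n".join(lines)
```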
There are a few yet undecided little points about string normalization, to be documented in this manual, once these questions settle.