

Copyright (C) 1995 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.

Introduction

This manual is still in DRAFT state. Some sections are still empty, or almost. We keep merging material from other sources (essentially email folders) while the proper integration of this material is delayed.

In this manual, we use he when speaking of the programmer or maintainer, she when speaking of the translator, and they when speaking of the installers or end users of the translated program. This is only a convenience for clarifying the documentation. It is absolutely not meant to imply that some roles are more appropriate to males or females. Besides, as you might guess, GNU gettext is meant to be useful for people using computers, whatever their sex, race, religion or nationality!

This chapter explains the goals sought by the mere existence of GNU gettext. Then, it explains a few wide concepts around Native Language Support, and situates message translation with regard to other aspects of national and cultural variance, as applicable to programs. It also surveys the files used to convey translations. It explains how the various tools interrelate in the initial generation of these files, and later, how the maintenance cycle usually operates.

The Purpose of GNU gettext

Usually, programs are written and documented in English, and use English at execution time for interacting with users. This is true not only within GNU, but also in a great deal of commercial and free software. Using a common language is quite handy for communication between developers, maintainers and users from all countries. On the other hand, most people are less comfortable with English than with their own native language, and would rather use their mother tongue for day-to-day work, as far as possible. Many would simply love seeing their computer screen showing a lot less English, and far more of their own spoken language.

However, to some people, this dream might appear so far-fetched that they may believe it is not even worth spending time thinking about it, and they have no confidence at all that the dream might ever come true. Many have not lost hope yet, and have organized themselves. The GNU Translation Project is a formalization of this hope into a workable structure, which has a good chance of getting all of us nearer to the achievement of a truly multi-lingual set of programs.

GNU gettext is an important step for the GNU Translation Project, as it is an asset on which we may build many other steps. This package offers programmers, translators and even users a well integrated set of tools and documentation. Specifically, the GNU gettext utilities are a set of tools that provides a framework to help other GNU packages produce multi-lingual messages. These tools include a set of conventions about how programs should be written to support message catalogs, a directory and file naming organization for the message catalogs themselves, a runtime library supporting the retrieval of translated messages, and a few stand-alone programs to massage in various ways the sets of translatable strings, or already translated strings. A special GNU Emacs mode also helps interested parties in preparing these sets, or bringing them up to date.

GNU gettext is designed to minimize the impact of internationalization on program sources, keeping this impact as small and hardly noticeable as possible. Internationalization has a better chance of succeeding if it is very lightweight, or at least appears to be so, when looking at program sources.

The GNU Translation Project also uses the GNU gettext distribution as a vehicle for documenting its structure and methods, even if this goes beyond the technicalities of GNU gettext proper. By doing so, translators will find in a single place, as far as possible, all they need to know for properly doing their translating work. This supplementary documentation might also help programmers, and even curious users, understand how GNU gettext is related to the remainder of the GNU Translation Project, and consequently, have a glimpse of the big picture.

I18n, L10n, and Such

Two long words appear all the time when we discuss support of native language in programs, and these words have a precise meaning, worth being explained here, once and for all in this document. The words are internationalization and localization. Many people, tired of writing these long words over and over again, have taken to writing i18n and l10n instead, keeping the first and last letter of each word, and replacing the run of intermediate letters by a number merely telling how many such letters there are. But in this manual, for the sake of clarity, we will patiently write the names in full, each time...
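The abbreviation rule just described is mechanical enough to be written down. The following is only a small sketch of it; the function name numeronym and its buffer-based interface are ours, chosen for illustration, and are not part of GNU gettext.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Abbreviate WORD as: first letter, number of letters in between,
   last letter.  The result is stored in OUT (capacity CAP) and OUT is
   returned.  `numeronym' is a hypothetical name used for this sketch. */
char *
numeronym (const char *word, char *out, size_t cap)
{
  size_t len;

  assert (word != NULL && out != NULL && cap > 0);
  len = strlen (word);
  if (len < 3)
    /* Too short to have any intermediate letters: copy unchanged.  */
    snprintf (out, cap, "%s", word);
  else
    snprintf (out, cap, "%c%lu%c", word[0],
              (unsigned long) (len - 2), word[len - 1]);
  return out;
}
```

Given "internationalization" (20 letters, 18 of them between the first and the last), this yields i18n, and "localization" yields l10n.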

By internationalization, one refers to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. This is a generalization process, by which the programs are untied from using only English strings or other English specific habits, and connected to generic ways of doing the same, instead. Program developers may use various techniques to internationalize their programs, and some of these techniques have been standardized. GNU gettext offers one of these standards. See section The Programmer's View.

By localization, one means the operation by which, in a set of programs already internationalized, one gives the program all needed information so that it can bend itself to handle its input and output in a fashion which is correct for some native language and cultural habits. This is a particularisation process, by which generic methods already implemented in an internationalized program are used in specific ways. The programming environment puts several functions at the programmer's disposal which allow this runtime configuration. The formal description of a specific set of cultural habits for some country, together with all associated translations targeted to the same native language, is called the locale for this language or country. Users achieve localization of programs by setting proper values to special environment variables, prior to executing those programs, identifying which locale should be used.
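On the program side, picking up the locale named by the user's environment usually amounts to a single call into the C library. The sketch below assumes the ISO C setlocale function; the wrapper name adopt_user_locale is ours, and the exact set of environment variables consulted (commonly LC_ALL, the individual LC_ category variables, and LANG) depends on the system.

```c
#include <assert.h>
#include <locale.h>

/* Adopt the locale requested by the environment and return its name.
   If the request cannot be honored, the program stays in its previous
   locale (the "C" locale at startup) and that name is returned.
   `adopt_user_locale' is a hypothetical helper for this sketch.  */
const char *
adopt_user_locale (void)
{
  /* An empty string asks the C library to consult the environment.  */
  const char *name = setlocale (LC_ALL, "");

  if (name == NULL)
    /* A null locale argument only queries the current setting.  */
    name = setlocale (LC_ALL, NULL);
  assert (name != NULL);
  return name;
}
```

A program would typically call such a function once, very early in main, before producing any output.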

In fact, locale message support is only one component of the cultural data that makes up a particular locale. There is a whole host of routines and functions provided to aid programmers in developing internationalized software, which allow them to access the data stored in a particular locale. When someone presently refers to a particular locale, they are referring to the data stored within that particular locale. Similarly, if a programmer is referring to "accessing the locale routines", they are referring to the complete suite of routines that access all of the locale's information.

One uses the expression Native Language Support, or merely NLS, for speaking of the overall activity or feature encompassing both internationalization and localization, allowing for multi-lingual interactions in a program. In a nutshell, one could say that internationalization is the operation by which further localizations are made possible.

Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.

Aspects in Native Language Support

For a totally multi-lingual distribution, there are many things to translate beyond output messages.

As we already stressed, translation is only one aspect of locales. Other internationalization aspects are not currently handled by GNU gettext, but perhaps may be handled in future versions. There are many attributes that are needed to define a country's cultural conventions. These attributes include, besides the country's native language, the formatting of the date and time, the representation of numbers, the symbols for currency, etc. These local rules are termed the country's locale. The locale represents the knowledge needed to support the country's native attributes.

There are a few major areas which may vary between countries and hence, define what a locale must describe. The following list helps put multi-lingual messages into the proper context of other tasks related to locales, and also presents some other areas which GNU gettext might eventually tackle, maybe, one of these days.

Characters and Codesets
The codeset most commonly used throughout the USA and most English speaking parts of the world is the ASCII codeset. However, there are many characters needed by various locales that are not found within this codeset. The 8-bit ISO 8859-1 codeset has most of the special characters needed to handle the major European languages. However, in many cases, the ISO 8859-1 font is not adequate. Hence each locale will need to specify which codeset it needs to use and will need to have the appropriate character handling routines to cope with the codeset.
Currency
The symbols used vary from country to country, as does the position of the symbol. Software needs to be able to transparently display currency figures in the native mode for each locale.
Dates
The format of dates varies between locales. For example, Christmas day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia. Other countries might use ISO 8601 dates, etc. Time of the day may be noted as hh:mm, hh.mm, or otherwise. Some locales require time to be specified in 24-hour mode rather than as AM or PM. Further, the nature and yearly extent of the Daylight Saving correction vary widely between countries.
Numbers
Numbers can be represented differently in different locales. For example, the following numbers are all written correctly for their respective locales:
12,345.67       English
12.345,67       French
1,2345.67       Asia
Some programs could go further and use different unit systems, like English units or Metric units, or even take into account variants about how numbers are spelled in full.
Messages
The most obvious area is the language support within a locale. This is where GNU gettext provides a means for developers and users to easily change the language that the software uses to communicate with the user.
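The date and number conventions listed above are not handled by GNU gettext, but the standard C library already exposes some of them. As an illustration only, here is a sketch using the ISO C functions strftime and localeconv; the helper names format_local_date and local_decimal_point are ours.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format a calendar date following the date convention of the current
   locale.  Returns the number of characters stored in OUT, or 0 if the
   result does not fit.  Hypothetical helper for this sketch.  */
size_t
format_local_date (int year, int month, int day, char *out, size_t cap)
{
  struct tm when;

  assert (out != NULL && cap > 0);
  memset (&when, 0, sizeof when);
  when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
  when.tm_mon = month - 1;      /* and months from 0 */
  when.tm_mday = day;
  /* %x expands to the locale's own date representation.  */
  return strftime (out, cap, "%x", &when);
}

/* Return the decimal separator of the current locale.  */
const char *
local_decimal_point (void)
{
  return localeconv ()->decimal_point;
}
```

In the default "C" locale, the Christmas 1994 example comes out as 12/25/94 and the decimal separator is a period; what other locales produce depends on the locale data installed on the system.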

In the near future, we see no chance that components of locale besides message handling will be made available for use in other GNU packages. The reason for this is that most modern systems provide more or less reasonable support for at least some of the missing components. Another point is that the GNU libc and Linux will get a new and complete implementation of the whole locale functionality, which could be adopted by systems lacking reasonable locale support.

Files Conveying Translations

The letters PO in `.po' files mean Portable Object, to distinguish them from `.mo' files, where MO stands for Machine Object. This paradigm, as well as the PO file format, is inspired by the NLS standard developed by Uniforum, and implemented by Sun in their Solaris system.

PO files are meant to be read and edited by humans, and associate each original, translatable string of a given package with its translation in a particular target language. A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. These PO files are best created by the xgettext program, and later updated or refreshed through the tupdate program. Program xgettext extracts all marked messages from a set of C files and initializes a PO file with empty translations. Program tupdate takes care of adjusting PO files between releases of the corresponding sources, commenting obsolete entries, initializing new ones, and updating all source line references. Files ending with `.pot' are kind of base translation files found in distributions, in PO file format, and `.pox' files are often temporary PO files.
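To make the idea of a PO file initialized with empty translations more concrete, here is a sketch that formats one untranslated entry: a source reference, the original string, and an empty translation. It is not how xgettext is implemented. The function names po_escape and format_po_entry are ours, and the entry layout shown (a `#:' reference line followed by msgid and msgstr) is our reading of the PO format, which is properly described in section The Format of PO Files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Store TEXT in OUT with quotes, backslashes, newlines and tabs written
   as escape sequences.  Returns 0 on success, -1 if OUT is too small.
   Hypothetical helper for this sketch.  */
int
po_escape (const char *text, char *out, size_t cap)
{
  size_t used = 0;

  assert (text != NULL && out != NULL);
  for (; *text != '\0'; text++)
    {
      const char *esc = NULL;

      if (*text == '"')
        esc = "\\\"";
      else if (*text == '\\')
        esc = "\\\\";
      else if (*text == '\n')
        esc = "\\n";
      else if (*text == '\t')
        esc = "\\t";

      if (esc != NULL)
        {
          if (used + 2 >= cap)
            return -1;
          out[used++] = esc[0];
          out[used++] = esc[1];
        }
      else
        {
          if (used + 1 >= cap)
            return -1;
          out[used++] = *text;
        }
    }
  if (used >= cap)
    return -1;
  out[used] = '\0';
  return 0;
}

/* Format one untranslated entry for MSGID, found at FILE:LINE.
   Returns 0 on success, -1 if something does not fit.  */
int
format_po_entry (const char *file, int line, const char *msgid,
                 char *out, size_t cap)
{
  char quoted[512];
  int n;

  if (po_escape (msgid, quoted, sizeof quoted) != 0)
    return -1;
  n = snprintf (out, cap, "#: %s:%d\nmsgid \"%s\"\nmsgstr \"\"\n",
                file, line, quoted);
  return (n < 0 || (size_t) n >= cap) ? -1 : 0;
}
```

A translator later fills in the empty msgstr string; the reference line is what lets tools and PO mode jump back to the place in the C sources where the string is used.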

MO files are meant to be read by programs, and are binary in nature. A few systems already offer tools for creating and handling MO files as part of the Native Language Support coming with the system, but the format of these MO files is often different from system to system, and non-portable. They do not necessarily use `.mo' for file extensions, but since system libraries are also used for accessing these files, it works as long as the system is self-consistent about it. If GNU gettext is able to interface with the tools already provided with systems, it will consequently let these provided tools take care of generating the MO files. Or else, if such tools are not found or do not seem usable, GNU gettext will use its own ways and its own format for MO files. Files ending with `.gmo' are really MO files, when it is known that these files use the GNU format.

Overview of GNU gettext

The following diagram summarizes the relation between the files handled by GNU gettext and the tools acting on these files. It is followed by a somewhat detailed explanation, which you should read while keeping an eye on the diagram. Having a clear understanding of these interrelations would surely help programmers, translators and maintainers.

Original C Sources ---> PO mode ---> Marked C Sources ---.
                                                         |
              .---------<--- GNU gettext Library         |
.--- make <---+                                          |
|             `---------<--------------------+-----------'
|                                            |
|   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium
|   |                                            |             ^
|   |                                            `---.         |
|   `---.                                            +---> PO mode ---.
|       +----> tupdate -------> LANG.pox --->--------'                |
|   .---'                                                             |
|   |                                                                 |
|   `-------------<---------------.                                   |
|                                 +--- LANG.po <--- New LANG.pox <----'
|   .--- LANG.gmo <--- msgfmt <---'
|   |
|   `---> install ---> /.../LANG/PACKAGE.mo ---.
|                                              +---> "Hello world!"
`-------> install ---> /.../bin/PROGRAM -------'

The indication `PO mode' appears in two places in this picture, and you may safely read it as merely meaning "hand editing", using any editor of your choice, really. However, for those of you who are lucky users of GNU Emacs, PO mode has been specifically created to provide a cosy environment for editing or modifying PO files. While editing a PO file, PO mode allows for the easy browsing of auxiliary and compendium PO files, as well as following references into the set of C program sources from which PO files have been derived. It has a few special features, among which are the interactive marking of program strings as translatable, and the validation of PO files with easy repositioning to PO file lines showing errors.

As a programmer, the first step in bringing GNU gettext into your package is identifying, right in the C sources, which strings are meant to be translatable, and which are untranslatable. This tedious job can be done a little more comfortably using PO mode, but you can use any means you are used to for modifying your C sources. Some other simple, standard changes are also needed to properly initialize the translation library. See section Preparing Program Sources, for more information about all this.
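As a rough preview of what marked sources and the initialization changes can look like, here is a minimal sketch. It assumes the libintl.h interface (gettext, bindtextdomain, textdomain) and the common convention of abbreviating gettext as an underscore macro. The names PACKAGE and LOCALEDIR and their values are placeholders which a real package would get from its build setup, and init_translations and greeting are hypothetical helpers. On some systems, linking may additionally require the intl library.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Conventional shorthand marking a string as translatable.  */
#define _(String) gettext (String)

/* Placeholder values, assumed for this sketch only.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/local/share/locale"
#endif

/* One-time setup, to be called early in main.  */
void
init_translations (void)
{
  const char *dir;

  setlocale (LC_ALL, "");                     /* follow the environment */
  dir = bindtextdomain (PACKAGE, LOCALEDIR);  /* where catalogs live */
  assert (dir != NULL);
  textdomain (PACKAGE);                       /* default message domain */
}

/* A marked string: translated when a catalog for the user's language is
   installed, returned unchanged otherwise.  */
const char *
greeting (void)
{
  return _("Hello world!");
}
```

Strings which must stay untranslated are simply left unmarked, so that they never reach the PO files.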

Once the C sources have been modified, the xgettext program is used to find and extract all translatable strings, and create an initial PO file out of all these. This `package.pot' file contains all original program strings. It has sets of pointers to exactly where in the C sources each string is used, and all translations are set to empty. The letter t in `.pot' marks that this is a Template PO file, not yet oriented towards any particular language. See section Invoking the xgettext Program, for more details about how one calls the xgettext program. If you are really lazy, you might be interested in working a lot more right away, and preparing the whole distribution setup (see section The Maintainer's View). By doing so, you spare yourself typing the xgettext command, as make should now generate the proper things automatically for you!

The first time through, there is no `lang.po' yet, so the tupdate step may be skipped and replaced by a mere copy of `package.pot' to `lang.pox', where lang represents the target language.

Then comes the initial translation of messages. Translation in itself is a whole matter, still exclusively meant for humans, and whose complexity far overwhelms the level of this manual. Nevertheless, a few hints are given in some other chapter of this manual (see section The Translator's View). You will also find there indications about how to contact translating teams, or becoming part of them, for sharing your translating concerns with others who target the same native language.

While adding the translated messages into the `lang.pox' PO file, if you do not have GNU Emacs handy, you are on your own for ensuring that you fully respect the PO file format and quoting conventions (see section The Format of PO Files). This is surely not an impossible task, as this is the way many people handled PO files already for Uniforum or Solaris. On the other hand, when using PO mode in GNU Emacs, most details of the PO file format are taken care of for you, but you have to acquire some familiarity with PO mode itself. Besides the main PO mode commands (see section Main Commands), you should know how to move between entries (see section Entry Positioning), and how to handle untranslated entries (see section Untranslated Entries).

If some common translations have already been saved into a compendium PO file, translators may use PO mode for initializing untranslated entries from the compendium, and also save selected translations into the compendium, updating it (see section Using Translation Compendiums). Compendium files are meant to be exchanged between members of a given translation team.

Programs, or packages of programs, are dynamic in nature: users write bug reports and suggestions for improvements, and maintainers react by modifying programs in various ways. The fact that a package has already been internationalized should not make maintainers shy of adding new strings, or modifying strings already translated. They just do their job the best they can. For the GNU Translation Project to work smoothly, it is important that maintainers do not carry translation concerns on their already loaded shoulders, and that translators be kept as free as possible of programmatic concerns.

The only concern maintainers should have is carefully marking new strings as translatable, when they should be, and they should not otherwise worry about them being translated, as this will come in proper time. Consequently, when programs and their strings are adjusted in various ways by maintainers, and for matters usually unrelated to translation, xgettext constructs `package.pot' files which evolve over time, so the translations carried by `lang.po' slowly fall out of date.

It is important for translators (and even maintainers) to understand that package translation is a continuous process in the lifetime of a package, and not something which is done once and for all at the start. After an initial burst of translation activity for a given package, interventions are needed once in a while, because here and there, translated entries become obsolete, and new untranslated entries appear, needing translation.

The tupdate program has the purpose of refreshing an already existing `lang.po' file, by comparing it with a newer `package.pot' template file, extracted by xgettext out of recent C sources. The refreshing operation adjusts all references to C source locations for strings, since these strings move as programs are modified. Also, tupdate comments out as obsolete, in `lang.pox', those already translated entries which are no longer used in the program sources (see section Obsolete Entries). It finally discovers new strings and inserts them in the resulting PO file as untranslated entries (see section Untranslated Entries). See section Invoking the tupdate Program, for more information about what tupdate really does.

Whatever route or means taken, the goal is obtaining an updated `lang.pox' file offering translations for all strings. When this is properly achieved, this file `lang.pox' may take the place of the previous official `lang.po' file.

The time mobility, or fluidity of PO files, is an integral part of the translation game, and should be well understood, and accepted. People resisting it will have a hard time participating in the GNU Translation Project, or will give a hard time to other participants! In particular, maintainers should relax and include all available PO files in their distributions, even if these have not recently been updated, without banging or otherwise trying to exert pressure on the translator teams to get the job done. The pressure should rather come from the community of users speaking a particular language, and maintainers should consider themselves fairly relieved of any concern about the adequacy of translation files. On the other hand, translators should reasonably try updating the PO files they are responsible for, while the package is undergoing pretest, prior to an official distribution.

Once the PO file is complete and dependable, the msgfmt program is used for turning the PO file into a machine-oriented format, which may yield efficient retrieval of translations by the programs of the package, whenever needed at runtime (see section The Format of GNU MO Files). See section Invoking the msgfmt Program, for more information about all modalities of execution for the msgfmt program.

Finally, the modified and marked C sources are compiled and linked with the GNU gettext library, usually through the operation of make, given a suitable `Makefile' exists for the project, and the resulting executable is installed somewhere users will find it. The MO files themselves should also be properly installed. Given the appropriate environment variables are set (see section Magic for End Users), the program should localize itself automatically, whenever it executes.

The remainder of this manual has the purpose of detailing the various steps outlined in this section.

