Wide_Text_IO is similar in most respects to Text_IO, except that
both input and output files may contain special sequences that represent
wide character values. The encoding scheme for a given file may be
specified using a FORM parameter:
WCEM=x
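as part of the FORM string (WCEM = wide character encoding method), where x is one of the following characters:
h    Hex ESC encoding
u    Upper half encoding
s    Shift-JIS encoding
e    EUC encoding
8    UTF-8 encoding
b    Brackets encoding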
The encoding methods match those that can be used in a source program, but there is no requirement that the encoding method used for the source program be the same as the encoding method used for files, and different files may use different encoding methods.
The default encoding method for the standard files, and for opened files for which no WCEM parameter is given in the FORM string, matches the wide character encoding specified for the main program (the default being brackets encoding if no coding method was specified with -gnatW).
Hex ESC coding: a wide character is represented by a five-character sequence:
ESC a b c d
where a, b, c, d are the four hexadecimal
characters (using uppercase letters) of the wide character code. For
example, ESC A345 is used to represent the wide character with code
16#A345#. This scheme is compatible with use of the full
Wide_Character set.
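The ESC a b c d layout described above can be illustrated with a short sketch. It is written in Python purely to show the shape of a single sequence; it is not the GNAT run-time implementation, the function names are hypothetical, and ValueError stands in for the Ada exception that a real implementation would raise.

```python
ESC = "\x1b"  # ASCII escape character (code 16#1B#)


def hex_esc_encode(code: int) -> str:
    """Return ESC followed by the four uppercase hex digits of a 16-bit code."""
    if not 0 <= code <= 0xFFFF:
        # Assumption: ValueError is used here in place of an Ada exception.
        raise ValueError("code is outside the 16-bit wide character range")
    return ESC + format(code, "04X")


def hex_esc_decode(seq: str) -> int:
    """Parse one five-character ESC sequence back into its character code."""
    hex_digits = "0123456789ABCDEF"  # uppercase only, as the text specifies
    if (
        len(seq) != 5
        or seq[0] != ESC
        or any(ch not in hex_digits for ch in seq[1:])
    ):
        raise ValueError("not a valid Hex ESC sequence")
    return int(seq[1:], 16)
```

For instance, hex_esc_encode(0xA345) yields the escape character followed by the four characters A345, matching the example in the text.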
UTF-8 coding: depending on its value, a wide character is represented by a one-, two-, or three-byte sequence:
16#0000#-16#007f#: 2#0xxxxxxx#
16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
where the xxx bits correspond to the left-padded bits of the
16-bit character value. Note that all lower half ASCII characters
are represented as ASCII bytes and all upper half characters and
other wide characters are represented as sequences of upper-half bytes.
(The full UTF-8 scheme allows for encoding 31-bit characters as
6-byte sequences, but in this implementation, all UTF-8 sequences
of four or more bytes in length will raise a Constraint_Error, as
will all invalid UTF-8 sequences.)
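The three UTF-8 ranges above can be worked through with a minimal sketch. It is written in Python only to make the bit layout concrete; the function name is hypothetical, it is not how GNAT implements the encoding, and ValueError is used as a stand-in for Constraint_Error.

```python
def utf8_encode_wide(code: int) -> bytes:
    """Encode a 16-bit character code as a one-, two-, or three-byte sequence."""
    if not 0 <= code <= 0xFFFF:
        # Assumption: ValueError stands in for Ada's Constraint_Error.
        raise ValueError("code is outside the 16-bit wide character range")
    if code <= 0x7F:
        # 16#0000#-16#007f#: 2#0xxxxxxx#
        return bytes([code])
    if code <= 0x7FF:
        # 16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
    return bytes(
        [
            0xE0 | (code >> 12),
            0x80 | ((code >> 6) & 0x3F),
            0x80 | (code & 0x3F),
        ]
    )
```

Codes up to 16#7F# come out as a single ASCII byte, while every other code produces only bytes with the high bit set, which is the property the note above describes.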
Brackets coding: a wide character is represented by the following eight-character sequence:
[ " a b c d " ]
where a, b, c, d are the four hexadecimal
characters (using uppercase letters) of the wide character code. For
example, ["A345"] is used to represent the wide character with code
16#A345#.
This scheme is compatible with use of the full Wide_Character set.
On input, brackets coding can also be used for upper half characters,
e.g. ["C1"] for the character with code 16#C1# (Latin-1 capital A with
acute accent). However, on output, brackets notation
is only used for wide characters with a code greater than 16#FF#.
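The asymmetry between input and output in brackets notation can be sketched as follows. This is an illustrative Python model, not GNAT's implementation; the function names and the regular expression are assumptions made for the example, and ValueError stands in for an Ada exception.

```python
import re

# Accepts the four-digit form and the two-digit upper-half form, uppercase only.
_BRACKETS = re.compile(r'\["([0-9A-F]{2}|[0-9A-F]{4})"\]')


def brackets_encode(text: str) -> str:
    """Model output: only codes greater than 16#FF# use brackets notation."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Assumption: ValueError is used here in place of an Ada exception.
            raise ValueError("code is outside the 16-bit wide character range")
        out.append(f'["{code:04X}"]' if code > 0xFF else ch)
    return "".join(out)


def brackets_decode(text: str) -> str:
    """Model input: replace every brackets sequence with the character it names."""
    return _BRACKETS.sub(lambda m: chr(int(m.group(1), 16)), text)
```

In this model, decoding accepts ["C1"] as well as ["A345"], while encoding leaves the character with code 16#C1# untouched and writes only codes above 16#FF# in brackets form.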
For the coding schemes other than UTF-8, Hex, or Brackets encoding, not all wide character values can be represented. An attempt to output a character that cannot be represented using the encoding scheme for the file causes Constraint_Error to be raised. An invalid wide character sequence on input also causes Constraint_Error to be raised.