Previous: Other 8-Bit Codes, Up: Foreign Language Representation
GNAT allows wide character codes to appear in character and string literals, and also optionally in identifiers, by means of the following possible encoding schemes:
ESC a b c d
Where a
, b
, c
, d
are the four hexadecimal
characters (using uppercase letters) of the wide character code. For
example, ESC A345 is used to represent the wide character with code
16#A345#
.
This scheme is compatible with use of the full Wide_Character set.
16#abcd#
where the upper bit is on (in
other words, "a" is in the range 8-F) is represented as two bytes,
16#ab#
and 16#cd#
. The second byte cannot be a format control
character, but is not required to be in the upper half. This method can
be also used for shift-JIS or EUC, where the internal coding matches the
external coding.
16#ab#
and
16#cd#
, with the restrictions described for upper-half encoding as
described above. The internal character code is the corresponding JIS
character according to the standard algorithm for Shift-JIS
conversion. Only characters defined in the JIS code set table can be
used with this encoding method.
16#ab#
and
16#cd#
, with both characters being in the upper half. The internal
character code is the corresponding JIS character according to the EUC
encoding algorithm. Only characters defined in the JIS code set table
can be used with this encoding method.
16#0000#-16#007f#: 2#0xxxxxxx# 16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx# 16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
where the xxx bits correspond to the left-padded bits of the
16-bit character value. Note that all lower half ASCII characters
are represented as ASCII bytes and all upper half characters and
other wide characters are represented as sequences of upper-half
(The full UTF-8 scheme allows for encoding 31-bit characters as
6-byte sequences, but in this implementation, all UTF-8 sequences
of four or more bytes length will be treated as illegal).
[ " a b c d " ]
Where a
, b
, c
, d
are the four hexadecimal
characters (using uppercase letters) of the wide character code. For
example, ["A345"] is used to represent the wide character with code
16#A345#
. It is also possible (though not required) to use the
Brackets coding for upper half characters. For example, the code
16#A3#
can be represented as ["A3"]
.
This scheme is compatible with use of the full Wide_Character set, and is also the method used for wide character encoding in the standard ACVC (Ada Compiler Validation Capability) test suite distributions.
Note: Some of these coding schemes do not permit the full use of the Ada 95 character set. For example, neither Shift JIS, nor EUC allow the use of the upper half of the Latin-1 set.