GNAT allows wide character codes to appear in character and string literals, and optionally in identifiers, using one of the following encoding schemes:
Hex Coding

In this encoding, a wide character is represented by the following five-character sequence:

ESC a b c d

where a, b, c, d are the four hexadecimal characters (using uppercase letters) of the wide character code. For example, ESC A345 is used to represent the wide character with code 16#A345#.

This scheme is compatible with use of the full Wide_Character set.
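A minimal sketch of the Hex Coding rule, written in Python purely for illustration; it is not GNAT's implementation. The function names `hex_encode` and `hex_decode` are hypothetical, and ESC is assumed to be the ASCII escape character 16#1B#. The decoder enforces the uppercase-hex-digit rule stated above.

```python
ESC = "\x1b"  # ASCII escape character, 16#1B# (assumed to be the ESC of the scheme)
HEX_DIGITS = "0123456789ABCDEF"  # uppercase only, as the scheme requires

def hex_encode(code: int) -> str:
    """Hypothetical helper: encode a 16-bit code as ESC followed by four hex digits."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("Wide_Character codes range over 16#0000# .. 16#FFFF#")
    return ESC + format(code, "04X")  # "04X": zero-padded, uppercase

def hex_decode(seq: str) -> int:
    """Hypothetical helper: parse an ESC a b c d sequence back into its code."""
    if len(seq) != 5 or seq[0] != ESC:
        raise ValueError("expected ESC followed by exactly four hex digits")
    digits = seq[1:]
    if any(c not in HEX_DIGITS for c in digits):
        raise ValueError("hex digits must be 0-9 or uppercase A-F")
    return int(digits, 16)

print(repr(hex_encode(0xA345)))  # prints '\x1bA345'
```

Because every encoded character takes exactly five bytes, the sequence is fixed-length and trivially reversible, which is why it can cover the whole 16-bit range.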
UTF-8 Coding

A wide character is represented using UCS Transformation Format 8 (UTF-8). Depending on the character value, the representation is a one-, two-, or three-byte sequence:

16#0000#-16#007f#: 2#0xxxxxxx#
16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#

where the xxx bits correspond to the left-padded bits of the 16-bit character value. Note that all lower-half (ASCII) characters are represented as single ASCII bytes, while all upper-half characters and other wide characters are represented as sequences of upper-half bytes (bytes in the range 16#80#-16#FF#). The full UTF-8 scheme allows 31-bit characters to be encoded as 6-byte sequences; the use of these sequences is documented in the following section on wide wide characters.
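The three rows of the table translate directly into bit operations. The following Python sketch (the function name `utf8_encode_16` is hypothetical, and this is not GNAT's own code) applies each row and makes the "upper-half bytes" remark concrete: in the two- and three-byte forms, every byte has its high bit set.

```python
def utf8_encode_16(code: int) -> bytes:
    """Hypothetical helper: encode a 16-bit value with the 1-, 2- or 3-byte forms."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this table only covers 16-bit values")
    if code <= 0x7F:
        # 2#0xxxxxxx#: lower-half ASCII is stored unchanged
        return bytes([code])
    if code <= 0x7FF:
        # 2#110xxxxx# 2#10xxxxxx#: 11 significant bits split 5 + 6
        return bytes([0xC0 | (code >> 6),
                      0x80 | (code & 0x3F)])
    # 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#: 16 bits split 4 + 6 + 6
    return bytes([0xE0 | (code >> 12),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])

print(utf8_encode_16(0xA345).hex())  # prints ea8d85
```

For every value up to 16#FFFF# outside the surrogate range 16#D800#-16#DFFF#, this produces the same bytes as Python's built-in `str.encode("utf-8")`. For surrogate values the sketch simply applies the three-byte row, whereas Python's strict codec rejects them.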
Brackets Coding

In this encoding, a wide character is represented by the following eight-character sequence:

[ " a b c d " ]

where a, b, c, d are the four hexadecimal characters (using uppercase letters) of the wide character code. For example, ["A345"] is used to represent the wide character with code 16#A345#. It is also possible (though not required) to use the Brackets coding for upper-half characters. For example, the code 16#A3# can be represented as ["A3"].

This scheme is compatible with use of the full Wide_Character set, and is also the method used for wide character encoding in some standard ACATS (Ada Conformity Assessment Test Suite) distributions.
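As an illustration only, the sketch below renders and parses bracket sequences in Python. The names `brackets_encode` and `brackets_decode` are hypothetical. It assumes the two-digit form is used only for upper-half codes (16#80#-16#FF#), as in the ["A3"] example, and that lower-half characters are written directly.

```python
import re

# ["hh"] or ["hhhh"] with uppercase hex digits (an assumption based on the text above)
_BRACKET = re.compile(r'\["([0-9A-F]{2}|[0-9A-F]{4})"\]')

def brackets_encode(code: int) -> str:
    """Hypothetical helper: render one character code in brackets notation."""
    if 0x80 <= code <= 0xFF:
        # Optional short form for upper-half characters, e.g. ["A3"]
        return '["%02X"]' % code
    if 0x100 <= code <= 0xFFFF:
        # Full four-digit form for other wide characters, e.g. ["A345"]
        return '["%04X"]' % code
    raise ValueError("lower-half characters are written directly; "
                     "codes above 16#FFFF# are outside Wide_Character")

def brackets_decode(text: str) -> str:
    """Hypothetical helper: replace each bracket sequence with its character."""
    return _BRACKET.sub(lambda m: chr(int(m.group(1), 16)), text)

print(brackets_encode(0xA345), brackets_encode(0xA3))  # prints ["A345"] ["A3"]
```

The sketch deliberately ignores how a literal [ or a doubled quote inside an Ada string literal interacts with the notation. A real source reader would have to handle those cases.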
Note: Some of these coding schemes do not permit the full use of the Ada character set. For example, neither Shift JIS nor EUC allows the use of the upper half of the Latin-1 set.