GNAT allows wide wide character codes to appear in character and string literals, and also optionally in identifiers, by means of the following possible encoding schemes:
A wide wide character is represented using UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO 10646-1/Am.2. Depending on the character value, the representation of character codes with values greater than 16#FFFF# is a four-, five-, or six-byte sequence:
16#01_0000#-16#10_FFFF#: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
16#0020_0000#-16#03FF_FFFF#: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
16#0400_0000#-16#7FFF_FFFF#: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
where the xxx bits correspond to the left-padded bits of the 32-bit character value.
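The byte layouts above can be sketched in a few lines of code. The following is an illustrative Python sketch only, not part of GNAT; the function name `encode_utf8_extended` is hypothetical. It assumes that the ranges are exactly those listed in the table, so codes that fall outside every listed range are rejected.

```python
def encode_utf8_extended(code: int) -> bytes:
    """Encode a code above 16#FFFF# using the 4-, 5-, or 6-byte layouts.

    Hypothetical helper for illustration; ranges follow the table as written.
    """
    if 0x01_0000 <= code <= 0x10_FFFF:
        lead, trailing_count = 0xF0, 3      # 11110xxx + 3 continuation bytes
    elif 0x0020_0000 <= code <= 0x03FF_FFFF:
        lead, trailing_count = 0xF8, 4      # 111110xx + 4 continuation bytes
    elif 0x0400_0000 <= code <= 0x7FFF_FFFF:
        lead, trailing_count = 0xFC, 5      # 1111110x + 5 continuation bytes
    else:
        raise ValueError(f"code {code:#x} is outside the listed ranges")

    # Peel off six bits at a time, lowest bits first; each group becomes
    # a continuation byte of the form 10xxxxxx.
    trailing = []
    for _ in range(trailing_count):
        trailing.append(0x80 | (code & 0x3F))
        code >>= 6

    # Whatever bits remain go into the low bits of the lead byte.
    return bytes([lead | code] + trailing[::-1])
```

For codes in the first range, the result should agree with standard UTF-8; for instance, `encode_utf8_extended(0x1F600)` matches `"\U0001F600".encode("utf-8")`. The five- and six-byte forms are not part of modern UTF-8 and are produced here only because the table lists them.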
In the brackets encoding, a wide wide character is represented by the following ten- or twelve-byte character sequence:
["abcdef"]
["abcdefgh"]
where a-h are the six or eight hexadecimal characters (using uppercase letters) of the wide wide character code. For example, ["1F4567"] is used to represent the wide wide character with code 16#001F_4567#.
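As a rough illustration of the bracket form described above, the following Python sketch converts between a character code and its bracket sequence. It is not GNAT code: the names `encode_brackets` and `decode_brackets` are hypothetical, and the choice of six digits for codes up to 16#FF_FFFF# and eight digits otherwise is an assumption made for this example.

```python
import re

# Matches ["abcdef"] or ["abcdefgh"] with uppercase hexadecimal digits only.
_BRACKETS = re.compile(r'\["([0-9A-F]{6}|[0-9A-F]{8})"\]')


def encode_brackets(code: int) -> str:
    """Return the ten- or twelve-character bracket sequence for a code."""
    if not 0 <= code <= 0x7FFF_FFFF:
        raise ValueError(f"code {code:#x} is out of range")
    # Assumption: use six digits when they suffice, eight otherwise.
    width = 6 if code <= 0xFF_FFFF else 8
    return '["{:0{width}X}"]'.format(code, width=width)


def decode_brackets(text: str) -> int:
    """Return the character code held in a bracket sequence."""
    match = _BRACKETS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bracket sequence: {text!r}")
    return int(match.group(1), 16)
```

With these definitions, `encode_brackets(0x1F4567)` yields `["1F4567"]`, the example given in the text, and `decode_brackets` reverses it.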
This scheme is compatible with use of the full Wide_Wide_Character set, and is also the method used for wide wide character encoding in some standard ACATS (Ada Conformity Assessment Test Suite) distributions.