Wide_Wide_Character Encodings (GNAT User’s Guide for Native Platforms)


3.2.4 Wide_Wide_Character Encodings

GNAT allows wide wide character codes to appear in character and string literals, and also optionally in identifiers, by means of the following possible encoding schemes:

`UTF-8 Coding'

A wide wide character is represented using UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO 10646-1/Am.2. Depending on its value, a character code greater than 16#FFFF# is represented as a four-, five-, or six-byte sequence:

16#01_0000#-16#10_FFFF#:     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
16#0020_0000#-16#03FF_FFFF#: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
16#0400_0000#-16#7FFF_FFFF#: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

where the xxx bits, read from left to right, hold the bits of the 32-bit character value, most significant bit first, left-padded with zeros to fill the available positions.
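The following Python sketch shows how a character code is split into the bit patterns in the table: the low six bits fill each 10xxxxxx continuation byte, and the remaining high-order bits go into the lead byte. The function name utf8_wide_wide is hypothetical and not part of GNAT, and this is an illustration of the layout rather than GNAT's implementation. It accepts only the ranges the table lists, so codes up to 16#FFFF# and codes from 16#11_0000# to 16#1F_FFFF# are rejected rather than guessed at.

```python
def utf8_wide_wide(code: int) -> bytes:
    """Hypothetical helper: encode a code above 16#FFFF# in the four-, five-,
    or six-byte UTF-8 form, using only the ranges listed in the table."""
    if 0x1_0000 <= code <= 0x10_FFFF:
        lead, continuation = 0b1111_0000, 3  # 11110xxx + three 10xxxxxx bytes
    elif 0x20_0000 <= code <= 0x3FF_FFFF:
        lead, continuation = 0b1111_1000, 4  # 111110xx + four 10xxxxxx bytes
    elif 0x400_0000 <= code <= 0x7FFF_FFFF:
        lead, continuation = 0b1111_1100, 5  # 1111110x + five 10xxxxxx bytes
    else:
        raise ValueError(f"{code:#x} is not in a range listed in the table")

    tail = []
    for _ in range(continuation):
        tail.append(0b1000_0000 | (code & 0b11_1111))  # low six bits -> 10xxxxxx
        code >>= 6
    # Within each listed range, the bits left over fit in the lead byte's x positions.
    return bytes([lead | code] + tail[::-1])


print(utf8_wide_wide(0x1F600).hex(" "))      # f0 9f 98 80
print(utf8_wide_wide(0x7FFF_FFFF).hex(" "))  # fd bf bf bf bf bf
```

For codes in the four-byte range the result matches Python's own UTF-8 encoder, for example utf8_wide_wide(0x1F600) == chr(0x1F600).encode("utf-8"). The five- and six-byte forms have no counterpart there, because current UTF-8 (RFC 3629) stops at 16#10_FFFF#.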

`Brackets Coding'

In this encoding, a wide wide character is represented by the following ten- or twelve-byte character sequence:

[ " a b c d e f " ]
[ " a b c d e f g h " ]

where a-h are the six or eight hexadecimal characters (using uppercase letters) of the wide wide character code. For example, ["1F4567"] is used to represent the wide wide character with code 16#001F_4567#.
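A minimal Python sketch of the brackets notation for wide wide characters follows. The helper names brackets_encode and brackets_decode are hypothetical, and the rule for choosing between six and eight digits (six whenever the code fits in them, eight otherwise) is an assumption made for illustration. The encoder is limited to codes above 16#FFFF#, the range this section covers; the decoder accepts either length but only uppercase hexadecimal digits, as the text specifies.

```python
import re

# Matches ["hhhhhh"] or ["hhhhhhhh"] with uppercase hexadecimal digits only.
_BRACKETS = re.compile(r'\["([0-9A-F]{6}|[0-9A-F]{8})"\]')


def brackets_encode(code: int) -> str:
    """Hypothetical helper: ten- or twelve-character brackets form of a code above 16#FFFF#."""
    if not 0x1_0000 <= code <= 0x7FFF_FFFF:
        raise ValueError(f"{code:#x} is outside the wide wide range handled here")
    # Assumption: six digits when the code fits in them, eight otherwise.
    digits = 6 if code <= 0xFF_FFFF else 8
    return '["' + format(code, f"0{digits}X") + '"]'


def brackets_decode(seq: str) -> int:
    """Hypothetical helper: parse a brackets sequence back into its character code."""
    match = _BRACKETS.fullmatch(seq)
    if match is None:
        raise ValueError(f"not a wide wide brackets sequence: {seq!r}")
    return int(match.group(1), 16)


print(brackets_encode(0x1F4567))           # ["1F4567"]
print(hex(brackets_decode('["1F4567"]')))  # 0x1f4567
```

Requiring a full match of the pattern means that a sequence with lowercase digits or the wrong number of digits is rejected instead of being partially decoded.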

This scheme is compatible with use of the full Wide_Wide_Character set, and is also the method used for wide wide character encoding in some standard ACATS (Ada Conformity Assessment Test Suite) distributions.