Internationalizing ARM

The ARM API is designed to enable applications to use native code pages (or code sets) and languages, and for measurement agents to be able to support many different languages. Users of agents should contact the providers to see if the agent supports the needed code pages and languages.

The ARM API supports any code page as long as no characters are encoded with binary zero bytes (octets). This is because most strings are passed as NULL terminated strings, and the NULL terminator character is a binary zero byte. If a binary zero byte is encountered before the end of the string, the agent would interpret the zero byte as the NULL terminator and truncate the string. Most code pages meet this requirement.

There are code pages that contain binary zero bytes, but there are alternate ways to encode the characters. A well-known example is the Unicode standard. In its native format using 16 bit characters (UTC-2), there are binary zero bytes. However, the UTF-8 encoding of the same Unicode characters does not contain binary zero bytes, and this format is entirely compatible with the ARM API.

Agents that support native languages will often use the following technique. When the application links to the agent it links to a part of the agent that executes in the same process space as the application. Typically this small part of the agent communicates with the main part of the agent across an inter-process communications (IPC) channel. The small part of the agent that executes in the same process as the application can issue an operating system call to find out what code page and language the process is using. It can then pass this information to the main part of the agent, and the main part of the agent can convert from the native code page as necessary, or the small part of the agent can make the conversion itself.

On some operating systems there are more than one way to define what language and code page the application is using. In order to avoid any ambiguity, the following conventions apply for the specified operating systems.

There are the following three restrictions on the use of native languages.