Unicode / Unicode Transformation Format


All electronic data is represented as sequences of bits, or numbers. To be stored or processed on a computer, each character of an alphabet or script must therefore be mapped to a numeric value, or ‘encoded’. Unicode is the standard that does this: it assigns every character its own unique number, called a code point, regardless of the platform, program, or language. For example, the letters of the English alphabet and the most common punctuation marks have been assigned values between 0 and 127 (the range inherited from ASCII), while Tibetan characters have been assigned values between 3,840 (written as U+0F00) and 4,095 (written as U+0FFF). Nearly all modern scripts, and many historical ones, are supported by the Unicode Standard. To store or transmit these code points as bytes, Unicode defines several encoding forms known as Unicode Transformation Formats (UTF), the most commonly used of which are UTF-8 and UTF-16. The same code point can therefore be written as different byte sequences depending on which UTF is used. The Unicode Standard is described in detail at the website http://www.unicode.org. See also Character Encoding.
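The difference between a character's code point and its UTF-encoded bytes can be illustrated with a short sketch. It assumes Python 3.8 or later (for `bytes.hex` with a separator). The helper name `describe` is hypothetical, chosen only for this example. UTF-16 is shown in big-endian form (`utf-16-be`) so that no byte order mark is added to the output.

```python
def describe(ch: str) -> dict:
    """Return the code point of a single character and its UTF-8 / UTF-16 bytes.

    `describe` is a hypothetical helper for illustration only.
    """
    cp = ord(ch)  # the Unicode code point as an integer
    return {
        "code_point": f"U+{cp:04X}",                 # conventional U+XXXX notation
        "decimal": cp,
        "utf8": ch.encode("utf-8").hex(" "),        # UTF-8 bytes, space-separated
        "utf16": ch.encode("utf-16-be").hex(" "),   # UTF-16 bytes, big-endian, no BOM
    }


if __name__ == "__main__":
    # 'A' (English) and TIBETAN LETTER KA (U+0F40, inside the Tibetan range)
    for ch in ["A", "\u0f40"]:
        print(describe(ch))
```

For the Latin letter "A" (U+0041), UTF-8 needs a single byte (`41`) while UTF-16 needs two (`00 41`). For the Tibetan letter KA (U+0F40, decimal 3,904), UTF-8 needs three bytes (`e0 bd 80`) and UTF-16 two (`0f 40`), even though both encodings represent the same code point.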
