In detail
UTF-16 is a variable-width character encoding: each code point (character) is encoded in either 2 bytes or 4 bytes, depending on its value. Code points from 0x0 to 0xFFFF (i.e. 0 to 65,535) can be encoded in a single 16-bit code unit, except 0xD800 to 0xDFFF, which are reserved for surrogates. Code points from 0x10000 to 0x10FFFF require 2 code units: a 16-bit high surrogate followed by a 16-bit low surrogate. High surrogates lie in the range 0xD800 - 0xDBFF, and low surrogates lie in the range 0xDC00 - 0xDFFF.
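The rules above can be sketched as a small encoder that turns one code point into its list of 16-bit code units. This is a minimal illustration, not a production codec; the function name `encode_utf16` is hypothetical. For code points above 0xFFFF it subtracts 0x10000, leaving a 20-bit value whose top 10 bits go into the high surrogate and bottom 10 bits into the low surrogate.

```python
from typing import List


def encode_utf16(code_point: int) -> List[int]:
    """Return the UTF-16 code units (as ints) for a single code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"code point out of range: {code_point:#x}")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are reserved and cannot be encoded on their own.
        raise ValueError(f"surrogate code point: {code_point:#x}")
    if code_point <= 0xFFFF:
        # Plane 0: the value fits directly in one 16-bit code unit.
        return [code_point]
    # Planes 1-16: subtract 0x10000 to get a 20-bit offset (0x00000-0xFFFFF).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> 0xDC00..0xDFFF
    return [high, low]


for cp in (0x41, 0xFFFF, 0x10000, 0x10FFFF):
    print(hex(cp), [hex(u) for u in encode_utf16(cp)])
```

Because the offset is at most 0xFFFFF, each half is guaranteed to land inside its surrogate range, which is why a decoder can always tell a high surrogate from a low one.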
An example
The highest code point of Plane 0 (0xFFFF) can be encoded with a single code unit.
The lowest code point of Plane 1 (0x10000) requires 2 code units: a high surrogate (0xD800) followed by a low surrogate (0xDC00).
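One way to check this example is to let Python's built-in `utf-16-be` codec (big-endian, no byte-order mark) produce the bytes and then split them into 16-bit units. The sketch below does that and also recombines a surrogate pair into a code point; the helper names `code_units` and `decode_pair` are hypothetical, chosen for illustration.

```python
import struct
from typing import List


def code_units(text: str) -> List[int]:
    """Split a string into its UTF-16 code units via Python's built-in codec."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    # Each code unit is one unsigned 16-bit big-endian integer ("H").
    return list(struct.unpack(f">{len(data) // 2}H", data))


def decode_pair(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Reverse of encoding: 10 bits from each half, then add back 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(u) for u in code_units(chr(0xFFFF))])   # ['0xffff']
print([hex(u) for u in code_units(chr(0x10000))])  # ['0xd800', '0xdc00']
print(hex(decode_pair(0xD800, 0xDC00)))            # 0x10000
```

The big-endian codec is used here only so the byte order is fixed and no byte-order mark is prepended; the code unit values themselves are the same in either byte order.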