A code unit is the minimal bit combination, of a size fixed by the character encoding, that the encoding uses to represent a unit of encoded text. It can be thought of as the "word" of a character encoding.
A character encoding maps each code point to a sequence of one or more code units.
For example, encoding the musical symbol G clef 𝄞 (code point U+1D11E) in each of the three UTF character encodings gives:
| Character encoding | Code unit size | Encoded value       | Description                                      |
|--------------------|----------------|---------------------|--------------------------------------------------|
| UTF-8              | 8 bits         | 0xF0 0x9D 0x84 0x9E | 4 bytes: a sequence of 4 code units, each 8 bits |
| UTF-16             | 16 bits        | 0xD834 0xDD1E       | 4 bytes: a sequence of 2 code units, each 16 bits |
| UTF-32             | 32 bits        | 0x0001D11E          | 4 bytes: a single code unit of 32 bits           |
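
The G clef figures can be checked with a short Python sketch. The helper name `code_units` is hypothetical, introduced only for this illustration. It assumes big-endian byte order without a byte order mark, so that the code unit values print in the same form as the encoded values given for each encoding.

```python
import struct

# Code unit width in bytes and struct format for each encoding.
# Assumption: big-endian variants, which do not emit a byte order mark.
FORMATS = {
    "utf-8": (1, "B"),
    "utf-16-be": (2, ">H"),
    "utf-32-be": (4, ">I"),
}

def code_units(text, encoding):
    """Return the code units (as integers) of text in the given encoding."""
    width, fmt = FORMATS[encoding]
    data = text.encode(encoding)
    # Split the encoded bytes into fixed-width chunks, one per code unit.
    return [
        struct.unpack(fmt, data[i:i + width])[0]
        for i in range(0, len(data), width)
    ]

g_clef = "\U0001D11E"  # musical symbol G clef, a single code point

for encoding in FORMATS:
    units = code_units(g_clef, encoding)
    print(encoding, [hex(u) for u in units], len(units), "code unit(s)")
```

All three encodings use 4 bytes for this code point, but the number of code units differs: 4 in UTF-8, 2 in UTF-16, and 1 in UTF-32.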