Text - What is the Unicode BMP?

BMP stands for Basic Multilingual Plane.

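It is the first 65,536 code points in Unicode (plane 0). A code point is a number that can be assigned to a character; not every code point in the BMP is an assigned character.
From: U+0000 To: U+FFFF
Every code point number in the BMP fits in 16 bits.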

There are 17 planes in the Unicode standard for a total of 1,114,112 possible code points (17 * 65,536).
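
The arithmetic above can be checked with a short Python sketch. The names PLANE_SIZE, PLANE_COUNT, plane_of and in_bmp are made up for this illustration; they do not come from any library. It assumes that the plane number is simply the code point shifted right by 16 bits, which follows from each plane holding 65,536 code points.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 through 16

def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    # Each plane spans 0x10000 code points, so the plane is the value above bit 15.
    return code_point >> 16

def in_bmp(code_point: int) -> bool:
    """True when code_point lies in plane 0, the BMP."""
    return plane_of(code_point) == 0

print(PLANE_SIZE * PLANE_COUNT)   # 1114112
print(plane_of(0xFFFF))           # 0  (last code point of the BMP)
print(plane_of(0x10000))          # 1  (first code point after the BMP)
print(plane_of(0x10FFFF))         # 16 (last code point in Unicode)
```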


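Every code point in the BMP, other than the surrogates (U+D800 to U+DFFF), is encoded in UTF-16 as a single 16-bit code unit and in UTF-32 as a single 32-bit code unit. In both cases the encoded value equals the code point number.
E.g.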
The code point U+FFFF in UTF-16 is encoded to: 0xFFFF
The code point U+FFFF in UTF-32 is encoded to: 0x0000FFFF
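
One way to confirm these two values is with Python's built-in codecs. This is only a quick check, not part of the Unicode standard itself. The big-endian codec names ("utf-16-be" and "utf-32-be") are used here so that no byte order mark is added and the bytes read in the same order as the values shown above.

```python
ch = chr(0xFFFF)                 # the last code point in the BMP

utf16 = ch.encode("utf-16-be")   # big-endian, no byte order mark
utf32 = ch.encode("utf-32-be")

print(utf16.hex())               # ffff
print(utf32.hex())               # 0000ffff
print(len(utf16), len(utf32))    # 2 4  (bytes used by each encoding)
```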


The first code point beyond the BMP, U+10000, does not fit in 16 bits. UTF-16 encodes it as two 16-bit code units, known as a surrogate pair.
The code point U+10000 in UTF-16 has the value: 0xD800 0xDC00
The code point U+10000 in UTF-32 has the value: 0x00010000
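
The two UTF-16 values for U+10000 come from a fixed calculation: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal Python sketch follows. The function name to_surrogate_pair is made up for this illustration; the last line compares the result with Python's own UTF-16 codec.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above the BMP into its two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x10000)
print(f"0x{high:04X} 0x{low:04X}")       # 0xD800 0xDC00

# Cross-check against the built-in codec (big-endian, no byte order mark).
print(chr(0x10000).encode("utf-16-be").hex())   # d800dc00
```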

Note that it is only in UTF-32 that the encoded value is identical to the code point number for all code points. This is by design.
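
This can be spot-checked in Python for a handful of code points. The list of samples is arbitrary and chosen only for illustration. The surrogates (U+D800 to U+DFFF) are left out because Python's codecs refuse to encode them.

```python
samples = [0x0041, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF]

results = []
for cp in samples:
    ch = chr(cp)
    # Read the 4 UTF-32 bytes back as one big-endian integer.
    utf32_value = int.from_bytes(ch.encode("utf-32-be"), "big")
    # Count how many 16-bit code units UTF-16 needs.
    utf16_units = len(ch.encode("utf-16-be")) // 2
    results.append((cp, utf32_value, utf16_units))
    print(f"U+{cp:04X}  UTF-32: 0x{utf32_value:08X}  UTF-16 code units: {utf16_units}")
```

For every sample the UTF-32 value matches the code point number, while UTF-16 needs 1 code unit inside the BMP and 2 above it.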




© Richard McGrath