Text - UCS-N vs. UTF-N

UCS-N and UTF-N are character encodings for Unicode.

A character encoding is an algorithm that converts a code point to a sequence of bytes that can be included in a document.
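As a rough illustration of the definition above, here is a minimal Python sketch that takes one code point and shows the bytes three encodings produce for it. The sample character (the euro sign) and the variable names are choices made for this example, not part of the original text.

```python
# One code point, three different byte sequences.
ch = "\u20ac"            # EURO SIGN (sample character chosen for this sketch)
code_point = ord(ch)     # the code point as an integer: 0x20AC

# The "-be" codecs use big-endian byte order and do not emit a byte order mark.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(f"U+{code_point:04X}")  # U+20AC
print(utf8.hex())             # e282ac
print(utf16.hex())            # 20ac
print(utf32.hex())            # 000020ac
```

The code point is the same in every case; only the byte sequence differs from one encoding to the next.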


UCS-N : These are fixed-width character encodings.

UCS-2 stands for Universal Character Set coded in 2 octets.
UCS-2 is a fixed-width 2-byte character encoding, so it can only represent code points U+0000 to U+FFFF (the Basic Multilingual Plane). It was the predecessor to the variable-width encoding UTF-16.

UCS-4 stands for Universal Character Set coded in 4 octets.
UCS-4 is a fixed-width 4-byte character encoding. It can represent every Unicode code point.
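Python has no built-in codec named UCS-2 or UCS-4, so the fixed-width idea can be sketched by hand. The helpers below (ucs2_encode and ucs4_encode) are hypothetical names invented for this example, and big-endian byte order is an assumption; treat this as an illustration rather than a reference implementation.

```python
def ucs2_encode(text: str) -> bytes:
    """Hypothetical helper: every code point becomes exactly 2 bytes (big-endian)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # 2 bytes cannot hold a code point above U+FFFF.
            raise ValueError(f"U+{cp:04X} does not fit in 2 bytes")
        out += cp.to_bytes(2, "big")
    return bytes(out)


def ucs4_encode(text: str) -> bytes:
    """Hypothetical helper: every code point becomes exactly 4 bytes (big-endian)."""
    out = bytearray()
    for ch in text:
        out += ord(ch).to_bytes(4, "big")
    return bytes(out)


print(ucs2_encode("A\u20ac").hex())      # 004120ac
print(ucs4_encode("A\U0001F600").hex())  # 000000410001f600
```

Calling ucs2_encode("\U0001F600") raises ValueError in this sketch, which is the limitation that a variable-width encoding such as UTF-16 works around.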


UTF-N : These are variable-width or fixed-width character encodings, depending on N.
UTF stands for Unicode Transformation Format.
UTF-8 is a variable-width encoding: 1, 2, 3 or 4 bytes per character.
UTF-16 is a variable-width encoding: 2 or 4 bytes per character.
UTF-32 is a fixed-width encoding: 4 bytes per character.
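The widths listed above can be checked with a short Python sketch. The four sample characters and the widths helper are assumptions picked for this example; each sample is chosen to land in a different UTF-8 length class.

```python
# Sample characters chosen for this sketch, one per UTF-8 length class.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def widths(ch: str) -> tuple:
    """Return the encoded length in bytes as (UTF-8, UTF-16, UTF-32)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),  # "-be" variants add no byte order mark
        len(ch.encode("utf-32-be")),
    )


for ch in samples:
    print(f"U+{ord(ch):04X}", widths(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)
```

Only the last sample lies above U+FFFF, which is why it is the only one that needs 4 bytes in UTF-16.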

The term UCS (on its own, with no suffix) is short for Universal Character Set. It is the character set defined by ISO/IEC 10646, which contains the same characters as the Unicode standard.



© Richard McGrath