Text - Terminology

Character Set

A collection of characters.
Letters, numbers, punctuation marks, arithmetic operators, symbols, etc.

UCS

Universal Character Set.
The complete set of 100,000+ characters contained in the Unicode standard.

Code Point

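The integer that identifies a character within a character set.
Unicode code points are written as U+ followed by hexadecimal digits, e.g. U+0041 for the letter A.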
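
As a small illustration, Python's built-in ord() and chr() convert between a character and its code point. The helper name u_notation is made up for this sketch.

```python
# ord() gives the code point of a single character; chr() is the inverse.
code_point_a = ord("A")            # 65
code_point_euro = ord("\u20ac")    # 8364, the euro sign
round_trip = chr(0x20AC)           # back to the euro sign


def u_notation(ch: str) -> str:
    """Format a character's code point in the usual U+XXXX style (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


print(u_notation("A"))             # U+0041
print(u_notation("\U0001F600"))    # U+1F600
```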

Character Encoding

A mapping between a code point (a number) and a sequence of bytes representing the character.
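
A minimal sketch of the idea: the same code point (U+20AC, the euro sign) maps to a different byte sequence under each encoding. The big-endian codec variants are chosen here only so that no byte order mark is added.

```python
# One code point, three encodings, three different byte sequences.
ch = "\u20ac"  # euro sign, code point U+20AC

utf8_bytes = ch.encode("utf-8")       # E2 82 AC
utf16_bytes = ch.encode("utf-16-be")  # 20 AC
utf32_bytes = ch.encode("utf-32-be")  # 00 00 20 AC

for name, data in [("UTF-8", utf8_bytes), ("UTF-16", utf16_bytes), ("UTF-32", utf32_bytes)]:
    print(name, data.hex(" "))

# Decoding with the matching encoding recovers the original character.
assert utf8_bytes.decode("utf-8") == ch
```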

ASCII

A (fixed-width) 7-bit character encoding covering 128 characters (code points 0 to 127). See Zuga.net - the ASCII table.
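
The 7-bit limit can be sketched as follows: every ASCII code point is below 128, and encoding anything else as ASCII fails. The function name is_ascii is an illustrative choice, not a standard API.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character fits in 7 bits (code point 0..127)."""
    return all(ord(ch) < 128 for ch in text)


ascii_bytes = "Hi".encode("ascii")  # one byte per character

# A character outside the 7-bit range cannot be encoded as ASCII.
try:
    "\u00e9".encode("ascii")        # e with acute accent, U+00E9
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(is_ascii("Hi"), is_ascii("\u00e9"), encode_failed)
```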

UTF-8

8-bit Unicode Transformation Format.
A variable-width character encoding for the Unicode character set.

  • Backwards compatible with ASCII.
  • ASCII characters are encoded in 1 byte.
  • Non-ASCII characters are encoded in 2, 3, or 4 bytes.
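
The points above can be checked with a short sketch. The four sample characters are arbitrary picks, one for each possible encoded length.

```python
# One sample character for each UTF-8 length.
samples = {
    "A": "A",                  # U+0041, ASCII
    "e-acute": "\u00e9",       # U+00E9
    "euro": "\u20ac",          # U+20AC
    "emoji": "\U0001F600",     # U+1F600
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(utf8_lengths)

# Backwards compatibility: ASCII text produces identical bytes in both encodings.
ascii_compatible = "hello".encode("utf-8") == "hello".encode("ascii")
```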

UTF-16

16-bit Unicode Transformation Format.
A variable-width character encoding for the Unicode character set.

  • Characters in the range U+0000 to U+FFFF are encoded in 2 bytes.
  • All other characters are encoded in 4 bytes, as a pair of 2-byte surrogates.
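
A minimal sketch of the 2-byte and 4-byte cases. The helper to_surrogate_pair is a hypothetical name; it follows the standard surrogate arithmetic and is compared against Python's own utf-16-be codec.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into its two UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


two_byte = "\u20ac".encode("utf-16-be")       # 20 AC
four_byte = "\U0001F600".encode("utf-16-be")  # D8 3D DE 00

high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00
```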

UTF-32

32-bit Unicode Transformation Format.
A fixed-width character encoding for the Unicode character set.

  • All characters are encoded in 4 bytes.
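
Because the width is fixed, the encoded length is always four times the number of characters, and each 4-byte group is simply the code point. A short sketch, using the big-endian codec so that no byte order mark is added:

```python
text = "A\u20ac\U0001F600"  # three characters: U+0041, U+20AC, U+1F600
encoded = text.encode("utf-32-be")

# Each character occupies exactly 4 bytes.
byte_count = len(encoded)   # 12

# Reading the bytes back in groups of four recovers the code points.
code_points = [
    int.from_bytes(encoded[i:i + 4], "big")
    for i in range(0, len(encoded), 4)
]
print([f"U+{cp:04X}" for cp in code_points])
```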

Code Page

The term Windows uses for a character encoding. Code pages are usually identified by number, e.g. code page 1252 (Western European) or 65001 (UTF-8).
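
As an illustration, the same byte can decode to different characters depending on the code page. This sketch uses Python's cp1252 and cp437 codecs; the byte value 0xE9 is an arbitrary choice.

```python
raw = bytes([0xE9])  # a single byte with no meaning until a code page is chosen

as_cp1252 = raw.decode("cp1252")  # Windows Western European: e with acute accent (U+00E9)
as_cp437 = raw.decode("cp437")    # original IBM PC code page: Greek capital theta (U+0398)

print(f"U+{ord(as_cp1252):04X}", f"U+{ord(as_cp437):04X}")

# Encoding with the same code page restores the original byte.
assert as_cp1252.encode("cp1252") == raw
```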

Unicode

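An international standard for representing all of the world's text. It assigns a unique code point to every character and defines the UTF-8, UTF-16, and UTF-32 encodings.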

© Richard McGrath