Text - What is a Windows CodePage?

CodePage is the term used by Microsoft (and others) to describe a character encoding, most often an 8-bit one. Each CodePage is identified by a number, such as 437 or 1252.


A character encoding is a mapping from a code point (a number that represents a character) to a byte, or a sequence of bytes, that encodes the character.
In a simple character encoding (most 8-bit encodings), the number that represents the character is identical to the value of the encoded byte.

E.g.

  The ASCII character encoding:
 
  code point  ->  code unit or byte
  65: 'A'     ->  0x41 (Dec. 65)
  66: 'B'     ->  0x42 (Dec. 66)
  ...
  90: 'Z'     ->  0x5A (Dec. 90)
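
The same mapping can be reproduced with a short Python sketch. It uses Python's standard "ascii" codec; the helper name show_mapping is only an illustration and is not part of any library.

```python
def show_mapping(text, encoding="ascii"):
    """Return (character, code point, encoded bytes) for each character."""
    rows = []
    for ch in text:
        rows.append((ch, ord(ch), ch.encode(encoding)))
    return rows

for ch, code_point, encoded in show_mapping("ABZ"):
    # For ASCII the single encoded byte has the same value as the code point.
    print(f"{code_point}: {ch!r} -> 0x{encoded[0]:02X} (Dec. {encoded[0]})")
```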
  

On a US English system, Windows uses CodePage 1252 (the "ANSI" CodePage) for non-Unicode graphical applications. CP-1252 is based on ASCII and is identical to it in the first 128 code points.
On the same system, Windows uses CodePage 437 (the "OEM" CodePage) for non-Unicode console applications. CP-437 is also based on ASCII and is identical to it in the first 128 code points.
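
As a quick check of the "first 128 code points" claim, the following Python sketch decodes every byte value from 0x00 to 0x7F with Python's built-in "ascii", "cp1252" and "cp437" codecs and compares the results. The byte 0xE9 used at the end is just one arbitrary example of a value above 0x7F.

```python
ascii_bytes = bytes(range(128))  # every byte value from 0x00 to 0x7F

as_ascii = ascii_bytes.decode("ascii")
as_cp1252 = ascii_bytes.decode("cp1252")
as_cp437 = ascii_bytes.decode("cp437")

# All three codecs agree on the first 128 values.
print(as_ascii == as_cp1252 == as_cp437)

# Above 0x7F the two CodePages assign different characters to the same byte.
# ascii() is used so the output is printable on any console.
print(ascii(b"\xe9".decode("cp1252")), ascii(b"\xe9".decode("cp437")))
```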


The default CodePages depend on the current system locale. 1252 is the default for graphical applications in North America and Western Europe.
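
To see which CodePages a particular machine is using, one option is to ask the Win32 API. The sketch below assumes Python with ctypes and calls the kernel32 functions GetACP and GetOEMCP; the wrapper name active_code_pages is hypothetical, and on non-Windows systems it simply returns None.

```python
import ctypes
import sys

def active_code_pages():
    """Return (ANSI, OEM) CodePage numbers on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # GetACP: default CodePage for non-Unicode graphical applications.
    # GetOEMCP: default CodePage for non-Unicode console applications.
    return kernel32.GetACP(), kernel32.GetOEMCP()

print(active_code_pages())
```

On a US English installation this would be expected to report 1252 and 437, but the values depend entirely on how the system is configured.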


In Windows CodePage 1252 for example, the copyright symbol is encoded as the single byte 0xA9 (decimal 169).

In CodePage 437, there is no copyright symbol.
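
The difference between the two CodePages can be demonstrated with Python's built-in "cp1252" and "cp437" codecs. This is only a sketch; the variable names are illustrative.

```python
copyright_sign = "\u00a9"  # the copyright symbol

# CodePage 1252 has the symbol, encoded as the single byte 0xA9.
print(copyright_sign.encode("cp1252"))

# CodePage 437 has no such character, so strict encoding fails.
try:
    copyright_sign.encode("cp437")
    in_cp437 = True
except UnicodeEncodeError:
    in_cp437 = False
print(in_cp437)

# The same byte value 0xA9 stands for a different character in CodePage 437.
print(ascii(b"\xa9".decode("cp437")))
```

This is why text written under one CodePage can show the wrong characters when it is read under another: the bytes are unchanged, but the mapping is not.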


Unicode applications do not need CodePages because Unicode is designed to encode the characters of all languages in a single character set.
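
As a rough illustration, the Python sketch below takes a string that mixes several scripts and tests which encodings can represent it. The sample text and the helper name can_encode are made up for this example. "utf-16-le" is included because Windows Unicode APIs use UTF-16.

```python
# Latin A, copyright sign, Cyrillic Zhe, and a CJK character.
text = "A\u00a9\u0416\u65e5"

def can_encode(s, encoding):
    """Return True if every character of s exists in the given encoding."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for name in ("cp437", "cp1252", "utf-8", "utf-16-le"):
    print(name, can_encode(text, name))
```

Neither CodePage can hold the whole string, while both Unicode encodings can.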




© Richard McGrath