php - Why does everyone use latin1?


Someone said UTF-8 uses a variable-length encoding of 1 to 3 bytes.

So why does everyone still use latin1? If the same text stored in UTF-8 also takes 1 byte per character, UTF-8 has the advantage that it can adapt to a larger character set.

  • Is there a hidden reason everyone uses latin1?
  • What are the disadvantages of using UTF-8 vs. latin1?

ISO 8859-1 is (at least de facto) the default character encoding of multiple standards, such as HTTP (at least for textual content):

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value.
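
The defaulting rule quoted above can be sketched in a few lines of Python. This is only an illustration: `effective_charset` is a hypothetical helper, not part of any HTTP library, and it borrows the standard library's `email.message.Message` purely to parse the header value.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver would assume for a Content-Type value.

    Hypothetical helper: applies the quoted rule that "text" subtypes
    without an explicit charset parameter default to ISO-8859-1.
    """
    msg = Message()
    msg["Content-Type"] = content_type

    declared = msg.get_content_charset()  # lowercased, or None if absent
    if declared is not None:
        return declared
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return None  # the quoted rule says nothing about non-text types


print(effective_charset("text/html"))
print(effective_charset("text/html; charset=UTF-8"))
print(effective_charset("application/octet-stream"))
```

Only an explicit `charset` parameter overrides the default, which is why data in any other character set has to be labeled.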

The reason ISO 8859-1 was chosen is probably that it's a superset of US-ASCII, which is the fundamental character set of Internet-based technologies. And since the World Wide Web was invented and developed at CERN in Geneva, Switzerland, that might be the reason the characters of Western European languages were chosen for the 128 remaining characters.

When the Unicode standard was developed, the character set of ISO 8859-1 was used as the base of the Unicode character set (the Universal Character Set), so that the first 256 characters are identical to those of ISO 8859-1. This was probably done because of the importance of ISO 8859-1 for the Web as the standard character encoding of many technologies.
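
The overlap between the two character sets is easy to check. The sketch below assumes Python's built-in `latin-1` codec as a stand-in for ISO 8859-1 and decodes every possible byte value:

```python
# Every byte value 0..255, decoded as ISO 8859-1 (Python codec name "latin-1").
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# The Unicode code point of each decoded character equals the original byte value.
code_points = [ord(ch) for ch in decoded]
identical = code_points == list(range(256))

print(identical)
print(decoded[0xE9], hex(ord(decoded[0xE9])))
```

Under that assumption, byte 0xE9 decodes to the character at code point U+00E9, and the same holds for all 256 values.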

Now, to discuss the advantages of ISO 8859-1 as opposed to UTF-8, we need to look at the underlying character sets and the encoding schemes used to encode these characters:

  • ISO 8859-1 contains 256 characters, where the code point of each character is mapped directly onto its binary representation. So 123₁₀ is encoded as 01111011₂.

  • UTF-8 uses a prefixed variable-length encoding scheme, where the prefix indicates the word length. UTF-8 is used to encode the characters of the Universal Character Set, and the encoding scheme covers all 1,114,112 code points of that set (0x0–0x10FFFF). The first 128 characters require 1 byte, the characters in 0x80–0x7FF require 2 bytes, the characters in 0x800–0xFFFF require 3 bytes, and the characters in 0x10000–0x10FFFF require 4 bytes.
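
To make the two schemes concrete, here is a small Python sketch that encodes a few sample characters both ways. The helper `utf8_length` and the sample characters are illustrative choices, not taken from the original answer; Python's `latin-1` codec stands in for ISO 8859-1.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for a code point, per the ranges above."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# "{" is code point 123, the example used in the ISO 8859-1 bullet.
samples = ["{", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    try:
        latin1 = ch.encode("latin-1").hex()
    except UnicodeEncodeError:
        latin1 = "not encodable"  # outside the 256 characters of ISO 8859-1
    # The computed length should match what the real encoder produces.
    assert len(utf8) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X}  latin-1: {latin1:<14} utf-8: {utf8.hex()} ({len(utf8)} bytes)")

# ISO 8859-1 stores the code point itself as the byte value.
print(format("{".encode("latin-1")[0], "08b"))
```

The last line prints the bit pattern of the single ISO 8859-1 byte for code point 123, matching the binary value given in the first bullet.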

So the difference is the range of encodable characters on the one hand and the length of the encoded word on the other hand.

So the choice of the “right” character encoding depends on your needs: if you only need the characters of ISO 8859-1 (or US-ASCII as a subset of it), use ISO 8859-1, as it requires only 1 byte for each character, as opposed to UTF-8, where the characters 128–255 require 2 bytes. And if you need more or other characters than those in ISO 8859-1, use UTF-8.
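
That decision rule can be sketched as a short Python function. `pick_encoding` is a hypothetical helper written for this illustration, and the sample strings are assumptions, not data from the question.

```python
def pick_encoding(text: str) -> str:
    """Return "latin-1" if every character fits in ISO 8859-1, else "utf-8"."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return "utf-8"
    return "latin-1"


western = "caf\u00e9 cr\u00e8me"   # only characters below code point 256
mixed = "price: \u20ac5"           # the euro sign is not in ISO 8859-1

for text in (western, mixed):
    choice = pick_encoding(text)
    print(choice, len(text.encode(choice)), "bytes;",
          "utf-8 would need", len(text.encode("utf-8")), "bytes")
```

For the first string, ISO 8859-1 needs 10 bytes where UTF-8 needs 12, because the two accented letters take 2 bytes each in UTF-8. The second string cannot be stored in ISO 8859-1 at all.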

