php - Why does everyone use latin1?
Someone said utf8 is a variable-length encoding that uses 1 to 3 bytes per character.
So why still use latin1? If the same text stored in utf8 also takes 1 byte per character, utf8 has the advantage that it can adapt to a larger character set.
- Is there a hidden reason everyone uses latin1?
- What are the disadvantages of using utf8 vs. latin1?
ISO 8859-1 is the (at least de facto) default character encoding of multiple standards such as HTTP (at least for textual contents):
"When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value."
The reason ISO 8859-1 was chosen is probably that it is a superset of US-ASCII, the fundamental character set of internet-based technologies. And since the World Wide Web was invented and developed at CERN in Geneva, Switzerland, that might be the reason the characters of Western European languages were chosen for the remaining 128 characters.
When the Unicode standard was developed, the character set of ISO 8859-1 was used as the base of the Unicode character set (the Universal Character Set), so that the first 256 characters are identical to those of ISO 8859-1. This was probably done because of the importance of ISO 8859-1 for the Web as the standard character encoding of many technologies.
Now, to discuss the advantages of ISO 8859-1 as opposed to UTF-8, we need to look at the underlying character sets and the encoding schemes used to encode these characters:
ISO 8859-1 contains 256 characters, and the code point of each character is mapped directly onto its binary representation. So the decimal value 123 is encoded as the binary value 01111011.
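The direct mapping is easy to check. Below is a minimal sketch in Python (picked only for brevity; the question is tagged php), assuming Python's built-in `latin-1` codec stands in for ISO 8859-1. The helper name `latin1_bits` is made up for illustration.

```python
def latin1_bits(ch: str) -> str:
    """Return the single ISO 8859-1 byte of ch as an 8-bit binary string."""
    encoded = ch.encode("latin-1")  # raises UnicodeEncodeError above U+00FF
    return format(encoded[0], "08b")

# Code point 123 is "{"; its byte is the same number written in binary.
print(latin1_bits(chr(123)))  # 01111011

# Every byte value 0..255 decodes to the code point with the same number,
# which is also why the first 256 Unicode code points match ISO 8859-1.
assert all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
```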
UTF-8 uses a prefixed variable-length encoding scheme in which the prefix of the first byte indicates the length of the encoded word. UTF-8 is used to encode the characters of the Universal Character Set; restricted to the range U+0000–U+10FFFF, it can encode 1,112,064 characters (1,114,112 code points minus the 2,048 surrogates). The first 128 characters require 1 byte, the characters in 0x80–0x7FF require 2 bytes, the characters in 0x800–0xFFFF require 3 bytes, and the characters in 0x10000–0x10FFFF require 4 bytes.
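The byte counts per range can be sketched as follows, again in Python and assuming the upper limit of U+10FFFF that RFC 3629 sets for UTF-8. `utf8_length` is a hypothetical helper, not part of any library; the loop compares it against Python's own `utf-8` codec and prints the first byte so the length prefix is visible.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("beyond U+10FFFF, not encodable in UTF-8")

# One sample character from each range: A, e-acute, euro sign, an emoji.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_length(ord(ch))
    # The leading bits of the first byte announce the sequence length:
    # 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
    print(f"U+{ord(ch):04X}", len(encoded), format(encoded[0], "08b"))
```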
So the difference is the range of encodable characters on the one hand and the length of the encoded word on the other.
So the choice of the “right” character encoding depends on your needs: if you only need the characters of ISO 8859-1 (or US-ASCII as a subset of it), use ISO 8859-1, since it requires just 1 byte for each character, as opposed to UTF-8, where the characters 128–255 require 2 bytes. And if you need more or other characters than those in ISO 8859-1, use UTF-8.
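To make the trade-off concrete, here is a small Python sketch that compares the storage size of the same text in both encodings. `encoded_sizes` is an illustrative name, and the sample strings are arbitrary.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts of text in both encodings; latin-1 is None if not encodable."""
    try:
        latin1 = len(text.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # text contains characters outside ISO 8859-1
    return {"latin-1": latin1, "utf-8": len(text.encode("utf-8"))}

print(encoded_sizes("hello"))       # {'latin-1': 5, 'utf-8': 5}
print(encoded_sizes("caf\u00e9"))   # {'latin-1': 4, 'utf-8': 5}
print(encoded_sizes("10 \u20ac"))   # {'latin-1': None, 'utf-8': 6}
```

Pure ASCII text costs the same in both encodings, text with characters in 128–255 costs one extra byte per such character in UTF-8, and anything outside ISO 8859-1 (the euro sign here) can only be stored as UTF-8.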