UTF-8

UTF-8
Standard: Unicode Standard
Classification: Unicode Transformation Format, extended ASCII, variable-length encoding
Extends: ASCII
Transforms / Encodes: ISO/IEC 10646 (Unicode)
Preceded by: UTF-1

UTF-8 is a variable-length character encoding used for electronic communication. It is defined by the Unicode Standard, and its name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.[1]

UTF-8 can encode all 1,112,064[lower-alpha 1] valid Unicode code points using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. UTF-8 was designed for backward compatibility with ASCII: the first 128 characters of Unicode correspond one-to-one with ASCII and are encoded using a single byte with the same binary value as in ASCII, so valid ASCII text is also valid UTF-8-encoded Unicode.
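The properties described in the paragraph above can be checked with a minimal Python sketch. The helper name utf8_length and the sample characters are assumptions chosen for illustration and do not come from the Unicode Standard; the sketch relies only on Python's built-in str.encode.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for one character (hypothetical helper)."""
    return len(ch.encode("utf-8"))

# Sample characters chosen for illustration, ordered by increasing code point.
samples = ["A", "é", "€", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")

# ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "Hello, world"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Number of valid code points: 17 planes of 2**16 code points each,
# minus the 2**11 surrogate code points.
valid_code_points = 17 * 2**16 - 2**11
print(valid_code_points)  # 1112064
```

The four samples take one, two, three and four bytes respectively, which illustrates how lower code points receive shorter encodings.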

UTF-8 was designed as a superior alternative to UTF-1, a proposed variable-length encoding with partial ASCII compatibility that lacked some features, including self-synchronization and fully ASCII-compatible handling of characters such as slashes. Ken Thompson and Rob Pike produced the first implementation for the Plan 9 operating system in September 1992.[2][3] This led to its adoption by X/Open as its specification for FSS-UTF,[4] which was first officially presented at USENIX in January 1993[5] and subsequently adopted by the Internet Engineering Task Force (IETF) in RFC 2277 (BCP 18)[6] for future internet standards work, replacing single-byte character sets such as Latin-1 in older RFCs.
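The two features that UTF-1 is said to lack can be illustrated with a short Python sketch. It assumes well-formed UTF-8 input and relies on the fact that every continuation byte has the bit pattern 10xxxxxx; the function names is_continuation and char_start and the sample string are hypothetical, chosen only for this example.

```python
def is_continuation(byte: int) -> bool:
    """True if the byte has the form 10xxxxxx, i.e. it continues a character."""
    return byte & 0b1100_0000 == 0b1000_0000

def char_start(data: bytes, index: int) -> int:
    """Back up from an arbitrary index to the first byte of its character.

    Assumes data is well-formed UTF-8 (hypothetical helper).
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index

# Sample text chosen for illustration: bytes 61 2f c3 a9 e2 82 ac.
data = "a/é€".encode("utf-8")

# Self-synchronization: landing in the middle of the euro sign (index 5)
# leads back to its lead byte at index 4.
print(char_start(data, 5))

# ASCII-safe slashes: bytes of multi-byte characters are all 0x80 or above,
# so the byte 0x2F can only ever be a real "/".
assert all(b >= 0x80 for b in "é€😀".encode("utf-8"))
assert data.count(b"/") == 1
```

Because a reader can resynchronize by skipping at most three continuation bytes, a decoder that starts mid-stream loses at most one character.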

UTF-8 results in fewer internationalization issues than any alternative text encoding,[7][8] and it has been implemented in all modern operating systems (including Microsoft Windows) and in standards such as JSON, where, as is increasingly the case, it is the only permitted form of Unicode.

As of 2023, UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98.0% of all web pages, 99.0% of the top 10,000 pages, and up to 100% for many languages.[9] Virtually all countries and languages have 95% or more use of UTF-8 encodings on the web.

Notes

  1. 17 planes times 2^16 code points per plane, minus 2^11 technically-invalid surrogates.

References

  1. "Chapter 2. General Structure". The Unicode Standard (6.0 ed.). Mountain View, California, US: The Unicode Consortium. ISBN 978-1-936213-01-6.
  2. Pike, Rob (30 April 2003). "UTF-8 history". 
  3. Pike, Rob; Thompson, Ken (1993). "Hello World or Καλημέρα κόσμε or こんにちは 世界" (PDF). Proceedings of the Winter 1993 USENIX Conference. 
  4. "File System Safe UCS - Transformation Format (FSS-UTF) - X/Open Preliminary Specification" (PDF). unicode.org. 
  5. "USENIX Winter 1993 Conference Proceedings". usenix.org. 
  6. Alvestrand, Harald T. (January 1998). IETF Policy on Character Sets and Languages. IETF. doi:10.17487/RFC2277. BCP 18. RFC 2277. 
  7. "UTF-8 support in the Microsoft Game Development Kit (GDK) - Microsoft Game Development Kit". learn.microsoft.com (in English). Retrieved 2023-03-05. By operating in UTF-8, you can ensure maximum compatibility [..] Windows operates natively in UTF-16 (or WCHAR), which requires code page conversions by using MultiByteToWideChar and WideCharToMultiByte. This is a unique burden that Windows places on code that targets multiple platforms. [..] The Microsoft Game Development Kit (GDK) and Windows in general are moving forward to support UTF-8 to remove this unique burden of Windows on code targeting or interchanging with multiple platforms and the web. Also, this results in fewer internationalization issues in apps and games and reduces the test matrix that's required to get it right.
  8. "Encoding Standard". encoding.spec.whatwg.org. Retrieved 2020-04-15.
  9. "Usage Survey of Character Encodings broken down by Ranking". W3Techs (in English). Retrieved 2023-10-01.

See also

  • The Unicode Consortium

External links