Chinese character sets

A Chinese character set (simplified Chinese: 汉字字符集; traditional Chinese: 中文字元集; pinyin: hànzì zìfú jí) is a group of Chinese characters. Since the size of a set is the number of elements in it, describing a Chinese character set also means stating how many characters it contains.[1]

Different Chinese character sets serve different purposes. This article introduces representative character sets from historical dictionaries, modern language standards and information technology.

Dictionaries and lexicon

As the writing system developed, the number of Chinese characters kept growing, as shown by the character counts of successive dictionaries.[2][3] If variants are counted, the total number of characters that have been created reaches well into the hundreds of thousands.

Number of characters in monolingual Chinese dictionaries
Year | Dictionary | Characters
300 BC | Erya | 4300[4]
100 AD | Shuowen Jiezi | 9516
230 | Shenglei | 11520
350 | Zilin | 12824
543 | Yupian | 16917
601 | Qieyun | 12150
732 | Tangyun | 15000
753 | Yunhai jingyuan | 26911[5]
997 | Longkan Shoujian | 26430[6]
1011 | Guangyun | 26194
1039 | Jiyun | 53525
1066 | Leipian | 31319
1615 | Zihui | 33179
1675 | Zhengzitong | 33440
1716 | Kangxi Dictionary | 46933
1915 | Zhonghua Da Zidian | 48200
1968 | Zhongwen Da Cidian | 49888
1989 | Hanyu Da Zidian | 54678
1994 | Zhonghua Zihai | 85568[7]
2017 | Dictionary of Chinese Character Variants | 106330[8]

Number of characters in bilingual Chinese dictionaries
Year | Dictionary | Language | Characters
2003 | ABC Chinese–English Comprehensive Dictionary | English | 9638[a][9]
2003 | Dai Kan-Wa Jiten | Japanese | 50305[10]
2008 | Han-Han Dae Sajeon | Korean | 53667[11]

Modern standards

Because languages keep developing, there is no definite number of modern Chinese characters. However, a reasonable estimate can be made by surveying the character sets of the relevant standard lists and influential dictionaries in the countries and regions where Chinese characters are used.[12]

Polity | Standard | Characters | Date
China | General List of Simplified Chinese Characters | 2235 | 1964[13]
China | List of Commonly Used Characters in Modern Chinese | 7000 | 1988[14]
China | List of Commonly Used Standard Chinese Characters | 8105 | 2013[15]
Hong Kong | List of Graphemes of Commonly-Used Chinese Characters[b] | 4762 | 2012[16]
Hong Kong | Reference Glyphs for Chinese Computer Systems in Hong Kong[c] | | 2016[17]
Taiwan | Chart of Standard Forms of Common National Characters | 4808 | 1982[18]
Taiwan | Chart of Standard Forms of Less-Than-Common National Characters | 6341 | 1983[19]
Taiwan | Chart of Rarely-Used National Characters | 18388 | 2017
Japan | Tōyō kanji | 1850 | 1946[20]
Japan | Jōyō kanji | 2136 | 2010[21]
South Korea | Basic Hanja for Educational Use | 1800 | 2000[22]

Mainland China

Important standards in the People's Republic of China include the List of Frequently Used Characters in Modern Chinese (现代汉语常用字表, with 3,500 characters)[23] and the List of Commonly Used Characters in Modern Chinese (现代汉语通用字表, with 7,000 characters, including the 3,500 characters of the former list).[24] The current standard, however, is the List of Commonly Used Standard Chinese Characters, which was released by the State Council in June 2013 to replace the previous two lists and some other standards. It includes 8,105 characters of the simplified Chinese writing system: 3,500 primary, 3,000 secondary and 1,605 tertiary. In addition, there are 2,574 traditional characters and 1,023 variants.[25]

From 1990 to 1991, the National Leading Group for Teaching Chinese as a Foreign Language and the Chinese Proficiency Test Center of Beijing Language Institute jointly developed the Outline of the Graded Vocabulary and Characters for HSK (汉语水平词汇与汉字等级大纲). The character outline contains 2,905 characters, divided into four grades: 800 Grade A characters, 804 Grade B characters, 601 Grade C characters and 700 Grade D characters.[26]

The most popular modern Chinese character dictionary and word dictionary are the Xinhua Zidian[27] and the Xiandai Hanyu Cidian,[28] respectively. Each includes over 13,000 characters, comprising simplified characters, traditional characters and some variants.

Taiwan

In Taiwan, there are the Chart of Standard Forms of Common National Characters (常用國字標準字體表), with 4,808 characters, and the Chart of Standard Forms of Less-Than-Common National Characters (次常用國字標準字體表), with 6,341 characters. Both lists were released by the Ministry of Education, and together they contain 11,149 characters of the traditional Chinese writing system.

Hong Kong

In Hong Kong, there is the List of Graphemes of Commonly-Used Chinese Characters for elementary and junior secondary education, totalling 4,762 characters. This list was released by the Education Bureau and is influential in educational circles.

Japan

In Japan, there are the jōyō kanji (frequently used Chinese characters; 2,136 characters designated by the Japanese Ministry of Education) and the jinmeiyō kanji (for use in personal names; currently 983 characters).

Korea

In Korea, there are the Basic Hanja for educational use (漢文敎育用基礎漢字, a subset of 1,800 hanja defined in 1972 by a South Korean educational standard) and the Table of Hanja for Personal Name Use (人名用追加漢字表), published by the Supreme Court of Korea in March 1991.[29] The latter list expanded gradually, and by 2015 there were 8,142 hanja permitted for use in Korean names.[30]

Overall estimates

Taking all the character sets mentioned above into consideration, the total number of modern Chinese characters in the world is over 10,000, probably around 15,000.[31][32] This estimate is not unduly rough, given that more than 100,000 Chinese characters exist in total, as mentioned above.

A college graduate who is literate in written Chinese knows between three and four thousand characters. Specialists in classical literature or history, who would often encounter characters no longer in use, are estimated to have a working vocabulary of between 5,000 and 6,000 characters.[33]

Information technology

The following sections will introduce the Chinese character sets of some encoding standards used in information technology, including GB, Big5 and Unicode.

Guobiao

GB stands for Guobiao (‘national standard’), and is the prefix for reference numbers of official standards issued by the People's Republic of China.

The first GB Chinese character encoding standard is GB 2312, released in 1980. It includes 6,763 Chinese characters: 3,755 frequently used ones sorted by pinyin, and the remaining 3,008 sorted by radicals (indexing components). GB 2312 was designed for simplified characters; traditional characters that have been simplified are not covered. GB 2312 is still in use on some computers and on the web, though newer standards with extended character sets, such as GB 13000.1 and GB 18030, have been released.[34] The latest GB encoding is GB 18030, which supports both simplified and traditional Chinese characters and is consistent with Unicode's character set.[35]
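
The character counts given for GB 2312 can be checked against a codec implementation. The following is a minimal sketch, assuming Python's built-in gb2312 and gb18030 codecs and assuming the usual GB 2312 layout, in which Chinese characters occupy rows 16 to 87 of a 94 × 94 grid (first level in rows 16 to 55, second level in rows 56 to 87). The function names are illustrative and not part of any standard.

```python
def count_gb2312_hanzi(first_row: int, last_row: int) -> int:
    """Count positions in the given GB 2312 rows that decode to a character."""
    count = 0
    for row in range(first_row, last_row + 1):
        for cell in range(1, 95):  # each row has 94 cells
            # EUC-CN byte form: add 0xA0 to the row and cell numbers
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                raw.decode("gb2312")
            except UnicodeDecodeError:
                continue  # unassigned position
            count += 1
    return count


def can_encode(char: str, codec: str) -> bool:
    """Return True if the codec has a code for the character."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


level1 = count_gb2312_hanzi(16, 55)  # assumed range of the pinyin-ordered level
level2 = count_gb2312_hanzi(56, 87)  # assumed range of the radical-ordered level
print(level1, level2, level1 + level2)

simplified, traditional = "\u6c49", "\u6f22"  # 汉 and its traditional form 漢
print(can_encode(simplified, "gb2312"))    # simplified form
print(can_encode(traditional, "gb2312"))   # traditional form, GB 2312
print(can_encode(traditional, "gb18030"))  # traditional form, GB 18030
```

If the codec follows the standard exactly, the three counts should correspond to the 3,755 pinyin-ordered characters, the remaining 3,008 characters and the total of 6,763, and only the traditional form under gb2312 should fail to encode.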

Big5

Big5 encoding was designed by five large IT companies in Taiwan in the early 1980s, and has been the de facto standard for representing traditional Chinese on computers ever since. Big5 is widely used in Taiwan, Hong Kong and Macau. The original Big5 standard included 13,053 Chinese characters and none of the simplified characters of Mainland China. Chinese characters in the Big5 character set are arranged in radical order. Extended versions of Big5 include Big-5E and Big5-2003, which add some simplified characters and Hong Kong Cantonese characters.[36]
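
The absence of simplified characters from Big5 can be illustrated with a short sketch. It assumes Python's built-in big5 codec as a stand-in for the original Big5 character set; the helper name unencodable is hypothetical.

```python
def unencodable(text: str, codec: str) -> list:
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for char in text:
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            missing.append(char)
    return missing


traditional = "\u6f22\u5b57"  # 漢字 (traditional spelling of "Chinese characters")
simplified = "\u6c49\u5b57"   # 汉字 (simplified spelling)

print(unencodable(traditional, "big5"))  # characters missing from Big5, if any
print(unencodable(simplified, "big5"))   # the simplified-only character is expected here
```

The character 字 is shared by both writing systems, so only the simplified form 汉 should be reported as missing.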

Unicode

Unicode is the most influential international standard for multilingual character encoding. It is consistent with (or virtually equivalent to) the standard ISO/IEC 10646. In its fixed-width form (UTF-32), Unicode represents each character with a 4-byte code, providing an encoding space large enough to cover the characters of all the world's languages. The Basic Multilingual Plane (BMP) is the core portion of Unicode that can be addressed with 2 bytes, with 2^16 = 65,536 code points for important characters of many languages. There are 27,522 characters in the CJKV (China, Japan, Korea and Vietnam) Ideographs Area, including all the simplified and traditional Chinese characters in GB 2312 and Big5.[37]
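
The difference between the BMP and the rest of the Unicode code space can be seen by inspecting individual characters. This is a minimal sketch using only Python's built-in string methods; the function name describe and the choice of sample characters are illustrative.

```python
def describe(char: str) -> dict:
    """Summarize where a character sits in Unicode and how many bytes it needs."""
    code_point = ord(char)
    return {
        "code_point": f"U+{code_point:04X}",
        "plane": code_point >> 16,  # plane 0 is the BMP (65,536 code points per plane)
        "utf8_bytes": len(char.encode("utf-8")),
        "utf16_bytes": len(char.encode("utf-16-be")),
        "utf32_bytes": len(char.encode("utf-32-be")),
    }


bmp_char = "\u6c49"      # 汉, a character inside the BMP
sip_char = "\U00020000"  # first code point of CJK Unified Ideographs Extension B

print(describe(bmp_char))
print(describe(sip_char))
```

A BMP character fits in one 2-byte UTF-16 unit, whereas a character outside the BMP needs 4 bytes in UTF-16; both take 4 bytes in UTF-32.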

Unicode 15.0 defines a multilingual character set of 149,813 characters, of which 98,682 (about two thirds) are Chinese characters sorted by Kangxi radicals. Even very rarely used characters are available.[38]
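
A rough count of this kind can be reproduced with Python's unicodedata module. The sketch below counts only code points whose name begins with "CJK UNIFIED IDEOGRAPH-"; the result depends on the Unicode version bundled with the interpreter and on which blocks are counted, so it is not expected to reproduce the 98,682 figure exactly. The function name is illustrative.

```python
import sys
import unicodedata


def count_unified_ideographs() -> int:
    """Count code points named as CJK unified ideographs in this interpreter's Unicode data."""
    count = 0
    for code_point in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(code_point), "")  # "" for unnamed code points
        if name.startswith("CJK UNIFIED IDEOGRAPH-"):
            count += 1
    return count


ideographs = count_unified_ideographs()
print(unicodedata.unidata_version, ideographs)

# Share of Chinese characters in Unicode 15.0, using the figures quoted in the text
share = 98_682 / 149_813
print(round(share, 3))
```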

All 5,009 characters of the Hong Kong Supplementary Character Set (HKSCS)[39] are included in Unicode. HKSCS was developed by the Hong Kong government as a collection of locally specific Chinese characters that were not available in early computer character sets.

Unicode is becoming increasingly popular. UTF-8, a Unicode encoding, is reportedly used by 98.1% of all websites as of March 2024. It is widely believed that Unicode will ultimately replace all other information interchange codes and internal codes for digital devices.[40]
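
Moving text from a legacy encoding to UTF-8 amounts to decoding with the old codec and re-encoding. The following is a minimal sketch, assuming Python's built-in big5 and gb18030 codecs; the function name to_utf8 is hypothetical.

```python
def to_utf8(raw: bytes, source_codec: str) -> bytes:
    """Transcode bytes from a legacy encoding to UTF-8."""
    return raw.decode(source_codec).encode("utf-8")


text = "\u6f22\u5b57"  # 漢字

# The same text stored under two different legacy encodings
big5_bytes = text.encode("big5")
gb_bytes = text.encode("gb18030")

utf8_from_big5 = to_utf8(big5_bytes, "big5")
utf8_from_gb = to_utf8(gb_bytes, "gb18030")

# Both conversions should yield identical UTF-8 bytes
print(utf8_from_big5 == utf8_from_gb)
```

The source encoding must be known in advance; decoding with the wrong codec either raises an error or silently produces different characters.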

Notes

  a. ^ Heading characters of 196,373 word entries
  b. ^ Reference for education
  c. ^ Reference for font foundries

References

Citations

  1. ^ Yang 2008, p. 186.
  2. ^ Yang 2008, pp. 186–187.
  3. ^ Li 2013, p. 32.
  4. ^ Wang & Zou 2003, p. 45.
  5. ^ Zhou 2003, p. 73.
  6. ^ Yong & Peng 2008, pp. 198–199.
  7. ^ Wilkinson 2012, p. 46.
  8. ^ 《異體字字典》網路版說明. Dictionary of Chinese Character Variants (in Chinese). Archived from the original on 2009-03-17.
  9. ^ DeFrancis 2003.
  10. ^ 大漢和辞典デジタル版 – 株式会社大修館書店. www.taishukan.co.jp (in Japanese). Retrieved 2023-11-01.
  11. ^ "World's Biggest Chinese Character Dictionary Nearly Complete". Chosun. Retrieved 2023-11-01.
  12. ^ Su 2014, p. 47.
  13. ^ Li 2020, p. 142.
  14. ^ Lunde 2008, p. 80.
  15. ^ 国务院关于公布《通用规范汉字表》的通知 [Notice of the State Council on the Publication of the "General Standard Chinese Character List"] (in Chinese). State Council of the People's Republic of China. 5 June 2013.
  16. ^ Hong Kong Education Bureau 2012.
  17. ^ "Reference Glyphs for Chinese Computer Systems in Hong Kong". Common Chinese Language Interface. Retrieved 25 March 2024.
  18. ^ 常用國字標準字體表 [Chart of Standard Forms of Common National Characters] (in Chinese). Taipei: Zhengzhong shuju. 1983. ISBN 978-9-570-90664-6.
  19. ^ Lunde 2008, pp. 81–82.
  20. ^ Lunde 2008, p. 82.
  21. ^ 改定常用漢字表、30日に内閣告示 閣議で正式決定 [The Amended List of Jōyō Kanji Receives Cabinet Notice on 30th: To Be Officially Confirmed in Cabinet Meeting] (in Japanese). Nihon Keizai Shimbun. 24 November 2010.
  22. ^ Lunde 2008, p. 84.
  23. ^ 现代汉语常用字表 Archived 2016-11-13 at the Wayback Machine [List of Frequently Used Characters in Modern Chinese], Ministry of Education of the People's Republic of China, 26 Jan 1988.
  24. ^ 现代汉语通用字表 Archived 2016-11-23 at the Wayback Machine [List of Commonly Used Characters in Modern Chinese], Ministry of Education of the People's Republic of China, 26 Jan 1988.
  25. ^ "国务院关于公布《通用规范汉字表》的通知" [Notice of the State Council on the publication of the "General Standard Chinese Character List"]. Gov.cn (in Chinese). State Council of the People's Republic of China. 5 June 2013.
  26. ^ Yang 2008, p. 220.
  27. ^ Language Institute 2020.
  28. ^ Language Institute 2016.
  29. ^ National Academy of the Korean Language (1991) Archived March 19, 2016, at the Wayback Machine
  30. ^ "'인명용(人名用)' 한자 5761→8142자로 대폭 확대". The Chosun Ilbo (in Korean). 2014-10-20. Retrieved 2017-08-23.
  31. ^ Su 2014, p. 51.
  32. ^ Yang 2008, p. 192.
  33. ^ Norman 1988, p. 73.
  34. ^ Su 2014, pp. 213–215.
  35. ^ Lunde, Ken (4 August 2022). "The GB 18030-2022 Standard". Medium. Retrieved 7 August 2022.
  36. ^ "[chinese mac] Character Sets". chinesemac.org. Retrieved 2023-11-24.
  37. ^ Unicode Consortium 2023.
  38. ^ "Unicode Statistics".
  39. ^ "OGCIO : Hong Kong Supplementary Character Set (HKSCS)".
  40. ^ "Usage Statistics and Market Share of UTF-8 for Websites, March 2024".

Works cited

  • DeFrancis, John, ed. (2003). ABC Chinese–English Comprehensive Dictionary. University of Hawaiʻi Press. ISBN 978-0-824-82766-3.
  • 常用字字形表:二零零七年重排本:附粤普字音及英文解釋 [Commonly Used Characters Glyph Table: 2007 Rearranged Edition with Cantonese and Mandarin Pronunciations and English Explanations] (in Chinese). Hong Kong Education Bureau. 2012 [2007]. ISBN 978-9-888-12393-3.
  • Language Institute, Chinese Academy of Social Sciences (2016). 现代汉语词典 [Modern Chinese Dictionary] (in Chinese) (7th ed.). Beijing: The Commercial Press. ISBN 978-7-100-12450-8.
  • Language Institute, Chinese Academy of Social Sciences (2020). 新华字典 [Xinhua Dictionary] (in Chinese) (12th ed.). Beijing: The Commercial Press. ISBN 978-7-100-17093-2.
  • Li Dasui (李大遂) (2013). 简明实用汉字学 [Concise and Practical Chinese Characters] (in Chinese) (3rd ed.). Peking University Press. ISBN 978-7-301-21958-4.
  • Li, Yu (2020). The Chinese Writing System in Asia: An Interdisciplinary Perspective. Routledge. ISBN 978-1-138-90731-7.
  • Lunde, Ken (2008). CJKV Information Processing (2nd ed.). O'Reilly. ISBN 978-0-596-51447-1.
  • Norman, Jerry (1988). Chinese. Cambridge University Press. ISBN 978-0-521-29653-3.
  • Su Peicheng (苏培成) (2014). 现代汉字学纲要 [Essentials of Modern Chinese Characters] (in Chinese) (3rd ed.). Beijing: The Commercial Press. ISBN 978-7-100-10440-1.
  • Unicode Standard, Version 15.1.0, South San Francisco, CA: Unicode Consortium, 2023, ISBN 978-1-936-21332-0
  • Wang Ning (王寧) Zou Xiaoli (鄒曉麗) (2003). 工具書 [Reference Books] (in Chinese). Hong Kong: 和平圖書有限公司. ISBN 9-622-38363-7.
  • Wilkinson, Endymion (2012). Chinese History: A New Manual. Harvard-Yenching Institute Monograph Series. Vol. 85. Cambridge, MA: Harvard University Asia Center. ISBN 978-0-674-06715-8.
  • Yang Runlu (杨润陆) (2008). 现代汉字学 [Modern Chinese Characters] (in Chinese). Beijing Normal University Press. ISBN 978-7-303-09437-0.
  • Yong, Heming; Peng, Jing (2008). Chinese Lexicography: A History from 1046 BC to AD 1911. Oxford University Press. ISBN 978-0-191-56167-2.
  • Zhou Youguang (周有光) (2003). The Historical Evolution of Chinese Languages and Scripts 中国语文的时代演进 (in English and Chinese). Translated by Zhang Liqing (张立青). Columbus: National East Asian Languages Resource Center, Ohio State University. ISBN 978-0-87415-349-1.