ISO/IEC 8859-2

MIME / IANA	ISO-8859-2
Alias(es)	iso-ir-101, csISOLatin2, latin2, l2, IBM1111
Language(s)	(see below)
Standard	ECMA-94:1986, ISO/IEC 8859
Classification	Extended ASCII, ISO/IEC 8859
Extends	US-ASCII
Based on	ISO-8859-1
Other related encoding(s)	Windows-1250, MacCroatian

ISO/IEC 8859-2:1999, Information technology — 8-bit single-byte coded graphic character sets — Part 2: Latin alphabet No. 2, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central^[1] or "Eastern European" languages written in the Latin script. ISO/IEC 8859-2 differs substantially from code page 852 (MS-DOS Latin 2, PC Latin 2), which is also called "Latin-2" in Czech and Slovak regions.^[2] Almost half of the encoding's use is for Polish, for which it is the main legacy encoding; on the web, however, virtually all of that use has been replaced by UTF-8.
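The difference from code page 852 becomes visible as soon as the same text is stored under each encoding. The sketch below uses the `iso8859_2` and `cp852` codecs from Python's standard library; the sample phrase is an arbitrary Polish test string, chosen because it contains all nine Polish diacritic letters.

```python
# A Polish test phrase containing all nine Polish diacritic letters.
polish = "Zażółć gęślą jaźń"

iso_bytes = polish.encode("iso8859_2")  # ISO/IEC 8859-2 (Latin-2)
dos_bytes = polish.encode("cp852")      # code page 852 (PC Latin 2)

print(iso_bytes == dos_bytes)           # False: the diacritics sit at different byte values
print(iso_bytes.decode("cp852"))        # garbled: ISO-8859-2 bytes misread as CP852
print(dos_bytes.decode("iso8859_2"))    # garbled in the other direction
```

Both encodings cover Polish completely, so the text round-trips under each one; the mojibake only appears when a file written in one "Latin-2" is read as the other.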

ISO-8859-2 is the IANA preferred charset name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. As of October 2022, fewer than 0.04% of all web pages use ISO-8859-2.^[3]^[4] Microsoft has assigned code page 28592 (also known as Windows-28592) to ISO-8859-2 in Windows. IBM assigned code page 912 to ISO 8859-2,^[5] until that code page was extended in 1999.^[6] Code page 1111 is similar, but replaces byte B0 ° (degree sign) with U+02DA ˚ (ring above).
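As a minimal illustration, several of the aliases listed for this encoding resolve to the same codec in Python's standard library. Python ships no codec for IBM code page 1111, so `decode_cp1111` below is a hypothetical helper that assumes, following the description above, that byte B0 is the only difference from ISO-8859-2.

```python
import codecs

# Several aliases of ISO-8859-2 resolve to one codec in Python.
ALIASES = ["ISO-8859-2", "latin2", "l2", "iso-ir-101", "csISOLatin2"]
for alias in ALIASES:
    print(alias, "->", codecs.lookup(alias).name)   # each prints "iso8859-2"


def decode_cp1111(data: bytes) -> str:
    """Hypothetical decoder for IBM code page 1111.

    Assumes the only difference from ISO-8859-2 is byte 0xB0, which
    decodes to U+02DA RING ABOVE instead of U+00B0 DEGREE SIGN.
    """
    # In ISO-8859-2 only byte 0xB0 maps to U+00B0, so replacing that
    # character after decoding affects exactly that byte.
    return data.decode("iso8859_2").replace("\u00b0", "\u02da")


print(decode_cp1111(b"20\xb0"))   # 20˚ (ring above), not 20° (degree sign)
```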

Windows-1250 is similar to ISO-8859-2 and contains all of its printable characters and more. However, a few of them are rearranged (unlike Windows-1252, which keeps every printable character of ISO-8859-1 in the same position).
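A short sketch using Python's `iso8859_2` and `cp1250` codecs can list which printable characters are rearranged. It relies on the statement above that every printable ISO-8859-2 character also exists in Windows-1250; if one did not, `ch.encode("cp1250")` would raise `UnicodeEncodeError`.

```python
# Map each rearranged character to (ISO-8859-2 byte, Windows-1250 byte).
moved = {}
for b in range(0xA0, 0x100):                 # printable upper half of ISO-8859-2
    ch = bytes([b]).decode("iso8859_2")
    cp1250_byte = ch.encode("cp1250")[0]      # single-byte code page: one byte out
    if cp1250_byte != b:
        moved[ch] = (b, cp1250_byte)

for ch, (iso, win) in moved.items():
    print(f"{ch}  ISO-8859-2 0x{iso:02X}  ->  Windows-1250 0x{win:02X}")
```

Only bytes in the 0xA0 to 0xBF range move (for example, Š goes from 0xA9 to 0x8A); rows C0 to FF are identical in the two code pages.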

Language coverage

These code values can be used for the following languages:

^ The missing letter Å is officially part of the Finnish alphabet; however, it has no native use and appears only in foreign names.
^ In 2017, the Council for German Orthography officially added a capital ẞ, but its use is not required, as SS can be used instead.
^ This character set unifies Ș and Ț (S and T with comma below) with Ş and Ţ (S and T with cedilla), as did virtually all other character sets, including Microsoft's Windows-1250 and the first version of Unicode. Unicode subsequently disunified them, which complicated the processing of Romanian data: pre-existing data and input methods still contained the older cedilla code points, complicating text searching.^[citation needed]
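The coverage gaps described in these notes can be checked mechanically. The sketch below assumes Python's `iso8859_2` codec; `encodable_in_latin2` and `COMMA_TO_CEDILLA` are hypothetical names, and folding comma-below letters to their cedilla counterparts is one common workaround for the Romanian search problem, not something the standard prescribes.

```python
def encodable_in_latin2(text: str) -> bool:
    """Return True if every character of `text` has a byte in ISO-8859-2."""
    try:
        text.encode("iso8859_2")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical folding table: map the Romanian comma-below letters onto
# the cedilla letters that ISO-8859-2 actually contains.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015e",  # Ș -> Ş
    "\u0219": "\u015f",  # ș -> ş
    "\u021a": "\u0162",  # Ț -> Ţ
    "\u021b": "\u0163",  # ț -> ţ
})

word = "\u0219tiin\u021b\u0103"                   # "știință" typed with comma-below letters
legacy = b"\xbatiin\xfe\xe3".decode("iso8859_2")  # same word from cedilla-era data

print(encodable_in_latin2(word))                   # False
print(word == legacy)                              # False: a naive search misses it
print(word.translate(COMMA_TO_CEDILLA) == legacy)  # True once folded
print(encodable_in_latin2("\u00c5land"))           # False: Å is not in ISO-8859-2
print(encodable_in_latin2("STRA\u1e9eE"))          # False: capital ẞ is missing
print(encodable_in_latin2("STRASSE"))              # True: the SS fallback works
```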

Code page layout

Characters that differ from ISO-8859-1 are followed by their Unicode code point (in hexadecimal).

ISO/IEC 8859-2 (Latin-2)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x
1x
2x	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~
8x
9x
Ax	NBSP	Ą 0104	˘ 02D8	Ł 0141	¤	Ľ 013D	Ś 015A	§	¨	Š 0160	Ş 015E	Ť 0164	Ź 0179	SHY	Ž 017D	Ż 017B
Bx	°	ą 0105	˛ 02DB	ł 0142	´	ľ 013E	ś 015B	ˇ 02C7	¸	š 0161	ş 015F	ť 0165	ź 017A	˝ 02DD	ž 017E	ż 017C
Cx	Ŕ 0154	Á	Â	Ă 0102	Ä	Ĺ 0139	Ć 0106	Ç	Č 010C	É	Ę 0118	Ë	Ě 011A	Í	Î	Ď 010E
Dx	Đ 0110	Ń 0143	Ň 0147	Ó	Ô	Ő 0150	Ö	×	Ř 0158	Ů 016E	Ú	Ű 0170	Ü	Ý	Ţ 0162	ß
Ex	ŕ 0155	á	â	ă 0103	ä	ĺ 013A	ć 0107	ç	č 010D	é	ę 0119	ë	ě 011B	í	î	ď 010F
Fx	đ 0111	ń 0144	ň 0148	ó	ô	ő 0151	ö	÷	ř 0159	ů 016F	ú	ű 0171	ü	ý	ţ 0163	˙ 02D9
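The code point annotations in the table can be regenerated from Python's `iso8859_2` codec as a consistency check. ISO-8859-1 maps each byte to the Unicode code point with the same number (for example, 0xE9 to U+00E9), so an ISO-8859-2 byte differs from ISO-8859-1 exactly when its decoded code point is not equal to the byte value. This is a checking sketch, not part of the standard.

```python
# Collect every byte whose ISO-8859-2 character differs from ISO-8859-1.
diffs = []
for b in range(0x100):
    ch = bytes([b]).decode("iso8859_2")
    if ord(ch) != b:            # ISO-8859-1 would give code point == b
        diffs.append((b, ch))

for b, ch in diffs:
    print(f"0x{b:02X}  {ch}  U+{ord(ch):04X}")
```

Every difference falls in the A0 to FF rows, matching the annotated cells above.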

References

^ "Microsoft Outlook Message Encodings". 10 January 2017.
^ "The Czech and Slovak Character Encoding Mess Explained". luki.sdf-eu.org. Retrieved 2022-02-27.
^ "Usage Statistics and Market Share of ISO-8859-2 for Websites, October 2022". w3techs.com. Retrieved 2022-10-23.
^ "Historical trends in the usage statistics of character encodings for websites, February 2022".
^ "Icu-data/Charset/Data/XML/Ibm-912_P100-1995.XML at main · unicode-org/Icu-data". GitHub.
^ "Icu-data/Charset/Data/Ucm/Ibm-912_P100-1999.ucm at main · unicode-org/Icu-data". GitHub.

External links

ISO/IEC 8859-2:1999
Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
ISO-IR 101 Right-Hand Part of Latin Alphabet No.2 (February 1, 1986)
ISO 8859-2 (Latin 2) Resources
