Unicode® 6.2.0
Released: 2012 September 26 (Announcement)
Version 6.2.0 has been superseded by the latest version of the Unicode Standard.
Unicode 6.2.0 is a minor version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.2.0. In the discussion below, Version 6.2.0 may be abbreviated as "Unicode 6.2" or "Version 6.2."
A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Unicode Character Database Changes
G. Unicode Standard Annex Changes
Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking, including changes to the line break property to improve line breaking for emoji symbols.
For detailed property changes see Section F. Unicode Character Database Changes.
Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.2:
This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of a single character: U+20BA TURKISH LIRA SIGN.
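As an illustration of the new character, the following Python sketch looks up U+20BA using the standard `unicodedata` module. The helper name `describe` is hypothetical, and the sketch assumes the interpreter bundles Unicode data from Version 6.2 or later; on older data the name lookup falls back to a placeholder string.

```python
import unicodedata

LIRA = "\u20ba"  # U+20BA, the character added in Version 6.2


def describe(ch):
    """Return (code point label, character name, General_Category, UTF-8 bytes)."""
    return (
        f"U+{ord(ch):04X}",
        # The second argument is returned if this Python's data predates the character.
        unicodedata.name(ch, "<not in this Python's Unicode data>"),
        unicodedata.category(ch),
        " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
    )


print(describe(LIRA))
print("In the BMP:", ord(LIRA) <= 0xFFFF)
```

`unicodedata.unidata_version` reports which version of the Unicode Character Database the interpreter was built with.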
Version 6.2 of the Unicode Standard consists of the core specification,
the delta and archival code charts for this version, the Unicode Standard Annexes, and
the Unicode Character Database (UCD).
The core specification gives the general principles,
requirements for conformance, and guidelines for implementers. The
code charts show representative glyphs for all the Unicode
characters. The Unicode Standard Annexes supply detailed normative
information about particular aspects of the standard. The Unicode
Character Database supplies normative and informative data for
implementers to allow them to implement the Unicode Standard.
Version 6.2.0 of the Unicode Standard should be referenced as:
The Unicode Consortium. The Unicode Standard, Version 6.2.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-07-8)
http://www.unicode.org/versions/Unicode6.2.0/
A complete specification of the contributory files for Unicode 6.2 is found on the page Components for 6.2.0. That page also provides the recommended reference format for Unicode Standard Annexes.
The navigation bar on the left of this page provides links to the core specification, both as a single file and as individual chapters, and to the appendices.
Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.2 of the Unicode Character Database.
Several sets of code charts are available. They serve different
purposes:
- The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.
For Unicode 6.2.0 in particular, two additional sets of code chart pages are provided:
- A set of delta code charts showing the block in which the Turkish lira sign was added for Unicode 6.2.0. That character is visually highlighted in the relevant chart. These delta code charts also include blocks which contain significant glyph changes to fix errata.
- A set of archival code charts that represent the entire set of characters, names and representative glyphs at the time of publication of Unicode 6.2.0.
The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.
Errata incorporated into Unicode 6.2 are listed by date in
a separate table. For corrigenda and errata after the release of Unicode 6.2, see the list of current
Updates and Errata.
A property value constraint has been added to guarantee that no new characters will be added to the standard with Decomposition_Mapping values whose first character has a non-zero Canonical_Combining_Class. There are four exceptions, which were encoded long ago, prior to Unicode 2.1.
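The condition named in this constraint can be tested for a single character with Python's `unicodedata` module. This is a minimal sketch of the literal wording only: it does not attempt to reproduce the exact scope of the stability policy, and the helper names are hypothetical. U+0344 COMBINING GREEK DIALYTIKA TONOS is used as a sample input; treating it as one of the four exceptions is an assumption, since this page does not list them.

```python
import unicodedata


def mapping_code_points(ch):
    """Code points of ch's Decomposition_Mapping, dropping any <tag> prefix."""
    fields = unicodedata.decomposition(ch).split()
    return [int(field, 16) for field in fields if not field.startswith("<")]


def starts_with_nonstarter(ch):
    """True if the first character of ch's mapping has a non-zero Canonical_Combining_Class."""
    cps = mapping_code_points(ch)
    return bool(cps) and unicodedata.combining(chr(cps[0])) != 0


print(starts_with_nonstarter("\u0344"))  # mapping begins with U+0308, a combining mark
print(starts_with_nonstarter("\u00e9"))  # mapping begins with U+0065, a base letter
```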
Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.
Textual changes are minimal in this version, and are essentially limited to adding a description for the new Turkish lira sign.
Character Assignment Overview
One new character assignment was made in the BMP for the Unicode Standard, Version 6.2. This addition brings the total number of characters assigned in the standard to 110,117. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)
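The traditional count described above can be approximated by tallying code points by General_Category. The sketch below uses Python's `unicodedata` module and assumes that excluding the categories Cn (unassigned and noncharacters), Cs (surrogates), Co (private use), and Cc (controls) matches that definition. The result reflects whatever Unicode version the interpreter bundles, so on a current Python it will be larger than 110,117.

```python
import sys
import unicodedata

# General_Category values left out of the traditional count (an assumption
# about how the description above maps onto categories).
EXCLUDED = {"Cn", "Cs", "Co", "Cc"}


def traditional_count():
    """Count graphic and format characters in this interpreter's Unicode data."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in EXCLUDED
    )


print(unicodedata.unidata_version, traditional_count())
```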
No new blocks are defined in Version 6.2.
There are no significant conformance changes in the core specification. However, there are minor changes to the text segmentation algorithms in UAX #14 and UAX #29.
The detailed listing of all changes to the contributory data files of the Unicode Character Database
for Version 6.2 can be found in
UAX #44, Unicode Character Database.
Segmentation properties (Grapheme_Cluster_Break, Word_Break, Line_Break) have been modified to improve the segmentation of regional indicator symbols. Other modifications have been made to the Line_Break property values for pictographic symbols, to enable better line breaking behavior. A number of small corrections have also been made for numeric, East Asian width, script, and Unihan properties, and one name alias correction has been added.
Starting with Version 6.2, the encoding for the Unicode names list file (NamesList.txt) has been changed from Latin-1 to UTF-8. This change became possible because of an update of the charting tools which use the names list file in the production of the Unicode code charts.
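A tool that reads NamesList.txt from several releases needs to pick the encoding by version. The following Python sketch shows one way to do that; the function names and the version-string format ("major.minor.update") are assumptions for illustration, not part of the Unicode Character Database.

```python
def names_list_encoding(version):
    """Return the text encoding of NamesList.txt for a version such as '6.2.0'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Latin-1 before Version 6.2, UTF-8 from Version 6.2 onward.
    return "utf-8" if (major, minor) >= (6, 2) else "latin-1"


def read_names_list(path, version):
    """Read a local copy of NamesList.txt using the encoding for its version."""
    with open(path, encoding=names_list_encoding(version)) as handle:
        return handle.read()


print(names_list_encoding("6.1.0"), names_list_encoding("6.2.0"))
```

Comparing `(major, minor)` as a tuple of integers avoids the string-comparison pitfall where "10.0" would sort before "6.2".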
The U-Source data and glyphs associated with UAX #45 have been added to the Unicode Character Database.
The Script_Extensions property was changed from provisional to informative.
In Version 6.2, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section
of each UAX, linked directly from the following list of UAXes.
Unicode Standard Annex | Changes |
UAX #9 Unicode Bidirectional Algorithm | No significant changes in this version. |
UAX #11 East Asian Width | A note was added to definition ED3 in Section 4 to explain the East Asian Halfwidth property of U+20A9 WON SIGN. |
UAX #14 Unicode Line Breaking Algorithm | The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.) |
UAX #15 Unicode Normalization Forms | Additional equivalences were added to the Design Goals. |
UAX #24 Unicode Script Property | The text was rewritten substantially to incorporate a fuller explanation of the Script_Extensions property and its property value assignments. A disclaimer was added about the stability of Script and Script_Extensions property values. |
UAX #29 Unicode Text Segmentation | The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.) Regular expressions have been clarified in Table 1b, Combining Character Sequences and Grapheme Clusters. |
UAX #31 Unicode Identifier and Pattern Syntax | No significant changes in this version. |
UAX #34 Unicode Named Character Sequences | No significant changes in this version. |
UAX #38 Unicode Han Database (Unihan) | No significant changes in this version. |
UAX #41 Common References for Unicode Standard Annexes | No significant changes in this version. |
UAX #42 Unicode Character Database in XML | No significant changes in this version. |
UAX #44 Unicode Character Database | The status of Script_Extensions was updated to informative and the type of Bidi_Mirroring was updated from String to Miscellaneous. The Unicode_1_Name property was marked as obsolete. A clarification was added regarding change control for normative and informative property values. |
UAX #45 U-Source Ideographs | UAX #45 has been updated from a Unicode Technical Report to a Unicode Standard Annex for this version. The data files for UAX #45 have been added to the Unicode Character Database. |
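The UAX #14 and UAX #29 entries above recommend separating longer runs of Regional Indicator characters with another character such as U+200B ZWSP. The Python sketch below shows one way a producer of text might do that, inserting a ZWSP after every second RI in a run. The function names are hypothetical, and the RI code point range U+1F1E6..U+1F1FF is supplied here rather than taken from this page.

```python
ZWSP = "\u200b"
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_regional_indicator(ch):
    return RI_FIRST <= ord(ch) <= RI_LAST


def regional_indicators(letters):
    """Map ASCII letters such as 'TR' to the corresponding RI characters."""
    return "".join(chr(RI_FIRST + ord(c) - ord("A")) for c in letters.upper())


def separate_ri_pairs(text):
    """Insert ZWSP so that no run of RI characters is longer than two."""
    out = []
    run = 0  # number of RI characters seen since the last break in the run
    for ch in text:
        if is_regional_indicator(ch):
            if run == 2:
                out.append(ZWSP)
                run = 0
            run += 1
        else:
            run = 0
        out.append(ch)
    return "".join(out)


pairs = regional_indicators("TR") + regional_indicators("DE")
print([f"U+{ord(c):04X}" for c in separate_ri_pairs(pairs)])
```

This only prepares the text; whether a given renderer or segmenter then treats each pair as a unit depends on its implementation of the Version 6.2 rules.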