Unicode 6.2.0

Unicode 6.2.0 is a minor version of the Unicode Standard. This page summarizes the important changes for the Unicode Standard, Version 6.2.0. In the discussion below, Version 6.2.0 may be abbreviated as "Unicode 6.2" or "Version 6.2."

A. Summary

Version 6.2 of the Unicode Standard is a special release dedicated to the early publication of the newly encoded Turkish lira sign. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database. In addition, there are some significant changes to the Unicode algorithms for text segmentation and line breaking, including changes to the line break property to improve line breaking for emoji symbols.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.2:

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of a single character: U+20BA TURKISH LIRA SIGN.

B. Version Information

Version 6.2 of the Unicode Standard consists of the core specification, the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

A complete specification of the contributory files for Unicode 6.2 is found on the page Components for 6.2.0. That page also provides the recommended reference format for Unicode Standard Annexes.

Code Charts

For Unicode 6.2.0 in particular, two additional sets of code chart pages are provided: the delta code charts and the archival code charts.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.2 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.2, see the list of current Updates and Errata.

C. Stability Policy Update

A property value constraint has been added to guarantee that no new characters will be added to the standard with Decomposition_Mapping values whose first character has a non-zero Canonical_Combining_Class. There are four exceptions, which were encoded long ago, prior to Unicode 2.1.
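
The constraint can be checked against the character data that ships with Python's standard `unicodedata` module. The following is a minimal sketch, not an official conformance test. The function name `non_starter_decompositions` is hypothetical, and the scan is restricted to multi-character canonical mappings. That scoping is an assumption of this example, because single-character mappings of combining marks (such as U+0340 to U+0300) also begin with a character whose class is non-zero.

```python
import sys
import unicodedata


def non_starter_decompositions():
    """Return code points whose multi-character canonical Decomposition_Mapping
    begins with a character that has a non-zero Canonical_Combining_Class."""
    found = []
    for cp in range(sys.maxunicode + 1):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters with no mapping and compatibility mappings,
        # which unicodedata marks with a leading tag such as "<compat>".
        if not mapping or mapping.startswith("<"):
            continue
        parts = mapping.split()
        # Assumption of this sketch: only look at mappings of two or more characters.
        if len(parts) < 2:
            continue
        first = chr(int(parts[0], 16))
        if unicodedata.combining(first) != 0:
            found.append(cp)
    return found


if __name__ == "__main__":
    for cp in non_starter_decompositions():
        print(f"U+{cp:04X}")
```

With the Unicode data bundled in current Python releases, the scan is expected to report exactly four code points: U+0344, U+0F73, U+0F75, and U+0F81.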

D. Textual Changes and Character Additions

Textual changes are minimal in this version, and are essentially limited to adding a description for the new Turkish lira sign.

Character Assignment Overview

One new character assignment was made in the BMP for the Unicode Standard, Version 6.2. This addition brings the total number of characters assigned in the standard to 110,117. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)
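
As an illustration of what a BMP assignment means in practice, the sketch below inspects U+20BA TURKISH LIRA SIGN with Python's standard library. The helper name `describe` is hypothetical. The name lookup depends on the Unicode version bundled with the interpreter, so the sketch falls back to a placeholder string when the character is unknown to it.

```python
import unicodedata

LIRA = "\u20BA"  # U+20BA TURKISH LIRA SIGN, the character added in Version 6.2


def describe(ch):
    """Summarize how a single character is encoded."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        # BMP code points fit in the range U+0000..U+FFFF.
        "in_bmp": cp <= 0xFFFF,
        "utf8_hex": ch.encode("utf-8").hex(),
        # Each UTF-16 code unit is two bytes; BMP characters need only one unit.
        "utf16_code_units": len(ch.encode("utf-16-be")) // 2,
        # The name is available only if the bundled data is Unicode 6.2 or later.
        "name": unicodedata.name(ch, "<unknown to this Python's Unicode data>"),
    }


if __name__ == "__main__":
    for key, value in describe(LIRA).items():
        print(f"{key}: {value}")
```

Because the character lies in the BMP, it occupies a single UTF-16 code unit and three bytes in UTF-8 (E2 82 BA).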

E. Conformance Changes

There are no significant conformance changes in the core specification. However, there are minor changes to the text segmentation algorithms in UAX #14 and UAX #29.

F. Unicode Character Database Changes

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.2 can be found in UAX #44, Unicode Character Database.

Segmentation properties (Grapheme_Cluster_Break, Word_Break, Line_Break) have been modified to improve the segmentation of regional indicator symbols. Other modifications have been made to the Line_Break property values for pictographic symbols, to enable better line breaking behavior. A number of small corrections have also been made for numeric, East Asian width, script, and Unihan properties, and one name alias correction has been added.

Starting with Version 6.2, the encoding for the Unicode names list file (NamesList.txt) has been changed from Latin-1 to UTF-8. This change became possible because of an update of the charting tools which use the names list file in the production of the Unicode code charts.
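
Tools that read NamesList.txt from several releases need to pick the encoding by version. A minimal sketch of that choice follows. The function names `names_list_encoding` and `read_names_list`, and the convention of passing the version as a tuple of integers, are assumptions of this example rather than part of any Unicode tooling.

```python
def names_list_encoding(version):
    """Return the text encoding of NamesList.txt for a Unicode version.

    `version` is a tuple such as (6, 2, 0) or (6, 2). Only the major and
    minor components are compared, since the change took effect in 6.2.
    """
    major_minor = tuple(version)[:2]
    return "utf-8" if major_minor >= (6, 2) else "latin-1"


def read_names_list(path, version):
    """Read a local copy of NamesList.txt using the encoding for its version."""
    with open(path, encoding=names_list_encoding(version)) as handle:
        return handle.read()


if __name__ == "__main__":
    for version in [(6, 1, 0), (6, 2, 0)]:
        print(version, names_list_encoding(version))
```

Comparing only the first two components avoids a pitfall of tuple ordering in Python, where `(6, 2)` sorts before `(6, 2, 0)`.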

The U-Source data and glyphs associated with UAX #45 have been added to the Unicode Character Database.

G. Unicode Standard Annex Changes

In Version 6.2, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

Unicode Standard Annex	Changes
UAX #9 Unicode Bidirectional Algorithm	No significant changes in this version.
UAX #11 East Asian Width	A note was added to definition ED3 in Section 4 to explain the East Asian Halfwidth property of U+20A9 WON SIGN.
UAX #14 Unicode Line Breaking Algorithm	The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.)
UAX #15 Unicode Normalization Forms	Additional equivalences were added to the Design Goals.
UAX #24 Unicode Script Property	The text was rewritten substantially to incorporate a fuller explanation of the Script_Extensions property and its property value assignments. A disclaimer was added about the stability of Script and Script_Extensions property values.
UAX #29 Unicode Text Segmentation	The text was modified so that property values and rules prevent breaks between Regional Indicator (RI) characters. (Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.) Regular expressions have been clarified in Table 1b, Combining Character Sequences and Grapheme Clusters.
UAX #31 Unicode Identifier and Pattern Syntax	No significant changes in this version.
UAX #34 Unicode Named Character Sequences	No significant changes in this version.
UAX #38 Unicode Han Database (Unihan)	No significant changes in this version.
UAX #41 Common References for Unicode Standard Annexes	No significant changes in this version.
UAX #42 Unicode Character Database in XML	No significant changes in this version.
UAX #44 Unicode Character Database	The status of Script_Extensions was updated to informative and the type of Bidi_Mirroring_Glyph was updated from String to Miscellaneous. The Unicode_1_Name property was marked as obsolete. A clarification was added regarding change control for normative and informative property values.
UAX #45 U-Source Ideographs	UAX #45 has been updated from a Unicode Technical Report to a Unicode Standard Annex for this version. The data files for UAX #45 have been added to the Unicode Character Database.
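
The Regional Indicator behavior noted for UAX #14 and UAX #29 in the table above can be illustrated with a toy segmenter. This is a deliberately simplified sketch and not an implementation of either algorithm: it breaks between every pair of code points except inside a run of RI characters, and it ignores all other rules (combining marks, CR LF, Hangul, and so on). The function names are hypothetical. The range U+1F1E6..U+1F1FF is the block of regional indicator symbols.

```python
from typing import List

RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z
ZWSP = "\u200B"                        # ZERO WIDTH SPACE


def is_regional_indicator(ch: str) -> bool:
    return RI_FIRST <= ord(ch) <= RI_LAST


def split_keeping_ri_runs(text: str) -> List[str]:
    """Toy segmentation: one chunk per code point, except that adjacent
    Regional Indicator characters are never separated."""
    chunks: List[str] = []
    for ch in text:
        if chunks and is_regional_indicator(ch) and is_regional_indicator(chunks[-1][-1]):
            chunks[-1] += ch  # no break between RI characters
        else:
            chunks.append(ch)
    return chunks


if __name__ == "__main__":
    fr = "\U0001F1EB\U0001F1F7"  # regional indicators F, R
    de = "\U0001F1E9\U0001F1EA"  # regional indicators D, E
    print(len(split_keeping_ri_runs(fr + de)))         # adjacent pairs merge into one chunk
    print(len(split_keeping_ri_runs(fr + ZWSP + de)))  # ZWSP keeps the pairs apart
```

The usage lines show why the note recommends a separator: two adjacent pairs form a single unbroken run of four RI characters, while inserting U+200B between them yields three chunks (pair, ZWSP, pair).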

Unicode® 6.2.0

Released: 2012 September 26 (Announcement)