
Node.js 10 changes encoding for at least one Georgian character
Closed, Declined (Public)

Description

The MCS diff tests usually run on Node.js 6.
After temporarily switching to Node.js 10 on one of our development machines we noticed that the first character in one canonical title pointing to Georgian wiki (kawiki) is now different. This patch reverts the change that was made accidentally with Node.js 10.

Which version is the correct one? Should we be concerned about that?



Event Timeline

bearND renamed this task from "Node.js 10 changes encoding for at least Georgian character" to "Node.js 10 changes encoding for at least one Georgian character". Feb 7 2019, 5:53 PM
bearND added projects: I18n, Services.
bearND updated the task description.
bearND added a subscriber: mobrovac.
Pchelolo subscribed.

Interesting!

So, %u10DE is 'GEORGIAN LETTER PAR' and it has been supported since Unicode v1.1.0. %u1C9E is 'GEORGIAN MTAVRULI CAPITAL LETTER PAR' and it has only been supported since Unicode v11.0, which was released in June 2018.

So apparently Node.js has updated the version of Unicode it uses: in Node 6, the first-letter capitalization done during title normalization didn't do anything for this character, whereas with the newer version of Unicode it now correctly capitalizes the letter.

I'm not quite sure what we can do here. There will always be a disparity between the Unicode versions used by PHP and Node, and I don't think we can do anything about it. Adding exceptions for all the corner cases would not be manageable; I think we should just not worry and hope these situations are rare enough.
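The difference described above can be checked directly on any given runtime. The following is a minimal sketch, not the actual MCS normalization code: `capitalizeFirst` and `hex` are hypothetical helpers written for illustration, and the result depends on the Unicode data bundled with the Node.js build that runs it.

```javascript
// Hypothetical helper that mimics "capitalize the first letter" title
// normalization. This is an illustration, not the real MCS code.
function capitalizeFirst(title) {
  if (title.length === 0) {
    return title;
  }
  // Work on the first code point, not the first UTF-16 code unit.
  const first = String.fromCodePoint(title.codePointAt(0));
  return first.toUpperCase() + title.slice(first.length);
}

// Hypothetical helper: render each code point of a string as U+XXXX.
function hex(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  ).join(' ');
}

const PAR = '\u10DE'; // GEORGIAN LETTER PAR
const normalized = capitalizeFirst(PAR + 'abc');
const firstChar = String.fromCodePoint(normalized.codePointAt(0));

// process.versions.unicode is only present when Node.js is built with ICU.
console.log('Unicode version:', process.versions.unicode || 'unknown');
console.log(hex(PAR), '->', hex(firstChar));
```

On a runtime whose Unicode data predates 11.0, the letter should come back unchanged (U+10DE); on a newer one it should become U+1C9E.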

Probably the Unicode version should be considered part of the content version string, so that running on Node 10 is considered a different "API version" than running on Node 6, even if there were no other code changes?

Yes, we should bump the version number when we switch to Node 10. Do you propose to add something else Unicode-specific there as well?
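One way to read the suggestion above is to derive part of the version string from the runtime's Unicode version. The sketch below is only an illustration of that idea: `buildContentVersion`, the `1.0.0` base version and the `+unicode.` suffix format are all assumptions, not something the service is known to use.

```javascript
// Hypothetical: append the runtime's Unicode version to a base content
// version, so that output produced under different Unicode data is
// distinguishable even when the code itself has not changed.
function buildContentVersion(baseVersion, unicodeVersion) {
  // Fall back to a fixed marker when the runtime does not report a version.
  const unicode = unicodeVersion || 'unknown';
  return `${baseVersion}+unicode.${unicode}`;
}

// process.versions.unicode is only present when Node.js is built with ICU.
const contentVersion = buildContentVersion('1.0.0', process.versions.unicode);
console.log(contentVersion);
```

Simply bumping the version number by hand when switching Node.js versions, as proposed in the reply, achieves the same separation without depending on the runtime.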

The first one is correct. See the Unicode 11.0.0 changelog:

Casing Issues
Casing behavior for the Georgian script has changed significantly. There is a new set of Mtavruli capital letters (U+1C90..U+1CBA, U+1CBD..U+1CBF) in Unicode 11.0, with case mappings to the existing Mkhedruli letters (U+10D0..U+10FA, U+10FD..U+10FF). In prior versions of the Unicode Standard, Mkhedruli Georgian was considered a monocameral (non-casing) script, and the Mkhedruli Georgian letters were gc=Lo. Starting with Version 11.0, those Mkhedruli Georgian letters are now gc=Ll, and have uppercase mappings to Mtavruli Georgian capital letters. This change will have major implications for Georgian implementations, including changes for input methods, fonts, casing, and string matching. Existing implementations have treated Mtavruli headlines and other uses for textual emphasis as a text style, so there will also be significant issues for document conversion and upgrade.

Another complication for Georgian is that the primary orthography does not use titlecasing, and the Mkhedruli Georgian letters do not have titlecase mappings to Mtavruli letters. This is unique among bicameral systems in the Unicode Standard, so casing implementations should be prepared for this exception.
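The two ranges quoted above are parallel, so each Mtavruli capital sits at a fixed offset (0x0BC0) from its Mkhedruli counterpart. As a sketch of what the string-matching implication could look like in practice, the hypothetical `foldMtavruli` helper below maps Mtavruli letters back to Mkhedruli using only those ranges, so the result does not depend on the runtime's Unicode version. It is an illustration, not a change proposed in this task.

```javascript
// Offset between the Mtavruli capitals (U+1C90..) and the Mkhedruli
// letters (U+10D0..), taken from the ranges in the Unicode 11.0 notes.
const MTAVRULI_OFFSET = 0x1C90 - 0x10D0; // 0x0BC0

// True for code points in U+1C90..U+1CBA or U+1CBD..U+1CBF.
function isMtavruli(cp) {
  return (cp >= 0x1C90 && cp <= 0x1CBA) || (cp >= 0x1CBD && cp <= 0x1CBF);
}

// Hypothetical helper: replace every Mtavruli capital with its Mkhedruli
// counterpart and leave all other characters untouched.
function foldMtavruli(str) {
  return Array.from(str, (ch) => {
    const cp = ch.codePointAt(0);
    return isMtavruli(cp) ? String.fromCodePoint(cp - MTAVRULI_OFFSET) : ch;
  }).join('');
}

// Titles that differ only in the Georgian "case" of a letter compare
// equal after folding, whichever runtime produced them.
console.log(foldMtavruli('\u1C9E') === foldMtavruli('\u10DE'));
```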

Is this ticket actionable? We have to move to Node 10 soon regardless.

Ah, I see some comments above about bumping version numbers, at least.

This ship has sailed. Declining as non-actionable.