ICU transition towards ICU 67
Closed, Declined, Public

Description

High level steps:

  • Create co-installable ICU 67 backports for buster
  • Build PHP 7.4 against the ICU 67 backport (and validate which other components using ICU might need a backport)
  • Migrate collation data using the backported PHP
  • When complete, upgrade production systems/images to the build using ICU 67
  • This unblocks the use of Bullseye servers and Bullseye-based images (since Bullseye already uses ICU 67)
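
As a rough aid for the "validate what other components might need a backport" step, the sketch below lists installed packages that depend on the stock ICU library. It assumes buster's ICU package is named libicu63; parse_rdepends is a hypothetical helper name, not part of any existing tooling.

```shell
#!/bin/sh
# Turn `apt-cache rdepends` output (read from stdin) into a sorted,
# de-duplicated list of package names.
# The first two lines (the queried package and "Reverse Depends:") are dropped,
# and the leading indentation or "|" alternative marker is stripped.
parse_rdepends() {
    sed -e '1,2d' -e 's/^[ |]*//' -e '/^$/d' | sort -u
}

# Usage on a buster host (assumption: the stock ICU package is libicu63):
#   apt-cache rdepends --installed libicu63 | parse_rdepends
```

Anything this prints is only a candidate for a rebuild; whether it actually needs one still has to be checked per package.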

Event Timeline

Change 888645 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add component/icu67

https://gerrit.wikimedia.org/r/888645

Change 888645 merged by Muehlenhoff:

[operations/puppet@production] Add component/icu67

https://gerrit.wikimedia.org/r/888645

Mentioned in SAL (#wikimedia-operations) [2023-02-14T15:52:30Z] <moritzm> uploaded src:icu67 67.1-7~wmf1 to buster-wikimedia/component/icu67 T329491

Change 889795 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] package_builder: Add build hook for component/icu67

https://gerrit.wikimedia.org/r/889795

Change 889795 merged by Muehlenhoff:

[operations/puppet@production] package_builder: Add build hook for component/icu67

https://gerrit.wikimedia.org/r/889795

Mentioned in SAL (#wikimedia-operations) [2023-02-21T16:06:57Z] <moritzm> imported libxml2 2.9.4+dfsg1-7+deb10u5+icu67+wmf1 to component/icu67 for buster-wikimedia T329491

The PHP build depends on libxml2, which itself uses ICU by default. I have patched it to build without ICU for component/icu67; it falls back to iconv internally.

Change 893014 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Install pbuilder hook for ICU67 component

https://gerrit.wikimedia.org/r/893014

Change 893014 merged by Muehlenhoff:

[operations/puppet@production] Install pbuilder hook for ICU67 component

https://gerrit.wikimedia.org/r/893014

Mentioned in SAL (#wikimedia-operations) [2023-03-03T11:13:36Z] <moritzm> imported PHP 7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 to component/icu67 (build of PHP against co-installable ICU67) T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T10:02:20Z] <moritzm> imported dh-php 0.35+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T10:26:43Z] <moritzm> imported php-defaults 7.4 76+wmf1~buster2+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T10:38:25Z] <moritzm> imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T10:55:19Z] <moritzm> imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T11:11:15Z] <moritzm> imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T11:31:14Z] <moritzm> imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T11:46:33Z] <moritzm> imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T11:58:44Z] <moritzm> imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T12:19:34Z] <moritzm> imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T12:37:01Z] <moritzm> imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T13:00:08Z] <moritzm> imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T13:11:51Z] <moritzm> imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T13:25:05Z] <moritzm> imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T13:40:00Z] <moritzm> imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T15:35:40Z] <moritzm> imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T16:00:42Z] <moritzm> imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 T329491

Mentioned in SAL (#wikimedia-operations) [2023-03-13T16:20:19Z] <moritzm> imported tideways 5.0.4-2+wmf1+buster1+icu67u1 T329491

These packages have been rebuilt against the ICU67-enabled PHP packages and imported to the component/icu67 component (some packages depend on others, e.g. igbinary on apcu and memcached on igbinary):

  • dh-php 0.35+wmf1+buster1+icu67u1
  • php-defaults 76+wmf1~buster2+icu67u1
  • php-pcov 1.0.6-4+wmf1~buster1+icu67u1
  • php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1
  • php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1
  • php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1
  • php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1
  • php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1
  • php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1
  • php-geoip 1.1.1-7+wmf2+buster1+icu67u1
  • php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1
  • php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1
  • php-excimer 1.0.2-1+wmf2+buster1+icu67u1
  • wikidiff2 1.13.0-1+wmf1+buster1+icu67u1
  • php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1
  • xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1
  • tideways 5.0.4-2+wmf1+buster1+icu67u1

The backports are complete and support Unicode 13 now!

jmm@jmm-mw-icu67:~$ php -r "var_dump(IntlChar::getUnicodeVersion());"
array(4) {
  [0]=>
  int(13)
  [1]=>
  int(0)
  [2]=>
  int(0)
  [3]=>
  int(0)
}

Tested with the CLI; the next step would be to temporarily upgrade a node in deployment-prep for some additional tests:

jmm@jmm-mw-icu67:~$ dpkg --list | grep icu67
ii  libicu67:amd64                       67.1-7~wmf1                                                    amd64        International Components for Unicode
ii  libxml2:amd64                        2.9.4+dfsg1-7+deb10u5+icu67+wmf1                               amd64        GNOME XML library
ii  php-common                           2:76+wmf1~buster2+icu67u1                                      all          Common files for PHP packages
ii  php7.4-cli                           1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        command-line interpreter for the PHP scripting language
ii  php7.4-common                        1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        documentation, examples and common module for PHP
ii  php7.4-intl                          1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        Internationalisation module for PHP
ii  php7.4-json                          1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        JSON module for PHP
ii  php7.4-opcache                       1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        Zend OpCache module for PHP
ii  php7.4-readline                      1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 amd64        readline module for PHP
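
When upgrading a test node, a check along these lines could flag PHP packages that were left on a non-ICU67 build. This is only a sketch: php_not_on_icu67 is a hypothetical helper name, and it assumes every rebuilt package carries "icu67" in its version string, as the listing above suggests.

```shell
#!/bin/sh
# Print the names of PHP packages whose version string lacks the icu67 marker.
# Reads "<package> <version>" pairs, one per line, from stdin.
php_not_on_icu67() {
    awk '$1 ~ /^php/ && $2 !~ /icu67/ { print $1 }'
}

# Usage on a Debian host (empty output means every PHP package is on an icu67 build):
#   dpkg-query -W -f='${Package} ${Version}\n' | php_not_on_icu67
```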

I want to add a bit of context given I'm about to go on PTO, so that others can pick up this work. Thanks @MoritzMuehlenhoff for your work up to this point.

Since the last ICU transition happened, T263437 was solved. We now have a less painful way to do an ICU upgrade. The concept is that we will set up a shellbox deployment using a PHP base image with the new ICU version, then point MediaWiki to use it to write to a secondary category links table.

There are still some doubts about the feasibility of that approach, as some wikis (Commons, for example) have a huge table.

If we have to go the old way (see T264991):

On the day of the announced transition:

  • Over a couple of days, upgrade all production servers to the new packages with the newer collation. Typically, try the canaries one day and everything else the next.
  • Once all servers are updated, run updateCollation.php in parallel, one process per section, using foreachwikiindblist. On large wikis, this can take a week or longer.
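
The per-section run described above could be driven roughly as follows. This is a sketch under assumptions: the section names passed in and the argument order of foreachwikiindblist are not confirmed by this task, and collation_commands is a hypothetical helper that only prints commands, so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Emit one updateCollation.php invocation per database section.
# Assumption: foreachwikiindblist takes the dblist/section name first,
# then the maintenance script to run.
collation_commands() {
    for section in "$@"; do
        printf 'foreachwikiindblist %s updateCollation.php\n' "$section"
    done
}

# Review the generated commands first:
#   collation_commands s1 s3 s4
# Then run them in parallel, one job per section:
#   collation_commands s1 s3 s4 | xargs -P 3 -I{} sh -c '{}'
```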

We can explore the alternative, which would be more elaborate but has no user-noticeable effect, but only if the DBAs think it doesn't involve excessive risk. @Ladsgroup I remember you raising some concerns on the task, especially regarding Commons. Is the strategy of duplicating the categorylinks table feasible? If the answer is positive, I'll try to enumerate the steps for that.

So I looked at categorylinks tables everywhere. These are the top ten biggest ones:

root@clouddb1021:/srv# ls -Ssh sqldata.s*/*/categorylinks.ibd | head
 188G sqldata.s4/commonswiki/categorylinks.ibd
  50G sqldata.s1/enwiki/categorylinks.ibd
  36G sqldata.s3/ruwikinews/categorylinks.ibd
  33G sqldata.s7/arwiki/categorylinks.ibd
  20G sqldata.s3/arzwiki/categorylinks.ibd
  16G sqldata.s6/frwiki/categorylinks.ibd
  12G sqldata.s7/fawiki/categorylinks.ibd
  11G sqldata.s3/ruwiktionary/categorylinks.ibd
 9.5G sqldata.s7/ukwiki/categorylinks.ibd
 8.8G sqldata.s6/ruwiki/categorylinks.ibd

For all but about eight wikis, sure: the tables are so small that the impact will be negligible. For the rest, except Commons, the duplicate will be a bit large, but given that it's going to be temporary, it's fine. For the roughly 200GB commonswiki categorylinks table, it's simply not doable. I did a check and it's half as big as all other wikis combined.

I haven't gotten around to cleaning up categorylinks yet; once that's done it might improve things, but that's years in the future. So using a config switch to do it the new way for all other wikis, but the old way for commonswiki, would be okay from my side.
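
To put a number on how much of the total one wiki accounts for, something like the following could be run over the same files. It is only a sketch: commons_share is a hypothetical helper, and it expects sizes as plain numbers in a single unit, such as the output of du -m, rather than the human-readable sizes from ls -Ssh.

```shell
#!/bin/sh
# Print commonswiki's share of the summed table sizes as a percentage.
# Reads "<size> <path>" lines from stdin, with all sizes in the same unit.
commons_share() {
    awk '{ total += $1; if ($2 ~ /\/commonswiki\//) commons += $1 }
         END { if (total > 0) printf "%.1f%%\n", 100 * commons / total }'
}

# Usage from the directory holding the sqldata.s* trees:
#   du -m sqldata.s*/*/categorylinks.ibd | commons_share
```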

Out of curiosity, did you check how many of the wikis above have a non-standard collation?

Good question:
Commons and arwiki use the default; the rest don't:

	'enwiki' => 'uca-default-u-kn', // T136150
	'ruwiktionary' => 'uca-ru',
	'frwiki' => 'uca-fr-u-kn', // T56680, T146675
	'fawiki' => 'uca-fa', // T139110
	'ruwiktionary' => 'uca-ru', // T54997
	'ukwiki' => 'uca-uk-u-kn', // T47444, T148682
	'ruwiki' => 'uca-ru-u-kn', // T54997, T146675

As long as commons wouldn't need this, I'm happy.

Mentioned in SAL (#wikimedia-operations) [2023-07-12T12:52:26Z] <moritzm> imported wikidiff2 1.14.1-0+wmf1+buster1+icu67u1 to component/icu67 T340087 T329491

A note that I did a test run of sql/xml dumps on deployment-prep with the new icu version and it looks fine to me, though I didn't check for any weird details of category sorting or whatever.

Change 954700 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/docker-images/production-images@master] Add temporary buster-based PHP7.4 icu67 images

https://gerrit.wikimedia.org/r/954700

I've uploaded changes for icu67 PHP 7.4 images for use with a shellbox deployment. I'll also create a temporary shellbox deployment based on those.

Change 954700 merged by Alexandros Kosiaris:

[operations/docker-images/production-images@master] Add temporary buster-based PHP7.4 icu67 images

https://gerrit.wikimedia.org/r/954700

ICU67 images, built and pushed.

Change 955747 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[mediawiki/libs/Shellbox@master] Create an icu67 variant for shellbox

https://gerrit.wikimedia.org/r/955747

Change 955747 merged by jenkins-bot:

[mediawiki/libs/Shellbox@master] Create an icu67 variant for shellbox

https://gerrit.wikimedia.org/r/955747

Change 955772 had a related patch set uploaded (by Alexandros Kosiaris; author: Alexandros Kosiaris):

[operations/deployment-charts@master] icu67: Setup shellbox-icu67

https://gerrit.wikimedia.org/r/955772

JMeybohm subscribed.

Because of time constraints we're going to do the ICU upgrade the "old way" again. Closing this in favor of T345561: Upgrade the MediaWiki servers to ICU 67

Change 955772 abandoned by Alexandros Kosiaris:

[operations/deployment-charts@master] icu67: Setup shellbox-icu67

Reason:

We went the old way after all

https://gerrit.wikimedia.org/r/955772