
Migrate to and read from new store for item terms
Closed, Resolved, Public

Description

Status:

  • Reading from the new store everywhere, up to Q30 million
  • Writing to the new store up to Q87 million
  • Writing to the old store for all items
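Taken together, the status amounts to a routing rule keyed on the numeric item ID. Below is a minimal sketch of that rule. It assumes the thresholds are inclusive upper bounds. The function name `term_store_mode` is hypothetical, and this is not how Wikibase implements the switch: the real one lives in wmf-config/InitialiseSettings.php, per the deploy log entries below.

```shell
# Hypothetical sketch: map a numeric item ID (Q-number without the "Q") to the
# term store behaviour described in the status list. Thresholds assumed inclusive.
READ_NEW_MAX=30000000   # reading from the new store up to Q30 million
WRITE_NEW_MAX=87000000  # writing to the new store up to Q87 million

term_store_mode() {
  id=$1
  if [ "$id" -le "$READ_NEW_MAX" ]; then
    echo "write: old+new, read: new"
  elif [ "$id" -le "$WRITE_NEW_MAX" ]; then
    # Written to both stores, but reads still come from the old store.
    echo "write: old+new, read: old"
  else
    # Above the write threshold only the old store is written and read.
    echo "write: old, read: old"
  fi
}
```

For example, `term_store_mode 60345318` reports that Q60345318 is written to both stores but still read from the old one.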

TODOs:

  • SQL rebuild up to Q86 million (20 million items left)
  • SQL rebuild of Q86 million and above (1-2 million items left)
  • Enable writing to the new store for "all" items (as they are created)
  • Final rebuild to catch up on items between the last migrated ID and the first new item that wrote to the new store
  • Complete the switch to reading from the new store for everything

Details

Repo                         Branch      Lines +/-
operations/mediawiki-config  master      +2 -4
operations/mediawiki-config  master      +4 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +1 -2
operations/mediawiki-config  master      +1 -2
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +2 -2
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +1 -1
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +0 -11
operations/mediawiki-config  master      +2 -2
operations/puppet            production  +1 -1

Related Objects

Status     Assigned
Resolved   Addshore
Resolved   ArielGlenn
Resolved   Addshore
Resolved   Ladsgroup
Invalid    None
Duplicate  Ladsgroup
Resolved   Ladsgroup
Duplicate  Ladsgroup
Resolved   Ladsgroup
Resolved   Ladsgroup
Resolved   Ladsgroup
Resolved   Addshore
Resolved   Addshore
Resolved   Addshore
Resolved   Ladsgroup
Resolved   Addshore
Resolved   Ladsgroup
Resolved   Ladsgroup

Event Timeline


Mentioned in SAL (#wikimedia-operations) [2020-03-04T13:47:31Z] <addshore> START warm cache for db1111 & db1126 for Q25-30 million T219123 (pass 3)

Mentioned in SAL (#wikimedia-operations) [2020-03-04T17:41:01Z] <addshore> stop item term rebuild at Q Q60345318 as I generate more lists (T219123)

I'm generating some new lists to work from, to reduce the time the rest of the migration will take.

Identifying "holes" in the tables, i.e. items where some records exist but related records in other tables have gone missing due to bugs:

addshore@stat1007:~$ analytics-mysql wikidatawiki -e "SELECT DISTINCT wbit_item_id as id FROM wbt_item_terms LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON wbxl_text_id = wbx_id WHERE wbx_text IS NULL ORDER BY wbit_item_id ASC;" -N -B > 4march1740-holes-nulls.list
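The query relies on how a LEFT JOIN behaves. If any link in the wbt_item_terms → wbt_term_in_lang → wbt_text_in_lang → wbt_text chain points at a row that no longer exists, every later column comes back NULL. `wbx_text IS NULL` therefore catches a break anywhere along that chain. The wbt_type join is a side branch and does not affect `wbx_text`.

Below is a minimal sketch of the same anti-join, reduced to a single link and run with awk. It assumes two hypothetical tab-separated dumps: one of existing wbt_term_in_lang ids, and one of (wbit_item_id, wbit_term_in_lang_id) pairs. The function name is also hypothetical.

```shell
# Hypothetical sketch: print each item ID (once, ascending) whose term row
# references a wbt_term_in_lang id that does not exist.
find_dangling_items() {
  # $1: dump of existing wbt_term_in_lang ids (id in column 1)
  # $2: dump of wbt_item_terms rows (wbit_item_id TAB wbit_term_in_lang_id)
  awk -F '\t' '
    FILENAME == ARGV[1] { present[$1] = 1; next }                  # load existing ids
    !($2 in present) && !($1 in seen) { seen[$1] = 1; print $1 }   # dangling reference
  ' "$1" "$2" | sort -n
}
```

`FILENAME == ARGV[1]` is used instead of the common `NR == FNR` idiom so that an empty first file does not cause the second file to be misread as the lookup table.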

Another query identifies all items that have no records yet (i.e. have not been migrated):

addshore@stat1007:~$ cat 4march1740-holes-86000000.sql
SELECT n
FROM
(
SELECT (a.digit + (10 * b.digit) + (100 * c.digit) + (1000 * d.digit) + (10000 * e.digit) + (100000 * f.digit) + (1000000 * g.digit) + (10000000 * h.digit)) as n
    from (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as d
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as e
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as f
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as g
    cross join (select 0 as digit union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as h
) as a
LEFT JOIN
( SELECT DISTINCT wbit_item_id from wbt_item_terms where wbit_item_id BETWEEN 0 AND 86000000 ) as b
ON wbit_item_id = n
WHERE wbit_item_id IS NULL
AND n BETWEEN 0 AND 86000000
;
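The derived table `a` enumerates every integer from 0 to 99,999,999 by cross-joining eight single-digit tables (n = a + 10b + 100c + ... + 10^7 h). The LEFT JOIN ... IS NULL then keeps only the numbers with no wbt_item_terms rows. Because every integer is enumerated, the result also includes IDs that were never allocated or belong to deleted items. It is a superset of the items that actually need migrating.

Below is a small-scale sketch of the same anti-join using `seq` and awk. It assumes a hypothetical file with one migrated item ID per line, and the function name is illustrative only.

```shell
# Hypothetical sketch: print every ID in [0, max] that is absent from the
# migrated-ID list, in ascending order (seq already produces them in order).
find_unmigrated() {
  # $1: highest ID to check; $2: file of migrated IDs, one per line
  seq 0 "$1" | awk '
    FILENAME == ARGV[1] { migrated[$1] = 1; next }   # load migrated IDs
    !($1 in migrated)                                # print IDs with no record
  ' "$2" -
}
```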

After these lists have been migrated, we will also have to deal with items above Q86 million.

When running these queries earlier, I identified roughly 26 million more items to pass over:

addshore@stat1007:~$ wc -l 4march1143-holes-*.list
 19237277 4march1143-holes-87061632.list
  6753614 4march1143-holes-nulls.list
 25990891 total
addshore@stat1007:~$ sort 4march1143-holes-*.list | uniq | wc -l
25990891
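`sort | uniq | wc -l` returns the same 25,990,891 as the combined `wc -l` total. No item ID therefore appears twice, either across the two lists or within one. Below is a sketch of merging such lists into one deduplicated work list and splitting it into fixed-size part files, in the spirit of the `./6/*part*` files counted later in this task. The output layout, the part size and the function name are all assumptions.

```shell
# Hypothetical sketch: merge ID lists, drop duplicates, sort numerically and
# split the result into part files of at most $2 lines each.
build_work_list() {
  # $1: output directory; $2: lines per part; remaining args: input .list files
  outdir=$1; size=$2; shift 2
  mkdir -p "$outdir"
  sort -n -u "$@" > "$outdir/all.list"
  # split names the parts part_aa, part_ab, ... in order.
  split -l "$size" "$outdir/all.list" "$outdir/part_"
}
```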

Mentioned in SAL (#wikimedia-operations) [2020-03-05T08:50:28Z] <addshore> START warm cache for db1111 & db1126 for Q25-30 million T219123 (pass 1 today)

Mentioned in SAL (#wikimedia-operations) [2020-03-05T10:11:06Z] <addshore> START warm cache for db1111 & db1126 for Q25-30 million T219123 (pass 2 today)

Change 577209 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Read from the new term store up to Q30 mill everywhere

https://gerrit.wikimedia.org/r/577209

Change 577211 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Write to the new terms store up to Q 87 million

https://gerrit.wikimedia.org/r/577211

Archive of the old description, as I am now rebuilding ONLY from SQL-generated lists:

Hadoop-based rebuild number 3:

  • 31 January: Q1.8 million
  • 4 February: Q9 million
  • 7 February: Q14.5 million
  • 17 February: Q37.5 million
  • 23 February: Q44.5 million
  • 27 February: Q50 million
  • 2 March: Q54 million
  • 4 March: Q60 million
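The milestones above give a rough average pace. The Hadoop-based pass went from Q1.8 million on 31 January to Q60 million on 4 March 2020. That is 58.2 million IDs in 33 days (2020 is a leap year), or about 1.76 million item IDs per day. The pace was not constant: 7 to 17 February covered 23 million IDs, while 17 to 23 February covered only 7 million.

Below is a small awk sketch of the first-to-last average. The dates were converted by hand to day-of-year numbers, and the function name is hypothetical.

```shell
# Hypothetical sketch: input lines are "<day-of-year> <highest Q-ID in millions>";
# prints the average millions of IDs per day between the first and last line.
progress_rate() {
  awk '
    NR == 1 { d0 = $1; q0 = $2 }
    { d1 = $1; q1 = $2 }
    END { printf "%.2f\n", (q1 - q0) / (d1 - d0) }
  '
}

# 31 Jan = day 31, 4 Feb = 35, ..., 2 Mar = 62, 4 Mar = 64 (2020)
printf '%s\n' "31 1.8" "35 9" "38 14.5" "48 37.5" "54 44.5" "58 50" "62 54" "64 60" | progress_rate
# prints 1.76
```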

Post-Hadoop, SQL-based rebuild number 3:

Rough prediction of when we could finish the bulk of the migration:

[Image: chart of the predicted completion of the bulk of the migration]

Change 577209 merged by jenkins-bot:
[operations/mediawiki-config@master] Read from the new term store up to Q30 mill everywhere

https://gerrit.wikimedia.org/r/577209

Mentioned in SAL (#wikimedia-operations) [2020-03-05T11:04:55Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Reading up to Q30M for the new term store everywhere (was Q25M) warm db1126 & db1111 caches (T219123) (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2020-03-05T11:06:08Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Reading up to Q30M for the new term store everywhere (was Q25M) warm db1126 & db1111 caches (T219123) cache bust (duration: 01m 04s)

Change 577211 merged by jenkins-bot:
[operations/mediawiki-config@master] Write to the new terms store up to Q 87 million

https://gerrit.wikimedia.org/r/577211

Mentioned in SAL (#wikimedia-operations) [2020-03-05T11:19:26Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87 million, was 86 (T219123) (duration: 01m 04s)

Mentioned in SAL (#wikimedia-operations) [2020-03-05T11:20:34Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87 million, was 86 (T219123) cache bust (duration: 01m 03s)

Addshore updated the task description.

Mentioned in SAL (#wikimedia-operations) [2020-03-05T12:52:42Z] <addshore> START warm cache for db1111 & db1126 for Q30-32 million (100k batch selects, 30s sleep) T219123 (pass 1)
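Cache warming of this kind reads the new-store rows for an ID range in fixed-size batches and pauses between batches to limit load on the replicas. Below is a hypothetical sketch of the batching loop. It prints each inclusive [from, to] range instead of issuing the real SELECT. The batch size and sleep mirror the values in the log entry above; the function name and everything else are assumptions.

```shell
# Hypothetical sketch: walk [first, last] in batches, printing each range and
# sleeping between batches (not after the final one).
warm_ranges() {
  # $1: first ID, $2: last ID (inclusive), $3: batch size, $4: seconds between batches
  from=$1; last=$2; size=$3; pause=$4
  while [ "$from" -le "$last" ]; do
    to=$((from + size - 1))
    if [ "$to" -gt "$last" ]; then to=$last; fi
    # A real run would SELECT ... WHERE wbit_item_id BETWEEN $from AND $to here.
    echo "$from $to"
    from=$((to + 1))
    if [ "$from" -le "$last" ]; then sleep "$pause"; fi
  done
}
```

For the Q30-32 million pass this would be called as `warm_ranges 30000000 32000000 100000 30`.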

Just added the "rebuild" of 86-87 million to the lists that are currently running.


addshore@mwmaint1002:~/sqlterms$ cat ./6/*part* | wc -l
950622

It looks like this rebuild pass should finish mid-week.
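The remaining work is the line count of the unprocessed part files, 950,622 IDs here. Turning that into a finish date requires a throughput figure, which this task does not record. The sketch below therefore takes throughput as a parameter and rounds up to whole days. The 200,000 IDs/day used in the example is purely hypothetical, as is the function name.

```shell
# Hypothetical sketch: days needed to process $1 remaining IDs at an assumed
# rate of $2 IDs per day, rounded up (integer ceiling division).
eta_days() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

eta_days 950622 200000   # prints 5
```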

Change 579213 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] Write to the new terms store up to Q 87.5 million

https://gerrit.wikimedia.org/r/579213

Change 579213 merged by jenkins-bot:
[operations/mediawiki-config@master] Write to the new terms store up to Q 87.5 million

https://gerrit.wikimedia.org/r/579213

Mentioned in SAL (#wikimedia-operations) [2020-03-12T08:26:33Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 (T219123) (duration: 01m 12s)

Mentioned in SAL (#wikimedia-operations) [2020-03-12T08:27:45Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Write to new term store up to Q87.5 million, was 87 (T219123) cache bust (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2020-03-12T08:36:02Z] <addshore> start "rebuild" of Q87 -> 87.5 million for T219123

Change 579326 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set term store to WRITE_BOTH for all of Wikidata

https://gerrit.wikimedia.org/r/579326

Change 579326 merged by jenkins-bot:
[operations/mediawiki-config@master] Set term store to WRITE_BOTH for all of Wikidata

https://gerrit.wikimedia.org/r/579326

Mentioned in SAL (#wikimedia-operations) [2020-03-12T18:29:43Z] <tgr@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:579326|Set term store to WRITE_BOTH for all of Wikidata (T219123)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T10:02:20Z] <Amir1> start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=0 --file=15march2217-holes-nulls.list on screen (T219123)
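The maintenance script is fed one ID list per run via `--file`. Below is a hypothetical dry-run driver that walks a directory of part files and prints the command for each one. It reuses only the flags and values that appear in the log entry above. The directory layout and the function name are assumptions. To run for real, the `echo` would be dropped and the parts run one at a time.

```shell
# Hypothetical dry-run sketch: print one rebuildItemTerms.php invocation per
# part file found in the given directory.
rebuild_commands() {
  # $1: directory containing part files (e.g. part_aa, part_ab, ...)
  for part in "$1"/*part*; do
    [ -e "$part" ] || continue   # glob matched nothing
    echo mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php \
      --wiki=wikidatawiki --batch-size=50 --sleep=0 --file="$part"
  done
}
```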

Change 579908 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Revert "Revert "Set term store to WRITE_BOTH for all of Wikidata""

https://gerrit.wikimedia.org/r/579908

Change 579908 merged by jenkins-bot:
[operations/mediawiki-config@master] Revert "Revert "Set term store to WRITE_BOTH for all of Wikidata""

https://gerrit.wikimedia.org/r/579908

Mentioned in SAL (#wikimedia-operations) [2020-03-16T10:43:26Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" (T219123) (duration: 01m 13s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T10:45:09Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" (T219123), take II (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T10:55:44Z] <Amir1> warming up db1026 for up to Q35M for the new term store (T219123)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T11:04:04Z] <Amir1> Warming up InnoDB buffer pool cache in db1111, db1126, db1104, db1092 (T219123)

Change 579913 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q35M

https://gerrit.wikimedia.org/r/579913

Change 579913 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q35M

https://gerrit.wikimedia.org/r/579913

Mentioned in SAL (#wikimedia-operations) [2020-03-16T12:05:32Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913|Set up read new term store up to Q35M (T219123)]] (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T12:09:39Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913|Set up read new term store up to Q35M (T219123)]], take II (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T12:09:53Z] <Amir1> warming up cache for Q35M to Q40M for new term store on db1111, db1126, db1104, db1092 (T219123)

Change 579925 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q40M

https://gerrit.wikimedia.org/r/579925

Change 579925 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q40M

https://gerrit.wikimedia.org/r/579925

Mentioned in SAL (#wikimedia-operations) [2020-03-16T14:15:02Z] <Amir1> ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 87500000 --to-id 87767570 --batch-size=10 --sleep=5 (T219123)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T14:18:54Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q40M (T219123)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T14:22:13Z] <Amir1> warming up cache for Q40M to Q50M for new term store on db1111, db1126, db1104, db1092 (T219123)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T14:22:19Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q40M (T219123)]], take II (duration: 01m 06s)

Change 580082 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q50M

https://gerrit.wikimedia.org/r/580082

Change 580082 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q50M

https://gerrit.wikimedia.org/r/580082

Mentioned in SAL (#wikimedia-operations) [2020-03-16T17:03:36Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q50M (T219123)]] (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T17:06:56Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q50M (T219123)]], take II (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2020-03-16T17:08:12Z] <Amir1> warming up cache for Q50M to Q60M for new term store on db1111, db1126, db1104, db1092 (T219123)

Change 580285 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q60M

https://gerrit.wikimedia.org/r/580285

Change 580285 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q60M

https://gerrit.wikimedia.org/r/580285

Mentioned in SAL (#wikimedia-operations) [2020-03-17T09:54:19Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q60M (T219123)]] (duration: 01m 09s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T09:55:53Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q60M (T219123)]], take II (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T10:03:03Z] <Amir1> warming up cache for Q60M to Q70M for new term store on db1111, db1126, db1104, db1092 (T219123)

Change 580331 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q70M

https://gerrit.wikimedia.org/r/580331

Change 580331 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q70M

https://gerrit.wikimedia.org/r/580331

Mentioned in SAL (#wikimedia-operations) [2020-03-17T14:28:00Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q70M (T219123)]] (duration: 01m 10s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T14:29:39Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q70M (T219123)]], take II (duration: 01m 04s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T17:52:07Z] <Amir1> warming up cache for Q70M to Q80M for new term store on db1111, db1126, db1104, db1092 (T219123)

Change 580397 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set up read new term store up to Q80M

https://gerrit.wikimedia.org/r/580397

Change 580397 merged by jenkins-bot:
[operations/mediawiki-config@master] Set up read new term store up to Q80M

https://gerrit.wikimedia.org/r/580397

Mentioned in SAL (#wikimedia-operations) [2020-03-17T19:00:28Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q80M (T219123)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T19:01:40Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Set up read new term store up to Q80M (T219123)]], take II (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2020-03-17T22:49:25Z] <Amir1> warming up cache for Q80M to Q88M for new term store on db1111, db1126, db1104, db1092 (T219123)

Change 580416 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Read from the new term store everywhere

https://gerrit.wikimedia.org/r/580416

Change 580603 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] labs: Stop writing to the old term store for wikidatawiki

https://gerrit.wikimedia.org/r/580603

Change 580603 merged by jenkins-bot:
[operations/mediawiki-config@master] labs: Stop writing to the old term store for wikidatawiki

https://gerrit.wikimedia.org/r/580603

Mentioned in SAL (#wikimedia-releng) [2020-03-18T03:16:34Z] <Amir1> dropping wb_terms table from wikidatawiki in beta cluster (T219123 T219175 T208425)

Change 580416 merged by jenkins-bot:
[operations/mediawiki-config@master] Read from the new term store everywhere

https://gerrit.wikimedia.org/r/580416

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:12:48Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Read from the new term store everywhere (T219123)]] (duration: 01m 08s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:14:44Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:31:54Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Read from the new term store everywhere (T219123)]] (duration: 01m 07s)

Mentioned in SAL (#wikimedia-operations) [2020-03-18T10:33:11Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925|Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)