Page MenuHomePhabricator

newRevisionSlots argument check breaks stub dumps for svwiki
Closed, ResolvedPublic

Description

Incremental dumps are broken for svwiki, and the regular stubs will be broken too when they run tomorrow:

[07052587822907ecbf7c3687] [no req]   TypeError: Argument 5 passed to MediaWiki\Revision\RevisionStore::newRevisionSlots() must implement interface MediaWiki\Page\PageIdentity, null given, called in /srv/mediawiki/php-1.37.0-wmf.14/includes/Revision/RevisionStore.php on line 1713
Backtrace:
from /srv/mediawiki/php-1.37.0-wmf.14/includes/Revision/RevisionStore.php(1487)
#0 /srv/mediawiki/php-1.37.0-wmf.14/includes/Revision/RevisionStore.php(1713): MediaWiki\Revision\RevisionStore->newRevisionSlots(string, stdClass, array, integer, NULL)
#1 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/XmlDumpWriter.php(346): MediaWiki\Revision\RevisionStore->newRevisionFromRowAndSlots(stdClass, array, integer, NULL)
#2 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(542): XmlDumpWriter->writeRevision(stdClass, array)
#3 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(485): WikiExporter->outputPageStreamBatch(Wikimedia\Rdbms\ResultWrapper, stdClass)
#4 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(306): WikiExporter->dumpPages(string, boolean)
#5 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(204): WikiExporter->dumpFrom(string)
#6 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/includes/BackupDumper.php(318): WikiExporter->revsByRange(integer, integer)
#7 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/dumpBackup.php(90): BackupDumper->dump(integer, integer)
#8 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/doMaintenance.php(108): DumpBackup->execute()
#9 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/dumpBackup.php(144): require_once(string)
#10 /srv/mediawiki/multiversion/MWScript.php(116): require_once(string)
#11 {main}

Event Timeline

ArielGlenn created this task.

I guess there's code in stubs generation that needs to add TypeError to everything else being caught.

Change 705346 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/core@master] prevent PageIdentity checks in RevisionStore from breaking xml dumps

https://gerrit.wikimedia.org/r/705346

The above has been tested and I used it to manually complete the broken adds-changes run from last night/this morning.

This case is interesting and should be investigated further, even after the dumps are back to running.

The page in question, "Stockholm Business School", was restored on July 18, see https://sv.wikipedia.org/w/index.php?title=Stockholm_Business_School&action=history&uselang=en
The entry in the page table for this title:

[email protected](svwiki)> select * from page where page_namespace=0 and page_title = "Stockholm_Business_School";
 --------- ---------------- --------------------------- ------------------- ------------------ ------------- ---------------- ---------------- -------------------- ------------- ---------- -------------------- ----------- 
| page_id | page_namespace | page_title                | page_restrictions | page_is_redirect | page_is_new | page_random    | page_touched   | page_links_updated | page_latest | page_len | page_content_model | page_lang |
 --------- ---------------- --------------------------- ------------------- ------------------ ------------- ---------------- ---------------- -------------------- ------------- ---------- -------------------- ----------- 
| 3872662 |              0 | Stockholm_Business_School |                   |                0 |           0 | 0.207856671786 | 20210718185812 | 20210718185812     |    49457568 |     7354 | wikitext           | NULL      |
 --------- ---------------- --------------------------- ------------------- ------------------ ------------- ---------------- ---------------- -------------------- ------------- ---------- -------------------- ----------- 
1 row in set (0.001 sec)

However there are entries in the archive table for this same title:

[email protected](svwiki)> select * from archive where ar_namespace=0 and  ar_title = "Stockholm_Business_School";
 --------- -------------- --------------------------- --------------- ---------- ---------------- --------------- ----------- ------------ -------- ------------ -------------- --------------------------------- 
| ar_id   | ar_namespace | ar_title                  | ar_comment_id | ar_actor | ar_timestamp   | ar_minor_edit | ar_rev_id | ar_deleted | ar_len | ar_page_id | ar_parent_id | ar_sha1                         |
 --------- -------------- --------------------------- --------------- ---------- ---------------- --------------- ----------- ------------ -------- ------------ -------------- --------------------------------- 
| 5218637 |            0 | Stockholm_Business_School |      23822663 |     1136 | 20210717092556 |             0 |  49453482 |          0 |     78 |    8579229 |            0 | i5aq79phzojjd2mn019iqikye1cns89 |
| 5217451 |            0 | Stockholm_Business_School |      23827934 |     1136 | 20210718100730 |             0 |  49456280 |          0 |     75 |    8579418 |            0 | ilxrsk5zyft2zwhht95kw234yzgxced |
 --------- -------------- --------------------------- --------------- ---------- ---------------- --------------- ----------- ------------ -------- ------------ -------------- --------------------------------- 
2 rows in set (0.001 sec)

When it comes time to try to iterate over the revisions, the wrong page id must have been chosen, as I see it has been written out by XmlDumpWriter:

--<snip--
...
  <page>
    <title>Dubai Millennium</title>
    <ns>0</ns>
    <id>8579417</id>
    <revision>
      <id>49456257</id>
      <timestamp>2021-07-18T09:59:18Z</timestamp>
      <contributor>
        <username>Kuriosatempel</username>
        <id>111748</id>
      </contributor>
      <comment>[[Wikipedia:Automatisk sammanfattning|←]]Skapade sidan med '{{Häst | namn = Dubai Millennium | bild =  | bildtext =  | kön = [[Hingst]] | född = {{Sportdatum|1996|3|2}} | födelseland = Storbritannien | död = {{Hästdöd datum och ålder|2001|04|29|1996|03|02}} | död_land = Storbritannien | färg = [[Hästfärg|Brun]] | tecken =  | ras = [[Engelskt fullblod]] | sport = Galoppsport | aktiv = 1998–2000 | efter = [[Seeking The Gold]] | under = [[Colorado Dancer]] | underefter = Sha...'</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text bytes="10912" id="49550759" />
      <sha1>p0stf8dyt26b3ipsp171jtwt7smlzkh</sha1>
    </revision>
  </page>
  <
    <title>Stockholm Business School</title>
    <ns>0</ns>
    <id>8579418</id>

That page id isn't in the page table of course:

[email protected](svwiki)> select * from page where page_id=8579418;
Empty set (0.001 sec)

Naturally nothing good is going to come from that. So my question is, why are those rows still in the archive table after page restoration, and how does the page id from those rows wind up being the unlucky id chosen for use?

A case like this should trigger a RevisionAccessException, not a TypeError. Looking at newRevisionFromRowAndSlots(), I don't see how $page can end up being null in the call to newRevisionSlots(). But I can just add a check and throw the correct exception anyway.

Change 705368 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Add sanity check to newRevisionFromRowAndSlots.

https://gerrit.wikimedia.org/r/705368

Change 705368 merged by jenkins-bot:

[mediawiki/core@master] Add sanity check to newRevisionFromRowAndSlots.

https://gerrit.wikimedia.org/r/705368

A case like this should trigger a RevisionAccessException, not a TypeError. Looking at newRevisionFromRowAndSlots(), I don't see how $page can end up being null in the call to newRevisionSlots(). But I can just add a check and throw the correct exception anyway.

This is not enough. Run with the new code:

MediaWiki\Revision\RevisionAccessException from line 1695 of /srv/mediawiki/php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Failed to determine page associated with revision 49456280
#0 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/XmlDumpWriter.php(346): MediaWiki\Revision\RevisionStore->newRevisionFromRowAndSlots(Object(stdClass), Array, 0, NULL)
#1 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(542): XmlDumpWriter->writeRevision(Object(stdClass), Array)
#2 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(485): WikiExporter->outputPageStreamBatch(Object(Wikimedia\Rdbms\ResultWrapper), Object(stdClass))
#3 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(306): WikiExporter->dumpPages('rev_id >= 49453...', false)
#4 /srv/mediawiki/php-1.37.0-wmf.14/includes/export/WikiExporter.php(204): WikiExporter->dumpFrom('rev_id >= 49453...')
#5 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/includes/BackupDumper.php(318): WikiExporter->revsByRange(49453805, 49456452)
#6 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/dumpBackup.php(90): BackupDumper->dump(16, 1)
#7 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/doMaintenance.php(108): DumpBackup->execute()
#8 /srv/mediawiki/php-1.37.0-wmf.14/maintenance/dumpBackup.php(144): require_once('/srv/mediawiki/...')
#9 /srv/mediawiki/multiversion/MWScript.php(116): require_once('/srv/mediawiki/...')
#10 {main}

Shall I catch this in XmlDumpWriter (instead of TypeError)?

Yes, XmlDumpWriter (or WikiExporter) should catch RevisionAccessException. I was assuming it was already doing that

Change 705448 had a related patch set uploaded (by Ahmon Dancy; author: Daniel Kinzler):

[mediawiki/core@wmf/1.37.0-wmf.14] Add sanity check to newRevisionFromRowAndSlots.

https://gerrit.wikimedia.org/r/705448

Change 705448 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.14] Add sanity check to newRevisionFromRowAndSlots.

https://gerrit.wikimedia.org/r/705448

Mentioned in SAL (#wikimedia-operations) [2021-07-19T19:14:03Z] <dancy@deploy1002> Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448|Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)

Change 705346 merged by jenkins-bot:

[mediawiki/core@master] prevent PageIdentity checks in RevisionStore from breaking xml dumps

https://gerrit.wikimedia.org/r/705346

Change 705467 had a related patch set uploaded (by Ahmon Dancy; author: ArielGlenn):

[mediawiki/core@wmf/1.37.0-wmf.14] prevent PageIdentity checks in RevisionStore from breaking xml dumps

https://gerrit.wikimedia.org/r/705467

Change 705467 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.14] prevent PageIdentity checks in RevisionStore from breaking xml dumps

https://gerrit.wikimedia.org/r/705467

Mentioned in SAL (#wikimedia-operations) [2021-07-19T20:08:31Z] <dancy@deploy1002> Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467|prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)

daniel claimed this task.

The root cause is still unclear, filed as T291164