"Invalid DB key" errors on various special pages
Closed, Resolved | Public | Production Error

Description

(aka Fatal exception of type "Wikimedia\Assert\ParameterAssertionException")

https://ig.wikipedia.org/wiki/Ihü_kárírí:Categories

The error message is: Bad value for parameter $dbkey: invalid DB key 'Article nke ntakiri'

Also on ptwiki:

https://pt.wikipedia.org/w/index.php?title=Especial:Categorias&offset=&limit=250

And on English Wikipedia too: https://en.wikipedia.org/w/index.php?title=Special:Categories&offset=Ar

de: https://de.wikipedia.org/w/index.php?title=Spezial:Kategorien&from=Geographie

Now an example from rights log: https://ur.wikipedia.org/w/index.php?title=خاص:نوشتہ&type=rights&user=ثاقب&page=&year=&month=-1&tagfilter=&subtype=

I am not sure it is the same, but looks similar.

Event Timeline

Matanya renamed this task from special:Categories fatals in ig wiki to special:Categories fatals in some wikis. Jan 11 2017, 1:50 PM
Matanya updated the task description.
Matanya renamed this task from special:Categories fatals in some wikis to special:Categories fatals - 1.29.0-wmf.7. Jan 11 2017, 1:59 PM
Matanya updated the task description.
TTO added subscribers: daniel, TTO.
MariaDB [igwiki_p]> select cat_title,cat_pages from category where cat_title like 'Article_nke_ntakiri';
+---------------------+-----------+
| cat_title           | cat_pages |
+---------------------+-----------+
| Article nke ntakiri |        -1 |
| Article_nke_ntakiri |         0 |
+---------------------+-----------+
2 rows in set (0.00 sec)

No categorylinks for this category:

MariaDB [igwiki_p]> select * from categorylinks where cl_to like '% %';
Empty set (0.00 sec)

Same story on ptwiki:

MariaDB [ptwiki_p]> select cat_title,cat_pages from category where cat_title like '% %';
+-------------------------------------------------------------------------+-----------+
| cat_title                                                               | cat_pages |
+-------------------------------------------------------------------------+-----------+
| Canais de televisão aberta                                              |       -43 |
| Arquitetos de Portugal                                                  |        -8 |
| Guerras envolvendo o Vietnam                                            |        -8 |
| Futebolistas do Clube de Regatas do Flamengo                            |       -14 |
| !Artigos com trechos que carecem de notas de rodapé desde maio de 2009  |       -45 |
| Políticos do Irão                                                       |       -15 |
| História do Irão                                                        |        -1 |
+-------------------------------------------------------------------------+-----------+
7 rows in set (0.15 sec)

From the cat_id values, it looks like these entries were created no later than June 2009.

I guess the rows just need to be manually purged from the DB. DELETE FROM category WHERE cat_title LIKE '% %'; would do the trick. It'd be worth running this query over all wikis to ensure this doesn't happen in other places.
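Before running any DELETE, it may be worth listing what it would match. The following is only a sketch of such a dry run: it uses an in-memory SQLite table as a stand-in for the real MariaDB category table, with a reduced set of columns, and the helper name find_space_titles is made up for illustration.

```python
import sqlite3


def find_space_titles(conn):
    """Return (cat_title, cat_pages) rows whose title contains a literal space.

    This is the SELECT counterpart of the proposed DELETE, so the affected
    rows can be inspected first.
    """
    cur = conn.execute(
        "SELECT cat_title, cat_pages FROM category "
        "WHERE cat_title LIKE '% %' ORDER BY cat_title"
    )
    return cur.fetchall()


# Stand-in table (assumption: only the columns quoted in this task).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category ("
    "cat_id INTEGER PRIMARY KEY, cat_title TEXT, cat_pages INTEGER)"
)
conn.executemany(
    "INSERT INTO category (cat_title, cat_pages) VALUES (?, ?)",
    [("Article nke ntakiri", -1), ("Article_nke_ntakiri", 0)],
)

bad_rows = find_space_titles(conn)
print(bad_rows)  # [('Article nke ntakiri', -1)]
```

Only the row with spaces is reported; the properly formed Article_nke_ntakiri row is left alone.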

Once again the ruthless insistence of the TitleValue constructor that it be passed a valid DB key comes back to bite us. @daniel, what do you make of this? Is there any possibility of making this a softer failure?

Matanya renamed this task from special:Categories fatals - 1.29.0-wmf.7 to special:pages fatals - 1.29.0-wmf.7. Jan 11 2017, 4:24 PM
Matanya renamed this task from special:pages fatals - 1.29.0-wmf.7 to special pages fatals - 1.29.0-wmf.7.
Matanya updated the task description.

Same deal, this time in the logging table:

*************************** 26. row ***************************
        log_id: 311
      log_type: rights
    log_action: rights
 log_timestamp: 20051115152102
      log_user: 153
 log_user_text:
 log_namespace: 2
     log_title: AFRAZ_ULQURAISH_
   log_comment:  sysop
    log_params:
   log_deleted: 0
       user_id: 153
     user_name: ثاقب
user_editcount: 4390
       ts_tags: NULL
TTO renamed this task from special pages fatals - 1.29.0-wmf.7 to "Invalid DB key" errors on various special pages. Jan 20 2017, 1:48 PM
TTO triaged this task as High priority.

There are three options here: either the broken rows are manually removed from the DB, we make TitleValue's error handling more lenient, or we do something else to stop these broken titles getting through to TitleValue in the first place.

@daniel, do you have suggestions?

I wouldn't touch the DB manually, but try to fix it in some other way. Who knows what inconsistencies we can create by running that DELETE manually.

The rows can be deleted, but this really needs to go through MediaWiki (a maintenance script), because those titles, if they still exist, should have their statistics regenerated (I'm not sure what that involves: a full reparse, or something else). For consistency's sake, it probably cannot be done by just running DELETE FROM category WHERE cat_title LIKE '% %';

I can promise you those old category entries with spaces in them are not needed. The category entries will have been regenerated since 2009.

Thinking about this some more, we could stop TitleValue throwing exceptions, or whatever, but that's just passing the problem of bogus DB data onto someone else to deal with in a few years. We really need a proper maintenance script to clean up the various tables. I'll see what I can come up with.

TitleValue should generally throw an exception when it gets invalid data. That's what exceptions are for. Calling code can catch that exception at an appropriate level. We can consider throwing a more specific exception to make error handling easier.
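To illustrate the "more specific exception, caught at an appropriate level" idea, here is a rough Python sketch. It is not MediaWiki code: InvalidDbKeyError, make_title_value and list_titles are hypothetical names, and the validity check (no whitespace, no leading or trailing underscore, not empty) is a simplified assumption about what a valid DB key looks like.

```python
import re

# Assumed rule for this sketch: a DB key must not contain whitespace and
# must not start or end with an underscore.
_BAD_DB_KEY = re.compile(r"^_|[ \r\n\t]|_$")


class InvalidDbKeyError(ValueError):
    """Hypothetical specific exception for a malformed DB key."""


def make_title_value(namespace, dbkey):
    """Stand-in for a strict constructor: raises instead of guessing."""
    if dbkey == "" or _BAD_DB_KEY.search(dbkey):
        raise InvalidDbKeyError(f"invalid DB key {dbkey!r}")
    return (namespace, dbkey)


def list_titles(rows):
    """Caller-level handling: skip bad rows but remember them for cleanup."""
    titles, skipped = [], []
    for namespace, dbkey in rows:
        try:
            titles.append(make_title_value(namespace, dbkey))
        except InvalidDbKeyError:
            skipped.append(dbkey)
    return titles, skipped


rows = [
    (14, "Article nke ntakiri"),
    (14, "Article_nke_ntakiri"),
    (2, "AFRAZ_ULQURAISH_"),
]
titles, skipped = list_titles(rows)
print(titles)   # [(14, 'Article_nke_ntakiri')]
print(skipped)  # ['Article nke ntakiri', 'AFRAZ_ULQURAISH_']
```

The constructor stays strict, while the listing code decides that one bad row should not take down the whole page; the skipped keys remain available for logging or a later cleanup.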

In any case, we should not shoot the messenger, nor ignore bad data. The broken data should be removed from the database (and be regenerated), and we should find out what code put the invalid data into the database in the first place. Ideally, the code that puts these values into the DB would also use TitleValue. That would guarantee consistency.

As a temporary fix, the code that reads the rows from the database and constructs TitleValues from them could apply normalization (replacing spaces with underscores). This may however introduce duplicates. I don't know if that would be a problem for the code in question.
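The duplicate concern can be made concrete with a small sketch. This is plain Python, not the actual reading code; normalize_db_key and find_collisions are illustrative names, and normalization here means only the space-to-underscore replacement mentioned above.

```python
from collections import defaultdict


def normalize_db_key(title):
    """Temporary-fix normalization: replace spaces with underscores."""
    return title.replace(" ", "_")


def find_collisions(titles):
    """Map each normalized key to its source titles, keeping only keys
    that more than one stored title normalizes to."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_db_key(title)].append(title)
    return {key: group for key, group in groups.items() if len(group) > 1}


stored = ["Article nke ntakiri", "Article_nke_ntakiri", "História do Irão"]
collisions = find_collisions(stored)
print(collisions)
# {'Article_nke_ntakiri': ['Article nke ntakiri', 'Article_nke_ntakiri']}
```

With the igwiki rows quoted earlier, both stored titles collapse to the same key, so a listing built on normalized keys would show that category twice unless the reader de-duplicates.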

Uh.... why are there negative values in cat_pages anyway? That should never happen. Perhaps it's related to the wrong key being used? Addition uses one key, removal another, so one count grows, and the other shrinks?
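The key-mismatch hypothesis can be illustrated with a toy simulation. This is purely a sketch of the suspected mechanism, not a reconstruction of any actual MediaWiki code path; simulate, to_db_key and as_is are invented names.

```python
from collections import Counter


def simulate(events, add_key, remove_key):
    """Apply add/remove events to per-category counters, using possibly
    different key functions for the two directions."""
    counts = Counter()
    for action, title in events:
        if action == "add":
            counts[add_key(title)] += 1
        else:
            counts[remove_key(title)] -= 1
    return dict(counts)


def to_db_key(title):
    return title.replace(" ", "_")


def as_is(title):
    return title


events = [("add", "História do Irão"), ("remove", "História do Irão")]

# Hypothesised bug: additions use the DB key, removals use the raw title.
mismatched = simulate(events, add_key=to_db_key, remove_key=as_is)
# Consistent keys: the counter returns to zero.
consistent = simulate(events, add_key=to_db_key, remove_key=to_db_key)

print(mismatched)  # {'História_do_Irão': 1, 'História do Irão': -1}
print(consistent)  # {'História_do_Irão': 0}
```

Under that assumption, the spaced row ends up negative while the underscored row is left too high, which would be consistent with the negative cat_pages values seen in the query results.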

This really seems like a symptom of a larger problem. Hiding the problem by making TitleValue more lenient is not the solution.

I'm working on a maintenance script to deal with these invalid DB keys.

We could also do with a script that handles categories with negative cat_pages values. Literally all it'd have to do is $dbr->select( 'category', 'cat_id', [ 'cat_pages < 0' ], __METHOD__ ) then Category::newFromID( $row->cat_id )->refreshCounts() on each row.
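As a rough picture of what such a script would do, here is a sketch against an in-memory SQLite database. The tables are cut down to the columns needed, refresh_negative_counts is an invented name, and the simple recount from categorylinks is only a stand-in for whatever Category::refreshCounts() actually does.

```python
import sqlite3


def refresh_negative_counts(conn):
    """Recount cat_pages from categorylinks for every category whose stored
    count is negative. Returns the number of categories refreshed."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT cat_id FROM category WHERE cat_pages < 0"
        )
    ]
    for cat_id in ids:
        conn.execute(
            "UPDATE category SET cat_pages = ("
            "  SELECT COUNT(*) FROM categorylinks"
            "  WHERE cl_to = category.cat_title"
            ") WHERE cat_id = ?",
            (cat_id,),
        )
    conn.commit()
    return len(ids)


# Stand-in schema and data (assumption: minimal columns only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (
        cat_id INTEGER PRIMARY KEY, cat_title TEXT UNIQUE, cat_pages INTEGER
    );
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    """
)
conn.executemany(
    "INSERT INTO category (cat_title, cat_pages) VALUES (?, ?)",
    [("Políticos_do_Irão", -15), ("História_do_Irão", 0)],
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [(1, "Políticos_do_Irão"), (2, "Políticos_do_Irão")],
)

fixed = refresh_negative_counts(conn)
result = conn.execute(
    "SELECT cat_title, cat_pages FROM category ORDER BY cat_id"
).fetchall()
print(fixed)   # 1
print(result)  # [('Políticos_do_Irão', 2), ('História_do_Irão', 0)]
```

Only rows with a negative count are touched, which mirrors the select-then-refresh loop described above.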

Change 333486 had a related patch set uploaded (by TTO):
New maintenance script to clean up rows with invalid DB keys

https://gerrit.wikimedia.org/r/333486

We could also do with a script that handles categories with negative cat_pages values.

See T85696 and its linked tasks for more about this.

TTO updated the task description.
TTO added a subscriber: matmarex.

If we can review that maintenance script and get it run, we will be able to get rid of these ongoing errors.

How can I get all categories (with red) in ru.wiktionary?

I understand this week is going to be very busy for the MediaWiki-Platform-Team due to the datacenter juggling, but I'd really appreciate a review on the associated patch before too long. This bad data continues to cause problems (particularly on Wikisource, apparently) so the sooner it can be removed, the better.

Change 333486 merged by jenkins-bot:
[mediawiki/core@master] New maintenance script to clean up rows with invalid DB keys

https://gerrit.wikimedia.org/r/333486

This has been fixed by running a maintenance script to remove invalid data from the database. If this error is seen on any more special pages, please reopen the task.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:11 PM