
Renames getting stuck on mediawiki.org (Sept 13, 2016)
Closed, Resolved, Public

Description

Renames are getting stuck on mediawiki.org again. This seems to have happened quite often recently.

The renames that are stuck are:

Anzeana → CraftByte https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/CraftByte
As.timbro → Adel.S https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Adel.S
BritishRailways → VulcanXH558 https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/VulcanXH558
Danilopeixoto8 → DaniloPeixoto https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/DaniloPeixoto
Diunilomei → Kelly72913 https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Kelly72913

Event Timeline

Restricted Application added a subscriber: Aklapper.
MarcoAurelio triaged this task as Unbreak Now! priority. Sep 14 2016, 9:09 AM

If possible, please investigate why renames fail so frequently on mediawiki.org.

2016-09-14 11:46:36 [V9k2ygpAMF0AAFgRNtsAAAAG] mw1301 mediawikiwiki 1.28.0-wmf.19 exception ERROR: [V9k2ygpAMF0AAFgRNtsAAAAG] /rpc/RunJobs.php?wiki=mediawikiwiki&type=LocalRenameUserJob&maxtime=60&maxmem=300M   DBTransactionError from line 295 of /srv/mediawiki/php-1.28.0-wmf.19/includes/db/loadbalancer/LBFactory.php: LocalRenameJob::run: transaction round 'LocalRenameUserJob::run' still running. {"exception_id":"V9k2ygpAMF0AAFgRNtsAAAAG"} 
[Exception DBTransactionError] (/srv/mediawiki/php-1.28.0-wmf.19/includes/db/loadbalancer/LBFactory.php:295) LocalRenameJob::run: transaction round 'LocalRenameUserJob::run' still running.
  #0 /srv/mediawiki/php-1.28.0-wmf.19/extensions/CentralAuth/includes/LocalRenameJob/LocalRenameJob.php(36): LBFactory->commitMasterChanges(string)
  #1 /srv/mediawiki/php-1.28.0-wmf.19/includes/jobqueue/JobRunner.php(274): LocalRenameJob->run()
  #2 /srv/mediawiki/php-1.28.0-wmf.19/includes/jobqueue/JobRunner.php(184): JobRunner->executeJob(LocalRenameUserJob, LBFactoryMulti, BufferingStatsdDataFactory, integer)
  #3 /srv/mediawiki/rpc/RunJobs.php(47): JobRunner->run(array)
  #4 {main}
Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2016-09-14T20:28:39Z] <hashar@tin> Synchronized php-1.28.0-wmf.19/extensions/CentralAuth/includes/LocalRenameJob/LocalRenameJob.php: Fix LocalRenameJob transaction owner to match JobRunner T143328 T145596 (duration: 00m 48s)
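The trace and the SAL entry point to a naming mismatch. The job runner appears to have opened a transaction round named after the concrete job class (`LocalRenameUserJob::run`), while `LocalRenameJob::run` tried to commit under its own, parent-class name. The Python sketch below is a toy model of that ownership check, written only to illustrate the error message above. `ToyLBFactory`, `execute_job` and the `use_concrete_class_name` flag are hypothetical and are not MediaWiki's PHP code.

```python
class DBTransactionError(Exception):
    """Raised when a caller commits a transaction round it does not own."""


class ToyLBFactory:
    """Hypothetical, simplified stand-in for a load-balancer factory that
    remembers which caller opened the current transaction round."""

    def __init__(self):
        self.trx_round_id = None  # name of the round owner, or None

    def begin_master_changes(self, fname):
        self.trx_round_id = fname

    def commit_master_changes(self, fname):
        # Only the owner of an open round may commit it.
        if self.trx_round_id is not None and self.trx_round_id != fname:
            raise DBTransactionError(
                f"{fname}: transaction round '{self.trx_round_id}' still running."
            )
        self.trx_round_id = None


class LocalRenameJob:
    def __init__(self, lbf, use_concrete_class_name):
        self.lbf = lbf
        self.use_concrete_class_name = use_concrete_class_name

    def run(self):
        if self.use_concrete_class_name:
            # After the fix (assumed): derive the owner from the runtime class.
            owner = f"{type(self).__name__}::run"
        else:
            # Before the fix (assumed): a name tied to the parent class.
            owner = "LocalRenameJob::run"
        self.lbf.commit_master_changes(owner)


class LocalRenameUserJob(LocalRenameJob):
    """The concrete job type that appears in the trace."""


def execute_job(job):
    """Mimic the runner: open a round named after the job's class, then run it."""
    job.lbf.begin_master_changes(f"{type(job).__name__}::run")
    try:
        job.run()
        return "committed"
    except DBTransactionError as exc:
        return str(exc)


print(execute_job(LocalRenameUserJob(ToyLBFactory(), use_concrete_class_name=False)))
print(execute_job(LocalRenameUserJob(ToyLBFactory(), use_concrete_class_name=True)))
```

The first call reproduces the logged message. The second commits because the owner name now matches the round the runner opened.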

The backport to 1.28.0-wmf.19 has been deployed, so the issue should be solved on https://www.mediawiki.org/

I have looked at the GlobalRename status on:

Anzeana → CraftByte https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/CraftByte
As.timbro → Adel.S https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Adel.S
BritishRailways → VulcanXH558 https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/VulcanXH558
Danilopeixoto8 → DaniloPeixoto https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/DaniloPeixoto
Diunilomei → Kelly72913 https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Kelly72913

They are stuck in the Queued state. This sounds similar to T137973 or T119696#1862039. AFAIK that needs a manual fix :(

[centralauth]> select ru_oldname, ru_newname, count(*) from renameuser_status group by ru_oldname, ru_newname;
+-----------------+---------------+----------+
| ru_oldname      | ru_newname    | count(*) |
+-----------------+---------------+----------+
| Anzeana         | CraftByte     |        5 |
| As.timbro       | Adel.S        |        3 |
| BritishRailways | VulcanXH558   |        2 |
| Danilopeixoto8  | DaniloPeixoto |        3 |
| Diunilomei      | Kelly72913    |       29 |
+-----------------+---------------+----------+
5 rows in set (0.00 sec)
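The query above counts rows per (old name, new name) pair. This sketch assumes that `renameuser_status` keeps one row per wiki where a local rename is still pending and drops that row once the wiki finishes. The empty result later in this task is consistent with that assumption. Under it, a nonzero count is the number of wikis still outstanding for that rename. The Python sketch replays the same GROUP BY against an in-memory SQLite table. The reduced schema, the `examplewiki` rows and the helper names are hypothetical.

```python
import sqlite3

# Reduced, hypothetical schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE renameuser_status (
    ru_oldname TEXT NOT NULL,
    ru_newname TEXT NOT NULL,
    ru_wiki    TEXT NOT NULL
)
"""

# Same grouping as the query run against centralauth in this task.
STUCK_QUERY = """
SELECT ru_oldname, ru_newname, COUNT(*)
FROM renameuser_status
GROUP BY ru_oldname, ru_newname
ORDER BY ru_oldname
"""


def make_db(rows):
    """Create an in-memory table holding (oldname, newname, wiki) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO renameuser_status VALUES (?, ?, ?)", rows)
    return conn


def stuck_renames(conn):
    """Return (old, new, wikis_still_pending) for every rename still listed."""
    return conn.execute(STUCK_QUERY).fetchall()


def mark_wiki_done(conn, old, new, wiki):
    """Assumed behaviour: a wiki's row is deleted once its local rename completes."""
    conn.execute(
        "DELETE FROM renameuser_status"
        " WHERE ru_oldname = ? AND ru_newname = ? AND ru_wiki = ?",
        (old, new, wiki),
    )


# Hypothetical data: one rename still pending on two wikis.
conn = make_db([
    ("OldName", "NewName", "mediawikiwiki"),
    ("OldName", "NewName", "examplewiki"),
])
print(stuck_renames(conn))  # [('OldName', 'NewName', 2)]
for wiki in ("mediawikiwiki", "examplewiki"):
    mark_wiki_done(conn, "OldName", "NewName", wiki)
print(stuck_renames(conn))  # []
```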

@Legoktm knows about it. Maybe we can reopen T137973 :(


Given the patch, I am fairly sure new renames will work again from now on, so I guess this is no longer blocking the train deployment.

Change 310912 had a related patch set uploaded (by Hashar):
Add a script to easily fix stuck global renames

https://gerrit.wikimedia.org/r/310912

Change 310912 merged by jenkins-bot:
Add a script to easily fix stuck global renames

https://gerrit.wikimedia.org/r/310912

Running them, e.g.:

mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mediawikiwiki --logwiki=metawiki Anzeana CraftByte
Using K6ka as the renamer.
from: Anzeana
to: CraftByte
renamer: K6ka
movepages: 1
suppressredirects: 1
reason: Per [[:w:en:Special:Permalink/739243477|:en:WP:CHUS]]

Starting to run job...
Done!
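With several renames stuck, the single invocation above has to be repeated for each pair. The minimal Python sketch below assumes every stuck rename takes the same `--wiki=mediawikiwiki --logwiki=metawiki` arguments as the example. It only prints the command lines and does not run them. `fix_commands` is a hypothetical helper, not part of CentralAuth.

```python
import shlex

# Path and flags copied from the invocation shown above.
SCRIPT = "extensions/CentralAuth/maintenance/fixStuckGlobalRename.php"

# (old name, new name) pairs listed as stuck in this task.
STUCK = [
    ("Anzeana", "CraftByte"),
    ("As.timbro", "Adel.S"),
    ("BritishRailways", "VulcanXH558"),
    ("Danilopeixoto8", "DaniloPeixoto"),
    ("Diunilomei", "Kelly72913"),
]


def fix_commands(pairs, wiki="mediawikiwiki", logwiki="metawiki"):
    """Hypothetical helper: build one mwscript command line per stuck rename."""
    return [
        " ".join([
            "mwscript", SCRIPT,
            f"--wiki={wiki}", f"--logwiki={logwiki}",
            shlex.quote(old), shlex.quote(new),  # keep names with spaces intact
        ])
        for old, new in pairs
    ]


for cmd in fix_commands(STUCK):
    print(cmd)
```

`shlex.quote` leaves the names above unchanged but wraps a username containing spaces in single quotes, so the shell still passes it as one argument.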

Do we have the green light to start renaming users again?

Change 310916 had a related patch set uploaded (by Hashar):
Add a script to easily fix stuck global renames

https://gerrit.wikimedia.org/r/310916

Change 310916 merged by jenkins-bot:
Add a script to easily fix stuck global renames

https://gerrit.wikimedia.org/r/310916

Mentioned in SAL (#wikimedia-operations) [2016-09-15T19:06:09Z] <hashar@tin> Synchronized php-1.28.0-wmf.19/extensions/CentralAuth/maintenance/fixStuckGlobalRename.php: To unblock renames stuck on mediawiki.org T145596 (duration: 00m 47s)

Mentioned in SAL (#wikimedia-operations) [2016-09-15T19:07:03Z] <hashar@tin> Synchronized php-1.28.0-wmf.18/extensions/CentralAuth/maintenance/fixStuckGlobalRename.php: To unblock renames stuck on mediawiki.org T145596 (duration: 00m 47s)

@Legoktm had a script to fix the stuck renames manually. I ran it for each of the stuck operations to great success:

[centralauth]> select ru_oldname, ru_newname, count(*) from renameuser_status group by ru_oldname, ru_newname;
Empty set (0.00 sec)

All those renames are shown as complete as well:

https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/CraftByte
https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Adel.S
https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/VulcanXH558
https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/DaniloPeixoto
https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Kelly72913

Please reach out to the impacted users as needed. I do not know what the on-wiki process is.

Seems it is all fine, please reopen if something is broken. Thank you for your patience!

When attempting to accept this request, I get an error message:

[V9r29wpAMFYAAC6vGT8AAAAT] 2016-09-15 19:31:03: Fatal exception of type "MWException"

err.PNG (screenshot attachment)


"Failed to run getConfiguration.php"
{
  "file": "/srv/mediawiki/php-1.28.0-wmf.19/includes/jobqueue/JobQueueGroup.php",
  "line": 422,
  "function": "getConfig",
  "class": "SiteConfiguration",
  "type": "->",
  "args": [
    "string",
    "array"
  ]
},

The original issue has been fixed by rECAU20f54121d177f996c155d5feea6c799a8c9ed242

The subsequent error "Failed to run getConfiguration.php" has been filed as T145851. It affected many different background jobs; the main task, which has all the details, is T145819.