Wikimedia sites frequently switching to read-only
Closed, Resolved (Public)

Description

Over the past 2 weeks I ran into quite a few read-only MediaWiki warnings when trying to save a page. I had it on the Dutch Wikipedia (as a normal user), and my bot seems to run into it on Wikidata about once a day. Please investigate.

Event Timeline

Multichill raised the priority of this task from to Needs Triage.
Multichill updated the task description.
Multichill added a project: acl*sre-team.
Multichill subscribed.
Aklapper removed a project: acl*sre-team.
Aklapper set Security to None.

Hi Multichill, please provide specific steps to reproduce, and the exact and complete output (except for any personal data included) when this happens. See How to report a bug for general info.
There are too many "Wikimedia sites" and too many ways to "save a page"...

André, a wiki switching into read-only mode tends to be logged. I don't have access to these logs, but operations does, so removing the task from their queue isn't very productive. This is an intermittent problem. Example timestamp and wiki: 12:26, 26 December 2014 (UTC+1) on Wikidata.

We log it? Where?

From the dberror logs on fluorine...

Fri Dec 26 10:35:35 UTC 2014    mw1134  wikidatawiki    Error connecting to 10.64.0.8: Can't connect to MySQL server on '10.64.0.8' (4)
Fri Dec 26 11:29:35 UTC 2014    mw1222  wikidatawiki    Wikibase\TermSqlIndex::getMatchingIDs   10.64.16.10     2013    Lost connection to MySQL server during query (10.64.16.10)      SELECT DISTINCT term_entity_id,term_weight  FROM `wb_terms`   WHERE (term_language='en' AND term_search_key LIKE 'category%'  AND term_type='label' AND term_entity_type='item') OR (term_language='en' AND term_search_key LIKE 'category%'  AND term_type='alias' AND term_entity_type='item')  LIMIT 5000
Fri Dec 26 11:42:14 UTC 2014    mw1231  wikidatawiki    Wikibase\TermSqlIndex::getMatchingIDs   10.64.16.144    2013    Lost connection to MySQL server during query (10.64.16.144)     SELECT DISTINCT term_entity_id,term_weight  FROM `wb_terms`   WHERE (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='label' AND term_entity_type='item') OR (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='alias' AND term_entity_type='item')  LIMIT 5000
Fri Dec 26 11:42:15 UTC 2014    mw1119  wikidatawiki    Wikibase\TermSqlIndex::getMatchingIDs   10.64.16.144    2013    Lost connection to MySQL server during query (10.64.16.144)     SELECT DISTINCT term_entity_id,term_weight  FROM `wb_terms`   WHERE (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='label' AND term_entity_type='item') OR (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='alias' AND term_entity_type='item')  LIMIT 5000
Fri Dec 26 11:42:25 UTC 2014    mw1120  wikidatawiki    Wikibase\TermSqlIndex::getMatchingIDs   10.64.16.144    2013    Lost connection to MySQL server during query (10.64.16.144)     SELECT DISTINCT term_entity_id,term_weight  FROM `wb_terms`   WHERE (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='label' AND term_entity_type='item') OR (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='alias' AND term_entity_type='item')  LIMIT 5000
Fri Dec 26 11:42:30 UTC 2014    mw1201  wikidatawiki    Wikibase\TermSqlIndex::getMatchingIDs   10.64.16.144    2013    Lost connection to MySQL server during query (10.64.16.144)     SELECT DISTINCT term_entity_id,term_weight  FROM `wb_terms`   WHERE (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='label' AND term_entity_type='item') OR (term_language='en' AND term_search_key LIKE 'category:french%'  AND term_type='alias' AND term_entity_type='item')  LIMIT 5000
Fri Dec 26 12:47:10 UTC 2014    mw1127  wikidatawiki    Error connecting to 10.64.32.28: Can't connect to MySQL server on '10.64.32.28' (4)
Fri Dec 26 12:47:10 UTC 2014    mw1127  wikidatawiki    Connection error: Unknown error (10.64.32.28)
Fri Dec 26 13:20:33 UTC 2014    mw1197  wikidatawiki    Error connecting to 10.64.16.154: Can't connect to MySQL server on '10.64.16.154' (4)
Fri Dec 26 13:20:33 UTC 2014    mw1197  wikidatawiki    Connection error: Unknown error (10.64.16.154)
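
To see which database servers the errors cluster on, the excerpt above can be tallied per DB address. The following is only a rough sketch: the function name `count_errors_by_db` is made up for illustration, and it assumes each line starts with a timestamp, the app server, and the wiki name, as in the lines pasted here.

```python
import re
from collections import Counter

# Assumed layout: "<weekday> <month> <day> <time> UTC <year>  <host>  <wiki>  <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} UTC \d{4})\s+"
    r"(?P<host>\S+)\s+(?P<wiki>\S+)\s+(?P<rest>.*)$"
)
# First IPv4-looking token in the message is taken as the DB server.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def count_errors_by_db(lines):
    """Count log entries per database server address (hypothetical helper)."""
    per_db = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        ip = IP_RE.search(match.group("rest"))
        if ip:
            per_db[ip.group(0)] += 1
    return per_db
```

Feeding it the contents of the dberror log would show whether the failures concentrate on a single replica or are spread across the shard.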

As can be seen on T85341, the server code is raising MediaWiki API exception ReadOnlyError. Is there a log which includes exceptions in the API generator?

Change 181877 had a related patch set uploaded (by Hoo man):
EditPage: Don't claim we're readonly just because $wgUser can't edit

https://gerrit.wikimedia.org/r/181877

Patch-For-Review

I took a quick look at this and found a place where EditPage is raising such an error although it shouldn't: just because $wgUser can't "edit" at that point doesn't mean we're read-only. Those mechanisms aren't connected.

Just for the record: a real read-only response from the API looks like this:
{
    "error": {
        "code": "readonly",
        "info": "The wiki is currently in read-only mode",
        "readonlyreason": "read only test",
        "*": "See http://localhost/core/api.php for API usage"
    }
}

Getting an internal_api_error_ReadOnlyError indicates an internal error (as the text suggests) and is normally logged.

Someone should search for ReadOnlyError in the exception logs of the API servers to find this type of exception for the API request. A ReadOnlyError over index.php is normally shown as an error message and not logged.

It would help to include the API error response in the task when reporting an API error.
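
The difference between the two error codes mentioned in this comment can be checked on the client side before reporting. This is a minimal sketch, not MediaWiki or pywikibot code: `classify_api_error` and its return labels are invented for illustration, and it assumes the response has already been decoded from JSON into a dict shaped like the example above.

```python
def classify_api_error(response):
    """Return (kind, reason) for a decoded API response (hypothetical helper).

    kind is "ok", "readonly", "internal" or "other".
    """
    error = response.get("error")
    if error is None:
        return ("ok", None)
    code = error.get("code", "")
    if code == "readonly":
        # A real read-only response carries the configured reason.
        return ("readonly", error.get("readonlyreason"))
    if code == "internal_api_error_ReadOnlyError":
        # An exception escaped inside the API; this should be in the exception logs.
        return ("internal", None)
    return ("other", None)
```

A bot could log the full `error` object whenever the result is not `"ok"`, so that the exact response can be pasted into the task.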

As a "normal" user I had this also on the Dutch Wikipedia. Akoopal had the same and he managed to dig up the timestamp:
2014-12-21 22:51:46 akoopal : "Opgelet: De database is op dit moment wegens onderhoudswerkzaamheden geblokkeerd." This is the message we have at https://nl.wikipedia.org/wiki/MediaWiki:Readonlywarning (Amsterdam time, so UTC+1)

And yes Sam, that's a bit of wishful thinking. If readonly warnings don't have proper logging, we can file another bug for that.

Reedy asked via IRC if there has been excessive replication lag recently (one possible cause for read-only). Does not appear to be the case based on ganglia.
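
Besides ganglia, replication lag can also be sampled from the wiki itself: the MediaWiki API exposes it through `action=query&meta=siteinfo&siprop=dbrepllag` (with `sishowalldb` to list every replica). The sketch below only inspects an already decoded response; the helper name `lagged_hosts`, the default threshold, and the assumed response shape (`query.dbrepllag` as a list of `host`/`lag` entries) are illustrative assumptions, not taken from this task.

```python
def lagged_hosts(siteinfo, threshold=5):
    """List (host, lag) pairs whose lag exceeds threshold seconds (hypothetical helper).

    siteinfo is assumed to be the decoded JSON of a siteinfo query
    with siprop=dbrepllag.
    """
    entries = siteinfo.get("query", {}).get("dbrepllag", [])
    return [
        (entry.get("host"), entry.get("lag", 0))
        for entry in entries
        if entry.get("lag", 0) > threshold
    ]
```

Polling this periodically and recording the timestamps would give something to correlate against the read-only reports.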

A recent change to DB load balancing that happens to fit the two-week timeline is https://gerrit.wikimedia.org/r/#/c/178787/. Need @aaron to comment on whether it is relevant here. Possibly not for the API, but maybe for the web error reports?

Change 181877 abandoned by Hoo man:
EditPage: Don't claim we're readonly just because $wgUser can't edit

Reason:
If the actual exception is being thrown this patch of course is of no value.

https://gerrit.wikimedia.org/r/181877

Aklapper raised the priority of this task from Low to Medium. Dec 28 2014, 11:02 PM

How can operations help here / is this really blocked on ops?

No reply hence I removed the ops tags.

Intermittent problems on production wikis: that looks like ops to me. Who else could look into that?

I have been travelling and didn't run a bot for quite some time. Does this still happen? If so, could someone please add the timestamps?

If nobody runs into this in the next week I would just close the ticket.

Aklapper claimed this task.

Doing so.

Lydia_Pintscher subscribed.

@Multichill says it is happening again. Reopening.

Just had one. The edit after https://www.wikidata.org/w/index.php?title=Q28003045&diff=prev&oldid=416085989 on the same item triggered:
pywikibot.data.api.APIError: readonly: The wiki is currently in read-only mode [readonlyreason:Waiting for 4 lagged database(s); help:See https://www.wikidata.org/w/api.php for API usage]
CRITICAL: Closing network session.
<class 'pywikibot.data.api.APIError'>

tools.multichill@tools-bastion-02:~/logs$ grep readonlyreason *.log | wc -l
37
I noticed it a couple of times in the last 2 weeks.
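
Until the underlying lag is resolved, a bot can treat `readonly` as a transient condition and retry instead of crashing. The following is a minimal sketch under stated assumptions: `ApiError` is a stand-in class defined here for illustration (not pywikibot's own exception), and `save_with_retry`, its parameters, and the delays are hypothetical.

```python
import time


class ApiError(Exception):
    """Hypothetical stand-in for a client library's API error."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def save_with_retry(save, retries=5, base_delay=5.0, sleep=time.sleep):
    """Call save(), retrying with exponential backoff while the wiki is read-only."""
    for attempt in range(retries):
        try:
            return save()
        except ApiError as err:
            # Only "readonly" is considered transient; give up on the last attempt.
            if err.code != "readonly" or attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff can be exercised without actually waiting.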

Schema changes are currently ongoing on all shards and especially s5: T69223: Schema change for page content language and T148967: Fix PK on S5 dewiki.revision. That probably worsens the conditions here :/

I don't think we can do much about this at the moment.

Multichill lowered the priority of this task from Medium to Low. Dec 13 2016, 1:50 PM
hoo claimed this task.

I don't think this is relevant anymore. If this occurs again, it's probably due to a different issue anyway.