
Frequent Echo DB_PRIMARY write queries on HTTP GET
Open, In Progress, MediumPublic

Description

The two main offenders are:

Expectation (writes <= 0) by MediaWiki::restInPeace not met (actual: 5):
query-m: DELETE FROM `echo_unread_wikis` WHERE euw_user = 'X'
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated()
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1318): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion()
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1198): Wikimedia\Rdbms\Database->attemptQuery()
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(3004): Wikimedia\Rdbms\Database->query()
#4 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/UnreadWikis.php(141): Wikimedia\Rdbms\Database->delete()
#5 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/NotifUser.php(440): EchoUnreadWikis->updateCount()
#6 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/NotifUser.php(249): MWEchoNotifUser->resetNotificationCount()
#7 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/EchoHooks.php(940): MWEchoNotifUser->markRead()
#8 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/MWCallableUpdate.php(34): Closure$EchoHooks::processMarkAsRead#2()
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(273): MWCallableUpdate->doUpdate()
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(219): DeferredUpdates::runUpdate()
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(143): DeferredUpdates::execute()
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(909): DeferredUpdates::doUpdates()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(733): MediaWiki->restInPeace()
#14 (): Closure$MediaWiki::doPostOutputShutdown()
#15 {main}
Expectation (writes <= 0) by MediaWiki::restInPeace not met (actual: 13):
query-m: INSERT INTO `echo_event` (event_type,event_variant,event_deleted,event_extra,event_agent_id,event_page_id) VALUES ('X')
#0 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/TransactionProfiler.php(245): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated()
#1 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1318): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion()
#2 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(1198): Wikimedia\Rdbms\Database->attemptQuery()
#3 /srv/mediawiki/php-1.33.0-wmf.23/includes/libs/rdbms/database/Database.php(2123): Wikimedia\Rdbms\Database->query()
#4 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/mapper/EventMapper.php(22): Wikimedia\Rdbms\Database->insert()
#5 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/model/Event.php(232): EchoEventMapper->insert()
#6 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/model/Event.php(167): EchoEvent->insert()
#7 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/DiscussionParser.php(95): EchoEvent::create()
#8 /srv/mediawiki/php-1.33.0-wmf.23/extensions/Echo/includes/EchoHooks.php(561): EchoDiscussionParser::generateEventsForRevision()
#9 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/MWCallableUpdate.php(34): Closure$EchoHooks::onPageContentSaveComplete()
#10 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(273): MWCallableUpdate->doUpdate()
#11 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(219): DeferredUpdates::runUpdate()
#12 /srv/mediawiki/php-1.33.0-wmf.23/includes/deferred/DeferredUpdates.php(143): DeferredUpdates::execute()
#13 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(909): DeferredUpdates::doUpdates()
#14 /srv/mediawiki/php-1.33.0-wmf.23/includes/MediaWiki.php(733): MediaWiki->restInPeace()
#15 (): Closure$MediaWiki::doPostOutputShutdown()
#16 {main}

Locations where the problem occurs

See https://logstash.wikimedia.org/goto/4fb8c9be478c3795bd2e59b90ad75b5b

UserClearNewTalkNotification hook

/**
 * Handler for UserClearNewTalkNotification hook.
 * @see https://www.mediawiki.org/wiki/Manual:Hooks/UserClearNewTalkNotification
 * @param UserIdentity $user User whose talk page notification should be marked as read
 */
public static function onUserClearNewTalkNotification( UserIdentity $user ) {
    if ( $user->isRegistered() ) {
        $userObj = User::newFromIdentity( $user );
        DeferredUpdates::addCallableUpdate( static function () use ( $userObj ) {
            MWEchoNotifUser::newFromUser( $userObj )->clearUserTalkNotifications();
        } );
    }
}

That hook is in turn invoked from WatchlistManager::clearTitleUserNotifications(), which is called on GET from WikiPage::doViewUpdates() and from Flow's ViewAction::showForAction().

onSkinTemplateNavigationUniversal hook

This hook handler calls EchoHooks::processMarkAsRead(), which ends up doing:

DeferredUpdates::addCallableUpdate( static function () use ( $user, $eventIds ) {
    $notifUser = MWEchoNotifUser::newFromUser( $user );
    $notifUser->markRead( $eventIds );
} );

onUserSaveSettings hook

public static function onUserSaveSettings( $user ) {
    // Extensions like AbuseFilter might create an account, but
    // the tables we need might not exist. Bug 57335
    if ( !defined( 'MW_UPDATER' ) ) {
        // Reset the notification count since it may have changed due to user
        // option changes. This covers both explicit changes in the preferences
        // and changes made through the options API (since both call this hook).
        DeferredUpdates::addCallableUpdate( static function () use ( $user ) {
            MWEchoNotifUser::newFromUser( $user )->resetNotificationCount();
        } );
    }
}

I'm not sure how this ends up in a GET context; it is invoked from User::saveSettings(), which is already doing database writes.

LoginNotify extension

LoginNotify.php has a function recordLoginFailureFromKnownSystem() which creates an Echo notification:

from /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(444)
#0 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(280): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated(string, Wikimedia\Rdbms\GeneralizedSql, integer)
#1 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1471): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, double, boolean, integer)
#2 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1353): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1278): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(2405): Wikimedia\Rdbms\Database->query(string, string, integer)
#5 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(2385): Wikimedia\Rdbms\Database->doInsert(string, array, string)
#6 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/mapper/EventMapper.php(22): Wikimedia\Rdbms\Database->insert(string, array, string)
#7 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/model/Event.php(259): EchoEventMapper->insert(EchoEvent)
#8 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/model/Event.php(189): EchoEvent->insert()
#9 /srv/mediawiki/php-1.37.0-wmf.7/extensions/LoginNotify/includes/LoginNotify.php(717): EchoEvent::create(array)
#10 /srv/mediawiki/php-1.37.0-wmf.7/extensions/LoginNotify/includes/LoginNotify.php(698): LoginNotify\LoginNotify->sendNotice(User, string, integer)
#11 /srv/mediawiki/php-1.37.0-wmf.7/extensions/LoginNotify/includes/LoginNotify.php(783): LoginNotify\LoginNotify->recordLoginFailureFromKnownSystem(User)
#12 /srv/mediawiki/php-1.37.0-wmf.7/extensions/LoginNotify/includes/Hooks.php(154): LoginNotify\LoginNotify->recordFailure(User)
#13 /srv/mediawiki/php-1.37.0-wmf.7/extensions/LoginNotify/includes/Hooks.php(128): LoginNotify\Hooks::doFailedLogin(User)
#14 /srv/mediawiki/php-1.37.0-wmf.7/includes/HookContainer/HookContainer.php(330): LoginNotify\Hooks::onAuthManagerLoginAuthenticateAudit(MediaWiki\Auth\AuthenticationResponse, NULL, string, array)
#15 /srv/mediawiki/php-1.37.0-wmf.7/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#16 /srv/mediawiki/php-1.37.0-wmf.7/includes/HookContainer/HookRunner.php(895): MediaWiki\HookContainer\HookContainer->run(string, array)
#17 /srv/mediawiki/php-1.37.0-wmf.7/includes/auth/AuthManager.php(400): MediaWiki\HookContainer\HookRunner->onAuthManagerLoginAuthenticateAudit(MediaWiki\Auth\AuthenticationResponse, NULL, string, array)
#18 /srv/mediawiki/php-1.37.0-wmf.7/includes/api/ApiLogin.php(162): MediaWiki\Auth\AuthManager->beginAuthentication(array, string)
#19 /srv/mediawiki/php-1.37.0-wmf.7/includes/api/ApiMain.php(1669): ApiLogin->execute()
#20 /srv/mediawiki/php-1.37.0-wmf.7/includes/api/ApiMain.php(639): ApiMain->executeAction()
#21 /srv/mediawiki/php-1.37.0-wmf.7/includes/api/ApiMain.php(610): ApiMain->executeActionWithErrorHandling()
#22 /srv/mediawiki/php-1.37.0-wmf.7/api.php(90): ApiMain->execute()
#23 /srv/mediawiki/php-1.37.0-wmf.7/api.php(45): wfApiMain()
#24 /srv/mediawiki/w/api.php(3): require(string)
#25 {main}
Expectation (writes <= 0) by ApiMain::setRequestExpectations not met (actual: 1):
query-m: INSERT INTO `echo_event` (event_type,event_variant,event_deleted,event_extra,event_agent_id) VALUES ('X',N)

I suppose either ApiLogin needs isWriteMode() to return true, or this could be a good case for enqueuing a job.
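For the first option, the change would presumably be a one-line override (sketch only; whether declaring ApiLogin a write module has other side effects, e.g. for clients using assert or for GET login attempts, would need checking):

// Sketch: declare the module as a write module so that
// ApiMain::setRequestExpectations() stops flagging the echo_event INSERT
// reached via the AuthManagerLoginAuthenticateAudit hook.
class ApiLogin extends ApiBase {
    // ... existing module code ...

    public function isWriteMode() {
        return true;
    }
}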

EchoModerationController

from /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(444)
#0 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/TransactionProfiler.php(280): Wikimedia\Rdbms\TransactionProfiler->reportExpectationViolated(string, Wikimedia\Rdbms\GeneralizedSql, integer)
#1 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1471): Wikimedia\Rdbms\TransactionProfiler->recordQueryCompletion(Wikimedia\Rdbms\GeneralizedSql, double, boolean, integer)
#2 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1353): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#3 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(1278): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#4 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/DatabaseMysqlBase.php(1379): Wikimedia\Rdbms\Database->query(string, string, integer)
#5 /srv/mediawiki/php-1.37.0-wmf.7/includes/libs/rdbms/database/Database.php(3434): Wikimedia\Rdbms\DatabaseMysqlBase->doUpsert(string, array, array, array, string)
#6 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/UnreadWikis.php(131): Wikimedia\Rdbms\Database->upsert(string, array, array, array, string)
#7 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/NotifUser.php(463): EchoUnreadWikis->updateCount(string, integer, boolean, integer, MWTimestamp)
#8 /srv/mediawiki/php-1.37.0-wmf.7/extensions/Echo/includes/controller/ModerationController.php(40): MWEchoNotifUser->resetNotificationCount()
#9 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/MWCallableUpdate.php(38): EchoModerationController::{closure}()
#10 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdates.php(513): MWCallableUpdate->doUpdate()
#11 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdates.php(390): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#12 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdates.php(234): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#13 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(MWCallableUpdate, integer)
#14 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#15 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdates.php(237): DeferredUpdatesScope->processUpdates(integer, Closure)
#16 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdatesScope.php(267): DeferredUpdates::{closure}(EchoDeferredMarkAsDeletedUpdate, integer)
#17 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdatesScope.php(196): DeferredUpdatesScope->processStageQueue(integer, integer, Closure)
#18 /srv/mediawiki/php-1.37.0-wmf.7/includes/deferred/DeferredUpdates.php(242): DeferredUpdatesScope->processUpdates(integer, Closure)
#19 /srv/mediawiki/php-1.37.0-wmf.7/includes/MediaWiki.php(1128): DeferredUpdates::doUpdates(string)
#20 /srv/mediawiki/php-1.37.0-wmf.7/includes/MediaWiki.php(838): MediaWiki->restInPeace()
#21 /srv/mediawiki/php-1.37.0-wmf.7/api.php(125): MediaWiki->doPostOutputShutdown()
#22 /srv/mediawiki/php-1.37.0-wmf.7/api.php(45): wfApiMain()
#23 /srv/mediawiki/w/api.php(3): require(string)
#24 {main}

Maybe this originates from DeferredMarkAsDeletedUpdate.php? Not sure. Either way, it seems like a good candidate for a job.
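If this path were moved to the job queue, a minimal sketch could look something like the following (the job name, parameters and wiring are assumptions, not an actual patch; whatever EchoDeferredMarkAsDeletedUpdate / ModerationController currently do post-send would move into run()):

// Hypothetical job; it would also need an entry in extension.json "JobClasses".
class EchoMarkAsDeletedJob extends Job {
    public function __construct( array $params ) {
        parent::__construct( 'echoMarkAsDeleted', $params );
    }

    public function run() {
        $user = User::newFromId( $this->params['userId'] );
        // The moderation / mark-as-deleted work that currently runs in a
        // post-send DeferredUpdate would happen here instead.
        MWEchoNotifUser::newFromUser( $user )->resetNotificationCount();
        return true;
    }
}

// At the call site, instead of DeferredUpdates::addCallableUpdate():
JobQueueGroup::singleton()->lazyPush(
    new EchoMarkAsDeletedJob( [ 'userId' => $user->getId() ] )
);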

Event Timeline

aaron renamed this task from Frequent Echo queries on HTTP GET to Frequent Echo DB_MASTER write queries on HTTP GET.Mar 29 2019, 5:32 PM
aaron removed aaron as the assignee of this task.May 1 2019, 5:29 PM

The first stack trace is from a deferred update scheduled from EchoHooks::processMarkAsRead() on read requests with a ?markasread=123 query parameter. If writing from deferred updates on GET requests is not allowed, how are we supposed to schedule a write when we notice that it's needed in a GET request? Do we submit a job to the job queue?

The second stack trace confuses me. It comes from a deferred update scheduled from EchoHooks::onPageContentSaveComplete(). How is it possible for the PageContentSaveComplete hook to be fired in a GET request? It seems like a reasonable assumption that a hook with "SaveComplete" in the name will only be fired in POST requests.

Jobs are fine...though this case is complicated since people want their "latest views" to be immediately reflected...so it would have to do something like WatchedItemStore.

I did a bit of thinking about what a WatchedItemStore-like solution could look like. Here's my best idea so far (a rough code sketch of the mark-as-read path follows the lists below):

  • When marking a notification as read:
    • Check if the notification is already marked as read in the DB; if it is, bail
    • Add its ID to the cached going-to-be-marked-as-read list in MainStash (using merge())
    • Enqueue a job to mark it as read in the DB
    • Purge the cached notification count in WANCache
  • When computing the notification count (i.e. the getWithSetCallback callback for the cached notification count in WANCache):
    • Get the going-to-be-marked-as-read list from cache
    • Query the number of unread notifications from the DB
    • Subtract the length of the going-to-be-marked-as-read list from the DB query result, and return that
  • In the job that marks things as read:
    • Update the DB to mark the notification as read
    • Remove the notification from the going-to-be-marked-as-read list (using merge())
    • Purge the cached notification count in WANCache (needed for convergence after race conditions, see below)

Complications/details:

  • Also need to support "mark as unread"
    • Have a cached going-to-be-marked-as-unread list that works analogously
    • When computing notification count, add the length of this list (in addition to subtracting the length of the going-to-be-marked-as-read list)
  • Global notification counts
    • Every wiki needs to be able to correctly compute the global notification count, even if another wiki has a pending mark-read/mark-unread operation
    • Put the cached mark-read/mark-unread lists in the global cache, so that every wiki knows every other wiki's pending operations and can adjust for them
    • When writing notification counts to the echo_unread_wikis table, don't account for pending operations, but reflect the state of the DB (otherwise pending operations are double-counted)
    • Only write to the echo_unread_wikis table from the job
  • Race condition considerations
    • The following race condition is possible:
      • Notification is added to the mark-read list
      • Job runs, marks notification as read in the master DB, removes it from the mark-read list
      • Try to mark the notification as read a second time; it still seems unread in the DB due to replication lag, so we'll add it to the mark-read list again
      • At this point notification count computations will be wrong, because we'll subtract a pending action we've already taken
      • The second job runs, marks the notification as read in the DB (no-op), and removes it from the mark-read list
      • At this point the count will be computed correctly again
    • Purging the cached count from the job seems unnecessary, but it's needed to make sure this race condition is short-lived and the wrong count doesn't get stuck in the cache
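
A very rough sketch of what the mark-as-read write path above might look like (all class, method and key names are hypothetical; it assumes the MainStash BagOStuff::merge() and WANObjectCache APIs, and ignores the mark-as-unread and global-count parts):

// Hypothetical sketch of the "when marking a notification as read" step.
public function markReadDeferred( UserIdentity $user, array $eventIds ) {
    $services = MediaWikiServices::getInstance();
    $stash = $services->getMainObjectStash();
    $cache = $services->getMainWANObjectCache();

    // Add the IDs to the pending "going to be marked as read" list;
    // merge() avoids lost updates from concurrent requests.
    $pendingKey = $stash->makeGlobalKey( 'echo-pending-mark-read', $user->getId() );
    $stash->merge( $pendingKey, static function ( $bag, $key, $current ) use ( $eventIds ) {
        $current = is_array( $current ) ? $current : [];
        return array_values( array_unique( array_merge( $current, $eventIds ) ) );
    }, $stash::TTL_MINUTE * 10 );

    // Queue the job that does the actual DB write and then removes the IDs
    // from the pending list again (EchoMarkReadJob is made up here).
    JobQueueGroup::singleton()->lazyPush(
        new EchoMarkReadJob( [ 'userId' => $user->getId(), 'eventIds' => $eventIds ] )
    );

    // Purge the cached count; its getWithSetCallback callback recomputes it
    // as (unread count from DB) minus (length of the pending list).
    $cache->delete( $cache->makeKey( 'echo-unread-count', $user->getId() ) );
}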

though this case is complicated since people want their "latest views" to be immediately reflected

This is off-topic, but it's interesting that this is brought up: as a wiki user I normally do not get any acknowledgement of viewed notifications when I click them or mark them as read, and I keep getting alerted on multiple wikis for some amount of time.

marcella subscribed.

Moving this to our Q3 column so we can assess it alongside our other maintenance tasks for the quarter and decide whether we have time, in balance with our feature work, to prioritize this task.

@Tgr @mewoph @Gilles Perhaps for the main offending code (onSkinTemplateNavigationUniversal's usage of EchoHooks::processMarkAsRead which does DB writes) we could rely on:

  1. keep the existing code that does the count subtractions, but remove the part that does the writes
  2. add some JavaScript code calling an Echo API with a POST to do the mark-as-read work that's currently done in processMarkAsRead()
  3. enqueue a job in processMarkAsRead() (which we should rename) to support non-JavaScript clients. That job would be a no-op for JS clients.

That wouldn't fix all the problems noted in the task description, but it would take care of the biggest one; the other problems seem easier to sort out (via the job queue, for example).
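
A minimal sketch of what the server side of steps 1 and 2 could look like (heavily simplified; the helper, config variable and ResourceLoader module names are made up, and it assumes the JS module would POST the IDs to the existing action=echomarkread API):

// In the hook handler: keep the count subtraction for the current render,
// but export the IDs to JS instead of writing to the DB on GET.
public static function processMarkAsRead( User $user, WebRequest $request, OutputPage $out ) {
    $eventIds = self::getMarkAsReadIds( $request ); // stands in for the existing parsing logic
    if ( !$eventIds ) {
        return;
    }
    $out->addJsConfigVars( 'wgEchoEventIdsToMarkAsRead', $eventIds ); // made-up var name
    $out->addModules( 'ext.echo.markAsRead' ); // made-up module that does the POST
}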

The WatchedItemStore-like solution mentioned in T219592#5726609 is very likely going to be beyond the Growth-Team's capacity over the next year.

Solving this with a combination of a post-send queued job and JS-issued POST requests for the DB writes sounds like a fine solution.

I also feel the extra stash layer would add more complexity and maybe fragility than benefits. Can we configure the job to be super high priority (basically a cross-DC deferred)?

I also wonder if some sort of client-side polling to keep notification counts up to date over time would make the timeliness of the notification count update less important. (There's also a WIP patch around for pushing notification updates to the browser, IIRC.)

The JS POST job sounds like a good approach too, but the POST request would have to update the current notification count in the browser (seems easy) and we'd have to handle the situation where the user opens the notification to see what it's about, decides it does not need immediate attention and mark it as unread so they can return later, and then the job runs and tries to mark it as read again. (So basically we'd have to implement some kind of CAS mechanics. Might be as simple as referencing the ID of the notification record.)
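
One possible shape for that CAS check, purely as illustration: the job carries the timestamp of the request that queued it and only applies if the notification has not been touched since. (There is no "last state change" column in echo_notification today, so notification_touched below is invented; notification_user, notification_event and notification_read_timestamp are assumed to be the real columns.)

// Illustration only: 'notification_touched' is a hypothetical column.
$dbw->update(
    'echo_notification',
    [ 'notification_read_timestamp' => $dbw->timestamp() ],
    [
        'notification_user' => $this->params['userId'],
        'notification_event' => $this->params['eventIds'],
        // Skip rows changed (e.g. explicitly marked unread) after the
        // request that queued this job.
        'notification_touched <= ' . $dbw->addQuotes(
            $dbw->timestamp( $this->params['requestTimestamp'] )
        ),
    ],
    __METHOD__
);
// $dbw->affectedRows() tells us whether the mark-as-read actually applied.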

I also feel the extra stash layer would add more complexity and maybe fragility than benefits. Can we configure the job to be super high priority (basically a cross-DC deferred)?

I also wonder if some sort of client-side polling to keep notification counts up to date over time would make the timeliness of the notification count update less important. (There's also a WIP patch around for pushing notification updates to the browser, IIRC.)

The client-side polling was implemented in T219222: Make notification counts update without page reload, but it hasn't been enabled anywhere (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/530639 needs to be restored).

The JS POST job sounds like a good approach too, but the POST request would have to update the current notification count in the browser (seems easy) and we'd have to handle the situation where the user opens the notification to see what it's about, decides it does not need immediate attention and mark it as unread so they can return later, and then the job runs and tries to mark it as read again. (So basically we'd have to implement some kind of CAS mechanics. Might be as simple as referencing the ID of the notification record.)

Good point, yes we'll need to handle this scenario.

Change 703706 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] Mark notifications as read via client-side JS

https://gerrit.wikimedia.org/r/703706

I hadn't realized that Special:Notifications already has good no-JS support; a user can visit that page and mark items or groups of items as read/unread. Given that, it might be OK to postpone the job queue implementation for this, but if someone feels strongly that it needs to be part of the patch I've posted above, please comment.

Change 703712 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] Check that request was posted before resetting notification count

https://gerrit.wikimedia.org/r/703712

Change 703714 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/LoginNotify@master] Use job queue for login fail known notification

https://gerrit.wikimedia.org/r/703714

Change 703715 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/core@master] ApiLogin: Set isWriteMode to true

https://gerrit.wikimedia.org/r/703715

Change 703729 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] Implement markasread for new talk notification

https://gerrit.wikimedia.org/r/703729

Change 703734 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] Convert EchoDeferredMarkAsDeleted class to job

https://gerrit.wikimedia.org/r/703734

kostajh added subscribers: DMburugu, MMiller_WMF.

@DMburugu @MMiller_WMF I'm moving this into current sprint for code review, but I'll let you decide how that's prioritized relative to the other streams of work we have in July & August.

Change 703715 abandoned by Reedy:

[mediawiki/core@master] ApiLogin: Set isWriteMode to true

Reason:

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/704180

https://gerrit.wikimedia.org/r/703715

Change 724005 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/Echo@master] [WIP] Mark notifications as read via client-side JS

https://gerrit.wikimedia.org/r/724005

Urbanecm_WMF changed the task status from Open to In Progress.Jan 31 2022, 11:34 AM
kostajh renamed this task from Frequent Echo DB_MASTER write queries on HTTP GET to Frequent Echo DB_PRIMARY write queries on HTTP GET.Mar 31 2022, 1:12 PM
kostajh added a subscriber: Krinkle.

Moving to in progress since there are code review comments to address, but this is also something we haven't found time to get to, and I am not sure when we will be able to actually work on it. @Krinkle is there an update from the Performance team side about the urgency of this work?

I see four different code paths involved here:

  1. Login requests. I believe these are generally POST requests, so we don't need to worry about deferring notification creation here. If it ends up deferred in some way as part of the general service class provided by Echo doing this, that's fine of course. The less needless difference, the better; but by itself, entries from ApiLogin should not result in a DBPerformance warning anymore.
  2. Loginwiki requests. The auto-login bounces via login.wikimedia.org with CentralAuth are, due to technical limitations in web browsers, implemented as GET requests. However, as per the latest draft of T91820 for Multi-DC routing configuration, we will exempt this domain and treat it as effectively a POST request. This is fine in terms of impact as they are either not user-facing (background JS requests), or user-facing as part of a POST request chain (e.g. login form, POST-redirect-GET).
  3. resetNotificationCount during pageviews of the talk page. (see below)
  4. processMarkAsRead during general pageviews anywhere else. (see below)

Contrary to the original 2015 plan, the more minimal first-step direction for Multi-DC since 2018 is that post-send DB writes are "fine" so long as they 1) are "rare" enough not to be a concern for primary DB load or appserver load (in the form of PHP threads being held during the slower cross-DC connections), 2) accept that their completion may be slow and take longer than normal, and 3) accept that they can fail and thus have recovery or eventual consistency in place.

In practice, this means some rarely used features can get away with doing a DB write from a post-send deferred update. This is, for example, why T122708 is not a blocker anymore. And in core, the clearing of newtalk and watchlist state for talk pages also does a direct write during those page views, as they appear rare enough at the moment. (Logic in WikiPage::doViewUpdates, TalkPageNotificationManager::remove*, WatchedItemStore::reset*)

For more commonly used features, such as the watchlist read marker that is updated on potentially any pageview (resetNotificationTimestamp), we queue a job instead. But, because the change is meant to be visible to the user immediately, it also writes to the MainStash DB from a post-send deferred.

The latter seems like something that would be suitable for Echo as well. I recall that Echo used to do this, or at least did something like it for some of its features, perhaps not for this specifically. I'm referring to the echo-seen timestamps, which used to be written to MainStash and were then migrated to a dedicated Kask instance due to high write traffic. I thought that also controlled mark-as-read, but perhaps not?

Growth team won't be able to get to it immediately, as we're short on capacity and may be for the next few quarters. We'll save it in our maintenance epic so that it remains visible for action when capacity is available.

@kostajh There is some mention of JobQueue both here in relation to future ideas and in the code for existing logic. As part of T314750, perf team disabled the $wgEchoUseJobQueue feature on testwikis, which was not enabled anywhere else.

I'm letting you know, as I don't recall whether our mental model of Echo in production was based on this being used or not. In any event, I suggest this task also cover deciding what to do with that now-unused code.

Change 703714 abandoned by Kosta Harlan:

[mediawiki/extensions/LoginNotify@master] Use job queue for login fail known notification

Reason:

https://gerrit.wikimedia.org/r/703714

Change 703729 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] Implement markasread for new talk notification

Reason:

https://gerrit.wikimedia.org/r/703729

Change 703706 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] [WIP] Mark notifications as read via client-side JS

Reason:

https://gerrit.wikimedia.org/r/703706

Change 703734 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] Convert EchoDeferredMarkAsDeleted class to job

Reason:

https://gerrit.wikimedia.org/r/703734

Change 703712 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] Check that request was posted before resetting notification count

Reason:

https://gerrit.wikimedia.org/r/703712

Change 724005 abandoned by Kosta Harlan:

[mediawiki/extensions/Echo@master] [WIP] Mark notifications as read via client-side JS

Reason:

https://gerrit.wikimedia.org/r/724005