Page MenuHomePhabricator

Reduce db queries from parsercached logged-in pageview (Nov 2022)
Closed, ResolvedPublic

Description

Done: https://logstash.wikimedia.org/goto/91b0a220c8ed5d02d0fe9540fb7b193d

I see 58 queries:

Notable issues:

Event Timeline

Ladsgroup added a subscriber: Daimona.

The abuse filter mess is an interesting example of slightly messed up concepts feeding each other until we end up with a monster:

  • The page view request needs to set user groups as a js variable (so startup module can load specific modules for admins and so on)
  • It does it by calling UserGroupManager::getUserEffectiveGroups()
  • That function's caching is somehow broken
  • then it calls ::getUserImplicitGroups()
  • Its cache is also somewhat broken/ineffective
  • It calls ::getUserAutopromoteGroups()
  • which tries to figure out autopromoted groups
    • This part is messed up in other ways too, why it would need to check the conditions of auto promote like edit count? Isn't that get updated automatically after edits by the user and stored in user_groups table?
    • Here we have a call to user table to get edit count
  • Now onGetAutoPromoteGroups calls hook handlers
  • Abusefilter sometimes has rules that can block an auto-promote
  • The way it's stored is messed up, if a user does something that causes a filter to trigger blocking of auto promote, it stores it as a cache value in main stash, otherwise it's a miss
  • It has a 30s cache on the service. No APCu cache, no WAN cache.
  • I think even that is broken, if it's a miss, if the user is not blocked from auto promoted (99.9999999% of cases), the return value would be false which automatically means "don't cache". Read getWithSetCallback's documentation.

It's not super easy to fix, I'm going to give it a try later.

It seems that user edit count (i.e. User::mEditCount) used to be cached in memcached alongside other data from User instances until it was removed in rMWb82cf00983c62ff9163686663aa8e2b07bdcdef3. Since then, it is only cached in a request-scoped cache inside UserEditTracker. This probably explains why User::getEditCount is costlier now.

Noting that none of these stuff are cached in APCu, otherwise subsequent requests made wouldn't trigger the abuse filter db query (but they do)

Krinkle renamed this task from Audit and fix db queries in a parsed page visit to Audit and reduce db queries from parsercached page views (late 2022).Nov 7 2022, 5:37 PM

I was puzzled by this for a moment, but I'm gonna say it makes sense. We don't parse or apply user groups for the anonymous contexts of load.php. Not in its startup response, not in its stylesheet responses, not in any of its JS bundles etc.

The one load.php request that does this, is the one for the "user" module like /w/load.php?modules=user&user=Ladsgroup&…. And more specifically, only if OutputPage.php before that during a page view determined that you have a user script and/or are member of a group with at least one matching MediaWiki:Group-•.js page existing. If so, the module is queued and during that request it naturally has to know what groups you're member of. These are rare enough that I wouldn't worry about that much. Plus, they're also long-expiry HTTP cached with version hash injected from index.php.

Having said that, I have no opinion on whether "compute user group memberships" should involve AbuseFilter — generally speaking. If we can avoid that for the majority of callers througout the platform (including and especially during index.php page views, which the user has to go through on every pageview) then that is of course a big win :)

  • I think even that is broken, if it's a miss, if the user is not blocked from auto promoted (99.9999999% of cases), the return value would be false which automatically means "don't cache". Read getWithSetCallback's documentation.

Maybe I'm missing something, but getAutoPromoteBlockStatus() returns an integer exactly for that reason. Also, I don't think there have been significant changes to that code in the last couple years. The last big change was a backend change by Aaron in 2019 (r521018).

  • I think even that is broken, if it's a miss, if the user is not blocked from auto promoted (99.9999999% of cases), the return value would be false which automatically means "don't cache". Read getWithSetCallback's documentation.

Maybe I'm missing something, but getAutoPromoteBlockStatus() returns an integer exactly for that reason. Also, I don't think there have been significant changes to that code in the last couple years. The last big change was a backend change by Aaron in 2019 (r521018).

The biggest change is that the backend for main stash was changed from redis to databases to support multi-dc. It was making a network beforehand, it wasn't just obvious to us.

Noting that still the cache for it (AutoPromoteGroupsHandler) is just an objectcache (ObjectCache::getInstance( 'hash' )) and it's only useful inside a request, with TTL of 30s, it's not really reducing much network calls.

To be clear: I'm not suggesting the abuse filter change should be cached in APCu, it wouldn't really make a difference.

Change 854989 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/PageImages@master] Cache database query result inside the request context

https://gerrit.wikimedia.org/r/854989

Marostegui triaged this task as Medium priority.Nov 11 2022, 2:39 PM
Marostegui moved this task from Triage to Ready on the DBA board.

Change 854989 merged by jenkins-bot:

[mediawiki/extensions/PageImages@master] Add in-process cache for result of getPageImage() DB query

https://gerrit.wikimedia.org/r/854989

Change 858365 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/PageImages@master] Adopt getWithSet idiom for getPageImage in-process cache

https://gerrit.wikimedia.org/r/858365

Change 858365 merged by jenkins-bot:

[mediawiki/extensions/PageImages@master] Adopt getWithSet idiom for getPageImage in-process cache

https://gerrit.wikimedia.org/r/858365

Krinkle renamed this task from Audit and reduce db queries from parsercached page views (late 2022) to Audit and reduce db queries from parsercached page views (Nov 2022).Sep 22 2023, 3:05 AM

I see 58 queries:

As of writing, one year later, I found 19 queries at T347123: Reduce database queries on parsercached logged-out page views (Sep 2023). However, that's logged-out. I believe yours was logged-in.

Notable issues:

Either solved or specific to logged-in.

  • We have a lot of queries checking for user edit count, probably caused by the one above
  • Watchlist is making duplicate back-to-back db queries again, this time three times
  • Maybe MediaWiki\ResourceLoader\WikiModule can batch?
  • User::load tries to load user/actor of the visitor three times with identical queries.

Specific to logged-in.

  • PageImages\PageImages::getPageImage is making duplicate queries back-to-back
  • MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds loads the revision twice with slightly different queries.

Might be solved. Both of these make only 1 query currently at T347123.

Krinkle renamed this task from Audit and reduce db queries from parsercached page views (Nov 2022) to Reduce db queries from parsercached logged-in pageview (Nov 2022).Sep 22 2023, 3:11 AM

I see 58 queries:

As of writing, one year later, I found 19 queries at T347123: Reduce database queries on parsercached logged-out page views (Sep 2023). However, that's logged-out. I believe yours was logged-in.

Yup

Notable issues:

Either solved or specific to logged-in.

  • We have a lot of queries checking for user edit count, probably caused by the one above
  • Watchlist is making duplicate back-to-back db queries again, this time three times
  • Maybe MediaWiki\ResourceLoader\WikiModule can batch?
  • User::load tries to load user/actor of the visitor three times with identical queries.

Specific to logged-in.

  • PageImages\PageImages::getPageImage is making duplicate queries back-to-back
  • MediaWiki\Revision\RevisionStore::fetchRevisionRowFromConds loads the revision twice with slightly different queries.

Might be solved. Both of these make only 1 query currently at T347123.

We put a cache on it :D https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/ /854989