
API action=parse should be poolcounter-limited if a re-parse is necessary
Closed, Resolved · Public

Description

When some parameter in a request for action=parse makes it bypass the parser cache, or, more generally, whenever a full reparse is needed, we should limit such requests per user (or IP) to N (possibly N=3?) concurrent executions, using PoolCounter.

We had a bot today making about 100 rps for pages like

http://zh.wikipedia.org/w/api.php?action=parse&pageid=2868367&prop=text&wrapoutputclass=wiki-article&disableeditsection=true&mobileformat=true&mainpage=true&format=json

and that caused a noticeable slowdown of the API cluster. We need to limit such users so that they cannot consume a significant portion of our computing power.
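
For illustration, the "N concurrent executions per user" idea maps onto a $wgPoolCounterConf entry roughly like the sketch below; the pool name and the maxqueue value are placeholder assumptions for this task description, not a decided configuration:

$wgPoolCounterConf['ApiParse-reparse'] = [
    'class' => 'PoolCounter_Client',
    'timeout' => 15,  // how long a request may wait for a free worker slot, in seconds
    'workers' => 3,   // N: concurrent executions allowed per pool key (user or IP)
    'maxqueue' => 5,  // placeholder value; requests beyond this are rejected immediately
];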

Event Timeline

Joe triaged this task as High priority. Jan 27 2020, 11:41 PM
AMooney subscribed.

Peter, can you take a look at this? If needed, work with Brad :)

Talked with @Anomie and he brought up a good point that it may be worth implementing PoolCounter for all requests, regardless of whether or not they bypass ParserCache. That would lead to less complicated, case-by-case code, and even if a request does go through ParserCache it may still be a cache miss and have to do the full, slower parse anyway.

@Anomie, @nnikkhoui,

Do I understand correctly that PoolWorkArticleView (inherited from PoolCounterWork) is the main point responsible for pooling all kinds of requests, so there's no way to bypass it?

	public function execute( $skipcache = false ) {
		if ( $this->cacheable && !$skipcache ) {
			// Cacheable work: one worker's result can be shared with other waiting callers.
			$status = $this->poolCounter->acquireForAnyone();
		} else {
			// Uncacheable work: this caller needs its own worker slot.
			$status = $this->poolCounter->acquireForMe();
		}

And then it should be possible to override the nested counter to make it stricter for users/IP addresses?
Do I understand this structure correctly?

PoolWorkArticleView is specifically for article views; that probably wouldn't quite work for this code path, as it also needs to parse arbitrary wikitext.

It would probably be better to use PoolCounterWorkViaCallback than to try to use PoolCounter directly, as in the latter case you'd probably have to duplicate all the logic from PoolCounterWork::execute() anyway.

You don't need to worry about handling the concurrency limits yourself, PoolCounter does that for you as long as you set up your keys properly. In general, your code just needs to look something like this:

$work = new PoolCounterWorkViaCallback( 'Key-for-$wgPoolCounterConf', $poolKey, [
    'doWork' => function () use ( $whatever ) {
        /* Expensive code to parse things and generate $p_result */
        return $p_result;
    },
    'error' => function () {
        $this->dieWithError( 'apierror-concurrency-limit' );
    },
] );
$p_result = $work->execute();

The 'Key-for-$wgPoolCounterConf' is just that, the key in $wgPoolCounterConf that specifies the limits to apply. Those would get set in Wikimedia's configuration, something like rOMWC8274bc4f3b29: Add PoolCounter configuration for Special:Contributions. $poolKey identifies the concurrency "pool" to draw from for this request; in this case it would contain the user ID or IP address, plus whatever constants are needed to not mix this with any other per-user pools. The doWork callback is called if the concurrency limit hasn't been reached, while error is called if it has.
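
As a hedged sketch of what building such a $poolKey could look like inside an API module (where getUser() and getRequest() are available), with the prefix string being an assumption chosen only for illustration:

$user = $this->getUser();
// Hypothetical key: one pool per logged-in user, or per IP for anonymous users,
// prefixed with a constant string so it is not mixed with other per-user pools.
$poolKey = 'ApiParse:reparse:' . (
    $user->isAnon()
        ? 'ip:' . $this->getRequest()->getIP()
        : 'user:' . $user->getId()
);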

You might also look at rMWf3819b6e2ecb: SpecialContributions: Use PoolCounter to limit concurrency as an example, as that added a per user/IP concurrency limit to a different code path.

Change 586322 had a related patch set uploaded (by Peter.ovchyn; owner: Peter.ovchyn):
[mediawiki/core@master] WIP: Add PoolCounterWork at the top of api entry point

https://gerrit.wikimedia.org/r/586322

@Anomie,
I've pushed a temporary solution. I don't think it's a 100% good solution, though.

There's a hook that is called before the pooling starts working; if the pool limit is exceeded, there's no need to call it:

Hooks::run( 'ApiBeforeMain', [ &$processor ] );

Placing all that logic inside the PoolCounter makes the code quite complicated and untestable, so I'm not sure it's a good option. In this case, it would be good to create another class and write tests for it, but ApiMain and ApiBase already exist.

Another option is to put the pooling inside the ApiMain::execute function, but that makes it possible to override it from a hook.
So I'm not sure what the best/simplest/most reliable option is here.

Neither api.php nor ApiMain is the location being asked for. This task is asking for a narrower scope: concurrency for the parse in ApiParse.
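
To make that narrower scope concrete, a rough sketch of wrapping the expensive parse inside ApiParse with PoolCounterWorkViaCallback, following @Anomie's example above. The 'ApiParser' pool name, the pool-key format, and the getParserOutput parameters are assumptions for the sketch; this is not the exact code that was merged:

// $poolKey would be built per user/IP as sketched earlier in this task.
$work = new PoolCounterWorkViaCallback( 'ApiParser', $poolKey, [
    'doWork' => function () use ( $pageObj, $popts, $suppressCache ) {
        /* The existing expensive parse (the getParserOutput call referenced
           in the change below) runs here and returns its ParserOutput. */
        return $this->getParserOutput( $pageObj, $popts, $suppressCache );
    },
    'error' => function () {
        $this->dieWithError( 'apierror-concurrency-limit' );
    },
] );
$pout = $work->execute();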

Change 586322 merged by jenkins-bot:
[mediawiki/core@master] api: Wrap getParserOutput by PoolCounterWork in ApiParse module

https://gerrit.wikimedia.org/r/586322

Next step is to determine the limits and implement them for Wikimedia sites in a patch similar to rOMWC8274bc4f3b29: Add PoolCounter configuration for Special:Contributions.

'ApiParser' => [
    'class' => 'PoolCounter_Client',
    'timeout' => 15,
    'workers' => 3,
    'maxqueue' => ???,
],

A timeout of 15 seems sensible to match the timeout on ArticleView. 3 workers is (I think) what @Joe suggested in the task description. I don't know what to choose for maxqueue. It probably doesn't need to be too large because this is per-user.