
limit page creation and edit rate on Wikidata
Closed, Resolved · Public · 2 Estimated Story Points

Description

We've repeatedly had issues because too many pages were being created and/or edited in a short time. We need to limit this to keep the infrastructure sane.

Why is this a problem?

When people create or edit items too fast on Wikidata it causes problems in various parts of the infrastructure as well as strains on the social system of Wikidata:

  • dispatching changes to the clients (Wikipedias etc.) so they show up there in recent changes and watchlists is delayed
  • job queue on the clients gets overloaded due to page purges and reparsings
  • the recent changes table on the clients grows too big
  • the replication lag between the database servers grows to unacceptable sizes
  • the query service misses updates
  • assignment of new entity IDs gets locked
  • ORES scoring can't keep up
  • editors can't keep up with the amount of changes happening and meaningfully review/maintain them

Current monitoring

  • replication lag (already taken into account by a lot of tools/bots)
  • dispatch lag (often not taken into account yet)

TODO

  • limit max page creation rate to 40/min per account
  • limit max edit rate to 80/min per account

Next steps

Event Timeline


I personally would like to strip the right from the bots and enforce a rate limit for them as they just don't care about what we say.

If bots ignore replag ( https://meta.wikimedia.org/wiki/Bot_policy#Edit_throttle_and_peak_hours ) these bots should just be stripped of the botflag. The good shouldn't suffer because of a small number of problematic users.

What's the point of noratelimit if admins and bots are rate limited?

However, yes. I agree with everyone that there should be some controllable edit rate.
This is a random suggestion, but feel free to ignore.
Perhaps, for "everyone", there is an enforced 100/min edit rate. For admins and "trusted users", 100/min is the default since they KNOW not to overdo it, and they get an option to set it higher.

I also suggest a co-share agreement between Magnus and the WMF team due to the [[Bus factor]].

If bots ignore replag ( https://meta.wikimedia.org/wiki/Bot_policy#Edit_throttle_and_peak_hours ) these bots should just be stripped of the botflag. The good shouldn't suffer because of a small number of problematic users.

The issue is that the infrastructure limits that we are hitting are not linked to maxlag in any way.

What's the point of noratelimit if admins and bots are rate limited?

We can always have multiple limits.
Also, noratelimit is part of MediaWiki; we didn't create it for Wikibase, but we are now finding out that perhaps it is a bad idea to have it on Wikidata.

However, yes. I agree with everyone that there should be some controllable edit rate.
This is a random suggestion, but feel free to ignore.
Perhaps, for "everyone", there is an enforced 100/min edit rate. For admins and "trusted users", 100/min is the default since they KNOW not to overdo it, and they get an option to set it higher.

I think a blanket limit should work for everyone, unless we can come up with a concrete reason that any group would need to edit faster.

I also suggest a co-share agreement between Magnus and the WMF team due to the [[Bus factor]].

I don't really understand this comment

I personally would like to strip the right from the bots and enforce a rate limit for them as they just don't care about what we say.

If bots ignore replag ( https://meta.wikimedia.org/wiki/Bot_policy#Edit_throttle_and_peak_hours ) these bots should just be stripped of the botflag. The good shouldn't suffer because of a small number of problematic users.

Too many edits in too small an amount of time can affect the infrastructure in several ways: replication lag, dispatch lag, and job queue size. If the edit creates a new item, it can also affect the infrastructure in other ways, since we want to be consistent about Q-ids and not assign the same Q-id to two items.

Overall, if the community stuck to a rule of thumb like *max* 60 edits/min and *max* 10 new items/min, and actually enforced it (instead of waiting until the whole infrastructure falls down and we ping them to stop), there wouldn't be a need for a ticket like this in the first place; but this continues to happen all the time.

It's basically Widar (QuickStatements) that is going too fast; see the last comment at https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat&oldid=655772082#Rate_speed_for_QuickStatements on how to fix that. Or are you aware of any other tools that are going too fast? The nuclear option is to disable Widar and enable it again after the speed issues have been fixed.

There are other tools that are going too fast like fatameh as well. This isn't something we should have to correct on a tool-by-tool basis.

That's what you get from being a bit more liberal for a long time. Tool rights have to be approved, bot rights have to be approved, all can be revoked when needed. If this really is a problem, take action or clearly identify the problematic tools/bots so admins like me can take action.

It seems Magnus is the only person who maintains the tool. The suite of tools is a HUGE amount of work and responsibility, and not an easy one to fix.
Thus I'm asking if Magnus could share it with the WMF so that someone on the dev team could work on it... in the event Magnus decides to stop working on the tools or move on with his life.
If it is shared, then forget what I said.

Thanks to @Multichill for pointing out this discussion to me...

As for [[Bus factor]], I have always welcomed co-maintainers on my tools, and have repeatedly suggested WMF involvement, anywhere from taking over/rewriting to just becoming co-maintainers of important tools. (Others are welcome too, but I understand why other volunteers don't want to take on the burden of co-maintaining.)

Now, for the original post: Does this concern just item creation, or any kind of edit? I am quite willing to limit the big guns (in fact, I already have some of them limited) if WMF hard-/software are not up to adding a few rows to the database... ;-)

I have added a 5 second delay after item creations. I also added a mechanism to have a delay after any other edit type.

Note that this limits single threads only. QuickStatements, various SourceMD webtools and bots, Mix'n'match sync etc. can still add up. Also, this does not affect my JavaScript tools on Wikidata proper.

Edit: Rate limiter diff

Back to the topic at hand. It requests that QuickStatements be limited to a maximum of 100 edits per minute. Can you do that?

For one tool, per user, per tab? Sure. I'd just set both item creation and edit to 1 second delay.

For one tool, all users? No. At least, not easily.

Question: WMF knows best when their servers are overloaded. Why not just add a delay to the API response when the servers are under duress? No need for every volunteer to scramble and sink time into altering their tools to protect the WMF from users trying to get stuff done. I imagine adding something like 0.2 or 0.5 seconds of sleep to *every* API edit when under duress might ease the load quickly and quietly. Why make WMF's problem my problem without need?

I have added a 5 second delay after item creations. I also added a mechanism to have a delay after any other edit type.

Note that this limits single threads only. QuickStatements, various SourceMD webtools and bots, Mix'n'match sync etc. can still add up. Also, this does not affect my JavaScript tools on Wikidata proper.

Edit: Rate limiter diff

Thanks for that. As for the edit delay: sleep is always in whole seconds, and the sub-second sleep functions all seem to have issues. I suggest you use rand() to sleep a second every n edits. That way you can do something between no throttle and waiting a second after every edit.
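
A minimal sketch of that rand()-based throttle, assuming the tool calls some edit function in a loop (the names below are placeholders, not anyone's actual code):

<?php
// Sleep one full second on average once every $n edits,
// instead of trying to sleep fractions of a second after each edit.
function throttleEverySoOften(int $n): void {
    // rand(1, $n) hits 1 with probability 1/$n, so over many edits
    // this adds roughly 1/$n seconds of delay per edit.
    if (rand(1, $n) === 1) {
        sleep(1);
    }
}

// Hypothetical usage inside an edit loop:
// foreach ($edits as $edit) {
//     doEdit($edit);            // stand-in for the tool's actual edit call
//     throttleEverySoOften(5);  // ~0.2 s average delay per edit
// }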

I like the suggestion of adding a load-dependent server-side delay for API edits.

this guy artix waits for the rate limit to be implemented so he can go back and finish his batch job.

OK, let me summarize my situation, as of today:

  • Essentially, all my Wikidata edits (both bots and JavaScript on Toolforge) run through the same OAuth PHP class I wrote
  • Every single edit (including "create item") runs with a "maxlag=5" parameter.
  • If the edit fails because of maxlag, it will wait 5 seconds, then try again, until it succeeds.
  • This mechanism has been in place for months.
  • As of today, after every edit (including "create item"), it will check the current lag on Wikidata.
  • If the current lag is >1 sec, it will sleep three times the lag (in int seconds; so, if lag is 1.5 sec, it will sleep int(3*1.5)=4 seconds) before returning.
  • If that check fails for some reason, it will sleep a pre-defined time for each edit type (create item=2sec, edit=1 sec)

That is a total of three different rate limitations (two dynamic, one hardcoded fallback). Personally, I consider that beyond due diligence, verging on paranoia. Therefore, I consider this matter resolved, as far as my tools are concerned.
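
For illustration, the logic described above could look roughly like this in PHP; apiRequest() is a hypothetical stand-in for the tool's actual OAuth/HTTP layer (the real code lives in Magnus's OAuth PHP class), and the dbrepllag query is just one way to check the current lag:

<?php
// Sketch of the described behaviour: every edit is sent with maxlag=5,
// retried after 5 s while the servers report too much lag, and followed
// by a lag-dependent sleep (or a fixed fallback per edit type).

/** Hypothetical helper: POST $params to the Wikidata API, return decoded JSON. */
function apiRequest(array $params): array { /* ... cURL + OAuth signing ... */ return []; }

function editWithBackoff(array $params, string $editType): array {
    $params['maxlag'] = 5;                     // every single edit carries maxlag=5
    while (true) {
        $result = apiRequest($params);
        if (($result['error']['code'] ?? '') === 'maxlag') {
            sleep(5);                          // wait 5 s, then try again until it succeeds
            continue;
        }
        break;
    }

    // After the edit, check the current replication lag on Wikidata and back off.
    $lagInfo = apiRequest([
        'action' => 'query', 'meta' => 'siteinfo',
        'siprop' => 'dbrepllag', 'format' => 'json',
    ]);
    $lag = $lagInfo['query']['dbrepllag'][0]['lag'] ?? null;

    if (is_numeric($lag) && $lag > 1) {
        sleep((int)(3 * $lag));                // e.g. lag 1.5 s => sleep int(4.5) = 4 s
    } elseif ($lag === null) {
        // Fallback if the lag check fails: fixed delay per edit type.
        sleep($editType === 'create' ? 2 : 1);
    }
    return $result;
}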

Hi Magnus,

All three of them are lag based.

Too many edits in too small amount of time can affect infra in several ways, one is replag, one is dispatch lag, one is jobqueue size. If edit is about creating new items it can also affect the infra in some other ways too as we want to be consistent about Q-ids and do not assign same Q-ids to two items.

If I understand this correctly, the lag can be low, but the systems can still be overloaded.

So,

  • either WMF expose the "overload status" so we can throttle on demand,
  • or WMF delay the API response accordingly, so we don't have to do anything,
  • or WMF scale to demand :-)

Yep. WMF or WMDE I guess. @Lydia_Pintscher can you discuss the best approach with your team and file the relevant bugs?

So can I get back to running QuickStatements batches?

EDIT: Oh well. I'll come back to this whenever this task gets finished. Sometime in the, rather distant, future.

That's what you get from being a bit more liberal for a long time. Tool rights have to be approved, bot rights have to be approved, all can be revoked when needed. If this really is a problem, take action or clearly identify the problematic tools/bots so admins like me can take action.

In the case of fatameh, the tool itself isn't really the problem; it only allows a user to create a single item per request.
In this case it is the users that are calling it too much / too quickly etc.

Question: WMF knows best when their servers are overloaded. Why not just add a delay to the API response when the servers are under duress? No need for every volunteer to scramble and sink time into altering their tools to protect the WMF from users trying to get stuff done. I imagine adding something like 0.2 or 0.5 seconds of sleep to *every* API edit when under duress might ease the load quickly and quietly. Why make WMF's problem my problem without need?

I like the suggestion of adding a load-dependent server-side delay for API edits.

Eww, that sounds evil. And it would actually result in longer running / more PHP processes on said servers.

So,

  • either WMF expose the "overload status" so we can throttle on demand,
  • or WMF delay the API response accordingly, so we don't have to do anything,
  • or WMF scale to demand :-)

I'm rather pro option 1 here and I believe I have discussed this on another ticket relating to maxlag vs dispatch lag and exposing these in a single API.

For example:

curl https://www.wikidata.org/w/api.php?action=wblag

{
  "lag": {
    "overall" : {
      "isLagged" : 1
    },
    "types": {
      "db-replication": {
        "raw": 5,
        "isLagged": 1
      },
      "dispatch-clients": {
        "raw": 60,
        "isLagged": 0
      },
      "manual": {
        "raw": "Large job queue",
        "isLagged": 1
      }
    }
  }
}

Something like this could also include suggested edit rates for clients / tools to abide by.

The other related ticket I had in mind was T48910
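
A sketch of how a tool might consume such an endpoint, should it exist; action=wblag and the response shape are the proposal above, not an existing API module:

<?php
// Hypothetical client for the proposed action=wblag endpoint:
// back off whenever the overall lag status reports isLagged.
function wikidataIsLagged(): bool {
    $json = file_get_contents(
        'https://www.wikidata.org/w/api.php?action=wblag&format=json'  // proposed, not a real module
    );
    if ($json === false) {
        return true;  // if in doubt, assume lagged and slow down
    }
    $data = json_decode($json, true);
    return (bool)($data['lag']['overall']['isLagged'] ?? true);
}

// Usage sketch: pause the batch while Wikidata reports being lagged.
// while (wikidataIsLagged()) { sleep(10); }
// doNextBatchOfEdits();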

@Addshore It depends on what the actual issue with creating/editing is. If it's database-related (most likely), then having a few sleeping processes for load-dependent server-side delays might be a viable solution.

I like the single-shot API request, but additionally I'd like a notification in the edit reply about server-side issues, as is done for replag at the moment. That could save me the additional API query beforehand.

Should be possible to add something like wblag=1 in the same way we have maxlag=1 IMO.

Also in terms of MediaWiki, perhaps it would be a good time to introduce other types of lag / reasons to slow down to core.
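
For comparison, this is roughly how tools pass maxlag today; the wbeditentity parameters below are purely illustrative (Q4115189 is the sandbox item), and wblag is the suggestion above, not an existing parameter:

<?php
// Send maxlag with every write and honour the "maxlag" error the API returns.
// A hypothetical wblag=1 parameter could be handled the same way.
$params = [
    'action' => 'wbeditentity',
    'id'     => 'Q4115189',    // illustrative target: the sandbox item
    'data'   => '{"labels":{"en":{"language":"en","value":"example"}}}',
    'maxlag' => 5,
    // 'wblag' => 1,           // proposed parameter, does not exist (yet)
    'format' => 'json',
    // a real request also needs a CSRF 'token' and the tool's OAuth signing
];

// If the response is {"error":{"code":"maxlag", ...}}, the servers are lagged:
// sleep a few seconds (the response also carries a Retry-After header) and retry.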

If we make sysop and bot subject to rate limits, the "user"-limits from the following apply (unless we set a higher limit for them specifically):

$ mwscript eval.php --wiki wikidatawiki
> print_r($wgRateLimits);

Array
(
    [move] => Array
        (
            [newbie] => Array
                (
                    [0] => 2
                    [1] => 120
                )
            [user] => Array
                (
                    [0] => 8
                    [1] => 60
                )
        )

    [edit] => Array
        (
            [ip] => Array
                (
                    [0] => 8
                    [1] => 60
                )
            [newbie] => Array
                (
                    [0] => 8
                    [1] => 60
                )
        )

    [badcaptcha] => Array
        (
            [ip] => Array
                (
                    [0] => 15
                    [1] => 60
                )
            [newbie] => Array
                (
                    [0] => 15
                    [1] => 60
                )
            [user] => Array
                (
                    [0] => 30
                    [1] => 60
                )
        )

    [mailpassword] => Array
        (
            [ip] => Array
                (
                    [0] => 5
                    [1] => 3600
                )
        )

    [emailuser] => Array
        (
            [ip] => Array
                (
                    [0] => 5
                    [1] => 86400
                )
            [newbie] => Array
                (
                    [0] => 5
                    [1] => 86400
                )
            [user] => Array
                (
                    [0] => 20
                    [1] => 86400
                )
        )

    [rollback] => Array
        (
            [user] => Array
                (
                    [0] => 10
                    [1] => 60
                )
            [newbie] => Array
                (
                    [0] => 5
                    [1] => 120
                )
            [rollbacker] => Array
                (
                    [0] => 100
                    [1] => 60
                )
        )

    [purge] => Array
        (
            [ip] => Array
                (
                    [0] => 30
                    [1] => 60
                )
            [user] => Array
                (
                    [0] => 30
                    [1] => 60
                )
        )

    [linkpurge] => Array
        (
            [ip] => Array
                (
                    [0] => 30
                    [1] => 60
                )
            [user] => Array
                (
                    [0] => 30
                    [1] => 60
                )
        )

    [renderfile] => Array
        (
            [ip] => Array
                (
                    [0] => 700
                    [1] => 30
                )
            [user] => Array
                (
                    [0] => 700
                    [1] => 30
                )
        )

    [renderfile-nonstandard] => Array
        (
            [ip] => Array
                (
                    [0] => 70
                    [1] => 30
                )
            [user] => Array
                (
                    [0] => 70
                    [1] => 30
                )
        )

    [cxsave] => Array
        (
            [ip] => Array
                (
                    [0] => 10
                    [1] => 30
                )
            [user] => Array
                (
                    [0] => 10
                    [1] => 30
                )
        )

    [urlshortcode] => Array
        (
            [ip] => Array
                (
                    [0] => 10
                    [1] => 120
                )
            [newbie] => Array
                (
                    [0] => 10
                    [1] => 120
                )
            [user] => Array
                (
                    [0] => 50
                    [1] => 120
                )
        )

    [thanks-notification] => Array
        (
            [user] => Array
                (
                    [0] => 10
                    [1] => 60
                )
        )

    [badoath] => Array
        (
            [&can-bypass] =>
            [user] => Array
                (
                    [0] => 10
                    [1] => 60
                )
        )
)

Potential problems I could see:

  • move 8 per minute (although not relevant to entity namespaces),
  • emailuser 20 per day (can be problematic when reaching out to users)
  • rollback 10 per minute -> should probably be set equal to (or slightly higher than) the edit limits
  • purge 30 per minute -> maybe set equal to edit limits?
  • linkpurge 30 per minute -> maybe set equal to edit limits?

Note: We currently also have dispatch problems while no one is going faster than 75 edits per minute (as far as I can tell)

Actually, it seems we could also make only specific limits non-bypassable. While this is not documented (as far as I can tell), one can add '&can-bypass' => false to a specific rate limit action, like the following:

> print_r($wgRateLimits);

Array
( 
…
    [badoath] => Array
        (
            [&can-bypass] => 
            [user] => Array
                (
                    [0] => 10
                    [1] => 60
                )

        )

)

(for details, see User::pingLimiter or bc6e4d008216411958f285c831cb557a0804dd00)

That would mean that we can limit only the number of edits… if we find reasonable numbers.
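
For illustration, combining the numbers from the task description (40 page creations and 80 edits per minute per account) with the '&can-bypass' mechanism above might look roughly like this; the 'create' and 'edit' keys are assumptions about which limit actions apply to entity creation and editing, and the actual deployed patch may be structured differently:

<?php
// Sketch only: non-bypassable Wikidata rate limits matching the TODO above.
// Assumes 'create' and 'edit' are the rate-limit actions hit by entity
// creation and entity edits respectively.
$wgRateLimits['create'] = [
    '&can-bypass' => false,        // applies even to groups with noratelimit
    'user' => [ 40, 60 ],          // 40 page creations per 60 seconds
];
$wgRateLimits['edit'] = [
    '&can-bypass' => false,
    'user' => [ 80, 60 ],          // 80 edits per 60 seconds
];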

@hoo, @Ladsgroup and I sat down together and talked it through. I updated the task description accordingly with what I believe is the minimum requirement to keep the infrastructure sane.

hoo changed the task status from Stalled to Open. Apr 14 2018, 10:15 AM

Change 427156 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Limit page creation and edit rate on Wikidata

https://gerrit.wikimedia.org/r/427156

Change 427156 merged by jenkins-bot:
[operations/mediawiki-config@master] Limit page creation and edit rate on Wikidata

https://gerrit.wikimedia.org/r/427156

Mentioned in SAL (#wikimedia-operations) [2018-04-18T13:16:53Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: [[gerrit:427156|Limit page creation and edit rate on Wikidata (T184948)]] (duration: 01m 17s)

Did we announce this limit to the community?

I have added some lines to the graph on grafana representing these limits.

image.png (262×954 px, 41 KB)

Did we announce this limit to the community?

I don't think so.

I have added some lines to the graph on grafana representing these limits.

image.png (262×954 px, 41 KB)

Why is there a line at 60 edits per minute? Creation is at 40 per minute and edits are at 80 per minute.

That would be because I misread the patch *fixes now*

Léa will post a note on project chat.

The main issue I see with the hard limit of 40 creations/minute and 80 edits/minute is that it prevents bots from speeding up during quieter times of each day.

RFC 6585 provides the "429 Too Many Requests" HTTP status code (see https://tools.ietf.org/html/rfc6585#page-3)

The API could return a 429 status code when the server determines that the edit rate is too high, including a "Retry-After: x" header in the response to make bots pause for a few seconds. Bots then need to do nothing more than check for the 429 status code and honour the "Retry-After" header; they no longer have to decide what lag/delays are acceptable at a given time of day, and don't need to be updated each time the hard limits are changed.
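
A sketch of what a bot's retry handling could look like if the API adopted this; the 429 + Retry-After behaviour is the proposal above, not something the action API currently does for overload, and the function/parameter names are placeholders:

<?php
// Hypothetical client handling for a 429 Too Many Requests response.
function postEditWithRetry(string $url, array $params, int $maxAttempts = 5): string {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => http_build_query($params),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HEADER         => true,      // keep headers so we can read Retry-After
        ]);
        $response   = curl_exec($ch);
        $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);

        if ($status !== 429) {
            return substr($response, $headerSize);   // body of the non-429 response
        }

        // Server says we are editing too fast: honour Retry-After, default to 5 s.
        $headers = substr($response, 0, $headerSize);
        $wait = preg_match('/^Retry-After:\s*(\d+)/mi', $headers, $m) ? (int)$m[1] : 5;
        sleep($wait);
    }
    throw new RuntimeException('Still rate-limited after retries');
}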

I don't like this "solution" at all; it is more of a workaround to prevent further incidents than a fix for the real problem. Re-opening because:

  • I don't see a link to where the documentation is updated on Wikidata. Probably https://www.wikidata.org/wiki/Wikidata:Bots should be updated
  • How do I test this in a sandbox way to figure out whether my code handles this properly? For replag I can send a dummy value, but how do I do this here?
  • Third and most important: this is not a real solution, but a workaround.

@Multichill agreed. The replag solution works AFAICT; why not duplicate that?

+1 to using replag-based solutions instead of artificial rate limits. I suspect that most bots do not have proper rate limit handling because the expectation is that they don't run into them. I would suggest implementing something like rMWde9f9bda7db9: API: Optionally include job queue size in maxlag, using whatever metrics Wikidata looks for to see if things are overloaded.

@Multichill let's look at the remaining issues at the hackathon together. I've put it on my list of things to discuss.

I think we can close this now. We are working on the ticket for including the dispatch lag into maxlag. Once that's done we can reduce the limit from admins and bots.

Change 408629 abandoned by Ladsgroup:
Add edit and create rate limit for wikidatawiki

Reason:
Duplicate

https://gerrit.wikimedia.org/r/408629