
Statement GUIDs should not appear in AbuseFilter text for Wikibase
Closed, Resolved, Public

Description

AbuseFilter can make editing large entities on a Wikibase slow; see T205252.
After some investigation with the community in T205254, it was decided that statement GUIDs do not need to appear in AbuseFilter output.

Statement GUIDs currently use the "id" key in the JSON output, and that JSON is what the AbuseFilter text is currently generated and filtered from.
This "id" key is very generic and will also match other things in the JSON that we are not sure about.
So something needs to change in how item and property AbuseFilter text is collected and/or filtered.
Either:

  • Rather than starting with everything and then filtering down, we selectively pick out things to add to the text? (This might end up removing more than we intend, which might not be a bad thing: we could just add back what the community wants.)
  • Improve our method of filtering, possibly allowing filtering of different keys at different levels / down different paths in the JSON tree?
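
The second option can be sketched roughly as follows. This is a minimal Python illustration, not the Wikibase implementation (which is PHP): the sample entity is hand-trimmed from the example further down, and the function names and the `DROP_PATHS` patterns are made up for this sketch. It shows why dropping every "id" key by name would also lose the `Q15753703` value inside a datavalue, while path patterns can target only the entity id, the statement GUID and the snak hash.

```python
from typing import Any, Iterable, Iterator, Tuple

Path = Tuple[str, ...]

GUID = "Q47651066$2E9A3FC0-16E0-4BF3-9CEE-6EE2FD6FCA32"

# Hand-trimmed sample shaped like Wikibase entity JSON (assumption: many
# keys that the real serialization contains are left out for brevity).
ENTITY = {
    "id": "Q47651066",
    "descriptions": {"en": {"value": "scientific article published on 20 January 2018"}},
    "claims": {
        "P1433": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1433",
                    "hash": "2e640472f6ed7844962282af69903e8c80ce0f63",
                    "datavalue": {
                        "value": {
                            "entity-type": "item",
                            "numeric-id": 15753703,
                            "id": "Q15753703",
                        }
                    },
                },
                "id": GUID,
                "rank": "normal",
            }
        ]
    },
}

# Hypothetical path patterns; "*" matches any key or any list position.
DROP_PATHS = [
    ("id",),                                   # entity id
    ("claims", "*", "*", "id"),                # statement GUID
    ("claims", "*", "*", "mainsnak", "hash"),  # main snak hash
]


def iter_leaves(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for every scalar, in document order."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, path + (key,))
    elif isinstance(node, list):
        for child in node:
            # List positions collapse to "*" so one pattern covers every statement.
            yield from iter_leaves(child, path + ("*",))
    else:
        yield path, node


def matches(path: Path, pattern: Path) -> bool:
    """True if path and pattern have equal length and agree key by key."""
    return len(path) == len(pattern) and all(
        want == "*" or want == key for key, want in zip(path, pattern)
    )


def text_by_key_name(entity: Any, drop_keys: Iterable[str]) -> str:
    """Key-name filtering: drops a leaf whenever its own key is listed."""
    drop = set(drop_keys)
    return "\n".join(str(v) for p, v in iter_leaves(entity) if p[-1] not in drop)


def text_by_path(entity: Any, drop_patterns: Iterable[Path]) -> str:
    """Path filtering: drops a leaf only if its full path matches a pattern."""
    patterns = list(drop_patterns)
    return "\n".join(
        str(v)
        for p, v in iter_leaves(entity)
        if not any(matches(p, pat) for pat in patterns)
    )


if __name__ == "__main__":
    print(text_by_key_name(ENTITY, {"id", "hash"}))
    print("---")
    print(text_by_path(ENTITY, DROP_PATHS))
```

Both variants walk the tree once. Whether per-leaf pattern matching is fast enough for very large entities would need to be measured against the real code path.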

Whatever filtering and collection of values occurs, it must be very performant.
We have had speed issues in this bit of code before (previously fixed in T204109).

This is technically a breaking change, so this should be announced.

To Do

An example of lines that will be removed:
From: https://www.wikidata.org/wiki/Special:AbuseLog/8313111

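Before (329 lines) and After (267 lines) follow in that order; the leading number on each line is its line number within that version.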
1Q47651066
2Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
3Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
4scientific article published on 20 January 2018
5επιστημονικό άρθρο
6scienca artikolo
72018年学术文章
8научни чланак
92018年学术文章
10bài báo khoa học
11naučni članak
12articolo scientifico
132018年學術文章
14مقالة علمية نشرت في 20 يناير 2018
15artigo científico
16мақолаи илмӣ
17vědecký článek
18teaduslik artikkel
19artigo científico
20artikel ilmiah
21artículo científico publicado en 2018
22scientific article published on 20 January 2018
23научная статья
24מאמר מדעי
25wetenschappelijk artikel
26artigo científico
272018年學術文章
28vitenskapelig artikkel
29bilimsel makale
302018年学术文章
31artikulong pang-agham
32บทความทางวิทยาศาสตร์
33articol științific
34article científic
35artykuł naukowy
36article scientifique
37научна статия
38artículu científicu
392018年学术文章
40২০ জানুয়ারি ২০১৮-এ প্রকাশিত বৈজ্ঞানিক নিবন্ধ
41wissenschaftlicher Artikel
422018年学术文章
432018년 논문
44videnskabelig artikel
45tieteellinen artikkeli
462018年學術文章
47tudományos cikk
482018年の論文
49scientific article published on 20 January 2018
50სამეცნიერო სტატია
51vitskapeleg artikkel
522018年学术文章
53научни чланак
54artikull shkencor
552018 nî lūn-bûn
56article scientific
57vetenskaplig artikel
582018年學術文章
59vedecký článok
60наукова стаття, опублікована в січні 2018
612018年學術文章
62value
63P1433
642e640472f6ed7844962282af69903e8c80ce0f63
65item
6615753703
67Q15753703
68Q47651066$2E9A3FC0-16E0-4BF3-9CEE-6EE2FD6FCA32
69normal
7067021b4782ed53d58fc65264cb364c1df5ccb660
71value
72P248
7393747a24d4fc614d1eb34cac855a7d80bbb04057
74item
755412157
76Q5412157
77value
78P698
7983994c768f7169229d1c20e377309851ca8e249d
8029353329
81value
82P813
832a6b20e16a61e534b2588188b5b2616ec8acca6b
84 2018-02-04T00:00:00Z
850
860
870
8811
89http://www.wikidata.org/entity/Q1985727
90value
91P854
9254ade566da1d50cc3a5da95523c200357498fe50
93http://europepmc.org/abstract/MED/29353329
94P248
95P698
96P813
97P854
98value
99P577
1005b879cc41f62ecdcdf68d36963f021d5c1a056d3
101 2018-01-20T00:00:00Z
1020
1030
1040
10511
106http://www.wikidata.org/entity/Q1985727
107Q47651066$B64F24EE-367D-48E8-92D3-5AF6E0813334
108normal
10967021b4782ed53d58fc65264cb364c1df5ccb660
110value
111P248
11293747a24d4fc614d1eb34cac855a7d80bbb04057
113item
1145412157
115Q5412157
116value
117P698
11883994c768f7169229d1c20e377309851ca8e249d
11929353329
120value
121P813
1222a6b20e16a61e534b2588188b5b2616ec8acca6b
123 2018-02-04T00:00:00Z
1240
1250
1260
12711
128http://www.wikidata.org/entity/Q1985727
129value
130P854
13154ade566da1d50cc3a5da95523c200357498fe50
132http://europepmc.org/abstract/MED/29353329
133P248
134P698
135P813
136P854
137value
138P1476
139f93c66676847a0a627ba05c8ee7c960cc025f81a
140Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
141Q47651066$89D75FB5-25E0-4789-8433-EDE17850D835
142normal
14367021b4782ed53d58fc65264cb364c1df5ccb660
144value
145P248
14693747a24d4fc614d1eb34cac855a7d80bbb04057
147item
1485412157
149Q5412157
150value
151P698
15283994c768f7169229d1c20e377309851ca8e249d
15329353329
154value
155P813
1562a6b20e16a61e534b2588188b5b2616ec8acca6b
157 2018-02-04T00:00:00Z
1580
1590
1600
16111
162http://www.wikidata.org/entity/Q1985727
163value
164P854
16554ade566da1d50cc3a5da95523c200357498fe50
166http://europepmc.org/abstract/MED/29353329
167P248
168P698
169P813
170P854
171value
172P698
17383994c768f7169229d1c20e377309851ca8e249d
17429353329
175Q47651066$9648204A-434B-450A-8330-04041EAB1E0B
176normal
17767021b4782ed53d58fc65264cb364c1df5ccb660
178value
179P248
18093747a24d4fc614d1eb34cac855a7d80bbb04057
181item
1825412157
183Q5412157
184value
185P698
18683994c768f7169229d1c20e377309851ca8e249d
18729353329
188value
189P813
1902a6b20e16a61e534b2588188b5b2616ec8acca6b
191 2018-02-04T00:00:00Z
1920
1930
1940
19511
196http://www.wikidata.org/entity/Q1985727
197value
198P854
19954ade566da1d50cc3a5da95523c200357498fe50
200http://europepmc.org/abstract/MED/29353329
201P248
202P698
203P813
204P854
205value
206P31
20729465f78f13add11b617f0de4ade56cd1122c19c
208item
20913442814
210Q13442814
211Q47651066$977B3B9A-21AB-426A-8C02-D8A5EEF8BA29
212normal
21367021b4782ed53d58fc65264cb364c1df5ccb660
214value
215P248
21693747a24d4fc614d1eb34cac855a7d80bbb04057
217item
2185412157
219Q5412157
220value
221P698
22283994c768f7169229d1c20e377309851ca8e249d
22329353329
224value
225P813
2262a6b20e16a61e534b2588188b5b2616ec8acca6b
227 2018-02-04T00:00:00Z
2280
2290
2300
23111
232http://www.wikidata.org/entity/Q1985727
233value
234P854
23554ade566da1d50cc3a5da95523c200357498fe50
236http://europepmc.org/abstract/MED/29353329
237P248
238P698
239P813
240P854
241value
242P31
2431a003b7a0b2b7e19a41ce4cf809980743ca194ba
244item
2451348305
246Q1348305
247Q47651066$A18CFBBA-74C8-4915-ACF5-DD5A9FA64ED1
248normal
249value
250P356
251a1e12eb7df5d93d202b7ad1ce5f447ef219b6ee8
25210.1007/S00737-018-0811-8
253Q47651066$06A97D9C-EEC7-443C-8FE5-344F98B9DDB1
254normal
25567021b4782ed53d58fc65264cb364c1df5ccb660
256value
257P248
25893747a24d4fc614d1eb34cac855a7d80bbb04057
259item
2605412157
261Q5412157
262value
263P698
26483994c768f7169229d1c20e377309851ca8e249d
26529353329
266value
267P813
2682a6b20e16a61e534b2588188b5b2616ec8acca6b
269 2018-02-04T00:00:00Z
2700
2710
2720
27311
274http://www.wikidata.org/entity/Q1985727
275value
276P854
27754ade566da1d50cc3a5da95523c200357498fe50
278http://europepmc.org/abstract/MED/29353329
279P248
280P698
281P813
282P854
283value
284P2093
285b41c9fdd94f6439efa4d1565475634e1193fa94d
286Publisher
287value
288P1545
2892a1ced1dca90648ea7e306acbadd74fc81a10722
2901
291P1545
292Q47651066$149DB201-63F3-4271-BCB2-7A4D5530870F
293normal
29467021b4782ed53d58fc65264cb364c1df5ccb660
295value
296P248
29793747a24d4fc614d1eb34cac855a7d80bbb04057
298item
2995412157
300Q5412157
301value
302P698
30383994c768f7169229d1c20e377309851ca8e249d
30429353329
305value
306P813
3072a6b20e16a61e534b2588188b5b2616ec8acca6b
308 2018-02-04T00:00:00Z
3090
3100
3110
31211
313http://www.wikidata.org/entity/Q1985727
314value
315P854
31654ade566da1d50cc3a5da95523c200357498fe50
317http://europepmc.org/abstract/MED/29353329
318P248
319P698
320P813
321P854
322value
323P921
3244eab4be9a8789ca49ec835dbcce2521cee14bbe0
325item
3261026040
327Q1026040
328Q47651066$67400AA6-F04E-43A0-992D-1E56DC44D643
329normal
1Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
2Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
3scientific article published on 20 January 2018
4επιστημονικό άρθρο
5scienca artikolo
62018年学术文章
7научни чланак
82018年学术文章
9bài báo khoa học
10naučni članak
11articolo scientifico
122018年學術文章
13مقالة علمية نشرت في 20 يناير 2018
14artigo científico
15мақолаи илмӣ
16vědecký článek
17teaduslik artikkel
18artigo científico
19artikel ilmiah
20artículo científico publicado en 2018
21scientific article published on 20 January 2018
22научная статья
23מאמר מדעי
24wetenschappelijk artikel
25artigo científico
262018年學術文章
27vitenskapelig artikkel
28bilimsel makale
292018年学术文章
30artikulong pang-agham
31บทความทางวิทยาศาสตร์
32articol științific
33article científic
34artykuł naukowy
35article scientifique
36научна статия
37artículu científicu
382018年学术文章
39২০ জানুয়ারি ২০১৮-এ প্রকাশিত বৈজ্ঞানিক নিবন্ধ
40wissenschaftlicher Artikel
412018年学术文章
422018년 논문
43videnskabelig artikel
44tieteellinen artikkeli
452018年學術文章
46tudományos cikk
472018年の論文
48scientific article published on 20 January 2018
49სამეცნიერო სტატია
50vitskapeleg artikkel
512018年学术文章
52научни чланак
53artikull shkencor
542018 nî lūn-bûn
55article scientific
56vetenskaplig artikel
572018年學術文章
58vedecký článok
59наукова стаття, опублікована в січні 2018
602018年學術文章
61value
62P1433
63item
6415753703
65Q15753703
66normal
67value
68P248
69item
705412157
71Q5412157
72value
73P698
7429353329
75value
76P813
77 2018-02-04T00:00:00Z
780
790
800
8111
82http://www.wikidata.org/entity/Q1985727
83value
84P854
85htP248
86P698
87P813
88P854
89value
90P577
91 2018-01-20T00:00:00Z
920
930
940
9511
96http://www.wikidata.org/entity/Q1985727
97normal
98value
99P248
100item
1015412157
102Q5412157
103value
104P698
10529353329
106value
107P813
108 2018-02-04T00:00:00Z
1090
1100
1110
11211
113http://www.wikidata.org/entity/Q1985727
114value
115P854
116htP248
117P698
118P813
119P854
120value
121P1476
122Correction to: Stigma in the context of pregnancy termination after diagnosis of fetal anomaly: associations with grief, trauma, and depression.
123normal
124value
125P248
126item
1275412157
128Q5412157
129value
130P698
13129353329
132value
133P813
134 2018-02-04T00:00:00Z
1350
1360
1370
13811
139http://www.wikidata.org/entity/Q1985727
140value
141P854
142htP248
143P698
144P813
145P854
146value
147P698
14829353329
149normal
150value
151P248
152item
1535412157
154Q5412157
155value
156P698
15729353329
158value
159P813
160 2018-02-04T00:00:00Z
1610
1620
1630
16411
165http://www.wikidata.org/entity/Q1985727
166value
167P854
168htP248
169P698
170P813
171P854
172value
173P31
174item
17513442814
176Q13442814
177normal
178value
179P248
180item
1815412157
182Q5412157
183value
184P698
18529353329
186value
187P813
188 2018-02-04T00:00:00Z
1890
1900
1910
19211
193http://www.wikidata.org/entity/Q1985727
194value
195P854
196htP248
197P698
198P813
199P854
200value
201P31
202item
2031348305
204Q1348305
205normal
206value
207P356
20810.1007/S00737-018-0811-8
209normal
210value
211P248
212item
2135412157
214Q5412157
215value
216P698
21729353329
218value
219P813
220 2018-02-04T00:00:00Z
2210
2220
2230
22411
225http://www.wikidata.org/entity/Q1985727
226value
227P854
228htP248
229P698
230P813
231P854
232value
233P2093
234Publisher
235value
236P1545
2371
238P1545
239normal
240value
241P248
242item
2435412157
244Q5412157
245value
246P698
24729353329
248value
249P813
250 2018-02-04T00:00:00Z
2510
2520
2530
25411
255http://www.wikidata.org/entity/Q1985727
256value
257P854
258htP248
259P698
260P813
261P854
262value
263P921
264item
2651026040
266Q1026040
267normal

Event Timeline

Addshore added a subscriber: Lydia_Pintscher.

Moving to needs work for some review by @Lydia_Pintscher and @alaa_wmde

Addshore triaged this task as Medium priority. Jun 22 2019, 10:51 PM

Change 524552 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Remove hashes from (Item|Property)Content::getTextForFilters()

https://gerrit.wikimedia.org/r/524552

Change 524554 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/WikibaseLexeme@master] Remove hashes from LexemeContent::getTextForFilters()

https://gerrit.wikimedia.org/r/524554

> This "id" field is very generic and will also match a bunch of other stuff in the JSON that we are not sure about.

It drops the entity id as well (and nothing else AFAICT). People can obtain it using page_title (except in MediaInfo, but we can keep it in MediaInfo and I'm not touching it anyway).

alaa_wmde renamed this task from Statement GUIDS should not appear in AbuseFilter text for Wikibase to Statement GUIDs should not appear in AbuseFilter text for Wikibase. Jul 23 2019, 12:22 PM
alaa_wmde updated the task description.
alaa_wmde added a subscriber: Lea_Lacroix_WMDE.

Change 524563 had a related patch set uploaded (by Alaa Sarhan; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Remove id from (Item|Property)Content::getTextForFilters()

https://gerrit.wikimedia.org/r/524563

Change 524552 had a related patch set uploaded (by Alaa Sarhan; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Remove hashes from (Item|Property)Content::getTextForFilters()

https://gerrit.wikimedia.org/r/524552

Change 524554 had a related patch set uploaded (by Alaa Sarhan; owner: Ladsgroup):
[mediawiki/extensions/WikibaseLexeme@master] Remove hashes from LexemeContent::getTextForFilters()

https://gerrit.wikimedia.org/r/524554

@Lea_Lacroix_WMDE This one needs announcement for the breaking change (change is approved but won't be merged before the announced release date per this announcement).

@Ladsgroup can you please help in drafting the announcement and/or checking the technical details in it?

> @Lea_Lacroix_WMDE This one needs announcement for the breaking change (change is approved but won't be merged before the announced release date per this announcement).

> @Ladsgroup can you please help in drafting the announcement and/or checking the technical details in it?

AFAIK, we already announced this. The release dates were put to August 6th which is wmf.17 and announced as such (so changes can be merged now).

Regarding this being a breaking change. The stable interface policy doesn't mention abusefilter output as either stable or unstable so I'm not sure which way we should go.

> > @Lea_Lacroix_WMDE This one needs announcement for the breaking change (change is approved but won't be merged before the announced release date per this announcement).

> > @Ladsgroup can you please help in drafting the announcement and/or checking the technical details in it?

> AFAIK, we already announced this. The release dates were put to August 6th which is wmf.17 and announced as such (so changes can be merged now).

That's great thanks!

> Regarding this being a breaking change. The stable interface policy doesn't mention abusefilter output as either stable or unstable so I'm not sure which way we should go.

As for this, do we anticipate lots of other code depending on this output to break? If we can be fairly sure there is almost nothing that will break, then it should be fine I think. If we cannot know (or we know things will break) then maybe it deserves a BC announcement?

The final call is of course @Lea_Lacroix_WMDE's, but I'd love to back up her decision with some data if possible ;)

> As for this, do we anticipate lots of other code depending on this output to break? If we can be fairly sure there is almost nothing that will break, then it should be fine I think. If we cannot know (or we know things will break) then maybe it deserves a BC announcement?

Thanks @Ladsgroup for clarifying the situation. The linked announcement has details about the fixes to all scripts that would break after this change, so I think we no longer expect anything to break and a BC announcement can probably be skipped. @Lea_Lacroix_WMDE, just waiting for your confirmation on that before merging.

Yep, I think it's fine the way it is now (I announced it, with a reasonable delay for people to fix the things, people reacted positively).
However, for the future, we should take a decision about if the abusefilter output is stable or unstable, so we don't run into this unclear situation again.

> However, for the future, we should take a decision about if the abusefilter output is stable or unstable, so we don't run into this unclear situation again.

Yep, I'll schedule something so we can discuss and decide on that. Thank you!

Change 524554 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Remove hashes from LexemeContent::getTextForFilters()

https://gerrit.wikimedia.org/r/524554

Change 524552 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Remove hashes from (Item|Property)Content::getTextForFilters()

https://gerrit.wikimedia.org/r/524552

Change 524563 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Remove id from (Item|Property)Content::getTextForFilters()

https://gerrit.wikimedia.org/r/524563

Discussions are still happening on the talk page. I announced a deployment on August 6th but we may have to delay it if some questions are not properly answered.

> Discussions are still happening on the talk page. I announced a deployment on August 6th but we may have to delay it if some questions are not properly answered.

The only thing that seems to be needed is an example. I just put an example in this ticket covering all of the removals. Does this work for you?

I believe this one was deployed with:

19:36 	<brennen@deploy1001> 	Synchronized php: group1 wikis to 1.34.0-wmf.17 (duration: 00m 54s)

If so no real impact has been immediately evident while looking at https://grafana.wikimedia.org/d/000000615/wikibase-editentity

But while looking at the profiling for abusefilter itself at the time of deployment it appears that we can see the impact.

image.png (277×1 px, 38 KB)

> I believe this one was deployed with:

> 19:36 	<brennen@deploy1001> 	Synchronized php: group1 wikis to 1.34.0-wmf.17 (duration: 00m 54s)

> If so no real impact has been immediately evident while looking at https://grafana.wikimedia.org/d/000000615/wikibase-editentity

> But while looking at the profiling for abusefilter itself at the time of deployment it appears that we can see the impact.

Yeah I know, I think the reason is that the graphs are in log scale and this change is definitely not big enough to make an impact on orders of magnitude. Testing it without log scale is not super useful given the amount of noise and jumps and outliers. I think if we make a daily moving average of the p95, it would be more visible.
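
One way to read the moving-average suggestion is a trailing mean over a p95 series, with a window of one day's worth of samples. The sketch below assumes hourly p95 samples and uses made-up numbers (a noisy series that drops by 10% halfway through); `trailing_mean` is a name invented for this illustration and nothing here comes from the actual Grafana data.

```python
from typing import List, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[float]:
    """Mean of the last `window` values, emitted once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the sample that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


# Synthetic hourly p95 save times in ms (assumption, not measured data):
# two days at one level, then two days 10% lower, with the same noise pattern.
before = [100, 140, 90, 130] * 12
after = [90, 126, 81, 117] * 12
hourly_p95 = before + after

smoothed = trailing_mean(hourly_p95, window=24)

if __name__ == "__main__":
    print("raw range before:", min(before), "-", max(before))
    print("raw range after: ", min(after), "-", max(after))
    print("smoothed first/last:", smoothed[0], smoothed[-1])
```

In the raw series the hour-to-hour swings are larger than the 10% shift, while the smoothed series moves from one steady level to another.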

> I think the reason is that the graphs are in log scale

can't see where they are in log scale, they are all in time or percentage here https://grafana.wikimedia.org/d/000000615/wikibase-editentity?orgId=1

was this much bigger effort than the actual impact, you think? can be very useful learning for later ;)
as far as I can tell looking at profiling dashboard, there aren't noticeable drops

> > I think the reason is that the graphs are in log scale

> can't see where they are in log scale, they are all in time or percentage here https://grafana.wikimedia.org/d/000000615/wikibase-editentity?orgId=1

The top 7 panels y axis are all log base 10
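
To illustrate why a modest improvement is hard to see on such an axis: on a log base 10 scale, a change from x to x·(1+r) moves the point by log10(1+r) decades, regardless of x. The sketch below uses a 10% drop purely as a hypothetical figure; the function name is made up for this example.

```python
import math


def log_axis_shift(relative_change: float) -> float:
    """Vertical shift in decades on a log10 axis for x -> x * (1 + relative_change)."""
    factor = 1.0 + relative_change
    if factor <= 0:
        raise ValueError("resulting value must stay positive on a log axis")
    return math.log10(factor)


if __name__ == "__main__":
    # Hypothetical 10% drop: a small fraction of the gap between two gridlines.
    print(round(log_axis_shift(-0.10), 4))
    # A tenfold increase is exactly one decade, i.e. one full gridline step.
    print(log_axis_shift(9.0))
```

Under that assumption the shift is under 5% of the distance between two adjacent powers of ten, which is easily lost in the noise.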

> was this much bigger effort than the actual impact, you think? can be very useful learning for later ;)
> as far as I can tell looking at profiling dashboard, there aren't noticeable drops

The effort here from the coding side was minimal.
And the impact (if we manage to remove more things) has already been calculated in T204109#4602542

The issue is that we need to tread lightly: in theory, the things listed in the linked comment that we could remove might be used in a filter.
The first 2 tickets removing things cover just the 2 that we are sure right now are not used.