Detect manual reverts
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Ostrzyciel
	Jun 22 2020, 2:11 PM

Description

As part of T254074: Implement the reverted edit tag we want to implement manual revert detection in addition to the current undo- and rollback-based triggers. A manual revert is made when the editor reverts the page to an exact previous state, not using the built-in undo or rollback functionality. See T252366 for a discussion on this.

This should be implemented using exact SHA-1 matching of revisions in the database vs the current revision. As that field lacks an index over it, we'll need to search only within a certain radius of revisions, which can be configurable. This paper suggests using 15 revisions as the default radius (that is also what Product Analytics team uses by default for their mwreverts library). I think this can be done with a single relatively inexpensive DB query utilizing the LIMIT clause. I hope the query optimizer is smart enough to make it a scan of 15 rows, instead of the entire table, I'll have to double-check this, though. I think we'll need to read this data from master, to avoid issues with replication lag, though I'm not 100% sure on this.

Most of the logic will probably live in the EditResultBuilder. The code will be triggered by EditResultBuilder::buildEditResult().

Also: should we mark manual reverts with a software-defined edit tag, like mw-undo and mw-rollback? This would certainly be consistent with the other forms of reverts, and the fact that these manual reverts will mark reverted edits just like undo and rollback will.

Details

Subject	Repo	Branch	Lines /-
PageUpdater: Update @since tag from 1.35 to 1.36 for $wgManualRevertSearchRadius	mediawiki/core	master	2 -2
Add mw-manual-revert change tag	mediawiki/core	master	58 -31
EditResultBuilder: manual revert detection	mediawiki/core	master	569 -11

Customize query in gerrit

Related Objects
Search...

Status	Assigned	Task
Resolved	SBisson	T162675 'Integrated filters'—add all RC Page filters into the new UI along with selected new functions
Resolved	MMiller_WMF	T164307 Add Reverted filter to RecentChanges Filters
Stalled	None	T159725 AbuseFilter: Indicate that an edit was a revert
Declined	None	T155512 Trending service should track reverts more accurately
Resolved	Ostrzyciel	T56007 "reverted"-type notifications should be based on the actual page content
Resolved	Ostrzyciel	T152434 Add method to Revision to check if it was a Revert, and whether an edit was Reverted
Resolved	Ostrzyciel	T248775 Proposal: Add Reverted filter to RecentChanges Filters
Resolved	Ostrzyciel	T254074 Implement the reverted edit tag
Resolved	Ostrzyciel	T256001 Detect manual reverts

Event Timeline

Ostrzyciel created this task.Jun 22 2020, 2:11 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 22 2020, 2:11 PM

After some experimenting I came up with this mockup (I tested it on MariaDB 10.3):

select * from (
        select * from revision
        where rev_page = 248348
        order by rev_id desc
        limit 15
    ) revs_filtered
    where rev_sha1 = 'civ8b9x9x0xbvyvy8cnr5a9fle2cdqc';

This will select the 15 most recent revisions and then check if any of them has the SHA-1 we're looking for, so the scan is limited to 15 rows. This particular page has over 1500 revisions.
EXPLAIN for this query:

 ------ ------------- ------------ ------ ------------------------------------------------ ------------- --------- ------- ------ ------------- 
| id   | select_type | table      | type | possible_keys                                  | key         | key_len | ref   | rows | Extra       |
 ------ ------------- ------------ ------ ------------------------------------------------ ------------- --------- ------- ------ ------------- 
|    1 | PRIMARY     | <derived2> | ALL  | NULL                                           | NULL        | NULL    | NULL  |   15 | Using where |
|    2 | DERIVED     | revision   | ref  | rev_page_id,page_timestamp,page_user_timestamp | rev_page_id | 4       | const | 1556 | Using where |
 ------ ------------- ------------ ------ ------------------------------------------------ ------------- --------- ------- ------ -------------

And, according to MySQL manual:

If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast.

In this case we do have a key like this, rev_page_id that consists of rev_page and rev_id. Sooo I think we should be fine here, we have an index scan, then an index-based sort operation limited to N rows and then a table scan on a small number of rows.

Correction: we should order revisions by timestamp, not by their IDs (which can do all kinds of crazy things). So the order clause should look like order by rev_timestamp desc.

The conclusion though is the same: we still have an index that would suit this use case, page_timestamp.

Change 607855 had a related patch set uploaded (by Ostrzyciel; owner: Ostrzyciel):
[mediawiki/core@master] WIP: EditResultBuilder: manual revert detection

https://gerrit.wikimedia.org/r/607855

gerritbot added a project: Patch-For-Review.Jun 25 2020, 6:14 PM

Ostrzyciel updated the task description. (Show Details)Jun 25 2020, 6:31 PM

I just found T216297 which contains an interesting discussion on this topic. One thing in particular made me curious:

In T216297#5035877, @daniel wrote:

Note that manual reverts (by clicking edit on an older revision, and then saving) is detected by hash, but the "original rev ID" is not set in this case. This could be done without too much trouble, but if I recall correctly, this may confuse some extensions.

@daniel, do you remember what extensions could be affected by such an approach? Or do you have some hints on how to find them? Currently my plan is to do just that, I'll be setting the original rev ID to the edit a user manually reverted to. I had hoped this wouldn't break anything :D

Proc subscribed.Jun 30 2020, 3:50 PM

Change 609242 had a related patch set uploaded (by Ostrzyciel; owner: Ostrzyciel):
[mediawiki/core@master] Add mw-manual-revert change tag

https://gerrit.wikimedia.org/r/609242

Change 607855 merged by jenkins-bot:
[mediawiki/core@master] EditResultBuilder: manual revert detection

https://gerrit.wikimedia.org/r/607855

ReleaseTaggerBot added a project: MW-1.36-notes (1.36.0-wmf.1; 2020-07-21).Jul 14 2020, 3:00 PM

BEANS-X2 subscribed.Jul 14 2020, 3:34 PM

Change 609242 merged by jenkins-bot:
[mediawiki/core@master] Add mw-manual-revert change tag

https://gerrit.wikimedia.org/r/609242

Maintenance_bot removed a project: Patch-For-Review.Jul 16 2020, 12:11 PM

Change 613801 had a related patch set uploaded (by DannyS712; owner: Shirayuki):
[mediawiki/core@master] PageUpdater: Update @since tag from 1.35 to 1.36 for $wgManualRevertSearchRadius

https://gerrit.wikimedia.org/r/613801

gerritbot added a project: Patch-For-Review.Jul 18 2020, 6:52 AM

Change 613801 merged by jenkins-bot:
[mediawiki/core@master] PageUpdater: Update @since tag from 1.35 to 1.36 for $wgManualRevertSearchRadius

https://gerrit.wikimedia.org/r/613801

Maintenance_bot removed a project: Patch-For-Review.Jul 18 2020, 8:10 AM

Ostrzyciel closed this task as Resolved.Jul 21 2020, 5:59 PM

Ostrzyciel mentioned this in T258921: Create a maintenance script for filling mw-manual-revert and mw-reverted tags.Jul 27 2020, 9:23 AM

Ostrzyciel mentioned this in T248775: Proposal: Add Reverted filter to RecentChanges Filters.Aug 30 2020, 8:51 AM

As a bot owner, I would like to mention several situations where the "manual revert" tag is applied in a situation where it doesn't seem relevant. The tag itself doesn't do much harm, but as far as I understand, there is a plan to notify users from those manual reverts (T154637), so this could become more annoying.

Requests pages: when a bot archives requests that have been processed (e.g. https://fr.wikipedia.org/w/index.php?title=Wikipédia:Demande_d'intervention_sur_un_message_système&diff=174930164&oldid=174843790). If the last X requests are marked as done and archived, this is seen as a "manual revert" of all edits done on those requests.
Lists of things to do: I maintain pages with lists of pages to fix, such as https://fr.wikipedia.org/w/index.php?title=Projet:Restauration_lien_rouge/par_distance/Q&action=history. Typically, the bot adds pages to the list every X days and then and users remove items that they fix, occasionally "reverting" to the state where the list is empty. In that case, the notification is triggered by humans.
Removing a protection template once the protection has expired (https://fr.wikipedia.org/w/index.php?title=Histoire_du_Sahara_occidental&diff=prev&oldid=174948749). Admittedly, it makes sense in that case to tag it as a revert, but still, I don't think sysops who add the template manually would want to get notified of a revert when my bot removes the template.

Will there be a way to exclude some users or pages from that mechanism?

Orlodrim mentioned this in T154637: Notify users when manually reverting to an older revision.Sep 23 2020, 10:53 PM

Detect manual revertsClosed, ResolvedPublicActions

Description

Details

Related ObjectsSearch...

Event Timeline

Detect manual reverts
Closed, ResolvedPublic
Actions

Related Objects
Search...