Add support to watch pages for more than 1 year
Open, Low · Public · Feature

Description

I recently noticed that the maximum expiry value for action=watch is 1 year.
A similar thing is mentioned in T306198, but I wonder why this limitation even exists?
If we specify an expiration time of more than 1 year, that gives us a warning message:

mw.loader.using('mediawiki.api', function() {
    // Ask for confirmation before modifying the watchlist
    if (!confirm('Continue?')) return;
    new mw.Api().postWithToken('watch', {
        action: 'watch',
        titles: 'Main page',
        expiry: '3 years', // longer than the 1-year maximum
        formatversion: '2'
    }).then(function(res) {
        console.log(res);
    }).catch(function(code, err) {
        console.log(err);
    });
});

{
    "batchcomplete": true,
    "warnings": {
        "watch": {
            "warnings": "Given value \"3 years\" for parameter \"expiry\" exceeds the maximum of \"1 year\". Using maximum instead."
        }
    },
    "watch": [
        {
            "title": "Main page",
            "ns": 0,
            "expiry": "2024-05-08T02:01:13Z",
            "watched": true
        }
    ]
}

I think it would be best if we could specify an expiration time of any duration. This is because, at least on my local project (jawiki), we often protect pages for more than 1 year (mostly 3 years or so) when we need to take countermeasures against LTAs. With the maximum watchlist expiration time of 1 year, we can't surveil the relevant pages in the most effective way. I believe that this is worth considering (or at least, the API documentation pages should mention this maximum expiration time).
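
Since the limit is only surfaced as a warning, a script can check the response to see whether the requested expiry was reduced. The following is a minimal sketch, assuming the formatversion=2 response shape shown above. The helper name getWatchExpiryWarning is hypothetical, and matching on the warning wording is an assumption based on the sample response; the text may differ between versions.

```javascript
// Hypothetical helper: return the warning text if action=watch reported
// that the requested expiry exceeded the maximum, otherwise null.
// Assumes the formatversion=2 response shape from the sample above.
function getWatchExpiryWarning(res) {
    var watchWarnings = res && res.warnings && res.warnings.watch;
    var text = watchWarnings && watchWarnings.warnings;
    if (typeof text !== 'string') {
        return null;
    }
    // Assumption: the wording matches the sample response
    return /exceeds the maximum/.test(text) ? text : null;
}

// Usage with a trimmed copy of the sample response
var sample = {
    batchcomplete: true,
    warnings: {
        watch: {
            warnings: 'Given value "3 years" for parameter "expiry" exceeds the maximum of "1 year". Using maximum instead.'
        }
    },
    watch: [{ title: 'Main page', ns: 0, expiry: '2024-05-08T02:01:13Z', watched: true }]
};
console.log(getWatchExpiryWarning(sample));
```

A caller could use a non-null result to decide whether to retry with expiry=infinite or to tell the user that the watch period was shortened.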

Event Timeline

Restricted Application added a subscriber: Aklapper.

For this scenario, would watching the page permanently be an option?

action=watch
format=json
formatversion=2
titles={title}
expiry=infinite

@kostajh That's an option but not the best one, as far as I'm concerned, because any time we protect a page for more than 1 year, it'll be necessary to watch the page indefinitely. This will take up too much "space" in the watchlist.
The thing is, older users have so many pages in their watchlists, and the situation is probably even worse for sysops who surveil pages that have previously been targeted for vandalism. The more pages the watchlist has, the slower it loads.

I've been developing (and actually rewriting per request) a script for mass protection, and this is the context in which I noticed the 1-year limit (I had no idea, because the API docs don't say anything about it). One way to resolve the dilemma above is to add the pages to the watchlist temporarily, and in my case I added a feature that watches protection targets for the duration of the protection plus some extra time, like 1 year 1 month (motivated by the fact that vandals often show up right after protection expires). Currently, this just doesn't work when we protect the pages for a year or more (and for this reason, the script is currently coded to coerce the watchlist expiry to indefinite when the relative watch duration exceeds 1 year). My casual thought is that the watchlist should work just like action=block, in the sense that the latter has no problem with an expiration time like "3 years".
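
The workaround described above (watch until the protection ends plus a margin, and fall back to an indefinite watch when that would exceed the limit) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual script: the function name chooseWatchExpiry and its parameters are hypothetical, and the 1-year limit is approximated as "the same calendar date next year", which may not match the server's calculation exactly.

```javascript
// Hypothetical helper: pick an expiry value for action=watch.
// protectionEnd: Date when the protection expires
// extraDays:     margin to keep watching after the protection expires
// now:           current time (passed in so the function is easy to test)
function chooseWatchExpiry(protectionEnd, extraDays, now) {
    var MS_PER_DAY = 24 * 60 * 60 * 1000;
    var watchEnd = new Date(protectionEnd.getTime() + extraDays * MS_PER_DAY);

    // Assumption: the maximum is approximated as one calendar year from now
    var limit = new Date(now.getTime());
    limit.setUTCFullYear(limit.getUTCFullYear() + 1);

    if (watchEnd.getTime() > limit.getTime()) {
        // Beyond the limit: coerce to an indefinite watch, as the script currently does
        return 'infinite';
    }
    // Absolute timestamp without milliseconds, e.g. 2023-09-07T00:00:00Z
    return watchEnd.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

var now = new Date('2023-05-08T00:00:00Z');
// Protection for 3 months: watch until the protection end plus 30 days
console.log(chooseWatchExpiry(new Date('2023-08-08T00:00:00Z'), 30, now));
// Protection for 3 years: falls back to an indefinite watch
console.log(chooseWatchExpiry(new Date('2026-05-08T00:00:00Z'), 30, now));
```

Comparing absolute dates rather than relative strings like "1 year 1 month" keeps the check independent of how the server parses durations. A script that wants to avoid the warning near the boundary may prefer a slightly smaller limit.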

OK, fair enough. I've added Community-Tech as the team that implemented the feature. From a product side, I guess the risk is that by adding too many options, the dropdown becomes harder to use.

I appreciate your help. I guess the dropdown options can stay the same and don't need an option like "3 years". For my purposes, the API is the only thing that I want to be modified.

Ah, OK. Cc'ing DBA -- the code comments for the Watchlist expiry feature mention that 1 year exists as an upper limit due to concerns about database table size for watchlist_expiry.

I just checked the table size on enwiki, commons and wikidata and it is fine. I would have no problem increasing it to 3y. @Ladsgroup anything from your side?

Alright, that makes sense. Permanently having 10k pages in the watchlist, or having some of them removed periodically: I wonder which is more parsimonious for the database. Thanks, Community-Tech.

It's not as simple as it looks: watchlist_expiry holds an extra column for the timestamp, which is binary(14). Once that table is large enough, it can undo the benefits by taking up more space than holding the rows unconditionally in watchlist would.

I see. Thank you all for your help.

@MusikAnimal is there any context as to why Watchlist Expiry has a max limit of 1 year? What are the pros/cons of implementing it in this manner? Seems like some communities want to extend the expiry beyond 1 year.

It's both a performance concern and a product one. Every row in the watchlist_expiry table has a corresponding row in the normal watchlist table. The watchlist table has long had issues with being too large. Watchlist Expiry is a feature intended to allow users to keep their watchlists tidy, but it also has the benefit of helping keep the watchlist table from growing too large, since the rows are eventually removed from it as opposed to staying there indefinitely.

That said, if we allow for very long expiries (such as over a year), we start to lose the benefit (T336142#8833256). So that's the performance part.

The product part is similar though -- if you wanted to watch for a really long time, why not just watch indefinitely? The use-case stated here is about monitoring for long-term abuse. Say an LTA (long-term abuser) always vandalizes the same article, but maybe only does it every 18 months or so. Okay, so we increase the max to 2 years. Then someone encounters an LTA that vandalizes every 3 years, so now we're asked to increase the maximum expiry even more. So on and so forth; there will always be a use-case for an even longer expiry, but we have to have some sort of limitation.

My thoughts are that this use-case (long-term abuse) may be better solved with something like multiple watchlists (T3492). Then you could have a dedicated watchlist just for LTAs, which would be easier to manage, and you may not need watchlist expiry at all. We came very close to working on this wish in 2021 but other projects consumed our time. I hope it gets proposed and voted on again, as I think it would be a fun project and really beneficial for all sorts of use cases.

Removing Growth-Team as this doesn't really concern them. I've also made a note on the MediaWiki-Watchlist profile page stating that tasks about watchlist expiry can be filed under only Expiring-Watchlist-Items to avoid unwanted noise on other boards. I'm also removing MediaWiki-Action-API as changing the maximum expiry time is just a configuration setting and doesn't have anything to do with the API, specifically.

JWheeler-WMF moved this task from Needs Discussion to Freezer on the Community-Tech board.

Watchlist expiry is in passive maintenance, which means this feature optimization would go against our frameworks. This is something to watch in the future, especially for watchlists as a whole.