See this Village Pump post describing heavy, unexplained over-access to a specific Wikipedia article, namely Neatsville, Kentucky. This is skewing views data and rankings. It has been happening for many months with no apparent reason. If there's any way to stop or throttle this access so that views data integrity can be improved, that would be helpful. I recognize this may be seen as more of an annoyance than a severe problem (i.e., there is no DDoS effect that I know of), but I think discouraging this kind of access would help us keep views data as trustworthy as practicable.
Details
- Risk Rating: Low
- Author Affiliation: Wikimedia Communities
Event Timeline
I don't think this qualifies as a security issue? See https://www.mediawiki.org/wiki/Reporting_security_bugs#What_is_considered_a_security_issue
Well, it's not a bug and not a feature request. The instructions at the link you provided led me to file it as a security issue: "When the integrity of data hosted by the Wikimedia Foundation or affiliated entities is at risk of being corrupted, tampered with, or otherwise modified in an unauthorized manner."
Ah, thanks! I encourage Gehel or anyone else to update the description of https://phabricator.wikimedia.org/tag/pageviews-anomaly/ accordingly and potentially add an entry to https://www.mediawiki.org/wiki/Developers/Maintainers - thanks.
Note that Kepler's Supernova has been added to the Village pump discussion for also having received a huge number of views in recent times.
As long as this isn't affecting availability and the task contains no private information, I don't see what the benefit is of keeping this task secret.
Fact is, we will probably never know why that article randomly gets a lot of hits. Probably someone just uses it in a test of some automated script.
Some editors responding in the Village Pump discussion have suggested this is only a problem with how we interpret views. I disagree with that position, but nevertheless, it might be a useful band-aid to treat views from odd hits like these as bot traffic, or to put them into a new category; either way, they would no longer distort views reports. Would that belong in its own Phab task?
Btw, I'm fine with admins no longer considering this task security-related, if they genuinely believe it is not. I believe some other editors want to put in their two cents.
With all the back and forth and some misunderstandings, I decided to write a TL;DR summary of the Village Pump discussion:
A data integrity problem (but not currently a performance problem) is being caused by some entity running up hundreds of thousands of fake views per month on select articles, particularly "Neatsville, Kentucky", which corrupts reports based on this views data. Proposed solutions include identifying such views more intelligently and recategorizing them (as they are highly unlikely to be views from real people), and determining the origin or origins of this access and taking steps such as blocking to handle it.
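To make the "more smartly identifying such views" idea concrete, here is a minimal sketch of one possible heuristic. It is not how Wikimedia classifies traffic today. It assumes we have daily view counts for an article from before the suspected onset (a baseline) and from the period in question, and it flags days that sit far above the baseline using a robust z-score (median and median absolute deviation). The function name, the threshold, and the sample numbers are all hypothetical.

```python
from statistics import median

def flag_anomalous_days(baseline_views, recent_views, threshold=5.0):
    """Return indices into recent_views whose counts are far above the baseline.

    The baseline's median and median absolute deviation (MAD) are used
    instead of mean and standard deviation so that a few outliers in the
    baseline do not inflate the scale. The threshold is an assumed,
    tunable cutoff, not an established value.
    """
    if not baseline_views:
        return []
    med = median(baseline_views)
    mad = median(abs(v - med) for v in baseline_views)
    # A perfectly flat baseline has MAD 0; fall back to a scale of 1 view.
    scale = mad if mad > 0 else 1.0
    flagged = []
    for i, views in enumerate(recent_views):
        # 0.6745 rescales MAD to be comparable to a standard deviation
        # under a normal distribution.
        score = 0.6745 * (views - med) / scale
        if score > threshold:
            flagged.append(i)
    return flagged

# Hypothetical numbers, for illustration only.
baseline = [40, 35, 50, 42, 38, 45, 41]
recent = [44, 9000, 12000, 39]
print(flag_anomalous_days(baseline, recent))
```

Comparing against a pre-onset baseline matters here because the access has been sustained for months; scoring the recent series against its own median would treat the inflated level as normal. Days flagged this way could then be reviewed or recategorized rather than counted as ordinary user views.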