
Views data integrity compromised by entity running up fake views
Open, Needs Triage | Public | Security

Description

See this Village Pump post, which describes vast over-access to a specific Wikipedia article, namely Neatsville, Kentucky. This is skewing views data and rankings. It has been happening for many months with no apparent reason. If there's any way to stop or throttle this access so that views data integrity can be improved, that would be helpful. I recognize this may be seen as more of an annoyance than a severe problem (i.e., there is no DDoS effect that I know of), but I think discouraging this kind of access would help us keep views data as trustworthy as practicable.

Details

Risk Rating
Low
Author Affiliation
Wikimedia Communities

Event Timeline

Well, it's not a bug and not a feature request. The instructions led me here as a security issue in the link you provide: "When the integrity of data hosted by the Wikimedia Foundation or affiliated entities is at risk of being corrupted, tampered with, or otherwise modified in an unauthorized manner."

@Aklapper removed projects: Data-Engineering, Analytics-Data-Problem.

Why? @Gehel says this is owned by Data-Engineering

Ah, thanks! I encourage Gehel or anyone else to update the description of https://phabricator.wikimedia.org/tag/pageviews-anomaly/ accordingly and potentially add an entry to https://www.mediawiki.org/wiki/Developers/Maintainers - thanks.

Note that Kepler's Supernova has been added to the Village pump discussion for also having received a huge number of views in recent times.

As long as this isn't affecting availability and has no private information on it, I don't see what the benefit is of keeping this task secret.

Fact is, we will probably never know why that article randomly gets a lot of hits. Probably someone just uses it in a test of some automated script.

Some editors responding in the Village Pump discussion have suggested this is only a problem with how we interpret views. I disagree with that position. Nevertheless, it might be a useful band-aid if the views from odd hits like this were treated as bot views or put into a new category, so that either way they would not cause issues in views reports. Might doing that belong in its own Phab task?

Btw, I'm fine with admins no longer considering this task security-related, if they genuinely believe it is not. I believe some other editors want to put in their two cents.

With all the back and forth and some misunderstandings, I decided to write a TL;DR summary of the Village Pump discussion:

A data integrity problem (but not currently a performance problem) is being caused by some entity running up hundreds of thousands of fake views per month of select articles, particularly "Neatsville, Kentucky", which distorts reports based on this views data. Apparent solutions include identifying such views more intelligently and recategorizing them (as they are very likely not views from real people), and figuring out the origin or origins of this access and taking steps such as blocking to handle them.
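To make the "identify and recategorize" idea concrete, here is a minimal sketch in Python. It is not how Wikimedia's pageview pipeline classifies traffic. The function name `split_suspect_views`, the `ratio` threshold, and the idea of comparing against a baseline period assumed to be free of the anomaly are all hypothetical, chosen only for illustration.

```python
from statistics import median


def split_suspect_views(daily_views, baseline_views, ratio=10.0):
    """Split each day's view count into (plausible, suspect) parts.

    daily_views:    daily counts for the period being reported on.
    baseline_views: daily counts from a period ASSUMED to be free of the
                    anomaly (hypothetical input, not an existing dataset).
    ratio:          hypothetical multiplier; anything above
                    ratio * median(baseline_views) is treated as suspect.

    Suspect views are kept in their own bucket rather than discarded, so
    reports can exclude them without losing the raw totals.
    """
    if not baseline_views:
        raise ValueError("baseline_views must not be empty")

    # Cap on what is counted as plausible human traffic for one day.
    # max(..., 1) avoids a zero cap for articles with a near-zero baseline.
    cap = int(ratio * max(median(baseline_views), 1))

    result = []
    for views in daily_views:
        plausible = min(views, cap)
        result.append((plausible, views - plausible))
    return result


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    baseline = [10, 12, 8, 11, 9]
    current = [15, 5000, 90]
    for day, (plausible, suspect) in enumerate(split_suspect_views(current, baseline)):
        print(f"day {day}: plausible={plausible} suspect={suspect}")
```

A fixed ratio like this would also cap genuine spikes, for example when an article is in the news, so any real classifier would need more signals than raw counts. That trade-off is part of why the question of origin raised above still matters.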

sbassett changed Author Affiliation from N/A to Wikimedia Communities. Jun 28 2024, 4:10 PM
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed the edit policy from "Custom Policy" to "All Users".
sbassett changed Risk Rating from N/A to Low.