Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | jcrespo | T132431 labsdb1001 and labsdb1003 short on available space
Resolved | | kaldari | T133326 p50380g50816__pop_stats (popularpages) using 53G on labsdb1001 (enwiki)
Event Timeline
It looks like it's still in use, but it has data going back to 2009. Since the reports are output to wiki pages, I don't think there's any reason we need to keep the old data in the database indefinitely. I'll clean out some of the older data.
I deleted all the data older than 2010 as a test. If there are no issues with the next report generation, I'll delete more.
To connect to the database in question:
mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb p50380g50816__pop_stats
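For anyone without filesystem access to the database host, a rough per-table size estimate can be pulled from MySQL's information_schema instead. The following is only a sketch: the helper name size_query is made up for this example, it assumes the account is allowed to read information_schema.tables for this schema, and the InnoDB figures it returns are estimates that may not match what du reports on disk.

```shell
#!/bin/sh
# Sketch only: size_query is a hypothetical helper, not part of the tool.
# It prints a query listing each table in one schema with its estimated
# size (data + indexes) in GiB, largest first.
size_query() {
    # $1 = schema (database) name
    printf "SELECT table_name, ROUND((data_length + index_length) / POW(1024, 3), 2) AS size_gib FROM information_schema.tables WHERE table_schema = '%s' ORDER BY size_gib DESC;\n" "$1"
}

# Print the query for the database discussed in this task.
size_query p50380g50816__pop_stats
```

To actually run it, the output can be piped into the same client invocation as above, for example: size_query p50380g50816__pop_stats | mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb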
I don't really have the time to keep up with the increasing maintenance on this tool anymore. My plan is to put up some notices to see if anyone else is willing to take it over, and if not (which seems likely), I'll delete the rest.
@Qgil: If you know any volunteer developers who might be interested in maintaining this tool, please let us know. The popular pages reports are very valuable to WikiProjects and it would be a shame to lose this feature.
See https://tools.wmflabs.org/popularpages/config.php and https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Birds/Popular_pages.
I wonder whether Mr.Z-bot would be easier to maintain in combination with https://tools.wmflabs.org/massviews/ or whatever APIs that tool is querying? Sorry if this is a silly comment in technical terms.
We don't have a way to look for possible maintainers other than sending an email to the Labs or wikitech-l lists. Related tasks that come to mind, just in case you find them useful:
@Qgil: As soon as T118508 is fixed, it should actually be pretty easy to replace this tool entirely. Unfortunately, due to T118508, https://tools.wmflabs.org/massviews/ is currently limited to sets with 500 or fewer pages, which excludes most WikiProjects, so this tool is still needed. Sadly, I don't think T118508 is getting fixed soon though.
@Mr.Z-man: Hmm, it looks like the tool hasn't actually run since April. Any idea what might be wrong with it? What's the command to run it manually from the command line? Can you run it for just one project as a test and see if there are any errors?
I've actually been running it manually for a while, because it has never been quite as reliable as I was hoping. For some reason in April it missed a lot of data. I didn't have time to diagnose it and rebuild the table.
What are the current development needs of this tool? I'm really missing its output in projects I'm associated with. (note that I'm not committing to anything at this point.)
Can you see how much disk space is being used currently?
du -hs p50380g50816__pop_stats
50G p50380g50816__pop_stats
It has been over 4 months since the most recent popular pages updates, and the pages in projects in which I'm a member are getting stale. Is anything being done to resolve this?
@kaldari T118508 (AQS: query multiple articles at the same time) was declined (not for great reasons IMO, but whatever), but I thought that the rate limits were raised so that https://tools.wmflabs.org/massviews/ was more effective. In either case, can you purge a lot more data from this historical table?
Thanks @kaldari, this went down to 35G
# du -sh /srv/sqldata/p50380g50816__pop_stats/
35G /srv/sqldata/p50380g50816__pop_stats/
I am going to close this as resolved, as right now we have quite some space available on labsdb1001 after we merged the partitions and this is no longer a big deal.
If someone still feels this needs to be cleaned up more - feel free to reopen!
I don't think we even need to keep those 35G of data anymore. The bot is gone forever and that data is pretty outdated. I can't think of a reason we'd ever need it.