p50380g50816__pop_stats (popularpages) using 53G on labsdb1001 (enwiki)
Closed, Resolved (Public)

Description

@Mr.Z-man / @kaldari, is this table in active use? If not, would it be possible to clear it?

Event Timeline

valhallasw renamed this task from p50380g50943__cache (???) using 53G on labsdb1001 (enwiki) to p50380g50816__pop_stats (popularpages) using 53G on labsdb1001 (enwiki). Apr 21 2016, 6:49 PM
valhallasw updated the task description.
valhallasw added subscribers: kaldari, Mr.Z-man.

@Mr.Z-man / @kaldari Any update on this question? Is the table active, and if so is there any way to reduce its footprint?

It looks like it's still in use, but it has data going back to 2009. Since the reports are output to Wikipages, I don't think there's any reason we need to keep the old data in the database indefinitely. I'll clean out some of the older data.

I deleted all the data older than 2010 as a test. If there are no issues with the next report generation, I'll delete more.

To connect to the database in question:
mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb p50380g50816__pop_stats

I don't really have the time to keep up with the increasing maintenance on this tool anymore. My plan is to put up some notices to see if anyone else is willing to take it over, and if not (which seems likely), I'll delete the rest.

@Qgil: If you know any volunteer developers who might be interested in maintaining this tool, please let us know. The popular pages reports are very valuable to WikiProjects and it would be a shame to lose this feature.

See https://tools.wmflabs.org/popularpages/config.php and https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Birds/Popular_pages.

did you get a chance at further deletion @kaldari?

I wonder whether Mr.Z-bot would be easier to maintain in combination with https://tools.wmflabs.org/massviews/ or whatever APIs that tool is querying? Sorry if this is a silly comment in technical terms.

We don't have a way to look for possible maintainers other than sending an email to the Labs or wikitech-l lists. Related tasks that come to mind, just in case you find them useful:

@Qgil: As soon as T118508 is fixed, it should actually be pretty easy to replace this tool entirely. Unfortunately, due to T118508, https://tools.wmflabs.org/massviews/ is currently limited to sets with 500 or fewer pages, which excludes most WikiProjects, so this tool is still needed. Sadly, I don't think T118508 is getting fixed soon though.

@Mr.Z-man: Hmm, it looks like the tool hasn't actually run since April. Any idea what might be wrong with it? What's the command to run it manually from the command line? Can you run it for just one project as a test and see if there are any errors?

I've actually been running it manually for a while, because it has never been quite as reliable as I was hoping. For some reason in April it missed a lot of data. I didn't have time to diagnose it and rebuild the table.

What are the current development needs of this tool? I'm really missing its output in projects I'm associated with. (note that I'm not committing to anything at this point.)

@chasemp: I've deleted everything older than 2011 now. Can you see how much disk space is being used currently?

du -hs p50380g50816__pop_stats
50G	p50380g50816__pop_stats
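
For anyone without shell access to the database host, a similar figure can probably be read from the SQL side. This is only a sketch: `size_query` is a made-up helper name, and it assumes the tool account is allowed to read `information_schema.TABLES` for its own database. The numbers are the server's own estimates, so they will not match `du` exactly. The `free_gb` column (from `data_free`) is space the server reports as allocated but unused, which may help explain why deleting rows does not always shrink the on-disk size much.

```shell
#!/bin/sh
# Print a query that reports per-table size for one schema.
# data_length, index_length and data_free are standard columns of
# information_schema.TABLES in MySQL and MariaDB.
# size_query is a hypothetical helper name, not part of any tool here.
size_query() {
  cat <<SQL
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb,
       ROUND(data_free / 1024 / 1024 / 1024, 2) AS free_gb
FROM information_schema.TABLES
WHERE table_schema = '$1'
ORDER BY data_length + index_length DESC;
SQL
}

# Usage (needs database access, so it is not run here):
#   size_query p50380g50816__pop_stats |
#     mysql --defaults-file="${HOME}"/replica.my.cnf -h enwiki.labsdb
```

The function only prints SQL, so the query can be reviewed before it is piped to `mysql`.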

It has been over 4 months since the most recent popular pages updates, and the pages in projects in which I'm a member are getting stale. Is anything being done to resolve this?

@Stevietheman: Please create a separate bug for that issue.

I thought this was the bug. Also, what would my bug state and where would it go?

@kaldari: T118508 ("AQS: query multiple articles at the same time") was declined (not for great reasons IMO, but whatever). Still, I thought the rate limits were raised so that https://tools.wmflabs.org/massviews/ would be more effective. Either way, can you purge a lot more data from this historical table?

I just deleted everything older than 2014, which was about half the tables.
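
If the data really is split into one table per period, a cleanup like this could be scripted roughly as follows. This is a sketch under a loud assumption: the thread never shows the actual table names, so the `_YYYY` suffix and the `pop_2012`-style names below are hypothetical, as is the helper name `drop_statements_before`. The function only prints `DROP TABLE` statements so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Read table names on stdin and print a DROP TABLE statement for each
# name whose trailing four-digit year is below the cutoff in $1.
# ASSUMPTION: names end in _YYYY (hypothetical scheme, not from the thread).
# The body runs in a subshell "( ... )" so its variables stay local.
drop_statements_before() (
  cutoff="$1"
  while IFS= read -r name; do
    case "$name" in
      *_[0-9][0-9][0-9][0-9]) year="${name##*_}" ;;  # keep the YYYY suffix
      *) continue ;;                                 # skip non-matching names
    esac
    if [ "$year" -lt "$cutoff" ]; then
      printf 'DROP TABLE `%s`;\n' "$name"
    fi
  done
)

# Usage with made-up table names:
#   printf '%s\n' pop_2012 pop_2013 pop_2014 | drop_statements_before 2014
# prints DROP TABLE statements for pop_2012 and pop_2013 only.
```

Dropping whole tables instead of deleting rows also tends to release disk space more directly, though that depends on how the server stores its tables.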

Marostegui assigned this task to kaldari.

Thanks @kaldari, this went down to 35G

# du -sh  /srv/sqldata/p50380g50816__pop_stats/
35G	/srv/sqldata/p50380g50816__pop_stats/

I am going to close this as resolved: after we merged the partitions we have quite a lot of space available on labsdb1001, so this is no longer a big deal.
If someone still feels this needs to be cleaned up more - feel free to reopen!

I don't think we even need to keep those 35G of data anymore. The bot is gone forever and that data is pretty outdated. I can't think of a reason we'd ever need it.

If you guys think it can be dropped...go ahead! :-)