This doesn't seem to be tracked yet.
This has been discussed countless times over the past few years: all sorts of GLAM initiatives, and any other initiative to improve content on the projects, currently rely on Henrik's stats.grok.se data in JSON format, e.g. https://toolserver.org/~emw/index.php?c=wikistats and http://toolserver.org/~magnus/glamorous.php .
The data in domas' logs should be available for easy querying on the Toolserver databases and elsewhere, but previous attempts to create such a DB led nowhere as far as I know.
I suppose this is already one of the highest priorities in the analytics team's plans for the new infrastructure, but I wasn't able to confirm that from the public documents, and it needs to be done sooner or later anyway.
(Not in "Usage statistics" aka "Statistics" component because that's only about raw pageviews data.)
Involved sub-tasks:
- T101785: Generate test data for Pageview API [5 pts]
- T101786: Test Cassandra as a storage strategy [5 pts]
- T101792: Pageview API Ops
- T106821: POC RestBase with cassandra in labs on test data [8 pts]
- T107053: create RESTBase endpoints [34 pts]
- T107054: create second RESTBase endpoint [8 pts]
- T107055: create third RESTBase endpoint [8 pts]
- T107056: Puppetize a server with a role that sets up Cassandra on Analytics machines [13 pts]
- T108174: Create Hadoop Job to load data into cassandra [34 pts]
- T113991: Deploy the Analytics RESTBase [13 pts]
- T114830: configure RESTBase pageview proxy to Analytics' cluster [34 pts]
- T115351: Improve loading Analytics Query Service with data [5 pts]
- T115353: improve timeuuid writing [5 pts]
- T115355: run job using oozie <mapreduce> [13 pts]
- T115356: special character stripping on cassandra loading (tabs) [5 pts]
- T115360: cassandra backfill monitoring
- T115361: optimize Analytics Query Service
- T116209: Improve record size on cassandra storage for pageview API data (RESTBase changes) [8 pts]
- T116407: Document Cassandra SLAs and storage requirements for daily and hourly data [5 pts]
- T116408: Remove loading of hourly data from Cassandra loading scripts and hql [5 pts]
- T116409: Druid testing on labs to assess whether it is a suitable Cassandra replacement. [8 pts]
- T116763: Test Elasticsearch pageview data loading/retrieval on labs [8 pts]
- T116764: Testing druid data loading/retrieval on labs
- T117017: Reformat pageview API responses to allow for status reports and messages
- T117226: Pageview API documentation for end users [8 pts]
- T117242: Inspect Pageview API queries (after launch)
- T118402: Make AQS return 0 instead of no values
- T118403: AQS should expect article names uriencoded just once
- T118447: Update Cassandra loading job - per-project [5 pts]
- T118448: Update cassandra monthly top job [3 pts]
- T118449: Investigate cassandra daily top job [5 pts]
- T118450: Backfill cassandra pageview data - September [5 pts]
- T118785: Missing Pageview API data for one article [3 pts]
- T118845: Backfill cassandra pageview data - August [5 pts]
- T120845: Gather preliminary metrics of Pageview API usage for quarterly review [5 pts]
- T121300: Remove all-access and spider from top endpoint
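For anyone writing a client against the result, here is a rough Python sketch of the behaviour that T118403 (article names URI-encoded just once) and T118402 (0 instead of no values) seem to ask for. It is not the AQS implementation: the helper names, the ISO date keys and the space-to-underscore step are assumptions made for illustration only.

```python
from datetime import date, timedelta
from urllib.parse import quote


def encode_title_once(title: str) -> str:
    """Percent-encode an article title exactly once (hypothetical helper)."""
    # Assumption: spaces become underscores, as in MediaWiki page URLs.
    # safe="" also encodes "/", so a title like "AC/DC" stays one path segment.
    return quote(title.replace(" ", "_"), safe="")


def fill_missing_days(views: dict, start: date, end: date) -> dict:
    """Return a daily series where days without data count as 0 (hypothetical helper)."""
    filled = {}
    day = start
    while day <= end:
        key = day.isoformat()  # assumption: days are keyed as YYYY-MM-DD
        filled[key] = views.get(key, 0)
        day += timedelta(days=1)
    return filled


if __name__ == "__main__":
    print(encode_title_once("AC/DC"))
    # Encoding an already encoded title escapes the "%" again, which is
    # the double-encoding problem that T118403 is about.
    print(encode_title_once(encode_title_once("AC/DC")))
    print(fill_missing_days({"2015-10-02": 7}, date(2015, 10, 1), date(2015, 10, 3)))
```

Filling the gaps on the client side like this is only a stopgap until the service itself returns zeros.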
Version: wmf-deployment
Severity: enhancement
Discussions (partial list):
- http://lists.wikimedia.org/pipermail/analytics/2012-December/000266.html
- http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/43657
- https://intern.wikimedia.ch/lists/private/cultural-partners/2010-November/000281.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2011-July/001476.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2011-December/002477.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2012-December/005100.html
- http://lists.wikimedia.org/pipermail/analytics/2013-January/000351.html
- http://lists.wikimedia.org/pipermail/analytics/2013-May/000618.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2013-June/006163.html
- http://lists.wikimedia.org/pipermail/wikitech-l/2013-June/069692.html
- http://lists.wikimedia.org/pipermail/wikitech-l/2013-September/071714.html
- http://lists.wikimedia.org/pipermail/wikimedia-l/2013-September/128060.html
- http://lists.wikimedia.org/pipermail/analytics/2013-October/001062.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2013-October/006753.html
- https://meta.wikimedia.org/wiki/Grants_talk:APG/Proposals/2013-2014_round2/Wikimedia_Foundation/Proposal_form#Multiplication_of_tools
- http://magnusmanske.de/wordpress/?p=173
- https://en.wikipedia.org/w/index.php?title=User_talk:Henrik&diff=600917917&oldid=600897425
- http://comments.gmane.org/gmane.org.wikimedia.analytics/142
- https://en.wikipedia.org/?curid=43853841
- https://intern.wikimedia.ch/lists/private/cultural-partners/2015-May/008397.html
- https://intern.wikimedia.ch/lists/private/cultural-partners/2015-October#8702
- https://lists.wikimedia.org/pipermail/wikimedia-l/2016-February/thread.html#82136