Page MenuHomePhabricator

Add page_id and namespace to X-Analytics header in App / api requests
Closed, ResolvedPublic

Description

Currently, Mediawiki has the ability to add items to the X-Analytics header via a nice little extension[1] written by Ori. This is deployed in production and is working just fine.

However, it is not activated for requests that come from mobile apps. In order to reliably count page views, we need to count based on page_id, and not a page title extracted from the request uri path. To include mobile apps page views in these counts, mobile apps requests will need to have these fields added to their X-Analytics headers.

I don't know much about how mobile apps make requests. AFAIK, this could be done in the apps themselves, or at the Mediawiki API level. Since Mediawiki already supports this, perhaps it is better to do so there.

Thoughts? Anyone know how we can actually do this?

Thanks!

[1] https://github.com/wikimedia/mediawiki-extensions-XAnalytics

Event Timeline

Ottomata raised the priority of this task from to Medium.
Ottomata updated the task description. (Show Details)
Ottomata added subscribers: Ottomata, dr0ptp4kt, yuvipanda and 2 others.
Ottomata set Security to None.

I prefer header enrichment be done at the MediaWiki API tier if possible. The parameter name-value pairs in the query string of course dictate *whether* something qualifies as a potential pageview, and then the MediaWiki API tier could denote which, for example, actual page & namespace is being accessed (or whatever else is performant to add). Avoidance of additional cache fragmentation or cache pollution is necessarily in important facet of any MediaWiki API VCL solution.

1 for doing it at the mediawiki API level. I think that would necessitate a patch to MobileFrontend, which houses the action=mobileview that the apps use.

Hmm, no XAnalytics ? As far as I understand this, that's where this really belongs.

The XAnalytics extension should probably be using the APIAfterExecute hook to do the same thing it does using the BeforePageDisplay hook for the web UI.

Seems like it needs to be done in the extension, rather than in the app.

Hm? The extension is meant to be used by MW code. We wouldn't code usages of the extension in the extension itself.

I suppose the APIAfterExecute hook should do this:

https://phabricator.wikimedia.org/rEWMV017f9d845c45697c1e84b50d8a16e65f60b68fee

?

No. It would more likely do like rEXAN64e43ad61363ff69d2bad173a6693ff311876ec0#C1546702NL26, and be in the XAnalytics extension rather than WikimediaEvents.

It could almost expose the same XAnalyticsSetHeader hook to stuff like WikimediaEvents, except for the part where that hook is being passed an OutputPage which makes little sense in an API context.

BTW, MediaWiki core has nothing to do with this "X-Analytics" header, it's all done by extensions MobileFrontend and XAnalytics. Possibly that should be merged into core somehow, but it isn't yet.

I'm not sure I understand. Why would we make a generic extension actually do anything? Isn't the point of having a generic extension so that it can be used other callers? The commit you link to is the extension creation. The commit I link to uses it.

Your problem is that the generic extension is not doing its thing for API queries, because it's only doing its thing in response to the BeforePageDisplay hook which isn't fired during (most) API queries.

Change 202800 had a related patch set uploaded (by Legoktm):
Add X-Analytics header for API requests too

https://gerrit.wikimedia.org/r/202800

Change 202802 had a related patch set uploaded (by Legoktm):
Add page_id to X-Analytics header for action=mobileview requests

https://gerrit.wikimedia.org/r/202802

Change 202801 had a related patch set uploaded (by Ottomata):
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

Change 202799 had a related patch set uploaded (by Ottomata):
Move BeforePageDisplay hook to separate class

https://gerrit.wikimedia.org/r/202799

Change 202802 merged by jenkins-bot:
Add page_id and ns to X-Analytics header for action=mobileview requests

https://gerrit.wikimedia.org/r/202802

Change 202799 merged by jenkins-bot:
Move BeforePageDisplay hook to separate class

https://gerrit.wikimedia.org/r/202799

Change 202800 merged by jenkins-bot:
Add X-Analytics header for API requests too

https://gerrit.wikimedia.org/r/202800

I prefer header enrichment be done at the MediaWiki API tier if possible. The parameter name-value pairs in the query string of course dictate *whether* something qualifies as a potential pageview, and then the MediaWiki API tier could denote which, for example, actual page & namespace is being accessed (or whatever else is performant to add).

It doesn't look like this was done. Instead it passes an OutputPage (from $module->getOutput(); as @Anomie said, is this even meaningful for an API module?) to the same hook.

The WikimediaEvents listener is now called by that hook for API requests, despite saying "but only if the request is for an actual page".

On a typical title-specific API request with api.php, e.g.http://127.0.0.1:8080/w/api.php?action=query&prop=revisions&titles=Main Page , the title is "Badtitle/dummy title for API calls set in api.php", and this is skipped because that page doesn't exist.

The MobileFrontend extension does not override this title either AFAICT (no relevant setTitle call).

I'll add a NULL check, but I'm not sure if this is useful for API requests at all in its current form.

Change 206352 had a related patch set uploaded (by Mattflaschen):
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206352

Change 206352 merged by jenkins-bot:
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206352

Change 206959 had a related patch set uploaded (by Mattflaschen):
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206959

Change 206959 merged by jenkins-bot:
The title can be null for internal API requests.

https://gerrit.wikimedia.org/r/206959

Status update? Its hard for me to follow all of the patches.

No one's responded to https://phabricator.wikimedia.org/T92875#1233108 .

In summary, I don't think it should be using the output title for API requests. Instead, it should log a meaningful title (or titles) related to the API request.

Change 202801 merged by jenkins-bot:
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

Ok, thanks Matt.

So, does that mean that the API code needs to figure out which title or titles are being requested and add them to X-Analytics via a different path? There must be a way to get the requested titles, if not in a nice way, at least from the query parameters.

Yeah, take a look at https://www.mediawiki.org/wiki/API:Query#Specifying_pages . Not all API modules relevant to pages subclass API:Query, but a lot do.

In the code, see ApiPageSet.php, ApiQuery.php, and ApiQueryBase.php.

Change 202801 merged by jenkins-bot:
Allow extensions to add items at runtime with XAnalytics::addItem()

https://gerrit.wikimedia.org/r/202801

It seems addItem() doesn't work. self::$items is not used.

Change 338931 had a related patch set uploaded (by Krinkle):
Remove unused $items member

https://gerrit.wikimedia.org/r/338931

Change 338931 merged by jenkins-bot:
Remove unused $items member

https://gerrit.wikimedia.org/r/338931

Krinkle claimed this task.

This is working and documented at https://wikitech.wikimedia.org/wiki/X-Analytics.

Closing as such. The bit I mentioned above about the addItem() method not working can be resolved in T190381.

Wait, really, so we get page_id and ns set in API requests now? I thought this was tricky, since one API request might be for multiple page_ids.

@Ottomata The vast majority of API modules either don't relate to a page, or (as you say) can relate to many pages at once using titles pageids or the result of another query (like user contributions, page history, recent changes etc.). However, while those can act on many pages, they also don't relate to page views. (Not "our" page views any way, it's certainly possible for a third-party to display something somewhere based on our API data).

The specific case of ApiMobileView, which can provide page content (not just meta data), is not based on the general ApiQueryModule interface and only acts on a single page. MobileFrontend (which registers ApiMobileView) adds ns and page_id props for those requests. Details at T190381.

Great, thanks! It might nice if in the future we were somehow able to associate each webrequest with a set of relevant page_ids, even if they don't ultimately count towards our 'pageviews' metric.