Expose all slots to the search interface
Open, Medium, Public

Description

Search engines such as Cirrus should examine the content of all slots when updating the search index.

Event Timeline

daniel triaged this task as Medium priority. Mar 19 2018, 4:02 PM
daniel created this task.
Smalyshev added subscribers: dcausse, EBernhardson.

Two big questions here are:

  1. One document or multiple documents? (I think the trend for now is toward one document.)
  2. If the answer is one document, how do we reconcile slots that may overlap? For example, if two slots both want to put something in opening_text, what happens?

For now, I'd blindly concatenate. That's the baseline.
We have to answer similar questions for a lot of things, including the generation of the HTML the user will see. I plan an RFC about that question.
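To make the "blindly concatenate" baseline concrete, here is a minimal, language-neutral sketch. It is not MediaWiki or CirrusSearch code: the function name `merge_slot_fields`, the dict-of-dicts input shape, the "main slot first" ordering, and the newline separator are all assumptions made for illustration only.

```python
from typing import Dict, List


def merge_slot_fields(
    slot_fields: Dict[str, Dict[str, str]], separator: str = "\n"
) -> Dict[str, str]:
    """Merge per-slot index fields into one document by concatenation.

    slot_fields maps a slot role (e.g. "main") to the index fields that
    slot produced. This input shape is hypothetical.
    """
    merged: Dict[str, List[str]] = {}
    # Assumption: the main slot goes first, other roles follow alphabetically,
    # so the concatenation order is deterministic.
    for role in sorted(slot_fields, key=lambda r: (r != "main", r)):
        for field, value in slot_fields[role].items():
            if value:  # skip empty contributions
                merged.setdefault(field, []).append(value)
    return {field: separator.join(parts) for field, parts in merged.items()}


doc = merge_slot_fields(
    {
        "mediainfo": {"opening_text": "A caption"},
        "main": {"opening_text": "Intro", "text": "Body"},
    }
)
```

With this baseline, two slots that both write opening_text simply end up in the same field, one after the other; nothing is dropped and nothing is prioritised.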

  1. At least for Cirrus, it pretty much needs to be one document if we want any kind of interaction between fields of multiple content types.
  2. I think, again only with respect to Cirrus, this is going to depend heavily on how those fields get into the queries issued. The current method, with a variety of hard-coded field names, really pushes for the ability to overwrite; for example, the work on file media info will overwrite the opening_text field on file pages. The two will have to be figured out in parallel, I suppose.
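One way to picture the overwrite-versus-concatenate tension is a per-field policy. The sketch below is purely illustrative and is not how Cirrus works today: `build_document`, the `overwrite_fields` parameter, and the "later slot wins" rule are hypothetical choices.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def build_document(
    slots: Iterable[Tuple[str, Dict[str, str]]],
    overwrite_fields: FrozenSet[str] = frozenset(),
    separator: str = "\n",
) -> Dict[str, str]:
    """Combine slot fields into one document with a per-field policy.

    slots is an ordered sequence of (role, fields) pairs. Fields listed in
    overwrite_fields are replaced by the latest slot that sets them; all
    other fields are concatenated in slot order. Both rules are assumptions.
    """
    doc: Dict[str, str] = {}
    for _role, fields in slots:
        for field, value in fields.items():
            if field in overwrite_fields or field not in doc:
                doc[field] = value  # first value, or explicit overwrite
            else:
                doc[field] = doc[field] + separator + value
    return doc


slots: List[Tuple[str, Dict[str, str]]] = [
    ("main", {"opening_text": "Intro", "text": "Body"}),
    ("mediainfo", {"opening_text": "Caption", "text": "Labels"}),
]
doc = build_document(slots, overwrite_fields=frozenset({"opening_text"}))
```

Here the mediainfo slot replaces opening_text, as in the file page case mentioned above, while text is still concatenated. Which fields belong in the overwrite set would depend on how the queries use them.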

(sorry I'm very new to MCR)
How will this work regarding namespaces?
I mean, can there be a mix of namespaces here, or is there a single top-level namespace somewhere?

Should we set up some kind of meeting to sync on this and develop a strategy? Maybe at the hackathon? I am personally still rather fuzzy on how this whole thing is supposed to work, and on MCR details too, and I suspect I am not the only one :)

+1 ... if we're setting up a meeting, please count me in (I'll be at the hackathon)

not an epic, this is a concrete task

FWIW for the initial release of the SDoC multi-lingual captions stuff, I used the onSearchDataForIndex hook to write search data for MediaInfo slots

Update: We switched to CirrusSearchBuildDocumentParse in 2a0610b8a2d05d872878da292117f140520f5098.

That hook's interface is actually not MCR compatible, since it only takes a single Content object. I commented on the patch here in phab.

I worked around that in MediaInfo by using WikiPage::factory( $title )->getRevisionRecord() ... ought we raise a ticket to make the hook MCR compatible? Not really sure what's using the hook, so I'm not sure how to proceed ...

@Cparle this ticket here *is* about making sure all slots are passed to cirrus. Cirrus should then also pass them on via its own hooks. Changing a hook signature isn't trivial though, it's generally better to introduce a new hook.

I think this ticket here is sufficient to track the need to do this. Your workaround should be fine for MediaInfo for now. Perhaps add a comment to your hook handler that points to this ticket.

Change 472647 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/WikibaseMediaInfo@master] Adding note about workaround pending T190066

https://gerrit.wikimedia.org/r/472647

Change 472647 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Adding note about workaround pending T190066

https://gerrit.wikimedia.org/r/472647

Change 837128 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/Wikibase@master] Add a way to extract content scoped search index data

https://gerrit.wikimedia.org/r/837128

Change 837128 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add a way to extract content scoped search index data

https://gerrit.wikimedia.org/r/837128