Implement searching of 'depicts' on commons with the 'quantity' qualifier
Closed, Resolved (Public)

Description

The quantity property (P1114), when used as a qualifier for 'depicts' (P180), indicates how many of the depicted thing appear in the image.

Example from Wikidata: https://www.wikidata.org/wiki/Q1195035 (see depicts > human > quantity, etc.)

We want users to be able to include quantities in their searches - for example to find images depicting >10 people (probably with a search term like haswbstatement:P180=Q5[P1114>10]), or exactly one cat (haswbstatement:P180=Q146[P1114=1]).
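
To make the proposed search terms concrete, here is a minimal sketch of how the part after `haswbstatement:` might be split into its components. The grammar, the set of operators, and the names `StatementFilter` and `parse_statement` are assumptions for illustration only; the task does not specify the final syntax.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: P<id>=Q<id>, optionally followed by [P<id><op><number>].
_STATEMENT_RE = re.compile(
    r"^(?P<prop>P\d+)=(?P<value>Q\d+)"
    r"(?:\[(?P<qual>P\d+)(?P<op>>=|<=|=|>|<)(?P<num>\d+)\])?$"
)


class StatementFilter(NamedTuple):
    prop: str                 # e.g. "P180" (depicts)
    value: str                # e.g. "Q5" (human)
    qualifier: Optional[str]  # e.g. "P1114" (quantity), or None
    op: Optional[str]         # comparison operator, or None
    number: Optional[int]     # comparison operand, or None


def parse_statement(term: str) -> StatementFilter:
    """Parse the part of the search term that follows 'haswbstatement:'."""
    m = _STATEMENT_RE.match(term)
    if m is None:
        raise ValueError(f"unrecognised statement: {term!r}")
    num = m.group("num")
    return StatementFilter(
        m.group("prop"),
        m.group("value"),
        m.group("qual"),
        m.group("op"),
        int(num) if num is not None else None,
    )
```

For example, `parse_statement("P180=Q5[P1114>10]")` yields the property, the depicted item, the qualifier, the operator `>` and the integer 10 as separate fields, which is the information any of the options below would need.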

I can think of three options for implementing search on quantity:

Option 1
We can store the qualifier in the normal way, like this: P180=Q5[P1114=2] (see T193407), in which case we would only be able to find exact matches. For example, a user could search for images depicting exactly 1 cat, but could not search for images with >1 cats.
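
A small sketch of why this option is limited to exact matches: if each statement is indexed as an opaque string, a lookup can only succeed when the whole string is identical. The file titles and the helper names `statement_key` and `exact_match` are hypothetical.

```python
def statement_key(prop, value, qualifier, quantity):
    """Build the string form used in the task, e.g. 'P180=Q146[P1114=1]'."""
    return f"{prop}={value}[{qualifier}={quantity}]"


# Hypothetical index: file title -> set of stored statement strings.
indexed = {
    "File:One_cat.jpg": {statement_key("P180", "Q146", "P1114", 1)},
    "File:Three_cats.jpg": {statement_key("P180", "Q146", "P1114", 3)},
}


def exact_match(index, key):
    """Return titles whose stored statements contain exactly this string."""
    return sorted(title for title, keys in index.items() if key in keys)
```

Here `exact_match(indexed, "P180=Q146[P1114=1]")` finds the single-cat file, while a string such as `"P180=Q146[P1114>1]"` matches nothing, because no comparison is ever performed on the number.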

Option 2
Store the data in an individual numeric field in Elasticsearch, and come up with some kind of hack to relate the numeric quantity to a specific 'depicts' statement. Tricky, and I'm not even sure it's possible.

Option 3
Use the Wikidata Query Service (WDQS) to run a SPARQL query (which allows operators like > and <), and then use the returned IDs as a filter for an Elasticsearch query. Basically, we'd ask WDQS for all pictures depicting 'cat' with quantity qualifier >1, then search Elasticsearch for anything else we wanted to search for, but only among the (max 1000) IDs we got from WDQS. Note that because of limitations in passing data between WDQS and Elasticsearch, there will be edge cases where no results are returned even though appropriate results exist.

The deepcat feature uses WDQS with Elasticsearch, so we could base our approach on the deepcat code (see https://gerrit.wikimedia.org/r/#/c/405059/)

This option depends on T194401
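
The two-step flow of Option 3 could look roughly like the sketch below, which only builds the SPARQL text and the Elasticsearch request body and performs no network calls. The function names, the `text` field, and the assumption that WDQS results can be used directly as Elasticsearch document IDs are all illustrative; the real mapping between entity IDs and indexed pages is not described in this task.

```python
from typing import Dict, List

MAX_IDS = 1000  # cap on IDs passed from WDQS, as mentioned in the description


def build_sparql(depicted: str, op: str, quantity: int, limit: int = MAX_IDS) -> str:
    """Build a query for items depicting `depicted` with a quantity qualifier."""
    if op not in {">", ">=", "<", "<=", "="}:
        raise ValueError(f"unsupported operator: {op!r}")
    return (
        "SELECT ?item WHERE {\n"
        "  ?item p:P180 ?statement .\n"
        f"  ?statement ps:P180 wd:{depicted} ;\n"
        "             pq:P1114 ?quantity .\n"
        f"  FILTER(?quantity {op} {int(quantity)})\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_es_query(text: str, ids: List[str]) -> Dict:
    """Restrict a full-text match to the IDs returned by WDQS."""
    return {
        "query": {
            "bool": {
                # "text" is an assumed field name for the full-text part.
                "must": [{"match": {"text": text}}],
                # Anything beyond the cap is silently dropped.
                "filter": [{"ids": {"values": ids[:MAX_IDS]}}],
            }
        }
    }
```

The truncation in `build_es_query` shows where the edge cases come from: if more than 1000 items satisfy the quantity condition, the images that also match the text part may all lie outside the IDs that were passed along.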


Note that there are currently 1025 items in Wikidata that have a 'depicts' statement with a 'quantity' qualifier, out of a total of ~70k items with depicts statements (~1.5%).

Event Timeline

Cparle triaged this task as Medium priority. May 9 2018, 9:52 AM
Cparle created this task.
Cparle updated the task description.

Few questions:

Fair question!

For the first cut of this we'll just be going with exact matches, and not traversing 'instance of' or 'subclass of' trees. Not sure if there's a plan to traverse the trees for search eventually - @Ramsey-WMF do you know?

Hello! Answers for @dcausse :

  • As Cparle mentioned, for v1 we don't plan to traverse the 'instance of' and 'subclass of' trees. We're not yet sure if that's even reasonable or computationally practical. It may be better to be a little redundant and have all the detail at the top level of depicts. So, for your example, within Commons you might tag the image as both a Ferrari Testarossa and a car.
  • For your 2nd example, at the moment the thinking is to have something like:

Depicts: Australian Shepherd (quantity:2) :: Belgian Shepherd (quantity:3) :: dog (quantity:5)

However, we're still doing research to see if we can reasonably and reliably have the system "figure out" that 2 of one breed and 3 of another means 5 dogs. It's not hard for very specific instances like this one, but across all possibilities it gets challenging.
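
As a rough illustration of the "figure out" step for this specific example, the sketch below sums breed quantities into their parent class. The `PARENT` map is a hand-written, hypothetical stand-in for the 'subclass of' data, and `roll_up` is not part of any existing code.

```python
from collections import defaultdict

# Hypothetical, hand-written child -> parent map (must not contain cycles).
PARENT = {
    "Australian Shepherd": "dog",
    "Belgian Shepherd": "dog",
}


def roll_up(depicts):
    """Add each depicted quantity to the item itself and to all its ancestors."""
    totals = defaultdict(int)
    for label, quantity in depicts.items():
        totals[label] += quantity
        parent = PARENT.get(label)
        while parent is not None:
            totals[parent] += quantity
            parent = PARENT.get(parent)
    return dict(totals)
```

With two Australian Shepherds and three Belgian Shepherds as input, the result includes a derived count of five for 'dog'. The simple version already hints at the harder cases: it would double count if the image were also tagged with 'dog' directly, and it assumes every item has at most one parent.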

Change 433748 had a related patch set uploaded (by DCausse; owner: Severino Mateus Jr.):
[search/extra@master] Add TermFreqTokenFilter

https://gerrit.wikimedia.org/r/433748

Change 435180 had a related patch set uploaded (by DCausse; owner: DCausse):
[search/extra@master] Add term_freq a query to filter on term frequency

https://gerrit.wikimedia.org/r/435180

For the record: we're going with Option 2 from the description.
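
The patch titles above (TermFreqTokenFilter and a term_freq query) suggest that the quantity is carried as the term frequency of the statement term, which ties the number to a specific 'depicts' statement. The following is a pure-Python simulation of that idea only, not the plugin's API; the `|` separator and the function names are assumptions.

```python
import operator
from collections import Counter

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def analyze(tokens):
    """Turn tokens like 'P180=Q5|3' into a term -> frequency map.

    A token without the (assumed) '|' separator counts once.
    """
    freqs = Counter()
    for token in tokens:
        term, _, count = token.partition("|")
        freqs[term] += int(count) if count else 1
    return freqs


def term_freq_filter(index, term, op, threshold):
    """Return IDs of documents whose frequency for `term` satisfies the comparison."""
    compare = OPS[op]
    return sorted(
        doc_id
        for doc_id, freqs in index.items()
        if term in freqs and compare(freqs[term], threshold)
    )
```

Because the frequency belongs to the term itself, a document depicting twelve humans and one cat keeps those two numbers separate, which is what a standalone numeric field could not do without extra bookkeeping.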

Change 433748 merged by jenkins-bot:
[search/extra@master] Add TermFreqTokenFilter

https://gerrit.wikimedia.org/r/433748

Change 435753 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/Wikibase@master] Add entity quantity data to new field in search index

https://gerrit.wikimedia.org/r/435753

Change 435180 merged by Gehel:
[search/extra@master] Add term_freq a query to filter on term frequency

https://gerrit.wikimedia.org/r/435180

Mentioned in SAL (#wikimedia-operations) [2018-06-14T12:23:09Z] <gehel> rolling restart of elasticsearch codfw for plugin upgrade - T194245

Mentioned in SAL (#wikimedia-operations) [2018-06-14T17:34:56Z] <gehel> rolling restart of elasticsearch codfw completed - T194245

Mentioned in SAL (#wikimedia-operations) [2018-06-15T08:05:39Z] <gehel> rolling restart of elasticsearch eqiad for plugin upgrade - T194245

Mentioned in SAL (#wikimedia-operations) [2018-06-15T15:19:08Z] <gehel> rolling restart of elasticsearch eqiad for plugin upgrade completed - T194245

Change 435753 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add entity quantity data to new field in search index

https://gerrit.wikimedia.org/r/435753

I think this won't work until we add the proper configs and reindex, but I'm not sure this was meant for Wikidata. So maybe this task can be closed, and when the time comes a new task can be opened for setting up the configs for SDC.