[Epic] Better understanding of WDQS users and use cases
Closed, Resolved, Public

Description

To work out how WDQS can be improved, particularly in terms of scaling, we need a better understanding of our users and their use cases.

  • Define the questions we want answered from the data we have
  • Implement additional data collection if needed
  • Work with analysts to answer those questions

Questions consolidated from the Search Platform virtual offsite:

  • 2% of queries are taking 95% of the server time: what are those 2% of queries doing? Can / should we restrict them? Are they broken bot queries, or are they actually valuable?
  • What are the most expensive User Agents? Can we identify heavy users and work with them to reduce that load?
  • What percentage of queries, and which kinds of queries, care about the freshness of the data?
  • How important is it to have the full graph to answer the questions people are asking? Can we infer that from the query data?
  • Do we have strongly connected components in the Wikidata graph? Can they be used to split the graph into subgraphs?
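
As a starting point for the first two questions, here is a minimal sketch of the kind of aggregation involved. It assumes the query logs can be reduced to `(user_agent, duration_ms)` pairs; that record shape and the function names `top_share` and `time_by_user_agent` are hypothetical, not an existing WDQS log schema or API.

```python
from collections import defaultdict


def top_share(durations, fraction=0.02):
    """Return the share of total time consumed by the slowest `fraction` of queries."""
    if not durations:
        return 0.0
    ordered = sorted(durations, reverse=True)
    # Always look at at least one query, even for very small samples.
    k = max(1, int(len(ordered) * fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


def time_by_user_agent(records):
    """Sum duration per user agent and return (agent, total) pairs, most expensive first."""
    totals = defaultdict(float)
    for agent, duration in records:
        totals[agent] += duration
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample: one very slow bot query and 49 fast browser queries.
records = [("bot", 950.0)] + [("browser", 1.0)] * 49
share = top_share([duration for _, duration in records])
ranking = time_by_user_agent(records)
```

Running the same aggregation over real logs would show whether the 2% / 95% split holds and which User Agents dominate it.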

Random notes from the Search Platform virtual offsite:

  • we need to pair someone who knows what to look for with someone who knows how to look for things (@Addshore and @JAllemandou?)
  • Currently, we only log the queries. For search, we also log the results; something similar might help answer our questions here. That may be a lot of data, but even just the size of the response might be interesting.
  • Can we find the entities used in queries and group them? If queries are about person A, person B, …, we should be able to tell that those queries are about people.
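
The entity-grouping idea in the last note could be sketched roughly as follows. This assumes entities appear in the query text with the `wd:` prefix (full IRIs are not handled) and that a lookup from entity ID to a coarse class is available; the `entity_class` mapping below is hand-written for illustration, and both function names are hypothetical.

```python
import re
from collections import Counter

# Matches prefixed Wikidata items such as wd:Q42; does not match wdt:P31.
ENTITY_RE = re.compile(r"\bwd:(Q\d+)\b")


def entities_in_query(sparql):
    """Return the set of item IDs referenced with the wd: prefix in a query string."""
    return set(ENTITY_RE.findall(sparql))


def group_by_class(queries, entity_class):
    """Count referenced entities per class, using 'unknown' for unmapped IDs."""
    counts = Counter()
    for query in queries:
        for entity in entities_in_query(query):
            counts[entity_class.get(entity, "unknown")] += 1
    return counts


# Illustrative mapping only; in practice it would have to come from Wikidata itself.
entity_class = {"Q42": "person", "Q64": "city"}
queries = [
    "SELECT ?p WHERE { wd:Q42 ?p ?o }",
    "SELECT ?o WHERE { wd:Q64 wdt:P31 ?o }",
    "SELECT ?o WHERE { wd:Q999999999 wdt:P31 ?o }",
]
counts = group_by_class(queries, entity_class)
```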

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel updated the task description.
Gehel added subscribers: JAllemandou, Addshore.
Gehel claimed this task.

Work has moved on to more specific analysis tickets.