To understand how WDQS can be improved (particularly in terms of scaling), we need a better understanding of our users and their use cases.
- Define the questions we want answered from the data we have
- Implement additional data collection if needed
- Work with analysts to answer those questions
Various questions consolidated from the Search Platform virtual offsite:
- 2% of queries take 95% of the server time: what are those 2% of queries doing? Can or should we restrict them? Are they broken bot queries, or actually valuable?
- What are the most expensive user agents? Can we identify heavy users and work with them to reduce that load? (See the log-analysis sketch after this list.)
- What percentage of queries, and which kinds of queries, care about the freshness of the data?
- How important is it to have the full graph to answer the questions people are asking? Can we infer that from the query data?
- Do we have strongly connected components in the Wikidata graph? Could they be used to split the graph into subgraphs? (See the SCC sketch after this list.)
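A minimal sketch of the first two questions, assuming the query logs can be loaded as a table with per-query timings; the file path and column names (`query`, `user_agent`, `query_time_ms`) are placeholders, not the actual WDQS log schema:

```python
import pandas as pd

# Hypothetical schema: one row per query with its server-side cost.
# The path and column names are assumptions for illustration only.
logs = pd.read_parquet("wdqs_query_logs.parquet")

# How concentrated is server time? Take the most expensive 2% of queries
# and see what share of total time they account for.
total_time = logs["query_time_ms"].sum()
top_2pct = logs.nlargest(int(len(logs) * 0.02), "query_time_ms")
share = top_2pct["query_time_ms"].sum() / total_time
print(f"Top 2% of queries account for {share:.0%} of server time")

# Which user agents are the most expensive overall? Heavy sum with a
# modest mean suggests volume; heavy mean suggests pathological queries.
by_agent = (
    logs.groupby("user_agent")["query_time_ms"]
    .agg(["sum", "count", "mean"])
    .sort_values("sum", ascending=False)
)
print(by_agent.head(20))
```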
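For the connected-components question, a toy sketch with networkx shows the shape of the analysis; the real Wikidata graph is far too large for a single machine and would need a distributed approach (e.g. Spark), and the edge list here is purely illustrative:

```python
import networkx as nx

# Toy directed graph of item-to-item links; in reality the edges would be
# extracted from Wikidata statements (subject item -> object item).
edges = [
    ("Q1", "Q2"), ("Q2", "Q3"), ("Q3", "Q1"),  # a 3-node cycle -> one SCC
    ("Q3", "Q4"), ("Q4", "Q5"),                # a chain -> singleton SCCs
]
g = nx.DiGraph(edges)

# Strongly connected components: sets of items that are mutually reachable
# by directed paths.
sccs = sorted(nx.strongly_connected_components(g), key=len, reverse=True)
print([sorted(c) for c in sccs])  # [['Q1', 'Q2', 'Q3'], ['Q4'], ['Q5']]

# The condensation collapses each SCC to a single node; if it falls apart
# into loosely coupled pieces, those pieces are candidate subgraphs.
cond = nx.condensation(g)
print(cond.number_of_nodes(), "components,", cond.number_of_edges(), "cross edges")
```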
Random notes from the Search Platform virtual offsite:
- We need to pair someone who knows what to look for with someone who knows how to look for things (@Addshore and @JAllemandou?)
- Currently, we only log the queries. For search, we also log the results; something similar could help answer our questions here. That might be a lot of data, but even just the size of the response might be interesting.
- If we can find the entities used in queries, can we group them? If queries are about person A, person B, …, we should be able to infer that those queries are about people. (See the entity-grouping sketch after this list.)
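A rough sketch of that last idea: extract the `wd:Q...` item IDs mentioned in each query, then bucket them by their instance-of class. The `INSTANCE_OF` lookup here is a hardcoded stand-in for data that would really come from a dump or the wbgetentities API:

```python
import re
from collections import Counter

# Pull Wikidata item IDs (wd:Q...) out of a SPARQL query string.
ENTITY_RE = re.compile(r"\bwd:(Q\d+)\b")

def entities_in(query: str) -> list[str]:
    return ENTITY_RE.findall(query)

# Stand-in for a real item -> P31 (instance of) lookup; in practice this
# would be joined in from a Wikidata dump or fetched via wbgetentities.
INSTANCE_OF = {
    "Q42": "human",    # Douglas Adams
    "Q1339": "human",  # Johann Sebastian Bach
    "Q64": "city",     # Berlin
}

queries = [
    "SELECT ?w WHERE { wd:Q42 wdt:P800 ?w }",     # notable works of Q42
    "SELECT ?w WHERE { ?w wdt:P86 wd:Q1339 }",    # works composed by Q1339
    "SELECT ?p WHERE { ?p wdt:P19 wd:Q64 }",      # people born in Q64
]

# Group queries by the classes of the entities they mention: if most
# mentioned entities are humans, the queries are "about people".
classes = Counter(
    INSTANCE_OF.get(e, "unknown")
    for q in queries
    for e in entities_in(q)
)
print(classes)  # Counter({'human': 2, 'city': 1})
```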