Type of activity: Pre-scheduled session
Main topic: Building on Wikimedia services: APIs and Developer Resources
The problem
There is around 20 TB of data stored on Wikimedia's relational database infrastructure.
Access to it is provided through tools such as Quarry and through Tools replica access. However, working with that amount of data is not necessarily easy or obvious: sometimes complex queries or tricks are needed to get more data, faster. Writing the right query can cut query time by a factor of 100 or 1000.
The session would cover:
- Learning the MediaWiki database structure: where things are stored and why
- Helping queries run faster through careful planning and query optimization
- Answering community questions about how to do some things better
- Gathering suggestions on how to improve the labsdb service
- Presenting the new labsdb infrastructure, which has 5x more capacity
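As a rough illustration of the query-planning point above, the sketch below builds a toy, heavily simplified version of MediaWiki's `page` table in an in-memory SQLite database and compares the query plans of two lookups. Everything here is an assumption made for illustration: SQLite only stands in for the real replica databases, the table has just three columns, the sample rows are made up, and `query_plan` is a hypothetical helper. It is not a benchmark and does not reproduce the speedups mentioned above; it only shows why including the leading column of an index (the namespace) lets the database seek instead of scan.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Hypothetical helper: return the 'detail' text of each
    EXPLAIN QUERY PLAN row that SQLite produces for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Toy stand-in for MediaWiki's `page` table (assumption: reduced to
# three columns; the real table has many more).
conn.execute(
    """
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     TEXT    NOT NULL
    )
    """
)
# Composite index whose leading column is the namespace.
conn.execute(
    "CREATE UNIQUE INDEX name_title ON page (page_namespace, page_title)"
)

# Made-up sample rows: the same 1000 titles in three namespaces.
conn.executemany(
    "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
    [(ns, "Page_%d" % i) for ns in (0, 1, 2) for i in range(1000)],
)

# Filtering on the title alone skips the leading index column,
# so the planner has to walk every row or index entry.
slow_plan = query_plan(
    conn,
    "SELECT page_id FROM page WHERE page_title = ?",
    ("Page_42",),
)

# Filtering on namespace and title matches the index prefix,
# so the planner can seek directly to the matching entry.
fast_plan = query_plan(
    conn,
    "SELECT page_id FROM page WHERE page_namespace = ? AND page_title = ?",
    (0, "Page_42"),
)

print("title only:       ", slow_plan)
print("namespace + title:", fast_plan)
```

The first plan should be reported as a scan and the second as an index search. On the real replicas the equivalent check would be done with the database's own `EXPLAIN` facility, whose output format differs.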
Expected outcome
- Community members (tool creators, developers, researchers, staff, etc.) understand the MediaWiki data model and its relational nature
- Community members can use labsdb faster and get more out of it
- Resources are used efficiently, allowing a larger number of users to use them
- Feedback is given on how to improve the service
- Feedback is provided on latest infrastructure improvements
Current status of the discussion
Needs feedback