EntityPerPage::getEntitiesWithoutTerm should move into a separate EntityWithoutTermFinder, and should be re-implemented based on the page table instead of entity_per_page.
Description
Details
Status | Assigned | Task
---|---|---
Open | None | T30599 Deadlock tracking bug (tracking)
Resolved | hoo | T111535 Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks
Declined | None | T51982 Add missing wb_entity_per_page entries on LinksUpdate
Invalid | Lydia_Pintscher | T70176 EntityPerPageTable class should be usable from the client
Resolved | Ladsgroup | T67333 Wikibase\EntityPerPageTable::getItemsWithoutSitelinks slow query with large LIMIT offset
Resolved | Addshore | T114902 Remove numeric entity IDs from database schema
Resolved | Ladsgroup | T95685 Drop wb_entity_per_page table
Resolved | Ladsgroup | T140890 Drop EntityPerPage service
Resolved | hoo | T140891 Factor EntityPerPage::getEntitiesWithoutTerm out into it's own service
Resolved | hoo | T147638 Introduce a setting for entity types supported by Special:EntitiesWithoutLabel/ Special:EntitiesWithoutDescription
Event Timeline
I just discussed with Daniel whether we actually need Special:EntitiesWithoutDescription and Special:EntitiesWithoutLabel. They mostly lost their usefulness as Wikidata grew. If you want to find items without a label in a given language, the result set is in most cases too large to be useful. A tool like https://tools.wmflabs.org/wikidata-terminator/ is needed to help find the items where a label or description is actually important to have.
The one remaining use case for the special pages is then doing this for properties. Here the number is manageable and the result is useful. So we can optimize for this case and remove the item case.
True. I would suggest to restrict them by configuration rather than hard coding the supported entity types. They might still be useful for third parties with only a few items and we get support for that (almost) for free as it seems.
@hoo What drives the restriction is our desire to drop the wb_entity_per_page table. Does terminator need that table? Can it maintain its own copy?
The idea behind restricting to Properties is: Without wb_entity_per_page, joining against the wb_terms table is inefficient. It would still work for Properties though, because there aren't that many of them.
Generally agreed with the rationale here, but we'll get questions from newbies "where did this page go"/"there's one for properties but not items, why?".
Maybe instead the special page can be extended to enable a more powerful query? You identify that there exist tools that can do this, and there is obviously a use case for identifying these pages, and I think there's enough reason to have support for an out-of-the-box solution for e.g. 3rd parties per hoo.
Change 305496 had a related patch set uploaded (by Hoo man):
Split the EntityPerPage interface
Change 314182 had a related patch set uploaded (by Hoo man):
Move EntitiesWithoutTermFinder::getEntitiesWithoutTerm
Change 314182 merged by jenkins-bot:
Move EntitiesWithoutTermFinder::getEntitiesWithoutTerm
Change 314554 had a related patch set uploaded (by Hoo man):
New SqlEntitiesWithoutTermFinder implementation
Change 314555 had a related patch set uploaded (by Hoo man):
Introduce the "entitiesWithoutTermEntityTypes" setting
Change 315694 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Remove hard-coded supportedEntityTypesForEntitiesWithoutTermListings default
Just to re-explain this, because I think it might not have been clear enough before:
Currently we join the wb_entity_per_page table against the wb_terms table (term_entity_id = epp_entity_id AND term_entity_type = epp_entity_type).
After this change, we will join the page table against the wb_terms table. Given the wb_terms table only has the entity type and the numeric entity id, we will need to use REPLACE() in SQL for this join (term_entity_type = "known-entity-type" AND term_entity_id = REPLACE(page_title, 'known-entity-prefix', '')). Due to this we can only provide this functionality for entity types where we know how to programmatically construct the entity id serialization from the entity-type and the numeric entity id.
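To make the join above concrete, here is a minimal sketch that runs it against an in-memory SQLite database. It is not the actual SqlEntitiesWithoutTermFinder code: the helper name entities_without_term, the sample rows, the namespace number 120 and the reduced table definitions are assumptions for illustration only. The explicit CAST is also an addition, so that the text produced by REPLACE() is compared as a number.

```python
import sqlite3

# Reduced, illustrative versions of the page and wb_terms tables.
# The real tables have more columns; the rows here are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE wb_terms (
    term_entity_id INTEGER,
    term_entity_type TEXT,
    term_language TEXT,
    term_type TEXT,
    term_text TEXT
);
INSERT INTO page VALUES (1, 120, 'P1'), (2, 120, 'P2'), (3, 120, 'P31');
INSERT INTO wb_terms VALUES
    (1, 'property', 'en', 'label', 'first'),
    (2, 'property', 'de', 'label', 'zweite'),
    (31, 'property', 'en', 'description', 'description only');
""")


def entities_without_term(conn, entity_type, prefix, namespace, term_type, language):
    """Return page titles of entities lacking the given term (hypothetical helper).

    The numeric id is derived from page_title by stripping the known prefix,
    which only works for entity types whose id serialization is prefix + number.
    """
    sql = """
        SELECT page_title
        FROM page
        LEFT JOIN wb_terms
          ON term_entity_type = ?
         AND term_entity_id = CAST(REPLACE(page_title, ?, '') AS INTEGER)
         AND term_type = ?
         AND term_language = ?
        WHERE page_namespace = ?
          AND term_entity_id IS NULL  -- keep only pages with no matching term row
        ORDER BY page_id
    """
    params = (entity_type, prefix, term_type, language, namespace)
    return [row[0] for row in conn.execute(sql, params)]


# Properties without an English label in the sample data.
print(entities_without_term(conn, "property", "P", 120, "label", "en"))
```

With the sample rows, P1 is excluded because it has an English label, while P2 (German label only) and P31 (English description only) are returned. The prefix has to be passed in for each entity type, which is the limitation described above.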
> we can only provide this functionality for entity types where we know how to programmatically construct the entity id serialization from the entity-type and the numeric entity id.
We do know this for all entity types that provide an entity-id-composer, see https://phabricator.wikimedia.org/diffusion/EWBI/browse/master/WikibaseMediaInfo.entitytypes.php;a9a53ac3ffb8d1470cec5813ed641cca76484eac$101. You can get this via WikibaseRepo::getDefaultInstance()->getEntityIdComposer(). For legacy reasons the EntityIdComposer class always supports Items and Properties, even if they do not have an entity-id-composer configured.
@thiemowmde: The essential part is that we need to be able to do this in SQL, not (only) in PHP.
Change 314555 merged by jenkins-bot:
Introduce the "supportedEntityTypesForEntitiesWithoutTermListings" setting
Change 315694 merged by jenkins-bot:
Remove hard-coded supportedEntityTypesForEntitiesWithoutTermListings default