Currently the new service (DoctrinePropertyTermStore) uses naive updating. It deletes everything and then inserts everything. The old wb_terms code (TermSqlIndex) first does a select and then a diff to only delete and insert those terms that changed. The new service should also do this.
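To make the proposed select-then-diff approach concrete, here is a minimal sketch in Python. It is not the Wikibase (PHP) implementation: the `Term` tuple and the `diff_terms` helper are hypothetical names chosen for illustration, and it assumes the currently stored terms have already been selected into memory.

```python
from __future__ import annotations

from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical stand-in for a stored term; not a Wikibase class."""
    term_type: str  # e.g. "label", "description", "alias"
    language: str
    text: str


def diff_terms(stored: set[Term], new: set[Term]) -> tuple[set[Term], set[Term]]:
    """Return (to_delete, to_insert); terms present in both sets are left untouched."""
    to_delete = stored - new
    to_insert = new - stored
    return to_delete, to_insert


# Terms as they might come back from the initial select (assumed data).
stored = {
    Term("label", "en", "instance of"),
    Term("description", "en", "old description"),
    Term("alias", "en", "is a"),
}
# Terms of the entity being saved: only the description changed.
new = {
    Term("label", "en", "instance of"),
    Term("description", "en", "new description"),
    Term("alias", "en", "is a"),
}

to_delete, to_insert = diff_terms(stored, new)
```

With this input, only the changed description is deleted and re-inserted, while the naive approach would delete and insert all three terms. The cost of the initial select is not modelled here.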
Description
Status | Assigned | Task
---|---|---
Resolved | Addshore | T208425 [EPIC] Kill the wb_terms table
Resolved | ArielGlenn | T226167 audit public tables and make sure we dump them all
Resolved | Addshore | T219175 [Mega] - Migrate data from wb_terms to new schema
Resolved | Addshore | T219121 [Checkpoint 2] Refactor Write & Migration code for Properties
Resolved | JeroenDeDauw | T219295 Create service for writing property terms
Declined | None | T220169 Optimize term updating
Event Timeline
I looked into this for a bit.
I'm not sure that doing a select and diff is better than the current implementation. It requires joining all the tables to get to the text nodes, and it wastes a lot of processing power in cases where we update one term out of hundreds (is that actually the common case?).
A significant improvement (both architecturally and for this purpose) that I can think of is to pass what has actually changed to the store writer: instead of passing an EntityDocument, I would pass an EntityId and a Term[] of the terms that have actually been added, updated, or deleted. Never mind this, it won't work that way.
I came to a similar conclusion after trying to write some code without looking at this ticket first :) It might still be worth doing the diff because it helps with https://phabricator.wikimedia.org/T220150. I'll comment more there.