
Lexemes: in English, "unknown language" appears instead of "English" in search, and "Q1860" appears instead of "English" on lexeme pages
Closed, Resolved (Public)

Description

"English" is not appearing in some contexts related to lexemes on Wikidata, as reported in this discussion on d:Wikidata:Project chat. This only occurs in the English localization and in localizations that use English as a fallback in those contexts.

  • en: "Unknown language" appears instead of "English" at Special:Search, and "Q1860" appears instead of "English" on lexemes.
  • qqx: "wikibaselexeme-unknown-language" appears instead of "English" at Special:Search, and "Q1860" appears instead of "English" on lexemes.

This does not happen for languages that are not English (e.g. "French" for French lexemes), and does not happen when the interface is localized into a different language (e.g. "anglais" with uselang=fr).

The reason for this is not clear.

Event Timeline

Jc86035 renamed this task from Lexemes: in English and qqx only, "unknown language" appears instead of "English" in search, and "Q1860" appears instead of "English" on lexeme pages to Lexemes: in English, "unknown language" appears instead of "English" in search, and "Q1860" appears instead of "English" on lexeme pages. Jan 26 2020, 8:56 AM
Jc86035 updated the task description.
Jc86035 updated the task description.

So, looks like the table has a hole in it:

mysql:[email protected] [wikidatawiki]> SELECT
    ->   wbit_item_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_item_terms
    -> LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    -> LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    -> LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbit_item_id = 1860
    -> AND wby_name = 'label'
    -> AND wbxl_language = 'en';
Empty set (0.01 sec)
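
To make the "hole" concrete, here is a minimal sketch in Python using an in-memory SQLite database. The table and column names are taken from the query above; the column types, the row IDs (10, 20, 30) and the helper names `build_db` and `label` are assumptions for illustration only. Which link in the chain was actually missing in production is not established at this point, so the sketch simply leaves out the `wbt_item_terms` row for Q1860.

```python
import sqlite3

# Assumed, simplified schema: only the columns used by the query in this task.
SCHEMA = """
CREATE TABLE wbt_type (wby_id INTEGER PRIMARY KEY, wby_name TEXT);
CREATE TABLE wbt_text (wbx_id INTEGER PRIMARY KEY, wbx_text TEXT);
CREATE TABLE wbt_text_in_lang (
    wbxl_id INTEGER PRIMARY KEY, wbxl_language TEXT, wbxl_text_id INTEGER);
CREATE TABLE wbt_term_in_lang (
    wbtl_id INTEGER PRIMARY KEY, wbtl_type_id INTEGER, wbtl_text_in_lang_id INTEGER);
CREATE TABLE wbt_item_terms (wbit_item_id INTEGER, wbit_term_in_lang_id INTEGER);
"""

LABEL_QUERY = """
SELECT wbx_text
FROM wbt_item_terms
JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
JOIN wbt_type ON wbtl_type_id = wby_id
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
JOIN wbt_text ON wbxl_text_id = wbx_id
WHERE wbit_item_id = ? AND wby_name = 'label' AND wbxl_language = ?
"""


def build_db():
    """Create the chain type -> text -> text_in_lang -> term_in_lang for 'English'."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO wbt_type VALUES (1, 'label')")
    db.execute("INSERT INTO wbt_text VALUES (10, 'English')")
    db.execute("INSERT INTO wbt_text_in_lang VALUES (20, 'en', 10)")
    db.execute("INSERT INTO wbt_term_in_lang VALUES (30, 1, 20)")
    # Only Q28224411 is linked; the row for Q1860 is deliberately missing.
    db.execute("INSERT INTO wbt_item_terms VALUES (28224411, 30)")
    return db


def label(db, item_id, language):
    """Return the label text for an Item, or None if any link in the chain is missing."""
    row = db.execute(LABEL_QUERY, (item_id, language)).fetchone()
    return row[0] if row else None
```

With this setup, `label(db, 1860, "en")` finds nothing even though the text row "English" exists, while `label(db, 28224411, "en")` still resolves. Since the `WHERE` clause filters on columns of the joined tables, the `LEFT JOIN`s in the query above behave like inner joins for this lookup.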

Interestingly the "formatter cache" appears to still have an entry for the label which can be seen by purging https://www.wikidata.org/wiki/Q54 where English can still be seen.
This cache has a TTL of 1 day, which means the row in the DB probably disappeared in the last 24 hours.

This could be related to T237984: Some property labels are not displayed on Item pages, which we saw when properties were being read from the new tables.
The first set of Items (Item IDs up to 8 million) is now also being read from the new tables, per T225057#5808827.

mysql:[email protected] [wikidatawiki]> select * from wbt_text where wbx_text = 'English';
+----------+----------+
| wbx_id   | wbx_text |
+----------+----------+
| 89986287 | English  |
+----------+----------+
1 row in set (0.00 sec)
mysql:[email protected] [wikidatawiki]> SELECT
    ->   wbit_item_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_item_terms
    -> LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    -> LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    -> LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbx_text = 'English'
    -> AND wby_name = 'label'
    -> AND wbxl_language = 'en';
+----------+-------+----------+---------+
| id       | type  | language | text    |
+----------+-------+----------+---------+
| 28224411 | label | en       | English |
+----------+-------+----------+---------+
1 row in set (0.00 sec)

https://www.wikidata.org/w/index.php?title=Q28224411&action=history has not been edited since 13 June 2019 and still correctly relates to the English label.

However, the maintenance script populating the new tables is at 29 million now, so the script running over Q28224411 likely had something to do with it.

Mentioned in SAL (#wikimedia-operations) [2020-01-27T12:10:20Z] <Amir1> ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 1860 --to-id 1860 (T243705)

> Interestingly the "formatter cache" appears to still have an entry for the label which can be seen by purging https://www.wikidata.org/wiki/Q54 where English can still be seen.
> This cache has a TTL of 1 day, which means the row in the DB probably disappeared in the last 24 hours.

This is not quite correct, as I believe this formatter cache gets its data straight from the item rather than from the terms-related tables.
That lines up with the fact that this was reported on the VP over 24 hours ago.

We will run the maint script over the item in question now to fix the issue.

mysql:[email protected] [wikidatawiki]> SELECT
    ->   wbit_item_id as id,
    ->   wby_name as type,
    ->   wbxl_language as language,
    ->   wbx_text as text
    -> FROM wbt_item_terms
    -> INNER JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    -> INNER JOIN wbt_type ON wbtl_type_id = wby_id
    -> INNER JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    -> INNER JOIN wbt_text ON wbxl_text_id = wbx_id
    -> WHERE wbit_item_id = 1860
    -> AND wbxl_language = 'en'
    -> AND wby_name = 'label';
+------+-------+----------+---------+
| id   | type  | language | text    |
+------+-------+----------+---------+
| 1860 | label | en       | English |
+------+-------+----------+---------+
1 row in set (0.00 sec)
mysql:[email protected] [wikidatawiki]> SELECT
    ->    wbit_item_id as id,
    ->    wby_name as type,
    ->    wbxl_language as language,
    ->    wbx_text as text
    ->  FROM wbt_item_terms
    ->  LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    ->  LEFT JOIN wbt_type ON wbtl_type_id = wby_id
    ->  LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
    ->  LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
    ->  WHERE wbx_text = 'English'
    ->  AND wby_name = 'label'
    ->  AND wbxl_language = 'en';
+----------+-------+----------+---------+
| id       | type  | language | text    |
+----------+-------+----------+---------+
|     1860 | label | en       | English |
| 28224411 | label | en       | English |
+----------+-------+----------+---------+
2 rows in set (0.00 sec)

Hole filled.
A purge of any of the affected pages should fix them.
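
For anyone who wants to purge pages in bulk rather than by hand, the following is a small sketch that builds a MediaWiki `action=purge` API request. The endpoint URL, the placeholder User-Agent string and the helper name `build_purge_request` are assumptions for illustration; the function only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: the standard api.php entry point next to index.php.
API_URL = "https://www.wikidata.org/w/api.php"


def build_purge_request(titles):
    """Build (but do not send) a POST request that purges the given page titles."""
    params = {
        "action": "purge",
        "titles": "|".join(titles),  # the API takes multiple titles separated by "|"
        "format": "json",
    }
    return Request(
        API_URL,
        data=urlencode(params).encode("utf-8"),
        # Placeholder User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "purge-sketch/0.1 (example)"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the purge, for example for `build_purge_request(["Q54"])`.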

Change 568968 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikibase@master] Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568968
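
The patch itself is not quoted in this task, so the following is only a sketch of the invariant the function name points at, not the Wikibase (PHP) implementation. It assumes that a row in `wbt_term_in_lang` can be referenced by several Items, and that such a row may only be deleted once no `wbt_item_terms` row refers to it. The reduced schema and the helper names `make_db`, `find_unused_term_in_lang_ids` and `remove_item_terms` are hypothetical.

```python
import sqlite3


def make_db():
    """Two Items (1860 and 28224411) that reference the same term_in_lang row 30."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE wbt_term_in_lang (wbtl_id INTEGER PRIMARY KEY);
        CREATE TABLE wbt_item_terms (
            wbit_item_id INTEGER,
            wbit_term_in_lang_id INTEGER
        );
        INSERT INTO wbt_term_in_lang VALUES (30);
        INSERT INTO wbt_item_terms VALUES (1860, 30), (28224411, 30);
        """
    )
    return db


def find_unused_term_in_lang_ids(db, candidate_ids):
    """Return only those candidates that no wbt_item_terms row references any more."""
    candidates = list(candidate_ids)
    if not candidates:
        return []
    placeholders = ",".join("?" for _ in candidates)
    rows = db.execute(
        "SELECT DISTINCT wbit_term_in_lang_id FROM wbt_item_terms "
        f"WHERE wbit_term_in_lang_id IN ({placeholders})",
        candidates,
    ).fetchall()
    still_used = {row[0] for row in rows}
    return [term_id for term_id in candidates if term_id not in still_used]


def remove_item_terms(db, item_id):
    """Unlink all terms from one Item, then delete only the rows nobody else uses."""
    candidates = [
        row[0]
        for row in db.execute(
            "SELECT wbit_term_in_lang_id FROM wbt_item_terms WHERE wbit_item_id = ?",
            (item_id,),
        )
    ]
    db.execute("DELETE FROM wbt_item_terms WHERE wbit_item_id = ?", (item_id,))
    unused = find_unused_term_in_lang_ids(db, candidates)
    db.executemany(
        "DELETE FROM wbt_term_in_lang WHERE wbtl_id = ?",
        [(term_id,) for term_id in unused],
    )
    return unused
```

In this model, removing the terms of Q28224411 leaves row 30 in place because Q1860 still references it; the row is deleted only after Q1860 is unlinked as well. Skipping the "still used" check is one way a label could vanish for an unrelated Item.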

Change 568975 had a related patch set uploaded (by Ladsgroup; owner: Addshore):
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568975

Change 568968 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568968

Change 569023 had a related patch set uploaded (by Tarrow; owner: Tarrow):
[mediawiki/extensions/Wikibase@master] wbterms: tests for not deleting used terms rows

https://gerrit.wikimedia.org/r/569023

Change 568975 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@wmf/1.35.0-wmf.16] wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds

https://gerrit.wikimedia.org/r/568975

Change 569023 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] wbterms: tests for not deleting used terms rows

https://gerrit.wikimedia.org/r/569023