- Training models
- Quechua Wikipedia qu
- Romansh Wikipedia rm
- Romani Wikipedia rmy
- Rundi Wikipedia rn
- Aromanian Wikipedia roa-rup
- Tarandíne Wikipedia roa-tara
- Rusyn Wikipedia rue
- Kinyarwanda Wikipedia rw
- Sanskrit Wikipedia sa
- Sakha Wikipedia sah
- Santali Wikipedia sat
- Sardinian Wikipedia sc
- Sicilian Wikipedia scn
- Scots Wikipedia sco
- Sindhi Wikipedia sd
- Northern Sami Wikipedia se
- Sango Wikipedia sg
- Serbo-Croatian Wikipedia sh
- Shan Wikipedia shn (see T308141#8778455)
- Sinhala Wikipedia si
- Slovak Wikipedia sk
- Slovenian Wikipedia sl
- Models verification
- Publish Datasets
- Populate the excluded section titles
- Deploy back-end
- Check how the model works on the wikis
- In Search, use hasrecommendation:link to find articles
- Test them on https://api.wikimedia.org/service/linkrecommendation/apidocs/#/default/get_v1_linkrecommendations__project___domain___page_title_
- Inform communities
- Deploy front-end
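The two manual checks under "Check how the model works on the wikis" can be scripted. The sketch below makes several assumptions. The API path `/v1/linkrecommendations/{project}/{domain}/{page_title}` is inferred from the apidocs operation name. `project` is `wikipedia` and `domain` is the language subdomain. The helper names are hypothetical. The search URL uses the standard MediaWiki Action API `list=search` module with the `hasrecommendation:link` keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spot-checking "add a link" on a wiki.
# Assumption: the API path mirrors the apidocs operation
# get_v1_linkrecommendations__project___domain___page_title_.

LINKREC_BASE="https://api.wikimedia.org/service/linkrecommendation"

# Build the link recommendation URL for one article.
# Only spaces are normalized (to underscores); titles containing "/" or
# other reserved characters would need percent-encoding first.
linkrec_url() {
    local project="$1" domain="$2" title="$3"
    printf '%s/v1/linkrecommendations/%s/%s/%s\n' \
        "$LINKREC_BASE" "$project" "$domain" "${title// /_}"
}

# Build an Action API search URL that lists articles which currently
# have a link recommendation (search keyword hasrecommendation:link).
search_url() {
    local lang="$1"
    printf 'https://%s.wikipedia.org/w/api.php?action=query&list=search&srsearch=hasrecommendation:link&format=json\n' "$lang"
}

# Fetch recommendations for one article; the HTTP status code is
# printed on the last line of the output.
fetch_linkrec() {
    curl --silent --write-out '\n%{http_code}\n' "$(linkrec_url "$@")"
}
```

For example, `fetch_linkrec wikipedia sk 'Some title'` would request suggestions for a (hypothetical) Slovak article, and `curl "$(search_url sk)"` would list candidate articles to try.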
Description
Details
- Due Date
- Nov 22 2023, 5:00 PM
Status | Assigned | Task
---|---|---
Open | lbowmaker | T307881 Scaling of link suggestions service
Open | Trizek-WMF | T304110 [EPIC] Deploy "add a link" to all Wikipedias
Resolved | Sgs | T308141 Deploy "add a link" to 15th round of wikis
Event Timeline
Model evaluation has been completed and below are the backtesting results:
Wiki | Precision | Recall
---|---|---
quwiki | 0.75 | 0.16
rmwiki | 0.84 | 0.65
rmywiki | 0.89 | 0.76
rnwiki | 0.98 | 0.67
roa_rupwiki | 0.76 | 0.57
roa_tarawiki | 0.97 | 0.76
ruewiki | 0.84 | 0.57
rwwiki | 0.83 | 0.65
sawiki | 0.75 | 0.13
sahwiki | 0.77 | 0.43
satwiki | 0.76 | 0.52
scwiki | 0.80 | 0.54
scnwiki | 0.91 | 0.56
scowiki | 0.84 | 0.54
sdwiki | 0.75 | 0.45
sewiki | 0.95 | 0.69
sgwiki | 1.00 | 0.89
shwiki | 0.90 | 0.51
shnwiki | 0.50 | 0.02
siwiki | 0.74 | 0.13
skwiki | 0.91 | 0.59
slwiki | 0.83 | 0.53
CCing @MGerlach, in case he would like to add comments on the backtesting evaluation.
The backtesting results show that most languages look fine, with two exceptions:
- shnwiki has low precision (0.50) and recall (0.02)
- siwiki's precision (0.74) is slightly below the recommended threshold (0.75)
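The triage above is essentially a precision cut-off. The sketch below assumes the 0.75 threshold mentioned for siwiki applies to every wiki. It also assumes a precision exactly at the threshold passes, as it did for quwiki, sawiki and sdwiki, which were published. Under those assumptions, the check over the backtesting table can be written with awk. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Backtesting results from the table above: wiki precision recall.
BACKTEST='quwiki 0.75 0.16
rmwiki 0.84 0.65
rmywiki 0.89 0.76
rnwiki 0.98 0.67
roa_rupwiki 0.76 0.57
roa_tarawiki 0.97 0.76
ruewiki 0.84 0.57
rwwiki 0.83 0.65
sawiki 0.75 0.13
sahwiki 0.77 0.43
satwiki 0.76 0.52
scwiki 0.80 0.54
scnwiki 0.91 0.56
scowiki 0.84 0.54
sdwiki 0.75 0.45
sewiki 0.95 0.69
sgwiki 1.00 0.89
shwiki 0.90 0.51
shnwiki 0.50 0.02
siwiki 0.74 0.13
skwiki 0.91 0.59
slwiki 0.83 0.53'

# Print wikis whose precision (column 2) is strictly below THRESHOLD.
# "+ 0" forces numeric rather than string comparison in every awk.
below_threshold() {
    awk -v t="$1" '$2 + 0 < t + 0 { print $1 }'
}

# Count wikis whose precision meets or exceeds THRESHOLD.
count_passing() {
    awk -v t="$1" '$2 + 0 >= t + 0 { n++ } END { print n + 0 }'
}
```

Running `below_threshold 0.75 <<< "$BACKTEST"` lists shnwiki and siwiki, and `count_passing 0.75 <<< "$BACKTEST"` gives 20. Adding siwiki by agreement yields the 21 published models.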
I talked to @MGerlach about these results, and we agreed that siwiki should be published but shnwiki shouldn't.
@kostajh, we published datasets for the 21 of 22 models that passed the evaluation in this round.
I ran the following script to add the link-recommendation task type and populate the excluded section entries:
    PHAB=T308141
    for WIKI in quwiki rmwiki rmywiki rnwiki roa_rupwiki roa_tarawiki ruewiki rwwiki sawiki sahwiki satwiki scwiki scnwiki scowiki sdwiki sewiki sgwiki shwiki siwiki skwiki slwiki; do
        # Look up the wiki's canonical server, used for the verification links below
        ORIGIN=`mwscript getConfiguration.php $WIKI --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer'`
        # Add the link-recommendation task type to MediaWiki:NewcomerTasks.json
        mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            link-recommendation \
            '{ "type": "link-recommendation", "group": "easy" }'
        # Select this wiki's section titles with probability > 0.25, deduplicate them
        # into one compact JSON array, and store it as link-recommendation.excludedSections
        jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
            | jq --slurp --compact-output "unique" \
            | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
                --page MediaWiki:NewcomerTasks.json \
                --json \
                --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
                link-recommendation.excludedSections \
                "`cat`"
        echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
        echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
        echo "Press <Enter> to continue"
        read # give time for manual verification
    done
Note that the script didn't populate excludedSections for rnwiki, roa_rupwiki, roa_tarawiki and sgwiki because these wikis were not present in wiki_sections.jsonl (see T345562).
Change 964949 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 15th round of wikis
Change 964949 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable AddLink backend 15th round of wikis
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:27:16Z] <sgimeno@deploy2002> Started scap: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]]
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:28:35Z] <sgimeno@deploy2002> sgimeno: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2023-10-11T07:35:01Z] <sgimeno@deploy2002> Finished scap: Backport for [[gerrit:964949|GrowthExperiments: enable AddLink backend 15th round of wikis (T308141)]] (duration: 07m 45s)
All wikis now return results.
Notes
rnwiki and sgwiki return very few suggestions (4 and 1, respectively), probably because they have so few articles (639 and 318). I think we can enable their frontends anyway, in the hope that the model will generate more suggestions as the number of articles grows. cc @Trizek-WMF
rmwiki and shwiki were missing from the wikis.txt file, which led to 400 errors on API requests with these domains as parameters. This is probably because their datasets were published before we fixed the publish-dataset.sh script in T340944. I've added them manually. cc @kevinbazira
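A pre-deployment check along these lines could catch such omissions earlier. It is a hypothetical sketch that assumes wikis.txt lists one wiki name per line; the actual layout of the file is not shown here. It prints every given wiki that the list file does not contain.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report wikis missing from a wiki list file.
# Assumption: the list file has one wiki name per line (e.g. "rmwiki").
# Usage: missing_from_list LISTFILE WIKI...
missing_from_list() {
    local listfile="$1"
    shift
    local wiki
    for wiki in "$@"; do
        # -F: fixed string, -x: whole-line match only, -q: quiet
        if ! grep -Fxq -- "$wiki" "$listfile"; then
            echo "$wiki"
        fi
    done
}
```

For this round it would be called as `missing_from_list wikis.txt quwiki rmwiki ... slwiki` with the same 21 wikis as the configuration script. The whole-line match keeps one wiki name from being satisfied by a longer entry that merely contains it.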
Just to confirm for Tech News purposes: Is this releasing next week even though there isn't a deployment train next week (20 Nov)? Or does the Tech News entry need to be moved to the following week (27 Nov) instead?
(The current entry says "Starting on Wednesday, a new set of Wikipedias will get "Add a link" [...]", meaning 22 Nov.) Thanks!
Thanks for your question. All required code for Add a link tasks is already in all production wikis. We will enable the new set of Wikipedias by performing a config backport in one of the available windows, probably at 14:00 UTC.
And (IIRC) because the activation is a config change deployed as a backport, it is not affected by the absence of a train.
Change 977644 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):
[operations/mediawiki-config@master] GrowthExperiments: enable frontend for 15th round of wikis
Change 977644 merged by jenkins-bot:
[operations/mediawiki-config@master] GrowthExperiments: enable frontend for 15th round of wikis
Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:10:22Z] <urbanecm@deploy2002> Started scap: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]]
Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:11:36Z] <urbanecm@deploy2002> sgimeno and anzx and urbanecm: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2023-11-27T14:21:45Z] <urbanecm@deploy2002> Finished scap: Backport for [[gerrit:977644|GrowthExperiments: enable frontend for 15th round of wikis (T308141)]], [[gerrit:975378|zghwiki: add timezone, wgSitename (T350241)]], [[gerrit:975376|bbcwiki: add timezone, wgSitename (T350373)]] (duration: 11m 23s)
I checked some wikis from the list; everything looks as expected. I'm leaving the task in the Test in Production column to monitor it during this week.