
CX Unified Dashboard: Support suggestions based on previous edits
Closed, Resolved | Public

Description

Although the initial plan was to support suggestions based on the user's previous edits, this feature was never implemented for Section Translation/Unified Dashboard. This task captures the design specification and the implementation plan for it.

Current implementation
Since the creation of Section Translation, page and section suggestions have been based on previously published translations (CX) from the source to the target language. More specifically, we fetch up to 200 such translated articles and use them as seeds to fetch page suggestions from the Recommendation API. However, the Recommendation API doesn't support section translation suggestions. For section suggestions, we therefore take the same translated articles, check for each one whether it has sections available to translate to the target language, and, if so, display it as a section suggestion.

Proposed approach

For page suggestions, we already use the Recommendation API, which can provide multiple (up to 500) suggestions based on just one seed. So, for this task, we only need to obtain article seeds that are based on the user's previous edits. The seeds can be chosen in the following order of priority:

  1. If the user has published any translations using CX/SX, then we should use these published translations as seeds.
  2. If no published translations exist for the given user, we should fetch the user's previous edits (any kind of edit) in the source wiki, and use the edited articles as seeds.
  3. If no edits by the user in the source wiki exist, we should fetch the user's previous edits in the target wiki. Given that the suggestion seed titles should be article titles in the source wiki, this case requires another request to fetch the sitelinks for these edited articles, so that we can use the corresponding article titles in the source language, if such a sitelink exists.
  4. If no valid suggestion seeds have been found from the previous cases, we should fall back to our current approach (CX published translations). However, we may need to communicate to the user that we do not provide suggestions based on previous edits, since we were not able to find the required suggestion seeds for them.
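
The fallback order above can be sketched as a chain of seed providers that is walked until one returns a non-empty list. This is only an illustration under assumptions: `pick_seeds`, `to_source_titles`, the provider names and the stubbed data are hypothetical and are not taken from the ContentTranslation code base.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A provider returns source-wiki article titles to use as seeds (possibly none).
SeedProvider = Callable[[], List[str]]


def to_source_titles(target_titles: Sequence[str], sitelinks: Dict[str, str]) -> List[str]:
    """Map target-wiki titles to source-wiki titles, dropping titles without a sitelink."""
    return [sitelinks[title] for title in target_titles if title in sitelinks]


def pick_seeds(providers: Sequence[Tuple[str, SeedProvider]]) -> Tuple[Optional[str], List[str]]:
    """Return (provider name, seeds) for the first provider that yields any seeds."""
    for name, provider in providers:
        seeds = provider()
        if seeds:
            return name, seeds
    return None, []


# Stubbed example: the user has no translations and no source-wiki edits,
# but has edited two target-wiki articles, only one of which has a sitelink.
providers = [
    ("user_translations", lambda: []),
    ("source_wiki_edits", lambda: []),
    ("target_wiki_edits", lambda: to_source_titles(["Luna", "Sin enlace"], {"Luna": "Moon"})),
    ("fallback_published_translations", lambda: ["Earth"]),
]
origin, seeds = pick_seeds(providers)
print(origin, seeds)
```

Returning the provider name alongside the seeds lets the caller tell when the final fallback was used, which is the case where the user may need to be told that the suggestions are not based on their previous edits.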

For section suggestions, things get more complicated, because we need at least one seed per section suggestion. A seed is not guaranteed to yield a suggestion: we only get a section suggestion if the CXServer API finds missing non-appendix sections to translate for the given seed article. For this reason, my suggested approach is to implement a new API endpoint inside the Recommendation API that uses the CXServer API to provide several valid section suggestions based on just one seed, similar to how multiple page suggestions are provided for a given seed.
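
A minimal sketch of the idea behind such an endpoint, assuming the candidate articles for a seed are already available: walk the candidates and keep those for which a lookup reports missing sections, stopping once enough are found. The function name `section_suggestions` and the `missing_sections` callback are hypothetical; the callback stands in for the CXServer lookup and is assumed to return only missing non-appendix sections.

```python
from typing import Callable, Dict, Iterable, List


def section_suggestions(
    candidates: Iterable[str],
    missing_sections: Callable[[str], List[str]],
    count: int,
) -> Dict[str, List[str]]:
    """Collect up to `count` candidates that have at least one missing section."""
    found: Dict[str, List[str]] = {}
    for title in candidates:
        sections = missing_sections(title)
        if sections:
            found[title] = sections
            if len(found) == count:
                break
    return found


# Stubbed lookup: only "C" and "D" have sections missing in the target language.
stub = {"A": [], "B": [], "C": ["History"], "D": ["Geography"], "E": ["Economy"]}
suggestions = section_suggestions(["A", "B", "C", "D", "E"], lambda t: stub.get(t, []), count=2)
print(suggestions)
```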


Event Timeline

Change #1056556 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[research/recommendation-api@master] Add support for section translation recommendations

https://gerrit.wikimedia.org/r/1056556

ngkountas changed the task status from Open to In Progress. Jul 25 2024, 10:12 AM
ngkountas claimed this task.

Change #1056556 merged by jenkins-bot:

[research/recommendation-api@master] Add support for section translation recommendations

https://gerrit.wikimedia.org/r/1056556

Patch looked good to me for adding support for section translation recommendations. Just a few thoughts in case you ever want to work on optimizing the part where you check for available section recommendations. Currently the code starts with a large set of candidate articles and then checks each one asynchronously until enough are found with section translations available. I don't know what percentage of candidates actually have section translations available, but if it's a low-ish percentage, then this can potentially run for a while before reaching the desired result set size. Because there are too many possible combinations of pages and target languages to cache, I think the best approach would be to increase the likelihood of quickly finding candidates that have section translations available. So instead of ranking the candidates by relevance when you check for section translations, this could probably be optimized by ranking instead by:

Page length. [...]
Article quality [...]
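
To illustrate the suggestion above, here is a rough sketch that ranks candidates by page length before checking them asynchronously in small batches, stopping once enough are found. It is not the code from the merged patch: `find_section_candidates`, `has_missing_sections`, the batch size and the sample lengths are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def find_section_candidates(
    page_lengths: Dict[str, int],
    has_missing_sections: Callable[[str], Awaitable[bool]],
    wanted: int,
    batch_size: int = 2,
) -> List[str]:
    """Check the longest pages first and stop once `wanted` candidates are found."""
    # Longer pages are assumed to be more likely to have untranslated sections.
    ranked = sorted(page_lengths, key=lambda title: page_lengths[title], reverse=True)
    found: List[str] = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        # Run the checks of one batch concurrently.
        results = await asyncio.gather(*(has_missing_sections(t) for t in batch))
        found.extend(t for t, ok in zip(batch, results) if ok)
        if len(found) >= wanted:
            break
    return found[:wanted]


async def fake_check(title: str) -> bool:
    """Stub for the per-article availability check."""
    await asyncio.sleep(0)
    return title in {"B", "C"}


lengths = {"A": 1200, "B": 45000, "C": 8000, "D": 30000}
result = asyncio.run(find_section_candidates(lengths, fake_check, wanted=2))
print(result)
```

Batching keeps the number of wasted checks bounded by the batch size; the same structure would work with an article quality score in place of page length as the sort key.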

This makes perfect sense @Isaac. One interesting aspect we found during the analysis of Variables Affecting Deletion Rate of Articles was that longer and more elaborate source articles make the translation more likely to be deleted (T356765#9793939).

The above applies to full-article translations, so using the page length criterion for section suggestions does not seem problematic.
However, we may need to be careful when applying the same approach to the translation of whole articles, that is, new article translation on desktop (where the user is exposed to a full article that could be intimidating when it is too long).

Thanks for sharing that @Pginer-WMF! I'd agree that it's less of a concern for section translation, but yeah, if we ever incorporated it for article translation, maybe we would not use a straight ranking and would instead define some ideal range (e.g., 3000 to 10000 characters) within which we prioritize source articles. Certainly good food for thought on not just what is a good topical overlap with the query but perhaps also what are good candidate articles for translation in general.
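
One possible reading of the "ideal range" idea, as a sketch only: keep the relevance order, but move candidates whose length falls inside the range ahead of the rest. The helper name `prioritize_by_length` and the sample data are hypothetical; the 3000 and 10000 character bounds are the example values from the comment above, not tuned thresholds.

```python
from typing import List, Tuple


def prioritize_by_length(
    candidates: List[Tuple[str, int]], low: int = 3000, high: int = 10000
) -> List[Tuple[str, int]]:
    """Put in-range candidates first; Python's stable sort keeps relevance order within each group."""
    return sorted(candidates, key=lambda c: not (low <= c[1] <= high))


# Candidates as (title, length in characters), already ordered by relevance.
by_relevance = [("A", 1200), ("B", 5000), ("C", 45000), ("D", 9000)]
prioritized = prioritize_by_length(by_relevance)
print([title for title, _ in prioritized])
```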

Change #1064078 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] SX: Use new section suggestions endpoint from recommendation API

https://gerrit.wikimedia.org/r/1064078

Change #1064079 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] SX: Use seeds based on user's previous translations/edits

https://gerrit.wikimedia.org/r/1064079

Change #1064078 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] SX: Use new section suggestions endpoint from recommendation API

https://gerrit.wikimedia.org/r/1064078

Change #1064079 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] SX: Use seeds based on user's previous translations/edits

https://gerrit.wikimedia.org/r/1064079

Change #1075030 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0 20240923

https://gerrit.wikimedia.org/r/1075030

Change #1075231 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0 20240923

https://gerrit.wikimedia.org/r/1075231

Change #1075030 abandoned by Nik Gkountas:

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0 20240923

Reason:

Abandoned in favor of Iac0466a3d2bd906a33c1c6052a92b3be98f5b028

https://gerrit.wikimedia.org/r/1075030

Change #1075231 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] CX3 Build 0.2.0 20240925

https://gerrit.wikimedia.org/r/1075231

Change #1075567 had a related patch set uploaded (by Sbisson; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@wmf/1.43.0-wmf.24] CX3 Build 0.2.0 20240925

https://gerrit.wikimedia.org/r/1075567

Change #1075567 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@wmf/1.43.0-wmf.24] CX3 Build 0.2.0 20240925

https://gerrit.wikimedia.org/r/1075567

Mentioned in SAL (#wikimedia-operations) [2024-09-25T14:08:52Z] <kartik@deploy1003> Started scap sync-world: Backport for [[gerrit:1075567|CX3 Build 0.2.0 20240925 (T374387 T370746 T368422 T374567 T355780 T374559 T374886 T375410)]]

Mentioned in SAL (#wikimedia-operations) [2024-09-25T14:10:55Z] <kartik@deploy1003> kartik, sbisson: Backport for [[gerrit:1075567|CX3 Build 0.2.0 20240925 (T374387 T370746 T368422 T374567 T355780 T374559 T374886 T375410)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-09-25T14:22:59Z] <kartik@deploy1003> Finished scap sync-world: Backport for [[gerrit:1075567|CX3 Build 0.2.0 20240925 (T374387 T370746 T368422 T374567 T355780 T374559 T374886 T375410)]] (duration: 14m 06s)

@Pginer-WMF this task can be closed as done, as we properly support suggestions based on previous edits using the "For you" filter, which is also the default.