
Re-run the MT service usage report (2024)
Open, Medium, Public

Description

Since the three previous reports about machine translation usage (T303812, T310773, T338606), there have been significant changes in machine translation support. Machine translation defaults have been adjusted in Content Translation (T341458, T353136, T353259), new models have been introduced supporting more languages (T354666, T355304), and new languages were announced by Google (T369815).

This report will help us understand how MinT is used across different languages and what its impact on translations (and their deletion ratios) is. It will also allow us to analyze both the languages for which the new service has been available for a longer time and those that received machine translation support more recently.

Result: Check full report

Event Timeline

Pginer-WMF triaged this task as Medium priority. Jul 23 2024, 8:33 AM
Pginer-WMF added a subscriber: KCVelaga_WMF.
Pginer-WMF renamed this task from Re-run the MT service usage (2024) to Re-run the MT service usage report (2024). Jul 31 2024, 12:39 PM

SUMMARY
Overall usage

  • Google's translation service, which has been the most used translation service across all language pairs, was used for 71% of all the published translations.
  • MinT (Machine in Translation), a Wikimedia Foundation hosted open-source machine translation service, is the second most used translation service accounting for 16% of all the published translations.
  • Compared to the same period in 2023, usage of Google's service decreased by ~9%.
  • No machine translation was used (scratch) for 6% of the published translations using the Content Translation tool.

YoY comparison

  • On average, ~960 articles are translated and published daily using the Content Translation tool.
  • Compared to 2023, the median number of daily translations:
    • using Google decreased from 669 to 626 (6% decrease).
    • using MinT increased from 48 to 118 (145% increase).
    • for other services, changed by no more than 1-2%.
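The percentages in this list follow from the usual relative-change formula. A minimal sketch, using only the medians quoted above; the function name `pct_change` is an illustrative choice and is not taken from the report's notebook:

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Median daily translations quoted in the summary (2023 -> 2024).
google = pct_change(669, 626)  # roughly -6.4, i.e. the ~6% decrease
mint = pct_change(48, 118)     # roughly +145.8, reported as a 145% increase

print(f"Google: {google:+.1f}%  MinT: {mint:+.1f}%")
```
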

Language pairs where an optional service was used more than, or nearly as much as, the default

  • There are 44 language pairs where an optional service (or no service, i.e., scratch) was used more than, or nearly as much as, the default.
  • Among the language pairs where Google is the default:
    • For 20 language pairs, MinT was used more than Google.
    • For 10 language pairs, no service was used (i.e., scratch).
  • Among the language pairs where MinT is the default, for 10 language pairs, Google was used more.
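One way such language pairs could be identified is sketched below. This is only an illustration: the summary does not say how "close to the default" was defined, so the `closeness` threshold, the function name `flag_pairs`, and the sample counts with placeholder language codes are all hypothetical.

```python
def flag_pairs(usage, defaults, closeness=0.9):
    """Return, per language pair, the non-default options whose usage is at
    least `closeness` times the usage of the default service.

    `closeness` is an assumed threshold; the report may define it differently.
    """
    flagged = {}
    for pair, counts in usage.items():
        default_count = counts.get(defaults[pair], 0)
        for service, n in counts.items():
            if service != defaults[pair] and n > 0 and n >= closeness * default_count:
                flagged.setdefault(pair, []).append(service)
    return flagged


# Hypothetical counts of published translations per service ("scratch" = no MT).
usage = {
    ("src1", "tgt1"): {"Google": 40, "MinT": 55, "scratch": 5},
    ("src2", "tgt2"): {"Google": 100, "MinT": 10, "scratch": 3},
}
defaults = {("src1", "tgt1"): "Google", ("src2", "tgt2"): "Google"}

flagged = flag_pairs(usage, defaults)
print(flagged)
```

With these made-up counts only the first pair is flagged, because MinT is used more than the Google default there.
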

Usage of service at each target language

  • MinT was used for 100% of the translations into nine languages.
  • MinT was used for 90% of all translations into 18 languages (within the respective language).
  • MinT was used for the majority of the translations (>50% of the services) into 18 languages.
  • Google was used for 100% of the translations into nine languages.
  • Google was used for 90% of all translations into 53 languages.
  • Google was used for the majority of the translations (>50% of the services) into 49 languages.
  • Aragonese is the only target language for which Apertium was used for 90% of the translations.
  • Chuvash is the only target language for which Yandex was used for 100% of the translations.
  • Basque (eu) is the only target language for which Elia was the most used service (85% of all services).
  • Even for languages where LingoCloud is supported, its usage has been quite low. For Chinese (zh), it was used for ~2% of 4000 translations, and for less than 1% of 150 translations into Wu Chinese (wuu).

Percentage of MT content modified by the user

  • The majority of translations across all MT services were modified by at least 10% at the time of publication.
  • For machine translation suggestions from MinT, 32% were modified by less than 10%, the highest share of all services.
  • For both MinT and Google, 54% of translations had a human modification percentage between 10% and 50%.
  • The share of translations with a human modification percentage higher than 50% is lowest for Elia and MinT, at 11% and 13%, respectively.
  • Apertium has the highest percentage of translations where the human modification percentage was more than 50%.
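The three modification bands used above (less than 10%, 10% to 50%, more than 50%) can be computed along these lines. This is a sketch under assumptions: the handling of the exact 10% and 50% boundaries is a guess, and the sample values are hypothetical rather than taken from the report.

```python
from collections import Counter

BANDS = ("<10%", "10-50%", ">50%")


def band(modified_pct: float) -> str:
    """Assign a human-modification percentage to one of the three bands.

    Boundary handling (10 and 50 fall in the middle band) is an assumption.
    """
    if modified_pct < 10:
        return "<10%"
    if modified_pct <= 50:
        return "10-50%"
    return ">50%"


def band_shares(modified_pcts):
    """Share of translations in each band, in percent."""
    counts = Counter(band(p) for p in modified_pcts)
    return {name: counts[name] / len(modified_pcts) * 100 for name in BANDS}


# Hypothetical per-translation modification percentages for one MT service.
sample = [3.0, 8.5, 12.0, 25.0, 40.0, 49.9, 55.0, 90.0]
shares = band_shares(sample)
print(shares)
```
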

Deletion rates by MT service

  • Articles translated using MinT were deleted the least: 2.23% of the 30,000 articles.
  • Yandex and Google had the highest percentage of deleted articles, with more than 3%.
  • 2.7% of the articles translated using Apertium were deleted.
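For a sense of scale, the MinT figure can be turned back into an approximate article count. A small sketch, assuming the 30,000 total is a rounded number, so the result is only indicative; `deletion_rate` is an illustrative helper name:

```python
def deletion_rate(deleted: int, published: int) -> float:
    """Deleted articles as a percentage of published articles."""
    return deleted / published * 100


published_mint = 30_000  # rounded total quoted in the summary
implied_deleted = round(published_mint * 2.23 / 100)

print(implied_deleted)  # approximate number of deleted MinT articles
print(round(deletion_rate(implied_deleted, published_mint), 2))
```
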

Full report is available at https://analytics.wikimedia.org/published/reports/content_translation/mt_service_usage_analysis_2024_T370749.html

This is great! Thanks @KCVelaga_WMF.
The summary provides a very good and useful overview, looking forward to checking the full report in more detail.

Thank you @Pginer-WMF !

I just realized I missed a sentence about the YoY comparison.

Compared to last year, the year-over-year growth in usage of MinT translation service is ~8pp, while at the same time Google’s service usage reduced by ~9pp.

pp -> https://en.wikipedia.org/wiki/Percentage_point
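To illustrate the difference between percentage points and percent: the sketch below back-computes approximate 2023 shares from the rounded figures in this thread (71% with a ~9pp drop for Google, 16% with a ~8pp rise for MinT). Those 2023 values are an assumption derived from rounded numbers, not figures from the report.

```python
def pp_change(old_share: float, new_share: float) -> float:
    """Absolute change between two shares, in percentage points."""
    return new_share - old_share


def relative_change(old_share: float, new_share: float) -> float:
    """Change relative to the old share, in percent."""
    return (new_share - old_share) / old_share * 100


# Assumed 2023 shares, back-computed from the rounded figures above.
google_2023, google_2024 = 80.0, 71.0
mint_2023, mint_2024 = 8.0, 16.0

print(pp_change(google_2023, google_2024), relative_change(google_2023, google_2024))
print(pp_change(mint_2023, mint_2024), relative_change(mint_2023, mint_2024))
```

Under these assumed shares, a 9pp drop for Google is a relative decrease of about 11%, while an 8pp rise for MinT is a doubling of its share.
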