
Enable in content Translation the new languages Google Translate supports in June 2024
Open, In Progress, Medium, Public

Description

A set of new languages is now available in Google Translate. As with past enablements, it may take some time until they are available in the external APIs. Once they are available, we may want to enable Google support for them in Content Translation. This ticket compiles the languages to enable. Below you can find them grouped by their current support on Wikipedia:

A) Languages with a Wikipedia and MT support already. We can enable the new Google support as a non-default option, giving these communities another choice with no need for specific coordination:

  1. ✅ Acehnese (ace)
  2. ✅ Avar/Avaric (av)
  3. ✅ Awadhi (awa)
  4. ✅ Balinese (ban)
  5. ✅ Bambara (bm)
  6. ✅ Bashkir (ba)
  7. ✅ Betawi (bew)
  8. ✅ Breton (br)
  9. ✅ Chamorro (ch)
  10. ✅ Chechen (ce)
  11. Chuvash (cv)
  12. Dinka (din)
  13. Dzongkha (dz)
  14. Faroese (fo)
  15. Fijian (fj)
  16. Fon (fon)
  17. Friulian (fur)
  18. Iloko/Ilocano (ilo)
  19. Jamaican Patois/Jamaican Creole English (jam)
  20. Kapampangan (pam)
  21. Komi (kv)
  22. Konkani (gom)
  23. Latgalian (ltg)
  24. Ligurian (lij)
  25. Limburgish (li)
  26. Lombard (lmo)
  27. Manx (gv)
  28. Meadow/Eastern Mari (mhr)
  29. Meiteilon/Manipuri (mni)
  30. Minang/Minangkabau (min)
  31. Nepalbhasa/Newari (new)
  32. Sepedi/Northern Sotho (nso)
  33. Occitan (oc)
  34. Odia (or)
  35. Ossetian (os)
  36. Pangasinan (pag)
  37. Papiamento (pap)
  38. Rundi (rn)
  39. Sango (sg)
  40. Shan (shn)
  41. Sicilian (scn)
  42. Silesian (szl)
  43. Swati (ss)
  44. Tahitian (ty)
  45. Tetum (tet)
  46. Tibetan (bo)
  47. Tok Pisin (tpi)
  48. Tongan (to)
  49. Tswana (tn)
  50. Tulu (tcy)
  51. Tumbuka (tum)
  52. Tuvan/Tuvinian (tyv)
  53. Udmurt (udm)
  54. Venda (ve)
  55. Venetian (vec)
  56. Western Punjabi (pnb). Google Translate supports Punjabi in the Shahmukhi script with the code pa-Arab.
  57. Wolof (wo)
  58. Yakut (sah)
  59. Waray (war)
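Enabling Google "as a non-default" for the languages above means adding it after whatever provider each wiki already uses, so the default does not change. A minimal Python sketch of that rule, assuming a hypothetical in-memory map from language code to an ordered provider list (first entry is the default); this is not cxserver's actual configuration format, and any provider name other than Google is a placeholder:

```python
from typing import Dict, Iterable, List

# Hypothetical model: language code -> ordered MT providers, default first.
# Not cxserver's real configuration format; for illustration only.
ProviderMap = Dict[str, List[str]]


def enable_as_non_default(
    providers: ProviderMap, languages: Iterable[str], new_provider: str = "Google"
) -> ProviderMap:
    """Return a copy with new_provider appended for each given language.

    The existing first (default) provider is left untouched. Languages with
    no MT at all are skipped, because list A only covers languages that
    already have machine translation.
    """
    updated = {lang: list(chain) for lang, chain in providers.items()}
    for lang in languages:
        chain = updated.get(lang)
        if not chain:
            continue  # no existing MT: outside the scope of list A
        if new_provider not in chain:
            chain.append(new_provider)  # appended, so it never becomes default
    return updated
```

For example, `enable_as_non_default({"ace": ["ExistingMT"]}, ["ace"])` yields `{"ace": ["ExistingMT", "Google"]}`. Appending rather than prepending keeps the default each wiki sees unchanged, which is why this group needs no specific coordination.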

Communication in progress
B) Languages with a Wikipedia but some open questions. We want to check with the communities whether the MT support is useful (in bold, those getting machine translation for the first time), or resolve other questions about the specific variant used:

  1. Abkhaz/Abkhazian (ab)
  2. Batak Toba (bbc)
  3. Cantonese (zh-yue)
  4. Kalaallisut (kl)
  5. Madurese (mad)
  6. NKo (nqo)
  7. Northern Sami (se)
    • Do not enable for Northern Sami, a member of the community stated that the quality is poor and won't be useful for their work.
  8. Bikol. Google uses code bik. Wikipedia uses bcl for Central Bikol, but it is unclear whether that is the variant supported by Google.
    • Enable for Central Bikol. A contributor indicated that the MT will be useful in their Wikipedia.
  9. Crimean Tatar (crh). Google Translate provides translations in the Cyrillic script, while the Crimean Tatar Wikipedia uses both the Latin and Cyrillic scripts through a converter. We may want to check whether the Google support is useful for the community.
  10. Fulani/Fula (ff). This language has several varieties with different language codes; we may need to check with the community whether the variant provided by Google Translate is useful.
  11. Kikongo (kg). We need to check with the community whether the variant provided by Google Translate is useful. In particular, we may want to ask whether they find the translations Google provides for Kongo (kg) useful, those provided for Kituba (ktu), or neither.
  12. Nahuatl (nah). Google uses code nhe. We need to check with the community whether the variant provided by Google Translate is useful.
  13. Romani (rom). The Vlax Romani Wikipedia uses the rmy code. We need to check with the community whether the variant provided by Google Translate is useful.
  14. Tamazight. Google uses code ber and supports both the Tifinagh and Latin scripts. Wikipedia uses zgh for Standard Moroccan Tamazight (in the Tifinagh script), but it is unclear whether that is the variant supported by Google.
    • Do not enable for Tamazight. A member of the community indicated that Google's variant (Kabyle) is not the same as the Amazigh language with code zgh, which is officially recognised in Morocco and used in the wiki.

C) Languages with no Wikipedia yet:

  1. Acholi (ach)
  2. Afar (aa) In Incubator
  3. Alur (alz)
  4. Baluchi (bal) In Incubator with three projects for codes bgp, bgn, and bcc
  5. Baoulé (bci) In Incubator
  6. Batak Karo (btx)
  7. Batak Simalungun (bts)
  8. Bemba (bem)
  9. Buryat (bua)
  10. Chuukese (chk)
  11. Dari (prs in Wikimedia, fa-AF in Google)
  12. Dogri (doi in Google, dgo in Wikimedia) In Incubator
  13. Dombe (ndq)
  14. Dyula (dyu)
  15. Ga (gaa) In Incubator
  16. Hakha Chin (cnh) In Incubator
  17. Hiligaynon (hil) In Incubator
  18. Hunsrik (hrx) In Incubator
  19. Iban (iba) In Incubator
  20. Jingpo (kac)
  21. Kanuri (kr) In Incubator with code knc
  22. Khasi (kha)
  23. Kiga (cgg)
  24. Kituba (ktu)
  25. Kokborok (trp)
  26. Krio (kri) In Incubator
  27. Luo (luo) In Incubator
  28. Makassar (mak)
  29. Mam (mam)
  30. Marshallese (mh) In Incubator
  31. Marwadi (mwr in Google, rwr in Wikimedia) In Incubator
  32. Mauritian Creole (mfe) In Incubator
  33. Mizo (lus) In Incubator
  34. Ndau (ndc)
  35. Nuer (nus) In Incubator
  36. Qʼeqchiʼ (kek)
  37. Seychellois Creole (crs)
  38. Southern Ndebele (nr) In Incubator
  39. Susu (sus)
  40. Tiv (tiv)
  41. Yucatec Maya (yua) In Incubator
  42. Zapotec (zap) In Incubator

D) Languages not to enable:

  • Santali (sat). Google Translate uses the Latin script, while Santali Wikipedia uses the Ol Chiki script.
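Several entries above use a different code in Google Translate than in Wikimedia, and a few should not be enabled at all. One way to keep that straight is a small lookup table. The sketch below only includes mappings stated in this task (the Buryat one comes from the note in the event timeline) and assumes every other code is identical on both sides; the names `WIKI_TO_GOOGLE`, `DO_NOT_ENABLE` and `google_code` are hypothetical:

```python
from typing import Dict, Optional

# Wikimedia code -> Google Translate code, only where this task says they differ.
# Assumption: any code not listed here is the same in both systems.
WIKI_TO_GOOGLE: Dict[str, str] = {
    "pnb": "pa-Arab",  # Western Punjabi, Shahmukhi script
    "bcl": "bik",      # Central Bikol, enabled after community feedback
    "prs": "fa-AF",    # Dari (no Wikipedia yet)
    "dgo": "doi",      # Dogri (Incubator)
    "rwr": "mwr",      # Marwadi (Incubator)
    "bxr": "bua",      # Buryat: Google's variant matches bxr.wikipedia.org
}

# Wikis where Google support should not be enabled, per the decisions above.
DO_NOT_ENABLE = {
    "sat",  # Santali: Google uses Latin script, the wiki uses Ol Chiki
    "se",   # Northern Sami: quality reported as not useful
    "zgh",  # Standard Moroccan Tamazight: Google's variant (Kabyle) differs
}


def google_code(wiki_code: str) -> Optional[str]:
    """Return the Google code to request for a wiki, or None if not enabled."""
    if wiki_code in DO_NOT_ENABLE:
        return None
    return WIKI_TO_GOOGLE.get(wiki_code, wiki_code)
```

Languages still under discussion in group B (for example nah/nhe or rom/rmy) are deliberately left out until the communities answer.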

Related: T308248: Newly supported languages in Google Translate

Event Timeline

Pginer-WMF renamed this task from Enable new languages Google Translate enabled in June 2024 to Enable in content Translation the new languages Google Translate enabled in June 2024. Jul 11 2024, 12:56 PM
Pginer-WMF renamed this task from Enable in content Translation the new languages Google Translate enabled in June 2024 to Enable in content Translation the new languages Google Translate supports in June 2024.
Pginer-WMF triaged this task as Medium priority.
Pginer-WMF updated the task description.
Pginer-WMF updated the task description.

Change #1062484 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Google: Add support for 59 more languages

https://gerrit.wikimedia.org/r/1062484

The https://translation.googleapis.com/language/translate/v2/languages API, which lists supported languages, shows all the new languages. However, actual translation requests for the new languages fail:

{
  "error": {
    "code": 400,
    "message": "Bad language pair: en|to",
    "errors": [
      {
        "message": "Bad language pair: en|to",
        "domain": "global",
        "reason": "badRequest"
      }
    ],
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "target",
            "description": "Target language: to"
          }
        ]
      }
    ]
  }
}
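So a code appearing in the languages list does not prove that translation into it works. The following Python sketch, offered under stated assumptions, separates the two checks: it reads codes from a v2 `languages` response (shape `{"data": {"languages": [{"language": ...}]}}`) and flags targets whose translation attempt returns the 400 "Bad language pair" error shown above. The `translate` callable is a hypothetical hook standing in for whatever client actually calls the API, and matching on the message text is an assumption rather than a documented contract:

```python
import json
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Matches messages like the one above: "Bad language pair: en|to".
BAD_PAIR = re.compile(r"Bad language pair:\s*([\w-]+)\|([\w-]+)")


def listed_languages(languages_response: dict) -> List[str]:
    """Codes from a v2 /languages response: {"data": {"languages": [...]}}."""
    return [entry["language"] for entry in languages_response["data"]["languages"]]


def bad_language_pair(error_body: str) -> Optional[Tuple[str, str]]:
    """Return (source, target) if the body is a 400 'Bad language pair' error."""
    try:
        error = json.loads(error_body)["error"]
    except (ValueError, KeyError, TypeError):
        return None  # not JSON, or not an error envelope
    if not isinstance(error, dict) or error.get("code") != 400:
        return None
    match = BAD_PAIR.search(str(error.get("message", "")))
    return (match.group(1), match.group(2)) if match else None


def listed_but_untranslatable(
    listed: Iterable[str],
    source: str,
    translate: Callable[[str, str], Optional[str]],
) -> List[str]:
    """Probe every listed target and collect those rejected as a bad pair.

    `translate(source, target)` is a hypothetical hook: it returns None on
    success, or the raw JSON error body when the request fails.
    """
    failing = []
    for target in listed:
        if target == source:
            continue  # skip the trivial same-language pair
        body = translate(source, target)
        if body is not None and bad_language_pair(body) is not None:
            failing.append(target)
    return failing
```

Running such a probe periodically would show when the translate endpoint catches up with the languages list, which matches the "it may take some time until they are available" caveat in the description.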

Note: Buryat (bua) is a macrolanguage. Google Translate uses the Russian Cyrillic variant, for which we have a Wikipedia at bxr.wikipedia.org.

A member of the community indicated that Google's variant (Kabyle) is not the same as the Amazigh language with code zgh, which is officially recognised in Morocco and used in the wiki.

There is a Wikipedia in Kabyle (kab.wikipedia.org).

This issue has been resolved. The API is now working as expected.

> A) Languages with a Wikipedia and MT support already. We can enable the new support from Google as a non-default to provide them another option, with no need for specific coordination:

In this list of 59 languages, the following are already supported by Google (that is, already present in the cxserver Google configuration):

  1. Odia (or)
  2. Iloko/Ilocano (ilo)
  3. Konkani (gom)
  4. Meiteilon/Manipuri (mni)
  5. Sepedi/Northern Sotho (nso)
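The drop from 59 to 54 languages between the uploaded and merged patch titles is consistent with removing these five codes from list A of the description, which also has 59 entries. A quick check, assuming (not confirmed in the task) that the patch's 59 languages are exactly list A:

```python
# Codes from list A of the task description, in order (59 languages).
# Western Punjabi is listed by its Wikimedia code (pnb); Google uses pa-Arab.
LIST_A = """
ace av awa ban bm ba bew br ch ce cv din dz fo fj fon fur ilo jam pam
kv gom ltg lij li lmo gv mhr mni min new nso oc or os pag pap rn sg shn
scn szl ss ty tet bo tpi to tn tcy tum tyv udm ve vec pnb wo sah war
""".split()

# Already present in the cxserver Google configuration, per the comment above.
ALREADY_SUPPORTED = {"or", "ilo", "gom", "mni", "nso"}

# Keep list order so the result can be pasted into a config in the same order.
new_languages = [code for code in LIST_A if code not in ALREADY_SUPPORTED]
print(len(LIST_A), len(new_languages))  # 59 54
```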

Change #1062484 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Google: Add support for 54 more languages

https://gerrit.wikimedia.org/r/1062484

Change #1067221 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2024-08-27-045705-production

https://gerrit.wikimedia.org/r/1067221

Change #1067221 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2024-08-27-045705-production

https://gerrit.wikimedia.org/r/1067221

Mentioned in SAL (#wikimedia-operations) [2024-08-27T11:51:53Z] <kart_> Updated cxserver to 2024-08-27-045705-production (T369815)

PWaigi-WMF changed the task status from Open to In Progress. Mon, Sep 2, 8:18 AM
PWaigi-WMF assigned this task to santhosh.