[epic] Structured document translation
Open, Medium, Public

Description

Alongside plain-text translation, translating structured documents with an MT system is a common and useful feature.

  • HTML
    • Supported in MinT, using a two-stage BeautifulSoup-based parser.
    • Translating a webpage by URL is supported, but arbitrary webpages hit performance issues and bugs. Needs more testing.
  • Markdown
    • Supported in MinT by converting Markdown to HTML and translating that. The translated HTML is then converted back to Markdown.
    • Need to explore whether Markdown parsers such as marko can avoid the round trip through HTML. Needs more testing.
  • JSON
    • Supported in MinT, using two-stage parsing. Translates values that are strings.
  • SVG
    • Supported in MinT, using two-stage parsing. Translates text and textPath nodes. Does not support tspan yet.
  • Wikipedia i18n format strings. See T341544: Wikitext syntax is translated when requesting translations via MinT
  • Wikitext (T347018)
    • The easiest way to support this is to go through HTML, as is done for Markdown.
  • OpenDocument formats such as odt, odp, and ods
  • MS Word
  • ..
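
The JSON and SVG items above can be illustrated with a minimal sketch. It is not MinT's actual code: it assumes that "two stage" means first collecting the translatable strings and then writing the translations back in the same order, which the task does not spell out. The function names and the `translate` callback (a batch function from a list of strings to a list of strings) are hypothetical. Only the Python standard library is used.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable, Iterator, List

# Hypothetical MT hook: takes a batch of strings, returns their translations.
Translator = Callable[[List[str]], List[str]]

SVG_NS = "http://www.w3.org/2000/svg"
# Only <text> and <textPath> are handled; <tspan> is skipped, as in the task.
TRANSLATABLE_TAGS = {f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}textPath"}
# Serialize SVG elements without an "ns0:" prefix (process-wide setting).
ET.register_namespace("", SVG_NS)


def collect_json_strings(node: Any, out: List[str]) -> None:
    """Stage 1: gather every string value (never keys) in document order."""
    if isinstance(node, str):
        out.append(node)
    elif isinstance(node, dict):
        for value in node.values():
            collect_json_strings(value, out)
    elif isinstance(node, list):
        for value in node:
            collect_json_strings(value, out)


def replace_json_strings(node: Any, translations: Iterator[str]) -> Any:
    """Stage 2: rebuild the structure, consuming translations in the same order."""
    if isinstance(node, str):
        return next(translations)
    if isinstance(node, dict):
        return {k: replace_json_strings(v, translations) for k, v in node.items()}
    if isinstance(node, list):
        return [replace_json_strings(v, translations) for v in node]
    return node  # numbers, booleans and null pass through untouched


def translate_json(text: str, translate: Translator) -> str:
    data = json.loads(text)
    strings: List[str] = []
    collect_json_strings(data, strings)
    translated = iter(translate(strings))
    return json.dumps(replace_json_strings(data, translated), ensure_ascii=False)


def translate_svg(svg: str, translate: Translator) -> str:
    root = ET.fromstring(svg)
    # Stage 1: find text-bearing nodes; only the node's own leading text is used.
    nodes = [
        el for el in root.iter()
        if el.tag in TRANSLATABLE_TAGS and el.text and el.text.strip()
    ]
    # Stage 2: translate as one batch and write the results back in place.
    for el, new_text in zip(nodes, translate([el.text for el in nodes])):
        el.text = new_text
    return ET.tostring(root, encoding="unicode")
```

Batching the strings before calling the translator keeps the MT call separate from the document structure, so the same callback can serve both formats. A real implementation would also need to handle inline markup and strings that must not be translated (identifiers, URLs), which this sketch ignores.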

Related Objects

Status      Subtype       Assigned      Task
Open        -             None
Open        -             None
Open        -             None
Open        -             None
Resolved    -             santhosh
Open        -             None
Resolved    -             Pginer-WMF
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Resolved    -             Wangombe
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        BUG REPORT    None
Open        -             None
Resolved    -             santhosh