Integrate improved sentence segmentation algorithm in CXServer
Closed, Resolved (Public)

Description

After the improvements in section segmentation (T338292, T343781) made in the context of MinT, Content and Section Translation can also benefit from a better approach to splitting sentences.

For Content Translation on desktop, sentence segmentation is used for highlighting equivalent content, while the mobile version relies on it more heavily to let users review the proposed translation of each sentence.

With the new approach, issues such as the one reported in T338689 ("error translating court cases out of english") should be fixed.
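To illustrate why the segmentation approach matters, here is a minimal Python sketch comparing a naive punctuation-based splitter with one that knows a few abbreviations. It is not the algorithm used by CXServer or by the sentencex library; the function names and the tiny abbreviation list are hypothetical, and the assumption that abbreviations such as "v." in case names are what trips up court-case translations is mine, not something stated in T338689.

```python
import re

# Hypothetical, deliberately tiny abbreviation list for illustration only.
# Real segmenters ship much larger, per-language lists.
ABBREVIATIONS = {"v.", "vs.", "no.", "dr.", "mr.", "mrs."}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def abbreviation_aware_split(text):
    """Split like naive_split, then re-join pieces ending in a known abbreviation."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}" if buffer else piece
        last_word = buffer.rsplit(None, 1)[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # probably not a real boundary; keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


text = "The court decided Roe v. Wade in 1973. The ruling was later overturned."

# The naive splitter breaks the case name in two.
print(naive_split(text))
# The abbreviation-aware splitter keeps the case name inside one sentence.
print(abbreviation_aware_split(text))
```

With the naive splitter, a segment such as "Wade in 1973." would be sent for translation or review on its own, without the context it belongs to. The sketch has obvious limits (a sentence that genuinely ends in "no." would be merged with the next one), which is one reason to rely on a dedicated library instead.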

Event Timeline

Change 961080 had a related patch set uploaded (by Santhosh; author: Santhosh):

[mediawiki/services/cxserver@master] Use sentencex library for sentence segmentation

https://gerrit.wikimedia.org/r/961080

Change 961080 merged by jenkins-bot:

[mediawiki/services/cxserver@master] Use sentencex library for sentence segmentation

https://gerrit.wikimedia.org/r/961080

Change 961979 had a related patch set uploaded (by KartikMistry; author: KartikMistry):

[operations/deployment-charts@master] Update cxserver to 2023-09-28-043003-production

https://gerrit.wikimedia.org/r/961979

Change 961979 merged by jenkins-bot:

[operations/deployment-charts@master] Update cxserver to 2023-09-28-043003-production

https://gerrit.wikimedia.org/r/961979