Detect multiple languages in mixed-language text #4

Closed
pemistahl opened this issue Jan 22, 2022 · 3 comments

Comments

@pemistahl
Owner

Currently, for a given input string, only the most likely language is returned. However, if the input contains contiguous sections in different languages, it would be desirable to detect all of them and return an ordered sequence of items, where each item consists of a start index, an end index, and the detected language.

Input:
He turned around and asked: "Entschuldigen Sie, sprechen Sie Deutsch?"

Output:

[
  {"start": 0, "end": 27, "language": ENGLISH}, 
  {"start": 28, "end": 69, "language": GERMAN}
]
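
To make the proposed result shape concrete, here is a minimal, self-contained sketch. It is not Lingua's implementation: the `Section` type, the `detect_sections` function, and the tiny hard-coded German word list are all hypothetical stand-ins for a real per-word classifier. It assumes inclusive end indices and places each section boundary right after the last space before the first word of the new language, which is what the example output above implies.

```python
import re
from dataclasses import dataclass
from enum import Enum, auto


class Language(Enum):
    ENGLISH = auto()
    GERMAN = auto()


@dataclass(frozen=True)
class Section:
    start: int  # index of the first character (inclusive)
    end: int    # index of the last character (inclusive)
    language: Language


# Hypothetical toy vocabulary standing in for a real language model.
GERMAN_WORDS = {"entschuldigen", "sie", "sprechen", "deutsch"}


def guess_word_language(word: str) -> Language:
    """Toy classifier: German if the word is in the list, else English."""
    return Language.GERMAN if word.lower() in GERMAN_WORDS else Language.ENGLISH


def detect_sections(text: str) -> list:
    """Merge consecutive words of the same language into ordered sections."""
    sections = []
    start = 0
    current = None
    for match in re.finditer(r"[^\W\d_]+", text):  # runs of letters
        language = guess_word_language(match.group())
        if current is None:
            current = language
        elif language != current:
            # Put the boundary right after the last space before this word,
            # so leading punctuation (such as an opening quote) joins the
            # new section.
            boundary = text.rfind(" ", 0, match.start()) + 1
            if boundary <= start:
                boundary = match.start()
            sections.append(Section(start, boundary - 1, current))
            start, current = boundary, language
    if current is not None:
        sections.append(Section(start, len(text) - 1, current))
    return sections


text = 'He turned around and asked: "Entschuldigen Sie, sprechen Sie Deutsch?"'
for section in detect_sections(text):
    print(section.start, section.end, section.language.name)
```

With the input from this issue, the sketch yields an English section covering indices 0 to 27 and a German section covering indices 28 to 69. A real detector would replace the word-list lookup with statistical scoring and would need to smooth over short or ambiguous words.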

pemistahl added this to the Lingua 1.1.0 milestone on Jan 22, 2022

@jturner116

I have tried every single package attempting to find a solution like this and none work well. I will write the most flattering Medium article ever written if you get it working with Lingua :P

@pemistahl
Owner Author

Haha, thanks @jturner116. What higher motivation could I wish for? (-; I'm still in the concept phase for this feature but will try to implement some of it as soon as possible.

@pemistahl
Owner Author

@jturner116 I've just released Lingua 1.2.0, which has experimental support for detecting multiple languages in mixed-language text. Perhaps you'd like to try it. If you do, please let me know what you think. Thanks.
