Unexpected UTF-8 problems #123
Comments
Happens with our
This was on a Linux system, and the "A~-" was an "Ö".

Another one, this time with
LookupError: unknown encoding: EUC-TW

For plain text files it would be best to:
- resolve the "Ã." problem above
- resolve the EUC-TW problem
- offer a --plain-encoding option so users have the chance to give it manually

This will probably give us problems with the UTF-8 BOM again; need to check.

We also need to review the CLIs again; I don't even remember that we had an option to process directories (!= directories of lines):
- cli.py (esp. process_dir)
- ocrd_cli.py - any plain text files supported here?
- cli_line_dirs.py

Note: I am working on this in the feat/flex-line-dirs branch, because (1) it came up there, and (2) the line dirs are especially affected, since short texts are the input format there.
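
To make the plain-text ideas above concrete, here is a minimal Python sketch, not the project's actual code. It assumes a hypothetical helper `decode_plain_text` with an "autodetect" default, a hypothetical `guess` parameter standing in for whatever encoding detector is in use, and an `argparse`-based parser (the real CLIs may be built differently). It tries UTF-8 first, drops a leading BOM, and treats a guessed encoding that Python has no codec for (such as EUC-TW) as a failed guess instead of crashing with `LookupError`.

```python
import argparse
import codecs
from typing import Optional


def decode_plain_text(
    data: bytes, encoding: str = "autodetect", guess: Optional[str] = None
) -> str:
    """Decode the raw bytes of a plain text file (hypothetical helper).

    encoding: value of a --plain-encoding style option; "autodetect" is an
              assumed default, not a confirmed one.
    guess:    encoding name suggested by some detector (assumption).
    """
    if encoding != "autodetect":
        # The user named the encoding explicitly, so trust it.
        return data.decode(encoding)

    try:
        # "utf-8-sig" decodes UTF-8 and strips a leading BOM if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        pass

    if guess is not None:
        try:
            # Raises LookupError if Python has no codec with this name.
            codecs.lookup(guess)
            return data.decode(guess)
        except (LookupError, UnicodeDecodeError):
            pass

    raise ValueError("could not determine encoding; please give it manually")


def build_parser() -> argparse.ArgumentParser:
    """Parser sketch exposing the option name mentioned in the thread."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--plain-encoding", default="autodetect")
    return parser
```

Trying UTF-8 before any guess matters most for short inputs, where a statistical detector has little to work with; an explicit option remains the fallback when neither UTF-8 nor the guess works.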