-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Unicode tables #745
Conversation
I'll defer to @Kojoley. He's doing the heavy-lifting of spirit maintenance. |
I'm actually not informed on the subject. One time I though about updating these, but had never found time to dig into. @djowel is the script handles everything automatically? I see new blocks added, but I have no idea whether it really fixed #744 or not. and IIRC we don't have proper unicode tests so it is also unclear that everything ok after the update. Low priority thing: do we really need these .txt files in the repository? I guess anyone who wants to update tables will download them anyway, so it would be better to replace them with a download script. |
I believe they do. I had tests in place, which failed after switching from internal table to |
Yes, they do. Maybe @3dyd might be interested to contribute some tests ;-)
Makes sense. I trust your judgment, @Kojoley |
It took me a while to connect a few things, and there is an issue. The create_tables.cpp is hardcoded to specific list of General Categories, Derived Properties and Scripts. I've checked and found that the first two didn't change (and only a subset is used in Spirit) but Scripts did change (there are additions and removals). IIUC these things are defined in PropertyValueAliases.txt which is not processed by create_tables.cpp and is not presented in the repository either. Manually processed data I was able to identify is in:
|
It's been a decade now (I think) since those were added. It's good to have someone have a fresh look at it. @3dyd it would be really nice if you can contribute some tests, esp. since you have a keen interest on spirit unicode support. |
@djowel, do you say about unit tests, e.g. one positive and one negative test for all main, sub, derived categories, and several scripts? Or some more comprehensive, but not automated testing, to be done just on my end?
@Kojoley, this still leaves manual work (compile and run Would you be ok with Scripts updated manually? Or some sort of automation is expected here? |
Let's start with something more basic. I think one positive and one negative test makes sense for now. |
…ssigned' and script 'Unknown'
Tests revealed two issues, both were fixed:
|
LGTM. Good to merge? |
Fixes #744
Old Unicode data files had this comment:
Original terms of use no longer contain quoted text from this comment, so I did not modify new Unicode data files but just added
boost-no-inspect
file to ignore the entire directory with these files. Also added https://www.unicode.org/license.txt there as this seems to be necessary.