Encode 32 bit integers in arrays #266
Could we pick the entry size at code generation time? Use 8, 16, 32, or 64 bits depending on how many are needed.
But that would mean more CPP when we want less. I honestly don't see the appeal in that: when all offsets fit in 16 bits, the parser is so small that it doesn't matter whether the table is twice as big, and when offsets need 64 bits, the table must hold more than 2^32 entries of four bytes each, i.e. at least 16GB, which is unrealistic as well. As for #93: I implemented the change to 32 bit (in #272) and compiled the generated Haskell file with optimisations: still only 347KB of .rodata compared to 460KB of .text (reductions, data type stuff, etc.), vs. 266KB of .rodata with happy-1.20. It appears that the amount of code we generate still surpasses the size of the table.
Alright. Only 16 and 32 bit are realistic scenarios. I threw in 8 and 64 just for good measure.
My understanding is that doubling the size of the table will make cache behavior worse. We're talking about an additional 250KB in the case of GHC on my machine. Correct me if I'm wrong; I haven't done any hard measurements.
I don't think we need to emit CPP. Couldn't we determine the appropriate entry size in the code generator itself? Or did you mean any sort of conditional whatsoever?
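As a minimal sketch of what that could look like inside the code generator (illustrative names only, not happy's actual internals):

```haskell
-- Hypothetical sketch: pick the entry width at table-generation time from
-- the largest value that must be stored. Real entries can also be negative
-- offsets, which a full version would handle by checking the minimum too.
data EntryWidth = W8 | W16 | W32 | W64
  deriving Show

entryWidth :: Int -> EntryWidth
entryWidth maxVal
  | maxVal < 2 ^ (8  :: Int) = W8
  | maxVal < 2 ^ (16 :: Int) = W16
  | maxVal < 2 ^ (32 :: Int) = W32
  | otherwise                = W64
```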
I would not worry too much about caches; after all, we are still ultimately writing Haskell, where we have to allocate a lot. Plus, reduction actions are far more costly than a few fetches from RAM, with all those thunk evals and all that syntax-tree allocation. As for GHC's parser, the more accurate .rodata size is 116KB (with 16 bit offsets), not my initial estimate of 250KB. For my PoC, that increases to 237KB, but I would be really surprised if table accesses were the bottleneck in realistic uses of happy. Other than that, I haven't done any perf measurements either, but I'm convinced that it doesn't matter much.
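For scale, each table access in the generated parser amounts to a couple of byte reads. A rough sketch using plain String indexing (the real templates read from an unboxed address instead, and the little-endian byte order here is an assumption):

```haskell
import Data.Char (ord)

-- Sketch only: decode the n-th entry from a table stored as a string of
-- \xNN bytes, combining bytes little-endian. Not happy's template code.
index16 :: String -> Int -> Int
index16 table n = lo + 256 * hi
  where
    lo = ord (table !! (2 * n))
    hi = ord (table !! (2 * n + 1))

-- The 32-bit variant reads four bytes instead of two.
index32 :: String -> Int -> Int
index32 table n = sum [ ord (table !! (4 * n + i)) * 256 ^ i | i <- [0 .. 3] ]
```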
Alright. Perhaps someone someday will measure the actual effect of going from 16-bit to 32-bit, but it won't be me 😄 Correctness is more important anyway.
Fixed by #280.
There are multiple issues related to running into the 16 bit limit of the tables encoded by happy, #93 among them.
I don't see the appeal in having 16 bit array entries; let's double it to 32 bit.
"#
, I count about 1M characters. That's at most 250KB of tables (\xFF
encodes one byte); doubling that to 500KB won't hurt.More seriously, I tried
bloaty
on GHC's parser:and then on the repro for #93 (Edit: it turns out that the linked executable contains the whole RTS of course; and the tables are all in .rodata contributing just 380KB):
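For the record, a back-of-the-envelope restatement of that estimate:

```haskell
-- Illustrative arithmetic only: each encoded byte costs at least four
-- source characters (e.g. "\xFF"), so 1M characters bound the tables.
maxTableBytes :: Int
maxTableBytes = 1000000 `div` 4   -- ~250,000 bytes with 16-bit entries

doubledBytes :: Int
doubledBytes = 2 * maxTableBytes  -- ~500,000 bytes with 32-bit entries
```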
So actually it's a bit larger than anticipated; I wonder why that is, but I'm not going to investigate.
Anyway, I think it's far more important to guarantee that large parsers can be generated correctly than to generate them incorrectly in the smallest space possible.