-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
YJIT: Enhance the String#<<
method substitution to handle integer codepoint values.
#11032
YJIT: Enhance the String#<<
method substitution to handle integer codepoint values.
#11032
Conversation
This comment has been minimized.
This comment has been minimized.
Hi Kevin. Commenting to give you as much feedback as possible.
I think that the idea of this PR is a good one. It seems like a sensible specialization to implement. I wrote some comments on the code itself. Two things I would like to see in a PR like this:
Seems sensible. Duplicating checks that occur at run-time would not be good but the checks you duplicated occur at compilation time. It makes sense to write the logic for this in a separate function to avoid ending up with a megafunction that is hard to follow. |
42ff8d8
to
f118dd1
Compare
477f936
to
00cc8e4
Compare
Before this commit Protoboeuf YJIT is about 5.28x slower than Google's protobuf implementation, but after this commit it is 4.33x slower. Before:
After:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to what you have is to make a new Rust/C function that essentially does the logic that you currently inline into every site (if codepoint.is_fixnum() { rb_str_buf_cat_byte } else { rb_str_concat }
). That way you only generate one ccall at each site.
It should be easier to read than checking in assembler, perf should be about the same, and it's a code size win.
If you write it in C, it could go into string.c and you can additionally avoid removing static
from rb_str_buf_cat_byte().
I like Alan's idea of embedding some of the checks in a C function to save on code size and maybe make the code a bit more readable. Also thanks Kevin for persisting. You picked a hard problem for your first big PR! 😉 |
00cc8e4
to
869d076
Compare
Okay. I think you were simplifying, but for completeness the check is really (in pseudo-code)
I'll have to introduce a new function. I don't think updating |
Yes, you should make clear that the new function you add is a YJIT helper and by not putting it in headers, express that it has many preconditions that are hard to meet and is not for general usage. |
c89be19
to
edc7703
Compare
a350c88
to
4026c6d
Compare
dff85fe
to
3c3aba4
Compare
I see that you modified the regular string concat to fold it into the C call as well. The C code looks well structured, but the downside of that is that we don't necessarily benefit from type information that YJIT might have. I benchmarked this PR locally on my macbook:
At the moment it seems like it's slightly slower, so not a win at the moment. Initially Aaron had reported a big speedup on protoboeuf with your original patch. I'm sorry about all the back and forth. I don't fully understand why the original patch had such a big speedup in the first place compared to this. It might be worth digging more into where the overhead is coming from and doing more experiments. Maybe we could organize a pairing session with @XrXr ? |
3c3aba4
to
a49aa17
Compare
It looks like the speed-up was on the encoding benchmark and you ran the decoding benchmark. #11032 (comment) |
Mildly embarrassed 😅 Nevertheless, not ideal that it would slow down decode. It does seem slower on encode, too but we should probably run these benchmarks on an AWS machine as well:
Also, @nirvdrum, have you rebased this PR on master recently? Just to make sure it's a fair comparison. |
I put the code that moves the older I've been rebasing the branch each time I edit it (although, it's possible I've forgotten to pull first). It's currently based on 86a762c, which matched the version of master you were using. But, I pushed that about 7 minutes after your comment (sorry, I didn't see it before our I would have mentioned it). It looks like the version you used was based off 83b0ced. |
a49aa17
to
2c2b0ff
Compare
0a6e580
to
1fbe118
Compare
…odepoint values.
Performing the check in YJIT means we can make assumptions about the type. It also improves correctness of stack traces in cases where the codepoint argument is not a String or a Fixnum.
1fbe118
to
44af5ee
Compare
This PR extends YJIT's method substitution for
String#<<
to handle integer codepoints as well. If the string isASCII-8BIT
and the codepoint is a byte value, YJIT will dispatch torb_str_buf_cat_byte
as a fast path for working with binary strings. Otherwise, it'll dispatch to the generalrb_str_concat
just asvm_opt_ltlt
would.rb_str_buf_cat_byte
currently works with bothASCII-8BIT
andUS-ASCII
, but this YJIT side only optimizes forASCII-8BIT
. It could be extended easily enough with an additional comparison. The encoding indices for a handful of encodings, including bothASCII-8BIT
andUS-ASCII
are fixed and sequential, so we could also do a range check. For the time being, I've omitted the handling ofUS-ASCII
. I'd like to get feedback on the simplified PR and extend it withUS-ASCII
handling if needed (which I'm also not convinced we do).Please advise if the mechanism I'm using to handle polymorphic dispatch is incorrect. We already have
jit_rb_str_concat
as a method substitution forString#<<
, but it deliberately only handled string arguments. I could have merged the two, but that struck me as being complicated. However, I also don't know if it's fine to call between methods like this. And, it ends up duplicating some of the type checks to ensure we dispatch to the correct type depending on what we see at compile time. I was unsure on how to handle the runtime guards, so please pay extra attention to that.