Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fn decomp_tx: Do not zero txa before initialization #1265

Merged
merged 2 commits into from
Jul 8, 2024
Merged

Conversation

rinon
Copy link
Collaborator

@rinon rinon commented Jun 28, 2024

This adds some complexity, but it improved performance significantly, especially on intel. Zeroing an entire page of stack with memset (which is what was previously happening) is expensive.

Copy link
Collaborator

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need so many ptr casts here? If it's to go from a ref to a ptr, can we use ptr::from_{ref,mut}? They're clearer to read, as the manual ptr cast can be doing other things as well, so it's harder to read.

@kkysen kkysen self-assigned this Jun 28, 2024
src/ctx.rs Outdated Show resolved Hide resolved
src/ctx.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Show resolved Hide resolved
@rinon rinon changed the base branch from sjc/get_lo_ctx/simplify to sjc/tilestatecontext_locking June 28, 2024 06:22
src/lf_mask.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Show resolved Hide resolved
@rinon rinon force-pushed the sjc/performance branch 2 times, most recently from 293dfe8 to 240c45b Compare June 29, 2024 00:43
@rinon rinon force-pushed the sjc/tilestatecontext_locking branch from 3f37ee7 to 9d18014 Compare June 29, 2024 00:43
@rinon rinon force-pushed the sjc/performance branch 2 times, most recently from c13ec52 to c241b2d Compare June 29, 2024 00:58
@rinon rinon requested a review from kkysen June 29, 2024 00:59
@rinon rinon force-pushed the sjc/tilestatecontext_locking branch from 9d18014 to c71a595 Compare June 29, 2024 00:59
@rinon rinon force-pushed the sjc/tilestatecontext_locking branch from c71a595 to bd8ac77 Compare June 29, 2024 04:17
@nnethercote
Copy link
Contributor

AFAICT this is the single biggest potential perf win in rav1d.

src/ctx.rs Outdated Show resolved Hide resolved
src/ctx.rs Outdated Show resolved Hide resolved
src/ctx.rs Outdated Show resolved Hide resolved
src/ctx.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Show resolved Hide resolved
@rinon rinon force-pushed the sjc/tilestatecontext_locking branch from bd8ac77 to 2506ce8 Compare July 1, 2024 21:17
@rinon rinon force-pushed the sjc/performance branch 2 times, most recently from d46f282 to 8b45876 Compare July 1, 2024 22:05
@rinon
Copy link
Collaborator Author

rinon commented Jul 1, 2024

In my testing, it looks like the latest version with less unsafe is equivalent to the unsafe pointer writes with the assert bounds check. Commenting that assert out gains a tiny bit on my i7-1260p, but seems equivalent on my Zen 4 7700X.

Base automatically changed from sjc/tilestatecontext_locking to main July 1, 2024 23:07
src/ctx.rs Outdated Show resolved Hide resolved
@rinon rinon requested a review from kkysen July 2, 2024 17:44
src/lf_mask.rs Outdated Show resolved Hide resolved
src/lf_mask.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what that use std::ptr in ctx.rs is for but the lf_mask.rs changes all LGTM now.

src/ctx.rs Outdated Show resolved Hide resolved
@rinon rinon merged commit 412cd4c into main Jul 8, 2024
27 checks passed
@rinon rinon deleted the sjc/performance branch July 8, 2024 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants