Implement intrinsic for swapping values #111744

AngelicosPhosphoros · 2023-05-18T23:04:23Z

This allows moving target- and backend-specific optimizations from library code to codegen.
It should also make MIR optimization simpler.

The main optimization implemented in this PR makes the backend generate swaps without using allocas,
removing unnecessary memory writes and reads and reducing stack usage.

Another of the main optimizations is swapping in larger integer chunks on x86_64 by using unaligned reads/writes. It reduces code size (especially for debug builds) and prevents cases of ineffective vectorization like `load <4 x i8>` (LLVM doesn't vectorize it further, despite vectorizing `load i32`).
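As a rough illustration of the chunked approach (not the codegen added by this PR), the sketch below swaps two byte slices in `u64`-sized chunks using unaligned loads and stores, then finishes the tail byte by byte. The function name `swap_bytes_chunked` and the chunk type are assumptions made for the example. It only handles initialized `u8` data, so it sidesteps the padding and uninitialized-memory questions a real intrinsic has to answer.

```rust
use std::mem::size_of;
use std::ptr;

/// Hypothetical helper: swaps the contents of two equally sized byte slices.
/// Distinct `&mut` slices cannot overlap, which matches the
/// "nonoverlapping" precondition of the intrinsic discussed here.
pub fn swap_bytes_chunked(x: &mut [u8], y: &mut [u8]) {
    assert_eq!(x.len(), y.len(), "regions must have the same length");
    const CHUNK: usize = size_of::<u64>();

    let len = x.len();
    let (px, py) = (x.as_mut_ptr(), y.as_mut_ptr());
    let mut i = 0;

    // Swap one u64-sized chunk at a time; the slices may have any alignment,
    // so every access goes through read_unaligned / write_unaligned.
    while i + CHUNK <= len {
        // SAFETY: `i + CHUNK <= len`, so both chunk accesses are in bounds,
        // and `u8` data is always initialized, so reading it as `u64` is fine.
        unsafe {
            let cx = px.add(i).cast::<u64>();
            let cy = py.add(i).cast::<u64>();
            let a = ptr::read_unaligned(cx);
            let b = ptr::read_unaligned(cy);
            ptr::write_unaligned(cx, b);
            ptr::write_unaligned(cy, a);
        }
        i += CHUNK;
    }

    // Swap the remaining bytes that do not fill a whole chunk.
    while i < len {
        std::mem::swap(&mut x[i], &mut y[i]);
        i += 1;
    }
}
```

For two 11-byte slices this performs one 8-byte chunk swap followed by three single-byte swaps.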

Also added more tests.

The rustc_codegen_cranelift implementation is naive because bjorn3 promised to rewrite it later.

rustc_codegen_gcc uses the same implementation via rustc_codegen_ssa.

Previous try: #98892

rustbot · 2023-05-18T23:04:30Z

r? @eholk

(rustbot has picked a reviewer for you, use r? to override)

compiler/rustc_codegen_ssa/src/mir/intrinsic.rs

compiler/rustc_codegen_cranelift/src/intrinsics/mod.rs

compiler/rustc_codegen_gcc/src/builder.rs

RalfJung · 2023-06-03T10:03:13Z

compiler/rustc_const_eval/src/interpret/intrinsics.rs

+ self.read_pointer(&args[0])?,
+ self.read_pointer(&args[1])?,
+ 1,
+ self.deref_operand(&args[0])?.layout,


Using args[0] twice here isn't good, each argument should be converted to its high-level representation exactly once.

Does `let layout = self.layout_of(substs.type_at(0))?` like in some of the other intrinsics help?

RalfJung · 2023-06-03T10:03:57Z

compiler/rustc_const_eval/src/interpret/memory.rs

@@ -1222,6 +1222,76 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {

 Ok(())
 }
+
+ pub fn swap_memory(


mem_swap_nonoverlapping seems like a better name

EDIT: But first decide what the semantics should be -- typed copy or untyped copy.

RalfJung · 2023-06-03T10:04:38Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ let align = layout.align.abi;
+ let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else {
+ // Called on ZST so it is noop.
+ return Ok(())


This is not good. If the 2nd pointer is invalid you would miss the UB! It is important to check all conditions before short-circuiting. Roughly follow what mem_copy above does.

Hi, @RalfJung!
Is it now the correct way, or do I need to call M::before_memory_read and M::before_memory_write too?

You'll need to call these hooks the same way mem_copy does.

RalfJung · 2023-06-03T10:04:59Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ count: u64,
+ layout: ty::layout::TyAndLayout<'tcx>,
+ ) -> InterpResult<'tcx> {
+ let size = layout.size;


The size of the access is count * layout.size, isn't it?
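Two of the review points above (validate both pointers before any early return, and compute the access size as `count * layout.size`) can be sketched with a toy memory model. Everything here (`Memory`, `Ptr`, `AccessError`, `check_access`) is hypothetical and only mimics the shape of the checks; it is not the interpreter's API.

```rust
#[derive(Debug, PartialEq)]
pub enum AccessError {
    DanglingPointer,
    OutOfBounds,
    SizeOverflow,
    Overlap,
}

/// Toy pointer: an allocation index plus a byte offset.
#[derive(Clone, Copy)]
pub struct Ptr {
    pub alloc: usize,
    pub offset: usize,
}

/// Toy memory: each allocation is just a byte vector.
pub struct Memory {
    pub allocs: Vec<Vec<u8>>,
}

impl Memory {
    /// Checks that `size` bytes starting at `p` lie inside a live allocation.
    pub fn check_access(&self, p: Ptr, size: usize) -> Result<(), AccessError> {
        let alloc = self.allocs.get(p.alloc).ok_or(AccessError::DanglingPointer)?;
        let end = p.offset.checked_add(size).ok_or(AccessError::SizeOverflow)?;
        if end > alloc.len() {
            return Err(AccessError::OutOfBounds);
        }
        Ok(())
    }

    pub fn swap_nonoverlapping(
        &mut self,
        x: Ptr,
        y: Ptr,
        elem_size: usize,
        count: usize,
    ) -> Result<(), AccessError> {
        // The access covers `count` elements, not just one.
        let size = elem_size.checked_mul(count).ok_or(AccessError::SizeOverflow)?;

        // Validate BOTH pointers before any short-circuit, so that an
        // invalid `y` is still reported when the access is zero-sized.
        self.check_access(x, size)?;
        self.check_access(y, size)?;
        if size == 0 {
            return Ok(());
        }

        if x.alloc == y.alloc && x.offset < y.offset + size && y.offset < x.offset + size {
            return Err(AccessError::Overlap);
        }

        // Untyped, byte-wise swap.
        for i in 0..size {
            let a = self.allocs[x.alloc][x.offset + i];
            let b = self.allocs[y.alloc][y.offset + i];
            self.allocs[x.alloc][x.offset + i] = b;
            self.allocs[y.alloc][y.offset + i] = a;
        }
        Ok(())
    }
}
```

With the early return placed before the second `check_access`, a zero-sized swap against a dangling `y` would silently succeed, which is the bug pattern the review comment describes.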

RalfJung · 2023-06-03T10:05:34Z

compiler/rustc_const_eval/src/interpret/memory.rs

+
+ let tmp_stack_alloc = {
+ let can_use_values = layout.ty.is_primitive() || layout.ty.is_slice();
+ if can_use_values {


Why do we need 2 code paths here?

RalfJung · 2023-06-03T10:07:24Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ let x_mplace = MPlaceTy::from_aligned_ptr(curr_x_ptr, layout);
+ let y_mplace = MPlaceTy::from_aligned_ptr(curr_y_ptr, layout);
+ if let Some((tmp_x_place, tmp_y_place)) = tmp_stack_alloc {
+ self.copy_op(&x_mplace.into(), &tmp_x_place.into(), false)?;


This is doing a typed swap! Is that really what we want? If it is, then this is the wrong file. memory.rs is for byte-level, untyped operations. This should probably be a function in intrinsics.rs then. Also it should take places as arguments, not pointers, if that is the level of abstraction it acts upon.

Judging from these docs the copy probably should be untyped. That means memory.rs is the right file and taking pointers makes sense. But it means you must not use copy_op in here as that would have too much UB. Use mem_copy instead.

RalfJung · 2023-06-03T10:08:19Z

library/core/src/intrinsics.rs

+ ///
+ /// Swaps 2 values using minimal extra memory depending on target.
+ /// Created to remove target/backend specific optimizations from library code to
+ /// avoid confusing const evaluation and MIRI.


The creation of this creates extra cost for CTFE and Miri, I don't understand the point of the comment here.

RalfJung · 2023-06-03T10:08:44Z

library/core/src/intrinsics.rs

+ ///
+ /// * The region of memory beginning at `x` with a size of `size_of::<T>()`
+ /// bytes must *not* overlap with the region of memory beginning at `y`
+ /// with the same size.


This should clarify whether the swapping happens typed or untyped. For example: if I swap two bool here and they are not 0 or 1, is that UB?

I assume this is used to implement std::ptr::swap_nonoverlapping? In that case it must be an untyped copy (see the docs). So that should be stated here.
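To make the typed/untyped distinction concrete, here is a small sketch that relies on the documented untyped behaviour of `std::ptr::swap_nonoverlapping`: two `bool`-sized slots hold the bytes 2 and 3, which are not valid `bool` values, and the swap just moves the bytes. The function name `untyped_swap_demo` is made up for the example, and the slots are never read as `bool`.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical demo: swaps two bool-sized slots whose bytes are not valid
/// `bool` values and returns the raw bytes found afterwards.
pub fn untyped_swap_demo() -> (u8, u8) {
    let mut x = MaybeUninit::<bool>::uninit();
    let mut y = MaybeUninit::<bool>::uninit();
    unsafe {
        // Fill the slots with byte patterns that are invalid for `bool`.
        x.as_mut_ptr().cast::<u8>().write(2);
        y.as_mut_ptr().cast::<u8>().write(3);

        // Untyped swap: the bytes are exchanged without being interpreted
        // as `bool`, so the invalid patterns are carried over unchanged.
        ptr::swap_nonoverlapping(x.as_mut_ptr(), y.as_mut_ptr(), 1);

        // Inspect the result as raw bytes only.
        (
            x.as_ptr().cast::<u8>().read(),
            y.as_ptr().cast::<u8>().read(),
        )
    }
}
```

A typed swap, by contrast, would be entitled to treat the same byte patterns as UB, which is why the intrinsic's documentation needs to say which of the two it performs.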

eholk · 2023-06-05T21:44:03Z

@RalfJung - do you mind if I reassign this to you? You seem to have a much better idea of what's needed than I do. Feel free to reroll if you want to review the whole thing. Thanks!

r? @RalfJung

RalfJung · 2023-06-06T06:14:37Z

Sorry, I can only review the interpreter parts of this, not the codegen parts. That will need someone from r? compiler.

This allows moving target- and backend-specific optimizations from library code to codegen. Also, this should make const eval/miri evaluation simpler. The main optimization implemented in this PR makes the backend generate swaps without using allocas, removing unnecessary memory writes and reads and reducing stack usage. Another of the main optimizations is swapping in larger integer chunks on x86_64 by using unaligned reads/writes. It reduces code size (especially for debug builds) and prevents cases of ineffective vectorization like `load <4 x i8>` (LLVM doesn't vectorize it further, despite vectorizing `load i32`). Also added more tests.

b-naber · 2023-06-13T15:48:01Z

Sorry also not that familiar with codegen stuff. r? compiler

RalfJung

This already looks much better, we are getting there. :)

RalfJung · 2023-06-16T15:51:24Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ pub fn mem_swap_nonoverlapping(
+ &mut self,
+ x_ptr: Pointer<Option<M::Provenance>>,
+ y_ptr: Pointer<Option<M::Provenance>>,
+ count: u64,
+ layout: ty::layout::TyAndLayout<'tcx>,


This is an untyped copy, it shouldn't take a layout. As I said before, it should follow the style of mem_copy.

RalfJung · 2023-06-16T15:52:09Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ let elem_size = layout.size;
+ let align = layout.align.abi;
+
+ if count > i64::MAX as u64 {


This won't be needed, you can't have a valid ptr for 2^63 bytes anyway.

RalfJung · 2023-06-16T15:52:37Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ throw_ub_format!("`count` argument to `swap_nonoverlapping_many` is too large.");
+ }
+
+ let first_ptr_acc = self.get_ptr_access(x_ptr, elem_size * count, align)?;


Please use consistent naming: the arguments are x and y but here it becomes first and second and later you talk about "left" and "right". Stick to a single naming scheme.

RalfJung · 2023-06-16T15:53:58Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ let align = layout.align.abi;
+ let Some((x_alloc_id, x_offset, _)) = self.get_ptr_access(x_ptr, size, align)? else {
+ // Called on ZST so it is noop.
+ return Ok(())


You'll need to call these hooks the same way mem_copy does.

RalfJung · 2023-06-16T15:56:10Z

compiler/rustc_const_eval/src/interpret/memory.rs

+ return Ok(());
+ }
+
+ let tmp_stack_alloc = self.allocate(layout, MemoryKind::Stack)?;


Making this 'stack' memory is not right. This is special magic memory. ^^ We'll probably need a new MemoryKind, Scratch or something like that.

Are you suggesting to add this new MemoryKind?

P.S. What is so special about MemoryKind::Stack?

> Are you suggesting to add this new MemoryKind?

Yes.

> P.S. What is so special about MemoryKind::Stack?

It's not special, but it's for stack memory. This here is not stack memory, so we shouldn't give a wrong MemoryKind.

apiraino · 2023-07-13T10:36:33Z

Switching to waiting on author, seems there are some review comments waiting for feedback. Feel free to request a review with @rustbot ready, thanks!

@rustbot author

bors · 2023-07-29T17:19:14Z

☔ The latest upstream changes (presumably #114148) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC · 2023-08-13T10:22:13Z

@AngelicosPhosphoros any updates on this?

Dylan-DPC · 2023-10-27T17:26:45Z

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

Let codegen decide when to `mem::swap` with immediates

Making `libcore` decide this is silly; the backend has so much better information about when it's a good idea.

Thus this PR introduces a new `typed_swap` intrinsic with a fallback body, and replaces that fallback implementation when swapping immediates or scalar pairs.

r? oli-obk

Replaces rust-lang#111744, and means we'll never need more libs PRs like rust-lang#111803 or rust-lang#107140

rustbot assigned eholk May 18, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 18, 2023

bjorn3 reviewed May 19, 2023

compiler/rustc_codegen_ssa/src/mir/intrinsic.rs

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch 2 times, most recently from 569cb1a to 57714da Compare May 19, 2023 14:44

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 57714da to 6d29013 Compare May 19, 2023 16:29

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 6d29013 to 3ee1ae5 Compare May 19, 2023 17:13

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 3ee1ae5 to 0b23a99 Compare May 19, 2023 18:51

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 0b23a99 to 260908f Compare May 19, 2023 18:58

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 260908f to e611cd5 Compare May 19, 2023 19:05

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from e611cd5 to b1cb39f Compare May 19, 2023 19:11

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from a31c079 to 4bfaa1d Compare May 20, 2023 11:29

AngelicosPhosphoros mentioned this pull request May 20, 2023

[WIP]Use unaligned read/writes for core::mem::swap on x86_64 #98892

Closed

scottmcm reviewed May 20, 2023

compiler/rustc_codegen_cranelift/src/intrinsics/mod.rs

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 17714ca to 3d0cc7b Compare May 20, 2023 17:58

AngelicosPhosphoros commented May 21, 2023

compiler/rustc_codegen_cranelift/src/intrinsics/mod.rs

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from a456d13 to 1a20b9a Compare May 27, 2023 16:44

AngelicosPhosphoros commented May 27, 2023

compiler/rustc_codegen_gcc/src/builder.rs (outdated)

RalfJung reviewed Jun 3, 2023

rustbot assigned RalfJung and unassigned eholk Jun 5, 2023

rustbot assigned b-naber and unassigned RalfJung Jun 6, 2023

AngelicosPhosphoros force-pushed the swap_intrinsic_impl branch from 063220c to c6db014 Compare June 12, 2023 17:03

rustbot assigned TaKO8Ki and unassigned b-naber Jun 13, 2023

RalfJung reviewed Jun 16, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 13, 2023

Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 27, 2023

Dylan-DPC closed this Oct 27, 2023

scottmcm mentioned this pull request Mar 16, 2024

Let codegen decide when to mem::swap with immediates #122582

Merged
