
Make ptr range iterable #91000

Closed
wants to merge 7 commits into from

Conversation

dtolnay
Member

@dtolnay dtolnay commented Nov 18, 2021

This PR adds:

impl<T> Iterator for Range<*const T> { type Item = *const T; }

impl<T> Iterator for Range<*mut T> { type Item = *mut T; }

// and DoubleEndedIterator, FusedIterator, TrustedLen, TrustedRandomAccess

enabling iteration over a range of pointers, such as those produced by <[T]>::as_ptr_range and <[T]>::as_mut_ptr_range.

Rationale

It is common for systems interacting with C to need to use pointer ranges for iteration, as begin/end pointer pairs are a design pattern in the C standard library.

// Designed to be called from C++ or C.
#[no_mangle]
unsafe extern "C" fn demo(start: *const u16, end: *const u16) {
    for ptr in start..end {
        println!("{}", *ptr);
    }
}

fn main() {
    let slice = &[1u16, 2, 3];
    let range = slice.as_ptr_range();
    unsafe { demo(range.start, range.end); }
}

In pure Rust code such a thing would be done using slices, but in FFI it's not necessarily good to turn the pointer pair into a Rust slice at the language boundary: materializing a reference (to a slice, or even to individual elements) is more prone to UB in Rust than working with the foreign pointers directly as pointers. One basically has to be an expert in Rust's aliasing rules, which are not yet authoritatively written down, in order to do it correctly, whereas someone with basic C knowledge can write the pointer-based code and have it be correct.

Compare:

let start: *mut T = ...;
let end: *mut T = ...;
for ptr in start..end {
    // Only the preconditions of the C++ callee come into play, which a C++ dev
    // is already going to be comfortable reasoning about. Nothing about stacked
    // borrows, concurrency, initializedness, inhabitedness, alignedness, or other
    // Rust-isms is relevant.
    unsafe { cpp(ptr); }
}
let mut slice = std::slice::from_raw_parts_mut(...); // suddenly a pile of extremely subtle
                                                     // invariants in order for this to be allowed
for elem in slice {
    unsafe { cpp(elem as *mut T); } // probably UB
}

Step

Note that the new impls are independent of the previously existing Iterator impl on Range:

impl<A: Step> Iterator for ops::Range<A> {
type Item = A;

That's because the current Step definition has several incompatibilities with the situation of a start and end that are not separated by a multiple of size_of::<T>() bytes. For example, consider this required invariant on Step::steps_between:

/// # Invariants
///
/// For any `a`, `b`, and `n`:
///
/// * `steps_between(&a, &b) == Some(n)` if and only if `Step::forward_checked(&a, n) == Some(b)`

This says that for arbitrary pointers a and b, which have n range elements in between them, we need that b be losslessly reconstructible from a and n. That's not the case for pointers, where e.g. 0x2 as *const u16..0x7 as *const u16 and 0x2 as *const u16..0x8 as *const u16 contain the same sequence of items when iterated, but different b.
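The stepping semantics described above can be sketched with plain usize addresses in place of raw pointers (an illustrative model of the proposed behavior, not the actual implementation):

```rust
// Model of the proposed ptr-range iteration using plain usize addresses
// (a sketch for illustration; the real impl operates on raw pointers).
fn iterate(start: usize, end: usize, size: usize) -> Vec<usize> {
    let mut items = Vec::new();
    let mut cur = start;
    while cur < end {
        items.push(cur);
        // Advance by one element, clamping at `end` so a partial
        // trailing element does not push `cur` past `end`.
        cur = if end - cur >= size { cur + size } else { end };
    }
    items
}

fn main() {
    // 0x2..0x7 and 0x2..0x8 as *const u16 (element size 2) yield the
    // same items even though the `end` values differ, so `end` cannot
    // be reconstructed from `start` and the step count as Step requires.
    assert_eq!(iterate(0x2, 0x7, 2), vec![0x2, 0x4, 0x6]);
    assert_eq!(iterate(0x2, 0x8, 2), vec![0x2, 0x4, 0x6]);
}
```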

Step is unstable though (#42168). If it were to evolve in a way that accommodates Range<*const T> and Range<*mut T> then we can switch all this over to Step backward compatibly. I don't consider this a goal however.

r? @kennytm

@dtolnay dtolnay added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. labels Nov 18, 2021
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 18, 2021
Member

@kennytm kennytm left a comment


could you add tests for start > end (check is_empty() is handling these cases) and the wrapping behavior (the case mentioned in #89900)?

Member

@the8472 the8472 left a comment


The code looks quite branchy, do for in iter and iter.for_each loops produce reasonable assembly?

library/core/src/iter/range/ptr.rs (two resolved review threads)
@dtolnay
Member Author

dtolnay commented Nov 18, 2021

could you add tests for start > end (check is_empty() is handling these cases) and the wrapping behavior (the case mentioned in #89900)?

Good call — added in 7ac5ca3.

@dtolnay
Member Author

dtolnay commented Nov 18, 2021

The code looks quite branchy, do for in iter and iter.for_each loops produce reasonable assembly?

Still looking into this. Would you consider this a blocker for this API if the assembly isn't the best?

@the8472
Member

the8472 commented Nov 18, 2021

Would you consider this a blocker for this API if the assembly isn't the best?

No, if this is primarily used for FFI then a few extra instructions won't be the bottleneck compared to the non-inlined calls.

@dtolnay
Member Author

dtolnay commented Nov 18, 2021

Hmm it turns out quite good actually (disclaimer: I am not an expert at this stuff). All the branching has been compiled down to conditional moves, aside from the backedge which is expected. Here is what I tried:

extern "C" {
    fn foreign(ptr: *const i32);
}

pub fn forloop(start: *const i32, end: *const i32) {
    for ptr in start..end {
        unsafe { foreign(ptr) }
    }
}

pub fn for_each(start: *const i32, end: *const i32) {
    (start..end).for_each(|ptr| unsafe { foreign(ptr) });
}

and according to cargo asm they both produce the following identical code:

asm::forloop:
 push    r15
 push    r14
 push    rbx
 cmp     rdi, rsi
 jae     .LBB0_3
 mov     r15, rsi
 mov     r14, qword ptr [rip + foreign@GOTPCREL]
.LBB0_2:
 lea     rax, [rdi + 4]
 cmp     rax, r15
 mov     rbx, r15
 cmovb   rbx, rax
 cmp     rdi, rax
 cmova   rbx, r15
 call    r14
 mov     rdi, rbx
 cmp     rbx, r15
 jb      .LBB0_2
.LBB0_3:
 pop     rbx
 pop     r14
 pop     r15
 ret

@dtolnay
Member Author

dtolnay commented Nov 18, 2021

Here is a comparison against the naive loop. In this case the cmov is making it so that this doesn't become an infinite loop or UB when end is too close to the end of the address space, which is admittedly an edge case that people aren't likely to experience in actual use.

// iterator version:
for ptr in start..end {
    unsafe { foreign(ptr) }
}

// naive version:
let mut ptr = start;
while ptr < end {
    unsafe { foreign(ptr) }
    ptr = unsafe { ptr.add(1) };
}

The iterator version compiles to:
 push    r15
 push    r14
 push    rbx
 cmp     rdi, rsi
 jae     .LBB0_3
 mov     r15, rsi
 mov     r14, qword ptr [rip + f@GOTPCREL]
.LBB0_2:
 lea     rax, [rdi + 4]
 cmp     rax, r15
 mov     rbx, r15
 cmovb   rbx, rax
 cmp     rdi, rax
 cmova   rbx, r15
 call    r14
 mov     rdi, rbx
 cmp     rbx, r15
 jb      .LBB0_2
.LBB0_3:
 pop     rbx
 pop     r14
 pop     r15
 ret

and the naive version to:
 push    r15
 push    r14
 push    rbx
 cmp     rdi, rsi
 jae     .LBB0_3
 mov     r14, rsi
 mov     rbx, rdi
 mov     r15, qword ptr [rip + f@GOTPCREL]
.LBB0_2:
 mov     rdi, rbx
 call    r15
 add     rbx, 4
 cmp     rbx, r14
 jb      .LBB0_2
.LBB0_3:
 pop     rbx
 pop     r14
 pop     r15
 ret
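The address-space edge case mentioned above can be modeled with wrapping usize arithmetic (a sketch with made-up addresses, not the real implementation):

```rust
// Sketch: near the top of the address space, the naive increment can
// wrap around past `end`, so `ptr < end` never becomes false; the
// clamped version stops. Addresses are modeled as wrapping usizes.
fn naive_visits(start: usize, end: usize, size: usize, fuel: usize) -> usize {
    let (mut ptr, mut visits) = (start, 0);
    while ptr < end && visits < fuel {
        visits += 1;
        ptr = ptr.wrapping_add(size); // can wrap to a small address
    }
    visits
}

fn clamped_visits(start: usize, end: usize, size: usize) -> usize {
    let (mut ptr, mut visits) = (start, 0);
    while ptr < end {
        visits += 1;
        ptr = if end - ptr >= size { ptr + size } else { end };
    }
    visits
}

fn main() {
    // A 2-byte range of 4-byte elements ending at usize::MAX.
    let (start, end, size) = (usize::MAX - 2, usize::MAX, 4);
    // The clamped loop visits the partial element once and stops.
    assert_eq!(clamped_visits(start, end, size), 1);
    // The naive loop wraps past `end` and runs until fuel runs out.
    assert_eq!(naive_visits(start, end, size, 100), 100);
}
```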

@the8472
Member

the8472 commented Nov 18, 2021

Hmm it turns out quite good actually (disclaimer: I am not an expert at this stuff). All the branching has been compiled down to conditional moves,

I don't know what the wisdom for current CPUs is. My (possibly outdated) understanding is that inside loop bodies, highly predictable branch instructions should beat cmovs: branches can be speculated away, which shortens the effective instruction length of the loop, while cmovs can't.
Or maybe some heuristic thinks the cmovs are still cheaper for some reason. idk.

Here is a comparison against the naive loop.

So that's three cmp per loop (two of which result in cmov) vs. one. Maybe the first two could be merged somehow?

despite the division in the source code.

Yeah that's expected, the type sizes are constant and often a power of two so it turns them into shifts.

Test code:

    extern "C" {
        fn foreign(ptr: *const i32);
    }

    pub fn forloop(start: *const i32, end: *const i32) {
        for ptr in start..end {
            unsafe { foreign(ptr) }
        }
    }

Before:

    asm::forloop:
      push    r15
      push    r14
      push    rbx
      cmp     rdi, rsi
      jae     .LBB0_3
      mov     r15, rsi
      mov     r14, qword ptr [rip + foreign@GOTPCREL]
    .LBB0_2:
      lea     rax, [rdi + 4]
      cmp     rax, r15
      mov     rbx, r15
      cmovb   rbx, rax
      cmp     rdi, rax
      cmova   rbx, r15
      call    r14
      mov     rdi, rbx
      cmp     rbx, r15
      jb      .LBB0_2
    .LBB0_3:
      pop     rbx
      pop     r14
      pop     r15
      ret

After:

    asm::forloop:
      push    r15
      push    r14
      push    rbx
      cmp     rdi, rsi
      jae     .LBB0_3
      mov     r15, rsi
      mov     r14, qword ptr [rip + foreign@GOTPCREL]
    .LBB0_2:
      mov     rax, r15
      sub     rax, rdi
      lea     rbx, [rdi + 4]
      cmp     rax, 4
      cmovb   rbx, r15
      call    r14
      mov     rdi, rbx
      cmp     rbx, r15
      jb      .LBB0_2
    .LBB0_3:
      pop     rbx
      pop     r14
      pop     r15
      ret
@dtolnay
Member Author

dtolnay commented Nov 18, 2021

So that's three cmp per loop (two of which result in cmov) vs. one. Maybe the first two could be merged somehow?

Okay I fixed it to merge two of them.

Before:
 push    r15
 push    r14
 push    rbx
 cmp     rdi, rsi
 jae     .LBB0_3
 mov     r15, rsi
 mov     r14, qword ptr [rip + f@GOTPCREL]
.LBB0_2:
 lea     rax, [rdi + 4]
 cmp     rax, r15
 mov     rbx, r15
 cmovb   rbx, rax
 cmp     rdi, rax
 cmova   rbx, r15
 call    r14
 mov     rdi, rbx
 cmp     rbx, r15
 jb      .LBB0_2
.LBB0_3:
 pop     rbx
 pop     r14
 pop     r15
 ret

After:
 push    r15
 push    r14
 push    rbx
 cmp     rdi, rsi
 jae     .LBB0_3
 mov     r15, rsi
 mov     r14, qword ptr [rip + f@GOTPCREL]
.LBB0_2:
 mov     rax, r15
 sub     rax, rdi
 lea     rbx, [rdi + 4]
 cmp     rax, 4
 cmovb   rbx, r15
 call    r14
 mov     rdi, rbx
 cmp     rbx, r15
 jb      .LBB0_2
.LBB0_3:
 pop     rbx
 pop     r14
 pop     r15
 ret

@the8472
Member

the8472 commented Nov 18, 2021

Maybe throw a likely() on the inner if and see if that turns the cmov into a branch.

If not that's probably as good as it is gonna get.

@dtolnay
Member Author

dtolnay commented Nov 18, 2021

I get the same asm with the following diff.

-     self.start = if byte_offset >= mem::size_of::<T>() {
+     self.start = if intrinsics::likely(byte_offset >= mem::size_of::<T>()) {

@dtolnay
Member Author

dtolnay commented Nov 18, 2021

The code review isn't done yet but gonna check whether anyone else would be on board with having this impl. @rust-lang/libs-api — please see the PR description.

@rfcbot fcp merge

@rfcbot

rfcbot commented Nov 18, 2021

Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Nov 18, 2021
@CAD97
Contributor

CAD97 commented Nov 19, 2021

for arbitrary pointers a and b, which have n range elements in between them, we need that b be losslessly reconstructible from a and n. That's not the case for pointers, where e.g. 0x2 as *const u16..0x7 as *const u16 and 0x2 as *const u16..0x8 as *const u16 contain the same sequence of items when iterated, but different b.

Yeah, Step as currently designed requires that you're iterating through every representable state between two endpoints.

It's worth noting that reverse iteration in C/C++ uses the alignment of the end pointer rather than the begin pointer. Step's design actually matches C++'s here, in that stepping down only uses the end value (and the number of times to step). This means that there's no way with the current API to make next_back() through Step also be the last() element.

I can't believe I'm actually arguing this, but I think this is the more correct behavior for pointer range iteration. That is, (a..b).rev().collect() needn't be the same sequence as (a..b).collect().into_iter().rev().collect().
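This point can be modeled with plain usize addresses: stepping forward anchors on start while stepping backward anchors on end, so a range whose length isn't a multiple of the element size visits different positions in each direction (an illustrative sketch, not either proposed implementation):

```rust
// Forward iteration anchors on `start`; backward anchors on `end`.
// With a misaligned range the two visit different addresses.
fn forward(start: usize, end: usize, size: usize) -> Vec<usize> {
    let mut items = Vec::new();
    let mut cur = start;
    while cur < end {
        items.push(cur);
        cur = if end - cur >= size { cur + size } else { end };
    }
    items
}

fn backward(start: usize, end: usize, size: usize) -> Vec<usize> {
    let mut items = Vec::new();
    let mut cur = end;
    while cur > start {
        cur = if cur - start >= size { cur - size } else { start };
        items.push(cur);
    }
    items
}

fn main() {
    // 0x2..0x7 as *const u16 (element size 2): a 5-byte range.
    assert_eq!(forward(0x2, 0x7, 2), vec![0x2, 0x4, 0x6]);
    assert_eq!(backward(0x2, 0x7, 2), vec![0x5, 0x3, 0x2]);
    // So (a..b).rev() need not be the reverse of (a..b).
}
```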

It's also worth considering if we don't also want fn(*mut T, *mut T) -> *mut [T] (ptr::slice_from_raw_parts, which is safe; the offset operation is optionally inbounds) and unsafe fn iter(self: *mut [T]) -> impl Iterator<Item=*mut T>, which would allow the use of inbounds pointer arithmetic.

@scottmcm
Member

scottmcm commented Nov 24, 2021

It seems very unfortunate that this has to use wrapping_offset, since it's basically only useful in code that will be doing unsafe things with the pointers anyway. (Right?)

To me, I'd much rather see some sort of &[T] -> impl Iterator<Item = NonNull<T>> for this, where the iterator has a named type with an unsafe constructor for building from two pointers for use in extern "C" functions.

And in addition to optimizing better and avoiding the Step conversation, it wouldn't even need to be insta-stable.

That type could also require that the pointer range be canonical (begin <= end), like C++ does, to further simplify what it needs to handle.

@CAD97
Contributor

CAD97 commented Nov 24, 2021

rather see some sort of &[T] ->

The creation of &[T] is what needs to be avoided here (since a Rust reference introduces Rust reference semantics) because we're working directly with pointers and pointer semantics.

We already have fn(*mut T, usize) -> *mut [T], so while I'm partial to the desire for for it in begin..end to just work, I think the better solution is to make *mut [T] unsafely iterable and offer fn(*mut T, *mut T) -> *mut [T].

This cleanly sidesteps the issue of whether Range<*mut _> should iterate every pointer or every aligned pointer (for some value of aligned), and also allows the iteration to use inbounds offsets.

The one limitation is that it requires converting from (*mut, *mut) to (*mut, usize), which isn't strictly required (and changes the behavior w.r.t. poorly aligned pointer pairs). Also, as_ptr_range already uses Range<*mut>.

So that'd be roughly:

// mod ptr;

/// # Safety
/// - that of pointer::offset
///   - pointers into same allocated object
///   - distance is a multiple of size_of::<T>() bytes
unsafe fn slice_from_ptr_range_mut<T>(range: Range<*mut T>) -> *mut [T] {
    slice_from_raw_parts_mut(range.start, range.end.offset_from(range.start) as usize)
}

fn slice_from_wrapping_ptr_range_mut<T>(range: Range<*mut T>) -> *mut [T] {
    let len = (range.end as usize).wrapping_sub(range.start as usize) / mem::size_of::<T>();
    slice_from_raw_parts_mut(range.start, len)
}

impl<T> *mut [T] {
    /// # Safety
    /// Valid pointer & length, no assertions of memory content's validity
    /// Key is that this can use inbounds pointer offsets, and
    /// slice references can use this and just `&mut*` the pointer
    unsafe fn iter(self) -> slice::RawIter {
        // it's slice::Iter but yielding raw pointers
    }
}

@scottmcm
Member

The creation of &[T] is what needs to be avoided here

Agreed. I only mention it here because the OP's example has a mention of as_ptr_range, which loses the "these pointers are in-bounds" information. But that might have just been for convenience.

So I think I was meaning something like

// Designed to be called from C++ or C.
#[no_mangle]
#[deny(unsafe_op_in_unsafe_fn)]
unsafe extern "C" fn demo(start: *const u16, end: *const u16) {
    for ptr in unsafe { RawIter::new(start, end) } {
        println!("{}", *ptr);
    }
}

With the same RawIter type as in your example there -- unsafe methods on slice-pointers make sense for constructors too.

Plus there might as well be some safe helpers like

impl<T> [T] {
    fn raw_iter(&self) -> RawIter<T>;
    fn raw_iter_mut(&mut self) -> RawIterMut<T>;
}

Those are definitely in-bounds and canonical and such, so you can easily get the pointers to deal in if you want to use them for implementing a slice method.

@dtolnay
Member Author

dtolnay commented Nov 29, 2021

It sounds like the two concerns so far are:

  • There are extra instructions in the generated code due to using wrapping_add or not assuming/enforcing that start and end are separated by an exact multiple of the item size.

  • Non-obvious whether the reverse iteration should use the alignment of the start pointer or the end pointer.

Regarding the machine code: I think my preference would be to have start..end be directly iterable despite this. Do others find it a blocker? According to #91000 (comment), at least one reviewer does not. The counterproposal of using a faster new iterator type that is unsafe to construct, like unsafe { RawIter::new(start, end) } or unsafe { slice_from_ptr_range(start..end).iter() }, wouldn't be the right tradeoff in my use cases of this, since the point was to minimize the subtleties that the programmer is responsible for to the absolute minimum. For the minority of use cases where this level of perf would be relevant, their needs would probably be better met by directly incrementing the pointer themselves, in the existing way, to have better confidence in what code they're getting without needing to inspect whether the abstraction provided by the iterator type is being fully eliminated.

Regarding reverse iteration: my inclination would be to simply delete the DoubleEndedIterator impl at this point. I believe that reverse iterating this kind of range will be a highly uncommon use case, and having non-multiple-of-size separation between the start and end is going to be even more rare on top of that, so we simply don't need to figure out an answer.

@scottmcm
Member

the point was to minimize the subtleties that the programmer is responsible for to the absolute minimum

I think the separate iterator type decreases subtleties.

From the wrapping_offset docs:

The resulting pointer “remembers” the allocated object that self points to; it must not be used to read or write other allocated objects.

which means that any code that wants to read the pointers that come out of this thing needs to meet the requirements of offset anyway. (And I can't come up with any plausible use for iteration over these pointers that doesn't want to dereference them.) Said otherwise, I don't think iteration over a range of pointers with different provenance has any reasonable semantics (why is it the provenance of the begin rather than the end?) and thus shouldn't be provided. But my understanding of the nuance here is limited at best, so I'll cc @RalfJung.

Thus having a separate unsafe fn call for that proof obligation would make it more obvious, not more subtle. For discoverability, it would be easy to add Range<pointer> to the #[rustc_on_unimplemented] on Iterator to point to it, so people would find it.

I think this is particularly true under the "systems interacting with C++" rationale. If one uses this in an unsafe extern "C" fn foo, then from what I'm used to in C++, I'd fully expect it to have the precondition that the pointers are aligned, start <= end, into the same allocated object, and either both null or both non-null -- that's what's needed for span(start, end) to be a valid std::ranges::random_access_range. And since it's an unsafe fn it doesn't even need another unsafe block to call. (Well, unless one opts into that via unsafe_op_in_unsafe_fn. If the goal is to work under C++ rules one plausibly wouldn't, but even if so, the extra one is probably minor compared to the ones to use the pointer, either by dereferencing or calling other unsafe fns.)

And as a bonus, the new type would be an obvious place to put a RawIter::counted(pointer, length).

@RalfJung
Member

which means that any code that wants to read the pointers that come out of this thing needs to meet the requirements of offset anyway

Correct. wrapping_offset simplifies the checking "at the time the offset is computed", but not the checking at the time the ptr is used -- and it cannot be used to 'wander' from one allocation to another; the original provenance is maintained.

Said otherwise, I don't think iteration over a range of pointers with different provenance has any reasonable semantics (why is it the provenance of the begin rather than the end?)

For "pointers pointing to different allocations", I fully agree.
However, the pointers might also differ in more subtle aspects of provenance, such as Stacked Borrows. So it should still be clearly documented whether the pointers yielded by the iterator are 'derived from' the beginning ptr or the end ptr.
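A small example of the wrapping_offset behavior under discussion: computing an out-of-bounds address is safe, and round-tripping recovers both the original address and its provenance (a sketch illustrating the documented rules, not part of this PR):

```rust
fn main() {
    let a = [10u16, 20, 30];
    let start = a.as_ptr();
    // wrapping_offset is a safe function: computing an out-of-bounds
    // address is allowed; only memory accesses through it are restricted.
    let oob = start.wrapping_offset(1000);
    // The pointer "remembers" `a`; stepping back yields an address that
    // is both equal to `start` and still derived from `a`, so
    // dereferencing it remains valid.
    let back = oob.wrapping_offset(-1000);
    assert_eq!(back, start);
    assert_eq!(unsafe { *back }, 10);
}
```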

@dtolnay
Member Author

dtolnay commented Nov 30, 2021

@scottmcm's comment #91000 (comment) is compelling — previously I didn't fully appreciate the argument about having a clear place to put the proof obligation for the eventual dereferenceability of the computed pointers. An unsafe constructor where the caller is providing start and end is a good place to enforce what's required about the two. In contrast in the Range<*const T> way from this PR, it's harder to talk about what makes a pointer valid to dereference after it's come out of the iterator.

I've put up some new iterator types with an unsafe constructor in #91390.

@dtolnay dtolnay added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 30, 2021
@yaahc
Member

yaahc commented Dec 2, 2021

Since this has moved to a new PR should we close this one / end the FCP?

@dtolnay
Member Author

dtolnay commented Dec 2, 2021

I didn't want to immediately kill discussion here because this PR is still relevant if there turns out to be an intransitive preference: if we prefer #91390 over this PR, but prefer doing nothing over #91390 (i.e. it isn't worth adding), yet prefer this PR over doing nothing.

We can reopen if this turns out preferred after looking at #91390.

@rfcbot fcp cancel

@rfcbot

rfcbot commented Dec 2, 2021

@dtolnay proposal cancelled.

@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Dec 2, 2021
@dtolnay dtolnay removed the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Dec 2, 2021
@dtolnay dtolnay closed this Dec 2, 2021
@dtolnay dtolnay deleted the ptrrange branch December 2, 2021 22:30
9 participants