Add `Iterator::map_windows` #82413

LukasKalbertodt · 2021-02-22T19:10:06Z

This method returns an iterator over mapped windows of the starting iterator. Adding the more straight-forward Iterator::windows is not easily possible right now as the items are stored in the iterator type, meaning the next call would return references to self. This is not allowed by the current Iterator trait design. This might change once GATs have landed.

The idea has been brought up by @m-ou-se here.

I think this method is quite versatile and useful. I hope the examples in the doc comment sufficiently demonstrate that.

One open question: the current design basically requires that the internal array buffer is shifted on each next call, meaning it has an O(iter.count() * N) runtime complexity per next (where N is the const parameter). I expect this method to be called with very small Ns only, but of course it could be used with large arrays. To avoid this shifting, we would need an array version of VecDeque basically. But this would also mean the closure wouldn't get a simple parameter anymore. I think we want this simpler but "slower for large Ns" version, but what do others thing? Maybe the docs should mention this caveat?

rust-highfive · 2021-02-22T19:10:10Z

r? @sfackler

(rust-highfive has picked a reviewer for you, use r? to override)

LukasKalbertodt · 2021-02-22T19:15:47Z

library/core/src/iter/adapters/map_windows.rs

+ buffer.rotate_left(1);
+ buffer[N - 1] = next;


This is slightly wasteful in theory: the first element does not need to be copied to the last slot as it's overwritten later anyway. I expect this to be optimized out anyway, but I haven't checked.

I haven't found a better non-unsafe way to do this. copy_within requires T: Copy, swap would work, but I doubt doing a bunch of swaps is faster than the current version. Of course, the unsafe would be straight-forward but I still wanted to avoid it for now. Any ideas or opinions?

Using unsafe in std lib is acceptable if it leads to a performance advantage.

Is it possible to just write to the first element then rotate?

Edit: On reflection I think this is probably not that costly either way, have to look at godbolt for it

Writing to the first element doesn't seem to change that much compared to the current approach.

I played a bit on godbolt and I found 3 situations:

They're optimized almost the same way. This often happen for tiny Ts, but sometimes also for bigger ones.

The unsafe version may be optimized to 4-5x less assembly. This often happen for bigger Ts but small N.

The unsafe version is optimized to a memmove call and the safe version to either a lot of moves or a loop. This often happen for bigger Ts and bigger N

I think that, if .rotate_right(1) and .rotate_left(1) don't optimize to the obvious memmove, then that's a bug in those methods that should be fixed.

(I vaguely recall someone on Discord was going to file me a bug about this...)

@SkiFire13 Thanks for testing this stuff. But I agree with @scottmcm in that this should probably optimize to the optimal code basically. The implementation of this can always be changed later, so I wouldn't say it is hugely important to get this as efficient as possible in this PR. Especially before we even agreed we want this method.

library/core/src/iter/traits/iterator.rs

the8472 · 2021-02-22T21:32:00Z

One open question: the current design basically requires that the internal array buffer is shifted on each next call, meaning it has an O(iter.count() * N) runtime complexity per next (where N is the const parameter). I expect this method to be called with very small Ns only, but of course it could be used with large arrays. To avoid this shifting, we would need an array version of VecDeque basically.

There are a few places where this can be done more efficiently: Vec, BinaryHeap and [].
Adding a []::windowed_iter<const N>() could then return overlapping slices. Perhaps we should offer it only where it's efficient?

SkiFire13 · 2021-02-23T10:17:15Z

Adding a []::windowed_iter() could then return overlapping slices. Perhaps we should offer it only where it's efficient?

Wouldn't this be the same as array_windows?

IMO a great advantage of map_windows compared to array_windows is that it doesn't require to borrow anything, so you can return it from functions that don't or can't take the slice as parameter.

Anyway I don't think this is ready to be in the stdlib. I think it would make more sense to have it in itertools for now. There's also the possibility that we could get the same result by combining an hypotetical StreamingIterator::windows with StreamingIterator::map_iter or something like that.

the8472 · 2021-02-23T11:24:11Z

Adding a []::windowed_iter() could then return overlapping slices. Perhaps we should offer it only where it's efficient?

Wouldn't this be the same as array_windows?

Oh. Yeah, exactly what I had in mind.

IMO a great advantage of map_windows compared to array_windows is that it doesn't require to borrow anything, so you can return it from functions that don't or can't take the slice as parameter.

Vec::into_array_windows() could work if you need an owned version.

Of course this is still limiting compared to a general iterator adapter, but you could collect into an intermediate Vec and then create the windows more cheaply from that, e.g.

iter.skip(...).filter_map(...).collect::<Vec<_>>().into_array_windows().fold(...)

scottmcm · 2021-02-24T10:03:33Z

A possible way to avoid the excess data movement: use a 2*N-length internal array. Then most of the time you don't need to move anything, but every N elements you need to move the N-1 things in the buffer back to the beginning (which would be a copy_nonoverlapping). That way it'd be amortized-O(1) copying per step, and when it does a copy it'd be in a nice big chunk that should copy efficiently.

LukasKalbertodt · 2021-02-27T13:28:43Z

Anyway I don't think this is ready to be in the stdlib. I think it would make more sense to have it in itertools for now.

Could you elaborate on that? Why do you feel this is not ready yet?

There's also the possibility that we could get the same result by combining an hypotetical StreamingIterator::windows with StreamingIterator::map_iter or something like that.

Absolutely, and I would love to immediately go that path. But as I said, StreamingIterator (well, hopefully it will be the same as Iterator) is probably still quite a while out.

A possible way to avoid the excess data movement: [...]

That's a great idea! Do you think I should already change my implementation?

JulianKnodt · 2021-02-27T14:43:01Z

Is there an unsafe way to concat two slices of known size to a new size using transmute or try_into? This would help fix the time complexity while still having a nice type &[T; { tail_to_head head_to_tail }]?

the8472 · 2021-02-27T15:04:18Z

Is there an unsafe way to concat two slices of known size to a new size using transmute or try_into?

Even ignoring the issue that it isn't safe in general it would only work on linear memory. If the data is out of order (i.e. needs rotation) then the slices aren't pointing to a linear section of memory.

JulianKnodt · 2021-02-27T17:58:02Z

Hm that's true, but in theory we could add a layer of indirection [&T; <something>](which might be inefficient) but not sure if that's desirable.

Edit:
It's costly because you have to do two memcopies presumably, but here's the general idea: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=a8b8d805a7e52bd2eb634ee681528928

Long post-facto correction: it won't be memcpies but just writing ranges of ptrs into an array, which should be fast?

m-ou-se · 2021-03-21T15:08:23Z

Thinking about this some more, I think this api (too) easily hides that .iter().map_windows() on a slice doesn't actually do .array_windows() but instead keeps a separate array of references which it keeps fully shifting at every iteration, which is a lot less efficient.

Maybe .pairwise_map(|a, &b| ..) would be less of a footgun?

LukasKalbertodt · 2021-04-11T11:34:59Z

To be honest, I'm not sure. pairwise_map seems like a useful API as well, but as it's more specific, it is not a fully replacement for map_windows.

When it comes to map_windows on a slice iterator: yep, I agree that's unfortunate and would happen in the real world from time to time. But I'm not sure if that's really such a big problem, especially if we point it out in the docs. There are plenty other ways people can end up with iterator-chains that are slower than they could be. vec.iter().cloned() is a classical example here.

Dylan-DPC-zz · 2021-04-26T09:46:26Z

@sfackler any updates on this?

JohnTitor · 2021-06-24T06:53:01Z

r? @m-ou-se as Steven is now alumni and Mara left some thoughts.

dtolnay · 2021-08-02T23:30:25Z

library/core/src/iter/traits/iterator.rs

+ /// # Panics
+ ///
+ /// Panics if `N` is 0. This check will most probably get changed to a
+ /// compile time error before this method gets stabilized.


I've added an item for this to the tracking issue.

dtolnay · 2021-08-03T02:27:17Z

library/core/src/iter/adapters/map_windows.rs

+ // to 0, we treat them as uninitialized and treat their copies
+ // as initialized.
+ unsafe {
+ ptr::copy_nonoverlapping(buffer_ptr.add(N), buffer_ptr, N - 1);


The index math on this line seems wrong.

Repro:

fn main() { for () in std::iter::repeat("0".to_owned()) .map_windows(|_: &[_; 3]| {}) .take(4) {} }

free(): double free detected in tcache 2 Aborted (core dumped)

LukasKalbertodt · 2021-08-04T20:32:20Z

@dtolnay Thanks for catching that logic bug. I was surprised my tests did not catch that. Digging deeper, I now believe my whole logic is flawed. In particular, I prepare out inside of next, then modify the underlying array and then return out. That seems utterly broken. My tests also have very weird behavior that I suspect might be a sign of UB. In addition to that, the comment by @SkiFire13 is also a good one: iterators adaptors usually don't call next on the underlying iterator on creation.

Soooo I think I will have to rethink my approach. I'm not sure how quickly I can find the time to do that. Just as a heads up that me fixing this could take a while.

bors · 2021-10-09T19:34:40Z

☔ The latest upstream changes (presumably #89703) made this pull request unmergeable. Please resolve the merge conflicts.

JohnCSimon · 2021-10-24T22:26:30Z

Triage:
This PR has sat idle for a while. Maybe close it as inactive?

LukasKalbertodt · 2021-10-24T22:30:20Z

Yeah let's close it. I don't see myself fixing this PR anytime soon. I still think the API is very interesting and worth playing around with, but right now I don't have the resources to land this.

…, r=Mark-Simulacrum Add `Iterator::map_windows` Tracking issue: rust-lang#87155. This is inherited from the old PR rust-lang#82413. Unlike rust-lang#82413, this PR implements the `MapWindows` to be lazy: only when pulling from the outer iterator, `.next()` of the inner iterator will be called. ## Implementaion Steps - [x] Implement `MapWindows` to keep the iterators' [*Laziness*](https://doc.rust-lang.org/std/iter/index.html#laziness) contract. - [x] Fix the known bug of memory access error. - [ ] Full specialization of iterator-related traits for `MapWindows`. - [x] `Iterator::size_hint`, - [x] ~`Iterator::count`~, - [x] `ExactSizeIterator` (when `I: ExactSizeIterator`), - [x] ~`TrustedLen` (when `I: TrustedLen`)~, - [x] `FusedIterator`, - [x] ~`Iterator::advance_by`~, - [x] ~`Iterator::nth`~, - [ ] ... - [ ] More tests and docs. ## Unresolved Questions: - [ ] Is there any more iterator-related traits should be specialized? - [ ] Is the double-space buffer worth? - [ ] Should there be `rmap_windows` or something else? - [ ] Taking GAT for consideration, should the mapper function be `FnMut(&[I::Item; N]) -> R` or something like `FnMut(ArrayView<'_, I::Item, N>) -> R`? Where `ArrayView` is mentioned in rust-lang/generic-associated-types-initiative#2. - It can save memory, only the same size as the array window is needed, - It is more efficient, which requires less data copies, - It is possibly compatible with the GATified version of `LendingIterator::windows`. - But it prevents the array pattern matching like `iter.map_windows(|_arr: [_; N]| ())`, unless we extend the array pattern to allow matching the `ArrayView`.

rust-highfive assigned sfackler Feb 22, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 22, 2021

LukasKalbertodt commented Feb 22, 2021

View reviewed changes

This comment has been minimized.

Sign in to view

LukasKalbertodt force-pushed the add-iterator-map-windows branch from 35b6692 to acaf8a6 Compare February 22, 2021 19:30

This comment has been minimized.

Sign in to view

the8472 reviewed Feb 22, 2021

View reviewed changes

library/core/src/iter/traits/iterator.rs Show resolved Hide resolved

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 16, 2021

crlf0710 added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Apr 9, 2021

bstrie added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 12, 2021

JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 31, 2021

rust-highfive assigned m-ou-se and unassigned sfackler Jun 24, 2021

m-ou-se added the I-nominated label Jun 30, 2021

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 31, 2021

dtolnay requested changes Aug 3, 2021

View reviewed changes

dtolnay added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 3, 2021

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 22, 2021

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 6, 2021

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 27, 2021

LukasKalbertodt closed this Oct 24, 2021

the8472 mentioned this pull request Jan 1, 2022

Add Iterator::array_windows() #92394

Closed

the8472 added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Jan 1, 2022

frank-king mentioned this pull request Mar 6, 2022

Add Iterator::map_windows #94667

Merged

16 tasks

Easyoakland mentioned this pull request Aug 7, 2023

Iterators generating (many) vectors rust-itertools/itertools#722

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `Iterator::map_windows` #82413

Add `Iterator::map_windows` #82413

LukasKalbertodt commented Feb 22, 2021

rust-highfive commented Feb 22, 2021

LukasKalbertodt Feb 22, 2021

leonardo-m Feb 22, 2021

JulianKnodt Feb 22, 2021 •

edited

Loading

SkiFire13 Feb 23, 2021

scottmcm Feb 24, 2021

LukasKalbertodt Feb 27, 2021

This comment has been minimized.

This comment has been minimized.

the8472 commented Feb 22, 2021 •

edited

Loading

SkiFire13 commented Feb 23, 2021

the8472 commented Feb 23, 2021

scottmcm commented Feb 24, 2021

LukasKalbertodt commented Feb 27, 2021

JulianKnodt commented Feb 27, 2021

the8472 commented Feb 27, 2021

JulianKnodt commented Feb 27, 2021 •

edited

Loading

m-ou-se commented Mar 21, 2021

LukasKalbertodt commented Apr 11, 2021

Dylan-DPC-zz commented Apr 26, 2021

JohnTitor commented Jun 24, 2021

dtolnay Aug 2, 2021

dtolnay Aug 3, 2021

LukasKalbertodt commented Aug 4, 2021

bors commented Oct 9, 2021

JohnCSimon commented Oct 24, 2021

LukasKalbertodt commented Oct 24, 2021

Add Iterator::map_windows #82413

Add Iterator::map_windows #82413

Conversation

LukasKalbertodt commented Feb 22, 2021

rust-highfive commented Feb 22, 2021

LukasKalbertodt Feb 22, 2021

Choose a reason for hiding this comment

leonardo-m Feb 22, 2021

Choose a reason for hiding this comment

JulianKnodt Feb 22, 2021 • edited Loading

Choose a reason for hiding this comment

SkiFire13 Feb 23, 2021

Choose a reason for hiding this comment

scottmcm Feb 24, 2021

Choose a reason for hiding this comment

LukasKalbertodt Feb 27, 2021

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

the8472 commented Feb 22, 2021 • edited Loading

SkiFire13 commented Feb 23, 2021

the8472 commented Feb 23, 2021

scottmcm commented Feb 24, 2021

LukasKalbertodt commented Feb 27, 2021

JulianKnodt commented Feb 27, 2021

the8472 commented Feb 27, 2021

JulianKnodt commented Feb 27, 2021 • edited Loading

m-ou-se commented Mar 21, 2021

LukasKalbertodt commented Apr 11, 2021

Dylan-DPC-zz commented Apr 26, 2021

JohnTitor commented Jun 24, 2021

dtolnay Aug 2, 2021

Choose a reason for hiding this comment

dtolnay Aug 3, 2021

Choose a reason for hiding this comment

LukasKalbertodt commented Aug 4, 2021

bors commented Oct 9, 2021

JohnCSimon commented Oct 24, 2021

LukasKalbertodt commented Oct 24, 2021

Add `Iterator::map_windows` #82413

Add `Iterator::map_windows` #82413

JulianKnodt Feb 22, 2021 •

edited

Loading

the8472 commented Feb 22, 2021 •

edited

Loading

JulianKnodt commented Feb 27, 2021 •

edited

Loading