Revamp unstable MaybeUninit APIs #122
Comments
Please list all the API changes you intend to make. And the motivation doesn't explain why this is better, why we'd want this. |
MaybeUninit::{uninit_array,array_assume_init}
now that transpose APIs are available
Done. The motivation is from #110. |
I'm seeing additional changes on rustc PRs that all are related to MaybeUninit. I think a larger overview of what you're aiming for overall rather than piecemeal changes would be helpful. |
I'm not sure I agree since these changes can go through independently, but the big theme is "generalize":
And then unrelated:
|
Quick summary, as far as I understand it. I'm including EDIT: Updated after feedback from SUPERCILEX
impl<T, const N: usize> MaybeUninit<[T; N]> {
    /// Transposes a MaybeUninit<[T; N]> into a [MaybeUninit<T>; N].
    pub fn transpose(self) -> [MaybeUninit<T>; N];
}
impl<T> MaybeUninit<T> {
    // Remove `as_bytes` methods. See below for discussion.
    //pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
    //pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]
    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]
    // Removed methods. These can be replaced with `slice.as_{,mut_}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T
    // Moved methods.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone
}
// New array method
impl<T, const N: usize> [MaybeUninit<T>; N] {
    /// Transposes a [MaybeUninit<T>; N] into a MaybeUninit<[T; N]>.
    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}
// Moved slice methods
impl<T> [MaybeUninit<T>] {
    pub fn write_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
    pub fn write_slice_cloned(&mut self, src: &[T]) -> &mut [T] where T: Clone
    pub unsafe fn assume_init_ref(&self) -> &[T]
    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
}
// Convert a MaybeUninit pointer to a pointer to its underlying type.
// This is not a `From` implementation because that could be surprising in unsafe code.
impl<T> *const MaybeUninit<T> {
    pub const fn inner(self) -> *const T {
        self as *const T
    }
}
impl<T> *mut MaybeUninit<T> {
    pub const fn inner(self) -> *mut T {
        self as *mut T
    }
}
|
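For readers who want to try the pointer conversion from the summary above, here is a rough sketch on stable Rust. `inner` is only proposed in this thread, so the `InnerPtr` extension trait below is a hypothetical stand-in (inherent methods cannot be added to raw pointers outside the standard library), and it relies on the existing `cast` method on raw pointers to do the reinterpretation.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the proposed inherent `inner` method.
trait InnerPtr {
    type Target;
    fn inner(self) -> Self::Target;
}

impl<T> InnerPtr for *const MaybeUninit<T> {
    type Target = *const T;
    fn inner(self) -> *const T {
        // `MaybeUninit<T>` has the same layout as `T`, so only the pointee type changes.
        self.cast::<T>()
    }
}

impl<T> InnerPtr for *mut MaybeUninit<T> {
    type Target = *mut T;
    fn inner(self) -> *mut T {
        self.cast::<T>()
    }
}

/// Fills every slot through a raw pointer and returns the initialized values.
fn fill_squares() -> [u32; 4] {
    let mut buf = [MaybeUninit::<u32>::uninit(); 4];
    let base: *mut u32 = buf.as_mut_ptr().inner();
    for i in 0..4 {
        // SAFETY: `base` points at `buf` and `i` stays within its four elements.
        unsafe { base.add(i).write((i * i) as u32) };
    }
    // SAFETY: every element was written in the loop above.
    buf.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    assert_eq!(fill_squares(), [0, 1, 4, 9]);
}
```

The conversion itself is safe; the unsafety stays with the pointer arithmetic and the final `assume_init`, which is the separation the summary is aiming for.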
impl From<*const MaybeUninit<T>> for *const T;
impl From<*mut MaybeUninit<T>> for *mut T;
It was decided that these guys should not be From impls since that's too implicit. I'll tackle Just a general note, I'd hesitate to say the "new" methods were "removed." While true from the API diff standpoint, they were just moved. Anyway, thank you! |
Thanks, I've edited my comment to reflect that. |
Nice! Slight tweak:
// Removed methods. These can be replaced with `array.as_{,mut_}ptr().inner()`.
//pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
//pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
|
Oh wait no sorry misread stuff. I think you're fixing it as we speak. Writing out reply rn. |
Yeah, sorry my copy/pasting went askew and I got in a muddle fixing it. Should be sorted now. |
Still not quite cuz I goofed, should be:
impl<T> MaybeUninit<T> {
    // Renamed from `as_bytes_mut`
    pub fn as_mut_bytes(&mut self) -> &mut [MaybeUninit<u8>];
    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]
    // Removed methods. These can be replaced with `slice.as_{,mut_}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T
    // Moved methods.
    //pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
    //pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone
}
|
Ok, I think I got there in the end! Thanks. |
Lol, thanks! Looks good. Maybe also worth pointing out that there's discussion around the write_slice method naming: rust-lang/rust#79995.
Seeing all of this listed out is really helpful. It's making me wonder about the as_bytes methods. They feel like a weird restricted transmute.
#![feature(maybe_uninit_as_bytes)]
use core::mem::MaybeUninit;

#[derive(Debug)]
enum Variant { A(u64), B(String) }

fn main() {
    let mut u = MaybeUninit::new(Variant::A(42));
    // How are variants represented again? Who knows, but 69 is better than 42
    u.as_bytes_mut()[0].write(69);
    // SAFETY: look how safe this is! It's initialized, so nothing could go wrong right?
    let u = unsafe { u.assume_init() };
    // Ahem
    println!("{u:?}");
}
Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.
I desperately need to get some other stuff done, but I'll open a PR to nuke the as_bytes methods in a bit. |
I think it just means we need to be clearer about the invariants that More specifically, we've been pretty clear for a while now that |
Right, I would encourage moving the Naming feedback: I'm really not convinced by the name Said otherwise, I think "rework all the MaybeUninit functions" is too big a scope for the issue to feasibly get consensus on. I suspect that there's something smaller and separable but still valuable. For example, I might phrase my initial impression of a bunch of these things as being about "rather than needing different-named creation and consumption methods in MaybeUninit for a bunch of different types, there should be a couple extra transformers or methods on those other types to be able to just use |
@thomcc Sure, I agree with you that MaybeUninit can hold "initialized" values like
Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?
Fully agree. Do we want this issue to be a catch-all MaybeUninit nightly API overhaul? If so I'll close the other FCP and work on making this issue encompass all the relevant changes discussion. |
Whatever libs-api says is best is what you should do 😄 It's definitely good to consolidate the |
It can. Consider a case like:
pub fn make_uninitialized<T>(mu: &mut MaybeUninit<T>) {
    *mu = MaybeUninit::uninit();
}
|
@scottmcm Sounds good! :) I closed #123. TODO(me):
Should be able to get to this in a few days. @thomcc Good point, that lessens my concerns. That said, I remain unconvinced:
|
|
Also, (I know it has precedent for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am aware of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.) |
I'm not sure we should be using C as the role model. 😜 But actually, maybe what I'm advocating for then is to make the as_bytes methods unsafe. I just realized a simple transmute doesn't cut it since you need to set up the slice metadata (and we don't want people to need to deal with that). Anyway, I'll think about this more and put some thoughts in the summary.
Why? It's defined as an inherent method on arrays... the type system should yell at you if you goof. And yeah, happy to bikeshed on the name: how about bellybutton terminology? Innie and outie? Ok but more seriously maybe eversion? |
Sorry, I haven't quite been able to figure out what you mean by this. Are you saying that because writing the bytes is safe, you know you won't cause UB while writing the bytes? As opposed to pointers where writing can cause UB? If so, I can potentially see how that's a good thing though I'm still bothered that it's pushing the unsafety of manipulating a type's raw bytes downstream. I guess what I've been saying is that I don't see much value in being able to write the bytes safely.
If I'm understanding your argument correctly, then I'm mostly convinced of the value of the as_bytes methods. Perhaps all the mut variants of as_bytes just need extra loud yelling in the docs that reminds people to ensure the bytes they're putting in the type match its invariants. (Though based on my sidenote below, if everyone is in agreement on which invariants must be matched where, then extra docs could be a nice reminder but probably aren't necessary.)
As a sidenote, this thought process has changed my understanding of safety with regard to raw pointers. I previously thought writing to the pointer was the right place to document invariants about the type's bytes and how they were being met. However, it sounds like the correct place for that documentation is in the places you read the pointer. That is, writing should focus on the validity of the pointer itself whereas reading should focus on the validity of the data the pointer points to. Seeing that written out it seems somewhat obvious, but this is a newsflash to me lol. I'll reread the pointer docs because I don't remember them talking about this and I think it'd be valuable to mention. |
I am saying that writing uninit bytes into (parts of) a Removing Some of what you say sounds like you actually intend to not just remove the method, but also change the safety invariant. For one, that would be a breaking change. Second, given that What you say about raw pointers makes sense. It's still worth being careful on writes though, since the read might be unsuspecting safe code reading through a reference. |
Gotya, that makes sense.
I was more advocating for forcing people to document why their writes meet a type's invariants, but yeah that doesn't make much sense when you're writing uninit bytes or zeros.
I thought this wasn't allowed? The pointer docs say this: The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory. That makes it sound like you need to make a new reference after using a raw pointer. Or is it just saying you have to stack your usage (so create and drop the pointer before accessing the original reference again)? |
You can always do something like
fn foo(x: &mut NonZeroI32) {
    let ptr = x as *mut NonZeroI32;
    unsafe { ptr.cast::<i32>().write(0); } // no UB here
    let _val = *x; // but UB here!
}
So yes as long as it's properly nested, the ptr access is fine, and even the write itself is fine here, but the read can occur in safe code. |
Makes sense, thank you! TODO(me):
|
Unrelated to the uninit byte methods, but is it reasonable to include either
or
as part of this proposal? The copy produced by Edited for tone. |
@ClydeHobart I'm not quite sure I understand the use case of those methods.
Are you talking about the copy due to Everyone else: ok, I've updated the issue with the latest proposals. I'm pretty happy with everything (minus some needed bikeshedding) except for the |
Add small clarification around using pointers derived from references
r? `@RalfJung`
One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data?
```rust
fn foo(x: &mut NonZeroI32) {
    let ptr = x as *mut NonZeroI32;
    unsafe { ptr.cast::<i32>().write(0); } // no UB here
    // What now? x is considered garbage when?
}
```
From @scottmcm: rust-lang/rust#104475 (comment) Under the following assumptions:
Then I think the new proposal would be the same as what we have right now except:
Then people will be expected to create arrays the normal way and can therefore access individual elements. To perform array-wise operations, they will convert the array to a slice and use the slice methods. Thoughts? |
This pattern where we want some functionality on some type Adding functionality using on I'm hoping we might be able to find an ergonomic solution that works for more than just |
I think you're right: we shouldn't be special casing MaybeUninit. So that leaves two things from this ACP:
I'm just speculating here, so I'd also be ok with closing everything and saying we'll wait to find out what the general solution looks like. |
IMO |
I think we can strike a good balance by making the important methods inherent. I think that means Thoughts? |
https://github.com/rust-lang/rust/pull/103131/files shows just how common the assume_init methods are and they end up being much more concise. |
Rename MaybeUninit::write_slice A step to push rust-lang#79995 forward. rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
Could this ACP be updated to reflect the changes that have already been made to nightly and what it is still proposing to do? For example, |
Goals
Many of the new MaybeUninit APIs deal with slices or arrays. The current state of affairs is unfortunate because it leads to API duplication. An API may need to be duplicated 3 times: once on MaybeUninit and then another two times for slices and arrays. In an ideal world, the type system could express some sort of container API where if something contains a MaybeUninit then certain methods are available. Since we don't live in that world, the primary goal is to make as many MaybeUninit methods available as possible on slices and arrays without duplication.
Other goals include guiding users towards the right APIs and making unsafety more visible.
Problematic APIs
MaybeUninit::{uninit_array,array_assume_init}
These APIs let you create MaybeUninit arrays, but do not let you manipulate them. Any APIs on MaybeUninit would need to be duplicated on arrays or accessed via slices (requiring borrowing).
Proposal
An API that converts between [MaybeUninit<T>; N] <-> MaybeUninit<[T; N]>. Since you can get a MaybeUninit with your array inside it, anything on MaybeUninit will work for arrays.
Currently it is named transpose but there may be clearer names.
Alternatives
Implement index traits on MaybeUninit<[T; N]>
My two primary concerns are that this prevents the natural use of array initialization syntax and would require bounds checking when that may not be desirable.
Furthermore, I don't think this works with MaybeUninits of non-Copy types unless you call the index method directly?
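To make the array conversion proposed above concrete, here is a minimal sketch on stable Rust. The free functions `array_to_uninit` and `uninit_to_array` are hypothetical names standing in for the proposed `transpose` methods, and the sketch assumes `mem::transmute_copy` is an acceptable way to express the conversion; it is not meant to mirror how the standard library would implement it.

```rust
use std::mem::{self, MaybeUninit};

/// Hypothetical stand-in for `[MaybeUninit<T>; N] -> MaybeUninit<[T; N]>`.
fn array_to_uninit<T, const N: usize>(array: [MaybeUninit<T>; N]) -> MaybeUninit<[T; N]> {
    // SAFETY: both types have the same size and alignment, and `MaybeUninit`
    // never drops its contents, so the bitwise copy cannot cause a double drop.
    unsafe { mem::transmute_copy(&array) }
}

/// Hypothetical stand-in for `MaybeUninit<[T; N]> -> [MaybeUninit<T>; N]`.
fn uninit_to_array<T, const N: usize>(value: MaybeUninit<[T; N]>) -> [MaybeUninit<T>; N] {
    // SAFETY: same layout argument as above; the destination may hold uninit memory.
    unsafe { mem::transmute_copy(&value) }
}

/// Initializes the array element by element, then converts it back in one step.
fn build_labels() -> [String; 3] {
    let mut slots: [MaybeUninit<String>; 3] = uninit_to_array(MaybeUninit::uninit());
    for (i, slot) in slots.iter_mut().enumerate() {
        slot.write(format!("item {i}"));
    }
    // SAFETY: all three elements were written in the loop above.
    unsafe { array_to_uninit(slots).assume_init() }
}

fn main() {
    let labels = build_labels();
    assert_eq!(labels[2], "item 2");
}
```

With a conversion in each direction, the existing `MaybeUninit::uninit` and `assume_init` cover what `uninit_array` and `array_assume_init` do today, while individual elements remain reachable through ordinary array indexing.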
slice_assume_init variants
Assuming a transpose-like API is not possible for slices, these methods need to be available.
Proposal
Currently they aren't inherent methods but should be.
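As a rough sketch of what these slice methods do, the example below reinterprets an initialized prefix of a buffer as `&[T]`. `assume_init_slice` is a hypothetical free function used so the code compiles on stable Rust; the proposal itself is about making the equivalent available as a method.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for `slice_assume_init_ref`.
///
/// # Safety
/// Every element of `slice` must be initialized.
unsafe fn assume_init_slice<T>(slice: &[MaybeUninit<T>]) -> &[T] {
    // SAFETY: the caller guarantees initialization, and `MaybeUninit<T>`
    // has the same layout as `T`, so the slice can be reinterpreted in place.
    unsafe { &*(slice as *const [MaybeUninit<T>] as *const [T]) }
}

/// Initializes only the first three of eight slots and sums that prefix.
fn sum_prefix() -> u32 {
    let mut buf = [MaybeUninit::<u32>::uninit(); 8];
    for (i, slot) in buf.iter_mut().take(3).enumerate() {
        slot.write(i as u32 + 1);
    }
    // SAFETY: the first three elements were initialized above.
    let ready = unsafe { assume_init_slice(&buf[..3]) };
    ready.iter().sum()
}

fn main() {
    assert_eq!(sum_prefix(), 6);
}
```

A partially filled buffer like this is the case an array-level conversion cannot cover, since only a sub-slice is initialized.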
as_bytes variants
Based on the discussion below we should keep these methods to circumvent unsafety around invalid pointers.
Proposal
Currently they aren't inherent methods but should be. Add documentation warning users to be extra careful about meeting the invariants of the type they are mucking around with.
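To illustrate where the safety obligation sits with the byte views, here is a small sketch on stable Rust. `as_bytes_mut` below is a hypothetical free function approximating the unstable method of the same name; it assumes that exposing the storage as `&mut [MaybeUninit<u8>]` is sound, which matches the position reached in the discussion.

```rust
use std::mem::{self, MaybeUninit};
use std::slice;

/// Hypothetical stable approximation of the unstable `MaybeUninit::as_bytes_mut`.
fn as_bytes_mut<T>(value: &mut MaybeUninit<T>) -> &mut [MaybeUninit<u8>] {
    // SAFETY: the pointer covers exactly `size_of::<T>()` bytes owned by `value`,
    // and `MaybeUninit<u8>` may hold any byte, including uninitialized ones.
    unsafe {
        slice::from_raw_parts_mut(
            value.as_mut_ptr().cast::<MaybeUninit<u8>>(),
            mem::size_of::<T>(),
        )
    }
}

/// Builds a `u32` by writing each of its bytes in safe code.
fn zeroed_u32() -> u32 {
    let mut value = MaybeUninit::<u32>::uninit();
    for byte in as_bytes_mut(&mut value) {
        byte.write(0); // safe: no pointer validity to reason about
    }
    // SAFETY: every byte is initialized and all-zero is a valid `u32`.
    unsafe { value.assume_init() }
}

fn main() {
    assert_eq!(zeroed_u32(), 0);
}
```

The byte writes need no `unsafe`; the invariant check belongs at `assume_init`, which is where the same bytes would be a problem for a type such as `NonZeroU32`.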
slice_as_ptr variants
These methods are questionable because they combine the removal of bounds checking with raw pointer conversion. You may want one but not the other, and in either case, you should be required to use separate unsafes.
Proposal
Get rid of them and instead provide safe MaybeUninit pointer conversion to its underlying type (T). That is, *{const,mut} MaybeUninit<T> -> *{const,mut} T.
write_slice variants
Based on discussion in the tracking issue, the name of these methods has not conveyed the fact that write_cloned does not drop old items. I believe we should just use consistent names with slices and document this behavior in bold or something. And at worst, memory leaks are not unsafe in Rust so eh.
Proposal
{copy,clone}_from_slice
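For reference, the behavior behind the `Copy` variant can be sketched on stable Rust as follows. `copy_into_uninit` is a hypothetical name, not the proposed API; the sketch builds on the regular `copy_from_slice` method that slices already have, which it reaches by viewing the source as a slice of `MaybeUninit<T>`.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for `write_slice`: copies `src` into `dst` and
/// returns the now-initialized destination. Panics if the lengths differ.
fn copy_into_uninit<'a, T: Copy>(dst: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] {
    // SAFETY: `T` and `MaybeUninit<T>` share a layout, and viewing initialized
    // values as possibly-uninitialized ones is always allowed.
    let src_uninit: &[MaybeUninit<T>] =
        unsafe { &*(src as *const [T] as *const [MaybeUninit<T>]) };
    // The ordinary slice method; no old values are dropped, they are overwritten.
    dst.copy_from_slice(src_uninit);
    // SAFETY: every element of `dst` was just overwritten with an initialized value.
    unsafe { &mut *(dst as *mut [MaybeUninit<T>] as *mut [T]) }
}

/// Copies four bytes into an uninitialized buffer and edits the result.
fn copy_demo() -> [u8; 4] {
    let mut buf = [MaybeUninit::<u8>::uninit(); 4];
    let filled = copy_into_uninit(&mut buf, &[1, 2, 3, 4]);
    filled[0] = 9;
    [filled[0], filled[1], filled[2], filled[3]]
}

fn main() {
    assert_eq!(copy_demo(), [9, 2, 3, 4]);
}
```

For `Copy` types overwriting without dropping is harmless; the leak concern raised in the tracking issue applies to the `Clone` variant, where previously initialized items in the destination would be skipped.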
uninit
This method cannot be used directly in array initializers if T is not Copy. However, using a const gets around this.
Proposal
Add an UNINIT const that does the same thing as uninit.
Drawbacks
API duplication. This also isn't necessary with transpose, though it would look prettier.
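The const workaround mentioned for `uninit` can be sketched like this. The local `UNINIT` item below is a hand-written illustration of the pattern, not the proposed associated const.

```rust
use std::mem::MaybeUninit;

/// Builds an array of non-`Copy` values using a const in the array initializer.
fn build_names() -> [String; 3] {
    // `[MaybeUninit::<String>::uninit(); 3]` is rejected because
    // `MaybeUninit<String>` is not `Copy`; repeating a const item is accepted.
    const UNINIT: MaybeUninit<String> = MaybeUninit::uninit();
    let mut slots = [UNINIT; 3];
    for (slot, name) in slots.iter_mut().zip(["a", "b", "c"]) {
        slot.write(name.to_string());
    }
    // SAFETY: every slot was written in the loop above.
    slots.map(|slot| unsafe { slot.assume_init() })
}

fn main() {
    assert_eq!(build_names().join(","), "a,b,c");
}
```

A const provided by the library would save each caller from declaring their own item per element type.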
API diff