Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads #2195

Merged
merged 5 commits into from
Feb 14, 2018

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Oct 31, 2017

Formally define the enum #[repr(u32, i8, etc..)] and #[repr(C)] attributes to force a non-C-like enum to have defined layouts. This serves two purposes: allowing low-level Rust code to independently initialize the tag and payload, and allowing C( ) to safely manipulate these types.

Rendered

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 31, 2017
@joshtriplett
Copy link
Member

joshtriplett commented Oct 31, 2017

I really like this, and I think it'll significantly improve FFI.

A few bits of feedback:

  • I'd prefer the alternative where "The tag shouldn't be opaque to outer enums". I don't think that would break any reasonable assumptions made in C code. C can't make any assumptions about the layout of Option<MyEnum> (other than Option on pointer types mapping NULL to None), since Option doesn't use repr (and should not use repr since it would invalidate enum layout optimizations).
  • Please explicitly state that the documentation for repr(X) should explicitly state that native Rust code should not use that attribute unless it needs to manipulate an enum at the raw memory level or pass it to C code non-opaquely, as doing so inhibits useful optimizations.
  • Regarding "A field/method for the tag/payload", the former would be the discriminant_value function (still not stabilized). I don't know that we need a field for the payload, since it won't necessarily have a corresponding type.
  • Nice branch name. 👍

@jld
Copy link

jld commented Oct 31, 2017

There's another representation that's slightly different from the one proposed: a union of structs where each struct's first field is the discriminant. (For FFI there could be an extra union case with only the discriminant, but I believe it's valid C to assume a struct's first field is at offset 0 and access it by pointer casting.) The difference from the struct-of-tag-and-payload representation shows up in cases like this:

#[repr(u8)]
enum MyEnum {
  Word(u64),
  Bytes([u8; 15]),
}

If the whole payload is a union, it gets 8-aligned and the enum takes 24 bytes; if each variant is laid out separately, it takes 16 bytes. Yet this isn't an “enum optimization” in any of the ways that cause the problems identified in this RFC: the C/C interface is well-defined and fairly straightforward, and the discriminant's meaning is unaffected by the bits stored in the data fields.

This is, incidentally, how rustc currently handles enums when no optimizations or special cases apply; note this comment, currently in rustc::ty::layout, which dates back to rust-lang/rust@a301db7:

    /// General-case enums: for each case there is a struct, and they
    /// all start with a field for the discriminant.

@joshtriplett
Copy link
Member

@jld That's an interesting distinction.

Such a representation would always have an advantage on size, but might potentially prove more annoying to define and access from non-C languages. On balance, though, since Rust already uses that layout, using the same layout for the repr case seems reasonable to me.

Can I suggest switching the RFC to that representation, observing that Rust uses that representation by default, and mentioning the "two-element struct" representation as an alternative?

@pythonesque
Copy link
Contributor

I agree with @jld here, though it may turn out that C compatibility concerns are the deciding factor.


The correct C definition is essentially the same, but with the `enum class` replaced with a plain integer of the appropriate type.

In addition, it is defined for Rust programs to cast/reinterpret/transmute such an enum into the equivalent tag union Rust definition above. Seperately manipulating the tag and payload is also defined. The tag and payload need only be in a consistent/initialized state when the value is matched on (which includes Dropping it). This means that the contents of a `#[repr(X)]` enum cannot be inspected by outer enums for storage optimizations -- `Option<MyEnum>` must be larger than `MyEnum`.
Copy link
Contributor

@arielb1 arielb1 Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the contents of a #[repr(X)] enum cannot be inspected by outer enums for storage optimizations -- Option<MyEnum> must be larger than MyEnum.

When is that needed? This breaks the rule that by-value enums must always be "valid", and I don't see you using it (e..g. if dest is initialized when the function ends, then an Option<MyEnum> would still work).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, having reflected on this more, I believe you're correct. I'd just like @bluss to verify that their original use-case only requires the payload to be opaque. And it would be fine for Option<Flag> to use the the Flag's tag to represent None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having reflected on this even more, here is an interesting case where this matters:

I am recursively doing this in-place deserialization process, and at some point I need to deserialize Option<MyEnum> in place. To do so, I need to do:

*result = Some(mem::uninitialized());
if let Some(ref mut my_enum) = result {
  deserialize_my_enum(my_enum);
}

That is, if we make MyEnum "super opaque" it means Option<MyEnum> effectively becomes a pseudo-repr(X) type, and we can defer initialization of the payload... except that this runs afoul of my "must be valid when you match" rule. It's possible we could create more fine-grained rules to allow it, though.

Alternatively we can declare this to be unsupported, and force people who want this to make a #[repr(u8)] OpaqueOption<T> { Some(T), None }.

Copy link
Member

@bluss bluss Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it seems entirely right, it's just the payload.

It's about crate nodrop and arrayvec's use of it to store an uninitialized and/or partly uninitialized payload.

Yes as I understand it's ok if the enclosing Option uses the repr(u8) enum's tag. @eddyb has in fact already implemented that in the big enum layout PR(!), and it works fine with nodrop, even if it broke a test I wrote. Because my test was buggy.

I was envisioning a ManuallyDrop with more guarantees or a MaybeUninitialized or stable fully generic unions (without Copy bound) would the way to bring the arrayvec and nodrop use case to sound ground. (I've been discussing ideas with a few others the past weeks (breadcrumby link).)

If the uninitialized-in-repr-enum use case gets blessed by way of this rfc I think it's great if it can do so on other merits. But crate nodrop provides a working proof of concept.

@arielb1
Copy link
Contributor

arielb1 commented Oct 31, 2017

Also, the representation should be more of the form:

type Tag = uX;

#[repr(C)]
union MyEnum {
    tag: Tag,
    variant_1: Variant1,
    variant_2: Variant2
}

#[repr(C)]
struct Variant1 {
    tag: Tag,
    field1: Field1,
    ...
}

#[repr(C)]
struct Variant2 {
    tag: Tag,
    field1: Field1,
    ...
}

Because e.g.

#[repr(u8)]
pub enum X {
    Y { a: u8, b: u32 },
    Z,
}

fn main() {
    assert_eq!(std::mem::size_of::<X>(), 8);
}

where the enum is represented as

   0     1     2     3     4     5     6     7
 ----- ----- ----- ----- ----- ----- ----- ----- 
| tag |  a  | <padding> |  -------- b --------- |
 ----- ----- ----------- ----------------------- 

Where the first field of the enum is allowed to not have the same alignment as the enum.

@eddyb
Copy link
Member

eddyb commented Oct 31, 2017

@jld @arielb1 @pythonesque That (union of variants each containing the tag and their payload) is indeed what Rust enums use, as an optimization, but it's not, AFAIK, typically used in C or C , and harder to work with in those languages than "tag followed by union of payloads".

I think that minimizing the padding through this method, just like field reordering, shouldn't apply to C-compatible types as I do not see the optimization benefits justifying the complexity.

@glaebhoerl
Copy link
Contributor

Would it be reasonable to allow syntax like #[repr(C, u8)] to specify both "must be C-compatible" and the size of the tag, instead of just assuming the former whenever the latter is done? Then whether to do the alignment optimization, as well as whether to do #[repr(C)] on the member structs as mentioned in the RFC, could both depend on whether or not C representation was requested (without taking away the ability specify the tag size at the same time). If not it's probably simplest to just always do the FFI-appropriate thing.

@eddyb
Copy link
Member

eddyb commented Oct 31, 2017

@glaebhoerl I'm pretty sure that's how it already works (except for the union-of-(tag, payload) vs (tag, union-of-payloads)).

* Probably should be a field to avoid conflicts with user-defined methods
* Might need `#[repr(pub(X))]` for API design reasons
* Compiler-generated definitions for the Repr types
* With inherent type aliases on the enum? (`MyEnum::Tag`, `MyEnum::Payload`, `MyEnum::PayloadA`, etc.)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that these should probably be implemented separate from this RFC. I'd be inclined to instead allow the ecosystem to generate these declarations using a Custom Derive like #[derive(EnumRepr)] which would define an my_enum_repr (or similar) module with Tag, Payload, and structs for each payload.

Right now I don't think we have the tools to expose these in a nice way, especially if we consider a theoretical enum with a variant called Tag or Payload (this change would be breaking for those enums).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was willing to punt on these since serde and cbindgen could both generate these automatically if need be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds partly like #1450.

@mystor
Copy link

mystor commented Oct 31, 2017

I'm worried that this has limited usefulness without the ability to define untagged unions with non-Copy variants, which IIRC has its stability blocked on #1897 or similar.
I suppose it can still be used for copy variants and for C/C FFI right now.

# Drawbacks
[drawbacks]: #drawbacks

There aren't really any drawbacks: we already decided to do this. This is just making it official.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saying "it has been decided already" is neither a good attitude towards the RFC process, nor is it a substitute for a drawback section!

@Gankra
Copy link
Contributor Author

Gankra commented Oct 31, 2017

Would people like me to decouple the proposal so repr(u32) gets the Rust union(tag) layout, and repr(C, u32) gets the tag, union layout?

@joshtriplett
Copy link
Member

joshtriplett commented Oct 31, 2017

@gankro That sounds reasonable, though it needs very careful documentation.

And I don't think either one should inhibit optimization of Option<MyEnum>.

@Gankra
Copy link
Contributor Author

Gankra commented Oct 31, 2017

Note that the Rust union(tag) layout is explicitly declared valid to manipulate in C >= 11 (not sure about earlier versions):

screen shot 2017-10-31 at 4 21 52 pm

And I believe C doesn't actually care about accessing unions in weird ways, so it's also valid there. So the only reason to care about the (tag, union)-style is because it's a bit more ergonomic to manipulate in low-level code, and might make it easier to work with existing C( ) codebases.

@RReverser
Copy link

because it's a bit more ergonomic to manipulate in low-level code

Yeah, ergonomically having separate tag and actual value structs is more convenient. Otherwise you will have an extra wrapper struct for each union variant which already has a nested named struct type, and that tag in each of variants will be ignored anyway when it's not part of the union.

@Gankra Gankra changed the title Formally define repr(X) on enums with payloads Formally define repr(u32, i8, etc...) and repr(C) on enums with payloads Oct 31, 2017
@Gankra
Copy link
Contributor Author

Gankra commented Oct 31, 2017

I have significantly updated the contents of the RFC based on feedback:

  • repr(u32, i8, etc...) now uses the current Rust union(tag) layout for compatibility and performance reasons
  • repr(C) now forces the alternative (tag, union) layout that was originally proposed.

In addition, the Drawbacks section has been properly filled out, and two Real "unresolved questions" have been added:

  • Should Option<MyEnum> be forced to be larger than MyEnum? (should the tag be opaque?)
  • Should every payload in a repr(C) enum's union be a struct?

@RReverser
Copy link

Just to reconfirm: this doesn't impose similar Copy restrictions as union do?

On one hand, such restrictions would help with "it's very bad" note in the parsing example, but on another, it might be not a sufficient reason to limit the usefulness of tagged enums (and it's a breaking change).

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2017

This RFC only defines the layout. So you won't be able to write out a MyEnumRepr type for non-Copy payloads in Rust today, but when/if unions remove that restriction you will be able to. (and you can do whatever in C( ))

@RReverser
Copy link

@gankro Right, what I wanted to reconfirm is that this RFC doesn't impose additional restriction on variant contents. And since it doesn't, that "it would be very bad if we panicked past this point" comment seems a bit excessive, since for Copy types (as currently limited by union) it's not scary at all.

@Gankra
Copy link
Contributor Author

Gankra commented Nov 1, 2017

This specific enum isn't the best example of it, but e.g. Option<u32>, bool, and char are all Copyable but have Undefined values that you could get by failing to complete the parse, effectively transmuting one variant into the other. This itself would only be a problem if the caller had a catch_unwind, though.

@RReverser
Copy link

This itself would only be a problem if the caller had a catch_unwind, though.

Ah yes, indeed, but then the problem is much more generic than this proposal.

@Centril Centril added A-repr #[repr(...)] related proposals & ideas A-ffi FFI related proposals. A-machine Proposals relating to Rust's abstract machine. A-data-types RFCs about data-types labels Nov 23, 2018
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Oct 2, 2019
…nd 🐉🐲 (from servo:derive-all-the-things); r=emilio

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

Source-Repo: https://github.com/servo/servo
Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2

UltraBlame original commit: 8bd0a2507ccbb5fad4588e43bbca20eead87a2bb
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Oct 2, 2019
…nd 🐉🐲 (from servo:derive-all-the-things); r=emilio

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

Source-Repo: https://github.com/servo/servo
Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2

UltraBlame original commit: 8bd0a2507ccbb5fad4588e43bbca20eead87a2bb
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Oct 2, 2019
…nd 🐉🐲 (from servo:derive-all-the-things); r=emilio

We rely on rust-lang/rfcs#2195 to make some enums share the same discriminants by definition.

This fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1428285.

Source-Repo: https://github.com/servo/servo
Source-Revision: 9faf0cdce50379dfb8125d7a9c913babd56382e2

UltraBlame original commit: 8bd0a2507ccbb5fad4588e43bbca20eead87a2bb
@bjorn3
Copy link
Member

bjorn3 commented Dec 13, 2021

Has this been implemented? There is no tracking issue listed.

@RReverser
Copy link

Long ago, yes. There was no issue, I guess (?), because it didn't need change in implementation.

@bjorn3
Copy link
Member

bjorn3 commented Dec 13, 2021

I see, thanks.

@zstewar1
Copy link

zstewar1 commented Jun 3, 2022

The reference for #[repr(C)] enums states:

The representation of a repr(C) enum with fields is a repr(C) struct with two fields, also called a "tagged union" in C:

  • a repr(C) version of the enum with all fields removed ("the tag")
  • a repr(C) union of repr(C) structs for the fields of each variant that had them ("the payload")

However, according to this RFC (which is linked in the nomicon section on #[repr(C)], so I expected the RFC to be authoritative), the layout is specified as a union containing structs whose first field is the enum tag, not a struct containing a union. I don't know whether the default size of C enums leads to those two representations being equivalent, however a quick playground test suggests that this is not the case, as the size of #[repr(C)] enum is equal to the size of a struct of tag union (as specified in the reference), not the size of a union of tagged structs (as specified in the RFC).

Am I missing something? Like, did I make a mistake in my example or misread the RFC or was the behavior updated in a later RFC? Or does this behavior actually disagree with the RFC?

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2022

Unfortunately, it is risky to treat RFC's as authoritative, because they are not always kept up to date as things evolve.

I do not know what the intention here is.

It sounds like @zstewar1 has found a discrepancy that, as far as I can tell, has not been explicitly discussed.

The section of the reference linked by @zstewar1 was changed in rust-lang/reference#879 as part of updating the reference to reflect #2195, so the text written there has been reviewed by @Gankra ...

In any case, I think the best thing would be to turn the example you have come up with as an explicit issue on https://github.com/rust-lang/rust, and tag it as T-lang, (and maybe also with unsafe-code-guidelines? I'm not sure who besides T-lang should own the definition of repr(C) for things like this (i.e. enums with payloads)

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2022

It certainly sounds like @zstewar1 has found a discrepancy that, as far as I can tell, has not been explicitly discussed.

Well, that's not quite true: the distinction under description certainly sounds a lot like the two models that were discussed by @jld @arielb1 @eddyb and @Gankra way up here:

I guess I should look at the RFC text itself to try to puzzle where the discussion actually landed.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2022

@zstewar1 isn't this text from the RFC consistent with what the reference says?

However, for better compatibility with common C( ) idioms, and better ergonomics for low-level Rust programs, this RFC defines #[repr(C, Int)] on a tagged enum to specify the (tag, union) representation. Specifically the layout will be equivalent to a C-struct containing a C-like #[repr(Int)] enum followed by a C-union containing each payload.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2022

@zstewar1 and now that I've read and re-read the relevant sections a few times, I think that the problem here may be in how you read the RFC?

In particular, the text here that you wrote:

according to this RFC (which is linked in the nomicon section on #[repr(C)], so I expected the RFC to be authoritative), the layout is specified as a union containing structs whose first field is the enum tag, not a struct containing a union.

From what I can tell, the "union containing structs whose first field is the enum tag" interpretation is only applied to #[repr(Int)] enums, not #[repr(C)] ones?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2024
… r=jieyouxu

Update E0517 message to reflect RFC 2195.

E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout.

This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2024
… r=jieyouxu

Update E0517 message to reflect RFC 2195.

E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout.

This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2024
… r=jieyouxu

Update E0517 message to reflect RFC 2195.

E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout.

This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 8, 2024
… r=jieyouxu

Update E0517 message to reflect RFC 2195.

E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout.

This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Aug 8, 2024
Rollup merge of rust-lang#128795 - zachs18:e0517-update-for-rfc-2195, r=jieyouxu

Update E0517 message to reflect RFC 2195.

E0517 occurs when a `#[repr(..)]` attribute is placed on an unsupported item. Currently, the explanation of the error implies that `#[repr(u*/i*)]` cannot be placed on fieldful enums, which is no longer the case since [RFC 2195](rust-lang/rfcs#2195) was [stabilized](rust-lang#60553), which allows placing `#[repr(u*/i*)]` and/or `#[repr(C)]` on fieldful enums to produce a defined layout.

This PR doesn't (currently) add a description of the semantics of placing `#[repr(u*/i*)]` on a fieldful enum to the error explanation, it just removes the claims/implications that it is not allowed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-ffi FFI related proposals. A-machine Proposals relating to Rust's abstract machine. A-repr #[repr(...)] related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.