[RFC2603] Extend <const> to include str and structural constants. #3161

Open. Wants to merge 2 commits into base: master.
80 changes: 55 additions & 25 deletions text/2603-rust-symbol-name-mangling-v0.md
@@ -691,28 +691,30 @@ Mangled names conform to the following grammar:
        | "D" <dyn-bounds> <lifetime> // dyn Trait<Assoc = X> + Send + 'a
        | <backref>

-<basic-type> = "a" // i8
+<basic-type> = <int-type>
              | "b" // bool
              | "c" // char
              | "d" // f64
              | "e" // str
              | "f" // f32
-             | "h" // u8
-             | "i" // isize
-             | "j" // usize
-             | "l" // i32
-             | "m" // u32
-             | "n" // i128
-             | "o" // u128
-             | "s" // i16
-             | "t" // u16
              | "u" // ()
              | "v" // ...
-             | "x" // i64
-             | "y" // u64
              | "z" // !
              | "p" // placeholder (e.g. for generic params), shown as _

+<int-type> = "a" // i8
+           | "h" // u8
+           | "i" // isize
+           | "j" // usize
+           | "l" // i32
+           | "m" // u32
+           | "n" // i128
+           | "o" // u128
+           | "s" // i16
+           | "t" // u16
+           | "x" // i64
+           | "y" // u64

 // If the "U" is present then the function is `unsafe`.
 // The return type is always present, but demanglers can
 // choose to omit the ` -> ()` by special-casing "u".
@@ -724,16 +726,40 @@ Mangled names conform to the following grammar:
 <dyn-bounds> = [<binder>] {<dyn-trait>} "E"
 <dyn-trait> = <path> {<dyn-trait-assoc-binding>}
 <dyn-trait-assoc-binding> = "p" <undisambiguated-identifier> <type>
-<const> = <type> <const-data>
-        | "p" // placeholder, shown as _

+// Constants are encoded structurally, as a tree of array/tuple/ADT constructors,
+// with integer(-like) leaves, not using the constant's memory representation.
+// See the comments on <const-int> & <const-str> for more details on leaf encoding.
+<const> = <int-type> <const-int>
+        | "b" <const-int> // false, true
+        | "c" <const-int> // '...'
+        | "e" <const-str> // "..."
+        | "R" <const> // &value
+        | "Q" <const> // &mut value
+        | "A" {<const>} "E" // [a, b, c, ...]
+        | "T" {<const>} "E" // (a, b, c, ...)
+        | "V" <path> <const-fields> // named struct/variant
+        | "p" // placeholder, shown as _
+        | <backref>

-// The encoding of a constant depends on its type. Integers use their value,
-// in base 16 (0-9a-f), not their memory representation. Negative integer
-// values are preceded with "n". The bool value false is encoded as `0_`, true
-// value as `1_`. The char constants are encoded using their Unicode scalar
-// value.
-<const-data> = ["n"] {<hex-digit>} "_"
+<const-fields> = "U" // X
+               | "T" {<const>} "E" // X(a, b, c, ...)
+               | "S" {<identifier> <const>} "E" // X { field: value, ... }
Contributor

I'm not sure why `U` is necessary. `UnitStruct` and `UnitStruct {}` are the same constant, so `U` can be `S {} E`; the only benefit is slight compression.

I'm also not sure why identifiers for field names are necessary. Fields are uniquely identifiable by their indices, so `"S" {<const>} "E"` should work equally well.

In other words, `"S" {<const>} "E"` appears to be usable for any struct or variant.

Member Author

We could also "just" flatten the constant and only put the leaf values in; then we wouldn't need any names, tuple/array distinctions, or nesting.

The only reason we tell any of these things apart is to be able to recover useful syntax for users, and I suppose I assumed that to be a given.

I'm obviously biased towards the approach I implemented, but maybe you could register a concern with rfcbot, so that it could get discussed by the compiler team (though I'm not sure in what kind of meeting)?

Contributor

Done.

Member

I would say we keep the U just in order to make it consistent: we already encode () as u where we could also encode it as an empty tuple TE.
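
To make the trade-off in this thread concrete, here is a minimal Rust sketch of how the proposed `<const>` tree and its `<const-fields>` forms might be serialized. It is not the rustc implementation: the `Const` and `Fields` types, the function names, and the example path `NtC3foo3Bar` used in the note below are hypothetical. Paths are assumed to be already mangled, and identifiers are limited to ASCII names without disambiguators.

```rust
/// Hypothetical model of a small subset of the proposed `<const>` tree.
pub enum Const {
    /// Integer leaf: an `<int-type>` tag (e.g. 'h' for u8, 'l' for i32) and a value.
    Int { tag: char, value: i128 },
    /// `[a, b, c, ...]`
    Array(Vec<Const>),
    /// `(a, b, c, ...)`
    Tuple(Vec<Const>),
    /// Named struct/variant; `path` is assumed to be pre-mangled.
    Adt { path: String, fields: Fields },
}

/// The three `<const-fields>` forms from the grammar above.
pub enum Fields {
    Unit,                         // "U"
    Tuple(Vec<Const>),            // "T" {<const>} "E"
    Struct(Vec<(String, Const)>), // "S" {<identifier> <const>} "E"
}

/// Simplified identifier: decimal length, optional "_" separator, then the bytes.
/// The separator is emitted when the name starts with a digit or "_".
fn push_ident(name: &str, out: &mut String) {
    out.push_str(&name.len().to_string());
    if name.starts_with(|c: char| c.is_ascii_digit() || c == '_') {
        out.push('_');
    }
    out.push_str(name);
}

/// Encode each element in order, then close the list with "E".
fn push_all(items: &[Const], out: &mut String) {
    for item in items {
        encode(item, out);
    }
    out.push('E');
}

/// Append the structural encoding of `c` to `out`.
pub fn encode(c: &Const, out: &mut String) {
    match c {
        Const::Int { tag, value } => {
            out.push(*tag);
            if *value < 0 {
                out.push('n');
            }
            out.push_str(&format!("{:x}_", value.unsigned_abs()));
        }
        Const::Array(items) => {
            out.push('A');
            push_all(items, out);
        }
        Const::Tuple(items) => {
            out.push('T');
            push_all(items, out);
        }
        Const::Adt { path, fields } => {
            out.push('V');
            out.push_str(path);
            match fields {
                Fields::Unit => out.push('U'),
                Fields::Tuple(items) => {
                    out.push('T');
                    push_all(items, out);
                }
                Fields::Struct(named) => {
                    out.push('S');
                    for (name, value) in named {
                        push_ident(name, out);
                        encode(value, out);
                    }
                    out.push('E');
                }
            }
        }
    }
}
```

With the hypothetical path `NtC3foo3Bar`, a unit struct comes out as `VNtC3foo3BarU`, and a struct with fields `x: -1i32` and `y: 255u8` as `VNtC3foo3BarS1xln1_1yhff_E`. Under the index-only alternative suggested above, the unit case would be `VNtC3foo3BarSE` and the field names `1x` and `1y` would be dropped, which is shorter but leaves a demangler unable to print `Bar { x: ..., y: ... }`.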


+// An integer(-like) constant's numeric value is encoded in base 16 (0-9a-f),
+// with negative integer values being preceded with "n".
+// For other types, the numeric value is the same one used for `as` casts, i.e.:
+// * `bool`: 0 for `false` (encoded as `0_`), 1 for `true` (encoded as `1_`)
+// * `char`: the Unicode scalar value
+<const-int> = ["n"] {<hex-digit>} "_"

+// `str` constants are encoded as their (UTF-8) byte sequence, where each byte
+// always uses two hex nibbles.
+// Because the constant has `str` type, and not `&str`, demangling should make
+// that clear by e.g. demangling `616263_` as `*"abc"` (instead of `"abc"`).
+// In order to have constants of type `&str` demangle as a plain string literal
+// (i.e. without `&*`), demanglers can special-case `Re...` constants.
+<const-str> = {<hex-digit> <hex-digit>} "_"

 // <base-62-number> uses 0-9-a-z-A-Z as digits, i.e. 'a' is decimal 10 and
 // 'Z' is decimal 61.
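
The leaf encodings described in the `<const-int>` and `<const-str>` comments can be sketched in Rust as follows. This is an illustration under stated assumptions rather than the compiler's or any demangler's actual code: all function names are hypothetical, each encoder emits the leading type tag (`b`, `c`, `e`) together with the data, and `demangle_str` handles only the two `str` shapes mentioned in the comments (`e...` and the special-cased `Re...`).

```rust
/// `<const-int>`: value in lowercase hex, "n" prefix for negatives, "_" terminator.
pub fn const_int(value: i128) -> String {
    let sign = if value < 0 { "n" } else { "" };
    format!("{}{:x}_", sign, value.unsigned_abs())
}

/// `bool` uses the same numeric value as an `as` cast: 0 or 1.
pub fn const_bool(b: bool) -> String {
    format!("b{}", const_int(b as i128))
}

/// `char` uses its Unicode scalar value.
pub fn const_char(c: char) -> String {
    format!("c{}", const_int(c as u32 as i128))
}

/// `<const-str>`: every UTF-8 byte as exactly two hex nibbles, then "_".
pub fn const_str(s: &str) -> String {
    let mut out = String::from("e");
    for byte in s.bytes() {
        out.push_str(&format!("{:02x}", byte));
    }
    out.push('_');
    out
}

/// Convert one validated lowercase hex digit to its value.
fn nibble(b: u8) -> u8 {
    if b.is_ascii_digit() { b - b'0' } else { b - b'a' + 10 }
}

/// Demangle `e<hex>_` as `*"..."` and the special-cased `Re<hex>_` as `"..."`.
/// Returns `None` for anything else, including malformed hex or invalid UTF-8.
pub fn demangle_str(mangled: &str) -> Option<String> {
    let (is_ref, rest) = match mangled.strip_prefix("Re") {
        Some(rest) => (true, rest),
        None => (false, mangled.strip_prefix('e')?),
    };
    let hex = rest.strip_suffix('_')?;
    let is_hex = |b: u8| matches!(b, b'0'..=b'9' | b'a'..=b'f');
    if hex.len() % 2 != 0 || !hex.bytes().all(is_hex) {
        return None;
    }
    let bytes: Vec<u8> = hex
        .as_bytes()
        .chunks(2)
        .map(|pair| (nibble(pair[0]) << 4) | nibble(pair[1]))
        .collect();
    let text = String::from_utf8(bytes).ok()?;
    Some(if is_ref {
        format!("{:?}", text)
    } else {
        format!("*{:?}", text)
    })
}
```

For example, `const_str("abc")` yields `e616263_`, which `demangle_str` renders as `*"abc"`, while `Re616263_` renders as the plain literal `"abc"`.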
@@ -1136,7 +1162,7 @@ pub static QUUX: u32 = {
 - mangled: `_RINxC3std3fooTNyB4_3BarBe_EBd_E`


-# Appendix C - Change LOG
+# Appendix C - Changelog
 - Removed mention of Itanium mangling in introduction.
 - Weakened "predictability" goal.
 - Removed non-goal of not providing a mangling for lifetimes.
@@ -1152,7 +1178,11 @@ pub static QUUX: u32 = {
 - Resolve question of complex constant data.
 - Add a recommended resolution for open question around Punycode identifiers.
 - Add a recommended resolution for open question around encoding function parameter types.
-- Allow identifiers to start with a digit.
-- Make `<binder>` optional in `<fn-sig>` and `<dyn-bounds>` productions.
-- Extend `<const-data>` to include `bool` values, `char` values, and negative integer values.
-- Remove type from constant placeholders.
+- In amendment PR [#2705](https://github.com/rust-lang/rfcs/pull/2705):
+  - Allow identifiers to start with a digit.
+- In amendment PR [#3130](https://github.com/rust-lang/rfcs/pull/3130):
+  - Make `<binder>` optional in `<fn-sig>` and `<dyn-bounds>` productions.
+  - Extend `<const-data>` to include `bool` values, `char` values, and negative integer values.
+  - Remove type from constant placeholders.
+- In amendment PR [#3161](https://github.com/rust-lang/rfcs/pull/3161):
+  - Extend `<const>` to include `str` and structural constants