
Difference between num_enum and num_derive? #61

Closed
debuggerpk opened this issue Oct 31, 2021 · 3 comments


@debuggerpk

First of all, thank you for your efforts.

It appears that both the num-derive and num-enum crates provide the same functionality. What is the motivation behind this crate?

@illicitonion
Owner

That's a great question - thanks for asking!

These two crates do indeed do similar things, but there's a key difference between them:

num_enum is trying to emphasise a certain aspect of type safety. First, let's explore ways of doing these conversions without any third-party crates. Throughout, I'll use this example enum:

#[repr(u16)]
enum Number {
  Zero,
  One,
}

Without any third-party crates, we can trivially convert this enum to a u16:

assert_eq!(Number::Zero as u16, 0_u16);

But there's nothing special about u16 here, we can also write this code:

assert_eq!(Number::Zero as u8, 0_u8);

It's less trivial to convert a u16 to this enum, but it's not complicated, just verbose with lots of boilerplate:

let number = match 0_u16 {
  0_u16 => Number::Zero,
  1_u16 => Number::One,
  _ => todo!(), // Up to you what to do here
};
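
In practice you'd often wrap that match up as a standard TryFrom impl. Here's a minimal sketch of that boilerplate (using u16 as the error type is just one choice for the example):

use std::convert::TryFrom; // in the prelude since the 2021 edition

#[repr(u16)]
enum Number {
  Zero,
  One,
}

impl TryFrom<u16> for Number {
  type Error = u16; // hand the unrecognised value back to the caller

  fn try_from(value: u16) -> Result<Self, Self::Error> {
    match value {
      0 => Ok(Number::Zero),
      1 => Ok(Number::One),
      other => Err(other),
    }
  }
}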

That's our baseline :) So, let's talk about what these two crates do to try to make this better!

num_enum

num_enum allows deriving traits for enums that have an explicit discriminant type, and its conversions are specifically targeted to that type. So for our example, it only allows conversions to and from u16. This is different from regular as casts, because those actually allow converting to any numeric type. So remember, this code is allowed:

assert_eq!(Number::Zero as u8, 0_u8);

This is kind of fine, but also, kind of sketchy. What if you tried to convert Number::TwoHundredFiftySeven to a u8? You'd get the value 1_u8. Probably what you wanted was either 257_u16, or to be given a compile error, or at least a warning, saying "You're doing a lossy conversion here" (I'd call this "accidental truncation" - your value is accidentally getting truncated to a smaller value than it should be). In particular, this gets problematic if an enum changes its discriminant type. Imagine we had originally defined Number to have a repr of u8, but we realised we needed more numbers, so we change its repr to u16 - this API change doesn't cause any compile errors or warnings for anyone doing conversions - it may just silently introduce incorrect results.
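
Here's a quick self-contained illustration of that truncation (the variant is the hypothetical one from the paragraph above):

#[repr(u16)]
enum Number {
  TwoHundredFiftySeven = 257,
}

fn main() {
  // 257 is 0x0101, so `as u8` silently keeps only the low byte.
  assert_eq!(Number::TwoHundredFiftySeven as u8, 1_u8);
  assert_eq!(Number::TwoHundredFiftySeven as u16, 257_u16);
}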

This is not generally the style of Rust. In Rust, as a rule, if you change something which may cause errors or surprising behaviour, the compiler is meant to intervene. It's why we use Result types for errors, and why crates tend to expose enums for the assorted errors they can return - it forces you to consider edge cases and changes.

num_enum was specifically written to handle exactly this problem - there is a natural type an enum should be converted to, and we only allow that. Which means that if you have this code:

#[repr(u8)]
enum Number {
  Zero,
  One,
}

fn use_number() {
  let number: u8 = Number::Zero.as_repr();
  // ...
}

and you change the repr so we have this code:

#[repr(u16)]
enum Number {
  Zero,
  One,
}

fn use_number() {
  let number: u8 = Number::Zero.as_repr();
  // ...
}

you will get a compile error forcing you to consider that your types may be wrong, and that you may need to handle some values you hadn't considered before, or increase the memory being allocated to a struct, or whatever.

Of course if you did want that behaviour, you can always be explicit about it:

let number = Number::Zero.as_repr() as u8;
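
For completeness, here's a minimal sketch of wiring those derives up (assuming num_enum as a dependency and the 2021 edition; IntoPrimitive implements From<Number> for u16 and nothing else, and TryFromPrimitive implements TryFrom<u16> for Number):

use num_enum::{IntoPrimitive, TryFromPrimitive};

#[derive(IntoPrimitive, TryFromPrimitive)]
#[repr(u16)]
enum Number {
  Zero,
  One,
}

fn main() {
  let raw: u16 = Number::One.into(); // only u16 compiles here
  assert_eq!(raw, 1_u16);
  assert!(matches!(Number::try_from(0_u16), Ok(Number::Zero)));
  assert!(Number::try_from(2_u16).is_err()); // out-of-range is a runtime error
}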

num_derive

I can't speak to why num_derive was written (I think it was a pre-1.0 thing that existed in the standard library, and when 1.0 was coming up it was decided the APIs didn't make the cut for std so it got moved to an external crate), but looking at it from the outside, it does two main things:

Solve the conversion type problem

num_derive solves the same problem as described above, but in a different way. It always attempts to convert to either u64 or i64. This loses type information and sizing (which may matter e.g. when converting a wire format), but it avoids the accidental truncation. However, u64 and i64 aren't necessarily ideal types either! If your enum's repr is i64, then u64 is the wrong type - it can't hold all the values! There are also experimental u128 and i128 reprs which can't be properly handled by these types. All of these things which could be compile-time errors, num_derive pushes to be runtime errors (you need to check whether the Option was a Some).
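
To make that concrete, here's a minimal sketch of the num_derive route (assuming num-derive and num-traits as dependencies):

use num_traits::{FromPrimitive, ToPrimitive};

#[derive(Debug, PartialEq, num_derive::FromPrimitive, num_derive::ToPrimitive)]
enum Number {
  Zero,
  One,
}

fn main() {
  // Conversions always go through the widest types...
  assert_eq!(Number::One.to_u64(), Some(1_u64));
  assert_eq!(Number::One.to_i64(), Some(1_i64));
  // ...and bad values surface at runtime as None, not at compile time.
  assert_eq!(Number::from_u64(7), None);
  assert_eq!(Number::from_i64(0), Some(Number::Zero));
}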

But, in e.g. the case of expanding a u8-repr enum to a u16-repr enum, it does a great job - rather than needing you to update all your call-sites, your call-sites were already using valid types big enough to handle all the possible values! It's a way of being forward-compatible with the repr type growing, at the expense of losing some of the precision about how many bits you actually need to store.

You could just always write Number::Zero as u64 and get the same effect, though, which brings us on to what I think is the main focus of the crate:

Allow for generic programming

num_derive uses the num_traits crate, and this crate is interesting! It provides traits which allow you to write generic code over "any type which can be converted to a number". Sometimes you may want to write code like:

fn is_big<Number: ToPrimitive>(number: Number) -> bool {
  if let Some(primitive) = number.to_u64() {
    primitive > 12
  } else {
    false
  }
}

where you don't care whether number is one kind of enum, another kind of enum, some newtype, or whatever, you just care that it can be converted to a primitive.

Implementing a standard trait, such as ToPrimitive, allows this pretty nicely.
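
For example, the same generic function also accepts plain primitives, because num_traits implements ToPrimitive for them out of the box:

use num_traits::ToPrimitive;

fn is_big<N: ToPrimitive>(number: N) -> bool {
  if let Some(primitive) = number.to_u64() {
    primitive > 12
  } else {
    false
  }
}

fn main() {
  assert!(is_big(100_u32));
  assert!(!is_big(3_i8));
  assert!(!is_big(-5_i32)); // to_u64() fails for negatives, so "not big"
}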

This generic programming isn't particularly an emphasis for num_enum, but we do allow it, just on a type which is parameterised by the discriminant type of the enum (and we use the standard Into and From traits). So you can write:

fn is_big<Number: Into<u8>>(number: Number) -> bool {
    number.into() > 12
}

This means it's harder to treat all number-like things the same (we had to be explicit that it was u8 we cared about; we could possibly be generic in that too, but it starts getting fiddly).
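
As an end-to-end sketch of that approach with num_enum (the Small enum here is made up for the example):

use num_enum::IntoPrimitive;

#[derive(IntoPrimitive)]
#[repr(u8)]
enum Small {
  Zero = 0,
  Thirteen = 13,
}

fn is_big<N: Into<u8>>(number: N) -> bool {
  number.into() > 12
}

fn main() {
  assert!(!is_big(Small::Zero));
  assert!(is_big(Small::Thirteen));
}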

So, tl;dr: num_derive is more about treating all number-like things the same, and forces you to use large types to handle the possibility that the numbers may be large (though not always large enough types!). num_enum is more about treating things which are represented as numbers exactly as the type of number they're represented by, and forcing you to handle any changes to what that type is.

@debuggerpk
Author

Thank you for sharing such insights - I learned a lot.

For my use case, I am writing a proc macro ByteIt for structs that generates to_bytes and from_bytes. The way I am handling enums is by providing a #[byte_it = "size"] attribute, where size can be any of the unsigned integer types. I handle this case specifically when producing my output. Below is a sample of the output.

pub enum ServerGreetingMode {
    Unavailable = 0,
    Unauthenticated = 1,
    Authenticated = 2,
    Encrypted = 4,
}
#[automatically_derived]
#[allow(unused_qualifications)]
impl ::core::marker::Copy for ServerGreetingMode {}
#[automatically_derived]
#[allow(unused_qualifications)]
impl ::core::clone::Clone for ServerGreetingMode {
    #[inline]
    fn clone(&self) -> ServerGreetingMode {
        {
            *self
        }
    }
}
#[allow(non_upper_case_globals, unused_qualifications)]
const _IMPL_NUM_FromPrimitive_FOR_ServerGreetingMode: () = {
    #[allow(clippy::useless_attribute)]
    #[allow(rust_2018_idioms)]
    extern crate num_traits as _num_traits;
    impl _num_traits::FromPrimitive for ServerGreetingMode {
        #[allow(trivial_numeric_casts)]
        #[inline]
        fn from_i64(n: i64) -> Option<Self> {
            if n == ServerGreetingMode::Unavailable as i64 {
                Some(ServerGreetingMode::Unavailable)
            } else if n == ServerGreetingMode::Unauthenticated as i64 {
                Some(ServerGreetingMode::Unauthenticated)
            } else if n == ServerGreetingMode::Authenticated as i64 {
                Some(ServerGreetingMode::Authenticated)
            } else if n == ServerGreetingMode::Encrypted as i64 {
                Some(ServerGreetingMode::Encrypted)
            } else {
                None
            }
        }
        #[inline]
        fn from_u64(n: u64) -> Option<Self> {
            Self::from_i64(n as i64)
        }
    }
};
pub struct ServerGreetingFrame {
    pub unused: [u8; 12],
    #[byte_it(u32)]
    pub mode: ServerGreetingMode,
    pub challenge: [u8; 16],
    pub salt: [u8; 16],
    pub count: u128,
    pub mbz: [u8; 12],
}
impl ServerGreetingFrame {
    const SIZE: usize = 76usize;
    /// Convert the struct to a byte array.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::new();
        bytes.extend_from_slice(&(self.unused));
        bytes.extend_from_slice(&(self.mode as u32).to_be_bytes().to_vec());
        bytes.extend_from_slice(&(self.challenge));
        bytes.extend_from_slice(&(self.salt));
        bytes.extend_from_slice(&(self.count as u128).to_be_bytes().to_vec());
        bytes.extend_from_slice(&(self.mbz));
        bytes
    }
    /// Convert the byte array to a struct.
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        let unused: [u8; 12usize] = bytes[0usize..12usize].try_into().unwrap();
        let mode: [u8; 4usize] = bytes[12usize..16usize].try_into().unwrap();
        let mode = u32::from_be_bytes(mode);
        let mode = ServerGreetingMode::from_u32(mode).unwrap();
        let challenge: [u8; 16usize] = bytes[16usize..32usize].try_into().unwrap();
        let salt: [u8; 16usize] = bytes[32usize..48usize].try_into().unwrap();
        let count: [u8; 16usize] = bytes[48usize..64usize].try_into().unwrap();
        let count = u128::from_be_bytes(count);
        let mbz: [u8; 12usize] = bytes[64usize..76usize].try_into().unwrap();
        Self {
            unused,
            mode,
            challenge,
            salt,
            count,
            mbz,
        }
    }
}

Inspecting the output again, especially the trait implementation on the enum, I can see the merits of using num_enum with repr.

So far in my code, I haven't felt the need to use repr. I will leave a note in the code as a reference with a link to this discussion, and update this issue if we find a bug.

Thank you again.

@illicitonion
Owner

Sounds good - glad it was useful!

I'm going to close this issue, as I think it's resolved, but feel free to reply to it again if you have any thoughts/ideas/questions! :)
