-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] Implement union types (rather than nominal enums) #43
Comments
For what it's worth, "Algebraic data types like ‘enum’ in Swift/Rust" is already on the roadmap. |
Yes, this is a very important feature, also very important to Swift and many ML languages. Doing this right is super important. Thank you for filing this! |
It's worth clarifying that my proposal is to avoid the design that Swift, Rust, and ML have implemented, owing to its lack of Modularity. 🥁 What are the next steps for this feature? Will Modular be sharing more thoughts in the future? |
So looking at the original Issue and the responses a think there has been some confusion. Support for Structural Union Types@nmsmith is asking for Structural Union Types instead of Nominal Algebraic types such as in Swift and Rust. (Optionals in Swift are an exception) The GoodThis would be a big improvement in developer experience because you no longer need to “wrap” your types in enums when you are choosing one of multiple types. The BadUsually, supporting true structural unwrapped union types means more runtime overhead as you need the ability to determine the actual type at runtime. Nominal enums, on the other hand need minimal overhead (essentially, just a numberic “tag”) to determine which “case” it is. Both Unions and Algebraic Data Type Enums should probably exist in the languageMojo is an interesting case since it aims to be a superset of Python. Since Python is a dynamic language, all types in it can be determined at runtime. A structural Union type makes sense for this. On the other hand, Mojo also provides primitives such as Keeping this in mind, structural types and union types make sense for the “Python, but faster case”. But when you want to eek out the best possible performance from your program, you’d probably want to move from Structural Union types to Nominal Enums. |
@nmn Yes, you've summarized the distinction well. Thanks for that. I addressed the topic of runtime overhead in my original post, under the following bullet point:
To summarize it: I believe that if you're using a structural union type in the same manner that you'd be using an enum, you can compile it in exactly the same way — i.e. efficiently, using numeric tags. (Why shouldn't that be the case? If the code is isomorphic to the code for enums, the compiler can use the same compilation strategy.) If this is true, there would be no reason to support nominal enums in Mojo. They would be entirely redundant. Also, it's worth demonstrating that translating any enum to a union type is trivial. Consider the following enum:
This can be re-written as a union type as follows. (Let's assume that our hypothetical programming language has a syntax for "tuple structs", in the style of Rust.)
Notably, this requires the same number of lines of code as the enum declaration. The advantage is, of course, that structs
|
I agree that many error types can be tough, but I would also caution against giant catch all error types. Sabrina Jewson has a great article describing the pitfalls of multierror unions. |
This is not as simple as you suggest. But in order to not pollute this issue anymore I'm going to not go into detail here. TLDR; you'd need a unique tag for every type for what you're describing to be doable. You may argue that that is worth it, but it is overhead. |
I'm assuming you mean each |
@nmsmith I know what you argued. I had written a full reply why that wouldn't be feasible but it got long so I decided against it. You've generally already mentioned the problem I would describe. You'd need to know how the type is used during all execution. This is possible for small programs, but not when you start building libraries, partial re-compilation etc.
There's a bunch of complexity you're papering over here as well, but let's not get into that! |
There are some challenges, but I think they are surmountable. It's just another optimization task — compilers are full of those. (Traditional enums are already subjected to various kinds of optimization.) Of course, I'm not the one writing the compiler 😄. |
We're not working on this in the immediate future, but when it comes up, we'll update it with thoughts. To clarify, I wasn't suggesting necessarily that they'd be nominal. I agree that making them be anonymous types would be nice to reduce a concept. |
Are there some hard problems with this design? |
This is only true for an immutable list. A mutable list would be invariant. |
Oh, now I get it... |
I've had an epiphany recently. We might be able to model unions and trait objects really cleanly using dependent types. For this approach to work, we'd need the ability to:
A Rust-style "trait object" could be modelled via a definition such as the following:
Caveat: The above definition implies that This type could be used as follows (ignoring the sizing issue):
Now, a trait object is different from an enum, because the set of members of an enum is closed. To model an enum, we'd need a way to specify that a type can only take on a value from a fixed set. I'll use the syntax To define a struct that models enums, we'll need it to accept a variadic number of types. The following definition seems plausible:
This data type can be used in the exact same manner as the
This approach could be really promising. The beautiful thing about this approach is that it doesn't require enums/unions to be a language feature. Instead, they can be defined as a library, built on top of dependent types. Note that the
The second construct is defined using the first. Obviously we can't call them both "enums", so appropriate names would need to be chosen. So to flesh out this design, we would first need to know how the enums-without-payload should work. I also haven't mentioned anything about subtyping, e.g. how/whether |
While I agree it would be really nice to implement these as purely library types, in practice I don't think we will without some compiler magic. The first is the use of dynamic type values in type annotations. I think this is well-formed (dispatch dynamic queries into the runtime type value instead of statically), but I wouldn't put it on the critical path. Also it would have to be a boxed value, so The second is that the struct has to magically implement all the methods of the trait. Our metaprogramming is not good enough to express something like this. This is where compiler magic needs to come in for now. Also, I think existentials should be spelled with The last thing is that with a first class IR representation, we can optimize dynamic dispatch away trivially. |
The same problem occurs for I can envisage a solution that works for both We can possibly integrate this idea with the notion of "dereferencing" (
For trait objects (
More generally, any value with type
|
(Below is just a brain dump. It's by no means a final verdict.) I'm starting to suspect that storing types in runtime variables, e.g. Storing types in To avoid all this complexity, we might be able to store types in runtime variables indirectly, via zero-size objects and some kind of variant/enum feature. For example:
(I'm not sure how |
Another approach to union types - which also avoids Rust's problem of overbroad errors types (e.g. any file system operation might return an It's essentially a middle ground between Typescript and Rust: The type is structural but variants are still nominal (or at least they have names): You don't have to declare a type first, but instead you can just make up tags on the fly and errors accumulate automatically (like in TypeScript), but it is still straightforward to pattern-match on tags (like in Rust). In TypeScript the way to usually discriminate between different variants of a union is to check if certain fields are present or not (which sometimes can be cumbersome and error-prone). |
Look at Haskell, F#, Rescript, no need to reinvent or copy/paste bad habbits from Rust/Typescript. |
@hardskulls as Richard points out on that thread, Roc's tags are basically the same as OCaml's polymorphic variants. So they're not necessarily anything novel. That said, polymorphic variants are a point in the design space worth considering. |
I thought he meant it looks the same, but differs in implementation. |
Summary
I propose to consider adding TypeScript-style union types to Mojo — instead of Rust-style nominal enums. This approach seems like it would work well given that Mojo plans to extend Python, which is a dynamically-typed language. In addition to enabling type declarations for Python libraries, union types would allow Mojo users to avoid the modelling issues associated with nominal enums.
Union types are strictly more general than nominal enums, and so have the potential to make enums obsolete.
Update Jan 2024: I removed a section that was too speculative, and I posted new thoughts on how enums/unions might be modelled using dependent types. This approach would supercede most of the design details discussed in this original post.
Motivation
Advantage 1: Specifying precise return types
Most statically-typed languages that implement type-safe enums use nominal typing. For example, in Rust, to define a function that can return two kinds of error, one must first define (and name) an enumeration of those error kinds:
One can then specify the return type of the error-producing function as follows:
This approach doesn't scale when different functions can return different combinations of errors. For example, maybe we have:
parseFile
(above) that produces either anIOError
or aParseError
.readFile
that only produces anIOError
.parseString
that only produces aParseError
.parseAndExecuteFile
that produces either anIOError
, aParseError
, or aRuntimeError
.The issue here is that each combination of errors requires a different enum to be declared, and each enum needs to have unique constructor names. Ultimately, this makes expressing these return types using nominal enums a nightmare. (And this problem isn't unique to error types. There are many other scenarios in which this issue occurs.)
In contrast, modelling these functions in a dynamically-typed language (such as Python) is trivial. For example, the implementation of
parseFile
will simply return whatever error object is appropriate. No up-front declaration is required.Because of this flexibility afforded by dynamically-typed languages, retroactively adding a type system to such languages has proved challenging. But in recent years, TypeScript has demonstrated that it is possible to statically-type these kinds of programs. TypeScript's trick is to offer union types, via a union operator
|
. The union operator can be used to directly express a disjunction of types. We can use the union operator to directly enumerate the specific errors that the aforementioned functions can return:For brevity, I will omit a full description of how TypeScript's union types work. For more information, check out the relevant section of the TypeScript handbook.
Advantage 2: Static type-checking (and optimization) of Python code
Given that Python is a dynamically-typed language, Python programmers mix values of different types together in myriad ways. For example, the use of heterogeneous collections is common amongst Python programmers.
Accordingly, it would be beneficial if Mojo's static semantics could have a richer understanding of Python, so that more Python code is able to "just work" when copy-pasted into Mojo — even if the code is pasted into a statically-checked
fn
.Notably, Python already supports the use of union types as part of its type annotations feature. However, Python assigns no static semantics to these annotations. In this post, I'm arguing that giving union types a rigorous static semantics will enable Mojo users to construct expressive, statically-typed code that can be compiled very efficiently.
Design
Developing a precise semantics for union types will require a large amount of work. Thankfully, TypeScript is a helpful piece of prior art. Scala 3 has union types as well. But as I explain below, I think we would want to differ in the fine details.
I'm unable to dedicate the time required to flesh out a full design proposal. However, I can skim over some potential questions and challenges that may arise:
Foo
andBar
have a common method (not a trait method), should a person with access to a value of typeFoo | Bar
be able to call it? (TypeScript allows this.)Foo
orBar
and calls a method that they have in common, then you should put that method in a protocolP
, haveFoo
andBar
implement that protocol, and then have your function accept an instance ofP
rather thanFoo | Bar
.Foo | Bar
, because you want to be able to pattern match on the argument and handle each struct differently. IfFoo
andBar
implement a common protocolP
, should the methods of that protocol be callable? This time I think the answer is yes, this should be allowed. Indeed, I relied upon this behaviour earlier, when I described how union types could obviate "trait objects". In short: it should always be possible to upcast a value of typeFoo | Bar
to a value of typeP
.Foo
andBar
were protocols (rather than structs). A common method should only be callable onFoo | Bar
if the method belongs to a protocol that bothFoo
andBar
inherit from. (This is assuming Mojo supports protocol inheritance.)type Color = Red | Green | Blue
(whereRed
,Green
, andBlue
are pre-defined struct types with no fields), there is no reason that the compiler can't encode them via the integers0
,1
and2
when storing them in variables of typeColor
and when passing them as arguments of typeColor
. A difficulty arises when the same structs are used in multiple enums, e.g.type TrafficLight = Red | Yellow | Green
. When these two types "interact" (e.g. aColor
value is used as aTrafficLight
argument), re-mapping of the tag values may need to occur, e.g.Green
gets mapped from1
to2
. Alternatively, the compiler can avoid the need for such re-mappings by monomorphizing function calls. For example, given a function with an argument of typeTrafficLight
, the compiler could generate a "special" version of the function where the tag of eachTrafficLight
color is expected to match the tag for the respectiveColor
color (i.e.Green
is encoded as1
). Call sites that provide aColor
as an argument will invoke this version of the function.__init__
methods are not invertible. Thus, we would probably need a keyword or decorator to designate that a struct is destructurable. The@value
decorator that Mojo offers could be used for this. In fact, Python's@dataclass
decorator already works this way for classes. A more general approach would be to allow structs to define their own destructuring behaviour — analogous to how__getitem__
works, but for destructuring rather than indexing.if x is Foo(a, b, c)...
(a hypothetical syntax for destructuringx
), one would writex is Foo
, which merely tests whetherx
is an instance of the structFoo
. This test would be implemented in the same efficient manner as enums, as discussed earlier.None | None = None
. How should unification be treated?Optional[T]
as follows:type Optional[T] = T | None
However, this definition suffers from the issue (?) that
Optional[None]
is justNone
, and thatOptional[Optional[Int]]
is justOptional[Int]
. This seems hard to reason about!type Optional[T] = Some[T] | None
instead (whereSome
is a pre-defined struct). It seems reasonable to impose this restriction, given that it remains strictly more expressive than traditional enums.None | None
doesn't seem so nefarious — and such unification will obviously happen during type inference.This is probably enough detail for the moment. What does the Mojo team think of this approach?
The text was updated successfully, but these errors were encountered: