[red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order #13722

AlexWaygood · 2024-10-11T22:08:08Z

Summary

A Python class's "Method Resolution Order" ("MRO") is the order in which superclasses of that class are traversed by the Python runtime when searching for an attribute (which includes methods) on that class. Accurately inferring a class's MRO is essential for a type checker if it is going to be able to accurately lookup the types of attributes and methods accessed on that class (or instances of that class).

For simple classes, the MRO (which is accessible at runtime via the __mro__ attribute on a class) is simple:

>>> object.__mro__
(<class 'object'>,)
>>> class Foo: pass
... 
>>> Foo.__mro__  # classes without explicit bases implicitly inherit from `object`
(<class '__main__.Foo'>, <class 'object'>)
>>> class Bar(Foo): pass
... 
>>> Bar.__mro__
(<class '__main__.Bar'>, <class '__main__.Foo'>, <class 'object'>)

For more complex classes that use multiple inheritance, things can get a bit more complicated, however:

>>> class Foo: pass
... 
>>> class Bar(Foo): pass
... 
>>> class Baz(Foo): pass
... 
>>> class Spam(Bar, Baz): pass
... 
>>> Spam.__mro__  # no class ever appears more than once in an `__mro__`
(<class '__main__.Spam'>, <class '__main__.Bar'>, <class '__main__.Baz'>, <class '__main__.Foo'>, <class 'object'>)

And for some classes, Python realises that it cannot determine which order the superclasses should be positioned in order to create the MRO at class creation time:

>>> class Foo(object, int): pass
... 
Traceback (most recent call last):
  File "<python-input-12>", line 1, in <module>
    class Foo(object, int): pass
TypeError: Cannot create a consistent method resolution order (MRO) for bases object, int
>>> class A: pass
... 
>>> class B: pass
... 
>>> class C(A, B): pass
... 
>>> class D(B, A): pass
... 
>>> class E(C, D): pass
... 
Traceback (most recent call last):
  File "<python-input-17>", line 1, in <module>
    class E(C, D): pass
TypeError: Cannot create a consistent method resolution order (MRO) for bases A, B

The algorithm Python uses at runtime to determine what a class's MRO should be is known as the C3 linearisation algorithm. An in-depth description of the motivations and details of the algorithm can be found in this article in the Python docs. The article is quite old, however, and the algorithm given at the bottom of the page is written in Python 2. As part of working on this PR, I translated the algorithm first into Python 3 (see this gist), and then into Rust (the c3_merge function in mro.rs in this PR). In order for us to correctly infer a class's MRO, we need our own implementation of the C3 linearisation algorithm.

The C3 linearisation algorithm isn't too complicated by itself. However, there are additional complexities for a type checker when it comes to calculating a class's MRO. The C3 linearisation algorithm calculates an MRO from a list of bases given to it: but what if one of the bases of a class is inferred as a Union type by red-knot? In this situation, there will be multiple possible lists of bases, and therefore potentially multiple possible MROs, for that single class. For this reason, mypy and pyright both reject classes that have an object in their bases list which is inferred as a Union type, as it adds a fair amount of complexity to the semantic model. However, this PR nonetheless attempts to support such cases, for several reasons:

As long as all elements in the union are valid class bases, it's perfectly type-safe. We should, in general, try not to emit false positives on valid code.
It's nice to support more things than mypy and pyright do; we should attempt to provide added value over these existing type checkers that goes beyond simply being faster
Doing multi-version checking will probably be difficult without this feature due to the way that typeshed branches on sys.version_info conditions, which means that we will likely often infer a union for an object used as a class base.

As well as having to adapt the C3 linearisation algorithm to account for the possibility that a class might have a Union in its bases, I also had to make some changes to account for the fact that a class might have a dynamic type in its bases: in our current model, we have three dynamic types, which are Unknown, Any and Todo. This PR takes the simple approach of deciding that the MRO of Any is [Any, object], the MRO of Unknown is [Unknown, object], and the MRO of Todo is [Todo, object]; other than that, they are not treated particularly specially by the C3 linearisation algorithm. Other than simplicity, this has a few advantages:

It matches the runtime:

>>> from typing import Any
>>> Any.__mro__
(typing.Any, <class 'object'>)

It means that they behave just like any other class base in Python: an invariant upheld by all other class bases in Python is that they all inherit from object.

Dynamic types will have to be treated specially when it comes to attribute and method access from these types; however, that is for a separate PR.

Implementation strategy

The implementation is encapsulated in a new red_knot_python_semantic submodule, types/mro.rs. ClassType::mro_possibilities returns a HashSet of possible MROs that the class could have given its bases, and this calls out to functions in the mro.rs submodule. If there are no union types in the bases list (or in any of the bases of those bases), then there will be exactly one possible MRO for the class; but if there are any union types in the class's bases or the bases of those bases (etc.), then the class could have one of several possible MROs. At each stage, the implementation tries to avoid forking if possible. I've attempted to optimise the implementation based on the assumption that most classes will not have union types in their bases (or the bases of their bases, etc.); there are various fast paths that are predicated on this.

It's necessary for us to emit a diagnostic if we can determine that a particular list of bases will (or could) cause a TypeError to be raised at runtime due to an unresolvable MRO. However, we can't do this while creating the ClassType and storing it in self.types.declarations in infer.rs, as we need to know the bases of the class in order to determine its MRO, and some of the bases may be deferred. This PR therefore iterates over all classes in a certain scope after all types (including deferred types) have been inferred, as part of TypeInferenceBuilder::infer_region_scope. For types that will (or could) raise an exception due to an invalid MRO, we infer the MRO as being [<class in question>, Unknown, object] as well as emitting the diagnostic.

For discussion

There's a performance regression of around 4% on the codspeed incremental benchmark. I've tried to get it down a bit, and it has gone down a bit, but I'm out of ideas for how to reduce it further. Open to suggestions :-)

Test Plan

Beautiful Markdown tests added using @carljm's new test framework. Several tests taken from https://docs.python.org/3/howto/mro.html#python-2-3-mro.

codspeed-hq · 2024-10-12T18:00:20Z

CodSpeed Performance Report

Merging #13722 will degrade performances by 4.05%

_{Comparing alex/mro (953f6f8) with main (04b636c)}

Summary

❌ 1 regressions
✅ 31 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`alex/mro`	Change
❌	`red_knot_check_file[incremental]`	4.5 ms	4.7 ms	-4.05%

AlexWaygood · 2024-10-12T20:52:47Z

@MichaReiser I've updated the PR summary and taken the PR out of draft mode :-)

AlexWaygood · 2024-10-13T13:31:36Z

crates/red_knot_python_semantic/resources/mdtest/mro.md

+This is "ex_2" from <https://docs.python.org/3/howto/mro.html#the-end>
+
+```py
+class O:


I'm not a massive fan of using O (that's a capital-O letter!) as a variable name here, because it's so easy to misread it as 0 (the digit zero). However, I experimented with using a different variable name, and I found it even more confusing, because it was very hard to remember which class it corresponded to in the ex_2 example in https://docs.python.org/3/howto/mro.html#the-end. Unfortunately that article uses O as the class name.

MichaReiser · 2024-10-14T12:35:35Z

@AlexWaygood do you have an understanding of what's causing the perf regression?

AlexWaygood · 2024-10-14T12:57:27Z

@AlexWaygood do you have an understanding of what's causing the perf regression?

Not fully. The CodSpeed flamegraph isn't very informative because of all the Salsa frames.

The fact that the incremental benchmark regresses much more than the cold benchmark makes me think that it's something to do with the method having the #[salsa::tracked] attribute on it: these are reasonably large objects I'm storing in the Salsa cache. I tried to reduce the size of the objects stored in the cache by doing something like this:

diff --git a/crates/red_knot_python_semantic/src/types/mro.rs b/crates/red_knot_python_semantic/src/types/mro.rs
index b32faf71a..ad4b6ea73 100644
--- a/crates/red_knot_python_semantic/src/types/mro.rs
    b/crates/red_knot_python_semantic/src/types/mro.rs
@@ -20,19  20,19 @@ pub(super) enum MroPossibilities<'db> {
     /// There are multiple possible `__mro__` values for this class, but they would
     /// all lead to the class being successfully created. Here are the different
     /// possibilities:
-    MultipleSuccesses(FxHashSet<Mro<'db>>),
     MultipleSuccesses(Box<[Mro<'db>]>),
 
     /// It can be statically determined that the `__mro__` possibilities for this class
     /// (possibly one, possibly many) always fail. Here are the various possible
     /// bases that all lead to class creation failing:
-    CertainFailure(FxHashSet<BasesList<'db>>),
     CertainFailure(Box<[BasesList<'db>]>),
 
     /// There are multiple possible `__mro__`s for this class. Some of these
     /// possibilities result in the class being successfully created; some of them
     /// result in class creation failure.
     PossibleSuccess {
-        possible_mros: FxHashSet<Mro<'db>>,
-        failure_cases: FxHashSet<BasesList<'db>>,
         possible_mros: Box<[Mro<'db>]>,
         failure_cases: Box<[BasesList<'db>]>,
     },

but unfortunately converting the HashSets into boxed slices seemed to cause a slowdown rather than a speedup; the cost of the conversion and new allocation was worse than any speed gained by storing smaller objects in the cache. Defining the Mro type as struct Mro<'db>(Box<[ClassBase<'db>]>); rather than struct Mro<'db>(VecDeque<ClassBase<'db>>); (which it was in an earlier iteration) did provide a significant speedup, however.

The algorithm is in itself also inherently nontrivial (even with the fast paths I added for the common cases), and we have to call it for every class definition we encounter. There's quite a lot of allocation currently in some places: I think some of this is unfortunately unavoidable, but fork_bases and add_next_base could probably be improved somewhat.

MichaReiser

Thanks for the excellent write up.

This code is indeed complicated. I don't think I fully understand what's going on and supporting multiple MROs also results in a fair amount of allocations and hashing all over the place (both are expensive). That's why I'm inclined to not support union MROs unless we have common use cases demonstrating why it's needed. I do fear that this requires making a decision on sys.version_info.
I think we should prioritise our decision on checking multiple python versions because my first reaction on supporting unions was to defer the implementation (and all downstream complexity) until we have real use cases where it matters. But you then mentioned sys.version_info support...

I do feel like there's a lot of boilerplate going on with many intermediate types. I would have to take a deeper look to understand if it's possible to simplify it some way but I noted that I struggled to connect all the dots and would probably also struggle to make changes to the implementation. I'm not sure if there's any simplification potential but it might be worth going over the implementation again to see if we can remove some of the many intermediate structs.

crates/red_knot_python_semantic/src/types.rs

MichaReiser · 2024-10-14T12:53:40Z

crates/red_knot_python_semantic/src/types.rs

+ ///
+ /// [method resolution order]: https://docs.python.org/3/glossary.html#term-method-resolution-order
+ #[salsa::tracked(return_ref)]
+ fn mro_possibilities(self, db: &'db dyn Db) -> MroPossibilities<'db> {


Do we expect this method to be called frequently for it to be worth to track as a separate salsa query?

Could you add an example where the resolved MRO has a cycle.

I expect it to be called very frequently. We need to iterate over the MRO whenever we resolve the type of an attribute or method access (explicit or implict) on a class or an instance of a class. This PR already starts using it in ClassType::inherited_class_member, which currently (incorrectly) only iterates over a class's bases rather than over a class's entire MRO.

Could you add an example where the resolved MRO has a cycle.

Hmmm... what are you thinking of exactly? An MRO that requires an import cycle to be resolved?

what are you thinking of exactly? An MRO that requires an import cycle to be resolved?

No, an actual cycle in the MRO, i.e. what you'd get if you had class str(str): ... in a type stub. Clearly we can't allow this and have to error on it as an invalid MRO, but it would be good to have a test showing that we do this, rather than getting stuck in an infinite loop building the MRO.

Ahh, got it, thanks.

MichaReiser · 2024-10-14T12:55:38Z

crates/red_knot_python_semantic/src/types.rs

+ pub fn bases(&self, db: &'db dyn Db) -> Box<[Type<'db>]> {
+ let base_nodes = self.node(db).bases();
+ let object = KnownClass::Object.to_class(db);
+
+ if Type::Class(*self) == object {
+ // Uniquely among Python classes, `object` has no bases:
+ //
+ // ```pycon
+ // >>> object.__bases__
+ // ()
+ // ```
+ Box::default()
+ } else if base_nodes.is_empty() {
+ // All Python classes that have an empty bases list in their AST
+ // implicitly inherit from `object`:
+ Box::new([object])
+ } else {
+ base_nodes
+ .iter()
+ .map(move |base_expr| definition_expression_ty(db, self.definition(db), base_expr))
+ .collect()
+ }
 }


I suggest that we

implement this as an Iterator to avoid allocating for every class if the code is in the hot path (probably?)

return an impl Iterator and always returning a Vec over a Box<[]> (boxed slices are mainly useful when storing in another struct it doesn't matter much when the variable is stored on the stack) when the code isn't performance sensitive.

MichaReiser · 2024-10-14T13:00:15Z

crates/red_knot_python_semantic/src/types.rs

+ let member = base.own_class_member(db, name);
+ if !member.is_unbound() {
+ union_builder = union_builder.add(member);
+ continue 'outer;


Nit: Isn't this the same as using break?

Suggested change

continue 'outer;

break;

No, we need to skip the union_builder = union_builder.add(Type::Unbound); line which is called after the termination of the inner loop. We wouldn't skip that line if we used break.

crates/red_knot_python_semantic/src/types/infer.rs

crates/red_knot_python_semantic/src/types/mro.rs

MichaReiser · 2024-10-14T13:45:38Z

crates/red_knot_python_semantic/src/types/mro.rs

+ let mut mro = Vec::with_capacity(8);
+
+ loop {
+ sequences.retain(|sequence| !sequence.is_empty());


Does the sequence order matter? It might otherwise be faster to use swap_remove (and remove the element when the pop_front call poped the last element)

yes, the sequence order is crucial here

MichaReiser · 2024-10-14T13:47:34Z

crates/red_knot_python_semantic/src/types/mro.rs

+ // Make sure we don't try to add the candidate to the MRO twice:
+ for sequence in &mut sequences {
+ if sequence[0] == mro_entry {
+ sequence.pop_front();


It seems we're always poping from the front. Would it instead be possible to use regular Vecs and reverse the elements (and poping from the end)?

Would you expect that to be more performant than using a VecDeque here?

Probably not by much but a Vec is simpler than a VecDeque. It also simplifies the method signature by requiring a less specific type (requiring a Vec is more common)

I'll try this out and see if it's workable. A downside is that the implementation will become less recognisably a reimplementation of the algorithm given in the Python docs, so it'll be less self-evident that the function does the correct thing. But I've already made many changes in the process of rewriting it in Python 3, then Rust, so perhaps that's already lost

crates/red_knot_python_semantic/src/types/mro.rs

carljm · 2024-10-16T12:43:37Z

I'm not doing a detailed review of this yet because my understanding is we need to come to agreement on multi-version checking before we are ready to land this with union support. Let me know if that's wrong.

AlexWaygood · 2024-10-16T12:45:22Z

I'm not doing a detailed review of this yet because my understanding is we need to come to agreement on multi-version checking before we are ready to land this with union support. Let me know if that's wrong.

That, and also @MichaReiser's review made me look again and realise that I can make some of this a fair bit more efficient and easier to understand with a rewrite :-) So yes, not much use looking at it right now.

MichaReiser · 2024-10-16T13:08:08Z

We could also consider splitting the PR so that we can land a "simple" version now. That would also give us a nice "diff" to assess the extra complexity

carljm · 2024-10-18T19:26:48Z

It occurred to me today that if we do want to go forward with supporting multi-version-checking and thus unions in class bases, the better representation for that would probably be to have the single class statement result in a union of Type::Class, one for each possible MRO. Having a single ClassType with multiple MRO will I think result in us having to effectively re-implement a lot of union handling inside methods of ClassType.

The trick here is that we can't really do this easily in stub files, where type inference of class bases can be cyclic and must be deferred :/ I wonder if the right fix for that is to not actually defer resolution of class bases in stub files, but only defer resolution of generic parameters within class bases; this would be sufficient to handle the actual cases that we need to support (like class str(Sequence[str]) -- we don't actually need to support class str(str):, because it's meaningless.)

AlexWaygood added 19 commits October 11, 2024 13:06

Add a failing test

80741a4

Implement the easy case: no bases

48cf93d

Implement the special case for object

8292a8e

implement inheritance from single base

916f11e

Make it work for multiple bases

035ef34

Add tests for complex diamond inheritance

203dc17

Add tests for inheritance from Unknown

adaf3bd

Better docs and minor cleanup

15fe706

Use mro() in ClassType::inherited_class_member()

7e722cb

Badly optimized forking version working!

74433ae

Add test for inheritance from Union

e8038ed

Add test for multiple inheritance with multiple unions

6308b2a

Add Salsa caching

e32ff53

minor cleanups

ad4305a

optimize

e97b0b8

optimize, make submodule

3d8aae9

cleanup

3fa55f6

encapsulate and optimize

1151e08

more docs and cleanup

7189224

AlexWaygood added the red-knot Multi-file analysis & type inference label Oct 11, 2024

AlexWaygood changed the title ~~Add a failing test~~ [WIP] [red-knot] Implement C3 linearisation for calculating a Method Resolution Order Oct 11, 2024

AlexWaygood changed the title ~~[WIP] [red-knot] Implement C3 linearisation for calculating a Method Resolution Order~~ [WIP] [red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order Oct 11, 2024

AlexWaygood force-pushed the alex/mro branch from 56b0c7a to 76f6285 Compare October 11, 2024 22:09

nits

d302044

AlexWaygood force-pushed the alex/mro branch from 76f6285 to d302044 Compare October 11, 2024 22:11

Document the tests more

dacd024

AlexWaygood force-pushed the alex/mro branch 3 times, most recently from 8152f52 to c9a0e1f Compare October 11, 2024 22:29

Make the C3 implementation more idiomatic

ea9bc56

AlexWaygood added 2 commits October 12, 2024 17:55

Add tests for mros that error involving unions

903f5b2

Emit diagnostics on classes with invalid MROs

d4876a1

AlexWaygood added 3 commits October 12, 2024 19:04

more comments

5334f71

Switch internal layout of Mro type

7a7e852

optimize

b1f9979

AlexWaygood force-pushed the alex/mro branch 2 times, most recently from 674a98b to d6db5d9 Compare October 12, 2024 18:53

Try to reduce cloning

60bd2d3

AlexWaygood force-pushed the alex/mro branch 2 times, most recently from f735ce0 to 6895e75 Compare October 12, 2024 19:44

rework internals of Mro again

90a7fe8

AlexWaygood force-pushed the alex/mro branch from 6895e75 to 90a7fe8 Compare October 12, 2024 19:55

AlexWaygood marked this pull request as ready for review October 12, 2024 20:47

AlexWaygood requested review from carljm and MichaReiser as code owners October 12, 2024 20:47

AlexWaygood changed the title ~~[WIP] [red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order~~ [red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order Oct 12, 2024

AlexWaygood commented Oct 13, 2024

View reviewed changes

more docs

d9315c1

AlexWaygood force-pushed the alex/mro branch from dd3962c to d9315c1 Compare October 13, 2024 16:36

MichaReiser reviewed Oct 14, 2024

View reviewed changes

Many trivial simplifications and review comments

953f6f8

AlexWaygood force-pushed the alex/mro branch from cef2f23 to 953f6f8 Compare October 14, 2024 16:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order #13722

[red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order #13722

AlexWaygood commented Oct 11, 2024 •

edited

Loading

codspeed-hq bot commented Oct 12, 2024 •

edited

Loading

AlexWaygood commented Oct 12, 2024

AlexWaygood Oct 13, 2024 •

edited

Loading

MichaReiser commented Oct 14, 2024

AlexWaygood commented Oct 14, 2024

MichaReiser left a comment •

edited by AlexWaygood

Loading

MichaReiser Oct 14, 2024

AlexWaygood Oct 14, 2024 •

edited

Loading

AlexWaygood Oct 14, 2024

carljm Oct 14, 2024

AlexWaygood Oct 14, 2024

MichaReiser Oct 14, 2024

MichaReiser Oct 14, 2024

AlexWaygood Oct 14, 2024

MichaReiser Oct 14, 2024

AlexWaygood Oct 14, 2024

MichaReiser Oct 14, 2024

AlexWaygood Oct 14, 2024

MichaReiser Oct 14, 2024

AlexWaygood Oct 14, 2024 •

edited

Loading

carljm commented Oct 16, 2024

AlexWaygood commented Oct 16, 2024

MichaReiser commented Oct 16, 2024

carljm commented Oct 18, 2024 •

edited

Loading

[red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order #13722

Are you sure you want to change the base?

[red-knot] Implement C3 linearisation for calculating a class's Method Resolution Order #13722

Conversation

AlexWaygood commented Oct 11, 2024 • edited Loading

Summary

Implementation strategy

For discussion

Test Plan

codspeed-hq bot commented Oct 12, 2024 • edited Loading

Merging #13722 will degrade performances by 4.05%

Summary

Benchmarks breakdown

AlexWaygood commented Oct 12, 2024

AlexWaygood Oct 13, 2024 • edited Loading

Choose a reason for hiding this comment

MichaReiser commented Oct 14, 2024

AlexWaygood commented Oct 14, 2024

MichaReiser left a comment • edited by AlexWaygood Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood Oct 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood Oct 14, 2024 • edited Loading

Choose a reason for hiding this comment

carljm commented Oct 16, 2024

AlexWaygood commented Oct 16, 2024

MichaReiser commented Oct 16, 2024

carljm commented Oct 18, 2024 • edited Loading

AlexWaygood commented Oct 11, 2024 •

edited

Loading

codspeed-hq bot commented Oct 12, 2024 •

edited

Loading

AlexWaygood Oct 13, 2024 •

edited

Loading

MichaReiser left a comment •

edited by AlexWaygood

Loading

AlexWaygood Oct 14, 2024 •

edited

Loading

AlexWaygood Oct 14, 2024 •

edited

Loading

carljm commented Oct 18, 2024 •

edited

Loading