Skip to content
This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

Unclear partial binding semantics #110

Open
dmoisset opened this issue Jun 28, 2020 · 25 comments
Open

Unclear partial binding semantics #110

dmoisset opened this issue Jun 28, 2020 · 25 comments
Labels
accepted Discussion leading to a final decision to include in the PEP fully pepped Issues that have been fully documented in the PEP

Comments

@dmoisset
Copy link
Collaborator

I may have missed it, but it's not clear to me at which point are locals bound to the matched value in case of a partial match failure. For example:

match [0,1]
    case [x,0]: pass
assert x == 0 # Success or fail?

What I could find is:

  • This section: "Name bindings made during a successful pattern match outlive the executed suite and can be used after the match statement". The problem above is that the pattern x succeeded, but [x, 0] failed. is there a bind?
  • There's something about the walrus pattern (named sub-pattern) which says that the name is bound if the subpattern succeeds. What about other kinds of patterns?
  • It's clear after reading the guards section that the names are bound by the time the pattern in the case clause has succeeded and before the guard is evaluated. But it's not so clear if binding can happen while the pattern is evaluating.

These things are a bit spread around. Perhaps a short "name binding semantics" could aggregate all these kinds of things together. I can help with that once I understand what's actually going on :)

@JelleZijlstra
Copy link

#96 also discusses this issue.

@dmoisset
Copy link
Collaborator Author

Sorry, I missed that. So if I understand correctly from the conversation there, "as the matching proceeds, partial subpattern matches can do bindings that will persist even if the outer pattern fails". Should something like that be added to the PEP?

@brandtbucher
Copy link
Collaborator

It is undefined. All the implementation guarantees is that:

  • If a pattern succeeds, all bindings in it happen before moving on to the next pattern.
  • Patterns are matched from left to right.

The same is true of sub-patterns in e.g. mapping, sequence, or class patterns.

As I noted in the linked issue:

It's specifically undefined so as to allow the compiler the flexibility to break the matching logic early (for example, if the length of a sequence or mapping pattern is incompatible with the value of the target).

So, for example, this is guaranteed to work:

match items:
    case (x, .x):
        print("Got a matching pair!")

I agree that this could probably be clarified in the PEP somewhere.

@dmoisset
Copy link
Collaborator Author

I see. I'm a bit surprised about it: I don't remember another part of Python where it's not clearly defined if some evaluation happens or not, and other related constructs like unpacking assignment where it's defined that you don't get any values assigned unless all are. But for the sake of separating things, I'll keep this issue related to "the decided partial binding behaviour should be explicit in the PEP"

@JelleZijlstra
Copy link

I agree with @dmoisset; I don't care much what the semantics are, but it should be clearly defined. Python isn't C and we shouldn't have undefined behavior.

I suppose in most languages with pattern matching this problem doesn't really come up because the newly defined variables are in a new, separate scope, so they're invisible after the case block ends either way.

@brandtbucher
Copy link
Collaborator

brandtbucher commented Jun 28, 2020

Hm. Realistically what this would look like is guaranteeing that no bindings happen until the whole pattern succeeds (but before the guard is evaluated). Though well-defined, I'm not sure that it is really an improvement... but my opinions on this are not very strongly held.

Honestly, I think that any code that relies on (possibly) partially-matched values isn't very well-written. But maybe there are examples that could persuade me otherwise.

@dmoisset
Copy link
Collaborator Author

A definition like that (no bindings happen until the whole pattern succeeds, but before the guard) would be ideal for me, but any other will probably be ok, because as you say code relying on that is probably bad code.

The problem with the set of variables assigned being undefined, from a user standpoint, is that I can not rely on those variables being set, but also I have to be careful about how I use those names, because they might be clobbered anyway. And that could change from python3.9.1 to python 3.9.2. The second problem is that cpython will have some behaviour (doesn't matter which), and someone will rely on it, and then other interpreters like pypy will have to implement the same because they want to be compatible, no matter what fun optimisations they can come up with :)

Sidenote: I just realised that I lied about unpacking... even if a,b=seq assigns all or nothing, a, (b,c) = [2, [3,4,5] binds only a

@gvanrossum
Copy link
Owner

Well, the set of variables that may be clobbered is unambiguous. The reason we don’t want to specify the exact rules is that they are complex, and small changes to a pattern may drastically change the behavior. (E.g. adding a * to a sequence pattern.) So users are better off not counting on variables being clobbered or not.

@gvanrossum
Copy link
Owner

So if I understand correctly from the conversation there, "as the matching proceeds, partial subpattern matches can do bindings that will persist even if the outer pattern fails". Should something like that be added to the PEP?

That's a reasonable thing to add to the PEP. It doesn't commit us to trying subpatterns in a particular order (though other sections may) and it doesn't state the bindings must be made. (Maybe change "can" to "may"?)

@gvanrossum gvanrossum added the needs more pep An issue which needs to be documented in the PEP label Jun 29, 2020
@Tobias-Kohn
Copy link
Collaborator

Just to give two examples to illustrate that the rules are not quite as easy as one might expect.

  • If we have guards as in case (x, y) if x == y:, we really want to make sure that the variables are already defined before the guard is being tested, as it would become quickly pointless to have guards if they could not access the data. But this means that the variables need to be assigned before Python decides whether the pattern fails or not.

  • We currently support two kinds of assignments, as illustrated by, e.g., case Line(start := Point(x, y), end := Point(0, _)):. Now, let's assume that this line does not end on the y-axis ((0, _)), which variables would we bind? In order to stay consistent, the walrus operator would bind start, before the fail is discovered, whereas x and y would not be bound because they follow slightly different semantics...?

I feel this is also slightly related to the scoping issue #17, with the question whether captured variables would be limited in scope to the case handler.

@dmoisset
Copy link
Collaborator Author

@Tobias-Kohn , the two examples you mention are explicitly covered in the PEP already: guards specify that the binding on the pattern is made before the condition is evaluated. The walrus says that the name is bound when the subpattern attached to it succeeds.

@dmoisset
Copy link
Collaborator Author

ok, I see if I can submit a rephrase later today

@dmoisset
Copy link
Collaborator Author

dmoisset commented Jun 30, 2020

I've added a sample PR (note that this is against my personal fork of python/peps) for discussion and comments. If people here like this phrasing, I can submit this upstream: dmoisset/peps#3

@dmoisset
Copy link
Collaborator Author

dmoisset commented Jul 1, 2020

Just as a note, I haven't moved the PR to python/peps yet because I'm sorting out some legal approval I require from my employer to contribute to open source and sign the CLA. (it shouldn't be a problem, other Python core devs already have gone through this same process and are helping me with this, but it may take a couple of days until they clear that)

@dmoisset
Copy link
Collaborator Author

dmoisset commented Jul 2, 2020

Sorted the legal approval, and pushed to PEPS repo

@gvanrossum gvanrossum added accepted Discussion leading to a final decision to include in the PEP fully pepped Issues that have been fully documented in the PEP and removed needs more pep An issue which needs to be documented in the PEP labels Jul 2, 2020
@gvanrossum
Copy link
Owner

gvanrossum commented Jul 2, 2020

Landed as python/peps@60a15e8

@markshannon
Copy link

See python/cpython#22917 (comment)

I see no justification for sloppy half matching, leaving unmatched variables assigned.
Either match a pattern and assign all the variables, or don't match a pattern and assign none.

@gvanrossum
Copy link
Owner

Moving the discussion here.

The way I see it, a case that isn't a match may still need to assign variables in certain situations, e.g.:

case x if x > 0:
    ...

In this situation I definitely want the semantics to be that a plain local variable x is bound to the subject, and then the guard is evaluated. If the guard is falsey, should we erase x, like we do for except E as e:? I prefer not; the code for that is a bit unpleasant (and the reason is to avoid a GC cycle from frame -> exception -> traceback -> frame, which is an important case to avoid -- but I don't think the same consideration exists here).

If we were to erase the variable again when the guard is false, we should make sure that's also done when the guard raises an exception (like the except..as code does).

Anyway, the simplest thing would be to leave x bound in this case, and if we're okay with that I think leaving it potentially bound in other cases is acceptable too.

I would also like to be able to step through pattern matching code (or at least through the guard) in a debugger and observe the variables.

Finally it's not like we need to avoid stomping on a pre-existing x -- that pre-existing x will be stomped on anyway if the case succeeds.

That said I would not object if you contributed a patch implementing the semantics you prefer (in the context of Brandt's implementation), and in that case I'd be happy to amend the PEP to require this behavior. But my preference would be to keep the PEP as is and implement something better in the future. I don't want to have to spend an inordinate amount of effort creating an impeccable implementation, I just want pattern matching to land in 3.10. That kind of iteration is what we do for most feature development.

(So, to be clear, I don't think this wins a beauty prize, but I also don't find the behavior allowed by the current proposal objectionable enough to disallow it.)

@markshannon
Copy link

You don't need to erase variables.
The values can remain on the stack until the guard passes.
Your example could be compiled as:

DUP_TOP
LOAD_CONST 0
COMPARE_OP <
POP_JUMP_IF_FALSE next_case
STORE_FAST x
# body of case goes here
JUMP done
next_case:

@Tobias-Kohn
Copy link
Collaborator

Incidentally, your code is buggy, of course, as it still leaves the value of x on the stack if the guard does not succeed.

More to the point, it is certainly not done by compiling the simplest possible example of a guard. Yes, it could all be done by leaving values on the stack, but in order to be usable in general we would also have to introduce some new bytecode instructions to fetch a specific values from the stack or store a value in the stack (and not necessarily on the top). So, in short in order for this to work in practice, we need enough power to establish basically an activation frame on the stack. All of this is not too difficult, of course, but it is not just easily solved with the existing infrastructure present in the interpreter.

At this point, I would claim that it might even make more sense to introduce "shadow" local variables to hold intermediate values. Which, again, requires to modify the interpreter itself.

@brandtbucher
Copy link
Collaborator

brandtbucher commented Nov 3, 2020

Incidentally, your code is buggy, of course, as it still leaves the value of x on the stack if the guard does not succeed.

Well, presumably the next case will use it. :)

At this point, I would claim that it might even make more sense to introduce "shadow" local variables to hold intermediate values. Which, again, requires to modify the interpreter itself.

Just brainstorming here.

I think we could maybe make this work by stealing a trick from comprehensions and generator expressions. Really, we only need something like two new instructions:

DELETE_TEMP_NAMES: Clear all fast locals whose names start with a dot.
STORE_TEMP_NAMES: Move all non-null fast locals whose names start with a dot to their corresponding name bindings.

Then, Mark's disassembly becomes:

STORE_FAST        .x
LOAD_FAST         .x
LOAD_CONST        0
COMPARE_OP        <
POP_JUMP_IF_FALSE next_case
STORE_TEMP_NAMES
...
JUMP              done
next_case:
DELETE_TEMP_NAMES
...
done:
DELETE_TEMP_NAMES

This is nice in that it gives us many of the benefits of a register VM without any real architectural modification. I've already been toying around with similar ideas for caching other information (like the lengths of sequences and mappings), and the results are very promising. The only big problem that I see is that it limits the introspection capabilities of guards.

Open question: whether these names, if bound, are present in locals(). Precedent says yes:

>>> [locals() for i in [42]]
[{'.0': <tuple_iterator object at 0x7f7fb433c430>, 'i': 42}]

@brandtbucher
Copy link
Collaborator

brandtbucher commented Nov 3, 2020

And just to clarify: we wouldn't need the two new instructions. It's just that managing the temporary namespace using LOAD_FAST/STORE_*/DELETE_FAST on the stack is likely to be much too painful, complex, and slow, given the wide range of possible partial bindings.

@Tobias-Kohn
Copy link
Collaborator

Yes, I thought along the same lines. However, I based my answer on the notion that these shadow variables would be completely hidden and inaccessible (like values on the stack), which then led to the idea that we would have to modify the code objects to account for these hidden variables. But using names like '.x' would work, too :), and as you say, would probably be a rather nice solution.

@dmoisset
Copy link
Collaborator Author

dmoisset commented Nov 3, 2020

How does this tie with exception handling? I'm worried that using .name will leave the fast locals polluted and leaking references. It's different than comprehensions where the function automatically pops out and takes the frame with it, so you can get away with this trick.

@Tobias-Kohn
Copy link
Collaborator

Good point. So we would also need something along the lines of a finally block.

Bottom line: although it is possible to implement that semantic, it is not that simple and straight forward. Hence, we might want to concentrate on more important issues for the time being and leave this to a later version as Guido has indicated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
accepted Discussion leading to a final decision to include in the PEP fully pepped Issues that have been fully documented in the PEP
Projects
None yet
Development

No branches or pull requests

6 participants