Skip to content
This repository has been archived by the owner on Nov 21, 2022. It is now read-only.

An alternative '__match__' protocol can be better in simplicity, efficiency, and expressivity #100

Closed
thautwarm opened this issue Jun 25, 2020 · 5 comments

Comments

@thautwarm
Copy link

thautwarm commented Jun 25, 2020

Guido said my blog post is too long, that's because my proposal for an alternative __match__ protocol took most of the pages...

Let me firstly introduce the alternative __match__ protocol, by showing how it works in the match statements.

  • RULE: no args

    match v: case C() matches, iff

    C.__match__(v, 0, ()) is ().

  • RULE: only positional args

    match v: case C(a1, a2, ..., an) matches, iff

    C.__match__(v, n, ()) matches a sequence pattern (a1, ..., an).

  • RULE: only keyword args

    match v: case C(k1=kp1, ..., kn=kpn) matches, iff

    C.__match__(v, 0, ('k1', ..., 'kn')) matches a tuple pattern (kp1, ..., kpn).

  • RULE: positional keyword

    match v: case C(a1, ..., am, k1=kp1, ..., kn=kpn) matches, iff

    C.__match__(v, m, ('k1', ..., 'kn')) matches a tuple pattern (a1, ..., am, kp1, ..., kpn)

In fact, above 4 rules are consistent, and can be unified into one.

I separated them into 4, to show this is FULL-FEATURED, which means it can cover the functionalities of the __match__ protocol presented in PEP 622.

The advantage of this alternative design will make the custom patterns

  • simpler to write and understand, and
  • more efficient in runtime, and
  • more expressive

Simplicity of the alternative __match__ protocol

Avoiding __match_args__ already brings simplicity.

Besides, as the PEP says, InRange shall be made with custom patterns,
we could consider its implementation.

After reading the PEP, I got to know how to implement this:

class Check:
    def __init__(self, f):
        self.f = f
    def __eq__(self, o):
        return self.f(o)

class InRangeProxy:
    def __init__(self, lo_check, hi_check):
        self.lo_check, self.hi_check = lo_check, hi_check
    
class InRange:
    __match_args__ = ['lo_check', 'hi_check']

    @classmethod    
    def __match__(self, instance):
        return InRangeProxy(
            Check(lambda lo: lo <= instance),
            Check(lambda hi: instance < hi)
        )

match 5:
    case InRange(0, 10):
        ...

However, with the protocol proposed in this article:

class Check:
    def __init__(self, f):
        self.f = f
    def __eq__(self, o):
        return self.f(o)

class InRange:
    def __match__(self, instance, argcount: int, kwargs: Tuple[str, ...]):
       assert not kwargs
       match argcount:
           case 1: return (Check(lambda hi: instance < hi), )
           case 2: return (Check(lambda lo: lo <= instance),
                           Check(lambda hi: instance < hi))
           case _: raise ImpossibleMatch("...")

match 5:
    case InRange(6, 10): # in 'range(0, 10)'
        ...
    case InRange(6): # in 'range(6)'
        ...

Works exactly the same.

We will get the following benefits:

  • we avoid introducing a __match_args__, and its relatively complex rules for __match__.
  • we avoid introducing a proxy type
  • the return of __match__ has a shape quite similar to the pattern(intuitive).

The alternative __match__ protocol is efficient

Using a proxy type can result in more allocations, but this is not the most severe drawbacks.

The most crucial 2 drawbacks of current __match__ protocol are:

  1. it cannot verify the validation of argument count/keyword names earlier.
  2. __match_args__ produces an indirect lookup.

For the first point:

match expr:
    case C1(a=C2(...), b=C3(...), c=C4(...)):
        ...

Now, tell me, what if destructuring C1 should only accept keyword arguments a and b?

Currently, should we have to compute the matching for C2(...), C3(...), then find c=C4(...) invalid.

This is inefficient.

Also, when it comes to the positional arguments, currently we can limitedly check if the argument count is valid, by checking the count of sub-patterns with len(__match_args__). However our proposal __match__ protocol allows more flexible checks. See Better Expressivity for more details.

Anyway, with our proposal __match__ protocol, we can fail a match in proper time, to avoid redundant computations.

For the second point, indirect lookups can be expensive for tiny operations:

In [1]: x = [1, 2, 3]

In [2]: y = ["aaa", "bbb", "aba"]


In [3]: class S:
   ...:     def __init__(self, kws, args):
   ...:          for kw, arg in zip(kws, args):
   ...:              setattr(self, kw, arg)
   ...:          self._kws = kws
 
In [4]: s = S(y, x)

In [5]: %timeit x[1]
73.1 ns ± 1.54 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [6]: %timeit getattr(s, s._kws[1])
296 ns ± 8.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

With compiler tricks, we can make getattr(s, s._kws[1]) faster, but it shall be still far slower than x[1].

Drawbacks of this proposal: when a proxy type is not needed, this alternative __match__ may be slower, as a tuple is always allocated.

The alternative __match__ protocol results in better expressivity

So far, the design for patterns [] is generic, which means Sequence instead of list, which can be non-intuitive in some degree.

With this alternative __match__ protocol, * in class pattern is automatically supported:

Like Sequence(a1, a2, *aseq, an), we need to do nothing other than implementing tuple patterns and this __match__ protocol.

This is sufficient to support Sequence interface, and can be implemented with our proposal __match__ protocol.

It's considered useful to support arbitrary number of positional arguments for our custom patterns:

  • We have many kinds of data structure interfaces, other than only Sequence and Mapping.
  • We can then leave list literals([*_]) as-is, I mean this should only matches a list instead of any Sequences.
@thautwarm
Copy link
Author

Default Implementation of __match__ for object

def __match__(cls, argcount: int, kwargs: Tuple[str, ...]):
      if argcount:
         raise ImpossibleMatchError(f"{cls} pattern expects 0 positional arguments, got {argcount}.")

      if not kwargs:
         return ()

      subs = []
      subs_append = subs.append
      undef = object()

      for kwarg in kwargs:
           sub = getattr(cls, kwargs, undef)
           if sub is undef:
                  raise AttributeError(kwarg)
           subs_append(sub)

      return tuple(subs)
      

thautwarm added a commit to thautwarm/cpython that referenced this issue Jun 25, 2020
1. interpreter part:
    add 'MATCH_CALL' instruction to use protocol by
       gvanrossum/patma#100

   * stack effect of 'MATCH_CALL' is -3.

2. compiler part:
    'C(arg1, ..., argm, kw1=p1, ..., kwn=pn)'
     emits

     <compile(C)>
     LOAD_CONST m
     LOAD_CONST ("kw1", ..., "kwn")
     MATCH_CALL

3. Syntax part:
   avoid adding dedicated pattern grammar:
   "case" pattern=star_expressions guard=guard? ':' body=block
@thautwarm
Copy link
Author

This is now implemented in thautwarm/patma.

@thautwarm
Copy link
Author

class X:
    @classmethod
    def __match__(self, inst, argc, kwargs):
        return inst, argc, kwargs

match (1, 2):
    case X(!a, !b, !c), 2:
        print(a, b, c)

1 3 ()

I temporarily use ! to indicate store context, don't mind this.

@gvanrossum
Copy link
Owner

This exact proposal was considered and eventually rejected. Please search the tracker.

@thautwarm
Copy link
Author

I still think this is negotiable because I find the reason you reject this can be resolved by providing a pattern matching utility library.

However I see this is not the current focus and discussions about this can be harmful in this stage. I won't talk about this any more unless higher priority issues got addressed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants