
Pattern Matching for Python 3.7 in a simple, yet powerful, extensible manner.

scravy/awesome-pattern-matching


Awesome Pattern Matching (apm) for Python

pip install awesome-pattern-matching
  • Simple
  • Powerful
  • Extensible
  • Composable
  • Functional
  • Python 3.7+, PyPy3.7+
  • Typed (IDE friendly)
  • Offers different styles (expression, declarative, statement, ...)

There are many pattern matching libraries available for Python, with varying degrees of maintenance and usability; since Python 3.10 there is also the PEP-634 match statement. However, apm still offers functionality that PEP-634 does not, as well as pattern matching for Python versions before 3.10. A detailed comparison of PEP-634 and apm is available.

apm defines patterns as objects which are composable and reusable. Pieces can be matched and captured into variables, much like pattern matching in Haskell or Scala (a feature most libraries actually lack, even though it is what makes pattern matching useful in the first place: the ability to easily extract data). Here is an example:

from apm import *

if result := match([1, 2, 3, 4, 5], [1, '2nd' @ _, '3rd' @ _, 'tail' @ Remaining(...)]):
    print(result['2nd'])   # 2
    print(result['3rd'])   # 3
    print(result['tail'])  # [4, 5]

# If you find it more readable, '>>' can be used instead of '@' to capture a variable
match([1, 2, 3, 4, 5], [1, _ >> '2nd', _ >> '3rd', Remaining(...) >> 'tail'])

Patterns can be composed using &, |, and ^, or via their more explicit counterparts AllOf, OneOf, and Either. Since patterns are objects, they can be stored in variables and reused.

positive_integer = InstanceOf(int) & Check(lambda x: x > 0)
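
To illustrate why patterns-as-objects compose so easily, here is a toy model of the idea: a hypothetical Pat class that wraps a predicate and overloads & and |. It is not apm's implementation (apm patterns also capture values and support ^); Pat, instance_of and check are invented names used only for this sketch.

```python
class Pat:
    """Hypothetical minimal pattern object: wraps a predicate, supports & and |."""

    def __init__(self, predicate):
        self.predicate = predicate

    def matches(self, value):
        return self.predicate(value)

    def __and__(self, other):
        # Both sub-patterns must match (in the spirit of AllOf)
        return Pat(lambda v: self.matches(v) and other.matches(v))

    def __or__(self, other):
        # At least one sub-pattern must match (in the spirit of OneOf)
        return Pat(lambda v: self.matches(v) or other.matches(v))


def instance_of(*types):
    return Pat(lambda v: isinstance(v, types))


def check(predicate):
    return Pat(predicate)


# Short-circuiting 'and' means the comparison never runs on non-ints
positive_integer = instance_of(int) & check(lambda x: x > 0)
int_or_float = instance_of(int) | instance_of(float)

print(positive_integer.matches(5))    # True
print(positive_integer.matches(-1))   # False
print(positive_integer.matches("5"))  # False
print(int_or_float.matches(2.5))      # True
```

Because the combined objects are ordinary values, they can be stored in variables and reused, which is the property the paragraph above describes.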

Some fancy matching patterns are available out of the box:

from apm import *

def f(x: int, y: float) -> int:
    pass

if match(f, Parameters(int, float) & Returns(int)):
    print("Function satisfies required signature")

Multiple Styles

For matching and selecting from multiple cases, choose your style:

from apm import *

value = 7

# The simple style
if match(value, Between(1, 10)):
    print("It's between 1 and 10")
elif match(value, Between(11, 20)):
    print("It's between 11 and 20")
else:
    print("It's not between 1 and 20")
    
# The expression style
case(value) \
    .of(Between(1, 10), lambda: print("It's between 1 and 10")) \
    .of(Between(11, 20), lambda: print("It's between 11 and 20")) \
    .otherwise(lambda: print("It's not between 1 and 20"))

# The statement style
try:
    match(value)
except Case(Between(1, 10)):
    print("It's between 1 and 10")
except Case(Between(11, 20)):
    print("It's between 11 and 20")
except Default:
    print("It's not between 1 and 20")

# The declarative style
@case_distinction
def f(n: Match(Between(1, 10))):
    print("It's between 1 and 10")

@case_distinction
def f(n: Match(Between(11, 20))):
    print("It's between 11 and 20")

@case_distinction
def f(n):
    print("It's not between 1 and 20")

f(value)

# The terse (pampy) style
match(value,
      Between( 1, 10), lambda: print("It's between 1 and 10"),
      Between(11, 20), lambda: print("It's between 11 and 20"),
      _,               lambda: print("It's not between 1 and 20"))

Nested pattern matches

Patterns are applied recursively, so nested structures can be matched arbitrarily deep. This is especially useful for extracting data from complicated structures:

from apm import *

sample_k8s_response = {
    "containers": [
        {
            "args": [
                "--cert-dir=/tmp",
                "--secure-port=4443",
                "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
                "--kubelet-use-node-status-port"
            ],
            "image": "k8s.gcr.io/metrics-server/metrics-server:v0.4.1",
            "imagePullPolicy": "IfNotPresent",
            "name": "metrics-server",
            "ports": [
                {
                    "containerPort": 4443,
                    "name": "https",
                    "protocol": "TCP"
                }
            ]
        }
    ]
}

if result := match(sample_k8s_response, {
        "containers": Each({
            "image": 'image' @ _,
            "name": 'name' @ _,
            "ports": Each({
                "containerPort": 'port' @ _
            }),
        })
    }):
    print(f"Image: {result['image']}, Name: {result['name']}, Port: {result['port']}")

The above will print

Image: k8s.gcr.io/metrics-server/metrics-server:v0.4.1, Name: metrics-server, Port: 4443

Multimatch

By default, match records only the last match for each capture. If, for example, 'item' @ InstanceOf(int) matches multiple times, only the last match is recorded in result['item']. To record all captures, pass the multimatch=True flag:

if result := match([{'foo': 5}, 3, {'foo': 7, 'bar': 9}], Each(OneOf({'foo': 'item' @ _}, ...)), multimatch=True):
    print(result['item'])  # [5, 7]

# The default since v0.15.0 is multimatch=False
if result := match([{'foo': 5}, 3, {'foo': 7, 'bar': 9}], Each(OneOf({'foo': 'item' @ _}, ...))):
    print(result['item'])  # 7
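
The difference between the two modes can be modeled as two ways of folding capture events into a dictionary. The sketch below is a hypothetical simplification (record_captures is not part of apm) and assumes the captures have already been produced in order:

```python
def record_captures(events, multimatch=False):
    """Hypothetical sketch: fold (name, value) capture events into a dict."""
    result = {}
    for name, value in events:
        if multimatch:
            # Keep every capture, in the order it happened
            result.setdefault(name, []).append(value)
        else:
            # Later captures overwrite earlier ones: the last match wins
            result[name] = value
    return result


# The 'item' captures produced while walking [{'foo': 5}, 3, {'foo': 7, 'bar': 9}]
events = [("item", 5), ("item", 7)]
print(record_captures(events, multimatch=True))  # {'item': [5, 7]}
print(record_captures(events))                   # {'item': 7}
```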

Strict vs non-strict matches

Any value which occurs verbatim in a pattern is matched verbatim (int, str, list, ...), except dictionaries (actually, anything that has an items() method).

Thus:

some_very_complex_object = {
    "A": 1,
    "B": 2,
    "C": 3,
}
match(some_very_complex_object, {"C": 3})  # matches!

If you do not want unknown keys to be ignored, wrap the pattern in a Strict:

# does not match, only matches exactly `{"C": 3}`
match(some_very_complex_object, Strict({"C": 3}))

Lists (actually, anything iterable that does not have an items() method) are also compared as they are, i.e.:

ls = [1, 2, 3]
match(ls, [1, 2, 3])  # matches
match(ls, [1, 2])  # does not match

Match head and tail of a list

It is possible to match the remainder of a list though:

match(ls, [1, 2, Remaining(InstanceOf(int))])

And each item:

match(ls, Each(InstanceOf(int)))

Patterns can be joined using &, |, and ^:

match(ls, Each(InstanceOf(int) & Between(1, 3)))

Wild-card matches are supported using Ellipsis (...):

match(ls, [1, Remaining(..., at_least=2)])

The above example also shows how Remaining can be made to match at least n items using at_least (Each also accepts an at_least keyword argument).

Wildcard matches anything using _

A wildcard pattern can be expressed using _. _ is a Pattern and thus >> and @ can be used with it.

match([1, 2, 3, 4], [1, _, 3, _])

Wildcard matches anything using ...

The Ellipsis can be used as a wildcard match, too. However, it is not a Pattern (so |, &, @, etc. cannot be used on it). If you actually want to match Ellipsis itself, wrap it using Value(...).

Otherwise, for most intents and purposes, ... is equivalent to _:

match([1, 2, 3, 4], [1, ..., 3, ...])

Support for dataclasses

from dataclasses import dataclass

from apm import *

@dataclass
class User:
    first_name: str
    last_name: str

value = User("Jane", "Doe")

if match(value, User(_, "Doe")):
    print("Welcome, member of the Doe family!")
elif match(value, User(_, _)):
    print("Welcome, anyone!")

The different styles in detail

Simple style

  • 💚 has access to result captures
  • 💚 vanilla python
  • 💔 no case guards
  • 💔 can not return values (since it's a statement, not an expression)
  • 🖤 a bit repetitive
  • 💚 simplest and easiest to understand style
  • 🖤 fastest of them all
from apm import *

value = {"a": 7, "b": "foo", "c": "bar"}

if result := match(value, EachItem(_, 'value' @ InstanceOf(str) | ...), multimatch=True):
    print(result['value'])  # ["foo", "bar"]

Without the := operator (Python 3.7)

bind() can be used on a MatchResult to bind the matched items to an existing dictionary.

from apm import *

value = {"a": 7, "b": "foo", "c": "bar"}

result = {}
if match(value, EachItem(_, 'value' @ InstanceOf(str) | ...), multimatch=True).bind(result):
    print(result['value'])  # ["foo", "bar"]
elif match(value, {"quux": _ >> 'quux'}).bind(result):
    print(result['quux'])

Expression style

  • 💚 has access to result captures
  • 💚 vanilla python
  • 💚 can return values directly as it is an expression
  • 💚 can use case guards via when= or guarded
  • 🖤 so terse that it is sometimes hard to read

In summary, the expression style looks like this:

case(value).of(pattern, action) ... .otherwise(default_action)

...where action is either a value or a callable. The captures from the matching result are bound to the identically named parameters of the given callable; e.g., result['foo'] and result['bar'] from 'foo' @ _ and 'bar' @ _ will be bound to foo and bar respectively in lambda foo, bar: ....

from apm import *

display_name = case({'user': 'some-user-id', 'first_name': "Jane", 'last_name': "Doe"}) \
    .of({'first_name': 'first' @ _, 'last_name': 'last' @ _}, lambda first, last: f"{first}, {last}") \
    .of({'user': 'user_id' @ _}, lambda user_id: f"#{user_id}") \
    .otherwise("anonymous")

Note: To return a value an .otherwise(...) case must always be present.
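
One way to bind captures to named parameters is to inspect the callable's signature. The following is a hypothetical sketch of that idea using inspect.signature, not apm's actual code; call_with_captures is an invented name, and it assumes every parameter name has a matching capture:

```python
import inspect


def call_with_captures(action, captures):
    """Hypothetical helper: return plain values as-is; call callables with
    the captures named by their parameters."""
    if not callable(action):
        return action
    parameter_names = inspect.signature(action).parameters
    return action(**{name: captures[name] for name in parameter_names})


captures = {"first": "Jane", "last": "Doe"}
print(call_with_captures(lambda first, last: f"{first}, {last}", captures))  # Jane, Doe
print(call_with_captures("anonymous", captures))                             # anonymous
```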

Statement style

This is arguably the hackiest style in apm, as it re-uses the try .. except mechanism. It is nevertheless quite readable.

  • 💚 has access to result captures
  • 💚 very readable
  • 💔 can not return values (since it's a statement, not an expression)
  • 💚 can use case guards via when=
  • 🖤 misuse of the try .. except statement
from apm import *

try:
    match({'user': 'some-user-id', 'first_name': "Jane", 'last_name': "Doe"})
except Case({'first_name': 'first' @ _, 'last_name': 'last' @ _}) as result:
    user = f"{result['first']} {result['last']}"
except Case({'user': 'user_id' @ _}) as result:
    user = f"#{result['user_id']}"
except Default:
    user = "anonymous"
    
print(user)  # "Jane Doe"
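
A statement style like this can be built on the fact that the expression after each except is only evaluated when that clause is reached, while the exception is already being handled. The expression can therefore look at the exception via sys.exc_info() and return either a class that matches or one that is never raised. A self-contained sketch of that mechanism; match_value, Case, Default and classify here are hypothetical stand-ins, not apm's objects, and apm may work differently:

```python
import sys


class Default(Exception):
    """Raised by match_value(); catching it directly acts as the fallback case."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


class _NeverRaised(Exception):
    """Returned by Case() on a failed check, so its except clause cannot match."""


def match_value(value):
    # Hand the value to the except clauses by raising it inside an exception
    raise Default(value)


def Case(predicate):
    # Evaluated only when Python reaches this except clause; by then the
    # exception is being handled and is visible through sys.exc_info()
    exc = sys.exc_info()[1]
    if isinstance(exc, Default) and predicate(exc.value):
        return Default
    return _NeverRaised


def classify(value):
    try:
        match_value(value)
    except Case(lambda v: 1 <= v <= 10):
        return "between 1 and 10"
    except Case(lambda v: 11 <= v <= 20):
        return "between 11 and 20"
    except Default:
        return "not between 1 and 20"


print(classify(7))   # between 1 and 10
print(classify(15))  # between 11 and 20
print(classify(42))  # not between 1 and 20
```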

Declarative style

  • 💔 does not have access to result captures
  • 💚 very readable
  • 💚 can use case guards via when=
  • 💚 can return values
  • 🖤 the most bloated version of all styles
from apm import *

@case_distinction
def fib(n: Match(OneOf(0, 1))):
    return n

@case_distinction
def fib(n):
    return fib(n - 2) + fib(n - 1)

for i in range(0, 6):
    print(fib(i))

Nota bene: Overloading using @case_distinction

Apart from its pattern matching capabilities, @case_distinction can also be used to implement overloading. In fact, it can be imported as @overload. The mechanism is aware of arity and argument types.

from apm.overload import overload

@overload
def add(a: str, b: str):
    return "".join([a, b])

@overload
def add(a: int, b: int):
    return a + b

add("a", "b")  # "ab"
add(1, 2)      # 3
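
For comparison, the standard library offers functools.singledispatch, which dispatches on the type of the first argument only, whereas @case_distinction is described above as aware of arity and argument types. A sketch of the same add using singledispatch:

```python
from functools import singledispatch


@singledispatch
def add(a, b):
    # Fallback for types without a registered implementation
    raise NotImplementedError(f"unsupported type: {type(a).__name__}")


@add.register
def _(a: str, b: str):
    # Chosen because the first argument is a str (b's annotation is not used for dispatch)
    return "".join([a, b])


@add.register
def _(a: int, b: int):
    return a + b


print(add("a", "b"))  # ab
print(add(1, 2))      # 3
```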

Terse style

  • 💚 has access to result captures
  • 💚 can use case guards via guarded
  • 💚 very concise
  • 💚 can return values
  • 🖤 very readable when formatted nicely
  • 🖤 not so well suited for larger match actions
  • 🖤 slowest of them all

As the name indicates, the "terse" style is terse. It is inspired by the pampy pattern matching library and mimics some of its behavior. Despite its slim surface area, it also comes with some simplifications:

  • A type given as a pattern is matched against as if it was wrapped in an InstanceOf
  • re.Pattern objects (the result of re.compile) are matched against as if they were given via Regex
  • Captures are passed to actions in the same order as they occur in the pattern (not by name)
from apm import *

def fibonacci(n):
  return match(n,
               1, 1,
               2, 1,
               _, lambda x: fibonacci(x - 1) + fibonacci(x - 2)
               )

fibonacci(6)  # -> 8 


class Animal:        pass
class Hippo(Animal): pass
class Zebra(Animal): pass
class Horse(Animal): pass

def what_am_i(x):
  return match(x,
               Hippo,  'hippopotamus',
               Zebra,  'zebra',
               Animal, 'some other animal',
               _,      'not at all an animal',
               )

what_am_i(Hippo())  # -> 'hippopotamus'
what_am_i(Zebra())  # -> 'zebra'
what_am_i(Horse())  # -> 'some other animal'
what_am_i(42)       # -> 'not at all an animal'

Available patterns

Capture(pattern, name=<str>)

Captures a piece of the thing being matched by name.

if result := match([1, 2, 3, 4], [1, 2, Capture(Remaining(InstanceOf(int)), name='tail')]):
    print(result['tail'])  ## -> [3, 4]

As this syntax is rather verbose, two shorthand notations can be used:

# using the matrix multiplication operator '@' (syntax resembles that of Haskell and Scala)
if result := match([1, 2, 3, 4], [1, 2, 'tail' @ Remaining(InstanceOf(int))]):
    print(result['tail'])  ## -> [3, 4]

# using the right shift operator
if result := match([1, 2, 3, 4], [1, 2, Remaining(InstanceOf(int)) >> 'tail']):
    print(result['tail'])  ## -> [3, 4]

Strict(pattern)

Performs a strict pattern match. A strict pattern match also compares the type of verbatim values. That is, while apm would match 3 with 3.0, it would not do so when using Strict. Also, apm normally performs partial matches of dictionaries (that is, it ignores unknown keys); with Strict, it performs an exact match for dictionaries.

# The following will match
match({"a": 3, "b": 7}, {"a": ...})
match(3.0, 3)

# These will not match
match({"a": 3, "b": 7}, Strict({"a": ...}))
match(3.0, Strict(3))
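
In plain Python, 3 == 3.0 is True, which is why the non-strict match succeeds. The extra type comparison that Strict adds for verbatim values can be pictured like this (hypothetical helpers, a simplification of the described behavior rather than apm's code):

```python
def loose_equal(a, b):
    # Plain equality: int 3 and float 3.0 compare equal in Python
    return a == b


def strict_equal(a, b):
    # Additionally require identical types, so 3 and 3.0 differ
    return type(a) is type(b) and a == b


print(loose_equal(3.0, 3))   # True
print(strict_equal(3.0, 3))  # False
print(strict_equal(3, 3))    # True
```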

OneOf(*pattern)

Matches against any of the provided patterns. Equivalent to p1 | p2 | p3 | .. (but operator overloading does not work with values that do not inherit from Pattern)

match("quux", OneOf("bar", "baz", "quux"))
match(3, OneOf(InstanceOf(int), None))

Patterns can also be joined using | to form a OneOf pattern:

match(3, InstanceOf(int) | InstanceOf(float))

The above example is rather contrived, as InstanceOf already accepts multiple types natively:

match(3, InstanceOf(int, float))

Since bare values do not inherit from Pattern they can be wrapped in Value:

match("quux", Value("foo") | Value("quux"))

AllOf(*pattern)

Checks whether the value matches all of the given patterns. Equivalent to p1 & p2 & p3 & .. (but operator overloading does not work with values that do not inherit from Pattern)

match("quux", AllOf(InstanceOf(str), Regex("[a-z]+")))

NoneOf(*pattern)

Same as Not(OneOf(*pattern)) (also ~OneOf(*pattern)).

Not(pattern)

Matches if the given pattern does not match.

match(3, Not(4))  # matches
match(5, Not(4))  # matches
match(4, Not(4))  # does not match

The prefix operator ~ (bitwise inversion) can be used to express the same thing. Note that it does not work on bare values, so they need to be wrapped in Value.

match(3, ~Value(4))  # matches
match(5, ~Value(4))  # matches
match(4, ~Value(4))  # does not match

Not can be used to create a NoneOf kind of pattern:

match("string", ~OneOf("foo", "bar"))  # matches everything except "foo" and "bar"

Not can be used to create a pattern that never matches:

Not(...)

Each(pattern[, at_least=])

Matches each item in an iterable.

match(range(1, 10), Each(Between(1, 9)))

EachItem(key_pattern, value_pattern)

Matches an object if each key satisfies key_pattern and each value satisfies value_pattern.

match({"a": 1, "b": 2}, EachItem(Regex("[a-z]+"), InstanceOf(int)))

Some(pattern) (aka Many and Remaining)

Matches a sequence of items within a list:

if result := match(range(1, 10), [1, 'a' @ Some(...), 4, 'b' @ Some(...), 8, 9]):
    print(result['a'])  # [2, 3]
    print(result['b'])  # [5, 6, 7]

Takes the optional keyword arguments exactly, at_least, and at_most, which make Some match exactly n, at least n, or at most n items (at_least and at_most can be given at the same time, but neither can be combined with exactly).

Note the difference between Some(1, 2) and Some([1, 2]). The first version matches subsequences, the second version matches items which are themselves lists:

match([0, 1, 2, 1, 2, 3], [0, Some(1, 2), 3])  # matches the subsequence 1, 2 twice
match([0, [1, 2], [1, 2], 3], [0, Some([1, 2]), 3])  # matches the item [1, 2] twice, which happen to be lists

Some also goes by the names of Many and Remaining, which is sometimes nice to convey meaning:

match(range(1, 10), [1, 2, 'remaining' @ Remaining()])
match([0, 1, 1, 1, 2, 1], [0, Many(1), Remaining(InstanceOf(int))])

When used with no arguments, Some() is the same as Some(...).

Remainder(pattern)

Can be used to match the unmatched parts of a Dictionary/Mapping.

result = match({
    "foo": 1,
    "bar": 2,
    "qux": 4,
    "quuz": 8,
}, {"foo": 'foo' @ _, "bar": 'bar' @ _} ** Remainder('rs' @ _))
print(result.foo)  # 1
print(result.bar)  # 2
print(result.rs)   # {'qux': 4, 'quuz': 8}

Remainder is, strictly speaking, not a Pattern and only works in conjunction with ** on dictionaries, and it only works on the right-hand side of the dictionary.

Between(lower, upper)

Matches an object if it is between lower and upper (inclusive). The optional keyword arguments lower_bound_exclusive and upper_bound_exclusive can be set to True to exclude lower or upper, respectively, from the range of matching values.
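
The bound semantics described above can be summarized by a small hypothetical function (a sketch of the documented behavior, not apm's code):

```python
def between(value, lower, upper, *, lower_bound_exclusive=False, upper_bound_exclusive=False):
    # Inclusive on both ends unless the corresponding flag is set
    above = value > lower if lower_bound_exclusive else value >= lower
    below = value < upper if upper_bound_exclusive else value <= upper
    return above and below


print(between(1, 1, 10))                              # True
print(between(1, 1, 10, lower_bound_exclusive=True))  # False
print(between(10, 1, 10, upper_bound_exclusive=True)) # False
```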

Length(length)

Matches an object if it has the given length. Alternatively also accepts at_least and at_most keyword arguments.

match("abc", Length(3))
match("abc", Length(at_least=2))
match("abc", Length(at_most=4))
match("abc", Length(at_least=2, at_most=4))

Contains(item)

Matches an object if it contains the given item (as per the same logic as the in operator).

match("hello there, world", Contains("there"))
match([1, 2, 3], Contains(2) & Contains(3))
match({'foo': 1, 'bar': 2}, Contains('quux') | Contains('bar'))

Regex(regex_pattern, bind_groups: bool = True)

Matches a string if it completely matches the given regex, as per re.fullmatch. If the regular expression contains named capturing groups and bind_groups is True (the default), this pattern binds the captured results in the MatchResult.

To mimic re.match or re.search, the given regular expression x can be augmented as x.* or .*x.*, respectively.
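
The following snippet uses only the standard re module to show the difference between re.fullmatch and the augmented forms, and what named groups produce. Note that .* does not cross newlines unless re.DOTALL is set, so the augmented forms only mimic re.match and re.search on single-line input.

```python
import re

text = "hello world"

print(re.fullmatch(r"hello", text))                  # None: the whole string must match
print(re.fullmatch(r"hello.*", text) is not None)    # True: like re.match(r"hello", text)
print(re.fullmatch(r".*world.*", text) is not None)  # True: like re.search(r"world", text)

# Named groups are the kind of captures Regex can bind
m = re.fullmatch(r"(?P<greeting>\w+) (?P<target>\w+)", text)
print(m.groupdict())  # {'greeting': 'hello', 'target': 'world'}
```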

Check(predicate)

Matches an object if it satisfies the given predicate.

match(2, Check(lambda x: x % 2 == 0))

InstanceOf(*types)

Matches an object if it is an instance of any of the given types.

match(1, InstanceOf(int, float))

SubclassOf(*types)

Matches if the matched type is a subclass of any of the given types.

match(int, SubclassOf(int, float))

Parameters(...)

Matches the parameters of a callable.

def f(x: int, *xs: float, y: str, **kwargs: bool):
    pass


match(f, Parameters(int, VarArgs(float), KwArgs(bool), y=str))

Each positional argument to Parameters is expected to be the type of a positional parameter of the callable.

Parameters matches a function signature only if its positional parameters match completely, e.g.:

def f(x: int, y: float):
    pass


print(bool(match(f, Parameters(int))))  # False
print(bool(match(f, Parameters(int, float))))  # True
print(bool(match(f, Parameters(int, Remaining(_)))))  # True

Keyword arguments are matched only if they are keyword-only parameters. In contrast to positional arguments, they are matched partially (which aligns with the non-strict matching behavior with respect to dictionaries):

def f(x: int, *, y: str, z: float):
    pass


print(bool(match(f, Parameters(int))))  # True
print(bool(match(f, Parameters(y=str))))  # False – positional parameters not matched
print(bool(match(f, Parameters(int, y=str))))  # True

This can be changed with Strict:

def f(x: int, *, y: str, z: float):
    pass


print(bool(match(f, Strict(Parameters(int)))))  # False
print(bool(match(f, Strict(Parameters(int, y=str)))))  # False  (z not mentioned but present)
print(bool(match(f, Strict(Parameters(int, y=str, z=float)))))  # True  (has y and z exactly)
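
Parameters works with the information a callable's signature exposes: each parameter's kind (positional, keyword-only, *args, **kwargs) and its annotation. The standard inspect module shows what that information looks like for the f above (describe_parameters is a made-up helper, not part of apm):

```python
import inspect


def f(x: int, *, y: str, z: float):
    pass


def describe_parameters(func):
    """List (name, kind, annotation) for each parameter of func."""
    return [
        (p.name, p.kind.name, p.annotation)
        for p in inspect.signature(func).parameters.values()
    ]


for entry in describe_parameters(f):
    print(entry)
# ('x', 'POSITIONAL_OR_KEYWORD', <class 'int'>)
# ('y', 'KEYWORD_ONLY', <class 'str'>)
# ('z', 'KEYWORD_ONLY', <class 'float'>)
```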

Arguments(*types)

DEPRECATED, use Parameters instead (see above)

Matches a callable if its type annotations correspond to the given types.

def f(x: int, y: float, z):
    ...


match(f, Arguments(int, float, None))

Arguments has an alternate form which can be used to match keyword arguments:

def f(x: int, y: float, z: str):
    ...

match(f, Arguments(x=int, y=float))

The strictness rules are the same as for dictionaries (which is why the above example works).

# given the f from above
match(f, Strict(Arguments(x=int, y=float)))  # does not match
match(f, Strict(Arguments(x=int, y=float, z=str)))  # matches

Returns(type)

Matches a callable if its type annotations denote the given return type.

def g(x: int) -> str:
    ...


match(g, Arguments(int) & Returns(str))

Transformed(function, pattern)

Transforms the currently looked at value by applying function on it and matches the result against pattern. In Haskell and other languages this is known as a view pattern.

def sha256(v: str) -> str:
    import hashlib
    return hashlib.new('sha256', v.encode('utf8')).hexdigest()

match("hello", Transformed(sha256, "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"))

This is handy for matching data types like datetime.date as this pattern won't match if the transformation function errored out with an exception.

from apm import *
from datetime import date

if result := match("2020-08-27", Transformed(date.fromisoformat, 'date' @ _)):
    print(repr(result['date']))  # result['date'] is a datetime.date

At(path, pattern)

Checks whether the nested object to be matched satisfies pattern at the given path. The match fails if the given path can not be resolved.

record = {
    "foo": {
        "bar": {
            "quux": {
                "value": "deeply nested"
            }
        }
    }
}

result = match(record, At("foo.bar.quux", {"value": Capture(..., name="value")}))
result['value']  # "deeply nested"

# alternate form
result = match(record, At(['foo', 'bar', 'quux'], {"value": Capture(..., name="value")}))

Items(**kwargs)

Mostly syntactic sugar to match a dictionary nicely (and anything that provides an .items() method).

from apm import *
from datetime import datetime

request = {
    "api_version": "v1",
    "job": {
        "run_at": "2020-08-27 14:09:30",
        "command": "echo 'booya'",
    }
}

if result := match(request, Items(
    api_version="v1",
    job=Items(
        run_at=Transformed(datetime.fromisoformat, 'time' @ _),
    ) & OneOf(
        Items(command='command' @ InstanceOf(str)),
        Items(spawn='container' @ InstanceOf(str)),
    )
)):
    print(repr(result['time']))      # datetime.datetime(2020, 8, 27, 14, 9, 30)
    print('container' not in result) # True
    print(result['command'])         # "echo 'booya'"

Object(type, *args, **kwargs)

Matches any object of the given type whose attributes match the given *args and **kwargs. Positional arguments are mapped to attributes via __match_args__, as introduced by PEP-634.

from apm import *
from typing import Literal, Tuple

class Click:
    __match_args__ = ("position", "button")

    def __init__(self, pos: Tuple[int, int], btn: Literal['left', 'right', 'middle']):
        self.position = pos
        self.button = btn

assert match(Click((1, 2), 'left'), Object(Click, (1, 2)))
assert match(Click((1, 2), 'left'), Object(Click, (1, 2), 'left'))
assert match(Click((1, 2), 'left'), Object(Click, (1, 2), button='left'))

Extensible

New patterns can be added, just like the ones in apm.patterns.*. Simply extend the apm.Pattern class:

class Min(Pattern):
    def __init__(self, min):
        self.min = min

    def match(self, value, *, ctx: MatchContext, strict=False) -> MatchResult:
        return ctx.match_if(value >= self.min)

match(3, Min(1))  # matches
match(3, Min(5))  # does not match