Version 0.13.0 throws an Error::BacktrackLimitExceeded #138

jcold · 2024-06-07T02:39:42Z

The following code snippet works fine in v0.12.0, but in v0.13.0 it throws an Error::BacktrackLimitExceeded. The regex crates have been checked and they are fine. Any assistance would be appreciated.

error message: Error executing regex: Max limit for backtracking count exceeded

#[test]
fn fancy() -> anyhow::Result<()> {
    let re = fancy_regex::Regex::new(r"\b\$?t\('(. ?)'\)")?;
    for m in re.captures_iter("content...") {
        if let Some(v) = m?.get(1) {
            println!("{}", v.as_str().to_string());
        }
    }
    Ok(())
}

The text was updated successfully, but these errors were encountered:

keith-hall · 2024-06-07T19:19:07Z

hmm, I tried to reproduce this with the toy example and wasn't able to:

cargo run --example toy -- run '\b\$?t\('\''(. ?)'\''\)' 'content...'

    Finished dev [unoptimized   debuginfo] target(s) in 0.03s
     Running `target/debug/examples/toy run '\b\$?t\('\''(. ?)'\''\)' content...`
no match

Guillermogsjc · 2024-09-09T14:12:20Z

Hi folks,

I'm also encountering this issue with complex, user-defined regex patterns (bad practices, horrible pattern), such as:

(a\w?).*?(b\w?).*?(c\w?).*?(d\w?).*?(e\w?).*?(f\w?).*?(g\w?).*?(h\w?).*?(i\w?).*?(j\w*).*?(k\w?).*?(l\w?)

The error RuntimeError(BacktrackLimitExceeded) frequently occurs when using fancy_regex==0.13.0, compared to 0.12.0.

I’m aware that the backtracking limit can be increased here: https://docs.rs/fancy-regex/latest/fancy_regex/struct.RegexBuilder.html#method.backtrack_limit, but the question is: why does this error happen much more often in this version than in the previous one? It seems like the internals are consuming significantly more resources or states.

I suspect this is related to the switch from the regex crate to the lower-level regex-automata. Perhaps something is missing in how greedy particles like .* are being handled.

https://github.com/fancy-regex/fancy-regex/blob/main/CHANGELOG.md#0130---2023-12-22

Switch from the regex crate to regex-automata and regex-syntax (lower-level APIs) to simplify internals (#121)

keith-hall · 2024-09-29T19:39:59Z

Thanks for reporting with an additional example. I unfortunately have very little free time these days to investigate. I wonder if the different toy examples give different output in the version of fancy-regex before the regex-automata change compared to after, and whether that would give any clues as to the cause...

robinst · 2024-10-12T03:54:30Z

I had a look at the first example by @jcold. The difference seems to be caused by the \b for that one:

$ git switch --detach 0.12.0
$ cargo run --example toy -- trace '\b\$?t\('\''(. ?)'\''\)' 'content...'
    Finished dev [unoptimized   debuginfo] target(s) in 0.85s
    Running `target/debug/examples/toy trace '\b\$?t\('\''(. ?)'\''\)' content...`
pos	instruction
0	0 Delegate { inner: Regex("^\\b\\$?t\\('(. ?)'\\)"), inner1: Some(Regex("^(?s:.)\\b\\$?t\\('(. ?)'\\)")), start_group: 0, end_group: 1 }
fail

So 0.12.0 delegates the entire expression. Whereas 0.13.0 doesn't delegate the WordBoundary:

$ git switch --detach 0.13.0
$ cargo run --example toy -- trace '\b\$?t\('\''(. ?)'\''\)' 'content...'
    Finished dev [unoptimized   debuginfo] target(s) in 0.05s
     Running `target/debug/examples/toy trace '\b\$?t\('\''(. ?)'\''\)' content...`
pos	instruction
0	0 Assertion(WordBoundary)
0	1 Delegate ...
...

@ruihe774 Thoughts about this? I would probably go back to delegating word boundary matches again.

ruihe774 · 2024-10-12T04:12:22Z

I had a look at the first example by @jcold. The difference seems to be caused by the \b for that one:

$ git switch --detach 0.12.0
$ cargo run --example toy -- trace '\b\$?t\('\''(. ?)'\''\)' 'content...'
    Finished dev [unoptimized   debuginfo] target(s) in 0.85s
    Running `target/debug/examples/toy trace '\b\$?t\('\''(. ?)'\''\)' content...`
pos	instruction
0	0 Delegate { inner: Regex("^\\b\\$?t\\('(. ?)'\\)"), inner1: Some(Regex("^(?s:.)\\b\\$?t\\('(. ?)'\\)")), start_group: 0, end_group: 1 }
fail

So 0.12.0 delegates the entire expression. Whereas 0.13.0 doesn't delegate the WordBoundary:

$ git switch --detach 0.13.0
$ cargo run --example toy -- trace '\b\$?t\('\''(. ?)'\''\)' 'content...'
    Finished dev [unoptimized   debuginfo] target(s) in 0.05s
     Running `target/debug/examples/toy trace '\b\$?t\('\''(. ?)'\''\)' content...`
pos	instruction
0	0 Assertion(WordBoundary)
0	1 Delegate ...
...

@ruihe774 Thoughts about this? I would probably go back to delegating word boundary matches again.

Let me look into this.

ruihe774 · 2024-10-21T06:33:30Z

I'd suggest to increase the backtrack limit.

If we delegate the word boundary assertions to Regex, it will switch to use PikeVM, which is also a backtrack-based regex virtual machine and is slower than the default DFA/NFA implementation.

The total times of backtracking does not change. It is just that in previous implementation we delegated some backtracking to the underlying PikeVM of Regex, and we did not measure the times of backtracking in PikeVM. In current implementation, the backtracking is moved from PikeVM to the VM of fancy-regex, hence the measured times of backtracking increase.

There is no performance regression or memory usage regression.

Guillermogsjc · 2024-10-21T14:01:08Z

The total times of backtracking does not change. It is just that in previous implementation we delegated some backtracking to the underlying PikeVM of Regex, and we did not measure the times of backtracking in PikeVM. In current implementation, the backtracking is moved from PikeVM to the VM of fancy-regex, hence the measured times of backtracking increase.

Thats it, perfect, thanks for the explanation . Maybe warn on changelog to expect a need on increasing this limit that can be hitten more easily (as it counts more backtracks that were done under the hoods before) after upgrading to 0.13.0 from <0.13.0. It would reduce future noise on issues about this subject.

robinst added a commit that referenced this issue Oct 24, 2024

Add note for 0.13.0 and backtrack limit (see #138)

c1b8a31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Version 0.13.0 throws an Error::BacktrackLimitExceeded #138

Version 0.13.0 throws an Error::BacktrackLimitExceeded #138

jcold commented Jun 7, 2024

keith-hall commented Jun 7, 2024

Guillermogsjc commented Sep 9, 2024 •

edited

Loading

keith-hall commented Sep 29, 2024

robinst commented Oct 12, 2024

ruihe774 commented Oct 12, 2024

ruihe774 commented Oct 21, 2024

Guillermogsjc commented Oct 21, 2024 •

edited

Loading

Version 0.13.0 throws an Error::BacktrackLimitExceeded #138

Version 0.13.0 throws an Error::BacktrackLimitExceeded #138

Comments

jcold commented Jun 7, 2024

keith-hall commented Jun 7, 2024

Guillermogsjc commented Sep 9, 2024 • edited Loading

keith-hall commented Sep 29, 2024

robinst commented Oct 12, 2024

ruihe774 commented Oct 12, 2024

ruihe774 commented Oct 21, 2024

Guillermogsjc commented Oct 21, 2024 • edited Loading

Guillermogsjc commented Sep 9, 2024 •

edited

Loading

Guillermogsjc commented Oct 21, 2024 •

edited

Loading