find/captures disagree about match when using $ or \z #334
CmdrMoozy changed the title from "Discrepancy in Regex::find / Regex::captures behavior when using '\z'" to "find/captures disagree about match when using $ or \z" on Feb 12, 2017
I turned this into a real regression test according to this crate's existing style:
Another thing I noted while doing this is that, while not in multi-line mode, the same test fails regardless of whether I'm using |
BurntSushi added a commit to BurntSushi/regex that referenced this issue on Feb 18, 2017
When searching for captures, we first use the DFA to find the start and end of the match. We then pass just the matched region of text to the NFA engine to find sub-capture locations. This is a key optimization that prevents the NFA engine from searching a lot more text than what is necessary in some cases.

One problem with this is that some instructions determine their match state based on whether the engine is at the boundary of the search text. For example, `$` matches if and only if the engine is at EOF.

If we only provide the matched text region, then assertions like `\b` might not work, since it needs to examine at least one character past the end of the match. If we provide the matched text region plus one character, then `$` may match when it shouldn't. Therefore, we provide the matched text plus (at most) two characters.

Fixes rust-lang#334
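To make the windowing idea in the commit message concrete, here is a minimal, std-only sketch. It is not the crate's actual code: `window_end` and `at_end_of_slice` are hypothetical helpers invented for illustration, and the "assertion" is a toy check that only knows about the slice it is handed. It assumes `match_end` falls on a UTF-8 character boundary, and it only walks through the `"abc"` case from this issue.

```rust
/// Hypothetical helper: the byte offset reached by stepping at most `extra`
/// characters past `match_end`, clamped to the end of `text`.
/// Assumes `match_end` lies on a UTF-8 character boundary.
fn window_end(text: &str, match_end: usize, extra: usize) -> usize {
    text[match_end..]
        .char_indices()
        .nth(extra)
        .map(|(i, _)| match_end + i)
        .unwrap_or(text.len())
}

/// Toy stand-in for an end-of-text assertion such as `\z`: it can only see
/// the slice it was given, so it fires whenever `pos` is the end of that slice.
fn at_end_of_slice(slice: &str, pos: usize) -> bool {
    pos == slice.len()
}

fn main() {
    let text = "abc";
    let match_end = 1; // the overall match is "a", i.e. text[0..1]
    let pos_after_b = 2; // position just after the `b`; not the end of `text`

    // Match plus one character: the window is "ab", so the toy assertion
    // believes the text ends right after the `b`.
    let one = &text[..window_end(text, match_end, 1)];
    println!("window {:?}: at end after b? {}", one, at_end_of_slice(one, pos_after_b));

    // Match plus two characters: the window is "abc", so the `c` is visible
    // and the toy assertion no longer fires after the `b`.
    let two = &text[..window_end(text, match_end, 2)];
    println!("window {:?}: at end after b? {}", two, at_end_of_slice(two, pos_after_b));
}
```

Stepping by characters rather than bytes keeps the window on a valid UTF-8 boundary, and the clamp covers the "(at most)" part when fewer than two characters remain after the match.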
I believe I have found a bug in this library. I've tried to condense it down into the simplest possible test case which reproduces the problem:
I would expect this test case to pass. The first `assert_eq!` does, but the second `assert_eq!` reports that `re.captures(...)` actually returned "ab" instead of just "a". It seems like `(b*(X|\z))?` matches the `b` following the `a` and returns it as part of the full capture group, even though the following character `c` doesn't match `X|\z`. Interestingly, this only seems to happen when using `captures`, and not `find`.