-
Notifications
You must be signed in to change notification settings - Fork 9.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENHANCEMENT] Optimize regexps with multiple prefixes #13843
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job Bryan! The logic looks solid to me. I left few minor comments, mostly to fix some comments and naming to match the new logic. I also suggest to improve a bit unit tests to add more cases with prefixes (e.g. alternation of prefixes with mixed case sensitivity).
Can you ping me once ready for a final review, please? 🙏
func (m *equalMultiStringMapMatcher) setMatches() []string { | ||
if len(m.values) >= maxSetMatches { | ||
if len(m.values) >= maxSetMatches || len(m.prefixes) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick look to TestFindSetMatches
I don't think we already have a test case covering the new case (when len(m.prefixes) > 0
). Can you add it, please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I checked the logic with threshold set to 0, so it did execute the new code, but the tests should check every time.
That reminds me: Do you think 16 is still the right choice of threshold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxSetMatches
is 256 and I don't remember how it was computed exactly (I remember we did few tests querying TSDB index at that time).
16 is the value of minEqualMultiStringMatcherMapThreshold
. I computed the value using BenchmarkOptimizeEqualStringMatchers
(see comment to minEqualMultiStringMatcherMapThreshold
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I see you are commenting specifically on setMatches
. I can add that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked: setMatches
is only called from optimizeAlternatingLiterals
which currently does not cover prefixes.
So it is not possible to create a case where len(m.prefixes) > 0
.
model/labels/regexp.go
Outdated
@@ -900,7 948,7 @@ func optimizeEqualStringMatchers(input StringMatcher, threshold int) StringMatch | |||
// findEqualStringMatchers analyze the input StringMatcher and calls the callback for each | |||
// equalStringMatcher found. Returns true if and only if the input StringMatcher is *only* | |||
// composed by an alternation of equalStringMatcher. | |||
func findEqualStringMatchers(input StringMatcher, callback func(matcher *equalStringMatcher) bool) bool { | |||
func findEqualStringMatchers(input StringMatcher, callback func(matcher *equalStringMatcher) bool, prefixCallback func(matcher *literalPrefixStringMatcher) bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this function is doing more than its name and description. We should update the description and rename it.
Signed-off-by: Marco Pracucci <[email protected]>
@bboreham I would like to move forward with this PR. With the aim to help, I opened a PR against your branch to address my own comments: bboreham#3 |
Signed-off-by: Marco Pracucci <[email protected]>
Address review comments on prometheus#13843 Signed-off-by: Marco Pracucci <[email protected]>
f9224d1
to
5b7ab53
Compare
Update 2024-07-02: the slow-down has disappeared, possibly because of optimisations to case-insensitive search in other PRs. Meanwhile, I discovered another approach, to sort the alternates during regexp compilation grafana/regexp#11, which works well when the strings have some similarity. |
For example `foo.*|bar.*|baz.*`. Instead of checking each one in turn, we build a map of prefixes, then check the smaller set that could match the string supplied. Signed-off-by: Bryan Boreham <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
Address review comments on prometheus#13843 Signed-off-by: Marco Pracucci <[email protected]>
Address review comments on prometheus#13843 Signed-off-by: Marco Pracucci <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
5b7ab53
to
73edbc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work, LGTM! I've also locally run some fuzzy tests and they couldn't spot any issue. I left few minor comments I would like you to consider before merging, thanks!
Co-authored-by: Marco Pracucci <[email protected]> Signed-off-by: Bryan Boreham <[email protected]>
Storing MaxInt would break various validity checks. Signed-off-by: Bryan Boreham <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for addressing my feedback!
For example
foo.*|bar.*|baz.*
. Instead of checking each one in turn, we build a map of prefixes, then check the smaller set that could match the string supplied.Benchmarks (updated 2024-07-02):