Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT] Optimize regexps with multiple prefixes #13843

Merged
merged 6 commits into from
Jul 3, 2024

Conversation

bboreham
Copy link
Member

@bboreham bboreham commented Mar 26, 2024

For example foo.*|bar.*|baz.*. Instead of checking each one in turn, we build a map of prefixes, then check the smaller set that could match the string supplied.

Benchmarks (updated 2024-07-02):

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                                       │  before.txt  │              after.txt               │
                                                       │    sec/op    │    sec/op      vs base               │
FastRegexMatcher/#00-8                                   44.89n ±  4%   43.62n ±   1%   -2.82% (p=0.002 n=6)
FastRegexMatcher/foo-8                                   51.32n ±  3%   51.75n ±   5%        ~ (p=0.193 n=6)
FastRegexMatcher/^foo-8                                  46.49n ±  1%   47.13n ±   2%        ~ (p=0.093 n=6)
FastRegexMatcher/(foo|bar)-8                             55.58n ±  2%   56.74n ±   1%        ~ (p=0.065 n=6)
FastRegexMatcher/foo.*-8                                 94.66n ±  4%   94.98n ±   1%        ~ (p=0.221 n=6)
FastRegexMatcher/.*foo-8                                 113.0n ±  3%   114.1n ±   1%        ~ (p=0.333 n=6)
FastRegexMatcher/^.*foo$-8                               112.3n ±  1%   114.8n ±   5%    2.23% (p=0.002 n=6)
FastRegexMatcher/^. foo$-8                               112.6n ±  0%   114.2n ±   0%    1.51% (p=0.002 n=6)
FastRegexMatcher/.?-8                                    68.52n ±  5%   68.44n ±   1%        ~ (p=1.000 n=6)
FastRegexMatcher/.*-8                                    79.41n ± 10%   80.76n ±   3%        ~ (p=0.394 n=6)
FastRegexMatcher/. -8                                    82.43n ±  5%   83.16n ±   1%        ~ (p=0.180 n=6)
FastRegexMatcher/foo. -8                                 93.81n ±  1%   95.14n ±   1%    1.42% (p=0.004 n=6)
FastRegexMatcher/. foo-8                                 114.0n ±  4%   114.4n ±   4%        ~ (p=0.420 n=6)
FastRegexMatcher/foo_. -8                                80.70n ±  2%   81.91n ±   1%    1.49% (p=0.015 n=6)
FastRegexMatcher/foo_.*-8                                81.61n ±  3%   82.73n ±   3%        ~ (p=0.071 n=6)
FastRegexMatcher/.*foo.*-8                               192.7n ±  2%   199.5n ±   1%    3.56% (p=0.002 n=6)
FastRegexMatcher/. foo. -8                               200.6n ±  2%   204.4n ±   2%        ~ (p=0.084 n=6)
FastRegexMatcher/(?s:.*)-8                               42.90n ±  2%   44.48n ±   1%    3.68% (p=0.002 n=6)
FastRegexMatcher/(?s:. )-8                               51.12n ±  5%   52.06n ±   2%        ~ (p=0.065 n=6)
FastRegexMatcher/(?s:^.*foo$)-8                          109.6n ±  0%   109.3n ±   1%        ~ (p=0.372 n=6)
FastRegexMatcher/(?i:foo)-8                              70.81n ±  2%   70.38n ±   2%        ~ (p=0.394 n=6)
FastRegexMatcher/(?i:(foo|bar))-8                        157.2n ±  3%   156.6n ±   6%        ~ (p=0.937 n=6)
FastRegexMatcher/(?i:(foo1|foo2|bar))-8                  283.8n ±  3%   285.7n ±   0%        ~ (p=0.394 n=6)
FastRegexMatcher/^(?i:foo|oo)|(bar)$-8                   710.2n ±  5%   722.5n ±   6%    1.73% (p=0.041 n=6)
FastRegexMatcher/(?i:(foo1|foo2|aaa|bbb|ccc|ddd|e-8      798.0n ±  7%   800.5n ±   4%        ~ (p=0.818 n=6)
FastRegexMatcher/((.*)(bar|b|buzz)(. )|foo)$-8           485.9n ±  2%   490.7n ±   1%    0.99% (p=0.002 n=6)
FastRegexMatcher/^$-8                                    42.68n ±  3%   43.17n ±   1%        ~ (p=0.065 n=6)
FastRegexMatcher/(prometheus|api_prom)_api_v1_. -8       186.8n ±  4%   187.2n ±   1%        ~ (p=0.190 n=6)
FastRegexMatcher/10\.0\.(1|2)\. -8                       81.48n ±  1%   81.94n ±   3%        ~ (p=0.240 n=6)
FastRegexMatcher/10\.0\.(1|2). -8                        80.74n ±  1%   82.16n ±   3%    1.76% (p=0.002 n=6)
FastRegexMatcher/((fo(bar))|. foo)-8                     204.9n ±  2%   205.8n ±   2%        ~ (p=0.732 n=6)
FastRegexMatcher/zQPbMkNO|NNSPdvMi|iWuuSoAl|qbvKM-8      203.5n ± 15%   215.4n ±   9%        ~ (p=0.093 n=6)
FastRegexMatcher/jyyfj00j0061|jyyfj00j0062|jyyfj9-8      204.2n ±  5%   213.8n ±   8%        ~ (p=0.180 n=6)
FastRegexMatcher/(?i:(zQPbMkNO|NNSPdvMi|iWuuSoAl|-8      794.8n ±  3%   803.5n ±   3%        ~ (p=0.240 n=6)
FastRegexMatcher/(?i:(AAAAAAAAAAAAAAAAAAAAAAAA|BB-8      356.9n ±  2%   357.9n ±   1%        ~ (p=1.000 n=6)
FastRegexMatcher/(?i:(.*zQPbMkNO|.*NNSPdvMi|.*iWu-8      7.481µ ±  1%   7.575µ ±   1%    1.26% (p=0.004 n=6)
FastRegexMatcher/fo.?-8                                  93.49n ±  2%   95.07n ± 141%    1.68% (p=0.024 n=6)
FastRegexMatcher/foo.?-8                                 94.03n ±  4%   94.62n ±   3%        ~ (p=0.310 n=6)
FastRegexMatcher/f.?o-8                                  81.32n ±  2%   80.81n ±   2%        ~ (p=0.937 n=6)
FastRegexMatcher/.*foo.?-8                               201.9n ±  2%   200.9n ±   0%   -0.45% (p=0.011 n=6)
FastRegexMatcher/.?foo. -8                               192.2n ±  3%   189.9n ±   0%   -1.22% (p=0.002 n=6)
FastRegexMatcher/foo.?|bar-8                             147.5n ±  6%   156.1n ±   6%        ~ (p=0.106 n=6)
FastRegexMatcher/ſſs-8                                   51.20n ±  2%   51.90n ±   3%        ~ (p=0.132 n=6)
FastRegexMatcher/.*-.*-.*-.*-.*-8                        195.6n ±  2%   193.6n ±   2%        ~ (p=0.333 n=6)
FastRegexMatcher/(. )-(. )-(. )-(. )-(. )-8              194.8n ±  1%   199.1n ±   4%    2.21% (p=0.004 n=6)
FastRegexMatcher/((.*))(?i:f)((.*))o((.*))o((.*))-8      4.642µ ±  2%   4.616µ ±   1%        ~ (p=0.394 n=6)
FastRegexMatcher/((.*))f((.*))(?i:o)((.*))o((.*))-8      3.750µ ±  0%   3.802µ ±   2%    1.40% (p=0.004 n=6)
FastRegexMatcher/(?i:(zQPbMkNO.*|NNSPdvMi.*|iWuuS#01-8   5.638µ ±  1%   1.040µ ±   1%  -81.55% (p=0.002 n=6)
FastRegexMatcher/(?i:(zQPbMkNO.*|NNSPdvMi.*|iWuuS-8      257.6n ±  1%   261.5n ±   3%    1.49% (p=0.002 n=6)
geomean                                                  172.4n         168.4n          -2.32%

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job Bryan! The logic looks solid to me. I left few minor comments, mostly to fix some comments and naming to match the new logic. I also suggest to improve a bit unit tests to add more cases with prefixes (e.g. alternation of prefixes with mixed case sensitivity).

Can you ping me once ready for a final review, please? 🙏

model/labels/regexp.go Outdated Show resolved Hide resolved
model/labels/regexp.go Outdated Show resolved Hide resolved
func (m *equalMultiStringMapMatcher) setMatches() []string {
if len(m.values) >= maxSetMatches {
if len(m.values) >= maxSetMatches || len(m.prefixes) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look to TestFindSetMatches I don't think we already have a test case covering the new case (when len(m.prefixes) > 0). Can you add it, please?

Copy link
Member Author

@bboreham bboreham Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I checked the logic with threshold set to 0, so it did execute the new code, but the tests should check every time.
That reminds me: Do you think 16 is still the right choice of threshold?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxSetMatches is 256 and I don't remember how it was computed exactly (I remember we did few tests querying TSDB index at that time).

16 is the value of minEqualMultiStringMatcherMapThreshold. I computed the value using BenchmarkOptimizeEqualStringMatchers (see comment to minEqualMultiStringMatcherMapThreshold).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I see you are commenting specifically on setMatches. I can add that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked: setMatches is only called from optimizeAlternatingLiterals which currently does not cover prefixes.
So it is not possible to create a case where len(m.prefixes) > 0.

model/labels/regexp.go Show resolved Hide resolved
model/labels/regexp.go Show resolved Hide resolved
@@ -900,7 948,7 @@ func optimizeEqualStringMatchers(input StringMatcher, threshold int) StringMatch
// findEqualStringMatchers analyze the input StringMatcher and calls the callback for each
// equalStringMatcher found. Returns true if and only if the input StringMatcher is *only*
// composed by an alternation of equalStringMatcher.
func findEqualStringMatchers(input StringMatcher, callback func(matcher *equalStringMatcher) bool) bool {
func findEqualStringMatchers(input StringMatcher, callback func(matcher *equalStringMatcher) bool, prefixCallback func(matcher *literalPrefixStringMatcher) bool) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this function is doing more than its name and description. We should update the description and rename it.

model/labels/regexp.go Show resolved Hide resolved
model/labels/regexp.go Outdated Show resolved Hide resolved
pracucci added a commit to pracucci/prometheus that referenced this pull request Apr 22, 2024
@pracucci
Copy link
Contributor

@bboreham I would like to move forward with this PR. With the aim to help, I opened a PR against your branch to address my own comments: bboreham#3

bboreham pushed a commit to bboreham/prometheus that referenced this pull request Apr 30, 2024
bboreham pushed a commit to bboreham/prometheus that referenced this pull request Apr 30, 2024
Address review comments on prometheus#13843

Signed-off-by: Marco Pracucci <[email protected]>
@bboreham bboreham marked this pull request as draft May 26, 2024 11:17
@bboreham
Copy link
Member Author

bboreham commented May 26, 2024

Update on this: the extra benchmark Marco added shows that small sets of alternates slow down with my change, so more work is needed.

Update 2024-07-02: the slow-down has disappeared, possibly because of optimisations to case-insensitive search in other PRs.

Meanwhile, I discovered another approach, to sort the alternates during regexp compilation grafana/regexp#11, which works well when the strings have some similarity.

bboreham and others added 3 commits July 1, 2024 12:01
For example `foo.*|bar.*|baz.*`. Instead of checking each one in turn,
we build a map of prefixes, then check the smaller set that could match
the string supplied.

Signed-off-by: Bryan Boreham <[email protected]>
Address review comments on prometheus#13843

Signed-off-by: Marco Pracucci <[email protected]>
pracucci added a commit to pracucci/prometheus that referenced this pull request Jul 1, 2024
Address review comments on prometheus#13843

Signed-off-by: Marco Pracucci <[email protected]>
@bboreham bboreham marked this pull request as ready for review July 2, 2024 09:41
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work, LGTM! I've also locally run some fuzzy tests and they couldn't spot any issue. I left few minor comments I would like you to consider before merging, thanks!

model/labels/regexp.go Outdated Show resolved Hide resolved
model/labels/regexp.go Show resolved Hide resolved
model/labels/regexp.go Outdated Show resolved Hide resolved
model/labels/regexp.go Outdated Show resolved Hide resolved
model/labels/regexp.go Outdated Show resolved Hide resolved
bboreham and others added 2 commits July 2, 2024 16:13
Co-authored-by: Marco Pracucci <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
Storing MaxInt would break various validity checks.

Signed-off-by: Bryan Boreham <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for addressing my feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants