
feat: add upstream strategy random #1221

Merged
merged 9 commits into 0xERR0R:main from feat/resolver-random on Nov 18, 2023

Conversation

@DerRockWolf (Contributor) commented Nov 4, 2023

If you want these changes in separate PRs I'll do so 🙂

This PR includes multiple changes:

  • removal of the resolversPerClient struct used in all upstream group/branch resolvers, replaced by a separate groupName string and a flat upstreamResolverStatus slice.
    • The NewXYZResolver() funcs now accept the new UpstreamGroupConfig struct instead of UpstreamsConfig.
    • Doing this allows removing a lot of redundant code that was only used to select the single existing upstreamResolverStatus slice in the map (Add upstream strategy strict #1093 implemented a "tree"-like architecture which resulted in only one slice existing in the map).
  • some fixed comments and logs
  • removing the context.Canceled check for the parallel_best resolver. I'm not 100% certain, but from what I can tell the upstream_resolver Resolve() won't ever return context.Canceled. Edit: we keep the check so we don't miss it once we pass a context to the upstream_resolver Resolve() func.
  • And the actual feature: the new random upstream strategy 🎉
    • It's basically the same as parallel_best without the race: if the first randomly selected upstream fails, a second one is asked (a minimal sketch of the idea follows this list).
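
For illustration only, here is a minimal Go sketch of the random strategy's core idea; the resolver interface and all names below are placeholders, not blocky's actual implementation:

```go
package sketch

import (
	"errors"
	"fmt"
	"math/rand"
)

// resolver is a stand-in for an upstream resolver (placeholder interface).
type resolver interface {
	fmt.Stringer
	Resolve(query string) (string, error)
}

// resolveRandom picks one upstream at random; if it fails, a second
// randomly chosen upstream is asked before giving up.
func resolveRandom(resolvers []resolver, query string) (string, error) {
	if len(resolvers) == 0 {
		return "", errors.New("no upstream resolvers configured")
	}

	first := resolvers[rand.Intn(len(resolvers))]

	answer, err := first.Resolve(query)
	if err == nil {
		return answer, nil
	}

	// Retry once with a different, randomly chosen upstream.
	remaining := make([]resolver, 0, len(resolvers)-1)
	for _, r := range resolvers {
		if r != first {
			remaining = append(remaining, r)
		}
	}

	if len(remaining) == 0 {
		return "", fmt.Errorf("upstream %s failed: %w", first, err)
	}

	second := remaining[rand.Intn(len(remaining))]

	return second.Resolve(query)
}
```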

There are also a few things we could discuss or create issues for:

  • Passing a context into upstream_resolver Resolve() to allow cancelling the retry.Do if e.g. a timeout is reached or the other resolver won the race (for parallel_best); see the sketch after this list.

    • Currently, if a timeout is reached or once one resolver has won the race, the retry.Do func will still retry any failed calls.
  • In Add upstream strategy strict #1093 and also in this PR I've used Config.Upstreams.Timeout to "fail" early and continue with the "try next resolver" logic, even though retry.Do might still be able to get an answer on one of the two remaining retries.

    • We could either get rid of the retry.Do, or multiply the timeout for the contexts I've set by 3 so that we only continue with the next resolvers once all retries have failed.
  • The structs returned by NewXYZResolver all embed the Configurable interface, and I'm not 100% sure whether this is still needed/useful, because the only configuration they currently have is the group name and the Upstreams slice.
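
To illustrate the cancellation point above, here is a stdlib-only sketch (the PR refers to retry.Do, but this example uses a plain loop; all names are placeholders): the retry loop stops as soon as the caller's context is done, whether because the timeout fired or because another resolver already won the race.

```go
package sketch

import (
	"context"
	"fmt"
	"time"
)

// resolveWithRetry retries a resolve call, but honours the caller's context:
// once the context is cancelled (timeout reached, or the race was already
// won by another resolver), no further retries are attempted.
func resolveWithRetry(ctx context.Context, attempts int, backoff time.Duration,
	resolve func(context.Context) (string, error),
) (string, error) {
	var lastErr error

	for i := 0; i < attempts; i++ {
		answer, err := resolve(ctx)
		if err == nil {
			return answer, nil
		}
		lastErr = err

		select {
		case <-ctx.Done():
			// Stop retrying: the timeout fired or the race is decided.
			return "", fmt.Errorf("cancelled after %d attempt(s): %w", i+1, ctx.Err())
		case <-time.After(backoff):
			// Wait before the next attempt.
		}
	}

	return "", fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}
```

With retry.Do, the same effect could presumably be achieved by passing the context through to the retry library, which is exactly the discussion point above.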

Thanks for maintaining blocky ❤️


codecov bot commented Nov 4, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (f1a6fb0) 93.70% compared to head (6d335f7) 93.77%.
Report is 2 commits behind head on main.

Files                  Patch %   Lines
lists/list_cache.go    60.00%    6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1221      +/-   ##
==========================================
+ Coverage   93.70%   93.77%   +0.06%
==========================================
  Files          70       72       +2
  Lines        5687     5892     +205
==========================================
+ Hits         5329     5525     +196
- Misses        277      284       +7
- Partials       81       83       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DerRockWolf force-pushed the feat/resolver-random branch 2 times, most recently from 20f06ec to ac4b80d, on November 4, 2023 at 21:39
@kwitsch (Collaborator) commented Nov 5, 2023

Looks promising. ♥️

I'll look into it after I fix the parallel processing of unit tests. 🥴

@ThinkChaos (Collaborator) left a comment

I like the idea and most of the changes are great. I wonder if we could make random a specific parallel best configuration: add an option (could just be in the code for now, not necessarily user facing) of how many resolvers to race.
Then random would just set that to 1 :)
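
For illustration, a rough sketch of that idea in Go; pickRandom and its generic signature are placeholders, not blocky's actual code. The strategies would only differ in how many distinct resolvers get picked to race:

```go
package sketch

import "math/rand"

// pickRandom returns up to n distinct resolvers chosen at random.
// parallel_best would call it with e.g. n == 2 and race the results,
// while the random strategy would simply use n == 1.
func pickRandom[T any](resolvers []T, n int) []T {
	if n > len(resolvers) {
		n = len(resolvers)
	}

	// Shuffle a copy so the caller's slice stays untouched.
	shuffled := make([]T, len(resolvers))
	copy(shuffled, resolvers)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})

	return shuffled[:n]
}
```

The random strategy would then construct the resolver with n = 1, while parallel_best keeps its current racing behaviour.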

config/upstreams.go (outdated, resolved)
@DerRockWolf (Contributor, Author) commented

> I like the idea and most of the changes are great. I wonder if we could make random a specific parallel best configuration: add an option (could just be in the code for now, not necessarily user facing) of how many resolvers to race. Then random would just set that to 1 :)

Good idea, I'll look into it :)

@DerRockWolf (Contributor, Author) commented

> Good idea, I'll look into it :)

I've now implemented the random strategy as custom options for the parallel_best resolver.

I'm not 100% happy with the implementation and it still feels a bit rough.
If you have an idea what could be improved please tell me^^

After my changes the random selection is a little bit different than before (I needed to increase the BeNumerically range from 80-100 to 80-120). I could not find out why; from my point of view the changes shouldn't have an impact on the randomness, but maybe you have an idea...

// should be 100 ± 20
// TODO: understand why I needed to adjust this...
Expect(v).Should(BeNumerically("~", 100, 20))
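
For context on where numbers like "100 ± 20" come from, a test of this kind typically counts how often each upstream is picked over many runs and expects a roughly uniform spread. A hypothetical standalone version (numbers and names are made up, not blocky's actual test):

```go
package sketch

import (
	"math/rand"
	"testing"

	. "github.com/onsi/gomega"
)

// TestPickDistribution picks between two upstreams 200 times, so each should
// be chosen about 100 times. The check is statistical, so a tolerance is
// needed (and a very rare flake is still possible).
func TestPickDistribution(t *testing.T) {
	g := NewWithT(t)

	const picks = 200

	counts := make([]int, 2)
	for i := 0; i < picks; i++ {
		counts[rand.Intn(len(counts))]++
	}

	for _, v := range counts {
		// should be 100 ± 20
		g.Expect(v).Should(BeNumerically("~", 100, 20))
	}
}
```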

@ThinkChaos (Collaborator) commented

Sorry for the delay, I missed the original notification. I'll take another look today :)

@0xERR0R added the 🔨 enhancement (New feature or request) label on Nov 17, 2023
@0xERR0R added this to the v0.23 milestone on Nov 17, 2023
@ThinkChaos (Collaborator) left a comment

General state is pretty good, looking forward to being able to add a config for how many resolvers to query in parallel!

resolver/bootstrap.go (outdated, resolved)
docs/configuration.md (outdated, resolved)
resolver/bootstrap_test.go (resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
resolver/resolver.go (resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
resolver/parallel_best_resolver.go (resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
resolver/parallel_best_resolver.go (resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
Regarding the err check in `resolve()` of `parallel_best_resolver.go`: I'm not 100% certain, but I don't think the upstream resolver returns a context canceled error at all; there are also no tests for it.
resolver/parallel_best_resolver.go (resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
resolver/parallel_best_resolver.go (outdated, resolved)
- maps.Keys()
- docs
- parallel_best resolver String()
- remove expensive & unnecessary log
- return err from retry failed
- pre-alloc pickRandom chosenResolvers
- fix context comment
- upstreamResolverStatus.resolve(): add resolver to err
@ThinkChaos changed the title from "resolversPerClient removal & add upstream strategy random" to "feat: add upstream strategy random" on Nov 18, 2023
@ThinkChaos merged commit 94663ee into 0xERR0R:main on Nov 18, 2023
11 checks passed
@ThinkChaos (Collaborator) commented

Did some tiny changes before merging to remove the TODO in the tests and fix a race caused by reading and writing Upstreams.Timeout in the tests.
Also, about why the numbers changed: for me, using the old numbers works for the parallel tests, so no change there!
And for the "random" tests, I think it's not an issue that it doesn't exactly match parallel_best, as long as we keep it consistent-ish from now on.

Thanks for all the work on this!

@DerRockWolf (Contributor, Author) commented

> Did some tiny changes before merging to remove the TODO in the tests and fix a race caused by reading and writing Upstreams.Timeout in the tests. Also, about why the numbers changed: for me, using the old numbers works for the parallel tests, so no change there! And for the "random" tests, I think it's not an issue that it doesn't exactly match parallel_best, as long as we keep it consistent-ish from now on.
>
> Thanks for all the work on this!

Thank you for reviewing the PR 😄
