-
Notifications
You must be signed in to change notification settings - Fork 831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(scheduler): Add max elapsed duration for model load/unload #5819
Conversation
scheduler/cmd/agent/main.go
Outdated
@@ -47,6 47,10 @@ const ( | |||
maxElapsedTimeReadySubServiceBeforeStart = 15 * time.Minute // 15 mins is the default MaxElapsedTime | |||
// period for subservice ready "cron" | |||
periodReadySubService = 60 * time.Second | |||
// max time to wait for a model server to load a model, including retrues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit s/retrues//retries
@@ -112,7 111,7 @@ func (b *backOffWithMaxCount) Reset() { | |||
} | |||
|
|||
func (b *backOffWithMaxCount) NextBackOff() time.Duration { | |||
if b.currentCount >= b.maxCount { | |||
if b.currentCount >= b.maxCount-1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm: does this now retry once if b.maxCount is one? or is it that we now consider the initial function call as part of the "maxCount"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backoff.RetryNotify
will run the function at least once and if there is an error returned will then use a backoff policy to decide on the retries. From that perspective the above code was slightly wrong and now fixed in the test as well TestBackOffPolicyWithMaxCount
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is whether for a config parameter named something like (max*RetryCount
) one would expect a maximum number of max*RetryCount
retries, or they would expect that the function runs a total of max*RetryCount
times. Either way should be fine as long as we're explicit to users (when this becomes configurable externally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a ticket to create docs to clarify these semantic differences and what we actually do in core 2.
Currently we have: one would expect a maximum number of max*RetryCount retries bounded by the maximum Elapsed duration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, agreed that we should probably make this configurable
Previously we had a fixed elapsed duration defaulting to 15 minutes for model load retries. For loading large llms (e.g. llama3 70b), this fixed max elapsed time didnt allow for any retries consequently as the size of the model will take the duration of the load towards this limit.
This change then introduces the following:
In the case of retries, hf can resume downloads in the case one download has not finished completely with a single call.
Note that this change doesnt make these values configurable yet for the user to specify.
Fixes: INFRA-1114 (internal).