Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

register Node/UpdateTaint event to plugins which has Node/Add only and doesn't have Node/UpdateTaint #122292

Merged
merged 1 commit into from
Mar 18, 2024

Conversation

sanposhiho
Copy link
Member

@sanposhiho sanposhiho commented Dec 13, 2023

What type of PR is this?

/kind bug

What this PR does / why we need it:

register a failure plugin from preCheck into pInfo.UnschedulablePlugins so that we could technically prevent #109437 happening again.

Which issue(s) this PR fixes:

Part of #122284
Fixes #122306
Part of #122305

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Currently, NodeAdded QueueingHint could not always be called because of the internal feature called preCheck.
It's definitely not something expected for plugin developers,
and we're trying to eventually remove preCheck completely to fix this.
Until then we'll register UpdateNodeTaint event for plugins that have NodeAdded event, but don't have UpdateNodeTaint event.
It'd result in a bad impact on the requeuing efficiency though, a lot better than some Pods being stuck in the
unschedulable pod pool.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 13, 2023
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.29 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.29.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Dec 13 05:43:10 UTC 2023.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 13, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 13, 2023
@sanposhiho
Copy link
Member Author

@kerthcet This is my proposal. I know I have to add UTs but open it now to show you my idea.
What do you think?

pkg/scheduler/scheduler.go Outdated Show resolved Hide resolved
@kerthcet
Copy link
Member

We have two approaches:

  • Option one is what Kensei did here
  • Option two is add the missing update event to plugins like DRA as we did to other plugins as well

And I have two concerns:

  • Do we need to cherry pick this fix to 1.28, 1.29, I think so because this will make some pods unschedulable unhonouring the node events.
  • Do we need to consider out-of-tree plugins, if so, Kensei's fix is a more general way, or we should only consider the in-tree plugins, then option2 seems more simple.

@kerthcet
Copy link
Member

cc @kubernetes/sig-scheduling-misc

@sanposhiho
Copy link
Member Author

sanposhiho commented Dec 14, 2023

Thanks, that's a good summary. I'm on the same page with @kerthcet.

Do we need to cherry pick this fix to 1.28, 1.29, I think so because this will make some pods unschedulable unhonouring the node events.

1

Do we need to consider out-of-tree plugins, if so, Kensei's fix is a more general way, or we should only consider the in-tree plugins, then option2 seems more simple.

I think Yes, so I'd prefer this approach basically.
Also, if we mind this PR is not that simple to be cherry-picked, I think we have another option which is to take both approaches - we create a PR for option2 and cherry-pick it as that's gonna be a simple and enough change to fix this issue in the default scheduler. And, we also merge this PR (option1) for the mitigation for custom plugins, but not cherry-pick.

@alculquicondor
Copy link
Member

alculquicondor commented Dec 14, 2023

/hold
I'd like to review this, once no longer a WIP

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 14, 2023
@sanposhiho
Copy link
Member Author

Given no clear objection to this idea so far, I'll work on making it ready this weekend.

@alculquicondor
Copy link
Member

Can we do this only if the featuregate for scheduling hints is enabled?

@sanposhiho
Copy link
Member Author

sanposhiho commented Dec 16, 2023

Regardless of whether QHint is enabled or not, this bug could happen when EventsToRegister has NodeAdd, but doesn't have NodeUpdate. (#109437 shows it happens before QHint) So, I believe we should do this even if the feature gate is disabled

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 18, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2024
@sanposhiho
Copy link
Member Author

Rebased and squashed into one commit with fixing one point from Aldo last time.

@sanposhiho
Copy link
Member Author

/retitle register Node/UpdateTaint event to plugins which has Node/Add only and doesn't have Node/UpdateTaint

@k8s-ci-robot k8s-ci-robot changed the title register a failure plugin from preCheck into pInfo.UnschedulablePlugins for a correct requeuein register Node/UpdateTaint event to plugins which has Node/Add only and doesn't have Node/UpdateTaint Mar 10, 2024
@alculquicondor
Copy link
Member

/lgtm
/approve
Let me discuss with the leads whether this qualifies for test freeze.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 11, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: a5649617d9bd172b548ff6d2cf2b27e2af81e523

@alculquicondor
Copy link
Member

Also, it might be worth changing plugins such as NodeResourcesFit to use UpdateTaint instead of Update. But we can leave that for after the code freeze.

@sanposhiho
Copy link
Member Author

@MrHohn Can you take a look at this PR? Only minor comment change is your area.

@alculquicondor
Copy link
Member

@pohly could you give us a hand?

@alculquicondor
Copy link
Member

@ahg-g and I agreed that, as a bug, this qualifies for the test freeze

/milestone v1.30

@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Mar 15, 2024
@pohly
Copy link
Contributor

pohly commented Mar 15, 2024

could you give us a hand?

I looked through the discussion thread, but I am too much out of the topic to have a specific opinion.

@@ -396,7 396,7 @@ func (pl *dynamicResources) EventsToRegister() []framework.ClusterEventWithHint
{Event: framework.ClusterEvent{Resource: framework.PodSchedulingContext, ActionType: framework.Add | framework.Update}, QueueingHintFn: pl.isSchedulableAfterPodSchedulingContextChange},
// A resource might depend on node labels for topology filtering.
// A new or updated node may make pods schedulable.
{Event: framework.ClusterEvent{Resource: framework.Node, ActionType: framework.Add | framework.UpdateNodeLabel}},
{Event: framework.ClusterEvent{Resource: framework.Node, ActionType: framework.Add | framework.UpdateNodeLabel | framework.UpdateNodeTaint}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the code behind registerNodeAdded && !registerNodeTaintUpdated below adding this automatically? Does it still make sense to also add this manually?

A short comment here that this is a workaround would be useful to avoid confusion when reading just this file, because nothing in it depends on node taints.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it still make sense to also add this manually?

Yes, especially for QHint - When you add QHint for Node/Add, you probably want to have the same QHint for Node/UpdateNodeTaint, which you couldn't do if we let the scheduler automatically add UpdateNodeTaint.

A short comment here that this is a workaround would be useful to avoid confusion when reading just this file, because nothing in it depends on node taints.

👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to have the same QHint for Node/UpdateNodeTaint,

This is the part that I don't get. This particular plugin doesn't check taints of a node. It leaves that to the other plugins which check whether a pod fits the node.

If you are adding it because "it might be needed", then it might be necessary. If it's getting added because "node added" is not delivered reliably (= i.e. it's a workaround), then it makes more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that "node added" events are not delivered reliably.

It's that a new node is likely not ready (it has a not-ready taint). And it needs to be retried when it becomes ready.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 16, 2024
@sanposhiho
Copy link
Member Author

sanposhiho commented Mar 16, 2024

As suggested, added the comments in each plugin's Node/UpdateTaint event to give context to readers.

@pohly Your approval would be helpful, as you're a test/OWNERS approver. We only have a small comment update in test/ area.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, pohly, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 17, 2024
@alculquicondor
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 18, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c0bb0070bcf9928962d1984973bfe05fdf9f7b4e

@k8s-ci-robot k8s-ci-robot merged commit aa73f31 into kubernetes:master Mar 18, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

plugins that register nodeAdd in EventsToRegister must register nodeUpdate
8 participants