-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TaintController #40355
TaintController #40355
Conversation
gmarek
commented
Jan 24, 2017
•
edited
Loading
edited
2eb71de
to
5b33298
Compare
@@ -141,6 141,8 @@ type NodeController struct { | |||
daemonSetStore listers.StoreToDaemonSetLister | |||
// allocate/recycle CIDRs for node if allocateNodeCIDRs == true | |||
cidrAllocator CIDRAllocator | |||
// manages taints | |||
taintManager *TaintController |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a detailed description (from our emails) of why the TaintController is special cased to the NodeController, as this is different then all other controllers.
/cc @aveshagarwal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very, very early untested prototype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So don't expect comments, or anything;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's further along now so I think comment is appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - I'm applying all comments today. @timothysc - do you have a piece of email I can paste here (I wasn't taking part in those discussions)
5b33298
to
796cf76
Compare
continue | ||
} | ||
podNamespacedName := types.NamespacedName{Namespace: pods[i].Namespace, Name: pods[i].Name} | ||
if len(tolerations) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume there is a node nodeA tainted with taint {foo=bar:NoSchedule}, and pod podA which has no tolerations is running on nodeA, according to the check here, it results in podA evicted.
Actually what we want is just evict pods those don't tolerate NoExecute taints of nodes, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do filter everything except NoExecute taints in XUpdated functions, so no other taints appear in the rest of the code (I hope).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's fine.
796cf76
to
dff53b8
Compare
I added unit tests. Only e2e test, flag gate in NC and logging tweaks are needed to finish this PR. I hope to finish it tomorrow. @kevin-wangzefeng @davidopp @wojtek-t @timothysc Oh, and we need way more |
dff53b8
to
4969754
Compare
08d4aff
to
92f79c9
Compare
I'm getting errors:
@mikedanese @liggitt @cjcullen - any ideas? |
Re #40355 (comment), there's an OWNERS file per dir, the top lever owners are just for those directories that don't have one, I think. Feel free to add people to it. |
the certificates api dropped v1alpha1 types in #39772 |
457189c
to
523f470
Compare
please squash, and we need to get the tests green again. |
523f470
to
004552f
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED The following people have approved this PR: gmarek, mikedanese Needs approval from an approver in each of these OWNERS Files:
We suggest the following people: |
@timothysc @wojtek-t squashed commits. |
Reviewed 5 of 17 files at r7, 1 of 2 files at r10, 6 of 8 files at r12, 2 of 2 files at r13, 1 of 1 files at r14. Comments from Reviewable |
Automatic merge from submit-queue (batch tested with PRs 39418, 41175, 40355, 41114, 32325) |
@@ -104,6 104,7 @@ func NewCMServer() *CMServer { | |||
ClusterSigningCertFile: "/etc/kubernetes/ca/ca.pem", | |||
ClusterSigningKeyFile: "/etc/kubernetes/ca/ca.key", | |||
ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 5 * time.Second}, | |||
EnableTaintManager: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -196,6 197,7 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet, allControllers []string, disabled | |||
fs.Float32Var(&s.UnhealthyZoneThreshold, "unhealthy-zone-threshold", 0.55, "Fraction of Nodes in a zone which needs to be not Ready (minimum 3) for zone to be treated as unhealthy. ") | |||
fs.BoolVar(&s.DisableAttachDetachReconcilerSync, "disable-attach-detach-reconcile-sync", false, "Disable volume attach detach reconciler sync. Disabling this may cause volumes to be mismatched with pods. Use wisely.") | |||
fs.DurationVar(&s.ReconcilerSyncLoopPeriod.Duration, "attach-detach-reconcile-sync-period", s.ReconcilerSyncLoopPeriod.Duration, "The reconciler sync wait time between volume attach detach. This duration must be larger than one second, and increasing this value from the default may allow for volumes to be mismatched with pods.") | |||
fs.BoolVar(&s.EnableTaintManager, "enable-taint-manager", s.EnableTaintManager, "WARNING: Beta feature. If set to true enables NoExecute Taints and will evict all not-tolerating Pod running on Nodes tainted with this kind of Taints.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to make this be a flag? The code would be simpler without it, and beta is enabled by default... I think if this feature is broken we would do a patch release, not just disable it.
@@ -312,6 356,9 @@ func NewNodeController( | |||
utilruntime.HandleError(fmt.Errorf("Error allocating CIDR: %v", err)) | |||
} | |||
} | |||
if nc.taintManager != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to make taintManager conditional. If it is broken we will do a patch release. Making it unconditional will simplify the code.
} | ||
|
||
func RunPodAndGetNodeName(c clientset.Interface, pod *v1.Pod, timeout time.Duration) (string, error) { | ||
retries := 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already defined as a top-level const, so delete this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
func getNonExecuteTaints(taints []v1.Taint) []v1.Taint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why Non? getNoExecuteTaints would be better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
} | ||
|
||
// TimedWorkerQueue keeps a set of TimedWorkers that still wait for execution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/still wait/are still waiting/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// TimedWorkerQueue keeps a set of TimedWorkers that still wait for execution. | ||
type TimedWorkerQueue struct { | ||
sync.Mutex | ||
workers map[string]*TimedWorker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment what is the map key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
q.Lock() | ||
defer q.Unlock() | ||
if err == nil { | ||
q.workers[key] = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you delete(q.workers, key) in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid thrashing - it's enough to delete Pod once, so I keep the key as a way to dedup calls.
} | ||
} | ||
|
||
// Creates and startes a controller (informer) that watches updates on a pod in given namespace with given name. It puts a new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/startes/starts/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 1,335 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a lot of work but can you add a test with three pods on the same node
- one tolerates T without tolerationSeconds
- one tolerates T for some tolerationSeconds==t
- one does not tolerate T
Then add taint T to the node and verify that the first one is never evicted, second is evicted after t, and third is evicted immediately?
A couple of other ideas for tests
- same pod configuration as above, but add two taints T and S, and verify all three pods are evicted immediately
- a pod that tolerates two taints, with different tolerationSeconds, then add both taints and verify that the pod is evicted after the min of the tolerationSeconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return func(args *WorkArgs) error { | ||
ns := args.NamespacedName.Namespace | ||
name := args.NamespacedName.Name | ||
glog.V(0).Infof("NoExecuteTaintManager is deleting Pod: %v", args.NamespacedName.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add an event here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// AddWork adds a work to the WorkerQueue which will be executed not earlier than `fireAt`. | ||
func (q *TimedWorkerQueue) AddWork(args *WorkArgs, createdAt time.Time, fireAt time.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add V(4) logging at entry to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
|
||
// CancelWork removes scheduled function execution from the queue. | ||
func (q *TimedWorkerQueue) CancelWork(key string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add V(4) logging at entry to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
One other question -- I'm a bit rusty on how the Informer stuff works. Is it completely edge-based? That is, when node controller starts up, taint controller doesn't do anything until a pod or node is created/updated/deleted? Or is it level-based at startup, i.e. it somehow triggers the callbacks for every object in the system on startup? I think the behavior we want is that when NC starts up, it scans all the nodes and enqueues the relevant pods for eviction based on the node's taints and the pods' tolerations. |
On startup, it gets an Add event for every existing object |
Thanks. And yeah, what @liggitt wrote answered my question. |