[question]: Node Under Utilization #133
Comments
Just verified that on the reference cluster running

Keep in mind that the bin-packing scheduler will try to pack pods onto as few nodes as possible. For that reason, it is not unusual for some nodes to have very low utilization, which allows Karpenter to spin them down. However, Karpenter will never spin down a controller node. I believe that is what you are seeing here. We can look into a way to optimize this behavior.
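As a rough illustration of why tight packing can leave one node nearly empty, here is a minimal first-fit-decreasing bin-packing sketch in Go. It is not Karpenter's scheduler. The `Node` type, the `packFFD` function, the 4000m node capacity, and the pod CPU requests are all hypothetical, chosen only to show the effect.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical node with a fixed CPU capacity in millicores.
type Node struct {
	Capacity int
	Used     int
}

// packFFD places pod CPU requests onto as few nodes as possible using
// first-fit decreasing: largest requests first, each into the first node
// with enough room, opening a new node only when none fits.
func packFFD(requests []int, capacity int) []Node {
	sorted := append([]int(nil), requests...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	var nodes []Node
	for _, r := range sorted {
		placed := false
		for i := range nodes {
			if nodes[i].Used+r <= nodes[i].Capacity {
				nodes[i].Used += r
				placed = true
				break
			}
		}
		if !placed {
			// No existing node has room, so open a new one.
			nodes = append(nodes, Node{Capacity: capacity, Used: r})
		}
	}
	return nodes
}

func main() {
	// Hypothetical pod CPU requests (millicores) and a 4000m node size.
	requests := []int{1500, 1500, 1000, 1000, 1000, 1000, 500, 500, 250}
	for i, n := range packFFD(requests, 4000) {
		pct := 100 * float64(n.Used) / float64(n.Capacity)
		fmt.Printf("node %d: %d/%d (%.0f%%)\n", i, n.Used, n.Capacity, pct)
	}
}
```

With these inputs, the first two nodes fill completely and the last 250m request opens a third node at about 6% utilization. Packing concentrates the spare capacity on the last node rather than spreading it evenly, which is why a few very low-utilization nodes are expected under this kind of scheduler.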
A PDB blocks disruption when too few pods in its set are running and healthy to allow further disruption. Please provide more information about each PDB, specifically why pods in its set are already unhealthy. Typically this is because they have already been evicted for one reason or another. You can look at the Kubernetes events to find the reasons for each pod's eviction.

Looking at the reference cluster running
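To make the PDB rule concrete, here is a minimal Go sketch of how a budget's allowed disruptions can be computed. It assumes integer `minAvailable`/`maxUnavailable` values only; the `PDBSpec` type and the `disruptionsAllowed` and `intPtr` functions are hypothetical simplifications, not the Kubernetes API types, and the upstream disruption controller also handles percentages and other details.

```go
package main

import "fmt"

// PDBSpec is a simplified, hypothetical stand-in for a PodDisruptionBudget
// spec using integer values only (percentages are omitted).
// Exactly one of MinAvailable or MaxUnavailable should be set.
type PDBSpec struct {
	MinAvailable   *int
	MaxUnavailable *int
}

// disruptionsAllowed returns how many more pods may be evicted: the number
// of currently healthy pods minus the number the budget requires, floored at 0.
func disruptionsAllowed(spec PDBSpec, expected, currentHealthy int) int {
	var desiredHealthy int
	switch {
	case spec.MinAvailable != nil:
		desiredHealthy = *spec.MinAvailable
	case spec.MaxUnavailable != nil:
		desiredHealthy = expected - *spec.MaxUnavailable
	}
	allowed := currentHealthy - desiredHealthy
	if expected <= 0 || allowed < 0 {
		return 0
	}
	return allowed
}

// intPtr is a small helper for building optional integer fields.
func intPtr(v int) *int { return &v }

func main() {
	// Hypothetical budget: 3 replicas, at most 1 may be unavailable.
	spec := PDBSpec{MaxUnavailable: intPtr(1)}
	fmt.Println(disruptionsAllowed(spec, 3, 3)) // all healthy: 1 eviction allowed
	fmt.Println(disruptionsAllowed(spec, 3, 2)) // one already unhealthy: blocked
}
```

The second call shows the situation described in the comment: once one pod in the set is already unhealthy (for example, because it was evicted), the budget has no disruptions left, so consolidation of the node hosting the remaining pods is blocked. On a live cluster, the computed value is reported in the PDB's `status.disruptionsAllowed` field, which `kubectl get pdb -A` shows in its ALLOWED DISRUPTIONS column.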
An optimization for controller node bin-packing has been included in the next release.
Prior Search
What is your question?
After upgrading to `edge.2024-09-04` and `edge.2024-09-10`, node utilization has been sitting around 50%. I watched pods stabilize and nodes spin up and down for over 4 hours. The nodes have now been stable for over 2 hours and are no longer consolidating. Here are the event logs from the nodes that I believe could have been consolidated but were blocked:
I also noticed that not much was being scheduled onto the controller nodes. Both of my controller nodes have only 4 pods running. I don't know whether this is expected, but it seems different from what I remember.
What can be done to resolve the PDB blocks, and why aren't pods being scheduled on the controller nodes?
What primary components of the stack does this relate to?
No response
Code of Conduct