Cluster-Wide CPU Feature Disablement in KubeVirt #11407
Conversation
@dankenigsberg I linked a bug for libvirt in the PR description: |
/hold
Same for
The PR description does not explain why KubeVirt has to do this expensive task of manually tracking certain CPU features.
Going forward, this will be an ever-growing list of CPU features which need special treatment - why should a high-level mgmt layer (KubeVirt) manage those, and not something like a VMM runtime daemon (libvirt)? |
I didn't open these bugs; these are old bugs describing why we had to read the /usr/share/libvirt/cpu_map/[CPU Model].xml files.
This is meant for fixing similar issues in the future, or when we encounter problematic features. It's not part of the KubeVirt API, and only administrators can use it, since it is in the KubeVirt installation namespace.
Hey Fabian. I modified the PR description to be clear on that and also added a bug I created for libvirt: https://gitlab.com/libvirt/libvirt/-/issues/608 Thanks for the notes |
Can't we then wait for libvirt to fix the bug instead of working around it? |
It might take a while for libvirt to fix the issue. From what I understand, there's a compatibility problem with QEMU, but I'm not sure about the details. The new workaround replaces the old one, which is removed in the first commit of this PR. However, using the old workaround is not recommended by libvirt. These are the advantages of the new approach:
|
@fabiand I'll also add this to the PR description to make it more clear |
This is not directly related to the bug. This PR should allow KubeVirt to react and mitigate issues with CPU models quickly instead of waiting for fixes from libvirt and QEMU. By default, no action is required from the cluster admin. |
@vladikr is the root approver so he makes the call. I take a different approach: these sorts of issues should be fixed in the relevant component (libvirt, qemu) and then consumed by KubeVirt via a rebuild. Exposing this in our API surface makes KubeVirt complicated, adding a burden on our users, tests and documentation. @Barakmor1 I'd appreciate it if you keep |
That's fine, but please recognize that this is a joint effort, @dankenigsberg |
Thanks for the link @vladikr, it would just be good (and now it is) to have this in the PR description as well. @Barakmor1, if this is a mitigation - what is the complete workflow for an admin in order to safely use this feature? The timeline
I think it's clear how an admin would use the feature in order to work around the issue; now, some questions:
I'm trying to widen the view in order to get us to understand what burden we are putting onto the admin. @Barakmor1 can you help to find some answers to the questions above? |
@Barakmor1 do we know when https://gitlab.com/libvirt/libvirt/-/issues/608 will see some attention? |
@Barakmor1 @vladikr FWIW wouldn't an alternative be to be able to define custom CPU models? The diff to this proposal (IIUIC) would be: A custom CPU is independent of what libvirt/qemu provides. Aka the upgrade path would be clearer. |
Hi @dankenigsberg , I appreciate the important notes. I've added a link to Libvirt Issue in the comments. I'll also include a clear explanation in the documentation at https://github.com/kubevirt/user-guide. I'm currently working on a PR and will notify you once it's published. |
Yes, that's the process. Just a heads up: during the second stage, some existing VMs may need to be restarted. This process is simpler compared to the current one, which involves disabling Node Labeller, adding labels to nodes, modifying VMs to disable the feature, restarting them, and manually disabling the feature for new VMs.
With this new workaround, the VMs would appear the same in all the steps. The only thing that needs to be done is to create a ConfigMap called "cpu-model-to-features-to-disable" in the installation namespace with the problematic feature and relevant models. With the current workaround (without this PR's changes):
With the workaround, we will list features that are deprecated and should not be enabled, such as mpx, in a ConfigMap.
The workaround will not affect anything, and the VM will enable mpx even though it exists in the map. The workaround will only help admins disable deprecated features that should not be used anyway. The libvirt/qemu/kubevirt fix will ensure that these features are disabled by default, even without the workaround. So there is no risk if an admin forgets to remove the ConfigMap, as this won't affect anything. |
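For illustration, here is a minimal sketch of such a ConfigMap, assuming the data keys are CPU model names and the values are comma-separated features to disable (the exact schema is an assumption, not spelled out in this thread):

```yaml
# Hedged sketch: the key/value layout (CPU model -> features to disable) is assumed.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cpu-model-to-features-to-disable
  namespace: kubevirt   # the KubeVirt installation namespace
data:
  Skylake-Client-noTSX-IBRS: "mpx"
```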
There is a discussion regarding this issue with the Libvirt team in a mailing list. Hopefully, this will be addressed soon :) . |
This is similar to what we are doing here: the features in the map will be disabled automatically for some CPU models. There is no need to add features if the user didn't request additional features in the VMI's spec. |
@dankenigsberg here is the documentation PR: @fabiand can we remove the hold label? |
@fabiand partially this is possible today already: https://kubevirt.io/user-guide/virtual_machines/virtual_hardware/#features I don't see how this is an alternative to a cluster-wide approach. This approach puts a burden on the guest user, who may not be aware of CVEs and cluster-wide issues. This PR is about allowing the admin to react quickly to known issues until fixes arrive from the lower-layered projects.
|
/unhold |
Hi. @Barakmor1 @vladikr the topic got started because of the features below?

```yaml
features:
  - name: mpx
    policy: disable
```
|
Yes, for each new VM, we'll need to disable |
The instance types' goal is different, and they come with limitations; you can't use them in all cases.
I am not sure what you mean here. Is an API designed to help in the interim not allowed? Is KubeVirt not allowed to provide a better user/admin experience? I would argue that faster security patches or bug fixes are worth the additional (low) maintenance. We need to take into account that KubeVirt is just one layer on top of many, and each layer can take considerable time to provide fixes.
|
@Barakmor1 thanks. Now, if we divide our small KubeVirt world into two phases: an env with a problematic CPU model, and an env with a fixed CPU model, then we are saying
Thus we look for a temporary measure to fix one or more (possibly not all) VMs on a cluster to get through the time before "AFTER" appears. I have my concerns - raised above - with the mangling of CPU models (now sharpening my concerns) on our KubeVirt level. I really wonder - and eventually worry - how we keep track of it, and how to mitigate or rather foresee the impact of this. For multiple VMs a webhook, a tool like Kyverno, or something like the patch-operator could help to do the job. TLDR: We have two strategies to address critical situations (a KubeVirt fix and CPU features on the VMI API); automation tools could be used to address multiple VMs. The implementation in this PR will require a maintenance burden, additional knowledge by admins, and possibly other things we don't know of yet. |
@fabiand For me, relying on external tools to set a cluster-wide policy that only affects VMs is not good enough. This API is specific because it should enforce a temporary policy for all VMs without affecting the VM/VMI spec. Also, it could be that admins would want to disable certain CPU features and make them unavailable. For example, SVM was part of the QEMU CPU models.
Managing CPU flags on a VM level cannot be considered a cluster-wide strategy or an admin's policy. In addition, I'm not sure whether CPU features
can work with instance types. |
@fabiand @vladikr BTW, another interesting use case:
My understanding is that instancetypes are a kind of template, allowing a user to share a configuration snippet across a group of VMs. In my view, this is exactly what you are trying to do here with the CPU feature disablement.
It is allowed, but it is not recommended.
I guess this is exactly where we see things differently.
My expectation is for Kubevirt to ideally provide the mitigation through a hotfix release and not through configuration. Then, it is up to Kubevirt to pressure the layered drivers it uses to deliver a proper fix. I was unable to understand why a configuration API is better than a quick, hotfix release with the same logic embedded into Kubevirt and managed by it until a proper resolution is found. Deferring the problem to the Admin using a configuration workaround sounds problematic to me. |
I find it hard to agree with this. As an admin who is dealing with a security issue or an inconsistency that affects the cluster, I'd prefer to have a simple configuration rather than wait for an update and be exposed while the hotfix arrives. Issues that are described in this PR have been waiting for a solution for many months. Even though some of these issues are fixed in some layers, we still cannot get a sufficient release. |
Not all VMIs are started from a VM that references an instance type. |
I understand it is not perfect. Personally, if covering VMIs is a must, I would prefer templating to be extended to VMIs over adding a new, specific CPU-feature-disabling API. But I do not think that is desired. |
tl;dr instance types (well, preferences) are not a valid solution to this issue. Chiming in late with my $0.05 on instance types (well, preferences) somehow being another way to fix this. IMHO they are not. AFAICT we would end up in the middle ground between the admin-driven workflow proposed by this PR and the current purely VM-owner-driven workflow. Instance types are resource containers and do not model CPU features [1]. Preferences can provide preferred CPU features, but these can be overridden within the VM definition [2]. Both only apply to the VM and have a 1:1 point-in-time relationship with the VM, making it awkward to apply cluster-wide updates to already deployed VMs. Introducing preferences into this mix means that an admin would need to mutate all VirtualMachineClusterPreferences to disable certain CPU features and then move all existing VMs to use the new version of the preference. That in itself might move the VM from a much older generation of the resource to something totally new, which itself might be undesirable. VM owners might also have their own namespaced VirtualMachinePreferences where they also need to repeat these steps. Hopefully this brings the idea of using instance types and preferences to solve this particular use case to a close. I've not been following this PR closely enough to provide a complete review, but at a high level, providing an admin-driven workflow over something involving a VM owner is IMHO the way to go. [1] https://kubevirt.io/api-reference/main/definitions.html#_v1beta1_cpuinstancetype |
/cc |
@Barakmor1: The following tests failed. |
/hold Simply due to the long discussion |
PR needs rebase. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. /close |
@kubevirt-bot: Closed this PR in response to the /close command above.
This PR introduces a Kubevirt CR Configuration for cluster-wide CPU feature disablement.
Why is this Feature needed?
- In the past, we encountered cases where libvirt's validation process failed with a CPU model, even though it is usable according to the libvirt API. This typically happens when a CPU feature associated with the CPU model gets deprecated. Examples of such features are "mpx" and "tsx". Explicit disablement of these features in the domain XML prevents such validation failures.
- Currently, we rely on reading libvirt's internal data to prevent the failures mentioned in the first point. However, libvirt strongly advises against accessing its internal data. Additionally, it prevents us from supporting some CPU models when a deprecated CPU feature is not supported on the node.
- CPU vulnerabilities might appear in the future, and administrators need a way to mitigate these vulnerabilities at the cluster level quickly, even before they are fixed at the libvirt/QEMU level.
How to use this Feature:
To disable features across the entire cluster, a field named "cpuFeaturesToDisable" should be set in Kubevirt.spec.configuration.cpuFeaturesToDisable. For instance:
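```yaml
spec:
  configuration:
    cpuFeaturesToDisable:
      Skylake-Client-noTSX-IBRS:
        - aes
        - mpx
      Opteron_G2:
        - svm
```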
This configuration will result in the disablement of aes and mpx whenever the CPU Model is Skylake-Client-noTSX-IBRS, and also result in the disablement of svm whenever the CPU Model is Opteron_G2. Note that setting this in Kubevirt.spec.configuration.cpuFeaturesToDisable won't be reflected in the VMI Spec.
How do users benefit from this solution?
Before this PR, if a CPU feature needed to be disabled for a CPU model due to a bug in qemu/libvirt or due to CPU vulnerabilities, users might have had to take multiple actions:
1. Disable the Node Labeller.
2. Add CPU model labels to nodes.
3. Restart VMs with the problematic CPU feature to disable it.
4. Create VMs with the problematic CPU feature disabled.
After this PR, the required actions are reduced to:
1. Set the Kubevirt.spec.configuration.cpuFeaturesToDisable field to disable the problematic CPU feature.
2. Restart VMs with the problematic CPU feature.
Furthermore, this solution enables broader CPU model support because it allows us to avoid reading libvirt's internal data. For example, after this PR, KubeVirt will support Skylake on nodes that lack the mpx feature (which is currently not supported).
The following alternatives were considered:
Using a ConfigMap: Admins could create a ConfigMap with a fixed name to describe features to disable, defined similarly to the Kubevirt.spec.configuration.cpuFeaturesToDisable field. However, this makes configuration harder, as it involves setting multiple objects to configure KubeVirt's behavior.
Manual Feature Disabling: Users would manually disable problematic features each time they create a new VM and restart all VMs with the problematic feature disabled until the issue is fixed, instead of introducing a new API. However, it might take a significant amount of time until it's fixed in QEMU/libvirt, and KubeVirt will need to be upgraded to propagate the fix. During this time, the burden would fall on the VM creator, who would have to manually configure feature disablement each time they create a VM.
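For reference, the manual alternative amounts to every VM owner adding something like the following to each VM/VMI spec, using the existing CPU features API (a sketch; the model and feature shown are just examples):

```yaml
spec:
  domain:
    cpu:
      model: Skylake-Client-noTSX-IBRS
      features:
        - name: mpx
          policy: disable   # must be repeated by each VM owner, per VM
```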
How does this feature affect new VMs?
Kubevirt.spec.configuration.cpuFeaturesToDisable only affects the libvirt domain XML input provided by KubeVirt, resulting in feature disablement at the QEMU level.
For example, if the cpuFeaturesToDisable map contains the key and value Skylake-Client-noTSX-IBRS: "aes,mpx", the libvirt domain XML input will include the following:
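```xml
...
<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Skylake-Client-noTSX-IBRS</model>
  <topology sockets='1' dies='1' cores='1' threads='1'/>
  <feature policy='disable' name='aes'/>
  <feature policy='disable' name='mpx'/>
...
```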
With this new configuration, existing guest VMs won't be affected because it only changes the domain XML, which doesn't change once the VMs are running, even during live migrations. Additionally, the VMI CRs at the Kubernetes level won't be affected in any way by the configuration.
For more information, please refer to https://libvirt.org/formatdomain.html#cpu-model-and-topology
How does this feature affect running VMs and live migration?
With this new feature, existing VMs won't be affected because it only changes the domain XML, which doesn't change once the VMs are running, even during live migrations.
Handling Conflicts Between cpuFeaturesToDisable and CPU Features API:
If Kubevirt.spec.configuration.cpuFeaturesToDisable conflicts with the CPU features specified in VMI.Spec.Domain.CPU.Features, the configuration policy specified in VMI.Spec.Domain.CPU.Features will determine the behavior.
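For example, here is a sketch illustrating the stated precedence, assuming the cluster config above disables aes for this model:

```yaml
spec:
  domain:
    cpu:
      model: Skylake-Client-noTSX-IBRS
      features:
        - name: aes
          policy: require   # the VMI-level policy wins over cpuFeaturesToDisable
```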
Impact on scheduling:
Kubevirt.spec.configuration.cpuFeaturesToDisable does not affect scheduling. Neither launcher affinity nor label selectors are influenced by it. Additionally, node labels remain unchanged regardless of the features included in cpuFeaturesToDisable.
Changes to Kubevirt.spec.configuration.cpuFeaturesToDisable:
- After removing cpuFeaturesToDisable from KubeVirt's configuration, restarting existing VMs or creating new ones will no longer disable features in the guest VMs.
- After modifying cpuFeaturesToDisable in KubeVirt's configuration, restarting existing VMs or creating new ones will disable features in the guest VMs according to the modified configuration.
In both cases, existing VMs will remain unchanged even when live migrating them to different nodes, so the disabled features will remain disabled.
PR for documentation in the user-guide:
kubevirt/user-guide#771
Fixes #
Why we need it and why it was done in this way
Allowing broader CPU model support when the svm or mpx features are absent.
Special notes for your reviewer
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note