
The GPU-related metrics data exposed by DCGM Exporter does not include the launcher pod with the GPU bound #11660

Closed
rokkiter opened this issue Apr 8, 2024 · 9 comments
Labels
kind/bug lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@rokkiter
Contributor

rokkiter commented Apr 8, 2024

What happened:

When a VM created with KubeVirt is bound to a GPU via passthrough mode, the GPU-related metrics data exposed by DCGM Exporter does not include the launcher pod with the GPU bound.

With a GPU bound directly to a pod (not created by KubeVirt), DCGM Exporter is able to obtain the GPU metrics for that pod.

What you expected to happen:

A launcher pod created by KubeVirt and bound to a GPU should be captured by DCGM Exporter.

How to reproduce it (as minimally and precisely as possible):

  1. On a cluster with the NVIDIA GPU Operator installed
  2. Create a VM via a VMI and bind a GPU:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  devices:
    gpus:
    - deviceName: nvidia.com/xxxx
      name: gpu-x
...
  3. View the GPU-related metrics in Prometheus

Additional context:

You can see from the following code that DCGM Exporter filters based on PodResources; the rule is
resourceName == nvidiaResourceName || strings.HasPrefix(resourceName, nvidiaMigResourcePrefix)
where nvidiaResourceName is "nvidia.com/gpu".

https://github.com/NVIDIA/dcgm-exporter/blob/main/pkg/dcgmexporter/kubernetes.go#L142

func (p *PodMapper) toDeviceToPod(
	devicePods *podresourcesapi.ListPodResourcesResponse, sysInfo SystemInfo,
) map[string]PodInfo {
	deviceToPodMap := make(map[string]PodInfo)

	for _, pod := range devicePods.GetPodResources() {
		for _, container := range pod.GetContainers() {
			for _, device := range container.GetDevices() {

				resourceName := device.GetResourceName()
				if resourceName != nvidiaResourceName {
					// Mig resources appear differently than GPU resources
					if !strings.HasPrefix(resourceName, nvidiaMigResourcePrefix) {
						continue
					}
				}
				...
			}
		}
	}

	return deviceToPodMap
}

But in KubeVirt, the GPU-related PodResources resource name seems to be gpu.DeviceName.
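For clarity, that check can be restated as a small standalone sketch (constants copied from the values mentioned in this thread, not from the exporter's source), showing why a passthrough device name requested by a virt-launcher pod is dropped:

package main

import (
	"fmt"
	"strings"
)

// Values as described in this thread; not imported from dcgm-exporter.
const (
	nvidiaResourceName      = "nvidia.com/gpu"
	nvidiaMigResourcePrefix = "nvidia.com/mig-"
)

// isWatchedResource mirrors the check in toDeviceToPod: only plain GPU
// requests and MIG slices are mapped to pods.
func isWatchedResource(resourceName string) bool {
	return resourceName == nvidiaResourceName ||
		strings.HasPrefix(resourceName, nvidiaMigResourcePrefix)
}

func main() {
	fmt.Println(isWatchedResource("nvidia.com/gpu"))        // true: plain GPU request
	fmt.Println(isWatchedResource("nvidia.com/mig-1g.5gb")) // true: MIG slice
	fmt.Println(isWatchedResource("nvidia.com/xxxx"))       // false: passthrough deviceName from a virt-launcher pod
}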

Environment:

  • KubeVirt version (use virtctl version): 1.0.0
  • Kubernetes version (use kubectl version): N/A
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A
@aburdenthehand
Contributor

/cc @machadovilaca

@machadovilaca
Member

Although I don't know in detail how we are handling GPUs in KubeVirt, I think it's clear that we pass through the device name exactly, and it seems this is related to the NVIDIA metric collector. If you set up your VM requesting a GPU:

spec:
  domain:
    devices:
      ...
      gpus:
      - deviceName: nvidia.com/SOMETHING
        name: gpu1

the resulting virt-launcher pod will contain a direct match of that resource name, as expected:

resources:
  ...
  requests:
    ...
    nvidia.com/SOMETHING: "1"

but the NVIDIA exporter, as you mentioned, only cares about resources named exactly nvidia.com/gpu or prefixed with nvidia.com/mig-. In the "GPU bound directly to a pod" case that you mentioned is working, what GPU device name are you using?

example:
https://github.com/machadovilaca/kubevirt/blob/test-nvidia-dcgm-exporter/tests/monitoring/gpu.go

@rokkiter
Contributor Author

rokkiter commented Apr 11, 2024

Thanks so much for focusing on this! Here's the YAML I use when binding the GPU directly to a pod:

  resources:
     requests:
       ...
       nvidia.com/gpu: 1 
     limits:
       ...
       nvidia.com/gpu: 1 

I don't need to specify which GPU card to use when binding the GPU directly to the pod; I only need to specify the number of GPUs bound to the pod, which is handled by the GPU Operator, so I just configure it as shown above.

If I set the deviceName to nvidia.com/gpu in KubeVirt's passthrough mode I could theoretically get monitoring, but that doesn't seem very reasonable. And there's no way for the user to realize that this is required to make the GPUs bound to the KubeVirt launcher pod available for monitoring.

Instead, I would prefer that the KubeVirt launcher pod set the PodResources name to nvidia.com/gpu, regardless of the name of the GPU card, whenever it is bound to an NVIDIA GPU.

Of course, a better solution would be to push DCGM Exporter to change its policy from requiring an exact nvidia.com/gpu match to matching on the nvidia.com/ prefix instead.
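For illustration only, a minimal sketch of what such a prefix-based rule could look like; the nvidiaResourcePrefix constant and the relaxed check are hypothetical, not current dcgm-exporter behavior:

package main

import (
	"fmt"
	"strings"
)

// Hypothetical relaxation: accept any resource advertised under the
// nvidia.com/ domain, so passthrough device names (e.g. nvidia.com/SOMETHING)
// would also be mapped to their pods.
const nvidiaResourcePrefix = "nvidia.com/" // assumed constant, does not exist in dcgm-exporter

func isNvidiaResource(resourceName string) bool {
	return strings.HasPrefix(resourceName, nvidiaResourcePrefix)
}

func main() {
	for _, name := range []string{"nvidia.com/gpu", "nvidia.com/mig-1g.5gb", "nvidia.com/SOMETHING", "amd.com/gpu"} {
		fmt.Printf("%s -> watched: %v\n", name, isNvidiaResource(name))
	}
}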

@machadovilaca
Member

Why is it not reasonable? Per my understanding, if you request a specific GPU for the pod, you also won't have NVIDIA monitoring there.

If you ask KubeVirt to request a specific nvidia.com/SOMETHING GPU to the VM, I don't think it makes sense for KubeVirt to decide "we don't care which GPU, let's just request nvidia.com/gpu: 1".

@rokkiter
Contributor Author

I think you're right.

It's not practical to try to fix this in KubeVirt. I have some GPU cards mounted in my cluster, and from kubectl describe node I can get the following information:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests           Limits
  --------                       --------           ------
  ...
  nvidia.com/GP104GL_TESLA_P4    2                  2
  nvidia.com/GRID_P4-1Q          0                  0
  nvidia.com/GRID_P4-4Q          0                  0

If I request nvidia.com/gpu: "1" instead, it is not recognized by Kubernetes and the VM cannot be created successfully.

This appears to be because the DCGM Exporter strictly follows the Kubernetes specification for identifying GPU resources; see the Kubernetes device plugin documentation:

The ResourceName it wants to advertise. Here ResourceName needs to follow the extended resource naming scheme as vendor-domain/resourcetype. (For example, an NVIDIA GPU is advertised as nvidia.com/gpu.)
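For context, this is roughly how a device plugin advertises its ResourceName to the kubelet at registration time (a sketch against the v1beta1 device plugin API; the endpoint name is made up):

package main

import (
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

func main() {
	// The ResourceName advertised here is what later shows up in the kubelet
	// PodResources API that dcgm-exporter filters on.
	req := &pluginapi.RegisterRequest{
		Version:      pluginapi.Version, // "v1beta1"
		Endpoint:     "nvidia-gpu.sock", // hypothetical plugin socket name
		ResourceName: "nvidia.com/gpu",  // vendor-domain/resourcetype
	}
	fmt.Printf("device plugin registration request: %+v\n", req)
}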

This issue is better suited for upstream discussion, and I'll probably cite this issue as a practical example of what I'm looking for.

Thanks for the help provided!

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2024
@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 10, 2024
@kubevirt-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@kubevirt-bot
Contributor

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
