
The GPU-related metrics data exposed by DCGM Exporter does not include the launcher pod with the GPU bound #11660

Closed
rokkiter opened this issue Apr 8, 2024 · 9 comments
Labels
kind/bug lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@rokkiter
Contributor

rokkiter commented Apr 8, 2024

What happened:

When a VM created with KubeVirt is bound to a GPU via passthrough mode, the GPU-related metrics data exposed by DCGM Exporter does not include the launcher pod with the GPU bound.

With a GPU bound directly to a pod (not created by KubeVirt), DCGM Exporter is able to obtain the GPU metrics for that pod.

What you expected to happen:

A launcher pod created by KubeVirt and bound to a GPU should be captured by DCGM Exporter.

How to reproduce it (as minimally and precisely as possible):

  1. On a cluster with the NVIDIA GPU Operator installed
  2. Create a VM via a VMI and bind a GPU:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  devices:
    gpus:
    - deviceName: nvidia.com/xxxx
      name: gpu-x
...
  3. View the GPU-related metrics in Prometheus

Additional context:

You can see from the following code that DCGM Exporter filters based on PodResources; the rule is
resourceName == nvidiaResourceName || strings.HasPrefix(resourceName, nvidiaMigResourcePrefix)
where nvidiaResourceName is "nvidia.com/gpu".

https://github.com/NVIDIA/dcgm-exporter/blob/main/pkg/dcgmexporter/kubernetes.go#L142

func (p *PodMapper) toDeviceToPod(
	devicePods *podresourcesapi.ListPodResourcesResponse, sysInfo SystemInfo,
) map[string]PodInfo {
	deviceToPodMap := make(map[string]PodInfo)

	for _, pod := range devicePods.GetPodResources() {
		for _, container := range pod.GetContainers() {
			for _, device := range container.GetDevices() {

				resourceName := device.GetResourceName()
				if resourceName != nvidiaResourceName {
					// Mig resources appear differently than GPU resources
					if !strings.HasPrefix(resourceName, nvidiaMigResourcePrefix) {
						continue
					}
				}
				...
			}
		}
	}

	return deviceToPodMap
}

But in KubeVirt, the GPU-related PodResources resource name seems to be gpu.DeviceName.
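For clarity, that check can be restated as a small standalone sketch (constants copied from the values mentioned in this thread, not from the exporter's source), showing why a passthrough device name requested by a virt-launcher pod is dropped:

package main

import (
	"fmt"
	"strings"
)

// Values as described in this thread; not imported from dcgm-exporter.
const (
	nvidiaResourceName      = "nvidia.com/gpu"
	nvidiaMigResourcePrefix = "nvidia.com/mig-"
)

// isWatchedResource mirrors the check in toDeviceToPod: only plain GPU
// requests and MIG slices are mapped to pods.
func isWatchedResource(resourceName string) bool {
	return resourceName == nvidiaResourceName ||
		strings.HasPrefix(resourceName, nvidiaMigResourcePrefix)
}

func main() {
	fmt.Println(isWatchedResource("nvidia.com/gpu"))        // true: plain GPU request
	fmt.Println(isWatchedResource("nvidia.com/mig-1g.5gb")) // true: MIG slice
	fmt.Println(isWatchedResource("nvidia.com/xxxx"))       // false: passthrough deviceName from a virt-launcher pod
}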

Environment:

  • KubeVirt version (use virtctl version): 1.0.0
  • Kubernetes version (use kubectl version): N/A
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A
@aburdenthehand
Contributor

/cc @machadovilaca

@machadovilaca
Member

Although I don't know in detail how we are handling GPUs in KubeVirt, I think it's clear that we pass through the device name exactly, and it seems this is related to the NVIDIA metric collector. If you set up your VM requesting a GPU:

spec:
  domain:
    devices:
      ...
      gpus:
      - deviceName: nvidia.com/SOMETHING
        name: gpu1

the resulting virt-launcher pod will contain a direct match of that resource name, as expected:

resources:
  ...
  requests:
    ...
    nvidia.com/SOMETHING: "1"

but the NVIDIA exporter, as you mentioned, only cares about resources named exactly nvidia.com/gpu or prefixed with nvidia.com/mig-. In the "GPU bound directly to a pod" case that you mentioned is working, what GPU device name are you using?

example:
https://github.com/machadovilaca/kubevirt/blob/test-nvidia-dcgm-exporter/tests/monitoring/gpu.go

@rokkiter
Contributor Author

rokkiter commented Apr 11, 2024

Thanks so much for focusing on this! Here's the YAML I use when binding the GPU directly to a pod:

  resources:
     requests:
       ...
       nvidia.com/gpu: 1 
     limits:
       ...
       nvidia.com/gpu: 1 

I don't need to specify which GPU card to use when binding the GPU directly to the pod; I only need to specify the number of GPUs bound to the pod, which is handled by the GPU Operator, so I just configure it as shown above.

If I set the deviceName to nvidia.com/gpu in KubeVirt's passthrough mode I could theoretically get monitoring, but that doesn't seem very reasonable. And there's no way for the user to realize that this is required to make the GPUs bound to the KubeVirt launcher pod available for monitoring.

Instead, I would prefer that the KubeVirt launcher pod set the PodResources name to nvidia.com/gpu, regardless of the name of the GPU card, whenever it is bound to an NVIDIA GPU.

Of course, a better solution would be to push DCGM Exporter to change its policy from requiring an exact nvidia.com/gpu match to matching on the nvidia.com/ prefix instead.
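For illustration only, a minimal sketch of what such a prefix-based rule could look like; the nvidiaResourcePrefix constant and the relaxed check are hypothetical, not current dcgm-exporter behavior:

package main

import (
	"fmt"
	"strings"
)

// Hypothetical relaxation: accept any resource advertised under the
// nvidia.com/ domain, so passthrough device names (e.g. nvidia.com/SOMETHING)
// would also be mapped to their pods.
const nvidiaResourcePrefix = "nvidia.com/" // assumed constant, does not exist in dcgm-exporter

func isNvidiaResource(resourceName string) bool {
	return strings.HasPrefix(resourceName, nvidiaResourcePrefix)
}

func main() {
	for _, name := range []string{"nvidia.com/gpu", "nvidia.com/mig-1g.5gb", "nvidia.com/SOMETHING", "amd.com/gpu"} {
		fmt.Printf("%s -> watched: %v\n", name, isNvidiaResource(name))
	}
}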

@machadovilaca
Member

Why is it not reasonable? Per my understanding, if you request a specific GPU for the pod, you also won't have NVIDIA monitoring there.

If you ask KubeVirt to request a specific nvidia.com/SOMETHING GPU to the VM, I don't think it makes sense for KubeVirt to decide "we don't care which GPU, let's just request nvidia.com/gpu: 1".

@rokkiter
Contributor Author

I think you're right.

It's not practical to try to fix this in KubeVirt. I have some GPU cards mounted in my cluster, and from kubectl describe node I can get the following information:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests           Limits
  --------                       --------           ------
  ...
  nvidia.com/GP104GL_TESLA_P4    2                  2
  nvidia.com/GRID_P4-1Q          0                  0
  nvidia.com/GRID_P4-4Q          0                  0

If I request nvidia.com/gpu: "1" instead, it is not recognized by Kubernetes and the VM cannot be created successfully.

This appears to be because the DCGM Exporter strictly follows the Kubernetes specification for identifying GPU resources; see the Kubernetes device plugin documentation:

The ResourceName it wants to advertise. Here ResourceName needs to follow the extended resource naming scheme as vendor-domain/resourcetype. (For example, an NVIDIA GPU is advertised as nvidia.com/gpu.)
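For context, this is roughly how a device plugin advertises its ResourceName to the kubelet at registration time (a sketch against the v1beta1 device plugin API; the endpoint name is made up):

package main

import (
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

func main() {
	// The ResourceName advertised here is what later shows up in the kubelet
	// PodResources API that dcgm-exporter filters on.
	req := &pluginapi.RegisterRequest{
		Version:      pluginapi.Version, // "v1beta1"
		Endpoint:     "nvidia-gpu.sock", // hypothetical plugin socket name
		ResourceName: "nvidia.com/gpu",  // vendor-domain/resourcetype
	}
	fmt.Printf("device plugin registration request: %+v\n", req)
}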

This issue is better suited for upstream discussion, and I'll probably cite this issue as a practical example of what I'm looking for.

Thanks for the help provided!

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 11, 2024
@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 10, 2024
@kubevirt-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@kubevirt-bot
Contributor

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
