Windows Hyper-V Container Support For CRI #6862
Comments
cc @kevpar
I feel like... we already did this 😭. Thanks for the detailed write up Danny! I'm in!
cc @marosset @jsturtevant as well
When a user specifies a container (or the sum of containers) with limits above or below the default specified in the containerd configuration, what will the behavior be? For example, if the pod requests more CPU or memory than the default in the containerd configuration:
With a pod:
One piece of this work is in #6901. It would be nice to have a checklist in the issue for each piece of implementation that needs to be done, with links to the PRs as they are published.
Since it just came up in #6508, I thought I'd record a thought about the (punted to future) different platform-matcher use-case for Hyper-V isolation, from a question about using (Quoting myself because it's a bit out-of-context)
In the discussion of #6491, I think we had agreed that this would be done with a custom matcher. I don't recall any discussion of how this custom matcher would be triggered; at the time I had assumed it'd be the annotation on the ImageSpec (copied from the Pod Spec), but looking at #6657, I suspect the canonical way would be that the Hyper-V isolation runtime is somehow also able to influence the matcher used by PullImage, in the same way it's going to be able to influence the snapshotter. It'd be nice if this were magic from enabling Hyper-V isolation, but in the design currently mooted, that's not visible outside the hcsshim-private
Hey, do we need to link this to Azure/AKS#1792?
@TBBle I completely forgot to reply here, my apologies. Your last train of thought is something we're thinking about, as the work described in #6657 (and recently implemented as an experimental feature) is really exciting to consider applying to use cases like this. It'd need some k8s work to be fully usable though, which punts the usability out by quite a few months.
Yes, that'd make sense
Has there been any progress on "Add new test runs for wcow-hypervisor support"? It looks like those test runs are the only thing standing in the way of marking this complete for the 1.7 milestone.
@claudiubelu - FYI
A quick note on one of the "future" tasks (not tracked elsewhere AFAIK, so putting it here)
#6899 has landed (fulfilling the part of #6657 we care about), so we can now have per-runtime snapshotters. However, to use that to deliver the above use-case, we also need a way to provide multiple configurations of the one WCOW snapshotter with different PlatformMatchers.

#7431 for host process containers is doing a different thing for its similar use-case though, since in its case the platform is visible in CRI's API, and so the proposal there is for CRI to tell the existing snapshotter to use a different matcher.

AFAIR (I'm still on sabbatical, so "R" is carrying a lot of load in that phrase) we don't currently have a "multiple-config snapshotters" setup; snapshotters register themselves by static string name, which is what the runtime config matches. So we'd need to teach the WCOW snapshotter ("windows") to register itself a few times with different platform configs (ideally sharing storage? Same underlying instance underneath; anything else will be wasteful). Or perhaps modify snapshot plugin initialisation to be able to produce multiple snapshotters from

All that said, the matcher is needed by the "pull" operation, which is really "network to content store"; the snapshotter doesn't actually see the Matcher at all. So per the early, rambling bit of #7431 (comment), is "per-runtime snapshotter" actually the right tool for distinguishing Hyper-V and Process isolation image-choice logic? Should a "per-runtime platform matcher" be used instead? All three of Hyper-V, Process, and Not-At-All (host process) isolation share the same on-disk format and images, AFAIK, so they should really share a single snapshotter, for ease-of-comprehension if nothing else.
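For context on the per-runtime snapshotter piece above, a hypothetical sketch of what the post-#6899 CRI config could look like, assuming a second registration of the Windows snapshotter existed under a different name (the `windows-hyperv` name here is invented for illustration and does not exist today):

```toml
# Sketch only: per-runtime snapshotter selection in the CRI plugin config.
# "windows" is the real WCOW snapshotter name; "windows-hyperv" is a
# hypothetical second registration with a looser platform matcher.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-process]
  runtime_type = "io.containerd.runhcs.v1"
  snapshotter = "windows"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
  runtime_type = "io.containerd.runhcs.v1"
  snapshotter = "windows-hyperv"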
I agree we should have a per-runtime platform matcher in addition to a per-runtime snapshotter. However, there is also the additional complexity of image management, at least with CRI. The CRI API defines image operations that key only off of image name, so we need to figure out what happens when you, e.g., pull the same image with two different runtimes/platforms. CRI (and thus kubelet) may need to be enlightened to key images on a name/runtime tuple instead.
@kevpar - I thought that's why we added annotations to PullImage for CRI, so we passed in the sandbox info and knew what type of thing to do here, right? I get that's sorta a Windows hack, but is there a problem using that?
I think the annotations were added to facilitate passing in what runtime class a given pull should use. Kubelet doesn't actually do this right now, AFAIK, though.
An image name plus a

I was under the impression that kubelet tracked images by their SHA256 ID (returned in the

This is the same existing behaviour if a floating tag is named, I guess, and someone updates it between
Going to close this out and open issues for the "Future" items for us to track. The foundation is there for this to work in 1.7, so this accomplished what it set out to do for the release.
Does anybody have a step-by-step guide on how to get containerd working with Hyper-V?
What is the problem you're trying to solve
We'd like to support launching hypervisor-isolated Windows containers through the CRI entry point to light up this scenario for K8s. There's support to launch Hyper-V containers in Containerd itself via the WithWindowsHyperV client option, as well as the ctr testing tool's `--isolation` flag; however, there is nothing in the CRI plugin that makes use of this functionality at the moment.
Describe the solution you'd like
There are a few spots that would need to change to add "full" support, but at least in the 1.7 timeframe, for getting in the minimal amount needed to launch/manage these containers, there's not a great deal.
Initial Support (1.7 timeframe)
Filling in the HyperV runtime spec field
The Windows Containerd shim exposes a SandboxIsolation enum that can be used to tell the shim what kind of container/pod to launch. This field, in combination with new runtime class definitions in Containerd, is how we can differentiate between process and hypervisor isolation for Windows. Below is an example pod spec and runtime class definition in Containerd's config file:
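The original example was not captured in this copy of the issue. A hedged sketch of what such a runtime class definition and pod spec might look like, based on the runhcs shim's published options (the `SandboxIsolation = 1` value is assumed to map to the HYPERVISOR enum member, and the image/handler names are illustrative):

```toml
# Sketch: a hypervisor-isolated runtime class in the Containerd config file.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor]
  runtime_type = "io.containerd.runhcs.v1"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor.options]
    SandboxPlatform = "windows/amd64"
    SandboxIsolation = 1  # assumed: 0 = PROCESS, 1 = HYPERVISOR
```

A pod would then select this runtime via a Kubernetes RuntimeClass whose `handler` matches the runtime name above:

```yaml
# Sketch: RuntimeClass plus a pod selecting it.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-hyperv
handler: runhcs-wcow-hypervisor
---
apiVersion: v1
kind: Pod
metadata:
  name: hyperv-example
spec:
  runtimeClassName: windows-hyperv
  containers:
  - name: app
    image: mcr.microsoft.com/windows/nanoserver:ltsc2022
```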
We can additionally expand what the default CRI config in Containerd for Windows can be if not supplied in the config file. We would have to continually update this to include new runtimes any time a new OS release/container image pair is made available.
Resource Limits For the VM
One way that the Windows shim supports setting resource limits (memory, vCPU count) for the lightweight VM is via annotations. The virtual machine based annotations all begin with `io.microsoft.virtualmachine.*`, so playing into the last section above, we would allow these annotations via the `PodAnnotations` and `ContainerAnnotations` fields as shown. An example pod spec asking for the VM hosting the containers in the pod to boot with 4GB of memory and 4 virtual processors is below:
Another way resource limits could be set, although the values would be fixed for the duration of a deployment unless Containerd was restarted or the value was overridden by an annotation, would be the vm_processor_count and vm_memory_size_in_mb fields that are present in the Windows shim-specific options.
This could be extended further by having the runtime class specify the resource limits in the name. For example runhcs-wcow-hypervisor-20348-1vp2gb:
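A sketch of the "limits in the runtime class name" idea, with fixed VM sizes baked into the shim options of each named runtime; the `VmProcessorCount`/`VmMemorySizeInMb` TOML key casing is an assumption based on the shim's option field names:

```toml
# Sketch: one runtime class per fixed VM size, encoded in the name.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor-1vp2gb.options]
  SandboxIsolation = 1
  VmProcessorCount = 1
  VmMemorySizeInMb = 2048

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runhcs-wcow-hypervisor-4vp8gb.options]
  SandboxIsolation = 1
  VmProcessorCount = 4
  VmMemorySizeInMb = 8192
```

A pod would then pick its VM size by choosing the matching RuntimeClass, at the cost of maintaining one runtime entry per size.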
Testing
This is tricky, as GitHub Actions runners don't support nested virtualization; we'll likely need to do something similar to the approach the Windows periodic tests use and allocate Azure VMs to do our bidding (https://github.com/containerd/containerd/blob/main/.github/workflows/windows-periodic.yml). This might be the most work.
"Full Support"
Pulling images that don't match hosts build
One of the pros of Hyper-V containers is that you're not constrained to the Windows host's build number for image choice (a ws2019 host no longer has to use only a 1809/ws2019 image). However, the Windows platform-matching code is finicky and tough to get right, and the main selling point for these containers is really security. I'd be alright punting on the platform package changes until we know the right approach, and just getting in the work to be able to launch these in general.
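To make the build-number constraint concrete, here is a minimal, self-contained sketch of the matching rule described above. This is not containerd's actual platform-matcher implementation (the real compatibility rules for Hyper-V, such as which guest builds a given host supports, are more subtle); it only illustrates that process isolation ties the image to the host build while Hyper-V isolation relaxes that:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildNumber extracts the Windows build number from an osversion string
// like "10.0.17763.3406" (the third dotted component), or 0 on bad input.
func buildNumber(osVersion string) int {
	parts := strings.Split(osVersion, ".")
	if len(parts) < 3 {
		return 0
	}
	n, err := strconv.Atoi(parts[2])
	if err != nil {
		return 0
	}
	return n
}

// canRun reports whether an image built against imageOSVersion can run on a
// host at hostOSVersion under the given isolation mode. Process isolation
// requires matching build numbers; Hyper-V isolation boots a utility VM, so
// the host build no longer has to match (real rules are more nuanced).
func canRun(hostOSVersion, imageOSVersion string, hyperv bool) bool {
	if hyperv {
		return true
	}
	return buildNumber(hostOSVersion) == buildNumber(imageOSVersion)
}

func main() {
	// ws2019 host (build 17763) with a ws2022 image (build 20348):
	fmt.Println(canRun("10.0.17763.3406", "10.0.20348.887", false)) // process isolation
	fmt.Println(canRun("10.0.17763.3406", "10.0.20348.887", true))  // Hyper-V isolation
}
```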
Resource Limits Looking Forward
There are platform limitations to supporting vCPU hot-add, but ideally k8s would tally up the total resource limits by adding up the container resource limits in the pod and sending them in some field for Windows. If that does come to fruition, then we'll need to do something with this data. Writing this down mainly for future reference.
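The tallying step above can be sketched as follows. The types and the 256MiB utility-VM overhead are illustrative assumptions, not real CRI types or a real shim default; the real plumbing would come from kubelet via a new CRI field:

```go
package main

import "fmt"

// ContainerLimits is a hypothetical stand-in for per-container resource
// limits in a pod; the real types would come from the CRI API.
type ContainerLimits struct {
	MemoryBytes int64
	MilliCPU    int64
}

// podVMSize tallies per-container limits into one figure the shim could use
// to size the utility VM, adding headroom for the VM itself. The overhead
// constant is an assumption for illustration only.
func podVMSize(containers []ContainerLimits) (memBytes, milliCPU int64) {
	const vmMemoryOverheadBytes = 256 << 20 // assumed 256MiB of UVM overhead
	for _, c := range containers {
		memBytes += c.MemoryBytes
		milliCPU += c.MilliCPU
	}
	return memBytes + vmMemoryOverheadBytes, milliCPU
}

func main() {
	mem, cpu := podVMSize([]ContainerLimits{
		{MemoryBytes: 1 << 30, MilliCPU: 1000}, // 1GiB, 1 CPU
		{MemoryBytes: 2 << 30, MilliCPU: 500},  // 2GiB, 0.5 CPU
	})
	fmt.Println(mem, cpu) // 3GiB + overhead, 1500 millicores
}
```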
Additional context
Thanks for reading the wall of text :)
Tracking
1.7
Future