Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRI: Windows images created with "shell form" CMD in Dockerfile are broken #5067

Closed
kevpar opened this issue Feb 23, 2021 · 21 comments
Closed
Labels
area/cri Container Runtime Interface (CRI) kind/bug platform/windows Windows

Comments

@kevpar
Copy link
Member

kevpar commented Feb 23, 2021

tl;dr

When Docker creates a Windows image with "shell form" CMD/ENTRYPOINT in the Dockerfile, the args in the image config are formatted differently than CRI expects. This breaks running the image with CRI since the process args in the generated OCI runtime spec are incorrect. Docker works around this by using a non-standard image config field ArgsEscaped to indicate that different handling should be used for the args when the OCI spec is generated.

I'm not sure if this is something we can fix easily in Docker's builder code. Even if that can be fixed, there are probably still many images out in the wild that are already affected. The easiest approach here is probably just getting ArgsEscaped added to the OCI spec, and updating containerd to use it.

Background

On Windows, the command to execute in a container needs to be given as a single CommandLine string. This is different from Linux where an array of string args is given (as used by execve). In a Dockerfile, there are two directives to control what command is executed in the container: ENTRYPOINT and CMD. They are both affected by this issue, but we just focus on CMD below for clarity.

CMD can be specified in either "exec" form:

CMD ["cmd.exe", "/S", "/C", "foo.cmd"]

Or in "shell" form:

CMD foo.cmd

In “shell” form, there is a shell command prepended to the command (e.g. cmd.exe /S /C).

In a container image, the default command to run is stored as a JSON array:

"Cmd": [
    "cmd.exe",
    "/S",
    "/C",
    "foo.cmd"
]

However, there is an issue with Docker-built images, where if using the “shell” form of the CMD directive, the args are all placed in a single array element:

"Cmd": [
    "cmd.exe /S /C foo.cmd"
]

Now, when CRI generates an OCI runtime spec for the container, it needs to produce a single command line string. To do this from the args array, it enumerates the args and concatenates them, but first it escapes each item. This is important in case individual args container spaces:

["cmd.exe", "/S", "/C", "C:\Program Files\MyApp\foo.exe"]

becomes

cmd.exe /S /C "C:\Program Files\MyApp\foo.exe"

However, this behavior is incorrect when working with a container image produced using the “shell” form of CMD:

["cmd.exe /S /C foo.cmd"]

becomes

"cmd.exe /S /C foo.cmd"

(note the quotes)
This has the undesired effect of attempting to locate a binary named cmd.exe /S /C foo.cmd and run it, which fails with the The system cannot find the file specified..

Docker behavior

When Docker builds a Windows image with a "shell form" CMD or ENTRYPOINT, it puts all of the args into a single element in the Cmd field in the image config, and sets a non-standard field ArgsEscaped to true.

At run time, the value of ArgsEscaped is used to determine how to format the process args in the OCI runtime spec. If ArgsEscaped is false or not present, the standard behavior is used (spec.Process.Args receives an array of args). However, if ArgsEscaped is true, Docker instead populates spec.Process.CommandLine with a string containing the exact command line to run (CommandLine is a Windows specific OCI runtime spec field).

The Docker change to introduce this behavior was made in this commit: moby/moby@20833b0
The main point of interest is here: moby/moby@20833b0#diff-6688f4342adf127b206582942bc147a0efab01c2e376b8a1a81e62c4bfee3ce1R242-R249

Repro

This issue is fairly simple to repro.

Build an image in Docker that uses "shell form" CMD

Dockerfile:

FROM mcr.microsoft.com/windows/nanoserver:1809
COPY foo.cmd /foo.cmd
CMD foo.cmd

foo.cmd:

ping -t 127.0.0.1

Run build:

> docker build -t bug .
[...]
Successfully tagged bug:latest

Export the image from Docker and import into containerd

> docker image save -o bug.tar bug
> ctr --namespace k8s.io i import bug.tar

Run a container from the image

> $p=crictl runp --runtime runhcs-wcow-process pod.json
> $c=crictl create --no-pull $p container.json pod.json
> crictl start $c
time="2021-02-22T23:34:08-08:00" level=fatal msg="Starting the container \"355a1e78037b8b38495e7bdf738728c9e23ea338b6a449d707706897eee4bcfd\" failed: rpc error: code = Unknown desc = failed to start containerd task \"355a1e78037b8b38495e7bdf738728c9e23ea338b6a449d707706897eee4bcfd\": hcsshim::System::CreateProcess 355a1e78037b8b38495e7bdf738728c9e23ea338b6a449d707706897eee4bcfd: The system cannot find the file specified.\n(extra info: {\"CommandLine\":\"\\\"cmd /S /C foo.cmd\\\"\",\"User\":\"ContainerUser\",\"WorkingDirectory\":\"C:\\\\\",\"CreateStdOutPipe\":true,\"CreateStdErrPipe\":true}): unknown"

If we look at the image config we can see ArgsEscaped is set:

> ctr --namespace k8s.io content get sha256:<imageid> | convertfrom-json | convertto-json
{
[...]
  "config": {
[...]
    "Cmd": [
      "cmd /S /C foo.cmd"
    ],
    "ArgsEscaped": true,
[...]
  },
[...]
}

Fix

We could potentially fix this for future images with a change to Docker to make it format args differently. However, I see two concerns with this approach:

  • The change to Docker that introduced this behavior was already an attempt to fix other issues with Windows process arg escaping. Since clearly this is an area that has introduced bugs in the past, we risk more problems by changing it yet again.
  • Even if we can fix this in Docker's builder, there are presumably many images that have been built since this change was introduced (~2 years ago) that will still not run in containerd with CRI.

The easiest fix here is probably to get ArgsEscaped added officially on the OCI image config, and then have containerd duplicate Docker's behavior in this regard.

Interested in discussion on how else we might address this.

@siprbaum
Copy link

The pull request opencontainers/image-spec#829 dealing with the specification as proposed above

The easiest fix here is probably to get ArgsEscaped added officially on the OCI image config, and then have containerd duplicate Docker's behavior in this regard.

was closed without merging.

Are there currently any other ideas how to solve this?

@kevpar
Copy link
Member Author

kevpar commented Jun 18, 2021

The pull request opencontainers/image-spec#829 dealing with the specification as proposed above

The easiest fix here is probably to get ArgsEscaped added officially on the OCI image config, and then have containerd duplicate Docker's behavior in this regard.

was closed without merging.

Are there currently any other ideas how to solve this?

I need to update the issue description a bit here. After some internal discussion, we believe adding ArgsEscaped to the OCI spec directly is kind of a hacky fix, since the exact semantics of ArgsEscaped are not really what we want to support long-term. My current thinking here is to do two things:

  • In the short-term, update CRI to parse out the ArgsEscaped field from the image spec and use it when creating the runtime spec. This allows us compatibility with the set of images that exist today relying on ArgsEscaped, though does require CRI to parse out a non-standard field.
  • In the long-term, add a CommandLine field to the OCI image spec that holds a single string. Update Docker and containerd so that built images that would have used ArgsEscaped instead set the CommandLine field with the string to use when starting the container process.

@thaJeztah
Copy link
Member

This related to opencontainers/runtime-spec#998 ?

@jterry75
Copy link
Contributor

Any update on this by chance? Note that its not just Shell form its also Exec form that is broken. And the fix proposed will only fix Shell form commands. (Which is better than no fix at all!).

EX:

FROM mcr.microsoft.com/windows/servercore:ltsc2019

ENTRYPOINT ["ping", "-n", "20", "localhost"]

Which is encoded by Docker as: Entrypoint: ["cmd /S /C [\"ping\", \"-n\", \"20\", \"localhost\"]"]

And

FROM mcr.microsoft.com/windows/servercore:ltsc2019

ENTRYPOINT ping -n 20 localhost

Which is encoded by Docker as: Entrypoint: ["cmd /S /C ping -n 20 localhost"]

In either case its a length 1 array of a single string. Which if we pass as a CommandLine in the Process struct works only for Shell.

I verified that this is easily fixed by adding the ArgsEscaped to the ImageConfig and then applying this in the spec_opts.go WithImageConfigArgs to apply it as the CommandLine instead of Args

Thanks!

@kevpar
Copy link
Member Author

kevpar commented Oct 27, 2021

Any update on this by chance? Note that its not just Shell form its also Exec form that is broken. And the fix proposed will only fix Shell form commands. (Which is better than no fix at all!).

EX:

FROM mcr.microsoft.com/windows/servercore:ltsc2019

ENTRYPOINT ["ping", "-n", "20", "localhost"]

Which is encoded by Docker as: Entrypoint: ["cmd /S /C [\"ping\", \"-n\", \"20\", \"localhost\"]"]

And

FROM mcr.microsoft.com/windows/servercore:ltsc2019

ENTRYPOINT ping -n 20 localhost

Which is encoded by Docker as: Entrypoint: ["cmd /S /C ping -n 20 localhost"]

In either case its a length 1 array of a single string. Which if we pass as a CommandLine in the Process struct works only for Shell.

I verified that this is easily fixed by adding the ArgsEscaped to the ImageConfig and then applying this in the spec_opts.go WithImageConfigArgs to apply it as the CommandLine instead of Args

Thanks!

Hi Justin :)

Yes, unfortunately the issue here is a bit more widespread than I first realized.

I'm hoping to get to work on this again soon. The idea was to make a change so containerd respects the existing ArgsEscaped image config field (even though its not in the OCI image spec), and then longer-term work on a more standardized approach with OCI.

@jterry75
Copy link
Contributor

Hey! :)

This sounds good, do you want to do ArgsEscaped as a "hack" patch to the image config in this repo or do you want to push the change to the image spec repo and then vendor it here? I like the 2 phase approach here though.

@kevpar
Copy link
Member Author

kevpar commented Oct 27, 2021

Hey! :)

This sounds good, do you want to do ArgsEscaped as a "hack" patch to the image config in this repo or do you want to push the change to the image spec repo and then vendor it here? I like the 2 phase approach here though.

I didn't want to mess with changing the actual OCI image config definition, so the thought was to just add ArgsEscpaed as a field on CRI's image type, and do something like here:

var ociimage imagespec.Image
if err := json.Unmarshal(rb, &ociimage); err != nil {
    return nil, errors.Wrapf(err, "unmarshal image config %s", rb)
}

var hackConfig struct{ Config struct{ ArgsEscaped bool } }
if err := json.Unmarshal(rb, &hackConfig); err != nil {
    return nil, errors.Wrapf(err, "unmarshal hack image config %s", rb)
}

return &Image{
    [...]
    ImageSpec:  ociimage,
    ArgsEscpaed: hackConfig.Config.ArgsEscaped,
}, nil

@jterry75
Copy link
Contributor

My favorite part about unmarshalling JSON... Doing it twice :). Yes this would work great. We could for sure do it that way and that avoids the vendor problem as well so its a nice idea.

@robbie-coder
Copy link

robbie-coder commented Feb 22, 2022

I ran into this issue today while testing kubernetes 1.23 on AKS. In my case I'm simply trying to launch a local PowerShell script inside the container which used to work fine but not with the containerd runtime. As mentioned before both the exec and shell forms don't work anymore with containerd.

Kubernetes is removing dockershim soon (in April in 1.24) this will probably affect a lot of Windows containers. The Kubernetes FAQ mentions that "All your existing images will still work exactly the same". Which is not true for probably a lot of Windows containers due to this issue.

What is not clear to me is if there is any workaround that I can apply in my Dockerfile? As far as I have been testing there doesn't seem to be one. Which might make this issue more important.

What used to work:
CMD powershell c:\test\test.ps1

Workarounds I tried but don't work:
CMD powershell c:\\test\\test.ps1
CMD ["powershell.exe", "c:\test\test.ps1"]
CMD ["powershell.exe", "c:\\test\\test.ps1"]

Any suggestion for a workaround?

@kevpar
Copy link
Member Author

kevpar commented Feb 22, 2022

CMD ["powershell.exe", "c:\test\test.ps1"]

I would expect this form to work. Is it possible you have another CMD or ENTRYPOINT elsewhere in your Dockerfile or base image which still uses shell form?

If you don't want to rebuild your image, you can also work around this by overriding the image Entrypoint/Cmd with a command set via the k8s pod spec.

@robbie-coder
Copy link

Today after restarting the cluster I tried the workarounds again and CMD ["powershell.exe", "c:\\test\\test.ps1"] worked fine both in docker and in Kubernetes running containerd.

Somehow while testing my workarounds yesterday the cluster was in a weird state, it also gave different errors then the original issue did.

Thanks @kevpar for pointing this out and indeed you can also override the Entrypoint / Cmd in the k8s pod specs. Good to know that there are workarounds for this issue.

@porrascarlos802018
Copy link

Recommendation, get your container up and running first, with base image , public accessible, then exec into the container, and test the commands, parameters you want to use internally in the container, to confirm the right sintaxis accepted by your image, and or application.

@richshadman
Copy link

This impacts the RUN instruction as well which also has both SHELL and EXEC forms.

https://docs.docker.com/engine/reference/builder/#run

@T-RN-R
Copy link

T-RN-R commented Oct 13, 2022

Any progress on this? This appears to be affecting Windows container images deployed to Azure Container Instances.

@alfeg
Copy link

alfeg commented Nov 2, 2022

Suddenly we were affected by this issue on AKS.
Removing CMD ["cmd.exe", "/S", "/C" from Dockerfile fixed our unexpectedly failing images

@twistypigeon
Copy link

If it helps anyone I found this was my issue on AWS ECS when attempting to run a Windows image on Fargate. They must also be using containerd under the hood.
I overrode the entrypoint in my Dockerfile to the "exec" form and managed the get it running in ECS.

@stevenjh
Copy link

stevenjh commented May 3, 2023

@T-RN-R did you find a resolution to this issue in ACI? I've just run into it today when moving from command line deployed containers to YAML defined deployments.

@stevenjh
Copy link

stevenjh commented May 3, 2023

Resolved my issue in ACI's, the YAML command needed to be rewritten from command: "python file.py --flags" to below

command:
  - "python"
  - "file.py"
  - "--flags"

@kiashok
Copy link
Contributor

kiashok commented Nov 1, 2023

Fixed by #8198 , #9317 (fixing a some changes that missed making it to pkg/cri/sbserver and containerd/main has not switched to sbserver and sbserver has been renamed to pkg/cri/server) . Closing this github issue

@kiashok
Copy link
Contributor

kiashok commented Nov 1, 2023

@stevenjh are you still actively hitting this problem? Are you using ACI?

@kiashok
Copy link
Contributor

kiashok commented Mar 6, 2024

Fixed by #8198 , #9317 (fixing a some changes that missed making it to pkg/cri/sbserver and containerd/main has not switched to sbserver and sbserver has been renamed to pkg/cri/server) . Closing this github issue

This issue has been fixed with the above linked PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cri Container Runtime Interface (CRI) kind/bug platform/windows Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.