Failure to kill check command after exceeding timeout is not reported #4981

Closed
hansmi opened this issue Feb 7, 2017 · 1 comment · Fixed by #5231
Labels
area/checks (Check execution and results), bug (Something isn't working)

hansmi commented Feb 7, 2017

We're invoking a number of check commands via sudo because they require privileges that the standard Icinga user doesn't have. Depending on the command, the target user may be root or another dedicated user. Our configuration follows this layout:

template CheckCommand "docker-check-command" {
  timeout = 60
  command = [ "/usr/bin/sudo", "-n", "-u", "$sudo_user$" ]
  vars.sudo_user = "root"
}

object CheckCommand "docker_daemon" {
  import "plugin-check-command"
  import "docker-check-command"
  command += [ PluginDir + "/check_docker_daemon" ]
}

When the command (check_docker_daemon in this case) runs for more than a minute, Icinga is supposed to kill it. Unfortunately, that fails: sudo is setuid, so it runs as root and the unprivileged Icinga user isn't permitted to send it signals.

recvmsg(11, {msg_name(0)=NULL, msg_iov(1)=[{"{\"command\":\"kill\",\"pid\":-11637.0,\"signum\":9.0}", 4096}], msg_controllen=0, msg_flags=0}, 0) = 46
kill(4294955659, SIGKILL) = -1 EPERM (Operation not permitted)

(4294955659 is 2**32 - 11637, i.e. -11637 shown as an unsigned 32-bit integer. A negative PID makes kill() signal the whole process group, and 11637 is the PID of the group leader.)
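The arithmetic in the parenthetical can be double-checked with a few lines of Python. This only illustrates the two's-complement reinterpretation of the PID; it makes no claim about why strace displayed the value as unsigned.

```python
import ctypes

# The kill request received over the socket carried pid -11637: a negative
# PID means "the process group whose leader is PID 11637".
pgid = 11637
pid_arg = -pgid

# Masking to 32 bits reinterprets the two's-complement value as unsigned,
# which is the number strace printed.
as_unsigned = pid_arg & 0xFFFFFFFF
print(as_unsigned)  # 4294955659

# ctypes performs the same wrap-around in both directions.
print(ctypes.c_uint32(pid_arg).value)     # 4294955659
print(ctypes.c_int32(as_unsigned).value)  # -11637
```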

The kill failure itself is easy to explain. Unfortunately, neither Icinga nor Icingaweb gives any indication that it happened. Instead, the time since the check's “last check” keeps growing and no alert is ever sent.

Could you please extend Icinga to produce an UNKNOWN status when killing a check command fails?
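A minimal Python sketch of the requested behavior, assuming a POSIX system. The helper `run_check()` is hypothetical and is not how Icinga actually spawns plugins: it starts the plugin in its own session (so the child's PID is also its process-group ID), waits up to `timeout` seconds, and on expiry kills the whole group. If the kill raises `PermissionError` (EPERM, as in the strace output), the failure becomes an UNKNOWN result with an explanatory message instead of being silently dropped. The `killpg` parameter is an assumption added only so the EPERM path can be exercised without sudo.

```python
import os
import signal
import subprocess
import sys
from typing import Callable, List, Tuple

# Standard plugin exit codes (Nagios/Icinga convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(
    argv: List[str],
    timeout: float,
    killpg: Callable[[int, int], None] = os.killpg,
) -> Tuple[int, str]:
    """Hypothetical helper: run a plugin and always return (state, output)."""
    # start_new_session=True makes the child a session and process-group
    # leader, so its PID can be used directly as the group ID for killpg().
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass  # fall through to the kill path below
    else:
        return proc.returncode, out.decode(errors="replace").strip()

    msg = f"Check timed out after {timeout:g} seconds"
    try:
        killpg(proc.pid, signal.SIGKILL)
    except PermissionError as exc:
        # The case from this issue: the group leader (sudo) runs as another
        # user, so kill() fails with EPERM. Report it rather than hang.
        return UNKNOWN, f"{msg}; failed to kill process group {proc.pid}: {exc.strerror}"
    proc.communicate()  # reap the killed child and drain the pipe
    return UNKNOWN, msg


if __name__ == "__main__":
    state, output = run_check(
        [sys.executable, "-c", "import time; time.sleep(30)"], timeout=1
    )
    print(state, output)
```

In the EPERM branch the plugin keeps running; the point of the sketch is only that a result is always produced, so the check state changes and an alert can be raised.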

hansmi changed the title from “Errors when killing check command after timeout are invisible” to “Failure to kill check command after exceeding timeout is not reported” on Feb 7, 2017
dnsmichi added the bug and area/checks labels on Feb 10, 2017
dnsmichi added the help wanted label on Apr 26, 2017
@jkroepke

Our current hacky workaround for this problem:

command = [ "/bin/bash", "-c", "sudo $$*; wait $$!", "--" ]

This works because Icinga now kills the bash process instead of sudo. 👎
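To see how the arguments reach sudo in that workaround, here is a small Python check under stated assumptions: the plugin path and arguments are hypothetical, `printf` stands in for `sudo` so nothing privileged runs, and `/bin/sh` is used because its `-c` argument handling matches bash's here. With `sh -c script -- args...`, the `--` element becomes `$0`, so only the real arguments end up in `$*`. Icinga turns each `$$` into a literal `$`, which is why the config writes `$$*` and `$$!`. Per POSIX, kill() on a process group succeeds if at least one member can be signaled, so the Icinga-owned shell can be killed even when sudo cannot.

```python
import subprocess

# Hypothetical plugin path and arguments, standing in for what Icinga
# appends after the "--" element of the command array.
plugin_args = ["/usr/lib/nagios/plugins/check_docker_daemon", "--timeout", "50"]

# In the Icinga config "$$" escapes a literal "$", so the shell receives
# "sudo $*; wait $!". Here printf replaces sudo to show what it would get.
script = 'printf "%s\\n" "$0" "$*"'

result = subprocess.run(
    ["/bin/sh", "-c", script, "--", *plugin_args],
    capture_output=True,
    text=True,
    check=True,
)
zeroth, rest = result.stdout.splitlines()
print(zeroth)  # "--" is consumed as $0, not passed to the plugin
print(rest)    # the plugin command line that sudo would receive
```

Because `$*` is unquoted in the workaround, arguments containing spaces are re-split; `"$@"` would keep them intact.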

dnsmichi removed the help wanted label on Oct 13, 2017
dnsmichi added this to the 2.8.0 milestone on Oct 13, 2017
4 participants