
Handle errors in jupyter_client.provisioning.provisioner_base::KernelProvisionerBase.launch_kernel #960

dchirikov opened this issue Jul 25, 2023 · 1 comment


@dchirikov

I'm currently working on a new custom kernel provisioner, but I'm finding it a bit tricky to handle errors properly. It seems that KernelProvisionerBase.launch_kernel is always expected to return a KernelConnectionInfo structure (that's what the type hints say). But sometimes, especially when the kernel needs to run on another computer (a remote host), this isn't possible due to infrastructure or scheduling issues. The kernel can't start, and no connection info can be returned from the call.

From what I can tell, the only way to let the system know something has gone wrong is to raise a RuntimeError() exception. This kind of works, but it also makes a mess, filling JupyterLab's error output (stderr) with a lot of scary and confusing text. I'd really prefer just a couple of clean log.error() messages on stdout instead, with a copy of them shown in the JupyterLab frontend the user is looking at.
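Roughly, the relevant part of my launch_kernel looks like this (a simplified sketch; submit_to_cluster and SchedulerError stand in for my own infrastructure code and aren't part of jupyter_client, and the other abstract provisioner methods are omitted):

    # Simplified sketch -- submit_to_cluster and SchedulerError are placeholders
    # for my own infrastructure code, not part of jupyter_client. The other
    # abstract KernelProvisionerBase methods are omitted here.
    from typing import Any, List

    from jupyter_client.connect import KernelConnectionInfo
    from jupyter_client.provisioning.provisioner_base import KernelProvisionerBase


    class MyRemoteProvisioner(KernelProvisionerBase):
        async def launch_kernel(self, cmd: List[str], **kwargs: Any) -> KernelConnectionInfo:
            try:
                # hand the command to the remote scheduler and wait for connection info
                self.connection_info = await self.submit_to_cluster(cmd, **kwargs)
            except SchedulerError as exc:
                self.log.error("Failed to start remote kernel: %s", exc)
                # the only way I see to signal failure: raise, which fills stderr
                # with a long traceback in JupyterLab
                raise RuntimeError(f"Remote kernel could not be started: {exc}") from exc
            return self.connection_info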

But that's not the biggest problem I'm facing. When I try to shut down JupyterLab, an error pops up for a kernel_id of a kernel that doesn't exist. Here's what that looks like:

      File "/w/.tox/py311/lib/python3.11/site-packages/jupyter_client/multikernelmanager.py", line 306, in _async_shutdown_all
        await asyncio.gather(*futs)
      File "/w/.tox/py311/lib/python3.11/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 418, in _async_shutdown_kernel
        self._check_kernel_id(kernel_id)
      File "/w/.tox/py311/lib/python3.11/site-packages/jupyter_server/services/kernels/kernelmanager.py", line 532, in _check_kernel_id
        raise web.HTTPError(404, "Kernel does not exist: %s" % kernel_id)
    tornado.web.HTTPError: HTTP 404: Not Found (Kernel does not exist: 0546b17a-af0b-4599-8473-cdcaa4011254)

This 0546b17a-af0b-4599-8473-cdcaa4011254 is the kernel_id of the never-started kernel that did not return its connection_info. So I'm thinking I'm doing something not quite right and need to signal a kernel spawning error differently.

Thanks in advance.

@kevin-bates
Member

From what I can tell, the only way to let the system know something has gone wrong is to raise a RuntimeError() exception. This kind of works, but it also makes a mess, filling JupyterLab's error output (stderr) with a lot of scary and confusing text. I'd really prefer just a couple of clean log.error() messages on stdout instead, with a copy of them shown in the JupyterLab frontend the user is looking at.

Raising exceptions is the way startup errors are expected to be propagated. How those exceptions are displayed to the user is a different matter (and probably something to deal with in the Lab layer).

But sometimes, especially when the kernel needs to run on another computer (a remote host), this isn't possible due to infrastructure or scheduling issues. The kernel can't start, and no connection info can be returned from the call.

Starting remote kernels introduces multiple layers in which errors (and delays) can occur. Work was done in Jupyter Server on Pending Kernels that you may want to look at. IIRC, they need to be enabled, but provide some better handling for startup delays specifically for these kinds of issues. However, if your failure is a hard failure (and not just due to things taking longer) raising an exception is what is expected. (This was true even prior to provisioners, albeit the "provisioner" was essentially the Popen call.)
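If memory serves, pending kernels are opt-in via a kernel manager trait; something along these lines in jupyter_server_config.py should enable them (double-check the exact trait name against your Jupyter Server version):

    # jupyter_server_config.py -- opt in to pending kernels (trait name as I recall it;
    # verify against the Jupyter Server docs for your version)
    c.AsyncMappingKernelManager.use_pending_kernels = True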

You might also take a look at the Gateway Provisioner classes. They provide a RemoteProvisionerBase class, as well as a ContainerProvisionerBase class (if you're working with containers), that can be subclassed and supply most of the necessary infrastructure. If you have questions about these, please open an issue in that repo for further discussion.
