
Docker image tagged :ollama does not use GPU/CUDA #3325

Closed
gnieutin opened this issue Jun 20, 2024 · 6 comments

Comments

@gnieutin

Bug Report

Description

Bug Summary:
The documentation says the following command will run a Docker container bundled with Ollama and use CUDA, but this is not the case: even though nvidia-smi is available inside the container (I followed the NVIDIA instructions), my GPU is not used and the model runs on my CPU.

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Steps to Reproduce:
Follow your documentation here and use the "With GPU Support" command line.

Expected Behavior:
When I use the chat in the Open WebUI interface, I expect my GPU to be used while the model is processing.

Actual Behavior:
My CPUs are fully used for a while, but my GPU usage stays near 0%.

Environment

  • Open WebUI Version: v0.3.5

  • Ollama (if applicable): bundled version inside the Open WebUI container

  • Operating System: Linux Mint 20.1

  • Browser (if applicable): Chrome 125

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:

Loading WEBUI_SECRET_KEY from file, not provided as an environment variable.
Loading WEBUI_SECRET_KEY from .webui_secret_key
USE_OLLAMA is set to true, starting ollama serve.
2024/06/20 12:53:31 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-20T12:53:31.803Z level=INFO source=images.go:740 msg="total blobs: 5"
time=2024-06-20T12:53:31.803Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-20T12:53:31.804Z level=INFO source=routes.go:1057 msg="Listening on 127.0.0.1:11434 (version 0.1.43)"
time=2024-06-20T12:53:31.855Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1794248913/runners
time=2024-06-20T12:53:34.966Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-06-20T12:53:34.989Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="15.5 GiB" available="874.7 MiB"
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
/app

  ___                    __        __   _     _   _ ___ 
 / _ \ _ __   ___ _ __   \ \      / /__| |__ | | | |_ _|
| | | | '_ \ / _ \ '_ \   \ \ /\ / / _ \ '_ \| | | || | 
| |_| | |_) |  __/ | | |   \ V  V /  __/ |_) | |_| || | 
 \___/| .__/ \___|_| |_|    \_/\_/ \___|_.__/ \___/|___|
      |_|                                               

      
v0.3.5 - building the best open-source AI user interface.

https://github.com/open-webui/open-webui

Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Installation Method

Docker image bundled with ollama : docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Additional Information

When running the env command inside the container, I see some variables with unexpected values (imo):

$> env
....
USE_CUDA_DOCKER=false
USE_CUDA_DOCKER_VER=cu121
....

I tried updating those values, setting my CUDA version and setting USE_CUDA_DOCKER to true, but doing so makes the start.sh script fail:

...
AssertionError: Torch not compiled with CUDA enabled
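
For reference, the kind of override I attempted looked roughly like this (a sketch; I am not sure these variables are even meant to be overridden at run time):

docker run -d -p 3000:8080 --gpus=all -e USE_CUDA_DOCKER=true -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama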

I suspect your Docker image tagged with :ollama cannot use CUDA/GPU 🤔

How can I use the bundled Ollama version AND CUDA?

@justinh-rahb
Collaborator

Do you have the Nvidia Container Toolkit installed? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
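
If not, on an Ubuntu/Mint host the tail end of that guide is roughly the following (the repository setup steps are in the linked page):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker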

@gnieutin
Author

gnieutin commented Jun 20, 2024

Yes, I followed those instructions. I also followed https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html to test my setup, so I ran this command:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

And got the following result:
[screenshot: nvidia-smi output]

@justinh-rahb
Collaborator

Ahh, try updating your Nvidia driver and CUDA runtime on the host; Ollama needs CUDA 12.x and a 5xx-series driver afaik. Which GPU is it?
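
To see what you currently have, something like this on the host should show the driver version and GPU model:

nvidia-smi --query-gpu=driver_version,name --format=csv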

@gnieutin
Author

Which GPU is it?

NVIDIA GeForce RTX 3060 Laptop GPU

@justinh-rahb
Collaborator

We've seen some driver issues with 55x; try a 54x driver and CUDA 12.4. I think this should resolve it.
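
On Mint/Ubuntu the switch is usually something along these lines (assuming the packages exist for your release; the exact version offered to you may differ):

ubuntu-drivers devices
sudo apt install nvidia-driver-545   # or whichever 54x package is offered
sudo reboot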

@gnieutin
Author

You were right, it was an NVIDIA driver issue!
I managed to upgrade to the latest version available for me (535), which includes CUDA 12.2, and it now uses far less CPU! The model response also prints faster, and I can see the GPU usage climbing.
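
In case it helps anyone else, I checked it by watching nvidia-smi while the model generates (either on the host, or inside the container, which is named open-webui in the run command):

watch -n 1 nvidia-smi
docker exec -it open-webui nvidia-smi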

Thanks 👍 !
