
Docker image tagged :ollama does not use GPU/CUDA #3325

Closed
gnieutin opened this issue Jun 20, 2024 · 6 comments

Comments

@gnieutin

Bug Report

Description

Bug Summary:
The documentation says the following command will run a Docker container bundled with Ollama and use CUDA, but this is not the case: even though nvidia-smi is available inside the container (I followed the NVIDIA instructions), my GPU is not used and the model runs on my CPU.

docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Steps to Reproduce:
Follow your documentation here and use the "With GPU Support" command line.

Expected Behavior:
When I use the chat in the Open WebUI interface, I expect my GPU to be used while the model is processing.

Actual Behavior:
My CPUs are fully used for a while, but my GPU usage stays near 0%.

Environment

  • Open WebUI Version: v0.3.5

  • Ollama (if applicable): bundled version inside the Open WebUI container

  • Operating System: Linux Mint 20.1

  • Browser (if applicable): Chrome 125

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:

Loading WEBUI_SECRET_KEY from file, not provided as an environment variable.
Loading WEBUI_SECRET_KEY from .webui_secret_key
USE_OLLAMA is set to true, starting ollama serve.
2024/06/20 12:53:31 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-20T12:53:31.803Z level=INFO source=images.go:740 msg="total blobs: 5"
time=2024-06-20T12:53:31.803Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-20T12:53:31.804Z level=INFO source=routes.go:1057 msg="Listening on 127.0.0.1:11434 (version 0.1.43)"
time=2024-06-20T12:53:31.855Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1794248913/runners
time=2024-06-20T12:53:34.966Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-06-20T12:53:34.989Z level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="15.5 GiB" available="874.7 MiB"
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
/app

  ___                    __        __   _     _   _ ___ 
 / _ \ _ __   ___ _ __   \ \      / /__| |__ | | | |_ _|
| | | | '_ \ / _ \ '_ \   \ \ /\ / / _ \ '_ \| | | || | 
| |_| | |_) |  __/ | | |   \ V  V /  __/ |_) | |_| || | 
 \___/| .__/ \___|_| |_|    \_/\_/ \___|_.__/ \___/|___|
      |_|                                               

      
v0.3.5 - building the best open-source AI user interface.

https://github.com/open-webui/open-webui

Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Installation Method

Docker image bundled with ollama : docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Additional Information

When running the env command inside the container, I see some variables with unexpected values (imo):

$> env
....
USE_CUDA_DOCKER=false
USE_CUDA_DOCKER_VER=cu121
....

I tried updating those values, setting my CUDA version and setting USE_CUDA_DOCKER to true, but doing so makes the start.sh script fail:

...
AssertionError: Torch not compiled with CUDA enabled
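
For reference, the kind of override I attempted looked roughly like this (a sketch; I am not sure these variables are even meant to be overridden at run time):

docker run -d -p 3000:8080 --gpus=all -e USE_CUDA_DOCKER=true -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama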

I suspect your Docker image tagged with :ollama cannot use CUDA/GPU 🤔

How can I use the bundled Ollama version AND CUDA?

@justinh-rahb
Collaborator

Do you have the Nvidia Container Toolkit installed? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
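
If not, on an Ubuntu/Mint host the tail end of that guide is roughly the following (the repository setup steps are in the linked page):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker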

@gnieutin
Author

gnieutin commented Jun 20, 2024

Yes, I followed those instructions. I also followed https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html to test my setup, so I ran this command:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

And got the following result:
[screenshot: nvidia-smi output]

@justinh-rahb
Collaborator

Ahh, try updating your Nvidia driver and CUDA runtime on the host; Ollama needs CUDA 12.x and a 5xx-series driver afaik. Which GPU is it?
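
To see what you currently have, something like this on the host should show the driver version and GPU model:

nvidia-smi --query-gpu=driver_version,name --format=csv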

@gnieutin
Author

Which GPU is it?

NVIDIA GeForce RTX 3060 Laptop GPU

@justinh-rahb
Collaborator

We've seen some driver issues with 55x; try a 54x driver and CUDA 12.4. I think this should resolve it.
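
On Mint/Ubuntu the switch is usually something along these lines (assuming the packages exist for your release; the exact version offered to you may differ):

ubuntu-drivers devices
sudo apt install nvidia-driver-545   # or whichever 54x package is offered
sudo reboot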

@gnieutin
Author

You were right, it was an NVIDIA driver issue!
I managed to upgrade to the latest version available for me (535), which includes CUDA 12.2, and it now uses far less CPU! The model response also prints faster, and I can see the GPU usage climbing.
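
In case it helps anyone else, I checked it by watching nvidia-smi while the model generates (either on the host, or inside the container, which is named open-webui in the run command):

watch -n 1 nvidia-smi
docker exec -it open-webui nvidia-smi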

Thanks 👍 !
