
llama-server with cpu device is not working in docker image #2634

Closed · b-reich opened this issue Jul 13, 2024 · 19 comments
Labels: bug (Something isn't working)

Comments


b-reich commented Jul 13, 2024

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    entrypoint: /opt/tabby/bin/tabby-cpu
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
    volumes:
      - ".data/tabby:/data"
    ports:
      - 8080:8080

This setup, documented at https://tabby.tabbyml.com/docs/quick-start/installation/docker-compose/, won't work:

tabby-1  | 2024-07-13T12:53:36.624504Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code 127
tabby-1  | 2024-07-13T12:53:36.624528Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: /opt/tabby/bin/llama-server: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

Originally posted by @b-reich in #2082 (comment)
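
As a quick way to confirm the missing-library problem against the published image, something along these lines should work (a sketch; it assumes the image ships a POSIX shell and ldd):

# Inspect llama-server's dynamic dependencies inside the tabbyml/tabby image.
# Any line reporting "not found" (e.g. libcuda.so.1) explains the status code 127 above.
docker run --rm --entrypoint sh tabbyml/tabby -c \
  "ldd /opt/tabby/bin/llama-server | grep 'not found'"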

@wsxiaoys wsxiaoys added the bug Something isn't working label Jul 13, 2024
@wsxiaoys wsxiaoys changed the title tabby-cpu docker wont work llama-server with cpu device is not working in docker image Jul 13, 2024
wsxiaoys (Member) commented:

Hi, thanks for reporting the issue. As a workaround, I recommend using the Linux binary distribution directly: https://tabby.tabbyml.com/docs/quick-start/installation/linux/#download-the-release

kannae97 commented:

I also encountered the same error😭


0x4139 commented Jul 23, 2024

The issue seems to be related to llama-server: LD_LIBRARY_PATH should be updated to something like /usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH, and the CUDA version should be updated to 12.5. @wsxiaoys, do you want me to submit a PR?

0x4139 added a commit to 0x4139/tabby that referenced this issue Jul 23, 2024
updates cuda and documentation regarding running tabby inside docker containers with cuda support

0x4139 commented Jul 23, 2024

Submitted pull request #2711. In the meantime, you can use my temporary image 0x4139/tabby-cuda (CUDA 12.2) or tabbyml/tabby (CUDA 11.7) with the LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH environment variable.

If you're using Docker Compose, you can use the following snippet:

version: '3.8'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 8080:8080
    environment:
      - PATH=/usr/local/cuda/bin:$PATH
      - LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

If you're using Docker directly, you can use the following snippet:

docker run -it --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  -e PATH=/usr/local/cuda/bin:$PATH \
  -e LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH \
  tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda

@b-reich @kannae97 This should solve your issue.

b-reich (Author) commented Jul 25, 2024

@0x4139 Nope, this is not my issue. I want to run it without a GPU, just CPU mode.


dtrckd commented Jul 28, 2024

Same error here.

wsxiaoys (Member) commented:

For those experiencing the issue, please refer to the comment at #2634 (comment) to see if it resolves the problem for you. If it doesn't, feel free to share your experiences. Thank you!


0x4139 commented Jul 29, 2024

> @0x4139 Nope, this is not my issue. I want to run it without a GPU, just CPU mode.

The issues are related: the binary won't start even in CPU mode because the CUDA libraries are not linked. I just tested it now, and it also works in CPU mode.
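
For completeness, here is a sketch of the original CPU-only compose file from this issue with that same library-path workaround applied (untested here; it assumes the image ships its CUDA libraries under /usr/local/cuda):

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    # CPU-only entrypoint from the original report
    entrypoint: /opt/tabby/bin/tabby-cpu
    command: serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
    environment:
      # Same library-path workaround as in the GPU snippets above
      - LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH
    volumes:
      - ".data/tabby:/data"
    ports:
      - 8080:8080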

b-reich (Author) commented Jul 30, 2024

@0x4139 Your docker command and compose file use different images.


0x4139 commented Jul 30, 2024

> @0x4139 Your docker command and compose file use different images.

That is the point: I mentioned that I created a temporary image with the LD path fix, which works on both CPU and GPU. If the image works for you as well, @wsxiaoys will probably merge the fix.

disce-omnes commented:

I'm experiencing a similar issue, but for me the Docker image works fine; it's the Linux release that doesn't work.

Error:

⠼     1.124 s	Starting...2024-08-02T12:30:38.407959Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code 127
2024-08-02T12:30:38.408050Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: /path/to/tabby/llama-server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory

I'm using the command: ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda

Adding the environment variable as suggested in #2634 (comment) doesn't help: LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/compat:$LD_LIBRARY_PATH ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda

EndeavourOS, Tabby 0.14.0 with an NVIDIA GeForce RTX 2060 and CUDA 12.5.

P.S. Is it fine to discuss this here, or should I open a new issue?


0x4139 commented Aug 5, 2024

> I'm experiencing a similar issue, but for me the Docker image works fine; it's the Linux release that doesn't work. […]

Make sure you have installed the CUDA development toolkit for your Linux distribution.
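
To verify that the libraries now resolve, a quick check along these lines helps (a sketch; use whatever path your unpacked release actually lives at):

# List the CUDA libraries known to the dynamic linker.
ldconfig -p | grep -E 'libcuda|libcudart'
# Show any libraries llama-server still cannot find (prints nothing when everything resolves).
ldd /path/to/tabby/llama-server | grep 'not found'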

disce-omnes commented:

> Make sure you have installed the CUDA development toolkit for your Linux distribution.

Thanks, that fixed it. Now I'm getting:

⠴     2.006 s	Starting...2024-08-08T18:29:55.586365Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1

Seems similar to this #2803


0x4139 commented Aug 9, 2024

> Thanks, that fixed it. Now I'm getting:
> llama-server <embedding> exited with status code -1

Could you provide a broader view of the logs, as well as your tabby configuration?

disce-omnes commented:

> Could you provide a broader view of the logs, as well as your tabby configuration?

I'm on EndeavourOS and I've downloaded tabby_x86_64-manylinux2014-cuda122 from https://github.com/TabbyML/tabby/releases/tag/v0.14.0.

The command ./tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda produces this output:

⠇     2.257 s	Starting...2024-08-09T20:00:30.651731Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠼     4.340 s	Starting...2024-08-09T20:00:32.745170Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠙     6.502 s	Starting...2024-08-09T20:00:34.864970Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠧     8.584 s	Starting...2024-08-09T20:00:36.966576Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠸    10.666 s	Starting...2024-08-09T20:00:39.041895Z  WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:99: llama-server <embedding> exited with status code -1
⠙    12.108 s	Starting...^C

It just goes on forever.

Here's nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 555.58.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060        Off |   00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8              5W /   90W |       7MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1257      G   /usr/lib/Xorg                                   4MiB |
+-----------------------------------------------------------------------------------------+

Where can I find the Tabby configuration so I can provide it? I looked at ~/.tabby, but didn't see much there.


0x4139 commented Aug 10, 2024

> I'm on EndeavourOS and I've downloaded tabby_x86_64-manylinux2014-cuda122 from https://github.com/TabbyML/tabby/releases/tag/v0.14.0. […] It just goes on forever.

This seems to be related to some flags passed to the llama-server embedding process. For the time being, I think you can revert to this version: https://github.com/TabbyML/tabby/releases/download/v0.13.1/tabby_x86_64-manylinux2014-cuda122.zip. It should fix your issue.
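
For reference, a sketch of that rollback (the archive layout may differ slightly between releases, so adjust the paths as needed):

# Download and unpack the v0.13.1 CUDA 12.2 build, then start it the same way as before.
curl -LO https://github.com/TabbyML/tabby/releases/download/v0.13.1/tabby_x86_64-manylinux2014-cuda122.zip
unzip tabby_x86_64-manylinux2014-cuda122.zip -d tabby-0.13.1
chmod +x tabby-0.13.1/tabby tabby-0.13.1/llama-server
./tabby-0.13.1/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda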

zwpaper (Contributor) commented Aug 12, 2024

Tabby builds the Docker image with CUDA enabled by default:

cargo build --no-default-features --features cuda,prod --release --package tabby && \

That's why llama-cpp-server looks for libcuda and fails to start when no GPU exists. libcuda is mounted at runtime by nvidia-container-runtime.

Maybe we need a CPU Dockerfile to build a CPU-only image? It would also greatly reduce the image size by dropping the CUDA dependencies (a rough sketch follows below).

WDYT @wsxiaoys
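
If it helps to sketch the idea: a CPU-only image could reuse the same build command minus the cuda feature and skip the CUDA base image entirely (untested; this assumes the crate builds with just the prod feature):

# Hypothetical build step for a separate CPU Dockerfile: drop "cuda" from the feature list.
cargo build --no-default-features --features prod --release --package tabby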

disce-omnes commented:

> For the time being, I think you can revert to this version

Thank you, that fixed the problem.

v15 was released, but I get the same error with it. Is there an issue/PR related to this that I can monitor, so I know when it's safe to upgrade?


0x4139 commented Aug 13, 2024

There is a pull request here: #2711

@TabbyML TabbyML locked and limited conversation to collaborators Aug 14, 2024
@wsxiaoys wsxiaoys converted this issue into discussion #2867 Aug 14, 2024

This issue was moved to a discussion.

You can continue the conversation there.
