Replies: 1 comment
I realize the default datatype is fp8...
Hi,
I am using https://huggingface.co/Qwen/Qwen1.5-7B-Chat for testing.
The two different clam images were created by:
They were run with this command:
```
./build/run qwen1.5-7B.clam -n -1 -i "<|im_start|>user\nwho are you<|im_end|>\n<|im_start|>assistant" -t 0
qwen1.5-7B.clam: 7.7B params (7.2 GiB @ 8.00 bpw), 4096 context (kvcache 2.0 GiB @ fp16)
CUDA: NVIDIA GeForce RTX 4090, compute 8.9, 128 SMs, 23.6 GiB, peak bandwidth 1008 GB/s (ECC 0)
<|im_start|>user\nwho are you<|im_end|>\n<|im_start|>assistantI am a large language model created by Alibaba Cloud, known as Qwen. I am here to assist you with your questions and provide information to the best of my ability. How can I help you today?
51 tokens: throughput: 123.49 tok/s; latency: 8.10 ms/tok; bandwidth: 895.82 GB/s; total 0.413 sec
```
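For anyone reading the log above, here is a rough sanity check of how the printed figures relate to each other. It is only a sketch: the layer count and KV width are assumptions taken from the published Qwen1.5-7B-Chat config (32 layers, 4096-wide keys and values), and reading bandwidth divided by throughput as "bytes read per token" is my interpretation of the output, not something the tool documents here.

```python
# Sanity-check the figures printed in the run log (values copied from the log).
GIB = 1024 ** 3

params = 7.7e9            # "7.7B params" (rounded in the log)
bits_per_weight = 8.0     # "8.00 bpw"
weights_gib = params * bits_per_weight / 8 / GIB

tokens = 51
throughput = 123.49       # tok/s
latency_ms = 1000 / throughput
total_sec = tokens / throughput

bandwidth = 895.82        # GB/s, as reported
peak_bandwidth = 1008.0   # GB/s, as reported for the RTX 4090
utilization = bandwidth / peak_bandwidth
gb_per_token = bandwidth / throughput  # assumption: data read per generated token

# Assumption: Qwen1.5-7B-Chat uses 32 layers and a 4096-wide K and V per token.
layers, kv_dim, context, bytes_fp16 = 32, 4096, 4096, 2
kv_gib = 2 * layers * context * kv_dim * bytes_fp16 / GIB  # factor 2 = K and V

print(f"weights:     {weights_gib:.1f} GiB")
print(f"latency:     {latency_ms:.2f} ms/tok")
print(f"total:       {total_sec:.3f} sec")
print(f"kv cache:    {kv_gib:.1f} GiB")
print(f"utilization: {utilization:.1%} of peak")
print(f"per token:   {gb_per_token:.2f} GB")
```

The weight size, latency, total time, and KV cache size all agree with the log under these assumptions, so the 8 bpw figure in the first line of output is consistent with the model having been stored in an 8-bit format.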