no difference speed between fp8 and original fp16 #11

leiwen83 · 2024-05-13T10:08:16Z


Hi,

I am using https://huggingface.co/Qwen/Qwen1.5-7B-Chat for the testing.

The two .clam images were created with:

python  tools/convert.py --dtype fp8 qwen1.5-7B_fp8.clam  Qwen1.5-7B-Chat
python  tools/convert.py qwen1.5-7B.clam  Qwen1.5-7B-Chat

I then ran each image with a command like:

./build/run qwen1.5-7B_fp8.clam -n -1 -i "<|im_start|>user\nwho are you<|im_end|>\n<|im_start|>assistant" -t 0
# qwen1.5-7B_fp8.clam: 7.7B params (7.2 GiB @ 8.00 bpw), 4096 context (kvcache 2.0 GiB @ fp16)
# CUDA: NVIDIA GeForce RTX 4090, compute 8.9, 128 SMs, 23.6 GiB, peak bandwidth 1008 GB/s (ECC 0)
<|im_start|>user\nwho are you<|im_end|>\n<|im_start|>assistantI am a large language model created by Alibaba Cloud, known as Qwen. I am here to assist you with your questions and provide information to the best of my ability. How can I help you today?
# 51 tokens: throughput: 122.89 tok/s; latency: 8.14 ms/tok; bandwidth: 891.50 GB/s; total 0.415 sec

# qwen1.5-7B.clam: 7.7B params (7.2 GiB @ 8.00 bpw), 4096 context (kvcache 2.0 GiB @ fp16)
# CUDA: NVIDIA GeForce RTX 4090, compute 8.9, 128 SMs, 23.6 GiB, peak bandwidth 1008 GB/s (ECC 0)
# 51 tokens: throughput: 123.49 tok/s; latency: 8.10 ms/tok; bandwidth: 895.82 GB/s; total 0.413 sec


So it seems the fp8 conversion makes no difference in speed?

leiwen83 · 2024-05-13T14:42:07Z


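I realize the default datatype is fp8, so both conversions above produced fp8 images. That is why both runs report 7.2 GiB @ 8.00 bpw and nearly identical throughput.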
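The same conclusion can be read straight off the two header lines in the logs, since both report 8.00 bpw. Here is a minimal Python sketch of that check, assuming the header format shown in this thread. The names `parse_header` and `expected_gib` are hypothetical helpers written for illustration and are not part of the tool.

```python
import re

# Header lines copied from the two runs in this thread.
HEADERS = [
    "# qwen1.5-7B_fp8.clam: 7.7B params (7.2 GiB @ 8.00 bpw), 4096 context (kvcache 2.0 GiB @ fp16)",
    "# qwen1.5-7B.clam: 7.7B params (7.2 GiB @ 8.00 bpw), 4096 context (kvcache 2.0 GiB @ fp16)",
]

# Assumption: the header always looks like the lines above.
HEADER_RE = re.compile(
    r"^#?\s*(?P<name>\S+): (?P<params>[\d.]+)B params "
    r"\((?P<gib>[\d.]+) GiB @ (?P<bpw>[\d.]+) bpw\)"
)


def parse_header(line):
    """Extract image name, parameter count, size and bits per weight."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized header: {line!r}")
    return {
        "name": m["name"],
        "params_b": float(m["params"]),  # billions of parameters
        "gib": float(m["gib"]),          # reported weight size in GiB
        "bpw": float(m["bpw"]),          # reported bits per weight
    }


def expected_gib(params_b, bpw):
    """Weight size in GiB for params_b billion parameters at bpw bits each."""
    return params_b * 1e9 * bpw / 8 / 2**30


infos = [parse_header(h) for h in HEADERS]
for info in infos:
    print(f"{info['name']}: {info['bpw']:.2f} bpw, {info['gib']:.1f} GiB")

# If every image reports the same bits per weight, the conversions
# did not actually differ in precision.
same_precision = len({info["bpw"] for info in infos}) == 1
print("same precision:", same_precision)

# Rough size a true 16-bit image of the same model would have.
print(f"expected at 16 bpw: {expected_gib(7.7, 16):.1f} GiB")
```

A 16-bit image of a 7.7B-parameter model would be roughly twice the reported 7.2 GiB, so a header showing 7.2 GiB @ 8.00 bpw for both files indicates that both were stored at 8 bits per weight.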
