A high-throughput and memory-efficient inference and serving engine for LLMs (Python; updated Dec 24, 2024)
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
Finetune an LLM to generate SQL from text on Intel GPUs (XPUs) using QLoRA
Purplecoin/XPU Core integration/staging tree
Cloud-native bare-metal networking orchestration