vllm-project/vllm ★ 24.4k — A high-throughput and memory-efficient inference and serving engine for LLMs. Topics: amd, cuda, inference, pytorch, transformer, llama, gpt, rocm, model-serving, tpu, mlops, xpu, llm, inferentia, llmops, llm-serving, trainium. Language: Python. Updated Aug 8, 2024.
aws-samples/foundation-model-benchmarking-tool ★ 160 — Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving-stack options. Topics: benchmarking, benchmark, p5, bedrock, sagemaker, p4d, g5, foundation-models, inferentia, generative-ai, llama2, trainium, llama3. Language: Jupyter Notebook. Updated Aug 7, 2024.