LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
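As a quick illustration, here is a minimal sketch of offline generation with LMDeploy's Python `pipeline` API. The model identifier is a placeholder and assumes the weights are available locally or on the Hugging Face Hub:

```python
# Minimal sketch of offline inference with LMDeploy's pipeline API.
# The model identifier below is an assumption; substitute any model
# LMDeploy supports.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")  # placeholder model name
responses = pipe(["Explain what an inference server does."])
print(responses[0].text)
```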
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
Deploy KoGPT with Triton Inference Server
Tutorial on how to deploy a scalable autoregressive causal language model (a transformer) using NVIDIA Triton Inference Server
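These Triton-based examples all share the same client workflow once the server is up. A rough sketch of an HTTP inference request using NVIDIA's `tritonclient` package follows; the model name, tensor names, and datatype are assumptions that must match the deployed model's config.pbtxt:

```python
# Sketch of an HTTP inference request to a running Triton server.
# The model name ("codegen") and tensor names ("INPUT_IDS"/"OUTPUT_IDS")
# are assumptions; they must match the model's config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

input_ids = np.array([[1, 2, 3, 4]], dtype=np.int32)  # pre-tokenized prompt
infer_input = httpclient.InferInput("INPUT_IDS", list(input_ids.shape), "INT32")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(model_name="codegen", inputs=[infer_input])
print(result.as_numpy("OUTPUT_IDS"))
```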
This repository is a code sample for serving large language models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
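After a cluster deployment like this, it is worth verifying that the server and model are live before sending traffic. A minimal sketch, assuming the Triton HTTP endpoint is reachable at localhost:8000 (e.g. via a Kubernetes Service or port-forward) and the model is named `fastertransformer`:

```python
# Sketch of a readiness check against a Triton endpoint.
# The URL and model name ("fastertransformer") are assumptions that
# depend on the Kubernetes Service and model repository layout.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

if client.is_server_ready() and client.is_model_ready("fastertransformer"):
    metadata = client.get_model_metadata("fastertransformer")
    print("Model ready:", metadata["name"])
    print("Inputs:", [i["name"] for i in metadata["inputs"]])
else:
    print("Server or model not ready yet.")
```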