
feat: Add inference engine - NVIDIA Triton Inference Server and TRT-LLM #821

Closed
hiro-v opened this issue Dec 4, 2023 · 2 comments
Assignees: hiro-v
Labels: engineering: Jan Inference Layer · P1: important · type: feature request

Comments

hiro-v commented Dec 4, 2023

Problem
I have an existing NVIDIA Triton Inference Server deployment with TensorRT-LLM as the backend. I want to use the models it serves from within Jan.

Success Criteria

  • Inference engine definition for the NVIDIA Triton Inference Server - nvidia-inference-engine-trt-llm/engine.json (sketched below)
  • Setup script and docs covering setup of both components and the connection between them
  • model.json for llama2-7b (sketched below)
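
As a rough illustration of the two files, a minimal sketch follows. Every field name below is an assumption made for illustration, not Jan's actual schema; the only grounded details are the file names from this issue and Triton's default HTTP port (8000).

nvidia-inference-engine-trt-llm/engine.json (hypothetical):

```json
{
  "name": "nvidia-inference-engine-trt-llm",
  "description": "Remote inference via NVIDIA Triton Inference Server with the TensorRT-LLM backend",
  "protocol": "http",
  "base_url": "http://localhost:8000/v2/models"
}
```

model.json for llama2-7b (hypothetical):

```json
{
  "id": "llama2-7b",
  "source": "meta-llama/llama2-7b",
  "engine": "nvidia-inference-engine-trt-llm",
  "parameters": {
    "max_tokens": 512,
    "temperature": 0.7
  }
}
```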

Additional context

@hiro-v added the P1: important, type: feature request, and engineering: Jan Inference Layer labels on Dec 4, 2023
@hiro-v added this to the v0.4.0 milestone on Dec 4, 2023
@hiro-v self-assigned this on Dec 4, 2023

hiro-v commented Dec 4, 2023

Diagram illustrating Jan's integration with the NVIDIA inference cluster: [image attachment]


hiro-v commented Dec 4, 2023

Setup and benchmark script for an NVIDIA inference cluster with the TensorRT backend (model: meta-llama/llama2-7b): https://github.com/hamelsmu/llama-inference/tree/master/triton-tensorRT-quantized-awq-batch
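
For the connection side, here is a minimal client sketch against a running Triton server, assuming the default HTTP port (8000) and the tensor names commonly used by the TRT-LLM "ensemble" model (text_input, max_tokens, text_output); check the deployed model's config.pbtxt for the real names.

```python
# pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

# Assumption: Triton is reachable on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Tensor names below follow the common TRT-LLM ensemble layout; they are
# assumptions, not guaranteed by this issue -- adjust to your model config.
prompt = np.array([["What is the capital of France?"]], dtype=object)
tokens = np.array([[128]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", [1, 1], "BYTES"),
    httpclient.InferInput("max_tokens", [1, 1], "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(tokens)

result = client.infer(
    model_name="ensemble",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output"))
```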

@dan-jan modified the milestones: 0.4.0, 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines on Dec 11, 2023
@hiro-v closed this as completed on Dec 20, 2023