
feat: Add inference engine - NVIDIA Triton Inference Server and TRT-LLM #821

Closed
hiro-v opened this issue Dec 4, 2023 · 2 comments
Assignees: hiro-v
Labels: engineering: Jan Inference Layer · P1: important · type: feature request

Comments

hiro-v commented Dec 4, 2023

Problem
I have an existing NVIDIA Triton Inference Server deployment with TensorRT-LLM as the backend. I want to use the models it serves from within Jan.

Success Criteria

  • Inference engine definition for the NVIDIA Triton Inference Server - nvidia-inference-engine-trt-llm/engine.json (sketched below)
  • Setup script and docs covering setup of both components and the connection between them
  • model.json for llama2-7b (sketched below)
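
As a rough illustration of the two files, a minimal sketch follows. Every field name below is an assumption made for illustration, not Jan's actual schema; the only grounded details are the file names from this issue and Triton's default HTTP port (8000).

nvidia-inference-engine-trt-llm/engine.json (hypothetical):

```json
{
  "name": "nvidia-inference-engine-trt-llm",
  "description": "Remote inference via NVIDIA Triton Inference Server with the TensorRT-LLM backend",
  "protocol": "http",
  "base_url": "http://localhost:8000/v2/models"
}
```

model.json for llama2-7b (hypothetical):

```json
{
  "id": "llama2-7b",
  "source": "meta-llama/llama2-7b",
  "engine": "nvidia-inference-engine-trt-llm",
  "parameters": {
    "max_tokens": 512,
    "temperature": 0.7
  }
}
```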

Additional context

@hiro-v added the P1: important, type: feature request, and engineering: Jan Inference Layer labels on Dec 4, 2023
@hiro-v added this to the v0.4.0 milestone on Dec 4, 2023
@hiro-v self-assigned this on Dec 4, 2023

hiro-v commented Dec 4, 2023

Diagram illustrating Jan's integration with the NVIDIA inference cluster: [image attachment]


hiro-v commented Dec 4, 2023

Setup and benchmark script for an NVIDIA inference cluster with the TensorRT backend (model: meta-llama/llama2-7b): https://github.com/hamelsmu/llama-inference/tree/master/triton-tensorRT-quantized-awq-batch
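
For the connection side, here is a minimal client sketch against a running Triton server, assuming the default HTTP port (8000) and the tensor names commonly used by the TRT-LLM "ensemble" model (text_input, max_tokens, text_output); check the deployed model's config.pbtxt for the real names.

```python
# pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

# Assumption: Triton is reachable on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Tensor names below follow the common TRT-LLM ensemble layout; they are
# assumptions, not guaranteed by this issue -- adjust to your model config.
prompt = np.array([["What is the capital of France?"]], dtype=object)
tokens = np.array([[128]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", [1, 1], "BYTES"),
    httpclient.InferInput("max_tokens", [1, 1], "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(tokens)

result = client.infer(
    model_name="ensemble",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output"))
```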

@dan-jan modified the milestones: 0.4.0, 0.4.1, 0.4.2, API Endpoint at localhost:1337, Jan supports multiple Inference Engines on Dec 11, 2023
@hiro-v closed this as completed on Dec 20, 2023