bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')
Powered by WasmEdge and Rust.
Lightweight
The total dependency footprint of LlamaEdge is 30MB, versus 5GB for Python.
Very fast
Automagically uses the device’s local hardware and software acceleration.
Write once, run anywhere, even on GPUs
Create an LLM web service on a MacBook, then deploy it on an Nvidia device.
Native to the heterogeneous edge
Orchestrate and move an LLM app across CPUs, GPUs, and NPUs.
Learn more about LlamaEdge