Note: Before proceeding, please make sure you are working on Ubuntu 20.04/22.04 and have installed the following dependencies:
- Rust stable (>= 1.69.0)
- The `wasm32-wasi` target, added to the Rust toolchain by running `rustup target add wasm32-wasi` in the terminal
- WasmEdge 0.13.4 (Installation)
- WasmEdge TLS plugin (Installation)
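If WasmEdge is not installed yet, the official installer script can set up a specific version together with plugins in one step. A minimal sketch, assuming the installer's `--plugins` flag and `wasmedge_rustls` as the TLS plugin name (see the installation links above for the authoritative instructions):

```bash
# download and run the WasmEdge installer for version 0.13.4,
# requesting the TLS plugin as well (plugin name assumed)
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | \
    bash -s -- -v 0.13.4 --plugins wasmedge_rustls

# make the wasmedge binary available in the current shell
source $HOME/.wasmedge/env
```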
Before building the wasm app, you may set the socket address of the web API server in `config.yml`. The default socket address is:
```yaml
socket_address:
  ip_address: "0.0.0.0"
  port: "8080"
```
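The server listens on port `8080` by default. To verify that the port is free before starting the server, you can check the listening sockets; a quick sketch using `ss`, which is available on Ubuntu by default:

```bash
# no output means nothing is listening on port 8080
ss -ltn | grep ':8080'
```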
If the port is occupied by another process, free it or change the `port` value in `config.yml`. Now let's build and run the web API server:
- First, build the `llm-web-api-server` wasm app:

  ```bash
  git clone https://github.com/apepkuss/llm-web-api-server.git
  cd llm-web-api-server

  # build the wasm app
  cargo build --target wasm32-wasi --release
  ```
  If the commands succeed, you should find the wasm app at `target/wasm32-wasi/release/llm-web-api-server.wasm`.
- Second, to maximize the performance of the wasm app, use the WasmEdge AOT compiler to compile it to native code:

  ```bash
  wasmedge compile target/wasm32-wasi/release/llm-web-api-server.wasm llm-web-api-server.so
  ```
  If the command succeeds, you should find `llm-web-api-server.so` in the root directory.
- Finally, run the wasm app, that is, start the web API server:

  ```bash
  wasmedge run --dir .:. llm-web-api-server.so
  ```
  If port `8080` is available on your machine and the command succeeds, you should see the following output in the terminal:

  ```console
  Listening on http://0.0.0.0:8080
  ```
Note that the command above is only for testing. In production, you need to specify which LLM model will be used. For example, if you want to use the `llm-model` in the `llm-models` directory, you can run a command along the lines of the following sketch.
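A hypothetical sketch, modeled on the `--nn-preload` invocation shown in the Llama example below, with `llm-models/llm-model.gguf` as an assumed model path:

```bash
# hypothetical: preload a GGUF model from the llm-models directory
# (mirrors the --nn-preload flag used in the Llama example below)
wasmedge --dir .:. --nn-preload default:GGML:CPU:llm-models/llm-model.gguf llm-web-api-server.so default
```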
`llm-web-api-server` provides a POST API `/echo` for testing. You can use `curl` to test it:
```bash
curl -X POST http://localhost:8080/echo
```
If the command is successful, you should see the following output in the terminal:

```console
echo test
```
- Download the Llama model in GGUF format:

  ```bash
  curl -LO https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf
  ```
- Start the web API server:

  ```bash
  wasmedge --dir .:. --nn-preload default:GGML:CPU:llama-2-7b-chat.Q5_K_M.gguf llm-web-api-server.so default
  ```
- Download `wasmedge-web-api-client` and run the client:

  ```bash
  git clone https://github.com/second-state/wasmedge-web-api-client.git
  ```
  Note that you need to update the `URL_CHAT_COMPLETIONS` static variable in the `main.rs` file to the address of your web API server (see the `curl` sketch after this list for the assumed endpoint). Then build and run the client:

  ```bash
  cd wasmedge-web-api-client

  # build and run the client
  cargo run
  ```
  You will see the following output in the terminal if the command runs successfully:

  ```console
  Enter some text (or press Ctrl Q to exit):
  [Question]:
  ```
  Now you can enter your question and wait for the answer. For example:
  ```console
  Enter some text (or press Ctrl Q to exit):
  [Question]: what is the capital of France?
  [answer] The capital of France is Paris.
  [Question]: what about Norway?
  [answer] The capital of Norway is Oslo.
  [Question]: I have two apples, each costing 5 dollars. What is the total cost of these apples?
  [answer] The total cost of the two apples is 10 dollars.
  [Question]: What if I have 3 apples?
  [answer] If you have 3 apples, each costing 5 dollars, the total cost of the apples is 15 dollars.
  ```
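If you want to talk to the server directly instead of through the client, the `URL_CHAT_COMPLETIONS` variable suggests a chat-completions endpoint. A hedged `curl` sketch, assuming an OpenAI-style `/v1/chat/completions` path and request schema (both are assumptions; check the `llm-web-api-server` source for the actual route and JSON shape):

```bash
# hypothetical request: endpoint path and JSON schema are assumed,
# modeled on the OpenAI chat-completions API
curl -X POST http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "default", "messages": [{"role": "user", "content": "what is the capital of France?"}]}'
```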