A LLaMA2-7B chatbot with memory that runs on CPU, optimized with smooth quantization (int8), 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
meta
cpu
optimization
chatbot
intel
llama
numa
int8
ipex
4-bit-cpu
huggingface
streamlit
bfloat16
neural-compression
chatgpt
langchain
llama2
meta-ai
smooth-quantization
chatbot-memory
Updated Feb 27, 2024 - Python