Offline, Open-Source RAG
Ingest files for retrieval-augmented generation (RAG) with open-source Large Language Models (LLMs), all without third parties or sensitive data leaving your network.
Features:

- Offline Embeddings & LLMs Support (No OpenAI!)
- Streaming Responses
- Conversation Memory
- Chat Export
Requirements:

- A pre-existing Ollama instance
- Python 3.10
Setup:

Local:

```shell
pip install pipenv && pipenv install
pipenv shell && streamlit run main.py
```
Docker:

```shell
docker compose up -d
```
Usage:

- Set your Ollama endpoint and model under Settings
- Upload your documents for processing
- Once processing completes, ask questions based on your documents!
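Conceptually, each question is answered by retrieving the most relevant document chunks and stuffing them into the prompt before it is sent to the LLM. A minimal sketch of that flow is below; the function name, prompt template, and keyword-overlap scoring are illustrative only (the app itself retrieves via vector embeddings through Llama-Index and Ollama):

```python
def build_rag_prompt(question: str, chunks: list[str], top_k: int = 3) -> str:
    """Assemble a RAG prompt from the top-k retrieved chunks.

    Retrieval here is a naive keyword-overlap score standing in for
    embedding similarity search.
    """
    q_words = set(question.lower().split())
    # Rank chunks by how many question words they share.
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    context = "\n---\n".join(ranked[:top_k])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Ollama serves local LLMs over an HTTP API.",
    "Streamlit reruns the script on every interaction.",
    "Invoices are due within 30 days of receipt.",
]
print(build_rag_prompt("When are invoices due?", chunks, top_k=1))
```

The resulting prompt string is what gets streamed to the Ollama model; only the retrieved context ever accompanies your question.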
To Do:

- Refactor into modules
- Refactor file processing logic
- Migrate Chat Stream to Llama-Index
- Implement Llama-Index Chat Engine with Memory
- Swap to Llama-Index Chat Engine
- Function to Handle File Embeddings
- Allow Users to Set LLM Settings
  - System Prompt
  - Chat Mode
  - top_k
  - chunk_size
  - chunk_overlap
- Allow Switching of Embedding Model & Settings
- Delete Files after Index Created/Failed
- Support Additional Import Options
  - GitHub Repos
  - Websites
- Remove File Type Limitations for Uploads
- Show Loaders in UI (File Uploads, Conversions, ...)
- Export Data (Uploaded Files, Chat History, ...)
- View and Manage Imported Files
- About Tab in Sidebar
- Docker Support
- Implement Log Library
- Improve Logging
- Re-write Docstrings
- Additional Error Handling
  - Starting a chat without an Ollama model set
  - Incorrect GitHub repos
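The chunk_size and chunk_overlap settings above control how documents are split before embedding: overlap keeps sentences that straddle a chunk boundary retrievable from either neighboring chunk. A minimal character-based sketch (Llama-Index's actual splitter works on tokens; the defaults here are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 50, chunk_overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]

sample = "a" * 120
print([len(c) for c in chunk_text(sample)])  # chunk lengths: [50, 50, 40]
```

Larger chunks give the LLM more context per retrieved passage but dilute similarity scores; more overlap costs storage and embedding time in exchange for fewer boundary misses.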
Known Issues:

- Refreshing the page loses all state (expected Streamlit behavior; local-storage persistence still needs to be implemented)
- Files can be uploaded before Ollama config is set, leading to embedding errors
- When Ollama is hosted on localhost, models are loaded and selected automatically, but the dropdown does not render the selected option
- Sending a chat message causes the File Processing expander to re-run (likely a component is not using session state correctly)