
An NVIDIA AI Workbench example project for an Agentic Retrieval Augmented Generation (RAG)


NVIDIA/workbench-example-agentic-rag



NVIDIA AI Workbench: Introduction

⬇️ Download AI Workbench • 📖 Read the Docs • 📂 Explore Example Projects • 🚨 Facing Issues? Let Us Know!

Project Description

This is an NVIDIA AI Workbench project for developing a websearch-based Retrieval Augmented Generation application with a customizable Gradio Chat app. It lets you:

  • Embed your documents in the form of webpages or PDFs into a locally running Chroma vector database.
  • Run inference using remotely running endpoints and microservices.

This project uses an agentic workflow, depicted in the diagram above, to improve response quality in RAG. Using LangGraph, each user query is first routed to either the RAG pipeline or the Websearch pipeline, based on an LLM evaluation of the query topic. The router LLM's prompt is user-configurable, so you can narrow the set of subjects or topics that are routed to the RAG pipeline.
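The routing step described above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `route_query`, `keyword_classifier`, and the `"rag"` / `"websearch"` labels are hypothetical names, and the keyword check only stands in for the router LLM call.

```python
from typing import Callable

RAG = "rag"
WEBSEARCH = "websearch"


def route_query(query: str, classify: Callable[[str], str]) -> str:
    """Return the pipeline that should handle the query.

    `classify` is a hypothetical stand-in for the router LLM call. Any label
    other than "rag" falls back to the websearch pipeline.
    """
    label = classify(query).strip().lower()
    return RAG if label == RAG else WEBSEARCH


def keyword_classifier(query: str) -> str:
    """Toy classifier for illustration only; a real router would call an LLM."""
    topics = ("llm", "agent")
    return RAG if any(topic in query.lower() for topic in topics) else WEBSEARCH
```

Treating every unrecognized label as a websearch route is one possible design choice: it keeps the agent usable when the router returns something unexpected.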

Expand this section for a description of RAG Pipeline.

Under the retrieval pipeline, the user query is first compared to documents in the vector database and the most relevant documents are retrieved.

Another LLM call evaluates the quality of the retrieved documents. If they are satisfactory, the agent proceeds to the generation phase and produces a response augmented by this relevant context. If the agent decides that even the best documents are irrelevant to the query, it redirects the user query to the websearch pipeline for a better quality response (see the section below).
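The document-grading decision might look like the following sketch. It assumes a per-document relevance check; `grade_documents`, `is_relevant`, and `overlap_grader` are hypothetical names, and the word-overlap grader is only a placeholder for the Retrieval Grader LLM call.

```python
from typing import Callable, List, Tuple


def grade_documents(
    query: str,
    documents: List[str],
    is_relevant: Callable[[str, str], bool],
) -> Tuple[List[str], str]:
    """Keep the relevant documents and pick the next step.

    `is_relevant(query, document)` stands in for the grader LLM call.
    Returns the kept documents plus "generate" when at least one document
    survives, otherwise "websearch".
    """
    kept = [doc for doc in documents if is_relevant(query, doc)]
    next_step = "generate" if kept else "websearch"
    return kept, next_step


def overlap_grader(query: str, document: str) -> bool:
    """Toy grader: relevant if the query and document share any word."""
    return bool(set(query.lower().split()) & set(document.lower().split()))
```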

After generation, another set of LLM calls evaluates the response for hallucinations and accuracy. If the generation is both faithful to the retrieved context and answers the user's query in a satisfactory manner, the response is forwarded to the user and displayed. Otherwise, the agent will either regenerate the response or redirect the query to a web search.

Expand this section for a description of Websearch Pipeline.

Under the web search pipeline, the user query is submitted to a web search and the search results are retrieved. Using these results, a response is generated.

After generation, a set of LLM calls evaluates the response for hallucinations and accuracy. If the generation is both faithful to the retrieved context and answers the user's query in a satisfactory manner, the response is forwarded to the user and displayed. Otherwise, the agent will either regenerate the response or redirect the query to another web search.
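The post-generation checks can be sketched as a single decision function. The text does not say which failure triggers which recovery action, so the mapping below (ungrounded answer leads to regeneration, grounded but unhelpful answer leads to a web search) is an assumption, and `check_generation`, `is_grounded`, and `answers_query` are hypothetical names standing in for the Hallucination Grader and Answer Grader LLM calls.

```python
from typing import Callable


def check_generation(
    query: str,
    context: str,
    answer: str,
    is_grounded: Callable[[str, str], bool],
    answers_query: Callable[[str, str], bool],
) -> str:
    """Return the agent's next action: "deliver", "regenerate", or "websearch".

    Assumed mapping (not stated in the project description):
    - answer not supported by the context -> regenerate
    - answer supported but not useful     -> run a web search
    """
    if not is_grounded(context, answer):
        return "regenerate"
    if not answers_query(query, answer):
        return "websearch"
    return "deliver"
```

In practice a loop around this function would likely also cap the number of retries so a query cannot cycle indefinitely.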

📝 Remember
This project is meant as an example workflow and a starting point; you are free to add new models, rearrange the interface, or edit the source code as you see fit for your particular use case!

Project Deep Dive

Expand this section for a full guide of the user-configurable project settings

When the user lands on the Chat UI application in the browser, they will see several components. On the left-hand side is a standard chatbot user interface with a user input for queries (submittable with [ENTER]) and a clear history button. Above this chatbot is a diagram of the agentic RAG pipeline that doubles as a progress indicator for nontrivial user actions, such as uploading a document.

On the right-hand side, users will see a collapsible settings panel with several tabs they can navigate to and configure.

Expand for Model Settings.

This tab holds every user-configurable setting for each of the LLM components of the agentic RAG pipeline:

  • Router
  • Retrieval Grader
  • Generator
  • Hallucination Grader
  • Answer Grader

Expanding any such entry will yield a panel where users can specify the model they would like to use for that particular component from a dropdown (using NVIDIA API Catalog endpoints), or they can specify their own remotely running self-hosted NVIDIA NIM custom endpoint.

Below this field is an expandable accordion where users can adjust the default prompt for that particular component's task. For example, under the Router component, users can rewrite the prompt so that only queries relating to LLMs and agents are routed to the RAG pipeline and all other queries are directed to the Websearch pipeline.
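As an illustration of the Router example above, a customized prompt could be assembled from a template like the one below. The wording of the template, the `'rag'` / `'websearch'` answer labels, and the `build_router_prompt` helper are all hypothetical; the project's default prompts may look quite different.

```python
from typing import List

# Hypothetical router prompt; the project's default wording may differ.
ROUTER_PROMPT_TEMPLATE = (
    "You route a user question to either a RAG pipeline or a web search.\n"
    "Use the RAG pipeline only for questions about: {topics}.\n"
    "Use web search for everything else.\n"
    "Answer with a single word: 'rag' or 'websearch'.\n"
    "Question: {question}"
)


def build_router_prompt(question: str, topics: List[str]) -> str:
    """Fill the template with the routable topics and the user question."""
    return ROUTER_PROMPT_TEMPLATE.format(
        topics=", ".join(topics),
        question=question,
    )


if __name__ == "__main__":
    print(build_router_prompt("What is an agent?", ["LLMs", "agents"]))
```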

Expand for Document Settings.

This tab holds every user-configurable setting for the vector database and document ingestion aspects of this agentic RAG pipeline. Users can add webpages to the vector database by entering a newline-separated list of URLs in the textbox and clicking Upload, or they can upload PDF files from their local machine to be stored in the vector database.
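Handling the newline-separated URL list could be sketched as follows. This is an assumption about how such input might be cleaned before ingestion, not the project's code; `parse_url_list` is a hypothetical helper that keeps only non-empty `http` and `https` entries.

```python
from typing import List
from urllib.parse import urlparse


def parse_url_list(text: str) -> List[str]:
    """Split textbox input into a list of http(s) URLs.

    Blank lines and entries without an http/https scheme and a host are
    skipped rather than raising an error.
    """
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue
        parsed = urlparse(candidate)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            urls.append(candidate)
    return urls
```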

Expand for Monitoring Settings.

This tab holds the agentic RAG monitoring tools built into this application.

  • The first tool is a console that logs every action the agent decides to take when processing the user query, providing a general overview of the agent's decision making.
  • The second tool is an in-depth trace of the agent's actions for the last submitted query, which gives more detail on the context retrieved, the websearch documents found, the LLM pipeline components used, and so on when generating the most recent response.
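The difference between the two tools can be sketched with a small recorder: the console accumulates one line per action across all queries, while the trace is reset for each new query and keeps extra detail. `AgentMonitor` and its methods are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentMonitor:
    # Running log of actions across all queries (the "console").
    console: List[str] = field(default_factory=list)
    # Detailed records for the most recent query only (the "trace").
    last_trace: List[Dict[str, Any]] = field(default_factory=list)

    def start_query(self, query: str) -> None:
        """Reset the trace for a new query and note it in the console."""
        self.last_trace = []
        self.console.append(f"query received: {query}")

    def record(self, action: str, **details: Any) -> None:
        """Log an action to the console and store its details in the trace."""
        self.console.append(action)
        self.last_trace.append({"action": action, **details})
```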

Sizing Guide

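| GPU VRAM | Example Hardware                 | Compatible? |
| -------- | -------------------------------- | ----------- |
| <16 GB   | RTX 3080, RTX 3500 Ada           | Y           |
| 16 GB    | RTX 4080 16GB, RTX A4000         | Y           |
| 24 GB    | RTX 3090/4090, RTX A5000/5500, A10/30 | Y      |
| 32 GB    | RTX 5000 Ada                     | Y           |
| 40 GB    | A100-40GB                        | Y           |
| 48 GB    | RTX 6000 Ada, L40/L40S, A40      | Y           |
| 80 GB    | A100-80GB                        | Y           |
| >80 GB   | 8x A100-80GB                     | Y           |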

Quickstart

Prerequisites

AI Workbench will prompt you to provide a few pieces of information before running any apps in this project. Ensure you have this information ready.

  • An NVIDIA API Key. You can generate one under Get API Key on any API Catalog model card.
  • A Tavily Search API Key. You can generate one under a free account (1000 searches/month) here.
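If you want to confirm that both keys are available before launching an app, a check along these lines may help. The environment variable names `NVIDIA_API_KEY` and `TAVILY_API_KEY` are assumptions here; use whatever names the project secrets are actually exposed under. `missing_secrets` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; adjust to match the configured project secrets.
REQUIRED_SECRETS = ("NVIDIA_API_KEY", "TAVILY_API_KEY")


def missing_secrets(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required secret names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        print("Missing secrets:", ", ".join(missing))
    else:
        print("All required secrets are set.")
```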

Tutorial (Desktop App)

If you do not have NVIDIA AI Workbench installed, first complete the installation for AI Workbench here. Then,

  1. Fork this Project to your own GitHub namespace and copy the link

    https://github.com/[your_namespace]/<project_name>
    
  2. Open NVIDIA AI Workbench. Select a location to work in.

  3. Clone this Project onto your desired machine by selecting Clone Project and providing the GitHub link.

  4. Wait for the project to build. You can expand the bottom Building indicator to view real-time build logs.

  5. When the build completes, set the following configurations.

    • Environment → Secrets → Configure. Specify the NVIDIA API Key and Tavily Search Key as project secrets.
  6. On the top right of the window, select Jupyterlab.

  7. Navigate to the code directory of the project. Then, open your fine-tuning notebook and get started. Happy coding!

Tutorial (CLI-Only)

Some users may choose to use only the CLI tool instead of the Desktop App. If you do not have NVIDIA AI Workbench installed, first complete the installation for AI Workbench here. Then,

  1. Fork this Project to your own GitHub namespace and copy the link

    https://github.com/[your_namespace]/<project_name>
    
  2. Open a shell and activate the Context you want to clone into by running

    $ nvwb list contexts
    
    $ nvwb activate <desired_context>
    
    💡 Tip
    Use nvwb help to see a full list of AI Workbench commands.
  3. Clone this Project onto your desired machine by running

    $ nvwb clone project <your_project_link>
    
  4. Open the Project by running

    $ nvwb list projects
    
    $ nvwb open <project_name>
    
  5. Start Jupyterlab by running

    $ nvwb start jupyterlab
    
    • Specify the NVIDIA API Key and Tavily Search Key as project secrets.
  6. Navigate to the code directory of the project. Then, open your fine-tuning notebook and get started. Happy coding!

License

This NVIDIA AI Workbench example project is under the Apache 2.0 License.

This project may utilize additional third-party open source software projects. Review the license terms of these open source projects before use. Third party components used as part of this project are subject to their separate legal notices or terms that accompany the components. You are responsible for confirming compliance with third-party component license terms and requirements.

❓ Have Questions?
Please direct any issues, fixes, suggestions, and discussion on this project to the DevZone Members Only Forum thread here
