feat: whole/full document mode #3129

Open
PkmX opened this issue Jun 13, 2024 · 7 comments

Comments

@PkmX

PkmX commented Jun 13, 2024

Is your feature request related to a problem? Please describe.
Sometimes it is preferable to pass the whole document to the LLM for tasks like summarization. Currently, if you upload a document and ask the LLM to summarize it, retrieval will most likely return no relevant result. Since the user can't turn off the retrieval step, the LLM becomes effectively useless for this rather common use case. Passing the full document is also useful for tasks like translation or sentiment analysis.

Describe the solution you'd like
When using an uploaded document or a fetched webpage, there should be a checkbox that lets the user pass the entire ingested document as the context. This should be a relatively easy change: skip retrieval and pass the full text straight through with the query.

Ideally, there should be a warning if the content is larger than the LLM's context size, so truncation or degraded output does not occur.
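As a rough sketch of the warning I have in mind (the four-characters-per-token estimate is only a stand-in for whatever tokenizer the backend actually uses; function and parameter names are hypothetical):

```python
# Rough sketch only: warn before a full-context insertion if the ingested
# document probably exceeds the model's context window. The 4-chars-per-token
# estimate is a placeholder for a real tokenizer.
def context_warning(doc_text: str, num_ctx: int) -> str | None:
    approx_tokens = len(doc_text) // 4
    if approx_tokens > num_ctx:
        return (f"Document is roughly {approx_tokens} tokens but the model's "
                f"context window is {num_ctx}; output may be truncated or degraded.")
    return None
```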

Describe alternatives you've considered

  • Just copy/paste the full document into the chatbox: It works but is not very convenient.
  • Have a separate pipeline specifically made for summarization: This can use a myriad of techniques to provide even better results with smaller context sizes, but requires significantly more effort to implement.

Additional context
This feature is also present in some other chat UIs, e.g., “document pinning” in AnythingLLM.

I'm not sure how this would interact with pipelines, since pipelines can do basically anything. I think such a flag could also be passed to compatible RAG pipelines to let them know whether the user wants context retrieval or just a full-context insertion.
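For illustration, a compatible pipeline could branch on such a flag along these lines (the `full_context` key, the body layout, and the helper names are hypothetical, not an existing Open WebUI interface):

```python
# Hypothetical sketch: a RAG pipeline that honors a "full_context" flag
# forwarded from the UI. Flag name, body layout, and helpers are assumptions.
def build_context(body: dict, query: str, doc_text: str, retrieve) -> str:
    if body.get("full_context", False):
        # Full-context insertion: skip retrieval and use the whole document.
        return doc_text
    # Default RAG path: only the retrieved top-k chunks.
    return "\n\n".join(retrieve(query, doc_text))
```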

@bunnyfu

bunnyfu commented Jun 14, 2024

Full support. With 32–128k context models now being more common, it's sometimes easier and more failsafe to pass the whole document into the context when you want a summary.

@Peter-De-Ath
Contributor

Peter-De-Ath commented Jun 14, 2024

Yes, I like this.
I currently open the document, paste it into the chat, and then edit the messages to get the prompting right for the task I want.

For summarization I have a Modelfile set up just for summarizing: I paste in my entire document (text) and it returns just the summary.
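For reference, such a Modelfile can be as simple as the following (model choice and prompt wording here are just an example, not my exact setup):

```
FROM llama3.1
PARAMETER temperature 0.2
SYSTEM """Summarize the text the user sends. Reply with the summary only."""
```

Then `ollama create summarizer -f Modelfile` and pick that model in the UI.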

@rvkwi

rvkwi commented Jul 22, 2024

That would be really useful for several edge cases. I recently ran into this, not with summaries but with some analysis: it was not about the content itself but about patterns in it (so the entire context needed to survive at once). I ended up copying and pasting a lot, and eventually threw it into Claude because this got tedious quickly.

RAG is amazing, but there are cases where a simple 1:1 is all you need.

@pbasov

pbasov commented Jul 23, 2024

With llama3.1 supporting 128k context this would be an important feature to have

@justinh-rahb
Collaborator

With llama3.1 supporting 128k context this would be an important feature to have

I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...

@rvkwi

rvkwi commented Jul 24, 2024

With llama3.1 supporting 128k context this would be an important feature to have

I think a lot of people are going to soon realize that Ollama doesn't actually run models at their full context size by default, and that if you wanted to run even 8B at 128K you'd need over 100GB of VRAM...

Just because some can't fit it doesn't mean others can't. num_ctx is not exactly hidden or advanced to understand, and you don't have to max it out just because you can. On top of that, not everything uses standard attention that scales quadratically with sequence length.

Open WebUI even words it more clearly as "context length" in the model file.
I'm just saying: RAG is great, but there are also plenty of uses where you specifically don't want a context insertion cut apart. A lot of these cases don't even need a massive context, but it's not unreasonable that people run high contexts with the WebUI. I think it would be a helpful feature to have as a toggle.
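For example, raising it in an Ollama Modelfile is a one-liner (the model name and value below are arbitrary), and Open WebUI exposes the same setting as the context length field in the model file:

```
FROM llama3.1
PARAMETER num_ctx 32768
```

After `ollama create llama3.1-32k -f Modelfile`, whether a given machine can actually hold that much KV cache is of course a separate question.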

@justinh-rahb
Collaborator

I wasn't arguing against it @rvkwi; you can see that I did thumbs-up the OP. You and I may know full well how num_ctx works, but there are a LOT of WebUI users who don't.
