Discussion: Collection handling in vector database #3423

jonathan-rohde · 2024-06-25T11:59:09Z

I want to raise the following discussion.

Status quo:

Questions:

What do you think of storing all documents in a single collection?
- My opinion: A vector database is more efficient at finding the k nearest vectors in one single collection than a Python application that fetches the k nearest vectors from each of x collections, sorts them, and takes the first k vectors from the result.
  Filtering on metadata can easily be implemented to limit a search to specific files based on the file hash.
  Please correct me if my assumption is incorrect.
Shouldn't the vectors get deleted in the vector store, once the document is deleted?
- I can see the benefit of keeping them in the store: if the file is scanned/uploaded again, they can just be reused.
  But I see a problem when, for example, we want to add or update metadata, like the file name.
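
To make the two questions concrete, here is a minimal pure-Python sketch, not the actual open-webui code. It assumes the status quo is one collection per file hash, and it uses a plain list of dicts as a stand-in for a vector store. All names (`search_per_collection`, `search_single_collection`, `delete_document`, the `file_hash` field) are hypothetical and only illustrate the idea.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search_per_collection(collections, query, k):
    """Assumed status quo: top-k from every collection, then merge and re-sort in Python."""
    candidates = []
    for rows in collections.values():
        candidates.extend(
            heapq.nsmallest(k, rows, key=lambda r: cosine_distance(r["vector"], query))
        )
    return heapq.nsmallest(k, candidates, key=lambda r: cosine_distance(r["vector"], query))


def search_single_collection(rows, query, k, file_hashes=None):
    """Proposal: one collection, with an optional metadata filter on the file hash."""
    if file_hashes is not None:
        rows = [r for r in rows if r["file_hash"] in file_hashes]
    return heapq.nsmallest(k, rows, key=lambda r: cosine_distance(r["vector"], query))


def delete_document(rows, file_hash):
    """Second question: drop all vectors of a deleted document via the same metadata field."""
    return [r for r in rows if r["file_hash"] != file_hash]


# Toy data: two files, two chunks each (hypothetical hashes and vectors).
collections = {
    "hash_a": [
        {"id": "a1", "file_hash": "hash_a", "vector": [1.0, 0.0]},
        {"id": "a2", "file_hash": "hash_a", "vector": [0.0, 1.0]},
    ],
    "hash_b": [
        {"id": "b1", "file_hash": "hash_b", "vector": [0.9, 0.1]},
        {"id": "b2", "file_hash": "hash_b", "vector": [-1.0, 0.0]},
    ],
}
all_rows = [row for rows in collections.values() for row in rows]
query = [1.0, 0.0]

merged = search_per_collection(collections, query, k=2)
single = search_single_collection(all_rows, query, k=2)
only_b = search_single_collection(all_rows, query, k=2, file_hashes={"hash_b"})
remaining = delete_document(all_rows, "hash_a")
```

Both searches return the same chunks here, but the per-collection variant fetches up to k candidates from each of the x collections before discarding most of them. In a real vector store the filter and the delete would be pushed down to the database instead of being done in Python.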

open-webui locked and limited conversation to collaborators Jun 25, 2024

tjbck converted this issue into discussion #3427 Jun 25, 2024
