This issue was moved to a discussion.

Discussion: Collection handling in vector database #3423

Closed
jonathan-rohde opened this issue Jun 25, 2024 · 0 comments
Comments

@jonathan-rohde
Contributor

I want to raise the following discussion.

Status quo:

  • Each document gets its own collection in the vector database
  • The hash of the file content is used as the collection name
  • The metadata contains the filename
  • Deleting a document leaves its data in the vector database

Questions:

  • What do you think of storing all documents in a single collection?

    • My opinion: A vector database can find the k nearest vectors in one single collection more efficiently than a Python application that fetches the k nearest vectors from each of x collections, sorts them, and takes the first k vectors from the merged result.
      Filtering on metadata can easily be implemented to limit a search to a given file hash.
      Please correct me if my assumption is incorrect.
  • Shouldn't the vectors be deleted from the vector store once the document is deleted?

    • I can see the benefit of keeping them in the store: if the file is scanned/uploaded again, they can just be reused.
      But I see a problem when, for example, we want to add or update metadata such as the filename.
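
To make the two questions concrete, here is a minimal sketch in plain Python. It is not the project's actual code and does not use any real vector database API: the `Collection` class, `query_many`, the `where` filter argument, and the sample vectors and hashes are all hypothetical stand-ins. It compares the status quo (one collection per file hash, merged in Python) with a single shared collection filtered by a `file_hash` metadata field, and shows deletion by that same field.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def matches(metadata, where):
    """True if metadata contains every key/value pair in `where`."""
    return all(metadata.get(key) == value for key, value in where.items())


class Collection:
    """Toy in-memory stand-in for a vector database collection (hypothetical)."""

    def __init__(self):
        self.rows = []  # each row: (vector, metadata dict)

    def add(self, vector, metadata):
        self.rows.append((vector, metadata))

    def query(self, vector, k, where=None):
        """Return the k nearest rows as (distance, metadata), optionally filtered."""
        where = where or {}
        candidates = (
            (cosine_distance(vector, v), m)
            for v, m in self.rows
            if matches(m, where)
        )
        return heapq.nsmallest(k, candidates, key=lambda hit: hit[0])

    def delete(self, where):
        """Drop every row whose metadata matches `where`."""
        self.rows = [(v, m) for v, m in self.rows if not matches(m, where)]


def query_many(collections, vector, k):
    """Status quo: query each per-document collection, then merge in Python."""
    hits = []
    for collection in collections:
        hits.extend(collection.query(vector, k))
    return heapq.nsmallest(k, hits, key=lambda hit: hit[0])


# Made-up example data: two documents with two chunks each.
chunks = [
    ([1.0, 0.0], {"file_hash": "hash-a", "filename": "a.txt"}),
    ([0.8, 0.6], {"file_hash": "hash-a", "filename": "a.txt"}),
    ([0.0, 1.0], {"file_hash": "hash-b", "filename": "b.txt"}),
    ([0.6, 0.8], {"file_hash": "hash-b", "filename": "b.txt"}),
]

shared = Collection()  # proposed layout: one shared collection
per_document = {}      # status quo layout: one collection per file hash
for vector, metadata in chunks:
    shared.add(vector, metadata)
    per_document.setdefault(metadata["file_hash"], Collection()).add(vector, metadata)

query = [1.0, 0.1]
single_result = shared.query(query, k=2)
merged_result = query_many(per_document.values(), query, k=2)
filtered_result = shared.query(query, k=2, where={"file_hash": "hash-b"})

# Deleting a document removes its vectors from the shared collection.
shared.delete(where={"file_hash": "hash-a"})
```

Both layouts return the same top-k hits in this toy setup. The difference is where the work happens: with per-document collections the application issues one query per collection and sorts up to x·k hits itself, while the shared collection needs a single query. Whether that is faster in a real vector database depends on its index and filter implementation, so this only illustrates the structure of the argument and is not a benchmark.
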
@open-webui open-webui locked and limited conversation to collaborators Jun 25, 2024
@tjbck tjbck converted this issue into discussion #3427 Jun 25, 2024

