Documents Indexing [ Collection already exists ] #3229

AhmadMuj · 2024-06-17T08:49:03Z

Bug Report

Description

Bug Summary:
I get an exception when trying to index (add) documents to the workspace collection.

Steps to Reproduce:
I think the main problem is that I added a document earlier and its indexing somehow did not run to completion. Now I cannot add the same document again because its collection, or at least part of it, already exists.

Expected Behavior:
Large documents (2000 pages) should be handled by some kind of indexing queue, instead of the document staying invisible until it is fully indexed (which can take up to a few hours in my case).

Actual Behavior:
The document does not appear until it is fully indexed, so I cannot tell what its current status is.

Environment

Open WebUI Version: 0.35
Operating System: Windows [ Client not host ]
Browser (if applicable): Chromium 126.0.6478.71

Reproduction Details

Confirmation:

I have read and followed all the instructions provided in the README.md.
I am on the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
Nothing is logged in the browser; the request times out because the server takes so long to respond.

Docker Container Logs:

metadata={'source': '/app/backend/data/uploads/RSS-D15', 'page': 128, 'start_index': 41})] 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772
ERROR:apps.rag.main:Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
Traceback (most recent call last):
  File "/app/backend/apps/rag/main.py", line 938, in store_docs_in_vector_db
    collection = CHROMA_CLIENT.create_collection(name=collection_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/client.py", line 198, in create_collection
    return self._server.create_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/segment.py", line 173, in create_collection
    coll, created = self._sysdb.create_collection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/db/mixins/sysdb.py", line 220, in create_collection
    raise UniqueConstraintError(f"Collection {name} already exists")
chromadb.db.base.UniqueConstraintError: Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
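
The traceback shows `create_collection` failing because a collection with the same name is left over from the earlier, interrupted run. One possible way to tolerate that is to drop the leftover and recreate it. The sketch below is only an illustration under assumptions: `InMemoryClient`, `CollectionExistsError`, and `recreate_collection` are hypothetical stand-ins written for this example, not the chromadb API and not Open WebUI's actual code.

```python
class CollectionExistsError(Exception):
    """Hypothetical stand-in for the 'already exists' error in the log."""


class InMemoryClient:
    """Hypothetical stand-in for a vector DB client (not the chromadb API)."""

    def __init__(self):
        self.collections = {}

    def create_collection(self, name):
        # Mirror the behaviour seen in the traceback: refuse duplicates.
        if name in self.collections:
            raise CollectionExistsError(f"Collection {name} already exists")
        self.collections[name] = []
        return self.collections[name]

    def delete_collection(self, name):
        self.collections.pop(name, None)


def recreate_collection(client, name):
    """Create `name`, discarding any leftover from an interrupted run."""
    try:
        return client.create_collection(name=name)
    except CollectionExistsError:
        # Assumption: a partially indexed collection is safe to throw away.
        client.delete_collection(name=name)
        return client.create_collection(name=name)
```

With this approach, re-adding the same document starts from an empty collection instead of raising. If keeping the partial data is preferable, chromadb's client also provides `get_or_create_collection`, which returns the existing collection rather than failing.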

Installation Method

Docker

Additional Information

Basically, the main problem is the assumption that document indexing completes instantly. I have documents of up to 4000 pages that I would like to index, and the only way to do that is to add some kind of queue with indexing jobs and to show the document as (under processing).
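
To make the queue idea concrete, here is a minimal sketch of a background indexing queue that tracks a per-document status. It uses only the Python standard library. `IndexingQueue`, its status strings, and the `index_fn` callback are hypothetical names chosen for this example; they are not part of Open WebUI.

```python
import queue
import threading


class IndexingQueue:
    """Run indexing jobs on one background thread and expose their status."""

    def __init__(self, index_fn):
        self._index_fn = index_fn      # hypothetical callback doing the real indexing
        self._jobs = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, doc_id):
        # Return immediately; the upload request no longer waits for indexing.
        self._set(doc_id, "queued")
        self._jobs.put(doc_id)

    def status(self, doc_id):
        with self._lock:
            return self._status.get(doc_id, "unknown")

    def wait(self):
        # Block until every submitted job has been handled.
        self._jobs.join()

    def _set(self, doc_id, value):
        with self._lock:
            self._status[doc_id] = value

    def _run(self):
        while True:
            doc_id = self._jobs.get()
            self._set(doc_id, "processing")
            try:
                self._index_fn(doc_id)
                self._set(doc_id, "done")
            except Exception:
                self._set(doc_id, "failed")
            finally:
                self._jobs.task_done()
```

A UI could poll `status(doc_id)` and show the document as "processing" until it reaches "done" or "failed", so a long-running job stays visible instead of ending in a request timeout.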


open-webui locked and limited conversation to collaborators Jun 17, 2024

tjbck converted this issue into discussion #3230 Jun 17, 2024


This issue was moved to a discussion.
