This issue was moved to a discussion.

Documents Indexing [ Collection already exists ] #3229

Closed
4 tasks done
AhmadMuj opened this issue Jun 17, 2024 · 0 comments

AhmadMuj commented Jun 17, 2024

Bug Report

Description

Bug Summary:
I get an exception when trying to index (add) documents to the workspace collection.

Steps to Reproduce:
I think the cause is that I added this document earlier and the indexing somehow did not run to completion. Now I cannot add the same document again, because the collection (or part of it) already exists.

Expected Behavior:
Large documents (around 2,000 pages) should be handled through some kind of indexing queue, instead of the document staying invisible until it is fully indexed (which can take up to a few hours in my case).

Actual Behavior:
The document does not appear until it is fully indexed, so I cannot tell what its current status is.

Environment

  • Open WebUI Version: 0.35

  • Operating System: Windows (client, not host)

  • Browser (if applicable): Chromium 126.0.6478.71

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
Nothing is logged in the browser console. The request times out because the server takes too long to respond.

Docker Container Logs:

metadata={'source': '/app/backend/data/uploads/RSS-D15', 'page': 128, 'start_index': 41})] 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772
ERROR:apps.rag.main:Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
Traceback (most recent call last):
  File "/app/backend/apps/rag/main.py", line 938, in store_docs_in_vector_db
    collection = CHROMA_CLIENT.create_collection(name=collection_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/client.py", line 198, in create_collection
    return self._server.create_collection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/api/segment.py", line 173, in create_collection
    coll, created = self._sysdb.create_collection(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/telemetry/opentelemetry/__init__.py", line 143, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chromadb/db/mixins/sysdb.py", line 220, in create_collection
    raise UniqueConstraintError(f"Collection {name} already exists")
chromadb.db.base.UniqueConstraintError: Collection 2cc3c148b3126225d91ab1a587ffb8aaf528aad7bc4def43f5c43c394e6f772 already exists
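
The traceback shows `create_collection` being called for a name that is left over from the earlier, unfinished run. One possible workaround, as a minimal sketch only: drop the leftover collection before creating it again. This assumes the client offers `list_collections`, `delete_collection` and `create_collection`, as the Chroma client does. `reset_collection`, `FakeClient` and `FakeCollection` are hypothetical names; the fake classes are an in-memory stand-in so the sketch runs without a database, and this is not how Open WebUI itself handles the case.

```python
class FakeCollection:
    """Hypothetical stand-in for a vector DB collection."""

    def __init__(self, name):
        self.name = name
        self.documents = []


class FakeClient:
    """Hypothetical in-memory stand-in for the vector DB client.

    It mirrors only the three calls used below and rejects duplicate
    names, similar to the error in the traceback above.
    """

    def __init__(self):
        self._collections = {}

    def list_collections(self):
        return list(self._collections.values())

    def create_collection(self, name):
        if name in self._collections:
            raise ValueError(f"Collection {name} already exists")
        self._collections[name] = FakeCollection(name)
        return self._collections[name]

    def delete_collection(self, name):
        del self._collections[name]


def reset_collection(client, name):
    """Drop a leftover collection (if any), then create a fresh one."""
    # Some client versions return collection objects, others plain names,
    # so accept both when collecting the existing names.
    existing = {
        c if isinstance(c, str) else c.name
        for c in client.list_collections()
    }
    if name in existing:
        client.delete_collection(name=name)
    return client.create_collection(name=name)
```

Deleting first discards any chunks stored by the interrupted run, so the document is indexed from scratch. Reusing the existing collection would be the alternative, at the risk of keeping partial or duplicated chunks.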

Installation Method

Docker

Additional Information

The underlying problem is the assumption that document indexing completes instantly. I have documents of up to 4,000 pages that I would like to index, and the only workable way I can see is a queue of indexing jobs, with the document shown as "under processing" in the meantime.
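
To illustrate the kind of queue I mean, here is a rough sketch using only the Python standard library. Every name in it (`IndexQueue`, `IndexJob`, `Status`, the `index_page` callback) is hypothetical and not part of Open WebUI. A single background worker processes jobs one at a time, and each job keeps a status and a page counter that a UI could poll.

```python
import queue
import threading
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    DONE = "done"
    FAILED = "failed"


@dataclass
class IndexJob:
    doc_id: str
    pages: list
    status: Status = Status.QUEUED
    pages_done: int = 0
    error: str = ""


class IndexQueue:
    """Runs indexing jobs in a background thread and tracks their status."""

    def __init__(self, index_page):
        # index_page(doc_id, page) is supplied by the caller and does the
        # actual embedding/storing work for one page.
        self._index_page = index_page
        self._jobs = {}
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, doc_id, pages):
        """Register a job and return immediately; work happens later."""
        job = IndexJob(doc_id, list(pages))
        with self._lock:
            self._jobs[doc_id] = job
        self._queue.put(job)
        return job

    def status(self, doc_id):
        """Return (status, pages_done, total_pages) for a document."""
        with self._lock:
            job = self._jobs[doc_id]
            return job.status, job.pages_done, len(job.pages)

    def wait(self):
        """Block until every submitted job has finished or failed."""
        self._queue.join()

    def _update(self, job, **changes):
        with self._lock:
            for key, value in changes.items():
                setattr(job, key, value)

    def _run(self):
        while True:
            job = self._queue.get()
            try:
                self._update(job, status=Status.PROCESSING)
                for page in job.pages:
                    self._index_page(job.doc_id, page)
                    self._update(job, pages_done=job.pages_done + 1)
                self._update(job, status=Status.DONE)
            except Exception as exc:
                # Record the failure instead of leaving the job in limbo.
                self._update(job, status=Status.FAILED, error=str(exc))
            finally:
                self._queue.task_done()
```

With something like this, the upload request could return as soon as `submit` is called, and the document list could show "under processing" with progress taken from `status`, rather than the request timing out.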

@open-webui open-webui locked and limited conversation to collaborators Jun 17, 2024
@tjbck tjbck converted this issue into discussion #3230 Jun 17, 2024
