Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Cascading Bugs in Huggingface and Ollama Embedder #1574

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

kmitul
Copy link
Contributor

@kmitul kmitul commented Jul 24, 2024

Description

Note

This pull request builds upon the previous PR #1548 , which addressed Bug 1. During further testing, additional bugs were discovered and have been documented and resolved in this PR.

Summary

This pull request addresses a series of bugs in the embeddings especially while using 'huggingface' as embedder. Each fix revealed the next issue, which has been documented and resolved in sequence. I already made PR for Bug1 beforehand itself but discovered series of bugs later on as mentioned below.

Details

  1. Bug 1: ModuleNotFoundError: No module named 'embedding'

    • Files: mem0/embeddings/huggingface.py and mem0/embeddings/ollama.py
    • Description: Discoverd the incorrect import of the EmbeddingBase from non-existing directory.
    • Fix: Fixed the import of the EmbeddingBase class with correct parent directory.
  2. Bug 2: Value error, Unsupported embedding provider: huggingface

    • File: mem0/embeddings/configs.py
    • Description: After fixing the Bug 1, config was throwing error for hugginggface as provider in embedder config. 'huggingface' was not included in the validator in EmbedderConfig.
    • Fix: Added huggingface to the provider checklist in EmbedderConfig.
  3. Bug 3: TypeError: Can't instantiate abstract class HuggingFaceEmbedding with abstract method embed

    • File: mem0/embeddings/huggingface.py
    • Description: After fixing the Bug 2, HuggingFaceEmbedding was not being instantiated. The abstract method embed was missing.
    • Fix: Renamed the get_embedding method to embed in HuggingFaceEmbedding
  4. Bug 4: AttributeError: 'HuggingFaceEmbedding' object has no attribute 'dims'

    • File: mem0/embeddings/huggingface.py
    • Description: After fixing the Bug 3, HuggingFaceEmbedding was not being instantiated. The attribute dims was missing.
    • Fix: Added dims attribute to HuggingFaceEmbedding using get_sentence_embedding_dimension method of SentenceTransformer

Impact

These fixes, by addressing multiple cascading issues, improve the reliability and performance of mem0 while using the custom embedder config (like huggingface, ollama), preventing crashes and unexpected results.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Please delete options that are not relevant.

  • Test Script (please provide)

Test Script

This code initializes the Memory class with the specified configuration, ensuring that the embedding provider 'huggingface' and the model 'multi-qa-MiniLM-L6-cos-v1' are correctly set up and functional.

from mem0 import Memory

config = {
    "embedder": {
        "provider": "huggingface",
        "config": {
            "model": "multi-qa-MiniLM-L6-cos-v1"
        }
    }
}

m = Memory.from_config(config)

Steps to Reproduce

  1. Bug 1: Try using "huggingface" or "ollama" to the embedding provider in config.
  2. Bug 2: After fixing Bug 1, try to instantiate Memory class using from_config method.
  3. Bug 3: After fixing Bug 2, try to instantiate Memory class using from_config method.
  4. Bug 4: After fixing Bug 3, try to instantiate Memory class using from_config method.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Made sure Checks passed

Added 'huggingface' to the provider checklist in validate_config method of EmbedderConfig class
Renamed get_embedding method to embed
Added `dims` attribute to `HuggingFaceEmbedding` using `get_sentence_embedding_dimension` method of `SentenceTransformer`
@CLAassistant
Copy link

CLAassistant commented Jul 26, 2024

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants