
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.


jetnew/DrFAQ


DrFAQ

  • DrFAQ is a plug-and-play question-answering chatbot that can be applied to any organisation's text corpora.
  • Its NLP question-answering architecture is built with spaCy, Hugging Face's BERT language model, ElasticSearch, and the Telegram Bot API, and is hosted on Heroku.

News

  • 4 Mar 2021 - Transfer learning of language models, alongside an evaluation study, is currently in progress.
  • 13 Dec 2019 - Implementation of the 4-step question-answering methodology completed.

Objective

  • Given an organisation's corpus of documents, generate a chatbot to enable natural question-answering capabilities.

Methodology

When a question is asked, the following processes are performed:

  1. FAQ Question Matching using spaCy's Similarity - /match
    • The chatbot compares the question asked against a given list of Frequently Asked Questions (FAQs) and, if a similar FAQ exists, returns the best-matching answer from that list.
  2. NLP Question Answering using huggingface's BERT - /nlp
    • If the question asked is not similar to any existing FAQ, perform question answering on the knowledge base and return the answer if the model is sufficiently confident in it.
  3. Answer Search using ElasticSearch - /search
    • If the model is not sufficiently confident in its answer, search the document corpus and return the search results.
  4. Human Intervention
    • If the search results are still not relevant, prompt a human to add the question-answer pair to the existing list of FAQs, or offer the option to speak to a human.
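
The four steps above form a fallback cascade, which can be sketched roughly as follows. This is a minimal illustration, not DrFAQ's actual code: `Reply`, `answer`, the `toy_*` helpers, and both threshold values are hypothetical names and numbers chosen for this example. The toy helpers use only the standard library so the sketch runs on its own; in a real setup they would presumably be replaced by spaCy similarity, a BERT question-answering model, and an ElasticSearch query.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Reply:
    step: str  # stage that produced the reply: "match", "nlp", "search" or "human"
    text: str


def answer(
    question: str,
    match_faq: Callable[[str], Tuple[Optional[str], float]],
    read_answer: Callable[[str], Tuple[Optional[str], float]],
    search_docs: Callable[[str], List[str]],
    match_threshold: float = 0.8,  # assumed value, tune per corpus
    nlp_threshold: float = 0.5,    # assumed value, tune per corpus
) -> Reply:
    # Step 1: reuse an existing FAQ answer if the question is similar enough.
    faq_answer, similarity = match_faq(question)
    if faq_answer is not None and similarity >= match_threshold:
        return Reply("match", faq_answer)

    # Step 2: otherwise ask the QA model and keep only confident answers.
    span, confidence = read_answer(question)
    if span is not None and confidence >= nlp_threshold:
        return Reply("nlp", span)

    # Step 3: otherwise fall back to plain document search.
    hits = search_docs(question)
    if hits:
        return Reply("search", "\n".join(hits))

    # Step 4: nothing relevant was found, so hand over to a human.
    return Reply("human", "No answer found. Please add this FAQ or contact a human.")


# --- Toy stand-ins (hypothetical), so the sketch runs without extra packages ---
FAQS = {"what are your opening hours": "We are open 9am to 5pm."}
DOCS = ["Refunds are processed within 14 days."]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_match(question: str) -> Tuple[Optional[str], float]:
    # Jaccard token overlap as a crude stand-in for a similarity score.
    q = tokens(question)
    best_answer, best_score = None, 0.0
    for faq_question, faq_answer in FAQS.items():
        f = tokens(faq_question)
        score = len(q & f) / len(q | f)
        if score > best_score:
            best_answer, best_score = faq_answer, score
    return best_answer, best_score


def toy_read(question: str) -> Tuple[Optional[str], float]:
    # Stand-in for a QA model that is never confident.
    return "unsure", 0.1


def toy_search(question: str) -> List[str]:
    # Keyword search: return documents containing any longer question word.
    words = [w for w in tokens(question) if len(w) > 3]
    return [d for d in DOCS if any(w in d.lower() for w in words)]


if __name__ == "__main__":
    for q in ["What are your opening hours?", "How long do refunds take?", "Zzz?"]:
        print(q, "->", answer(q, toy_match, toy_read, toy_search).step)
```

Passing the three stages in as callables keeps the cascade itself independent of any particular library, so each stage can be swapped or tested separately.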

Research

  • A benchmark study of transfer learning with language models shows that:
    • If a large and clean QA dataset is available, RoBERTa is the best language model.
    • If only a small and unclean generated QA dataset is available, MobileBERT is the best language model.
    • If the QA dataset contains many 'Who' questions, RoBERTa should be considered.

Future Work

  • Release DrFAQ as a pip package.
  • Make an interactive demo available.
  • Integrate abstractive question-answering into the methodology.
  • Leverage databases and cloud services.
