This repo contains a demo Streamlit application for parallel question answering across multiple PDF documents. This functionality is built on LlamaIndex and Large Language Models (LLMs). The code is tested with text-davinci-003 from OpenAI using the configuration in app_config.yaml; any OpenAI model can be set in app_config.yaml.
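The parallel fan-out can be sketched as follows. This is a minimal illustration, not the app's actual code: `answer_for_document` is a hypothetical stub standing in for an LLM-backed LlamaIndex query engine, and the same question is posed to every document concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_for_document(doc_name: str, question: str) -> str:
    # Stub standing in for a per-PDF LlamaIndex query engine call.
    return f"[{doc_name}] answer to: {question}"

def ask_all(doc_names: list[str], question: str) -> dict[str, str]:
    """Pose the same question to every document index in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {name: pool.submit(answer_for_document, name, question)
                   for name in doc_names}
        return {name: fut.result() for name, fut in futures.items()}
```

In the real app each worker would query a separate per-PDF index instead of the stub.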
This repo provides some example PDF documents for indexing which have been generated via ChatGPT. !!! Please note that it is not legally compliant to send personally identifiable information to LLM APIs like OpenAI's. Make sure to test the app only with fictional CV documents, anonymize the CV documents, or execute queries against a locally deployed LLM instead of using OpenAI. PTC takes no legal responsibility for the data you send to OpenAI via this application !!!
- Download the ESCO dataset version 1.1.0 (Link to ESCO Download):
  - Version: ESCO dataset - v1.1.0
  - Content: classification
  - Language: en
  - File type: csv
1.1. Unzip the .csv file you receive via email and set the path as an environment variable
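Once the path is set, the search-term CSV can be read with the standard library. A minimal sketch, assuming a plain comma-separated file with a header row; the actual column names are not assumed here and are taken from the header as-is:

```python
import csv

def load_searchterms(path: str) -> list[dict[str, str]]:
    """Read the ESCO search-term CSV into a list of row dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```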
- Set up your environment variables, e.g. in an .env file:
OPENAI_API_KEY="here comes your openai api key" (example)
ESCO_NER_SEARCHTERMS="your_path_to_esco_searchterms_skill_ner_csv/ESCO dataset - v1.1.0 - classification - en - csv/searchterms_skill_ner.csv" (example)
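Reading these variables in Python can look like the sketch below. It assumes the variables are exported in the shell or loaded from the .env file beforehand (e.g. via python-dotenv's `load_dotenv()`, not shown here); `read_required_env` is a hypothetical helper, not part of the repo.

```python
import os

def read_required_env(name: str) -> str:
    """Return an environment variable's value or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Usage in the app would look like:
# openai_key = read_required_env("OPENAI_API_KEY")
# esco_path = read_required_env("ESCO_NER_SEARCHTERMS")
```

Failing fast on a missing variable surfaces configuration mistakes at startup instead of mid-query.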
- Install Poetry (Link to Poetry CLI installation tutorial)
- Create the Poetry environment and install the package (Link to Poetry Environment Management)
- In your terminal, confirm that Poetry is available:
poetry --version
- Start a Poetry shell:
poetry shell
- Install the package and dependencies via the pyproject.toml:
poetry install
If you cannot use Poetry for your dependency management, you can alternatively install the requirements via:
pip install -r requirements.txt
- Launch the Streamlit app from the Poetry shell:
streamlit run <your_absolute_path_2_the_app>\multi_index_demo\app.py