nl2query

Convert natural language text inputs to Pandas, MongoDB, Kusto, and Cypher(Neo4j) queries. The models used are fine-tuned versions of CodeT5 220m and Phi2 model.

Getting started

You can get nl2query from PyPI, using

python -m pip install nl2query

Example usage

1. Pandas Query

Suppose you want to convert the textual question to pandas query, follow the code below

from nl2query import PandasQuery

titanic = pd.read_csv('/path/titanic.csv')
queryfier = PandasQuery(titanic, 'titanic')

queryfier.generate_query('''list all people who paid more fare than the fare paid by 'Braund, Mr. Owen Harris' ''')
queryfier.generate_query('''find the names of passengers with age greater than 35 and containing Heath in their name''')
queryfier.generate_query('''which cabinet has average age less than 21?''') #Groupby Query

2. MongoDB Query

Suppose you want to convert the textual question to Mongo query, follow the instruction code below

MongoDB query using CodeT5

The generate_query method takes a textual query and returns a MongoDB query. It also accepts optional parameters to control the generation process, such as num_beams, max_length, repetition_penalty, length_penalty, early_stopping, top_p, top_k, and num_return_sequences.

NOTE: GPU will be required to run Phi2 as quantization is enabled using load_in_4bit.

from nl2query import MongoQuery
import pymongo # import if performing analysis using python client
keys = ['_id', 'index', 'passengerid', 'survived', 'Pclass', 'name', 'sex', 'age', 'sibsp', 'parch', 'ticket', 'fare', 'cabin', 'embarked'] #keys present in the collection to be queried.
queryfier = MongoQuery('T5', collection_keys = keys, collection_name = 'titanic')
queryfier.generate_query('''which pclass has the minimum average fare?''')

keys = ['_id', 'index', 'total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size'] 
queryfier = MongoQuery('T5', collection_keys = keys, collection_name = 'titanic')
queryfier.generate_query('''find the day on which combined sales was highest''')

In the above code the keys can be found by running the following piece db.tips.find_one({}).keys()

MongoDB query using Phi2

The generate_query method takes a database schema and a textual query and returns a MongoDB query. It also accepts optional parameters to control the generation process, such as max_length, no_repeat_ngram_size, and repetition_penalty. The Phi2 model performs better than the CodeT5 model.

from nl2query import MongoQuery
schema = shipwreck = '''{
  "collections": [
    {
      "name": "shipwrecks",
      "indexes": [
        {
          "key": {
            "_id": 1
          }
        },
        {
          "key": {
            "feature_type": 1
          }
        },
        {
          "key": {
            "chart": 1
          }
        },
        {
          "key": {
            "latdec": 1,
            "londec": 1
          }
        }
      ],
      "uniqueIndexes": [],
      "document": {
        "properties": {
          "_id": {
            "bsonType": "string"
          },
          "recrd": {
            "bsonType": "string"
          },
          "vesslterms": {
            "bsonType": "string"
          },
          "feature_type": {
            "bsonType": "string"
          },
          "chart": {
            "bsonType": "string"
          },
          "latdec": {
            "bsonType": "double"
          },
          "londec": {
            "bsonType": "double"
          },
          "gp_quality": {
            "bsonType": "string"
          },
          "depth": {
            "bsonType": "string"
          },
          "sounding_type": {
            "bsonType": "string"
          },
          "history": {
            "bsonType": "string"
          },
          "quasou": {
            "bsonType": "string"
          },
          "watlev": {
            "bsonType": "string"
          },
          "coordinates": {
            "bsonType": "array",
            "items": {
              "bsonType": "double"
            }
          }
        }
      }
    }
  ],
  "version": 1
}'''

queryfier = MongoQuery('Phi2')
text = 'Find the count of shipwrecks for each unique combination of "latdec" and "longdec"'
queryfier.generate_query(schema, text, max_length = 1024)

text = 'Find the total count of shipwreck for each unique category of chart'
queryfier.generate_query(schema, text, max_length = 1024)

3. Kusto Query

Suppose you want to convert the textual question to Kusto query, follow the code below

from nl2query import KustoQuery
cols = ['conference', 'sessionid', 'session_title', 'session_type', 'owner', 'participants', 'URL', 'level', 'session_location', 'starttime', 'duration', 'time_and_duration', 'kusto_affinity']

queryfier = KustoQuery(cols, 'ConferenceSessions')
queryfier.generate_query('''find the session ids which have duration greater than 10 and having Manoj Raheja as the owner''')

4. Cypher(Neo4j) Query

Suppose you want to convert the textual question to Cypher query, follow the code below

from nl2query import CypherQuery

node_labels = {'User': ['display_name', 'uuid'], 'Comment': ['score', 'link', 'uuid']}
relationships = ['COMMENTED']
queryfier = CypherQuery(node_labels, relationships)
queryfier.generate_query('list the links of all the comments done by "jose_bacoy"')

node_labels = {'Case': ['gender', 'reportdate', 'ageunit', 'reporteroccupation', 'primaryid', 'age', 'eventDate'], 'Outcome': ['code', 'outcome']}
relationships = ['RESULTED_IN']
queryfier = CypherQuery(node_labels, relationships)
queryfier.generate_query('find the outcomes of people who are female and below the age of 32')

node_labels = {'Person': ['id', 'name', 'dob']}
relationships = []
queryfier = CypherQuery(node_labels, relationships)
res = queryfier.generate_query('find the dob of people who have "Andreia" in their name')

Changelog

Refer to the CHANGELOG.md file.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.github/workflows		.github/workflows
.tox		.tox
images		images
nl2query		nl2query
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.rst		CHANGELOG.rst
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
tox.ini		tox.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nl2query

Getting started

Example usage

1. Pandas Query

2. MongoDB Query

MongoDB query using CodeT5

MongoDB query using Phi2

3. Kusto Query

4. Cypher(Neo4j) Query

Changelog

About

Releases 5

Packages

Languages

License

Chirayu-Tripathi/nl2query

Folders and files

Latest commit

History

Repository files navigation

nl2query

Getting started

Example usage

1. Pandas Query

2. MongoDB Query

MongoDB query using CodeT5

MongoDB query using Phi2

3. Kusto Query

4. Cypher(Neo4j) Query

Changelog

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Packages 0

Languages

Packages