Dual-specification query synthesis with natural language and table sketch queries.
Dependencies:
- Docker 19.03
- NVIDIA GPU
- NVIDIA Docker
TODO: Simplify this process with Docker Compose. This cannot be done until the issue with GPUs is resolved.
- Set the `DQ_VERSION` variable to the most recent version.
export DQ_VERSION=0.1
- Start the Docker network (`dq-net`).
docker network create dq-net
- Run the data container (`dq-data`).
docker run --rm -dit --name dq-data -v dq-vol:/home/data chrisjbaik/duoquest-data:$DQ_VERSION
- Run the Enumerator (`dq-enum`) using one of the following commands.
docker run --rm --gpus all -dit --name dq-enum --network dq-net -v dq-vol:/workspace/data chrisjbaik/duoquest-enum:$DQ_VERSION
# Add `--toy` for fast-starting debugging mode with decreased performance
docker run --rm --gpus all -dit --name dq-enum --network dq-net -v dq-vol:/workspace/data chrisjbaik/duoquest-enum:$DQ_VERSION --toy
- Run the task database container (`dq-task-db`).
docker run --rm -dit --name dq-task-db --network dq-net chrisjbaik/duoquest-task-db:$DQ_VERSION
- Run the autocomplete container (`dq-autocomplete`).
docker run --rm -dit --name dq-autocomplete --network dq-net redis
- Run the web interface container (`dq-web`). Note the `-p` option, where the first port number is the port on the host machine at which the web interface will be served. Also note the `WORKERS_PER_CORE` option, which determines how many workers will run for the web server (we set it to `0.1` because we have a 32-core server).
docker run --rm -dit -p 5000:80 -e WORKERS_PER_CORE="0.1" --name dq-web --network dq-net -v dq-vol:/home/data chrisjbaik/duoquest-web:$DQ_VERSION
- Run the main container (`dq-main`). The `--timeout` flag indicates how many seconds each task will run before giving up.
docker run --rm -dit --name dq-main --network dq-net -v dq-vol:/home/data chrisjbaik/duoquest-main:$DQ_VERSION --timeout=60
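
As a rough illustration of the `WORKERS_PER_CORE` setting in the web interface step, the sketch below estimates the resulting worker count. The formula is an assumption, not something this README confirms: it follows a convention used by some Gunicorn-based images, where the worker count is the core count times `WORKERS_PER_CORE`, truncated to an integer, with a floor of 2. The function name `estimated_workers` and the `minimum` parameter are hypothetical.

```python
def estimated_workers(workers_per_core: float, cores: int, minimum: int = 2) -> int:
    """Estimate the web server worker count.

    ASSUMPTION: workers = max(int(workers_per_core * cores), minimum).
    The actual rule depends on the base image used by dq-web.
    """
    return max(int(workers_per_core * cores), minimum)


if __name__ == "__main__":
    # The setting from the command above on a 32-core server.
    print(estimated_workers(0.1, 32))  # 3 under the assumed formula
```

Under this assumption, a value of `1.0` on the same server would start 32 workers, which is why a small fraction is used on machines with many cores.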
Follow the instructions in steps 1-7 under Quickstart above. Instead of starting the `dq-main` container with the default entrypoint, we run simulation experiments using the following command:
docker run --rm -dit --name dq-main --network dq-net -v dq-vol:/home/data --entrypoint="python" chrisjbaik/duoquest-main experiments.py spider dev default
The last 3 arguments indicate the dataset, the subset of the dataset, and the type of evaluation (`default`, `partial`, `minimal`, `nlq_only`, `tsq_only`, `chain`), respectively.
If a `dq-main` container is already running, stop it with `docker stop` (and remove it with `docker rm` if needed) so that this container can start.
To view the experiment progress in real time, use the following:
docker logs -f dq-main
The results will automatically be saved in a `results` folder within the shared volume `dq-vol`.
The following container can be executed to generate a viewable result summary/analysis after running a simulation experiment:
docker run --rm -it --name dq-eval --network dq-net -v dq-vol:/home/data --entrypoint="python" chrisjbaik/duoquest-main eval.py spider dev default
Note that all arguments (including any additional arguments not mentioned above, such as `--timeout`) must exactly match the arguments used when running the `dq-main` container for the experiment!
The task database has the following schema:
Tasks
Column Name | Type | Description |
---|---|---|
tid | text | task id |
db | text | database name |
nlq | text | natural language query |
nlq_with_literals | text | raw NLQ including tag markup |
tsq_proto | blob | table sketch query protobuf |
literals_proto | blob | literals protobuf |
status | text | `waiting`, `running`, `done`, or `error` |
time | integer | timestamp for task submission time |
error_msg | text | error message, if any |
Databases
Column Name | Type | Description |
---|---|---|
name | text | database name |
path | text | database path in file system |
schema_proto | blob | schema protobuf |
Results
Column Name | Type | Description |
---|---|---|
rid | integer | primary key/unique id for result |
tid | text | foreign key to task id |
query | text | candidate SQL query |
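
For orientation, the tables above can be sketched as SQL DDL. This is an illustrative reconstruction only: it assumes an SQLite-style database (suggested by the `text`, `integer`, and `blob` types), lowercase table names, and a primary key on `tid` and `name`. The `CHECK` constraint and the helper name `create_task_db` are likewise hypothetical, and the actual task database may define things differently.

```python
import sqlite3

# ASSUMPTION: SQLite dialect; the key and CHECK constraints are illustrative.
SCHEMA = """
CREATE TABLE tasks (
    tid TEXT PRIMARY KEY,          -- task id
    db TEXT,                       -- database name
    nlq TEXT,                      -- natural language query
    nlq_with_literals TEXT,        -- raw NLQ including tag markup
    tsq_proto BLOB,                -- table sketch query protobuf
    literals_proto BLOB,           -- literals protobuf
    status TEXT CHECK (status IN ('waiting', 'running', 'done', 'error')),
    time INTEGER,                  -- timestamp for task submission time
    error_msg TEXT                 -- error message, if any
);
CREATE TABLE databases (
    name TEXT PRIMARY KEY,         -- database name
    path TEXT,                     -- database path in file system
    schema_proto BLOB              -- schema protobuf
);
CREATE TABLE results (
    rid INTEGER PRIMARY KEY,       -- unique id for result
    tid TEXT REFERENCES tasks(tid),
    query TEXT                     -- candidate SQL query
);
"""


def create_task_db(path=":memory:"):
    """Create the three tables in a new SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

With such a layout, the candidate queries for a task could be retrieved with `SELECT query FROM results WHERE tid = ?`.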
- Save the version number as a variable.
export DQ_VERSION=<version_number_here>
- Download the Spider dataset and `mas_smallest.sqlite` into the `data/` folder.
- Build the data container.
docker build -t chrisjbaik/duoquest-data:$DQ_VERSION data/
- Load/build Enumerator image.
cd enum/syntaxSQL
git submodule init # only if submodule not initialized yet
git submodule update # only if submodule not initialized yet
docker build -t chrisjbaik/duoquest-enum:$DQ_VERSION .
cd ../../
- Build task database image.
docker build -t chrisjbaik/duoquest-task-db:$DQ_VERSION -f task_db/Dockerfile .
- Build web interface image.
docker build -t chrisjbaik/duoquest-web:$DQ_VERSION -f web/Dockerfile .
- Build main image.
docker build -t chrisjbaik/duoquest-main:$DQ_VERSION .
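
The build steps above can also be collected into a small script so that every image is tagged with the same version. The sketch below only assembles the commands listed in this section. The function name `build_commands` is hypothetical, and the Enumerator submodule is assumed to be initialized already.

```python
import shlex

REPO = "chrisjbaik"

# (image suffix, working directory, trailing `docker build` arguments)
IMAGES = [
    ("data", ".", ["data/"]),
    ("enum", "enum/syntaxSQL", ["."]),
    ("task-db", ".", ["-f", "task_db/Dockerfile", "."]),
    ("web", ".", ["-f", "web/Dockerfile", "."]),
    ("main", ".", ["."]),
]


def build_commands(version):
    """Return a list of (working directory, argv) pairs, one per image."""
    return [
        (cwd, ["docker", "build", "-t", f"{REPO}/duoquest-{name}:{version}", *args])
        for name, cwd, args in IMAGES
    ]


if __name__ == "__main__":
    for cwd, argv in build_commands("0.1"):
        print(f"(cd {cwd} && {shlex.join(argv)})")
```

Each pair could be executed with `subprocess.run(argv, cwd=cwd, check=True)` from the repository root.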