Docker container image built with Jupyter Notebook and Tabula for PDF scraping. Includes an example notebook to help you get started.
Repository hosted on GitHub at aeksco/jupyter-tabula.
Docker container image hosted on Docker Hub at aeksco/jupyter-tabula.
To start the container and publish Jupyter on port 8888 of your machine, run:
docker run -it -p 8888:8888 aeksco/jupyter-tabula
The Example_01.ipynb notebook opens a single-page PDF and parses the table contained within.
Use the following instructions if you want to modify this Docker image and push your own copy to Docker Hub.
- Create a new Docker Hub repository named jupyter-tabula.
- Build the Docker image by running the following command in the jupyter-tabula directory:
docker build -t jupyter-tabula .
- Test the image locally with the following command:
docker run -it --rm --pid=host -p 8888:8888 jupyter-tabula
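- Tag the image and push it to Docker Hub, replacing bb38976d03cf with the image ID that docker images reports for your build and your_docker_hub_username with your Docker Hub username: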
docker tag bb38976d03cf your_docker_hub_username/jupyter-tabula:latest
docker push your_docker_hub_username/jupyter-tabula
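The build, tag, and push steps above can also be sketched as a pair of shell functions. This is a minimal sketch, not part of the repository: the function names image_ref and publish_image are hypothetical, and it assumes you have already run docker login. It tags the image by the name given to docker build rather than by image ID, so there is no ID to look up.

```shell
#!/bin/sh
# Sketch only: image_ref and publish_image are hypothetical helper names,
# not something shipped with the jupyter-tabula repository.

# Compose the full Docker Hub reference, e.g. alice/jupyter-tabula:latest.
# $1 = Docker Hub username, $2 = repository name (default: jupyter-tabula),
# $3 = tag (default: latest).
image_ref() {
    user="$1"
    name="${2:-jupyter-tabula}"
    tag="${3:-latest}"
    printf '%s/%s:%s\n' "$user" "$name" "$tag"
}

# Build the image in the current directory, tag it by name, and push it.
# Each step runs only if the previous one succeeded.
publish_image() {
    ref="$(image_ref "$1")"
    docker build -t jupyter-tabula . &&
    docker tag jupyter-tabula "$ref" &&
    docker push "$ref"
}
```

After sourcing the file from the jupyter-tabula directory, publish_image your_docker_hub_username would run all three steps in order.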
I would recommend this container image for testing and hacking purposes. The image is a bit heavy (2.02 GB) since it extends the tensorflow/tensorflow container image. This was done mainly to save time, since the TensorFlow container already has all the relevant dependencies installed and configured.