
RGDiffSR

The official pytorch implementation of Paper: RECOGNITION-GUIDED DIFFUSION MODEL FOR SCENE TEXT IMAGE SUPER-RESOLUTION

paper

Installation

Environment preparation: Python 3.8, PyTorch 1.7.0, Torchvision 0.8.1, pytorch_lightning 1.5.10, CUDA 11.0.

conda create -n RGDiffSR python=3.8
conda activate RGDiffSR
git clone git@github.com:shercoo/RGDiffSR.git
cd RGDiffSR
pip install -r requirements.txt

You can also refer to taming-transformers for installing the taming-transformers library (needed if VQGAN is used).
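A typical way to make the taming-transformers package importable is an editable install from the upstream CompVis repository (a sketch; the clone location is illustrative and network access is assumed):

```shell
# Clone the upstream taming-transformers repository (illustrative path)
git clone https://github.com/CompVis/taming-transformers.git
cd taming-transformers
# Editable install so the package resolves from this environment
pip install -e .
cd ..
```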

Dataset preparation

Download the TextZoom dataset at TextZoom.

Model checkpoints

Download the pre-trained text recognizers: Aster, Moran, and CRNN.

Download the checkpoints of pre-trained VQGAN and RGDiffSR at Baidu Netdisk. Password: yws3
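The training and testing commands expect the downloaded checkpoints under checkpoints/. The exact filenames depend on the downloaded archives; a plausible layout (filenames are hypothetical) is:

```
checkpoints/
├── vqgan.ckpt       # pre-trained latent encoder (hypothetical filename)
└── rgdiffsr.ckpt    # pre-trained RGDiffSR model (hypothetical filename)
```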

Training

First, train the latent encoder (VQGAN) model.

CUDA_VISIBLE_DEVICES=<GPU_IDS> python main.py -b configs/autoencoder/vqgan_2x.yaml -t --gpus <GPU_IDS>

Then put the pre-trained VQGAN model in checkpoints/ and train the diffusion model.

CUDA_VISIBLE_DEVICES=<GPU_IDS> python main.py -b configs/latent-diffusion/sr_best.yaml -t --gpus <GPU_IDS>

Testing

Put the pre-trained RGDiffSR model in checkpoints/.

CUDA_VISIBLE_DEVICES=<GPU_IDS> python test.py -b configs/latent-diffusion/sr_test.yaml --gpus <GPU_IDS>

You can manually modify the test dataset directory in sr_test.yaml to evaluate on the different difficulty splits (easy, medium, hard) of the TextZoom test set.
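Alongside recognition accuracy from the pre-trained recognizers, super-resolution results are commonly reported with fidelity metrics such as PSNR. The repository's evaluation handles this itself; purely for reference, a minimal PSNR computation (plain Python, illustrative only) looks like:

```python
import math

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as flat sequences of pixel intensities in [0, max_val]."""
    assert len(sr) == len(hr)
    mse = sum((a - b) ** 2 for a, b in zip(sr, hr)) / len(sr)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: a constant error of 10 intensity levels per pixel
print(psnr([0, 0, 0], [10, 10, 10]))  # ≈ 28.13 dB
```

Higher PSNR means the super-resolved image is closer to the ground-truth HR image; for scene text, recognition accuracy on the restored image is usually the more telling metric.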

License

The model is licensed under the MIT license.

Acknowledgement

Our code is built on the latent-diffusion and TATT repositories. Thanks for their great work!
