Update (15th January 2022): Paths to download data files have been updated.
Update (27th August 2020) :
A bug related to variable image size is fixed. You can now train with variable image sizes. This will improve generations significantly.
Training is now significantly faster. Pull all changes and train as usual.
Update (26th July 2020) :
- Pre-trained weights have been uploaded. Please refer to the Pre-trained weights section for usage.
- The latest commit makes a few modifications to the model. Pull all changes before using the pre-trained weights.
This repository presents SRNet (Liang Wu et al.), a neural network that tackles the problem of text editing in images. It marks the inception of an area of research that could automate advanced editing mechanisms in the future.
SRNet is a twin-discriminator generative adversarial network that can edit text in any image while maintaining the context of the background, font style and color. The demo below showcases one such use case: movie poster editing.
L - Source ; R - Modified
This implementation of SRNet introduces two main changes.
- Training: The original SRNet suffers from instability. The generator loss belies the instability that occurs during training. This imbalance affects skeleton (t_sk) generation the most. The effect shows up when the generator produces a sequence of bad t_sk generations: instead of bouncing back, it grows worse and finally leads to mode collapse. The culprit here is the min-max loss. A textbook method to solve this problem is to let the discriminator always stay ahead of the generator, and this implementation does the same.
- Generator: To accommodate a design constraint in the original net, I have added three extra convolution layers in the decoder_net.
Incorporating these changes improved t_sk generations dramatically and increased stability. However, this also increased training time by ~15%.
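One common way to keep the discriminator ahead is to update it more often than the generator. The sketch below illustrates that idea only. It is not taken from this repository's train.py, and the helper `update_schedule` and its parameter `d_steps_per_g_step` are hypothetical names chosen for illustration.

```python
from typing import Iterator, Tuple


def update_schedule(total_steps: int, d_steps_per_g_step: int = 2) -> Iterator[Tuple[int, str]]:
    """Yield (step, network) pairs, where network is "D" or "G".

    Hypothetical helper (not part of this repository): the discriminator "D"
    is updated on every step, while the generator "G" is updated only on
    every d_steps_per_g_step-th step, so the discriminator trains more often.
    """
    if d_steps_per_g_step < 1:
        raise ValueError("d_steps_per_g_step must be >= 1")
    for step in range(1, total_steps + 1):
        yield step, "D"
        if step % d_steps_per_g_step == 0:
            yield step, "G"


if __name__ == "__main__":
    # In a real training loop, each "D" or "G" entry would trigger the
    # corresponding optimizer step.
    for step, network in update_schedule(total_steps=4, d_steps_per_g_step=2):
        print(step, network)
```

With `d_steps_per_g_step=2`, the discriminator gets two updates for each generator update. The ratio that works for SRNet is not stated here and would need to be found by experiment.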
A virtual environment is the most convenient way to set up the model for training or inference. You can use virtualenv for this. The rest of this guide assumes that you are in one.
- Clone this repository:
$ git clone https://github.com/Niwhskal/SRNet.git
$ cd SRNet
- Install requirements (Make sure you are on python3.xx):
$ pip3 install -r requirements.txt
This repository provides a bash script that spares you from synthesizing the data manually, as the original implementation requires. The default configuration parameters set up a dataset that is sufficient to train a robust model.
- Grant execute permission to the bash script:
$ chmod +x data_script.sh
- Set up training data by executing:
$ ./data_script.sh
The bash script downloads background data and a word list, then runs a datagenerator script that synthesizes training data. Finally, it modifies paths to enable straightforward training. A detailed description of data synthesis is provided by youdao-ai in the original datagenerator repository.
If you wish to synthesize data with different fonts, you can do so by adding custom .ttf files to the fonts directory before running datagen.py. Examine the flow of data_script.sh and change it accordingly.
- Once the data is set up, you can immediately begin training:
$ python3 train.py
If you wish to resume training or use a checkpoint, update its path and run train.py.
If you are interested in experimenting, modify the hyperparameters in cfg.py.
To predict, you need to provide pairs of inputs: the source image (i_s) and the custom text rendered in grayscale on a plain background (i_t). Examples can be found in SRNet/custom_feed/labels. Place all such pairs in a folder.
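Before running inference, it can help to check that every source image in the folder has a matching text image. The sketch below is a hypothetical helper, not part of this repository. It assumes the two files of a pair share a prefix and end in `_i_s.png` and `_i_t.png`; confirm the actual naming against the examples in SRNet/custom_feed/labels and adjust the suffixes if they differ.

```python
from pathlib import Path
from typing import List, Tuple


def find_input_pairs(
    input_dir: str,
    source_suffix: str = "_i_s.png",  # assumed naming, verify against the examples
    text_suffix: str = "_i_t.png",    # assumed naming, verify against the examples
) -> List[Tuple[Path, Path]]:
    """Return (i_s, i_t) path pairs found in input_dir, sorted by file name.

    Raises FileNotFoundError if a source image has no matching text image.
    """
    folder = Path(input_dir)
    pairs = []
    for source in sorted(folder.glob(f"*{source_suffix}")):
        # Strip the source suffix to recover the shared prefix of the pair.
        prefix = source.name[: -len(source_suffix)]
        text = folder / f"{prefix}{text_suffix}"
        if not text.is_file():
            raise FileNotFoundError(f"missing text image for {source.name}: {text.name}")
        pairs.append((source, text))
    return pairs
```

Calling `find_input_pairs("my_inputs")` on the folder you intend to pass as `--input_dir` returns the list of pairs, or raises an error that names the first unmatched source image.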
- Inference can be carried out by running:
$ python3 predict.py --input_dir *data_dir* --save_dir *destination_dir* --checkpoint *path_to_ckpt*
You can download my pre-trained weights here
Some results from the example directory:
| Source | Result |
|---|---|
The code for the demo was hastily written and is quite slow. If you are interested in trying it out or would like to contribute to it, open an issue, submit a pull request, or send me an email at [email protected]. I can host it for you.
- Editing Text in the Wild: An innovative idea of using GANs in an unorthodox manner.
- Youdao-ai's original repository: The original TensorFlow implementation, which helped me understand the paper from a different perspective. Also, credit to youdao for the data synthesis code. If you are interested in understanding how data is synthesized for training, examine that repository.
- SynthText project: This work provides the background dataset that is instrumental for data synthesis.
- Streamlit docs: One of the best libraries to build and publish apps. Severely underrated.