A basic Python script that creates either a TXT annotation file used to generate a TFRecords dataset, or a CSV file that can be used directly as the annotation file of a filename/label dataset.
The script accepts four command-line arguments:
- -n: name of the file that will be generated
- -p: path to the images folder
- -t: format of the generated file (TXT or CSV)
- -i: whether to include the images path in each entry (defaults to False)
Given an image dataset stored in a folder on your computer, the script can generate four different types of annotation files.
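The four options above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the destination names, the help texts, the choice to make -n, -p and -t required, and the str_to_bool helper are all assumptions made for illustration. The helper is there because passing a value such as `-i True` through `type=bool` would treat any non-empty string, including "False", as true.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Turn strings such as 'True' or 'false' into a real boolean."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Destination names and 'required' settings are hypothetical.
    parser = argparse.ArgumentParser(
        description="Generate an annotation file for an image dataset."
    )
    parser.add_argument("-n", dest="name", required=True,
                        help="name of the file that will be generated")
    parser.add_argument("-p", dest="path", required=True,
                        help="path to the images folder")
    # str.lower runs before the choices check, so TXT and txt are both accepted.
    parser.add_argument("-t", dest="file_type", type=str.lower,
                        choices=["txt", "csv"], required=True,
                        help="format of the generated file")
    parser.add_argument("-i", dest="include_path", type=str_to_bool,
                        default=False,
                        help="whether to add the images path")
    return parser


# Parse the same arguments as the example command in this README.
args = build_parser().parse_args(
    ["-n", "annotations-training", "-p", "./train", "-t", "txt", "-i", "True"]
)
print(args.name, args.path, args.file_type, args.include_path)
```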
- CSV annotation file without specifying the images path
- CSV annotation file specifying the images path
- TXT annotation file without specifying the images path
- TXT annotation file specifying the images path
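To make the four variants concrete, here is a minimal sketch of how the two switches (file type and path inclusion) could combine. It is not the script's actual implementation. It assumes that the label is taken from the file name without its extension, that TXT lines are space-separated "location label" pairs, and that only a few common image extensions are picked up. The function name build_entries and the IMAGE_SUFFIXES set are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: which extensions count as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_entries(images_dir, file_type, include_path):
    """Return one annotation line per image found in images_dir."""
    # Assumption: CSV rows use a comma, TXT lines use a single space.
    separator = "," if file_type == "csv" else " "
    entries = []
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumption: the label is the file name without its extension.
        label = image.stem
        location = image.as_posix() if include_path else image.name
        # File names containing the separator would need quoting; omitted here.
        entries.append(f"{location}{separator}{label}")
    return entries


# Demonstrate the four variants on a throwaway folder with two empty files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("hello.png", "world.jpg"):
        (Path(folder) / name).touch()
    for file_type in ("csv", "txt"):
        for include_path in (False, True):
            print(file_type, include_path,
                  build_entries(folder, file_type, include_path))
```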
For example:
- python anno_file_gen.py -n annotations-training -p ./train -t txt -i True
You can use this script to create the TXT annotation file needed to generate a TFRecords dataset, a format commonly used for datasets that train deep learning models. A good example is the emedvedev/attention-ocr repository.
You can also use this script to create a dataset's annotation file in CSV format. Each row of the CSV file has the structure (filename,label). This format is also used for datasets that train deep learning models; one example is the rsommerfeld/trocr repository.
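As a quick check of the (filename,label) structure, a generated CSV file can be read back with Python's csv module. The sketch below assumes the file has no header row and exactly two columns. The sample rows and the read_annotations helper are made up for illustration.

```python
import csv
import io

# Hypothetical file contents in the (filename,label) layout.
sample = "img_001.png,hello\nimg_002.png,world\n"


def read_annotations(handle):
    """Return (filename, label) pairs from an open CSV file, skipping blank rows."""
    return [(row[0], row[1]) for row in csv.reader(handle) if row]


pairs = read_annotations(io.StringIO(sample))
for filename, label in pairs:
    print(filename, label)
```

To read a real file, open it with `open(path, newline="")` and pass the handle to read_annotations in place of the in-memory sample.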