Hi! This is an auxiliary repository for the kriakiku/combat-otter project. Its purpose is to simplify training the tesseract-ocr model on new fonts by providing ready-made commands.
But of course you can use this solution for your own purposes, even if they are not related to our application in any way.
The author of the repository used WSL 2 (installation guide) with Ubuntu 22.04.3 LTS (download) and Tesseract 5 installed.
You can install the latest version of the library using the following commands:
# sudo add-apt-repository -y ppa:alex-p/tesseract-ocr-devel
# sudo apt -y update
# sudo apt install -y tesseract-ocr
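After installation it can be useful to confirm that the Tesseract on your PATH really is a 5.x build. The snippet below is a minimal sketch, not part of this repository: `tesseract_major` is a hypothetical helper, and it assumes the first line of `tesseract --version` looks like `tesseract 5.3.0`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `tesseract --version` output on stdin and prints
# the major version number taken from the first line ("tesseract 5.3.0" -> 5).
# ASSUMPTION: the first line has the form "tesseract <major>.<minor>...".
tesseract_major() {
  sed -n '1s/^tesseract[[:space:]]*v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
  # Some builds print the version to stderr, so merge both streams.
  major="$(tesseract --version 2>&1 | tesseract_major)"
  echo "tesseract major version: ${major:-unknown}"
else
  echo "tesseract is not installed or not on PATH"
fi
```

If the reported major version is lower than 5, the PPA above may not have been picked up by apt.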
Take a screenshot of a screen area that contains only the font you are interested in, then identify the font with one of these services: whatfontis, fontsquirrel.
Use the fc-scan or fc-list CLI utility. You should use the fullname value.
# fc-scan "./fonts/Stratum2 Bold.ttf"
Pattern has 23 elts (size 32)
family: "Stratum2"(s) "Stratum2 Bd"(s)
familylang: "en"(s) "en"(s)
style: "Bold"(s)
stylelang: "en"(s)
fullname: "Stratum2 Bd Bold"(s)
fullnamelang: "en"(s)
slant: 0(i)(s)
weight: 200(f)(s)
width: 100(f)(s)
foundry: "ptf"(s)
file: "./fonts/Stratum2 Bold.ttf"(s)
index: 0(i)(s)
[...]
postscriptname: "Stratum2-Bold"(s)
color: False(s)
symbol: False(s)
variable: False(s)
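To pick out just the fullname value instead of reading the whole dump, a small filter can help. This is a minimal sketch: `extract_fullname` is a hypothetical helper, not part of this repository, and it assumes the verbose fc-scan output format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): reads verbose fc-scan
# output on stdin and prints the first quoted value of each "fullname:" line.
# Lines such as "fullnamelang:" are ignored because they do not match.
extract_fullname() {
  sed -n 's/^[[:space:]]*fullname: "\([^"]*\)".*/\1/p'
}

# Demo on a fragment of the output shown above.
sample='family: "Stratum2"(s) "Stratum2 Bd"(s)
fullname: "Stratum2 Bd Bold"(s)
fullnamelang: "en"(s)'
printf '%s\n' "$sample" | extract_fullname
```

With a real font file the helper would be used as `fc-scan "./fonts/Stratum2 Bold.ttf" | extract_fullname`.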
With fc-list you will only be able to see the fonts installed in the system.
# fc-list
[...]
~/.fonts/tessfont/Stratum2 Bold.ttf: Stratum2,Stratum2 Bd:style=Bold
[...]
~/.fonts/tessfont/Stratum2 Medium.ttf: Stratum2,Stratum2 Md:style=Medium,Regular
The list below covers the fonts used in the pre-game lobby in CoD:
- Stratum2 Bold: EN, DE, ES (Latin), ES, FR, IT, PT (Brazilian);
- Bio Sans Bold, Bio Sans Regular: PL, RU;
- ????: AR, ZH (Traditional), ZH (Simplified);
- ????: TH, KO, JA (each language has a unique font).
Download the font you need and place it in the fonts folder. Next, update the environment settings in the config.ini file:
- Enter the font name in the environment variable (FONT_NAME);
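Before starting a long run, it can be worth checking which font name the scripts will pick up. The sketch below assumes that config.ini contains a plain `FONT_NAME=value` line, optionally wrapped in double quotes. Both that layout and the `read_font_name` helper are assumptions made for illustration, so adjust the pattern to the actual file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the value of FONT_NAME from a config file.
# ASSUMPTION: the file has a line like  FONT_NAME=Stratum2 Bd Bold
# (optionally in double quotes); the real config.ini may be laid out differently.
read_font_name() {
  sed -n 's/^[[:space:]]*FONT_NAME[[:space:]]*=[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Demo on a throwaway file so the real config.ini is not touched.
cfg="$(mktemp)"
printf 'FONT_NAME="Stratum2 Bd Bold"\n' > "$cfg"
font="$(read_font_name "$cfg")"
if [ -n "$font" ]; then
  echo "FONT_NAME is set to: $font"
else
  echo "FONT_NAME is missing or empty" >&2
fi
rm -f "$cfg"
```

The printed value should match the fullname reported by fc-scan for the font file.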
To retrain the model, you first need to generate training images. This takes several hours (about 195 000 files will be generated), but you can stop the process at any time and resume it by running the command again:
# ./0.generate-training-data.sh
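Because the run can be interrupted and resumed, a quick file count gives a rough idea of how far it has progressed toward the roughly 195 000 files. This is only a sketch: `count_generated` is a hypothetical helper, and the assumption that generated files are written under the train folder is inferred from the archive instructions in this README, not confirmed by the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts regular files under a directory, recursively.
count_generated() {
  find "$1" -type f | wc -l | tr -d '[:space:]'
}

# ASSUMPTION: generated training files are written under ./train.
# Pass another directory as the first argument if your layout differs.
dir="${1:-train}"
if [ -d "$dir" ]; then
  echo "$(count_generated "$dir") files generated so far in $dir"
else
  echo "directory $dir does not exist yet"
fi
```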
You may want to save files to cloud storage or share them. To do this, use the command that archives the files:
# ./0.post-archive-generated-training-data.sh
If you need the generated files for already trained fonts, you can download them and uncompress them into the train folder:
- Bio Sans Bold-GT.tar.bz2 (375 MB)
- Bio Sans-GT.tar.bz2 (393 MB)
- Stratum2 Bold-GT.tar.bz2 (325 MB)