Author: Nathan Aday / [email protected]
https://github.com/nathanaday/RealTime-OCR
Perform text detection in a variety of languages with your computer webcam using Google Tesseract OCR and OpenCV. This script achieves a real-time OCR effect by incorporating multi-threading.
Requires Tesseract to be installed (see info below). Tesseract is free and easy to install on Mac, Windows, and Linux.
Note that the frame display can take some time to pop up. After running the file with command-line arguments, please allow a few moments for the window to appear, and report any crashes or bugs to me.
An OpenCV video stream runs in a dedicated thread, continuously reading live frames from the webcam and storing the most recent one in memory as a class attribute. The video display accesses these frames and shows them in real time. Meanwhile, pytesseract OCR has its own dedicated class running in a separate thread; it grabs the most recent frame from the video stream class, processes it, and returns the detected text when it finishes.

Without any form of multi-threading or multiprocessing, a frame display loop would have to wait for the OCR text detection to finish before showing a new frame, creating a large bottleneck for most implementations. Multi-threading improves the process significantly by allowing the video stream to run in real time while the OCR works in the background. Boxes and detected text are displayed as they are updated by the OCR thread. One limitation is that the OCR is still limited by its processing speed and will not return a bounding box and detected text for every frame of the video stream, but for ordinary applications where the camera has several seconds to analyze stable text, this lag is manageable.
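A minimal sketch of this two-thread pattern is below, assuming simplified class and method names (the actual implementation lives in OCR.py and differs in detail):

```python
# Illustrative two-thread pattern: one thread keeps the newest webcam frame,
# another runs Tesseract on it, and the main loop displays frames unblocked.
# Class and method names are placeholders, not the ones in OCR.py.
import threading

import cv2
import pytesseract  # assumes Tesseract is installed and discoverable


class VideoStream:
    """Continuously reads frames from the webcam on its own thread."""

    def __init__(self, src=0):
        self.capture = cv2.VideoCapture(src)
        self.frame = None
        self.stopped = False

    def start(self):
        threading.Thread(target=self._update, daemon=True).start()
        return self

    def _update(self):
        while not self.stopped:
            ok, frame = self.capture.read()
            if ok:
                self.frame = frame  # keep only the most recent frame


class OCRWorker:
    """Runs OCR on the latest available frame without blocking the display."""

    def __init__(self, stream):
        self.stream = stream
        self.text = ""
        self.stopped = False

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()
        return self

    def _run(self):
        while not self.stopped:
            frame = self.stream.frame
            if frame is not None:
                self.text = pytesseract.image_to_string(frame)


if __name__ == "__main__":
    stream = VideoStream().start()
    ocr = OCRWorker(stream).start()
    while True:
        if stream.frame is not None:
            cv2.imshow("Real-time OCR (sketch)", stream.frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press "q" to quit
            stream.stopped = ocr.stopped = True
            break
    cv2.destroyAllWindows()
```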
This script is easily modifiable to work with other camera sources, image types, and image processing methods since it is an implementation of the OpenCV video stream.
This is a command-line script. The only required argument is the full path to the Tesseract executable from the Tesseract install (see DEPENDENCIES below for more info).
python Main.py -t '<full_path_to_your_tesseract_executable>' [-c WIDTH HEIGHT] [-v VIEW_MODE] [-sv] [-l LANG [LANG ...]] [-sl] [-s SRC]
optional arguments:
-h, --help          show the command-line help message
-c, --crop          crop OCR area in pixels (two values required): width height
-v, --view_mode     view mode for OCR boxes display (default=1)
-sv, --show_views   show the available view modes and descriptions
-l, --language      code for Tesseract language; list multiple to add more (ex: chi_sim chi_tra)
-sl, --show_langs   show the list of languages supported by Tesseract (4.0+)
-s, --src SRC       video source for video capture
required named arguments:
-t, --tess_path     path to the cmd root of the Tesseract install (see docs for further help)
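For reference, the argument list above maps onto a parser roughly like the one sketched below; the real parser lives in Main.py and its details may differ:

```python
# Hedged sketch of an argparse setup matching the documented arguments;
# Main.py is the source of truth for the actual definitions and defaults.
import argparse

parser = argparse.ArgumentParser(description="Real-time OCR with Tesseract and OpenCV")
parser.add_argument("-t", "--tess_path", required=True,
                    help="path to the cmd root of the Tesseract install")
parser.add_argument("-c", "--crop", nargs=2, type=int, metavar=("WIDTH", "HEIGHT"),
                    help="crop OCR area in pixels")
parser.add_argument("-v", "--view_mode", type=int, default=1,
                    help="view mode for OCR boxes display")
parser.add_argument("-sv", "--show_views", action="store_true",
                    help="show the available view modes and descriptions")
parser.add_argument("-l", "--language", nargs="+",
                    help="Tesseract language code(s), e.g. chi_sim chi_tra")
parser.add_argument("-sl", "--show_langs", action="store_true",
                    help="show the list of supported languages")
parser.add_argument("-s", "--src", type=int, default=0,
                    help="video source for video capture")
args = parser.parse_args()
```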
The crop area allows OCR to be performed on a smaller frame and therefore improves speed. A box is automatically drawn around the crop so it's clear where to position text for detection.
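A crop like this amounts to a numpy slice on the frame before it is handed to the OCR; the helper below is an illustrative sketch, not the exact code in OCR.py:

```python
# Center-crop a frame to width x height before OCR (illustrative sketch).
def center_crop(frame, width, height):
    h, w = frame.shape[:2]
    x0 = max((w - width) // 2, 0)
    y0 = max((h - height) // 2, 0)
    return frame[y0:y0 + height, x0:x0 + width]
```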
This script implements four view modes, which stylize the way text is detected. To specify a view mode, use -v after the Main.py call
View mode 1: Draws boxes on text with a confidence level above 75
View mode 2: Draws red boxes on low-confidence text and green boxes on high-confidence text
View mode 3: Color changes according to each word's confidence; brighter indicates higher confidence
View mode 4: Draws a box around detected text regardless of confidence
If no view mode is specified, the OCR will run with mode 1.
To see the view options and their descriptions in the command line, invoke -sv or --show_views
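The confidence values behind these modes come from pytesseract's word-level output. The sketch below shows one way boxes in the spirit of view mode 1 could be drawn; the function name and styling are illustrative rather than the exact code in OCR.py:

```python
# Draw green boxes around words detected above a confidence threshold,
# using pytesseract's word-level data (illustrative sketch).
import cv2
import pytesseract
from pytesseract import Output


def draw_boxes(frame, min_conf=75):
    data = pytesseract.image_to_data(frame, output_type=Output.DICT)
    for i in range(len(data["text"])):
        conf = int(float(data["conf"][i]))  # conf may be a string depending on version
        if conf > min_conf and data["text"][i].strip():
            x, y, w, h = (data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i])
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return frame
```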
In the case of multiple camera ports, the src for the desired video input can be specified with the -s command-line argument. Without specification, the src defaults to 0, which for most users is a built-in webcam.
Using SRC source 0:
python Main.py -t '<full_path_to_your_tesseract_executable>' -s 0
Using SRC source 1:
python Main.py -t '<full_path_to_your_tesseract_executable>' -s 1
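Under the hood, the src value is what OpenCV uses to open the capture device, roughly:

```python
# Open the second camera port; 0 is usually the built-in webcam.
import cv2

capture = cv2.VideoCapture(1)
```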
Tesseract has been able to detect a variety of languages since version 4. A language can be specified to the OCR by appending the Main.py call with "-l" and a language code.
For example, to detect simplified Chinese use:
python Main.py -t '<full_path_to_your_tesseract_executable>' -l chi_sim
Multiple languages can be detected simultaneously by separating the codes with a space. To detect both simplified Chinese and traditional Chinese, use:
python Main.py -t '<full_path_to_your_tesseract_executable>' -l chi_sim chi_tra
A list of all language codes can be printed in the command line by invoking '-sl'.
python Main.py -t '<full_path_to_your_tesseract_executable>' -sl
Note that the printed list of available languages comes from the languages Tesseract supports, which should be included in an up-to-date install. However, invoking a language code at runtime will have no effect if the .traineddata file for that language is not present in your Tesseract files.
If no language code is specified, the OCR defaults to English.
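Within pytesseract, multiple language codes are joined with "+" before being passed to Tesseract. A hedged sketch (the image filename is only a placeholder):

```python
# Run OCR on an image with both simplified and traditional Chinese models.
import cv2
import pytesseract

frame = cv2.imread("capture.jpg")  # placeholder image path
text = pytesseract.image_to_string(frame, lang="chi_sim+chi_tra")
print(text)
```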
While running an OCR stream, press "c" to capture the current frame and save it as a .jpg in the working directory. A capture will also print the currently detected text to the command line:
RealTime-OCR user$ REAL TIME OCR with pytesseract and CV2 “Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated.” OCR 2021-04-09 at 13:06:35-5.jpg
RealTime-OCR user$ 实时 OCR 跟 pytesseract, CV2 优美 胜 于 丑陋 , 显 明 胜 于 隐 含 。 简单 胜 于 复杂 , 复 杂 胜 于 繁复 。 扁平 胜 于 , 稀 胜 于 密集 。 可 读 性 会 起 作用 。 OCR 2021-04-09 at 12:59:54-10.jpg
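A simplified sketch of that capture step, with illustrative variable names and timestamp format:

```python
# Save the current frame and echo the latest detected text (illustrative sketch;
# the filename format used by the script is visible in the sample output above).
import time

import cv2


def handle_capture(frame, ocr_text):
    filename = "OCR " + time.strftime("%Y-%m-%d at %H.%M.%S") + ".jpg"
    cv2.imwrite(filename, frame)
    print(ocr_text)

# Inside the display loop, something like:
#   if cv2.waitKey(1) & 0xFF == ord("c"):
#       handle_capture(stream.frame, ocr.text)
```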
This script requires:
- a Tesseract install
- the Python wrapper for Tesseract (pytesseract)
- OpenCV for python (CV2)
- Numpy
Instructions on how to install Tesseract for your OS are located here:
https://tesseract-ocr.github.io/tessdoc/Installation.html
To use the script, you will need the path to the executable included in the Tesseract install, so note the install's location.
Example (on a Homebrew install): '/usr/local/Cellar/tesseract/4.1.1/bin/tesseract'
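Internally, this path is handed to pytesseract, which amounts to something like the following (using the Homebrew example path above):

```python
# Tell pytesseract where the Tesseract executable lives.
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "/usr/local/Cellar/tesseract/4.1.1/bin/tesseract"
```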
To install the Python wrapper for Tesseract, use:
pip install pytesseract
See the pytesseract docs for further help.
To install OpenCV for Python, use:
pip install opencv-python
See the opencv-python docs for further help.
For numpy, use:
pip install numpy
See the numpy docs for further help.
Tesseract should come with .traineddata files that support a wide variety of languages. In any case, the repo for language files can be found here:
https://github.com/tesseract-ocr/tessdata
Main.py
- Command-line argparser and call to the OCR stream
OCR.py
- All classes and functions for multi-threaded text detection and webcam display
Linguist.py
- A few functions for handling the language codes and converting them to full language names (reads from Tesseract_Langs.txt; see the sketch after this file list)
Tesseract_Langs.txt
- Text file for every supported language code and the language's full name
README.md
requirements.txt
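A minimal sketch of what Linguist.py does, assuming Tesseract_Langs.txt lists one code and full language name per line (the actual function names and file layout may differ):

```python
# Build a {code: full name} mapping from Tesseract_Langs.txt (illustrative sketch).
def load_language_names(path="Tesseract_Langs.txt"):
    names = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                code, name = parts
                names[code] = name
    return names

# e.g. load_language_names().get("chi_sim") might return "Chinese - Simplified"
```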
This script was written with customizability in mind. It's easy to add custom view modes, edit the pre-processing of frames for the OCR, or customize the output displayed in the video capture. These changes can be made in OCR.py. To add custom command-line arguments, see Main.py.
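For example, a pre-processing tweak could be as simple as grayscaling and thresholding the frame before it reaches Tesseract; the function below is an illustrative sketch rather than code from the repo:

```python
# Convert a frame to a binary image before OCR (illustrative sketch).
import cv2


def preprocess_for_ocr(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```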
OpenCV is an incredibly robust computer vision package. Because this script already imports CV2, the OCR core could be swapped for CV2's facial recognition features, boundary detection, etc. and still achieve the seamless video display from multi-threading.
Tesseract has two additional data sets that can be configured: a fast dataset, and a best dataset.
The fast data will speed up the OCR process, but at the cost of accuracy.
The best data is trained to produce more accurate detection, but at the cost of speed.
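If you download the fast or best .traineddata files, one way to point Tesseract at them is the --tessdata-dir config option; the directory path below is only an example:

```python
# Use a specific tessdata directory (e.g. one holding the "best" models).
import pytesseract

text = pytesseract.image_to_string(
    "capture.jpg",  # placeholder image path
    lang="eng",
    config='--tessdata-dir "/usr/local/share/tessdata_best"',
)
```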
Questions and bugs can be posted on the project's GitHub page or emailed to [email protected]