Using the tesseract CLI tool

Tesseract OCR has a command-line utility which is woefully under-documented. Thanks to Alexandru Nedelcu, I figured out how to use it today.

To install on macOS:

brew install tesseract

To convert an image into a searchable PDF with an embedded text layer (so you can copy and paste text out of it, and Spotlight will index it correctly):

tesseract image.png output-file -l eng pdf

The second argument, output-file, is the path and base filename for the output. Note that I didn't include a .pdf extension: Tesseract adds it automatically, so the output ends up in a file called output-file.pdf. The -l eng option selects the English language data.
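Because Tesseract appends the extension itself, a batch job needs to strip the image's own extension before passing the name along. Here is a minimal bash sketch that assumes your images are PNG files in the current directory and that English text is wanted. The helper names output_base and ocr_all_pngs are hypothetical, not part of Tesseract.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): OCR every PNG in the current directory
# into a searchable PDF, using the same invocation as above.
set -euo pipefail

# Tesseract appends ".pdf" itself, so drop the image's extension:
# scan.png -> scan (tesseract then writes scan.pdf)
output_base() {
  local img="$1"
  printf '%s\n' "${img%.*}"
}

ocr_all_pngs() {
  local img
  for img in *.png; do
    # With no matching files the glob stays literal, so skip it.
    [ -e "$img" ] || continue
    tesseract "$img" "$(output_base "$img")" -l eng pdf
  done
}
```

After sourcing the script, running ocr_all_pngs turns scan.png into scan.pdf, receipt.png into receipt.pdf, and so on.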

To extract just the plain text instead (this writes a file called output-file.txt):

tesseract image.png output-file -l eng txt
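If you want the recognised text in a pipeline rather than in a file, you can wrap the txt invocation so it writes to a temporary directory and prints the result. This is a sketch under my own assumptions (bash, English text); ocr_text is a hypothetical helper name, not a Tesseract feature.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): OCR one image and print its text to stdout.
set -euo pipefail

ocr_text() {
  local img="$1"
  local tmpdir status=0
  tmpdir="$(mktemp -d)"
  # Tesseract appends ".txt", so this writes "$tmpdir/out.txt".
  # Its progress messages (and any errors) go to stderr, discarded here.
  tesseract "$img" "$tmpdir/out" -l eng txt 2>/dev/null || status=$?
  if [ "$status" -eq 0 ]; then
    cat "$tmpdir/out.txt"
  fi
  rm -rf "$tmpdir"
  return "$status"
}
```

For example, ocr_text screenshot.png | grep -i invoice searches a screenshot's text without leaving a .txt file behind.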

Created 2021-07-18T09:56:08-07:00
