tesseract

tesseract

Tesseract is a well-known open source OCR engine. It was originally developed at HP.

Install

On macOS, Homebrew (brew) is the easiest way to install tesseract:

brew install tesseract

How to use

tesseract imagefile outputfile

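imagefile is the path to the input image.
outputfile is the base name of the output file; tesseract appends .txt to it.
If you want to print the result to the console instead, use the following command: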

tesseract imagefile stdout

pytesseract

Tesseract itself is written in C and C++.
Several people have created wrappers for it.
For Python users, there are two choices:

  • Call the tesseract command from Python
  • Use pytesseract
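
For reference, the first choice can be sketched with the standard subprocess module. This is only a minimal sketch: build_command and ocr_with_cli are names I made up for illustration, and it assumes the tesseract command is already installed and on your PATH.

```python
import shutil
import subprocess


def build_command(image_path):
    """Build the argument list for: tesseract imagefile stdout"""
    return ["tesseract", str(image_path), "stdout"]


def ocr_with_cli(image_path):
    """Run the tesseract command and return the text it prints."""
    # Fail early with a readable message if tesseract is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract command not found on PATH")
    result = subprocess.run(
        build_command(image_path),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling ocr_with_cli("image.jpg") should return the recognized text as a string. It works, but you have to deal with the process and its errors yourself, which is the part a wrapper takes care of.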

In this entry, I would like to introduce pytesseract.
We can install pytesseract with pip.
If you don't have pip, install it first.
I use pip3:

pip3 install pillow
pip3 install pytesseract

pillow provides the PIL module that the sample imports to open the image.

Simple Sample

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open("image.jpg")))
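
If you want something a little more reusable, the same call can be wrapped in a small function. This is a sketch under a few assumptions: image_to_text is a hypothetical helper name of mine, and lang="eng" assumes the English language data that normally ships with tesseract is installed.

```python
from pathlib import Path


def image_to_text(image_path, lang="eng"):
    """Run OCR on one image file and return the recognized text."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"no such image: {path}")

    # Imported here so the path check above still works
    # before pillow and pytesseract are installed.
    from PIL import Image
    import pytesseract

    # The context manager closes the image file when we are done.
    with Image.open(path) as img:
        # lang selects which tesseract language data to use.
        text = pytesseract.image_to_string(img, lang=lang)
    # Drop the leading and trailing whitespace around the result.
    return text.strip()
```

Usage is one line: print(image_to_text("image.jpg")). Checking the path first gives a clearer error than letting the failure come from deeper inside the libraries.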