tesseract
Tesseract is a well-known open-source OCR engine. It was originally developed by HP.
Install
On macOS, Homebrew (brew) is the easiest way to install tesseract:
brew install tesseract
How to use
tesseract imagefile outputfile
imagefile is the path to the input image.
Basically, outputfile is the base name of the output file; tesseract appends .txt to it.
If you want to print the result to the console instead, use the following command:
tesseract imagefile stdout
pytesseract
Tesseract itself is written in C and C++.
Several people have created wrappers for it.
Python users have two choices:
- Call the tesseract command from Python
- Use pytesseract
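The first option can be sketched with the standard subprocess module. This is only a minimal sketch, not how pytesseract itself is implemented: the helper names build_command and run_tesseract are made up for illustration, and it assumes the tesseract binary is on your PATH.

```python
import shutil
import subprocess


def build_command(image_path, executable="tesseract"):
    """Build the argument list for `tesseract imagefile stdout`."""
    # Passing "stdout" as the output name makes tesseract print the text
    # instead of writing a .txt file.
    return [executable, str(image_path), "stdout"]


def run_tesseract(image_path):
    """Run tesseract on image_path and return the recognized text."""
    # Assumption: tesseract was installed (e.g. with brew) and is on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_command(image_path),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on failure
    )
    return result.stdout
```

Calling run_tesseract("image.jpg") should then return the same text that the tesseract imagefile stdout command prints.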
In this entry, I would like to introduce pytesseract.
We can install pytesseract with pip.
If you don’t have pip, install it first.
I use pip3:
pip3 install pillow
pip3 install pytesseract
Simple Sample
from PIL import Image
import pytesseract

# Open the image with Pillow and print the text tesseract recognizes in it
print(pytesseract.image_to_string(Image.open("image.jpg")))
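For anything beyond a one-off script, it may be handy to wrap that call in a small function. The sketch below assumes the image is English text ("eng" is tesseract's language code for English, passed through image_to_string's lang argument). The names ocr_image and tidy are hypothetical helpers, and tidy is plain Python post-processing, not a pytesseract feature.

```python
def tidy(text):
    """Strip trailing whitespace and drop empty lines from OCR output."""
    lines = (line.rstrip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(path, lang="eng"):
    """Return the text pytesseract recognizes in the image at path."""
    # Imported here so tidy() stays usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    # The with-block closes the image file once OCR is done.
    with Image.open(path) as img:
        return tidy(pytesseract.image_to_string(img, lang=lang))
```

With this in place, print(ocr_image("image.jpg")) behaves like the simple sample, minus the blank lines that OCR output often contains.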