Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 1.14
- None
Description
From https://github.com/apache/tika/pull/177 by Rafael Ferreira
Extend the supported page segmentation mode (PSM) options to cover values up to 13, as offered by modern versions of Tesseract.
$ tesseract --version
tesseract 3.05.00
 leptonica-1.74.1
  libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8

$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
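
For illustration, the widened range could be checked along these lines. This is only a minimal sketch, not the code from the pull request: the class `PsmSketch` and its methods are hypothetical names, and the `-psm` flag spelling is assumed from the Tesseract 3.x command line shown above (newer releases may expect `--psm`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: accepts PSM values 0..13
// and assembles a tesseract command line from them.
final class PsmSketch {
    static final int MIN_PSM = 0;
    static final int MAX_PSM = 13; // highest mode listed by `tesseract --help-psm` above

    // Returns true only for integer strings within [MIN_PSM, MAX_PSM].
    static boolean isValidPsm(String value) {
        if (value == null) {
            return false;
        }
        try {
            int psm = Integer.parseInt(value.trim());
            return psm >= MIN_PSM && psm <= MAX_PSM;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Builds the argument list; "-psm" is an assumption based on Tesseract 3.x.
    static List<String> buildCommand(String image, String outputBase, String psm) {
        if (!isValidPsm(psm)) {
            throw new IllegalArgumentException(
                "Invalid page segmentation mode: " + psm
                + " (expected " + MIN_PSM + ".." + MAX_PSM + ")");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add("-psm");
        cmd.add(psm.trim());
        return cmd;
    }
}
```

With this sketch, `PsmSketch.buildCommand("in.png", "out", "13")` is accepted, while a value such as `"14"` is rejected with an `IllegalArgumentException`.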
Issue Links
- is related to TIKA-2696 Support output of Tesseract OSD output for psm mode 0 (Resolved)