Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: ocr
    • Labels:
      None

      Description

      From https://github.com/apache/tika/pull/177 by Rafael Ferreira

      Extend the supported page segmentation mode (PSM) options up to 13, the range offered by modern versions of Tesseract (output from 3.05.00 below).

      $ tesseract --version
      tesseract 3.05.00
       leptonica-1.74.1
        libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8
      
      $ tesseract --help-psm
      Page segmentation modes:
        0    Orientation and script detection (OSD) only.
        1    Automatic page segmentation with OSD.
        2    Automatic page segmentation, but no OSD, or OCR.
        3    Fully automatic page segmentation, but no OSD. (Default)
        4    Assume a single column of text of variable sizes.
        5    Assume a single uniform block of vertically aligned text.
        6    Assume a single uniform block of text.
        7    Treat the image as a single text line.
        8    Treat the image as a single word.
        9    Treat the image as a single word in a circle.
       10    Treat the image as a single character.
       11    Sparse text. Find as much text as possible in no particular order.
       12    Sparse text with OSD.
       13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
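
A minimal sketch of the kind of range check this change implies, assuming the PSM value arrives as a string from configuration. The class and method names (`PsmOption`, `isValid`, `parse`) are hypothetical and are not taken from the Tika code base or the linked pull request; the only fact used from the issue is that modes 0 through 13 are listed by `tesseract --help-psm`.

```java
// Hypothetical helper, NOT Tika's actual implementation: it only illustrates
// accepting the page segmentation modes 0..13 listed by `tesseract --help-psm`.
final class PsmOption {

    // Bounds taken from the help output above (assumption: the installed
    // Tesseract is recent enough to list modes up to 13).
    static final int MIN_PSM = 0;
    static final int MAX_PSM = 13;

    private PsmOption() {
        // static utility class, no instances
    }

    // Returns true only for the plain decimal strings "0" through "13".
    static boolean isValid(String value) {
        if (value == null) {
            return false;
        }
        // Single digit 0-9, or 10-13; rejects signs, leading zeros, and blanks.
        return value.matches("[0-9]|1[0-3]");
    }

    // Parses a PSM value, failing loudly on anything outside 0..13.
    static int parse(String value) {
        if (!isValid(value)) {
            throw new IllegalArgumentException(
                "Invalid page segmentation mode '" + value + "': expected an integer from "
                    + MIN_PSM + " to " + MAX_PSM);
        }
        return Integer.parseInt(value);
    }
}
```

With this sketch, `PsmOption.parse("13")` is accepted, while `PsmOption.parse("14")` or `PsmOption.parse("-1")` throws an `IllegalArgumentException`. A stricter variant could also consult the installed Tesseract version, since older releases list fewer modes.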
      

            People

            • Assignee: Dave Meikle (davemeikle)
            • Reporter: Dave Meikle (davemeikle)