Tika / TIKA-3264

Improve the per-page OCR heuristics for AUTO mode

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Version: 2.0.0

    Description

      We're currently using the character count per page as the sole criterion for deciding whether to run OCR on PDFs in AUTO mode.

      Let's use this issue to discuss better options.
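
      For reference while discussing, here is a minimal sketch of what a character-count heuristic of this kind could look like. It is not Tika's actual code: the class name, the method names, the threshold of 10, and the choice to ignore whitespace are all assumptions made for illustration only.

```java
public class OcrHeuristicSketch {

    // Hypothetical threshold; the value Tika actually uses is not stated in this issue.
    public static final int ASSUMED_MIN_CHARS_PER_PAGE = 10;

    /** Counts the non-whitespace code points in the text extracted from one page. */
    public static int countChars(String pageText) {
        if (pageText == null) {
            return 0;
        }
        return (int) pageText.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .count();
    }

    /** Returns true when the page has fewer extracted characters than the threshold. */
    public static boolean shouldOcr(String pageText, int minChars) {
        return countChars(pageText) < minChars;
    }

    public static void main(String[] args) {
        // Made-up page texts: an empty page, a text-heavy page, a page with a lone page number.
        String[] pages = {"", "A full page of extracted text ...", "  \n 3 "};
        for (int i = 0; i < pages.length; i++) {
            boolean ocr = shouldOcr(pages[i], ASSUMED_MIN_CHARS_PER_PAGE);
            System.out.printf("page %d: ocr=%b%n", i + 1, ocr);
        }
    }
}
```

      A single threshold like this cannot tell a genuinely sparse page from a scanned page that carries only a small text layer, which is one reason to look at additional signals.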

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 2
