Tika / TIKA-2749

OCR on PDFs should "just work" out of the box

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      There are now two different ways (with various parameters) to trigger OCR on inline images within PDFs. The user has to 1) understand that these are available and then 2) elect to turn one of those on.

      I think we should make OCR on PDFs "just work", perhaps with a hybrid strategy between the two options. Users should still be able to configure it as they wish, of course.
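
      To make the "hybrid strategy" idea concrete, here is a minimal sketch of one possible per-page decision rule. It is not Tika code and not a proposed patch. It assumes the two existing options are (a) running OCR on each extracted inline image and (b) rendering the whole page to an image and running OCR on that. The class HybridOcrSketch, the PageStats type, the Action enum, and the MIN_CHARS_FOR_TEXT_PAGE threshold are all hypothetical names and values chosen for illustration.

```java
public class HybridOcrSketch {

    /** Possible handling for a single PDF page (hypothetical, not a Tika enum). */
    public enum Action { TEXT_ONLY, OCR_INLINE_IMAGES, OCR_RENDERED_PAGE }

    /** Hypothetical per-page summary gathered during normal text extraction. */
    public static final class PageStats {
        final int extractedChars; // characters found in the page's text layer
        final int inlineImages;   // number of inline images on the page

        public PageStats(int extractedChars, int inlineImages) {
            this.extractedChars = extractedChars;
            this.inlineImages = inlineImages;
        }
    }

    /** Assumed threshold: below this, the page is treated as scanned or image-only. */
    static final int MIN_CHARS_FOR_TEXT_PAGE = 10;

    /** Chooses a handling for one page based on its text and image counts. */
    public static Action decide(PageStats page) {
        if (page.extractedChars < MIN_CHARS_FOR_TEXT_PAGE) {
            // Almost no text layer: render the whole page and OCR it.
            return Action.OCR_RENDERED_PAGE;
        }
        if (page.inlineImages > 0) {
            // Real text layer plus images: keep the text, OCR only the images.
            return Action.OCR_INLINE_IMAGES;
        }
        // Plain text page: no OCR needed.
        return Action.TEXT_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(decide(new PageStats(0, 1)));   // scanned page
        System.out.println(decide(new PageStats(500, 2))); // text with figures
        System.out.println(decide(new PageStats(500, 0))); // text only
    }
}
```

      With a rule like this, users who configure nothing would still get OCR on scanned pages, and anyone who sets an explicit option would bypass the rule entirely. The threshold would need tuning against real documents.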

              People

              • Assignee:
                Unassigned
                Reporter:
                tallison@apache.org Tim Allison
              • Votes:
                4
                Watchers:
                10

                Dates

                • Created:
                  Updated: