Tika / TIKA-2749

OCR on PDFs should "just work" out of the box

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      There are now two different ways (with various parameters) to trigger OCR on inline images within PDFs. The user has to 1) understand that these are available and then 2) choose to turn one of them on.

      I think we should make OCR on PDFs "just work", perhaps with a hybrid strategy that combines the two options. Users should still be allowed to configure as they wish, of course.
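
      To make the idea of a hybrid strategy concrete, here is one possible per-page decision rule, written as a self-contained sketch. Everything in it is an assumption for illustration: the class `HybridOcrSketch`, the `OcrMode` enum, the `choose` method and the `minTextChars` threshold are hypothetical names, not part of Tika's API, and the issue does not specify how the two options should be combined. The sketch assumes the two existing routes are OCR on extracted inline images and OCR on a rendered image of the whole page.

```java
// Hypothetical sketch: none of these names exist in Tika.
class HybridOcrSketch {

    /** What to do with a single PDF page. */
    enum OcrMode {
        NONE,           // trust the extracted text, skip OCR
        INLINE_IMAGES,  // keep the text, OCR only the embedded images
        RENDER_PAGE     // render the whole page to an image and OCR that
    }

    /**
     * Picks a mode from two per-page counts.
     * minTextChars is an assumed, user-tunable threshold.
     */
    static OcrMode choose(int textChars, int inlineImages, int minTextChars) {
        if (textChars < minTextChars) {
            // Little or no text layer: likely a scanned page.
            return OcrMode.RENDER_PAGE;
        }
        if (inlineImages > 0) {
            // Usable text layer, but images may carry extra text.
            return OcrMode.INLINE_IMAGES;
        }
        return OcrMode.NONE;
    }

    public static void main(String[] args) {
        int minTextChars = 10; // assumed default, not a Tika setting
        System.out.println(choose(0, 1, minTextChars));    // scanned page
        System.out.println(choose(2500, 0, minTextChars)); // text only
        System.out.println(choose(2500, 3, minTextChars)); // text plus images
    }
}
```

      A rule like this would give a sensible default while leaving the threshold, or an explicit override of the mode, available to users who want to configure OCR themselves.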

            People

              Assignee: Unassigned
              Reporter: Tim Allison (tallison)
              Votes: 2
              Watchers: 8
