Tika / TIKA-3431

Using any setting other than AUTO or NO_OCR for X-Tika-PDFOcrStrategy causes a severe performance loss


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: 1.26
    • Fix Version/s: None
    • Component/s: tika-server
    • Labels: None

    Description

      When a PDF document is sent to a local Tika server with a PUT request to http://localhost:9998/tika, setting X-Tika-PDFOcrStrategy to anything other than AUTO or NO_OCR slows processing of the PDF file dramatically.

      It doesn't matter whether the PDF document has inline images; the slowdown happens regardless.
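      To reproduce the comparison, a minimal Python sketch could send the same PDF to the endpoint once per strategy and time each response, using only the standard library. It assumes a tika-server is listening on the URL from this report and that a local copy of the attached wiki.pdf is available. The strategy list (NO_OCR, AUTO, OCR_ONLY, OCR_AND_TEXT_EXTRACTION) is assumed to match the OCR strategies in Tika 1.x's PDFParserConfig. The helpers build_request and time_extraction are hypothetical names for illustration, not part of any Tika client.

```python
import time
import urllib.request

# Endpoint used in this report (default tika-server port).
TIKA_URL = "http://localhost:9998/tika"

# Assumed set of PDF OCR strategies in Tika 1.x; AUTO and NO_OCR are the
# two settings the report says do NOT cause the slowdown.
STRATEGIES = ("NO_OCR", "AUTO", "OCR_ONLY", "OCR_AND_TEXT_EXTRACTION")


def build_request(pdf_bytes: bytes, strategy: str,
                  url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking tika-server to parse a PDF with one OCR strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown OCR strategy: {strategy}")
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",
            # Per-request override of the PDF OCR strategy.
            "X-Tika-PDFOcrStrategy": strategy,
        },
    )


def time_extraction(pdf_bytes: bytes, strategy: str,
                    url: str = TIKA_URL) -> float:
    """Send the request and return wall-clock seconds until the body is fully read."""
    req = build_request(pdf_bytes, strategy, url)
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()  # include the full extraction time, not just the headers
    return time.perf_counter() - start


# Example usage (requires a running tika-server and a local wiki.pdf):
#   with open("wiki.pdf", "rb") as f:
#       data = f.read()
#   for s in STRATEGIES:
#       print(s, round(time_extraction(data, s), 2))
```

      This sketch only measures the difference between strategies. It does not explain why the non-AUTO, non-NO_OCR settings are slower.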

      Attachments

        1. wiki.pdf (194 kB), attached by Sal


          People

            Assignee: Unassigned
            Reporter: Sal (sallas)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: