Tika / TIKA-2290

PDFParser 'ocr' properties cannot be set via headers when using Tika JAXRS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13, 1.14
    • Fix Version/s: 2.0, 1.15
    • Component/s: ocr, parser
    • Labels:
      None

      Description

      I have posted a Stack Overflow question on this topic, but I'll reiterate the main issue here.

      I am trying to use Tika JAXRS and set PDFParser properties via request headers, specifically the ocrStrategy property. However, when I add the header X-Tika-PDFocrStrategy, I get an error stating that it is an invalid X-Tika-OCR header.
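For reference, a request of the kind described above could be built as in the following minimal sketch. It assumes a tika-server instance on its default port 9998 with the `/tika` endpoint, and it assumes `ocr_only` is an accepted spelling of one of the PDFParser OCR strategy names; the class name `OcrStrategyRequest` is hypothetical and only serves the illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class OcrStrategyRequest {
    // Assumption: tika-server is running locally on its default port.
    static final String TIKA_ENDPOINT = "http://localhost:9998/tika";

    // Builds (but does not send) a PUT request that carries the PDF bytes
    // and the PDFParser header discussed in this report.
    static HttpRequest build(byte[] pdfBytes) {
        return HttpRequest.newBuilder(URI.create(TIKA_ENDPOINT))
                // Header under discussion: "X-Tika-PDF" prefix + property name.
                // The value "ocr_only" is an assumed spelling of the strategy.
                .header("X-Tika-PDFocrStrategy", "ocr_only")
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }
}
```

Sending the result with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against an affected server version should surface the error described above.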

      After looking into the source code, I believe the issue might be with the 'fillParseContext' method in the TikaResource.java file.

      The if statement first looks for a key that starts with the OCR header prefix. Since the PDFParser's property name contains 'ocr', it then tries to find a property named 'ocrStrategy' in the OCRParser class, which doesn't exist.
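The suspected failure mode can be illustrated with a small, self-contained sketch. This is not the actual TikaResource code: the class `HeaderRoutingSketch`, the `Target` enum, and both routing methods are hypothetical, and the prefix constants are simply modeled on the header names in this report. It only shows how a loose match on "ocr" would send a PDFParser header to the OCR configuration, while a strict prefix match would not.

```java
import java.util.Locale;

final class HeaderRoutingSketch {
    // Hypothetical prefixes, modeled on the header names in this report.
    static final String OCR_PREFIX = "X-Tika-OCR";
    static final String PDF_PREFIX = "X-Tika-PDF";

    enum Target { OCR_CONFIG, PDF_CONFIG, IGNORED }

    // Loose routing: any header that mentions "ocr" is treated as an OCR
    // header, so PDF headers whose property name contains "ocr" are misrouted.
    static Target routeLoosely(String header) {
        String lower = header.toLowerCase(Locale.ROOT);
        if (lower.contains("ocr")) {
            return Target.OCR_CONFIG;
        }
        if (lower.startsWith(PDF_PREFIX.toLowerCase(Locale.ROOT))) {
            return Target.PDF_CONFIG;
        }
        return Target.IGNORED;
    }

    // Strict routing: only the full prefix (compared case-insensitively)
    // decides which configuration object receives the property.
    static Target routeByPrefix(String header) {
        if (header.regionMatches(true, 0, OCR_PREFIX, 0, OCR_PREFIX.length())) {
            return Target.OCR_CONFIG;
        }
        if (header.regionMatches(true, 0, PDF_PREFIX, 0, PDF_PREFIX.length())) {
            return Target.PDF_CONFIG;
        }
        return Target.IGNORED;
    }
}
```

With these definitions, `routeLoosely("X-Tika-PDFocrStrategy")` yields `OCR_CONFIG`, which matches the symptom reported here, whereas `routeByPrefix` yields `PDF_CONFIG` for the same header.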

            People

            • Assignee: Tim Allison (tallison)
            • Reporter: Kevin Oberlag (koberlag)
            • Votes: 0
            • Watchers: 2