  1. Jackrabbit Oak
  2. OAK-5048 Upgrade to Tika 1.15 version
  3. OAK-6414

Use Tika config to determine non indexed mimeTypes

    Details

    • Type: Technical task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.4, 1.8.0, 1.6.18
    • Component/s: lucene
    • Labels:
      None

      Description

      OAK-2895 added support for not loading binary content whose mimeType has been excluded from indexing by configuring EmptyParser for it. That approach used a lazy InputStream and relied on Tika not accessing the stream when none of the parsers was going to touch the file.

      However, as seen while upgrading to Tika 1.15, Tika now checks whether the InputStream supports marking, so the stream is accessed even when the mimeType is mapped to EmptyParser.
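
      The effect can be illustrated with a minimal sketch, assuming a lazy stream that opens its underlying binary on first access. The class name LazyStreamSketch and its opened() helper are hypothetical and are not Oak's actual implementation; the sketch only shows why a mark-support check alone is enough to force the binary to be loaded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical illustration, not Oak code: a stream that defers opening
// the underlying binary until some method actually needs it.
class LazyStreamSketch extends InputStream {
    private final Supplier<InputStream> opener;
    private InputStream delegate;

    LazyStreamSketch(Supplier<InputStream> opener) {
        this.opener = opener;
    }

    // True once the underlying binary has been opened.
    boolean opened() {
        return delegate != null;
    }

    private InputStream delegate() {
        if (delegate == null) {
            delegate = opener.get(); // the expensive load happens here
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    // Answering this question requires the real stream, so a caller that
    // only checks mark support still triggers the load.
    @Override
    public boolean markSupported() {
        return delegate().markSupported();
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}
```

      With such a stream, a caller that invokes markSupported() before deciding whether to parse defeats the laziness, which is why the exclusion has to be decided before the stream is handed to Tika.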

      To support this change, the logic on the Oak side needs to check explicitly, by reading tika-config.xml, which mimeTypes have been configured with EmptyParser.
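
      A minimal sketch of such a check, using only the JDK XML parser: it assumes the usual tika-config.xml layout in which each parser element carries a class attribute and nested mime elements. The class and method names (EmptyParserMimeTypes, nonIndexedMimeTypes) are hypothetical and do not reflect the code committed for this issue.

```java
import java.io.InputStream;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the actual Oak implementation.
class EmptyParserMimeTypes {
    static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    // Returns the mimeTypes listed under any <parser> element whose class
    // attribute is EmptyParser, in document order.
    static Set<String> nonIndexedMimeTypes(InputStream tikaConfig) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations as a precaution against XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(tikaConfig);

        Set<String> result = new LinkedHashSet<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (!EMPTY_PARSER.equals(parser.getAttribute("class"))) {
                continue; // only EmptyParser entries mark excluded types
            }
            NodeList mimes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimes.getLength(); j++) {
                String mime = mimes.item(j).getTextContent().trim();
                if (!mime.isEmpty()) {
                    result.add(mime);
                }
            }
        }
        return result;
    }
}
```

      The indexer could then compare a binary's mimeType against the returned set and skip creating the stream altogether, instead of depending on Tika leaving the stream untouched.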

            People

            • Assignee:
              chetanm Chetan Mehrotra
              Reporter:
              chetanm Chetan Mehrotra
