  1. Jackrabbit Oak
  2. OAK-5048 Upgrade to Tika 1.15 version
  3. OAK-6414

Use Tika config to determine non indexed mimeTypes

Details

    • Type: Technical task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.4, 1.8.0, 1.6.18
    • Component/s: lucene
    • Labels: None

    Description

      With OAK-2895, support was added to avoid loading binary content whose mimeType has been excluded from indexing by configuring EmptyParser for it. That approach used a lazy InputStream and relied on the fact that Tika would not access the stream if none of the parsers was going to touch that file.

      However, as seen while upgrading to Tika 1.15, Tika now checks whether the InputStream supports marking, so the stream is accessed even when no parser would read the file.

      To support this change, the logic on the Oak side needs to check explicitly, by reading tika-config.xml, which mimeTypes have been configured with EmptyParser.
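
      The explicit check could look roughly like the sketch below. It is not the actual Oak change: the class name EmptyParserMimeTypes and the method fromConfig are hypothetical, and it assumes the usual tika-config.xml layout in which each <parser class="..."> element lists its types in <mime> children. It uses only the JDK DOM parser to collect the mimeTypes mapped to org.apache.tika.parser.EmptyParser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only (not part of Oak or Tika).
public class EmptyParserMimeTypes {
    static final String EMPTY_PARSER = "org.apache.tika.parser.EmptyParser";

    // Returns the mimeTypes listed under <parser class="...EmptyParser"> elements.
    public static Set<String> fromConfig(InputStream tikaConfig) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(tikaConfig);
        Set<String> mimeTypes = new TreeSet<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            Element parser = (Element) parsers.item(i);
            if (!EMPTY_PARSER.equals(parser.getAttribute("class"))) {
                continue; // only EmptyParser entries mark a type as not indexed
            }
            NodeList mimes = parser.getElementsByTagName("mime");
            for (int j = 0; j < mimes.getLength(); j++) {
                mimeTypes.add(mimes.item(j).getTextContent().trim());
            }
        }
        return mimeTypes;
    }

    public static void main(String[] args) throws Exception {
        // Sample config; the mimeTypes chosen here are assumptions for the example.
        String xml =
            "<properties><parsers>"
            + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
            + "<parser class=\"org.apache.tika.parser.EmptyParser\">"
            + "<mime>application/zip</mime><mime>application/pdf</mime>"
            + "</parser>"
            + "</parsers></properties>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromConfig(in)); // prints [application/pdf, application/zip]
    }
}
```

      With such a set available up front, a binary whose mimeType is in the set can be skipped before any InputStream is handed to Tika, so the behaviour no longer depends on Tika leaving the stream untouched.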

          People

            Assignee: Chetan Mehrotra (chetanm)
            Reporter: Chetan Mehrotra (chetanm)