Jackrabbit Content Repository / JCR-4551

Use the normalized MediaType to check if the given MediaType should be indexed

Details

    Description

      We have configured some MIME types in the tika.config file as follows:

      <parser name="parse-office" class="org.apache.tika.parser.microsoft.OfficeParser">   
        <mime>application/msword</mime> 
        <mime>application/vnd.ms-excel</mime> 
        <mime>application/msexcel</mime> 
        <mime>application/vnd.ms-powerpoint</mime>
      </parser>
      

      Because we have old Excel files with the MIME type application/msexcel, they should be parsed with the OfficeParser. Tika internally normalizes this MIME type to application/vnd.ms-excel via the MediaTypeRegistry.

      The NodeIndexer should also use the normalized MediaType in #isSupportedMediaType(String type), so that the check sees the same type that Tika resolves internally.

      Otherwise, files with the old MIME types will no longer be indexed.
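
The proposed check can be sketched as follows. This is a minimal illustration, not the actual NodeIndexer code: the class name `MediaTypeCheckSketch`, the `ALIASES` map, and the `SUPPORTED` set are hypothetical stand-ins. In the real fix the alias lookup would presumably be delegated to Tika's MediaTypeRegistry, and the supported set would come from the configured parsers.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MediaTypeCheckSketch {

    // Hypothetical stand-in for the registry's alias table: alias -> canonical type.
    private static final Map<String, String> ALIASES =
            Map.of("application/msexcel", "application/vnd.ms-excel");

    // Hypothetical stand-in for the canonical types the configured parsers support.
    private static final Set<String> SUPPORTED = Set.of(
            "application/msword",
            "application/vnd.ms-excel",
            "application/vnd.ms-powerpoint");

    // Map an alias to its canonical type; unknown types are returned unchanged.
    static String normalize(String type) {
        String key = type.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(key, key);
    }

    // Check support against the normalized type rather than the raw string.
    static boolean isSupportedMediaType(String type) {
        return type != null && SUPPORTED.contains(normalize(type));
    }

    public static void main(String[] args) {
        System.out.println(isSupportedMediaType("application/msexcel"));
        System.out.println(isSupportedMediaType("text/x-unknown"));
    }
}
```

Under these assumptions, the legacy type application/msexcel is accepted because it is normalized before the lookup, whereas a comparison on the raw string would reject it.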

          People

            c_koell Claus Köll