Description
We have configured some MIME types in the tika.config file as follows:
<parser name="parse-office" class="org.apache.tika.parser.microsoft.OfficeParser">
  <mime>application/msword</mime>
  <mime>application/vnd.ms-excel</mime>
  <mime>application/msexcel</mime>
  <mime>application/vnd.ms-powerpoint</mime>
</parser>
We have old Excel files stored with the MIME type application/msexcel, and they should be parsed with the OfficeParser. Tika internally normalizes this MIME type to application/vnd.ms-excel via the MediaTypeRegistry.
The NodeIndexer should also use the normalized MediaType in #isSupportedMediaType(String type), so that the type stored on the node is compared in the same form that Tika uses.
Otherwise, files with the old MIME types will no longer be indexed.
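
To illustrate the requested behaviour, here is a minimal, self-contained sketch. It does not use the Tika or Jackrabbit classes: the alias table is a hypothetical stand-in for the MediaTypeRegistry, the class name NormalizedTypeCheck is made up for this example, and it assumes that the set of supported types only holds the normalized names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the actual NodeIndexer code.
class NormalizedTypeCheck {

    // Stand-in for the registry's alias handling (assumption: one alias entry).
    private static final Map<String, String> ALIASES = new HashMap<>();

    // Assumption: supported types are kept in their normalized form.
    private static final Set<String> SUPPORTED = new HashSet<>();

    static {
        ALIASES.put("application/msexcel", "application/vnd.ms-excel");
        SUPPORTED.add("application/msword");
        SUPPORTED.add("application/vnd.ms-excel");
        SUPPORTED.add("application/vnd.ms-powerpoint");
    }

    // Map an alias to its normalized name; other types pass through unchanged.
    static String normalize(String type) {
        String key = type.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(key, key);
    }

    // Lookup without normalization: the old alias is not found.
    static boolean isSupportedRaw(String type) {
        return SUPPORTED.contains(type);
    }

    // Requested behaviour: normalize first, then look up.
    static boolean isSupportedMediaType(String type) {
        return SUPPORTED.contains(normalize(type));
    }

    public static void main(String[] args) {
        String old = "application/msexcel";
        System.out.println("raw lookup:        " + isSupportedRaw(old));
        System.out.println("normalized lookup: " + isSupportedMediaType(old));
    }
}
```

In this sketch the raw lookup misses application/msexcel, while the normalized lookup accepts it. In the real code the normalization step would presumably come from the MediaTypeRegistry rather than from a hand-written map.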