Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Throughout our documentation and unit tests, we declare that parsers from a namespace other than org.apache.tika should come first. The problem is that DefaultParser iterates through the list of parsers in order and, for each supported mime type, overwrites whichever parser was previously registered for that type, so the last parser in the list wins.
So, if there's a custom parser com.acme.parser.PDFParser that supports application/pdf, that will be added to the map of parsers in DefaultParser first and then overwritten by org.apache.tika's PDFParser.
We should instead sort non-o.a.t. parsers last, no?
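To make the ordering problem concrete, here is a minimal sketch of the behaviour described above. It does not use any Tika classes: FakeParser, buildMap, and sortCustomLast are hypothetical names made up for illustration, and the map-building loop is only an assumption about how the overwrite happens, based on this description rather than on the DefaultParser source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParserOrderSketch {

    /** Hypothetical stand-in for a parser: a class name plus the mime types it supports. */
    static final class FakeParser {
        final String className;
        final List<String> mimeTypes;

        FakeParser(String className, List<String> mimeTypes) {
            this.className = className;
            this.mimeTypes = mimeTypes;
        }
    }

    /** Mimics the overwrite-on-iteration behaviour: the last parser seen for a mime type wins. */
    static Map<String, String> buildMap(List<FakeParser> parsers) {
        Map<String, String> byType = new LinkedHashMap<>();
        for (FakeParser p : parsers) {
            for (String type : p.mimeTypes) {
                byType.put(type, p.className); // a later parser replaces an earlier one
            }
        }
        return byType;
    }

    /** True for any parser outside the org.apache.tika namespace. */
    static boolean isCustom(FakeParser p) {
        return !p.className.startsWith("org.apache.tika.");
    }

    /** Proposed ordering: org.apache.tika parsers first, custom parsers last (stable sort). */
    static List<FakeParser> sortCustomLast(List<FakeParser> parsers) {
        List<FakeParser> sorted = new ArrayList<>(parsers);
        sorted.sort((a, b) -> Boolean.compare(isCustom(a), isCustom(b)));
        return sorted;
    }

    public static void main(String[] args) {
        // Current ordering as described in this issue: the custom parser comes first.
        List<FakeParser> customFirst = Arrays.asList(
                new FakeParser("com.acme.parser.PDFParser", Arrays.asList("application/pdf")),
                new FakeParser("org.apache.tika.parser.pdf.PDFParser", Arrays.asList("application/pdf")));

        System.out.println("custom first: " + buildMap(customFirst).get("application/pdf"));
        System.out.println("custom last:  " + buildMap(sortCustomLast(customFirst)).get("application/pdf"));
    }
}
```

With the custom parser sorted first, the map ends up pointing application/pdf at the org.apache.tika parser; with custom parsers sorted last, com.acme.parser.PDFParser is the one that remains registered.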