[TIKA-3750] Bug in sorting parsers - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.4.1
Component/s: None
Labels: None

Description

Throughout our documentation and unit tests, we state that parsers in a namespace other than org.apache.tika should come first. The problem is that DefaultParser iterates through the list of parsers in order and, for each supported mime type, overwrites any parser already registered for that type.

So, if there's a custom parser com.acme.parser.PDFParser that supports application/pdf, it is added to the map of parsers in DefaultParser first and is then overwritten by org.apache.tika's PDFParser.

We should instead sort non-o.a.t. parsers last, no?
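
The overwrite described above can be illustrated with a minimal, self-contained sketch. It does not use Tika's classes: FakeParser, buildMap, and sortCustomLast are hypothetical names, and the plain HashMap only stands in for whatever map DefaultParser builds internally. The sketch assumes, as the description states, that a later parser replaces an earlier one registered for the same mime type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParserOrderSketch {

    // Hypothetical stand-in for a parser: a class name plus one supported mime type.
    static final class FakeParser {
        final String className;
        final String mimeType;

        FakeParser(String className, String mimeType) {
            this.className = className;
            this.mimeType = mimeType;
        }
    }

    // Iterate in list order; a later parser overwrites an earlier one
    // registered for the same mime type (the behavior the issue describes).
    static Map<String, String> buildMap(List<FakeParser> parsers) {
        Map<String, String> byMimeType = new HashMap<>();
        for (FakeParser p : parsers) {
            byMimeType.put(p.mimeType, p.className);
        }
        return byMimeType;
    }

    // Proposed ordering: org.apache.tika parsers first, all others last,
    // so that the custom parsers are the ones that survive the overwrite.
    // List.sort is stable, so relative order within each group is kept.
    static List<FakeParser> sortCustomLast(List<FakeParser> parsers) {
        List<FakeParser> sorted = new ArrayList<>(parsers);
        sorted.sort(Comparator.comparing(
                (FakeParser p) -> !p.className.startsWith("org.apache.tika.")));
        return sorted;
    }

    public static void main(String[] args) {
        List<FakeParser> customFirst = Arrays.asList(
                new FakeParser("com.acme.parser.PDFParser", "application/pdf"),
                new FakeParser("org.apache.tika.parser.pdf.PDFParser", "application/pdf"));

        // Custom parser listed first: it is overwritten by the Tika parser.
        System.out.println(buildMap(customFirst).get("application/pdf"));

        // Custom parser sorted last: it overwrites the Tika parser.
        System.out.println(buildMap(sortCustomLast(customFirst)).get("application/pdf"));
    }
}
```

With the custom parser listed first, the first line printed is org.apache.tika.parser.pdf.PDFParser; after sorting custom parsers last, the second line printed is com.acme.parser.PDFParser.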

People

Assignee: Unassigned

Reporter: Tim Allison

Votes: 0

Watchers: 2

Dates

Created: 04/May/22 16:00

Updated: 04/May/22 21:06

Resolved: 04/May/22 19:32