Tika / TIKA-1689

Parser sort order change in TIKA-1517 breaks parser override capability

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10
    • Component/s: core
    • Labels: None

      Description

      In Tika 1.9, the comparator used to sort parsers (now in ServiceLoaderUtils) returns them in the reverse of the order they had in prior versions, when the comparator lived in DefaultParser. This change was made under TIKA-1517.
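      To illustrate why the comparator's direction matters, here is a minimal plain-Java sketch. It assumes, for illustration only, a comparator that places classes under `org.apache.tika.` ahead of third-party classes and otherwise sorts by name. The comparator and the class name `com.example.CustomHtmlParser` are hypothetical and are not copied from ServiceLoaderUtils or DefaultParser. Reversing the comparator moves the custom parser from the end of the list to the front.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortOrderSketch {
    // Hypothetical comparator: Tika's own classes sort before third-party
    // classes; ties are broken by fully qualified class name.
    static final Comparator<String> TIKA_FIRST = (a, b) -> {
        boolean tikaA = a.startsWith("org.apache.tika.");
        boolean tikaB = b.startsWith("org.apache.tika.");
        if (tikaA != tikaB) {
            return tikaA ? -1 : 1;
        }
        return a.compareTo(b);
    };

    // Returns a sorted copy so the input list is left untouched.
    static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> loaded = List.of(
                "com.example.CustomHtmlParser",            // hypothetical custom parser
                "org.apache.tika.parser.html.HtmlParser");

        // Order the report describes for earlier versions: custom parser last.
        System.out.println(sorted(loaded, TIKA_FIRST));
        // Reversed order: custom parser first, HtmlParser last.
        System.out.println(sorted(loaded, TIKA_FIRST.reversed()));
    }
}
```

      The first line printed is `[org.apache.tika.parser.html.HtmlParser, com.example.CustomHtmlParser]` and the second is the same two names swapped.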

      This change broke one of our customizations, in which we use our own parser instead of Tika's HtmlParser to process HTML. We use the service loader mechanism (creating our own META-INF/services/org.apache.tika.parser.Parser file) and rely on the order in which the list returned by DefaultParser.getDefaultParsers() is evaluated. We expect that when Tika builds the map of MIME types to parsers, it first adds entries for HtmlParser and then overwrites them with entries for our custom parser.
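      The override depends on a "last registration wins" rule when the type-to-parser map is built. The following minimal sketch models that rule under a simplifying assumption: each parser is reduced to a name plus a set of MIME type strings. The `ParserStub` record and `buildMap` method are hypothetical stand-ins, not Tika classes, and the MIME type sets are illustrative. Iterating the list in order and calling `Map.put` for each supported type means a later parser replaces an earlier parser's entry.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LastWinsSketch {
    // Hypothetical stand-in for a parser: a name and the MIME types it claims.
    record ParserStub(String name, Set<String> types) {}

    // Walks the parsers in list order; a later parser overwrites the entry
    // an earlier parser registered for the same MIME type.
    static Map<String, String> buildMap(List<ParserStub> parsers) {
        Map<String, String> byType = new HashMap<>();
        for (ParserStub p : parsers) {
            for (String type : p.types()) {
                byType.put(type, p.name());
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        ParserStub tikaHtml = new ParserStub("HtmlParser",
                Set.of("text/html", "application/xhtml+xml"));
        ParserStub custom = new ParserStub("CustomHtmlParser", Set.of("text/html"));

        // Custom parser listed last: it takes over text/html.
        System.out.println(buildMap(List.of(tikaHtml, custom)).get("text/html"));
        // Custom parser listed first: HtmlParser overwrites it.
        System.out.println(buildMap(List.of(custom, tikaHtml)).get("text/html"));
    }
}
```

      This prints `CustomHtmlParser` and then `HtmlParser`. The map is identical except for which parser was visited last, which is why a reversed sort order silently undoes the override.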

      I realize that relying on this ordering is brittle, and I found that blacklisting HtmlParser is a valid workaround in Tika 1.9. However, in case this parser ordering change was not intentional, I figured I'd mention it.
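      Blacklisting works because it removes the competing parser before the map is built, so the result no longer depends on sort order. The sketch below models that idea with plain strings. The `resolve` method and the blacklist set are hypothetical and do not reflect how Tika's own configuration expresses the exclusion.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BlacklistSketch {
    // Builds a MIME-type-to-parser map from (parser, type) registrations,
    // skipping every parser named in the blacklist. Later registrations
    // still overwrite earlier ones, as in the unfiltered case.
    static Map<String, String> resolve(List<Map.Entry<String, String>> registrations,
                                       Set<String> blacklist) {
        Map<String, String> byType = new HashMap<>();
        for (Map.Entry<String, String> reg : registrations) {
            if (!blacklist.contains(reg.getKey())) {
                byType.put(reg.getValue(), reg.getKey());
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        Set<String> blacklist = Set.of("HtmlParser");
        List<Map.Entry<String, String>> customFirst = List.of(
                Map.entry("CustomHtmlParser", "text/html"),
                Map.entry("HtmlParser", "text/html"));
        List<Map.Entry<String, String>> customLast = List.of(
                Map.entry("HtmlParser", "text/html"),
                Map.entry("CustomHtmlParser", "text/html"));

        // With HtmlParser excluded, both orders resolve text/html the same way.
        System.out.println(resolve(customFirst, blacklist).get("text/html"));
        System.out.println(resolve(customLast, blacklist).get("text/html"));
    }
}
```

      Both lines print `CustomHtmlParser`, whichever order the registrations arrive in.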


            People

            • Assignee: Unassigned
            • Reporter: David Warren (dwarren)
            • Votes: 0
            • Watchers: 4
