  1. Tika
  2. TIKA-288

Support override parsers in AutoDetectParser

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 0.4
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      In some situations it is useful to substitute an alternative parser for one document type while still using the general parser framework and the rest of the full set of parsers.

      For example, when processing HTML documents the current HtmlParser doesn't pass through all of the tags that a vertical crawler might want.

      I'm proposing an alternative constructor, something like:

      public AutoDetectParser(Map<Class<? extends Parser>, Parser> overrides)

      where each key would be the class of a standard Tika parser, and each value is the Parser that overrides it.
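
      A minimal sketch of how such an override map might be applied to a set of default parsers. Everything here is an assumption made for illustration: the Parser interface, the three parser classes, and applyOverrides are hypothetical stand-ins and are not Tika's actual API. The idea shown is only the lookup by parser class that the proposed constructor implies.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: these types are hypothetical stand-ins, NOT Tika's real classes.
final class ParserOverrides {

    /** Hypothetical minimal parser contract (Tika's real Parser interface differs). */
    interface Parser {
        String parse(String input);
    }

    /** Stands in for a stock HTML parser that does not pass tags through. */
    static final class DefaultHtmlParser implements Parser {
        @Override
        public String parse(String input) {
            return input.replaceAll("<[^>]*>", ""); // drops all markup
        }
    }

    /** Stands in for any other stock parser that should stay untouched. */
    static final class DefaultTextParser implements Parser {
        @Override
        public String parse(String input) {
            return input;
        }
    }

    /** Stands in for a crawler-specific parser that keeps every tag. */
    static final class PassThroughHtmlParser implements Parser {
        @Override
        public String parse(String input) {
            return input;
        }
    }

    /**
     * Returns a copy of the default registry (content type to parser) in which
     * every parser whose exact class is a key in overrides has been replaced
     * by the corresponding value. The input maps are not modified.
     */
    static Map<String, Parser> applyOverrides(
            Map<String, Parser> defaults,
            Map<Class<? extends Parser>, Parser> overrides) {
        Map<String, Parser> result = new LinkedHashMap<>();
        for (Map.Entry<String, Parser> entry : defaults.entrySet()) {
            Parser replacement = overrides.get(entry.getValue().getClass());
            result.put(entry.getKey(), replacement != null ? replacement : entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Parser> defaults = new LinkedHashMap<>();
        defaults.put("text/html", new DefaultHtmlParser());
        defaults.put("text/plain", new DefaultTextParser());

        Map<Class<? extends Parser>, Parser> overrides = new HashMap<>();
        overrides.put(DefaultHtmlParser.class, new PassThroughHtmlParser());

        Map<String, Parser> merged = applyOverrides(defaults, overrides);
        System.out.println(merged.get("text/html").parse("<b>hi</b>"));
    }
}
```

      Keying the overrides by parser class rather than by content type means one entry replaces the parser for every type the standard parser handles, which matches the constructor signature proposed above. This sketch matches on the exact class only and does not consider subclasses.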

            People

              Assignee: Unassigned
              Reporter: Kenneth William Krugler (kkrugler)
              Votes: 0
              Watchers: 0
