  1. Nutch
  2. NUTCH-1887

Specify HTMLMapper to use in TikaParser


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9
    • Fix Version/s: 1.10
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      The TikaParser currently relies on the default HTMLMapper used by Tika. Tika uses the HTMLMapper to filter and normalise the HTML elements that it passes on as SAX events. By default this is a DefaultHtmlMapper, which removes some of the input.

      This patch makes it possible to specify which HTMLMapper implementation to use (if any).
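      The idea of choosing a mapper implementation by class name, or none at all, can be sketched as follows. This is a minimal, self-contained illustration and not the code from the patch: `ElementMapper`, `WhitelistMapper`, `IdentityMapper` and `MapperSelection.load` are hypothetical stand-ins, with `ElementMapper` loosely modelled on the `mapSafeElement` method of Tika's `org.apache.tika.parser.html.HtmlMapper` interface. In real Tika code the chosen mapper would be handed to the parser through the `ParseContext`; that step is left out here so the sketch runs without Tika on the classpath.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a mapper interface: returning null means
// "drop this element", any other value is the normalised element name.
interface ElementMapper {
    String mapSafeElement(String name);
}

// Hypothetical mapper that keeps only a small set of elements,
// which is the kind of filtering a default mapper might apply.
final class WhitelistMapper implements ElementMapper {
    private static final Set<String> SAFE =
            new HashSet<>(Arrays.asList("p", "a", "title", "h1"));

    @Override
    public String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return SAFE.contains(lower) ? lower : null;
    }
}

// Hypothetical mapper that keeps every element and only lower-cases its name.
final class IdentityMapper implements ElementMapper {
    @Override
    public String mapSafeElement(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}

final class MapperSelection {
    // Returns null when no class name is configured, so the caller can
    // fall back to whatever the parser does by default.
    static ElementMapper load(String className) throws ReflectiveOperationException {
        if (className == null || className.trim().isEmpty()) {
            return null;
        }
        Class<?> cls = Class.forName(className.trim());
        return cls.asSubclass(ElementMapper.class)
                  .getDeclaredConstructor()
                  .newInstance();
    }

    public static void main(String[] args) throws Exception {
        // The class name would normally come from configuration.
        ElementMapper mapper = load(IdentityMapper.class.getName());
        System.out.println("div kept as: " + mapper.mapSafeElement("DIV"));

        ElementMapper strict = load(WhitelistMapper.class.getName());
        System.out.println("div under whitelist: " + strict.mapSafeElement("DIV"));
    }
}
```

      Returning null for an empty class name mirrors the "(if any)" in the description: the caller decides what to do when no mapper is configured.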


          People

            Assignee: Unassigned
            Reporter: Julien Nioche (jnioche)
            Votes: 0
            Watchers: 5
