Nutch / NUTCH-2413

Parsing fetcher to respect property "parse.filter.urls"


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: fetcher, parser
    • Labels: None
    • Environment: Apache Nutch release 1.13

Description

    Consider a crawl in which we want to:
    (1) run fetching and parsing together ("fetcher.parse" set to "true"), and
    (2) skip the URL filters during this combined phase.

    When parsing runs as a separate step, condition (2) can be met by setting "parse.filter.urls" to "false".
    However, "parse.filter.urls" is ignored when fetching and parsing run together, so the URL filters are applied anyway.
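To make the expected behavior concrete, here is a minimal Java sketch. It is not Nutch's actual code: the class `OutlinkFilterSketch`, the method `filterOutlinks`, and the use of a plain `Map` in place of Nutch's Hadoop-based configuration are all hypothetical. It also assumes "parse.filter.urls" defaults to "true" when unset. The point it illustrates is that the filtering decision should depend only on "parse.filter.urls" and never on "fetcher.parse", so parsing inside the fetcher and the standalone parse step behave the same way.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not Nutch code): outlink filtering that honors
 * "parse.filter.urls" whether parsing runs inside the fetcher
 * ("fetcher.parse" = true) or as a separate parse step.
 */
public class OutlinkFilterSketch {

    // Assumption: URL filters are applied unless explicitly disabled.
    static boolean filterUrlsEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault("parse.filter.urls", "true"));
    }

    // Intended to be shared by both parsing paths; it consults only
    // "parse.filter.urls" and deliberately ignores "fetcher.parse".
    static List<String> filterOutlinks(Map<String, String> conf,
                                       List<String> outlinks,
                                       Predicate<String> urlFilter) {
        if (!filterUrlsEnabled(conf)) {
            return outlinks; // filtering disabled: keep every outlink
        }
        return outlinks.stream().filter(urlFilter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> links = List.of("http://example.com/a", "http://other.org/b");
        // Stand-in for a configured URL filter: keep only example.com links.
        Predicate<String> sameHost = u -> u.startsWith("http://example.com/");

        // Fetch and parse together, with URL filtering turned off.
        Map<String, String> conf = Map.of("fetcher.parse", "true",
                                          "parse.filter.urls", "false");
        System.out.println(filterOutlinks(conf, links, sameHost));
    }
}
```

With the configuration in `main`, both outlinks are kept: filtering is disabled even though parsing happens inside the fetcher, which is the behavior the issue asks for.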


People

    • Sebastian Nagel (snagel)
    • Marcos Bori (maborec)
    • Votes: 0
    • Watchers: 5
