Nutch / NUTCH-2413

Parsing fetcher to respect property "parse.filter.urls"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: fetcher, parser
    • Labels: None
    • Environment: Apache Nutch release 1.13.

    Description

      Consider a setup in which we want to:
      (1) Execute fetch and parse together ("fetcher.parse" set to "true").
      (2) Avoid applying the URL filters during this phase.

      When parsing is executed as a separate process, condition (2) can be configured by setting "parse.filter.urls" to "false".
      However, "parse.filter.urls" is ignored when the fetch and parse phases are executed together.
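
      For reference, the combination described above might look roughly like this in conf/nutch-site.xml. This is only a sketch: it assumes the usual Hadoop-style property layout used by Nutch configuration files, and only the two property names and values are taken from this report.

      ```xml
      <?xml version="1.0"?>
      <configuration>
        <!-- (1) Parse documents inside the fetcher instead of in a separate parse job -->
        <property>
          <name>fetcher.parse</name>
          <value>true</value>
        </property>
        <!-- (2) Ask the parser not to apply the URL filters.
             Per this report, release 1.13 ignores this value when fetcher.parse is true. -->
        <property>
          <name>parse.filter.urls</name>
          <value>false</value>
        </property>
      </configuration>
      ```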

          People

            snagel Sebastian Nagel
            maborec Marcos Bori
            Votes: 0
            Watchers: 5
