  1. Nutch
  2. NUTCH-1303

Fetcher to skip queues for URLs getting repeated exceptions, based on percentage

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4
    • Fix Version/s: None
    • Component/s: fetcher
    • Labels:
    • Patch Info:
      Patch Available

      Description

      As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping queues with a high exception count is a good solution, but it is hard to choose a value for fetcher.max.exceptions.per.queue when queue sizes differ.
      I suggest defining a ratio instead of an absolute value: if the ratio of exceptions to requests for a queue exceeds the configured limit, that queue is cleared.
      Limiting exceptions alone is also not sufficient. fetcher.throughput.threshold.pages ensures that a worthwhile fetch throughput is maintained against slow hosts, but when it triggers it clears all queues, not only the slow ones. I suggest that this threshold, like fetcher.max.exceptions.per.queue, be enforced on each queue individually rather than on all of them together.
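
      The two per-queue checks proposed above could look roughly like the sketch below. This is only an illustration and is not taken from the attached patch or from the Nutch Fetcher code: the class QueueStats, its method names, and the minRequests guard (which avoids clearing a queue after a single early failure) are all hypothetical, and the limits would have to come from new configuration properties that do not exist in Nutch today.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-queue counters. Not a Nutch class and not the
 * implementation in NUTCH-1303.patch; names are for illustration only.
 */
class QueueStats {
    private final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger exceptions = new AtomicInteger();
    private final AtomicInteger pages = new AtomicInteger();

    /** Record one fetch attempt for this queue. */
    void recordRequest(boolean failed) {
        requests.incrementAndGet();
        if (failed) {
            exceptions.incrementAndGet();
        } else {
            pages.incrementAndGet();
        }
    }

    /**
     * True when exceptions / requests is above maxRatio.
     * minRequests is an assumed guard so that small samples
     * do not cause the queue to be cleared.
     */
    boolean exceedsExceptionRatio(double maxRatio, int minRequests) {
        int r = requests.get();
        if (r < minRequests) {
            return false;
        }
        return (double) exceptions.get() / r > maxRatio;
    }

    /**
     * True when this queue alone fetched fewer pages per second
     * than minPagesPerSec over the elapsed time.
     */
    boolean belowThroughput(double minPagesPerSec, double elapsedSec) {
        if (elapsedSec <= 0) {
            return false;
        }
        return pages.get() / elapsedSec < minPagesPerSec;
    }
}
```

      With counters like these, the fetcher would clear only the queue whose own ratio or throughput crosses the limit, so a large queue and a small queue are judged by the same proportion instead of the same absolute count.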

        Attachments

        1. NUTCH-1303.patch
          4 kB
          behnam nikbakht

            People

            • Assignee:
              Unassigned
              Reporter:
              behnam nikbakht
