  1. Nutch
  2. NUTCH-1303

Fetcher to skip queues for URLs getting repeated exceptions, based on percentage

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.4
    • None
    • fetcher
    • Patch Available

    Description

      As described in https://issues.apache.org/jira/browse/NUTCH-769, skipping queues that accumulate many exceptions is a good solution, but it is hard to choose a value for fetcher.max.exceptions.per.queue when queues differ in size.
      I suggest defining a ratio instead of an absolute value: if the ratio of exceptions to requests for a queue exceeds the configured ratio, the queue is cleared.
      Guarding the fetcher against high exception counts is also not sufficient on its own. The value of fetcher.throughput.threshold.pages ensures that a worthwhile fetch throughput is maintained against slow hosts, but when it triggers it clears all queues, not just the slow ones. I suggest that this threshold, like fetcher.max.exceptions.per.queue, be enforced per queue rather than across all queues.
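
      The ratio-based check suggested above could be sketched roughly as follows. This is only an illustration and is not taken from NUTCH-1303.patch: the class name QueueExceptionRatio, its methods, and the minRequests guard (which keeps one early failure from clearing a small queue) are all hypothetical and do not exist in Nutch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-queue counter (not part of Nutch): tracks requests and
 * exceptions for one fetch queue and reports when the exception ratio
 * exceeds a configured maximum.
 */
class QueueExceptionRatio {
    private final AtomicInteger requests = new AtomicInteger();
    private final AtomicInteger exceptions = new AtomicInteger();
    private final double maxRatio;   // assumed config value, e.g. 0.5 for 50%
    private final int minRequests;   // assumed guard: no decision on tiny samples

    QueueExceptionRatio(double maxRatio, int minRequests) {
        this.maxRatio = maxRatio;
        this.minRequests = minRequests;
    }

    /** Record one fetch attempt for this queue and whether it threw an exception. */
    void recordRequest(boolean failed) {
        requests.incrementAndGet();
        if (failed) {
            exceptions.incrementAndGet();
        }
    }

    /** True when enough requests were seen and exceptions/requests exceeds maxRatio. */
    boolean shouldClear() {
        int seen = requests.get();
        if (seen < minRequests) {
            return false;
        }
        return (double) exceptions.get() / seen > maxRatio;
    }

    public static void main(String[] args) {
        QueueExceptionRatio queue = new QueueExceptionRatio(0.5, 4);
        queue.recordRequest(true);
        queue.recordRequest(true);
        queue.recordRequest(true);
        queue.recordRequest(false);
        System.out.println("clear queue: " + queue.shouldClear());
    }
}
```

      Because the decision depends on the ratio rather than on an absolute count, the same setting would apply to a queue of ten URLs and a queue of ten thousand, which is the difficulty with fetcher.max.exceptions.per.queue described above.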

      Attachments

        1. NUTCH-1303.patch
          4 kB
          behnam nikbakht

          People

            Assignee: Unassigned
            Reporter: behnam nikbakht