Nutch / NUTCH-769

Fetcher to skip queues for URLs getting repeated exceptions

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      As discussed on the mailing list (see http://www.mail-archive.com/nutch-user@lucene.apache.org/msg15360.html), this patch lets the Fetcher clear a URL queue once more than a set number of exceptions have been encountered in a row for that queue. This can speed up fetching substantially when target hosts are unresponsive (each such request would end in a TimeoutException), and it limits the cases where a whole fetch step is slowed down by a few queues.

      By default, the parameter fetcher.max.exceptions.per.queue is set to -1, which deactivates the feature.
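
The behaviour described above can be illustrated with a minimal sketch. This is not the code from NUTCH-769.patch: the class `ExceptionAwareQueue` and its method names are hypothetical, and the sketch assumes that a successful fetch resets the counter (which is how "in a row" is read here) and that the queue is cleared once the count exceeds the configured limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical per-host URL queue that drops its pending URLs after too many
 * consecutive exceptions. Illustration only, not the Nutch Fetcher code.
 */
class ExceptionAwareQueue {
    private final Deque<String> urls = new ArrayDeque<>();
    // Plays the role of fetcher.max.exceptions.per.queue; -1 disables the check.
    private final int maxExceptions;
    private int consecutiveExceptions = 0;

    ExceptionAwareQueue(int maxExceptions) {
        this.maxExceptions = maxExceptions;
    }

    void add(String url) {
        urls.add(url);
    }

    /** Returns the next URL to fetch, or null if the queue is empty. */
    String next() {
        return urls.poll();
    }

    int size() {
        return urls.size();
    }

    /** A successful fetch breaks the run of exceptions (assumption of this sketch). */
    void recordSuccess() {
        consecutiveExceptions = 0;
    }

    /**
     * Records one failed fetch. If the feature is enabled and the run of
     * exceptions now exceeds the limit, clears the queue.
     * Returns the number of URLs dropped (0 if nothing was cleared).
     */
    int recordException() {
        consecutiveExceptions++;
        if (maxExceptions < 0 || consecutiveExceptions <= maxExceptions) {
            return 0;
        }
        int dropped = urls.size();
        urls.clear();
        return dropped;
    }
}
```

With a limit of 2, the third exception in a row empties the queue, so the remaining URLs for that unresponsive host no longer hold up the fetch step. With the default of -1, `recordException()` never clears anything.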

      Attachments

        1. NUTCH-769.patch
          3 kB
          Julien Nioche
        2. NUTCH-769-2.patch
          4 kB
          Julien Nioche

            People

              Assignee: Andrzej Bialecki (ab)
              Reporter: Julien Nioche (jnioche)