Nutch / NUTCH-1284

Add site fetcher.max.crawl.delay as log output by default.


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: nutchgora, 1.5
    • Fix Version/s: 1.7, 2.2
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      Currently, when manually scanning our log output, we cannot tell which pages are subject to a crawl delay between successive fetches from the same site. The delay value should be logged alongside each fetch, for example:

      2012-02-19 12:33:33,031 INFO  fetcher.Fetcher - fetching http://nutch.apache.org/ (crawl.delay=XXXms)
      

      This would let us quickly determine whether the fetcher is actually applying a crawl delay.
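      A minimal Java sketch of the proposed message format follows. It is hypothetical and not taken from the attached patches: the class `CrawlDelayLogSketch` and the method `fetchMessage` are illustrative names, and the sketch assumes the delay is already known in milliseconds. It builds only the message text; the timestamp, level and `fetcher.Fetcher` prefix in the example line above would come from the logging layout.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not Nutch's actual Fetcher code: builds the
 * "fetching <url> (crawl.delay=<n>ms)" message proposed in this issue.
 */
public class CrawlDelayLogSketch {

    /**
     * Formats the fetch log message.
     *
     * @param url          the URL about to be fetched
     * @param crawlDelayMs delay in milliseconds between successive
     *                     requests to the same site
     */
    static String fetchMessage(String url, long crawlDelayMs) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "fetching %s (crawl.delay=%dms)", url, crawlDelayMs);
    }

    public static void main(String[] args) {
        // Assumption: a robots.txt "Crawl-delay: 5" (seconds), converted to ms.
        long crawlDelayMs = TimeUnit.SECONDS.toMillis(5);
        // Prints: fetching http://nutch.apache.org/ (crawl.delay=5000ms)
        System.out.println(fetchMessage("http://nutch.apache.org/", crawlDelayMs));
    }
}
```

      Logging the delay in milliseconds matches the `XXXms` placeholder in the requested format, so the value can be compared directly across log lines.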

      Attachments

        1. NUTCH-1284-2.x.v1.patch (2 kB, Tejas Patil)
        2. NUTCH-1284-trunk.v1.patch (2 kB, Tejas Patil)
        3. NUTCH-1284.patch (1 kB, Tejas Patil)

            People

              Assignee: Tejas Patil (tejasp)
              Reporter: Lewis John McGibbney (lewismc)
              Votes: 0
              Watchers: 3