  1. Nutch
  2. NUTCH-61

Adaptive re-fetch interval. Detecting unmodified content

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: fetcher
    • Labels: None

    Description

      Currently Nutch does not automatically adjust its re-fetch period, regardless of whether individual pages change rarely or frequently. The goal of these changes is to extend the current codebase to support various adjustments to re-fetch times and intervals, and specifically to add a re-fetch schedule that tries to adapt the period between consecutive fetches to the period of content changes.
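
As a rough illustration of the adaptive idea described above, the following sketch shortens the interval when a page was found modified and lengthens it when it was not, clamped to fixed bounds. It is not taken from the attached patches: the class name `AdaptiveIntervalSketch`, the method `nextInterval`, the 20% rates, and the bounds are all assumptions chosen for the example.

```java
// Hypothetical sketch: names and constants are illustrative, not Nutch APIs.
final class AdaptiveIntervalSketch {
    // Assumed tuning values (not from the issue text).
    static final double INC_RATE = 0.2;                     // grow by 20% when unmodified
    static final double DEC_RATE = 0.2;                     // shrink by 20% when modified
    static final long MIN_INTERVAL_SEC = 60L;               // lower bound: one minute
    static final long MAX_INTERVAL_SEC = 365L * 24 * 3600;  // upper bound: one year

    /** Returns the next re-fetch interval in seconds, clamped to the bounds above. */
    static long nextInterval(long currentSec, boolean modified) {
        double next = modified
                ? currentSec * (1.0 - DEC_RATE)   // page changed: fetch sooner next time
                : currentSec * (1.0 + INC_RATE);  // page unchanged: wait longer next time
        long rounded = Math.round(next);
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, rounded));
    }

    public static void main(String[] args) {
        long interval = 30L * 24 * 3600;           // start from an assumed 30-day interval
        boolean[] observed = {false, false, true}; // unmodified, unmodified, modified
        for (boolean modified : observed) {
            interval = nextInterval(interval, modified);
            System.out.println((modified ? "modified" : "unmodified") + " -> " + interval + " s");
        }
    }
}
```

With a scheme like this, pages that rarely change drift toward the maximum interval, while frequently changing pages drift toward the minimum. The rates control how quickly the schedule reacts.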

      These patches also implement a check for whether the content has changed since the last fetch. Protocol plugins are changed to use this information, so that unmodified content does not have to be fetched and processed again.
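
One possible way to detect unchanged content, shown here only as a sketch, is to store a digest of the page at fetch time and compare it with the digest of the newly fetched bytes. The class `UnmodifiedCheckSketch`, its method names, and the choice of MD5 are assumptions for illustration and are not taken from the attached patches. An HTTP protocol plugin could instead avoid the download altogether with a conditional request (`If-Modified-Since`, answered with `304 Not Modified`); that path is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: names are illustrative, not Nutch APIs.
final class UnmodifiedCheckSketch {

    /** Computes an MD5 digest of the raw content (MD5 is an assumed choice). */
    static byte[] signature(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if a previous signature exists and matches the new content's signature. */
    static boolean isUnmodified(byte[] previousSignature, byte[] newContent) {
        return previousSignature != null
                && MessageDigest.isEqual(previousSignature, signature(newContent));
    }

    public static void main(String[] args) {
        byte[] first = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] stored = signature(first); // kept alongside the page record after the first fetch

        byte[] same = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] changed = "<html>hello again</html>".getBytes(StandardCharsets.UTF_8);

        System.out.println("same content unmodified: " + isUnmodified(stored, same));
        System.out.println("changed content unmodified: " + isUnmodified(stored, changed));
    }
}
```

When the check reports unmodified content, later processing steps can be skipped for that page, and the result can feed a re-fetch schedule as the "modified" signal.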

      Attachments

        1. 20050606.diff
          73 kB
          Andrzej Bialecki
        2. 20051230.txt
          57 kB
          Andrzej Bialecki
        3. 20060227.txt
          47 kB
          Andrzej Bialecki
        4. nutch-61-417287.patch
          54 kB
          Andrzej Bialecki
        5. nutch-61-492176.patch
          68 kB
          Armel Nene
        There are no Sub-Tasks for this issue.

        Activity

          People

            Assignee: Andrzej Bialecki
            Reporter: Andrzej Bialecki
            Votes: 19
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: