  1. Nutch
  2. NUTCH-61

Adaptive re-fetch interval. Detecting unmodified content

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: fetcher
    • Labels:
      None

      Description

      Nutch currently does not adjust its re-fetch period automatically, regardless of whether individual pages change seldom or frequently. The goal of these changes is to extend the codebase to support a range of adjustments to re-fetch times and intervals, and specifically to add a re-fetch schedule that tries to adapt the period between consecutive fetches to the period of content changes.
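
      As an illustration only, one way such an adaptive schedule could work is to shrink the interval when a page was found modified and grow it when it was not. The sketch below is not taken from the attached patches; the class name, the rate constants, and the bounds are all assumptions chosen for the example.

```java
// Hypothetical sketch of an adaptive re-fetch interval; not the code from the patches.
final class AdaptiveIntervalSketch {
    // Assumed tuning values, picked only for illustration.
    static final double INC_RATE = 0.5;   // grow the interval by 50% when the page is unchanged
    static final double DEC_RATE = 0.25;  // shrink the interval by 25% when the page has changed
    static final long MIN_INTERVAL_SEC = 60L;                  // assumed lower bound: 1 minute
    static final long MAX_INTERVAL_SEC = 30L * 24 * 60 * 60;   // assumed upper bound: 30 days

    private AdaptiveIntervalSketch() {}

    /**
     * Returns the next re-fetch interval in seconds, given the current interval
     * and whether the last fetch found the content modified.
     */
    static long nextInterval(long currentIntervalSec, boolean modified) {
        double factor = modified ? (1.0 - DEC_RATE) : (1.0 + INC_RATE);
        long next = Math.round(currentIntervalSec * factor);
        // Clamp so the interval can neither collapse to zero nor grow without limit.
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, next));
    }
}
```

      With these assumed rates, a page fetched every 1000 seconds would next be fetched after 750 seconds if it changed, or after 1500 seconds if it did not. Repeated over many fetches, the interval drifts toward the page's actual rate of change.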

      These patches also implement a check for whether content has changed since the last fetch. Protocol plugins are changed to use this information, so that unmodified content does not have to be fetched and processed again.
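
      A minimal sketch of how an unmodified-content check might look, assuming two signals: an HTTP 304 (Not Modified) status from the server, and a comparison of a stored content signature against a digest of the newly fetched bytes. The class and method names are hypothetical and the use of MD5 is an assumption for the example; none of this is taken from the attached patches.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of an unmodified-content check; not the code from the patches.
final class UnmodifiedCheckSketch {
    static final int HTTP_NOT_MODIFIED = 304;

    private UnmodifiedCheckSketch() {}

    /** Computes a content signature; MD5 is an assumption made for this example. */
    static byte[] signature(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns true when the page can be treated as unmodified: either the server
     * answered 304, or the fetched bytes hash to the previously stored signature.
     */
    static boolean isUnmodified(int httpStatus, byte[] previousSignature, byte[] fetchedContent) {
        if (httpStatus == HTTP_NOT_MODIFIED) {
            return true; // the server reports no change, so there is no body to compare
        }
        if (previousSignature == null || fetchedContent == null) {
            return false; // nothing to compare against, so treat the page as modified
        }
        return MessageDigest.isEqual(previousSignature, signature(fetchedContent));
    }
}
```

      In this sketch the 304 case skips both the download and the processing, while the signature comparison only skips the processing, since the content has already been downloaded by that point.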

        Attachments

        1. 20050606.diff
          73 kB
          Andrzej Bialecki
        2. 20051230.txt
          57 kB
          Andrzej Bialecki
        3. 20060227.txt
          47 kB
          Andrzej Bialecki
        4. nutch-61-417287.patch
          54 kB
          Andrzej Bialecki
        5. nutch-61-492176.patch
          68 kB
          Armel Nene

          Activity

            People

            • Assignee:
              ab Andrzej Bialecki
              Reporter:
              ab Andrzej Bialecki
            • Votes:
              19
              Watchers:
              4

              Dates

              • Due:
                Created:
                Updated:
                Resolved: