Description
Currently Nutch doesn't adjust automatically its re-fetch period, no matter if individual pages change seldom or frequently. The goal of these changes is to extend the current codebase to support various possible adjustments to re-fetch times and intervals, and specifically a re-fetch schedule which tries to adapt the period between consecutive fetches to the period of content changes.
Also, these patches implement checking if the content has changed since last fetching; protocol plugins are also changed to make use of this information, so that if content is unmodified it doesn't have to be fetched and processed.
Attachments
Attachments
1.
|
Apply just the applicable portions of the patch to protocol.httpclient.Http.java | Closed | Unassigned |