Details
Description
Currently, when manually scanning the log output, we cannot tell which pages are subject to a crawl delay, that is, an enforced wait between successive fetches of pages within the same site. The delay value should be included in the fetch log line, for example:
2012-02-19 12:33:33,031 INFO fetcher.Fetcher - fetching http://nutch.apache.org/ (crawl.delay=XXXms)
This way we can quickly determine whether the fetcher is applying a crawl delay for a given page.
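As a rough illustration of the proposed message format, the sketch below builds the text that follows the logger prefix. The class `FetchLogLine` and the method `describe` are hypothetical names invented for this example and are not part of Nutch. Omitting the suffix when no delay applies is also an assumption, since the description above does not say what should be logged in that case.

```java
// Hypothetical helper, not part of Nutch: builds the message portion of the
// proposed fetcher log line.
final class FetchLogLine {
    private FetchLogLine() {}

    // Returns "fetching <url>", followed by " (crawl.delay=<N>ms)" when a
    // positive crawl delay applies. Leaving the suffix off for a zero or
    // negative delay is an assumption made for this sketch.
    static String describe(String url, long crawlDelayMs) {
        StringBuilder line = new StringBuilder("fetching ").append(url);
        if (crawlDelayMs > 0) {
            line.append(" (crawl.delay=").append(crawlDelayMs).append("ms)");
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // prints: fetching http://nutch.apache.org/ (crawl.delay=5000ms)
        System.out.println(describe("http://nutch.apache.org/", 5000L));
        // prints: fetching http://nutch.apache.org/
        System.out.println(describe("http://nutch.apache.org/", 0L));
    }
}
```

With a format like this, searching the log for `crawl.delay=` would be enough to list the fetches that were governed by a delay.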
Attachments
Issue Links
- incorporates: NUTCH-1042 Fetcher.max.crawl.delay property not taken into account correctly when set to -1 (Closed)