Description
An overview of fetch times and fetch intervals would be useful when configuring a crawl. CrawlDbReader could easily calculate the minimum, maximum, and average of each and show them as part of the statistics job (command-line option -stats):
% bin/nutch readdb .../crawldb/ -stats
...
TOTAL urls: 544910
shortest fetch interval: 7 days, 00:00:00
avg fetch interval: 7 days, 17:43:58
longest fetch interval: 10 days, 12:00:00
earliest fetch time: Wed May 25 11:42:00 CEST 2016
avg of fetch times: Sun Jun 05 18:11:00 CEST 2016
latest fetch time: Wed Jun 22 10:25:00 CEST 2016
...
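The aggregation itself could look roughly like the sketch below. It is only an illustration of the min/max/average bookkeeping and of the "N days, HH:MM:SS" formatting shown above, not the actual CrawlDbReader code, which runs as a MapReduce job. The class name FetchStatsSketch and its methods are hypothetical, and fetch intervals are assumed to be given in seconds.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: tracks min, max and average
// of a stream of long values (fetch intervals in seconds, or fetch
// times in milliseconds since the epoch).
class FetchStatsSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long sum = 0;   // may overflow for extremely large inputs
    private long count = 0;

    void add(long value) {
        if (value < min) min = value;
        if (value > max) max = value;
        sum += value;
        count++;
    }

    long min() { return min; }
    long max() { return max; }

    // Integer average; assumes at least one value has been added.
    long avg() { return sum / count; }

    // Formats a duration in seconds as "N days, HH:MM:SS".
    static String formatInterval(long seconds) {
        long days = seconds / 86400;
        long rest = seconds % 86400;
        return String.format(Locale.ROOT, "%d days, %02d:%02d:%02d",
            days, rest / 3600, (rest % 3600) / 60, rest % 60);
    }

    public static void main(String[] args) {
        FetchStatsSketch intervals = new FetchStatsSketch();
        intervals.add(7L * 86400);          // 7 days
        intervals.add(10L * 86400 + 43200); // 10.5 days
        System.out.println("shortest fetch interval: " + formatInterval(intervals.min()));
        System.out.println("avg fetch interval: " + formatInterval(intervals.avg()));
        System.out.println("longest fetch interval: " + formatInterval(intervals.max()));
    }
}
```

The same accumulator would work for fetch times if they are fed in as epoch milliseconds and the results are converted back to dates for display.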