Description
% bin/nutch readdb crawl/crawldb -stats -sort
...
status 1 (db_unfetched): 3
   nutch.apache.org : 3
status 2 (db_fetched): 2
   nutch.apache.org : 2
status 6 (db_notmodified): 34
   nutch.apache.org : 34
CrawlDb statistics: done

% bin/nutch updatehostdb -hostdb crawl/hostdb -crawldb crawl/crawldb
UpdateHostDb: hostdb: crawl/hostdb
UpdateHostDb: crawldb: crawl/crawldb
UpdateHostDb: starting at 2018-04-23 13:50:33
UpdateHostDb: finished at 2018-04-23 13:50:35, elapsed: 00:00:01

% bin/nutch readhostdb crawl/hostdb -get nutch.apache.org
ReadHostDb: get: nutch.apache.org
0 0 0 0 0 0 0 0 0 0 0.0 1970-01-01 01:00:00
Although a HostDb record is added for "nutch.apache.org", all expected values (number of fetched/unfetched/... pages, fetch time min/max/average/percentiles, etc.) are empty or zero.
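To check for this symptom across many hosts, a small script could flag records whose numeric fields are all zero. The following is only a sketch: it assumes the record line printed by `readhostdb -get` is a whitespace-separated run of numbers followed by a date and a time, as in the output above. The function name `hostdb_record_is_empty` is hypothetical and not part of Nutch, and the sketch does not interpret what the individual columns mean.

```python
def hostdb_record_is_empty(line: str) -> bool:
    """Return True if every numeric field of a HostDb record line is zero.

    Assumption: the line looks like the `readhostdb -get` output quoted in
    this report, i.e. numeric fields followed by a date and a time token.
    """
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError("expected numeric fields followed by date and time")
    # Drop the trailing date and time tokens; keep only the numeric fields.
    numbers = tokens[:-2]
    return all(float(tok) == 0.0 for tok in numbers)


# The record line from this report: all counters and metrics are zero.
reported = "0 0 0 0 0 0 0 0 0 0 0.0 1970-01-01 01:00:00"
print(hostdb_record_is_empty(reported))
```

A record that passes this check while the CrawlDb statistics list fetched or unfetched pages for the same host would match the behaviour described here.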
Issue Links
- is caused by NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)