Description
% bin/nutch readdb crawldb -topN 50 crawldb_topn
CrawlDb topN: starting (topN=50, min=0.0)
CrawlDb db: crawl/crawldb
CrawlDb topN: collecting topN scores.
CrawlDbReader job did not succeed, job status:FAILED, reason: NA
Exception in thread "main" java.lang.RuntimeException: CrawlDbReader job did not succeed, job status:FAILED, reason: NA
    at org.apache.nutch.crawl.CrawlDbReader.processTopNJob(CrawlDbReader.java:853)
The hadoop.log shows the reason:
2018-04-09 10:04:16,435 WARN mapred.LocalJobRunner - job_local1653923841_0002
java.lang.Exception: java.lang.NumberFormatException: null
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.NumberFormatException: null
    at java.lang.Integer.parseInt(Integer.java:542)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.nutch.crawl.CrawlDbReader$CrawlDbTopNReducer.setup(CrawlDbReader.java:370)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
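Caused by NUTCH-2375: the property mapred.job.reduces must be replaced by mapreduce.job.reduces. Judging from the stack trace, CrawlDbTopNReducer.setup() passes a null value to Integer.parseInt, apparently because the old property name is no longer set.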
Note: all occurrences of this property and similar ones (mapred.job.*) should be checked.
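The failure mode can be reproduced without Hadoop. The following is a minimal sketch, not the actual Nutch code: a plain Map stands in for the job configuration, and the class ReducePropertyLookup and the helper readInt are hypothetical names introduced only for illustration. It shows that looking up the old key yields null, which Integer.parseInt rejects, while a lookup with a default value avoids the exception.

```java
import java.util.HashMap;
import java.util.Map;

public class ReducePropertyLookup {

    // Hypothetical helper (not part of Nutch or Hadoop): parse an int
    // property, returning defaultValue when the key is not set.
    static int readInt(Map<String, String> conf, String key, int defaultValue) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // Assumption: only the new property name is set, as after the
        // migration to org.apache.hadoop.mapreduce.
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.reduces", "1");

        try {
            // The old key is absent, so get() returns null and parseInt throws.
            Integer.parseInt(conf.get("mapred.job.reduces"));
        } catch (NumberFormatException e) {
            System.out.println("old key failed: " + e);
        }

        // Reading the new key with a default succeeds.
        System.out.println("new key: " + readInt(conf, "mapreduce.job.reduces", 1));
    }
}
```

In real Hadoop code, Configuration.getInt(name, defaultValue) provides the same kind of defaulting, so a missing key does not end in a NumberFormatException.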
Issue Links
- is caused by NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)