[NUTCH-2552] CrawlDbReader -topN fails - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.15
Fix Version/s: 1.15
Component/s: crawldb
Labels: None

Description

% bin/nutch readdb crawldb -topN 50 crawldb_topn
CrawlDb topN: starting (topN=50, min=0.0)
CrawlDb db: crawl/crawldb
CrawlDb topN: collecting topN scores.
CrawlDbReader job did not succeed, job status:FAILED, reason: NA
Exception in thread "main" java.lang.RuntimeException: CrawlDbReader job did not succeed, job status:FAILED, reason: NA
        at org.apache.nutch.crawl.CrawlDbReader.processTopNJob(CrawlDbReader.java:853)

The hadoop.log shows the underlying cause:

2018-04-09 10:04:16,435 WARN  mapred.LocalJobRunner - job_local1653923841_0002
java.lang.Exception: java.lang.NumberFormatException: null
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.NumberFormatException: null
        at java.lang.Integer.parseInt(Integer.java:542)
        at java.lang.Integer.parseInt(Integer.java:615)
        at org.apache.nutch.crawl.CrawlDbReader$CrawlDbTopNReducer.setup(CrawlDbReader.java:370)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)

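Caused by NUTCH-2375: after the migration to the org.apache.hadoop.mapreduce API, the property mapred.job.reduces is no longer set, so CrawlDbTopNReducer.setup() passes null to Integer.parseInt(). The property mapred.job.reduces must be replaced by mapreduce.job.reduces.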
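The failure mode can be reproduced outside Hadoop with a short sketch. It uses a plain Map as a hypothetical stand-in for Hadoop's Configuration; the helper names numReducersOld/numReducersNew and the fallback value of 1 are assumptions for illustration, not the code of the actual fix.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the NUTCH-2552 failure mode. A plain Map is a hypothetical
 * stand-in for Hadoop's Configuration, so no Hadoop dependency is needed.
 */
public class ReducerCountSketch {

    // Pre-fix lookup: the old key is absent, so get() returns null and
    // Integer.parseInt(null) throws NumberFormatException.
    static int numReducersOld(Map<String, String> conf) {
        return Integer.parseInt(conf.get("mapred.job.reduces"));
    }

    // Post-fix lookup: read mapreduce.job.reduces instead. Falling back to 1
    // when the key is missing is an assumption of this sketch.
    static int numReducersNew(Map<String, String> conf) {
        String value = conf.get("mapreduce.job.reduces");
        return value == null ? 1 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the new-style key is present, as with the mapreduce API.
        conf.put("mapreduce.job.reduces", "4");

        try {
            numReducersOld(conf);
        } catch (NumberFormatException e) {
            System.out.println("old key fails: " + e);
        }
        System.out.println("new key: " + numReducersNew(conf)); // new key: 4
    }
}
```

The exact exception message printed for the old key depends on the JDK version, which is why no expected output is given for that line.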

Note: all occurrences of this property and similar ones (mapred.job.*) should be checked.
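For the audit suggested in the note, a hypothetical scanner could flag string literals that look like old-style property names. This is a sketch, not part of Nutch: the class name and regex are assumptions, and the pattern deliberately matches any "mapred." key (broader than mapred.job.*), so every hit still needs a manual check against its mapreduce.* replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds double-quoted literals starting with "mapred."
 * in a piece of Java source text.
 */
public class MapredKeyScanner {

    // "mapred" must be followed directly by a dot, so "mapreduce.*" keys
    // are not matched.
    private static final Pattern OLD_KEY =
        Pattern.compile("\"(mapred\\.[A-Za-z0-9_.]+)\"");

    static List<String> findOldKeys(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = OLD_KEY.matcher(source);
        while (m.find()) {
            found.add(m.group(1)); // key without the surrounding quotes
        }
        return found;
    }

    public static void main(String[] args) {
        String snippet =
            "int n = Integer.parseInt(conf.get(\"mapred.job.reduces\"));\n"
            + "int m = conf.getInt(\"mapreduce.job.reduces\", 1);\n";
        System.out.println(findOldKeys(snippet)); // [mapred.job.reduces]
    }
}
```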

Issue Links

is caused by: NUTCH-2375 Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce (Closed)

links to: GitHub Pull Request #315

People

Assignee: Sebastian Nagel

Reporter: Sebastian Nagel

Votes: 0

Watchers: 3

Dates

Created: 09/Apr/18 08:20

Updated: 01/Oct/19 14:29

Resolved: 21/Apr/18 16:24