NUTCH-2552: CrawlDbReader -topN fails

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.15
    • Component/s: crawldb
    • Labels: None

    Description

      % bin/nutch readdb crawldb -topN 50 crawldb_topn
      CrawlDb topN: starting (topN=50, min=0.0)
      CrawlDb db: crawl/crawldb
      CrawlDb topN: collecting topN scores.
      CrawlDbReader job did not succeed, job status:FAILED, reason: NA
      Exception in thread "main" java.lang.RuntimeException: CrawlDbReader job did not succeed, job status:FAILED, reason: NA
              at org.apache.nutch.crawl.CrawlDbReader.processTopNJob(CrawlDbReader.java:853)
      

      The hadoop.log shows the reason

      2018-04-09 10:04:16,435 WARN  mapred.LocalJobRunner - job_local1653923841_0002
      java.lang.Exception: java.lang.NumberFormatException: null
              at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
      Caused by: java.lang.NumberFormatException: null
              at java.lang.Integer.parseInt(Integer.java:542)
              at java.lang.Integer.parseInt(Integer.java:615)
              at org.apache.nutch.crawl.CrawlDbReader$CrawlDbTopNReducer.setup(CrawlDbReader.java:370)
              at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
              at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
      

      Caused by NUTCH-2375: the property mapred.job.reduces must be replaced by mapreduce.job.reduces.

      Note: All occurrences of this property and similar ones (mapred.job.*) should be checked.
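
The failure mode in the stack trace can be reproduced in isolation. The following is a minimal sketch, not the Nutch code: it uses a plain java.util.Properties as a stand-in for Hadoop's Configuration, and the class and method names (ReducesPropertyDemo, readStrict, readWithDefault) are hypothetical. It assumes the reducer parses the raw property value without a null check, which is what the Integer.parseInt frames in the trace suggest.

```java
import java.util.Properties;

public class ReducesPropertyDemo {
    // Property names taken from the issue description.
    public static final String OLD_KEY = "mapred.job.reduces";
    public static final String NEW_KEY = "mapreduce.job.reduces";

    // Assumed failing pattern: parse the raw value with no null check.
    // Throws NumberFormatException when the key is not set.
    public static int readStrict(Properties conf, String key) {
        return Integer.parseInt(conf.getProperty(key));
    }

    // Defensive variant: fall back to a default when the key is not set.
    public static int readWithDefault(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Only the new property name is set, as after the API migration.
        conf.setProperty(NEW_KEY, "1");

        try {
            readStrict(conf, OLD_KEY);
        } catch (NumberFormatException e) {
            System.out.println("old key failed: " + e);
        }
        System.out.println("new key: " + readStrict(conf, NEW_KEY));
        System.out.println("old key with default: " + readWithDefault(conf, OLD_KEY, 1));
    }
}
```

Reading the value under the new name avoids the exception. The default-value variant is only an illustration of a more defensive lookup; whether the actual fix uses a default is not stated in this issue.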

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 3