NUTCH-2552: CrawlDbReader -topN fails

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.15
    • Component/s: crawldb
    • Labels:
      None

      Description

      % bin/nutch readdb crawldb -topN 50 crawldb_topn
      CrawlDb topN: starting (topN=50, min=0.0)
      CrawlDb db: crawl/crawldb
      CrawlDb topN: collecting topN scores.
      CrawlDbReader job did not succeed, job status:FAILED, reason: NA
      Exception in thread "main" java.lang.RuntimeException: CrawlDbReader job did not succeed, job status:FAILED, reason: NA
              at org.apache.nutch.crawl.CrawlDbReader.processTopNJob(CrawlDbReader.java:853)
      

      The hadoop.log shows the reason:

      2018-04-09 10:04:16,435 WARN  mapred.LocalJobRunner - job_local1653923841_0002
      java.lang.Exception: java.lang.NumberFormatException: null
              at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
      Caused by: java.lang.NumberFormatException: null
              at java.lang.Integer.parseInt(Integer.java:542)
              at java.lang.Integer.parseInt(Integer.java:615)
              at org.apache.nutch.crawl.CrawlDbReader$CrawlDbTopNReducer.setup(CrawlDbReader.java:370)
              at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
              at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
      

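      Caused by NUTCH-2375: the property mapred.job.reduces must be replaced by mapreduce.job.reduces. As the stack trace shows, CrawlDbTopNReducer.setup ends up passing a null value to Integer.parseInt, which throws the NumberFormatException.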
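
The failure mode can be reproduced outside Hadoop. The sketch below is an illustration, not the Nutch code: a plain Map stands in for the Hadoop Configuration (whose get returns null for a missing key), the class and method names are hypothetical, and the fallback of 1 reducer is an assumption made for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class ReduceCountLookup {

    // Reads the reducer count from the new-style key.
    // Assumption: fall back to 1 reducer when the key is absent.
    static int numReduces(Map<String, String> conf) {
        String value = conf.get("mapreduce.job.reduces");
        if (value == null) {
            return 1;
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a job configuration that only knows the new key.
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.reduces", "2");

        // Looking up the old key yields null, and parsing null throws.
        try {
            Integer.parseInt(conf.get("mapred.job.reduces"));
        } catch (NumberFormatException e) {
            System.out.println("old key failed: " + e);
        }

        // Looking up the new key with a null check succeeds.
        System.out.println("reducers: " + numReduces(conf));
    }
}
```

In real Hadoop code the null check can usually be avoided altogether, for example with Configuration.getInt(name, defaultValue) or JobContext.getNumReduceTasks(); whether either fits the reducer here would need to be checked against the Nutch source.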

      Note: All occurrences of this property and similar ones (mapred.job.*) should be checked.
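
One way to carry out such a check is a small scan over the source tree for string literals that start with mapred.job. The following is a rough sketch under stated assumptions: the class name, the regular expression, and the default directory src/java are illustrative choices, and keys built by string concatenation would not be found.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeprecatedKeyScan {

    // Matches quoted literals such as "mapred.job.reduces";
    // "mapreduce.job.reduces" does not match because of the required dot after "mapred".
    private static final Pattern OLD_KEY =
        Pattern.compile("\"(mapred\\.job\\.[A-Za-z0-9_.]+)\"");

    // Returns one "lineNumber: key" entry per old-style key found.
    static List<String> findOldKeys(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = OLD_KEY.matcher(lines.get(i));
            while (m.find()) {
                hits.add((i + 1) + ": " + m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: sources live under src/java unless a path is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/java");
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        for (Path source : sources) {
            for (String hit : findOldKeys(Files.readAllLines(source))) {
                System.out.println(source + ":" + hit);
            }
        }
    }
}
```

Each hit still needs a manual look, since the correct replacement key differs from property to property.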

              People

              • Assignee:
                Sebastian Nagel (snagel)
              • Reporter:
                Sebastian Nagel (snagel)
              • Votes:
                0
              • Watchers:
                3
