  1. Apache Gora
  2. GORA-476

Nutch 2.X GeneratorJob creates NullPointerException when using DataFileAvroStore

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.1
    • Fix Version/s: 0.9
    • Component/s: avro, gora-core
    • Labels:
      None

      Description

      When running the Nutch 2.X GeneratorJob with DataFileAvroStore, I get the following NullPointerException:

      2016-05-12 17:27:30,191 INFO  crawl.GeneratorJob - GeneratorJob: starting
      2016-05-12 17:27:30,191 INFO  crawl.GeneratorJob - GeneratorJob: filtering: false
      2016-05-12 17:27:30,191 INFO  crawl.GeneratorJob - GeneratorJob: normalizing: false
      2016-05-12 17:27:30,191 INFO  crawl.GeneratorJob - GeneratorJob: topN: 50000
      2016-05-12 17:27:30,319 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      2016-05-12 17:27:30,333 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
      2016-05-12 17:27:30,334 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
      2016-05-12 17:27:30,334 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
      2016-05-12 17:27:31,012 WARN  conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/staging/lmcgibbn997854508/.staging/job_local997854508_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
      2016-05-12 17:27:31,014 WARN  conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/staging/lmcgibbn997854508/.staging/job_local997854508_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
      2016-05-12 17:27:31,091 WARN  conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/local/localRunner/lmcgibbn/job_local997854508_0001/job_local997854508_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
      2016-05-12 17:27:31,094 WARN  conf.Configuration - file:/tmp/hadoop-lmcgibbn/mapred/local/localRunner/lmcgibbn/job_local997854508_0001/job_local997854508_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
      2016-05-12 17:27:31,309 INFO  crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
      2016-05-12 17:27:31,309 INFO  crawl.AbstractFetchSchedule - defaultInterval=2592000
      2016-05-12 17:27:31,309 INFO  crawl.AbstractFetchSchedule - maxInterval=7776000
      2016-05-12 17:27:31,381 WARN  mapred.LocalJobRunner - job_local997854508_0001
      java.lang.Exception: java.lang.NullPointerException
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
      Caused by: java.lang.NullPointerException
      	at org.apache.nutch.util.TableUtil.unreverseUrl(TableUtil.java:88)
      	at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:51)
      	at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:1)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
      	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      2016-05-12 17:27:32,107 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[test]generate: 1463099249-21154, jobid=job_local997854508_0001
      	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
      	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:232)
      	at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:272)
      	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:343)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:351)
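
      The trace above shows the NullPointerException raised inside TableUtil.unreverseUrl when it is called from GeneratorMapper.map. That is consistent with the mapper being handed a null key, although the report itself does not state the root cause. The following is only a minimal sketch of that failure mode and of one defensive guard. The class NullKeySketch and the methods decodeKey and decodeKeyIfPresent are hypothetical stand-ins invented for illustration; they are not the Nutch or Gora code, and this is not the fix that resolved the issue.

```java
import java.util.Optional;

// Hypothetical illustration only: none of these names exist in Nutch or Gora.
final class NullKeySketch {

    // Stand-in for a key-decoding helper. It reverses dot-separated segments
    // ("com.example.www" becomes "www.example.com"). Like any method that
    // dereferences its argument, it throws NullPointerException on a null key.
    public static String decodeKey(String reversedKey) {
        StringBuilder buf = new StringBuilder(reversedKey.length());
        String[] parts = reversedKey.split("\\.");
        for (int i = parts.length - 1; i >= 0; i--) {
            buf.append(parts[i]);
            if (i > 0) {
                buf.append('.');
            }
        }
        return buf.toString();
    }

    // Defensive wrapper: returns an empty Optional instead of throwing
    // when the caller has no key to decode.
    public static Optional<String> decodeKeyIfPresent(String reversedKey) {
        if (reversedKey == null) {
            return Optional.empty();
        }
        return Optional.of(decodeKey(reversedKey));
    }

    public static void main(String[] args) {
        System.out.println(decodeKeyIfPresent("com.example.www"));
        System.out.println(decodeKeyIfPresent(null));
        try {
            decodeKey(null); // assumed analogue of the failing call in the trace
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on null key");
        }
    }
}
```

      Whether to skip such records, log them, or fail fast is a design choice for the caller; the guard only makes the null case explicit rather than letting it surface deep inside a map task.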
      

            People

            • Assignee: Lewis John McGibbney (lewismc)
            • Reporter: Lewis John McGibbney (lewismc)
            • Votes: 0
            • Watchers: 3
