  1. Nutch
  2. NUTCH-2533

Injector: NullPointerException if seed URL dir contains non-file entries

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 2.3.1, 1.14
    • Fix Version/s: 2.4, 1.15
    • Component/s: injector
    • Labels:
      None

      Description

      I'm following the Nutch 2 tutorial at https://wiki.apache.org/nutch/Nutch2Tutorial.

      I ran `./nutch inject /` and got the following error:

      InjectorJob: starting at 2018-03-12 11:59:05
      InjectorJob: Injecting urlDir: /
      InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
      InjectorJob: java.lang.NullPointerException
      at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:442)
      at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:411)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
      at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
      at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
      at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:115)
      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:231)
      at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:252)
      at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:275)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
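
      The trace shows the NullPointerException being thrown while Hadoop computes input splits for the seed directory (`FileInputFormat.getSplits`), and the issue title attributes it to non-file entries in that directory. As a rough illustration only, a pre-flight check like the following could list such entries before running `inject`. This is a minimal sketch that assumes the seed directory is on the local filesystem; the class `SeedDirCheck` and the method `nonFileEntries` are hypothetical names and are not part of Nutch or Hadoop, and this is not necessarily how the issue was fixed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Nutch): reports entries of a local
// seed directory that are not regular files, e.g. subdirectories.
class SeedDirCheck {

  // Returns every direct child of seedDir that is not a regular file.
  // Symbolic links are followed, so a link to a regular file is accepted.
  static List<Path> nonFileEntries(Path seedDir) throws IOException {
    List<Path> offenders = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(seedDir)) {
      for (Path entry : entries) {
        if (!Files.isRegularFile(entry)) {
          offenders.add(entry);
        }
      }
    }
    return offenders;
  }

  public static void main(String[] args) throws IOException {
    // "urls" is only an assumed default seed directory name.
    Path seedDir = Paths.get(args.length > 0 ? args[0] : "urls");
    if (!Files.isDirectory(seedDir)) {
      System.err.println("Not a directory: " + seedDir);
      return;
    }
    List<Path> offenders = nonFileEntries(seedDir);
    if (offenders.isEmpty()) {
      System.out.println("Seed directory contains only regular files.");
    } else {
      System.out.println("Non-file entries found in " + seedDir + ":");
      for (Path p : offenders) {
        System.out.println("  " + p);
      }
    }
  }
}
```

      Pointed at a directory such as `/`, which mostly contains subdirectories, the check would report nearly every entry. For a seed directory on HDFS the same idea would need Hadoop's filesystem API instead of `java.nio.file`.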

            People

            • Assignee: Sebastian Nagel (wastl-nagel)
            • Reporter: Krzysztof Madejski (krzysztofmadejski)
            • Votes: 0
            • Watchers: 4
