Nutch / NUTCH-2551

NullPointerException in generator

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.15
    • Fix Version/s: 1.15
    • Component/s: generator
    • Labels:
      None

      Description

      A NullPointerException is thrown during the crawl generate stage when I deploy to a Hadoop cluster (for some reason, it works fine locally).

      This appears to happen because the URLPartitioner class still has the old configure() method (which is never called, so the normalizers field remains null) rather than implementing the Configurable interface, as described in the newer mapreduce API's Partitioner spec.
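The suspected mechanism can be sketched without Hadoop on the classpath. Everything below is a hypothetical stand-in, not Nutch or Hadoop code: the `Configurable` interface, the `Partitioner` base class, the `partition.mode` key, and `newInstance` only imitate the general pattern of a framework that creates the partitioner by reflection and passes configuration solely through `setConf()`. Under that assumption, a class that still relies on `configure()` ends up with a null field.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionerConfigSketch {

    /** Hypothetical stand-in for org.apache.hadoop.conf.Configurable. */
    interface Configurable {
        void setConf(Map<String, String> conf);
    }

    /** Hypothetical stand-in for the new-API Partitioner base class. */
    abstract static class Partitioner {
        abstract int getPartition(String url, int numPartitions);
    }

    /** Old style: the field is only set in configure(), which nothing calls here. */
    static class OldStylePartitioner extends Partitioner {
        private String mode; // stays null, like the normalizers field in the report

        public void configure(Map<String, String> conf) {
            mode = conf.get("partition.mode");
        }

        @Override
        int getPartition(String url, int numPartitions) {
            // Dereferences mode, so this throws NullPointerException if configure() never ran.
            int hash = mode.equals("byLength") ? url.length() : url.hashCode();
            return Math.floorMod(hash, numPartitions);
        }
    }

    /** New style: the same logic, but configured through setConf(). */
    static class ConfigurablePartitioner extends Partitioner implements Configurable {
        private String mode;

        @Override
        public void setConf(Map<String, String> conf) {
            mode = conf.get("partition.mode");
        }

        @Override
        int getPartition(String url, int numPartitions) {
            int hash = mode.equals("byLength") ? url.length() : url.hashCode();
            return Math.floorMod(hash, numPartitions);
        }
    }

    /** Imitates a framework: instantiate by reflection, call setConf() only if Configurable. */
    static <T> T newInstance(Class<T> cls, Map<String, String> conf)
            throws ReflectiveOperationException {
        T obj = cls.getDeclaredConstructor().newInstance();
        if (obj instanceof Configurable) {
            ((Configurable) obj).setConf(conf);
        }
        return obj;
    }

    /** Returns true if partitioning with a framework-created instance throws an NPE. */
    static boolean failsWithNpe(Class<? extends Partitioner> cls) {
        Map<String, String> conf = new HashMap<>();
        conf.put("partition.mode", "byLength");
        try {
            Partitioner p = newInstance(cls, conf);
            p.getPartition("http://example.org/a", 4);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("old style fails: " + failsWithNpe(OldStylePartitioner.class));
        System.out.println("configurable fails: " + failsWithNpe(ConfigurablePartitioner.class));
    }
}
```

In this sketch only the class implementing the stand-in `Configurable` receives its configuration; whether the actual fix took this exact shape is not stated in the report.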

      Stack trace:

      java.lang.NullPointerException
       at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
       at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
       at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
       at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
       at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
       at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
       at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
      

       

      It might also be because a static URLPartitioner instance is used in the Generator.Selector class, but it is only initialized in the setup() method of the Generator.Selector.SelectorMapper class. So that whole setup looks pretty weird.
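The second suspicion, a shared static instance that only one class initializes, can be illustrated with a small sketch. The class names loosely mirror the ones mentioned above, but the bodies and the `OtherTask` class are hypothetical and exist only for illustration; this is not the Generator code.

```java
public class StaticPartitionerSketch {

    /** Hypothetical stand-in for the partitioner; tracks whether it was configured. */
    static class Partitioner {
        private String mode;

        void configure(String mode) {
            this.mode = mode;
        }

        boolean isConfigured() {
            return mode != null;
        }
    }

    /** Stand-in for a class holding one static partitioner shared by its nested tasks. */
    static class Selector {
        static Partitioner partitioner = new Partitioner();

        /** The only place where the shared instance gets configured. */
        static class SelectorMapper {
            void setup() {
                partitioner.configure("byHost");
            }
        }

        /** Hypothetical other user of the shared instance that never configures it. */
        static class OtherTask {
            boolean canPartition() {
                return partitioner.isConfigured();
            }
        }
    }

    public static void main(String[] args) {
        // If no SelectorMapper.setup() has run in this JVM, the shared instance is unconfigured.
        System.out.println("before setup: " + new Selector.OtherTask().canPartition());
        new Selector.SelectorMapper().setup();
        System.out.println("after setup: " + new Selector.OtherTask().canPartition());
    }
}
```

The point of the sketch is that any code using the static instance depends on some mapper's setup() having run earlier in the same JVM, which is fragile when tasks run in separate processes.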

              People

              • Assignee:
                Sebastian Nagel (wastl-nagel)
              • Reporter:
                Hans Brende (HansBrende)
              • Votes:
                0
              • Watchers:
                5
