Nutch / NUTCH-937

When Nutch is run on Hadoop > 0.20.2 (or CDH), it cannot find its plugins because MapReduce no longer unpacks the plugins/ directory from the job jar (due to MAPREDUCE-967)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.4, nutchgora
    • Component/s: build
    • Labels: None
    • Environment: Hadoop 0.21 or Cloudera Hadoop 0.20.2+737

    • Patch Info: Patch Available

    Description

      Jobs running on Hadoop 0.21 or Cloudera CDH 0.20.2+737 fail because of missing plugins, e.g.:

      10/10/28 12:22:21 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
      10/10/28 12:22:22 INFO mapred.FileInputFormat: Total input paths to process : 1
      10/10/28 12:22:23 INFO mapred.JobClient: Running job: job_201010271826_0002
      10/10/28 12:22:24 INFO mapred.JobClient: map 0% reduce 0%
      10/10/28 12:22:39 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000000_0, Status : FAILED
      java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
          at org.apache.hadoop.mapred.Child.main(Child.java:211)
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 9 more
      Caused by: java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
          ... 14 more
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 17 more
      Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
          at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
          at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
          ... 22 more
      10/10/28 12:22:40 INFO mapred.JobClient: Task Id : attempt_201010271826_0002_m_000001_0, Status : FAILED
      java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:379)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
          at org.apache.hadoop.mapred.Child.main(Child.java:211)
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 9 more
      Caused by: java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
          ... 14 more
      Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 17 more
      Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
          at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:122)
          at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
          ... 22 more

      The failure is caused by MAPREDUCE-967 (included in Hadoop 0.21 and CDH 0.20.2+737), which changes how MapReduce unpacks the job jar. Previously the whole jar was unpacked; now only the classes/ and lib/ directories are. As a result, the plugins/ directory is missing on the task nodes, Nutch cannot load its plugins, and the URLNormalizer extension point ("x point") is not found.

      A workaround is to force MapReduce to unpack the plugins/ directory as well, by setting the 'mapreduce.job.jar.unpack.pattern' configuration property to "(?:classes/|lib/|plugins/).*".
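      The effect of the unpack pattern can be illustrated with plain java.util.regex. This is a sketch, not Hadoop code: it assumes that an entry is extracted only when its full name matches the pattern (Matcher.matches()), takes "(?:classes/|lib/).*" as the post-MAPREDUCE-967 default implied by the description above, and uses hypothetical jar entry names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UnpackPatternDemo {
    // Assumed default after MAPREDUCE-967: only classes/ and lib/ entries are unpacked.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("(?:classes/|lib/).*");
    // Workaround value for mapreduce.job.jar.unpack.pattern, as given in this issue.
    static final Pattern WORKAROUND_PATTERN = Pattern.compile("(?:classes/|lib/|plugins/).*");

    // Returns the entry names that would be extracted, assuming each name
    // must match the whole pattern.
    static List<String> unpackedEntries(Pattern pattern, List<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical job jar layout, for illustration only.
        List<String> entries = Arrays.asList(
                "classes/org/apache/nutch/crawl/Injector.class",
                "lib/commons-lang.jar",
                "plugins/nutch-extensionpoints/plugin.xml",
                "plugins/urlnormalizer-basic/urlnormalizer-basic.jar");
        System.out.println("default:    " + unpackedEntries(DEFAULT_PATTERN, entries));
        System.out.println("workaround: " + unpackedEntries(WORKAROUND_PATTERN, entries));
    }
}
```

      Running main() prints both lists; only the workaround list contains the plugins/ entries, which is why the plugin repository can be found again once the property is set.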

      Attachments

        1. NUTCH-937-v1.patch (0.5 kB, Ferdy)


            People

              Assignee: Julien Nioche (jnioche)
              Reporter: Claudio Martella (cmartella)
              Votes: 2
              Watchers: 4
