Uploaded image for project: 'Nutch'
  1. Nutch
  2. NUTCH-2354

Upgrade Hadoop dependencies to 2.7.4

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 1.12
    • 1.14
    • injector
    • None
    • Patch Available
    • Important

    Description

      This wednesday we experienced trouble running the 1.12 injector on Hadoop 2.7.3. We operated 2.7.2 before and we had no trouble running a job.

      2017-01-18 15:36:53,005 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
      	at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:216)
      	at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:100)
      	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
      	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
      	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
      Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
              at org.apache.nutch.crawl.Injector.inject(Injector.java:383)
              at org.apache.nutch.crawl.Injector.run(Injector.java:467)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
              at org.apache.nutch.crawl.Injector.main(Injector.java:441)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
      

      Our processes retried injecting for a few minutes until we manually shut it down. Meanwhile on HDFS, our CrawlDB was gone, thanks for snapshots and/or backups we could restore it, so enable those if you haven't done so yet.

      These freak Hadoop errors can be notoriously difficult to debug but it seems we are in luck, recompile Nutch with Hadoop 2.7.3 instead 2.4.0. You are also in luck if your job file uses the old org.hadoop.mapred.* API, only jobs using the org.hadoop.mapreduce.* API seem to fail.

      Attachments

        Issue Links

          Activity

            People

              markus17 Markus Jelsma
              markus17 Markus Jelsma
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: