  Hadoop Map/Reduce / MAPREDUCE-1907

Nutch doesn't run under Hadoop version 0.20.2+228-1~karmic-cdh3b1

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20.2
    • Fix Version/s: None
    • Component/s: tasktracker
    • Labels: None
    • Environment: Linux 2.6.31-14-server #48-Ubuntu SMP Fri Oct 16 15:07:34 UTC 2009 x86_64 GNU/Linux

    Description

      Newer versions of Hadoop appear to reference the job jar with a different URL format: instead of file:/a/b/c/d/job.jar, it is now jar:file:/a/b/c/d/job.jar!, which breaks Nutch when it tries to load its plugins. Specifically, the stack trace looks like:

      Caused by: java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
      at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:124)
      at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)

      A simple test class that uses the URLFilters class was written, and it produced the following stack trace:

      10/07/01 14:25:25 INFO mapred.JobClient: Task Id : attempt_201006171624_46525_m_000000_1, Status : FAILED
      java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
      at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:52)
      at com.maxpoint.crawl.BidSampler$BIdSMapper.setup(BidSampler.java:42)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)

      Running the same code on an older version of Hadoop works.
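
The difference between the two URL forms described above can be illustrated with a minimal sketch. The report does not show Nutch's plugin-loading code, so this is not a description of what Nutch does; it only shows, under that assumption, how code that expects a plain file: location could be made to accept the jar: form as well. The class JarLocation and the method jarFilePath are hypothetical names introduced for illustration and are not part of Hadoop or Nutch.

```java
import java.net.URI;

public class JarLocation {
    /**
     * Returns the filesystem path of the jar named by either URL form.
     * Hypothetical helper for illustration; not part of Hadoop or Nutch.
     */
    public static String jarFilePath(String location) {
        URI uri = URI.create(location);
        if ("jar".equals(uri.getScheme())) {
            // A jar: URI is opaque: its scheme-specific part is the nested
            // URL followed by "!" and an optional entry path.
            String nested = uri.getSchemeSpecificPart();
            int bang = nested.indexOf('!');
            if (bang >= 0) {
                // Drop the "!" and anything after it, keeping the nested URL.
                nested = nested.substring(0, bang);
            }
            uri = URI.create(nested);
        }
        // For a file: URI this is the path on the local filesystem.
        return uri.getPath();
    }

    public static void main(String[] args) {
        // The two forms quoted in the description above.
        System.out.println(jarFilePath("file:/a/b/c/d/job.jar"));
        System.out.println(jarFilePath("jar:file:/a/b/c/d/job.jar!"));
    }
}
```

The sketch uses java.net.URI rather than java.net.URL because the jar: form quoted in the description ends in a bare "!", and URI parses that as an opaque URI without requiring a trailing entry path.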

          People

            Assignee: Unassigned
            Reporter: Robert Gonzalez (gonzomatic)
            Votes: 1
            Watchers: 1
