Nutch: NUTCH-2494

Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: fetcher, parser
    • Labels: None
    • Environment:
      • AWS EMR Cluster
      • AWS S3
      • Hadoop 2.2.7

    Description

      We are using Nutch 1.14 on an AWS EMR cluster (Hadoop 2.2.7) and are trying to use S3 as the main storage.

      We run the following command:

      bin/crawl -s s3://nutch-emr-cluster/test/crawl/urls s3://nutch-emr-cluster/test/crawl 1
      

      The Injector and Generator steps completed without errors and wrote their data to S3 correctly, but the Fetcher and Parser steps fail with an IllegalArgumentException.

      Full stack trace:

      18/01/11 07:16:52 ERROR fetcher.Fetcher: Fetcher: java.lang.IllegalArgumentException: Wrong FS: s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch, expected: hdfs://ip-172-31-26-180.eu-west-1.compute.internal:8020
      	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:653)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
      	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
      	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
      	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1430)
      	at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs(FetcherOutputFormat.java:55)
      	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:268)
      	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
      	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
      	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
      	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
      	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
      	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
      	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
      	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:486)
      	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:521)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:495)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
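      In the trace, `FileSystem.exists` is called on a `DistributedFileSystem` with an `s3://` path. That file system is the cluster's HDFS default, presumably obtained inside `FetcherOutputFormat.checkOutputSpecs`. `checkPath` then rejects the path because its scheme and authority differ. In real Hadoop code the usual remedy is to resolve the file system from the path itself with `Path#getFileSystem(Configuration)` rather than taking the default from `FileSystem.get(Configuration)`. That is an assumption about the shape of the fix, not a copy of the attached patch.

      The sketch below is a self-contained toy model in plain Java with no Hadoop classes. `WrongFsDemo`, `ToyFileSystem`, `defaultFs` and `fsForPath` are hypothetical names. It reproduces the error message and shows the path-based resolution:

```java
import java.net.URI;
import java.util.Objects;

/**
 * Toy model (hypothetical, NOT Hadoop's real classes) of why a file system
 * bound to one scheme/authority rejects paths that belong to another one.
 */
public class WrongFsDemo {

    /** Hypothetical stand-in for a FileSystem bound to a single scheme + authority. */
    static final class ToyFileSystem {
        private final URI uri;

        ToyFileSystem(URI uri) {
            this.uri = uri;
        }

        URI getUri() {
            return uri;
        }

        /** Simplified idea of FileSystem.checkPath: scheme and authority must match exactly. */
        void checkPath(URI path) {
            boolean sameScheme = Objects.equals(uri.getScheme(), path.getScheme());
            boolean sameAuthority = Objects.equals(uri.getAuthority(), path.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + uri);
            }
        }
    }

    /** Failing pattern: always use the cluster's default file system (here: HDFS). */
    static ToyFileSystem defaultFs(URI defaultFsUri) {
        return new ToyFileSystem(defaultFsUri);
    }

    /** Working pattern: derive the file system from the path's own scheme and authority. */
    static ToyFileSystem fsForPath(URI path) {
        // The toy model only handles absolute URIs such as s3://bucket/...
        if (path.getScheme() == null || path.getAuthority() == null) {
            throw new IllegalArgumentException("toy model expects scheme://authority/...: " + path);
        }
        return new ToyFileSystem(URI.create(path.getScheme() + "://" + path.getAuthority()));
    }

    public static void main(String[] args) {
        URI defaultFsUri = URI.create("hdfs://ip-172-31-26-180.eu-west-1.compute.internal:8020");
        URI output = URI.create("s3://nutch-emr-cluster/test/crawl/segments/20180111071602/crawl_fetch");

        try {
            defaultFs(defaultFsUri).checkPath(output); // throws: S3 path vs. HDFS file system
        } catch (IllegalArgumentException e) {
            System.out.println("default FS: " + e.getMessage());
        }

        ToyFileSystem fs = fsForPath(output);
        fs.checkPath(output); // passes: file system resolved from the path itself
        System.out.println("path FS:    " + fs.getUri());
    }
}
```

      The real Hadoop `FileSystem.checkPath` does more than this model. It compares schemes case-insensitively and accepts scheme-less paths, neither of which the sketch attempts.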
      
      

      Attachments

        1. NUTCH-2494.patch (3 kB, Ashraful Islam)


      People

        Assignee: Sebastian Nagel (snagel)
        Reporter: Ashraful Islam (ashraful)
        Votes: 0
        Watchers: 4
