Hadoop Common / HADOOP-5805

Problem using top-level S3 buckets as input/output directories


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.18.3
    • Fix Version/s: 0.21.0
    • Component/s: fs/s3
    • Labels: None
    • Environment: EC2, Cloudera AMI, 20 nodes
    • Hadoop Flags: Reviewed

    Description

      When I specify top-level S3 buckets as input or output directories, I get the following exception:

      hadoop jar subject-map-reduce.jar s3n://infocloud-input s3n://infocloud-output

      java.lang.IllegalArgumentException: Path must be absolute: s3n://infocloud-output
      at org.apache.hadoop.fs.s3native.NativeS3FileSystem.pathToKey(NativeS3FileSystem.java:246)
      at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:319)
      at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
      at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:109)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:738)
      at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
      at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.run(SubjectMRDriver.java:63)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at com.evri.infocloud.prototype.subjectmapreduce.SubjectMRDriver.main(SubjectMRDriver.java:25)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
      at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
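The trace shows the exception is thrown from NativeS3FileSystem.pathToKey while FileOutputFormat.checkOutputSpecs checks whether the output path exists. Hadoop's Path wraps a java.net.URI, and for a bare bucket URI the bucket name is parsed as the authority, leaving an empty path component. The sketch below illustrates this with plain java.net.URI; it assumes, as the error message suggests, that the "absolute" check amounts to the path starting with "/". The class BucketUriDemo and its method pathOf are hypothetical names used only for illustration.

```java
import java.net.URI;

// Hypothetical demo class: shows how java.net.URI splits an s3n URI into
// an authority (the bucket name) and a path component.
public class BucketUriDemo {

    // Returns the path component of the URI ("" when only a bucket is given).
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "s3n://infocloud-output",               // bare bucket: empty path
            "s3n://infocloud-output/output-subdir"  // bucket plus subdirectory
        };
        for (String s : inputs) {
            URI uri = URI.create(s);
            String path = uri.getPath();
            // An empty path does not start with "/", so an "is absolute"
            // check based on a leading slash rejects the bare bucket URI.
            System.out.println(uri.getAuthority() + " -> path '" + path
                    + "', starts with '/': " + path.startsWith("/"));
        }
    }
}
```

For both inputs the authority is infocloud-output; the bare bucket yields an empty path, while the subdirectory form yields /output-subdir.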

      The workaround is to point the input and output paths at subdirectories within the buckets:

      hadoop jar subject-map-reduce.jar s3n://infocloud-input/input-subdir s3n://infocloud-output/output-subdir
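A minimal sketch of how a pathToKey-style mapping could accept a bare bucket URI by treating its empty path as the bucket root, while still rejecting relative paths. This is an illustration under assumptions, not the content of the attached patches: the class S3KeySketch and method uriToKey are hypothetical, and the logic works on java.net.URI rather than Hadoop's Path.

```java
import java.net.URI;

// Hypothetical sketch (not the committed HADOOP-5805 patch): map an s3n URI
// to an S3 object key, where "" denotes the bucket root.
public class S3KeySketch {

    static String uriToKey(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            // Opaque URIs (e.g. "mailto:x") have no path component at all.
            throw new IllegalArgumentException("Not a hierarchical URI: " + uri);
        }
        // A bare bucket URI such as s3n://mybucket has a scheme but an empty
        // path: treat it as the bucket root instead of rejecting it.
        if (uri.getScheme() != null && path.isEmpty()) {
            return "";
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uri);
        }
        return path.substring(1); // drop the leading slash to form the key
    }

    public static void main(String[] args) {
        String[] inputs = {
            "s3n://infocloud-output",
            "s3n://infocloud-output/",
            "s3n://infocloud-output/output-subdir"
        };
        for (String s : inputs) {
            System.out.println(s + " -> key '" + uriToKey(URI.create(s)) + "'");
        }
    }
}
```

With this mapping, s3n://infocloud-output and s3n://infocloud-output/ resolve to the same root key, the subdirectory form maps to output-subdir, and a relative URI such as "output-subdir" is still rejected.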

      Attachments

        1. HADOOP-5805-0.patch (1 kB, Ian Nowland)
        2. HADOOP-5805-1.patch (1 kB, Ian Nowland)
        3. HADOOP-5805-2.patch (1 kB, Thomas White)


          People

            Assignee: Ian Nowland
            Reporter: Arun Jacob
            Votes: 0
            Watchers: 4

