Hadoop Common / HADOOP-10218

Using brace glob pattern in S3N URL causes exception due to Path created with empty string


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.2.1
    • Fix Version/s: 3.0.1
    • Component/s: fs/s3
    • Labels: None

    Description

      When using a brace glob pattern inside an S3N URL, an exception is thrown because a Path is constructed from the empty string. The simplest reproduction case I've found is:

      $ hadoop fs -ls 's3n://public-read-access-bucket/{foo,bar}'
      ls: Can not create a Path from an empty string
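
      For reference, the message matches what org.apache.hadoop.fs.Path itself throws when handed an empty string. A minimal sketch (assuming only hadoop-common on the classpath):

      import org.apache.hadoop.fs.Path;

      public class EmptyPathDemo {
          public static void main(String[] args) {
              // Path.checkPathArg rejects empty path strings with exactly the
              // message shown above, which is what the glob machinery trips over.
              new Path(""); // IllegalArgumentException:
                            //   "Can not create a Path from an empty string"
          }
      }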
      

      It does not seem to make a difference whether any files matching the pattern exist. The problem only seems to affect buckets with public read access; the private buckets I tried work fine. When the same listing was run as part of a Hadoop job step, the following stack trace was produced:

      Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
      	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:90)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:50)
      	at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:856)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:844)
      	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:904)
      	at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1082)
      	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1025)
      	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:989)
      	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:215)
      	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
      	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1017)
      	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1034)
      	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:174)
      	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:952)
      	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:905)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:415)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
      	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:905)
      	at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
      	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
      	at rubydoop.RubydoopJobRunner.run(RubydoopJobRunner.java:29)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
      	at rubydoop.RubydoopJobRunner.main(RubydoopJobRunner.java:74)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:606)
      	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
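
      The frame at NativeS3FileSystem.listStatus(NativeS3FileSystem.java:856) is where the offending Path is built from a listed key. As a hypothetical illustration of how an empty path string can arise (a sketch of the failure pattern, not the actual Hadoop source): an S3 listing under a prefix can return a key equal to the prefix itself, such as a zero-byte directory marker object, and stripping the prefix from that key leaves the empty string:

      import org.apache.hadoop.fs.Path;

      public class EmptyRelativeKeySketch {
          public static void main(String[] args) {
              String prefix = "foo/"; // hypothetical listing prefix
              String key = "foo/";    // listed key identical to the prefix
              String relative = key.substring(prefix.length()); // ""
              new Path(relative);     // same exception as in the trace above
          }
      }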
      

      Interestingly, the following works:

      $ hadoop fs -ls 's3n://public-read-access-bucket/{foo/,bar/}{baz,qux}'
      

      but this fails:

      $ hadoop fs -ls 's3n://public-read-access-bucket/{foo,bar}/{baz,qux}'
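
      Until the glob handling is fixed, one workaround is to expand the brace alternatives yourself and list each path directly, since plain (non-glob) paths appear unaffected. A sketch using the public FileSystem API, with placeholder bucket and directory names:

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class ExpandedBraceList {
          public static void main(String[] args) throws Exception {
              FileSystem fs = FileSystem.get(
                      URI.create("s3n://public-read-access-bucket/"),
                      new Configuration());
              // List each brace alternative separately instead of relying on
              // globStatus to expand {foo,bar}.
              for (String dir : new String[] { "foo", "bar" }) {
                  for (FileStatus status : fs.listStatus(new Path("/" + dir))) {
                      System.out.println(status.getPath());
                  }
              }
          }
      }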
      

          People

            Assignee: Unassigned
            Reporter: Björn Ramberg (bjorne)
            Votes: 0
            Watchers: 2
