Hadoop Common / HADOOP-12455

fs.Globber breaks on colon in filename; doesn't use Path's handling for colons



    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Fix Version/s: None
    • Component/s: fs
    • Labels: None


      org.apache.hadoop.fs.Globber.glob() breaks when a searched directory
      contains a file whose simple name contains a colon.

      The problem seems to be in the code currently at lines 257 and 258:

      256:              // Set the child path based on the parent path.
      257:              child.setPath(new Path(candidate.getPath(),
      258:                      child.getPath().getName()));

      That last line should probably be:

                            new Path(null, null, child.getPath().getName())));

      The bug in the current code is that:
      1) child.getPath().getName() gets the simple name (last segment) of the child Path as a raw string (not necessarily the corresponding relative Path string), and
      2) that raw string is passed as Path(Path, String)'s second argument, which takes a Path string.

      When that raw string contains a colon (e.g., xxx:yyy), it looks like a Path string that specifies a scheme ("xxx") and a relative path ("yyy"). That combination isn't allowed, so constructing a Path from it (as Path(Path, String) does internally) throws an exception, aborting the entire glob() call.
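      The failure can be reproduced without Hadoop. The sketch below is an assumption-laden approximation, not Hadoop's Path code: the hypothetical helper parseLikePathString treats text before a colon that precedes any slash as a scheme, then builds a java.net.URI from the pieces. The five-argument java.net.URI constructor rejects a scheme combined with a relative path ("Relative path in absolute URI"), which is the same kind of error described above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonNameDemo {
    // Hypothetical simplification of how a Path string is split: if a colon
    // appears before any slash, the text before it is taken as a scheme.
    static URI parseLikePathString(String s) throws URISyntaxException {
        int colon = s.indexOf(':');
        int slash = s.indexOf('/');
        String scheme = null;
        String path = s;
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = s.substring(0, colon);
            path = s.substring(colon + 1);
        }
        // java.net.URI throws URISyntaxException ("Relative path in absolute URI")
        // when a scheme is given and the path does not start with '/'.
        return new URI(scheme, null, path, null, null);
    }

    // Returns true if the simple name cannot be parsed as a Path-like string.
    static boolean fails(String simpleName) {
        try {
            parseLikePathString(simpleName);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fails("plain.txt")); // prints "false"
        System.out.println(fails("xxx:yyy"));   // prints "true"
    }
}
```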

      Wrapping the name in a call to Path(String, String, String) does the equivalent of converting the raw string "xxx:yyy" to the Path string "./xxx:yyy", so the part before the colon is not taken as a scheme.
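      The "./" idea can be sketched with plain java.net.URI, assuming (per the description above) that Path(null, null, name) effectively prefixes a relative name with "./". The helpers relativeChild and childOf are hypothetical and only mimic resolving a child name against a parent directory; they are not Hadoop's implementation. Because the first segment is now ".", the colon no longer marks a scheme, and URI.resolve() normalizes the merged path so the "./" segment drops out.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DotSlashChildDemo {
    // Hypothetical helper: prefix a relative simple name with "./" so that a
    // colon in its first segment cannot be read as a scheme separator.
    static URI relativeChild(String simpleName) {
        String path = simpleName.startsWith("/") ? simpleName : "./" + simpleName;
        try {
            // With no scheme, "./xxx:yyy" parses as a relative, hierarchical path.
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Resolve the child against a parent directory URI; resolve() normalizes
    // the merged path, so "/dir/./xxx:yyy" becomes "/dir/xxx:yyy".
    static URI childOf(URI parentDir, String simpleName) {
        return parentDir.resolve(relativeChild(simpleName));
    }

    public static void main(String[] args) {
        URI child = relativeChild("xxx:yyy");
        System.out.println(child.getScheme()); // prints "null"
        System.out.println(child.getPath());   // prints "./xxx:yyy"
        URI parent = URI.create("hdfs://nn/dir/");
        System.out.println(childOf(parent, "xxx:yyy")); // prints "hdfs://nn/dir/xxx:yyy"
    }
}
```

      The parent URI "hdfs://nn/dir/" is an arbitrary example; the trailing slash matters, since resolve() replaces the last path segment of the base.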


        Attachment: HADOOP-12455.patch (2 kB, Rich Haase)




              Assignee: Rich Haase
              Reporter: Daniel Barclay
              Votes: 1
              Watchers: 14