  1. Hadoop Common
  2. HADOOP-9774

RawLocalFileSystem.listStatus() returns absolute paths when input path is relative on Windows


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0-beta, 3.0.0-alpha1
    • Fix Version/s: 2.1.1-beta
    • Component/s: fs
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      On Windows, when RawLocalFileSystem.listStatus() is used to enumerate a relative path (one without a drive spec), e.g., "file:///mydata", the resulting paths are absolute, e.g., ["file://E:/mydata/t1.txt", "file://E:/mydata/t2.txt"...].
      Note that enumerating an absolute path, e.g., "file://E:/mydata", gives the same results as above.

      This breaks some Hive unit tests, which use the local file system to simulate HDFS and therefore remove the drive spec from their paths. After listStatus() changes a path to an absolute path, Hive fails to find that path in its MapReduce job.

      You'll see the following exception:
      [junit] java.io.IOException: cannot find dir = pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt in pathToPartitionInfo: [pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src]
      [junit] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
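
The failure above can be illustrated with a simplified model of a recursive path lookup. This is only a sketch using plain strings: the class PartitionLookupSketch and the method findRecursively are hypothetical names, not Hive's actual implementation, and the map value is a placeholder. It shows why a drive-qualified path never matches a key that was registered without a drive spec, even after walking up through every ancestor.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionLookupSketch {

    /**
     * Hypothetical stand-in for a recursive lookup: try the path itself,
     * then each ancestor, and return the first value found (or null).
     */
    static String findRecursively(Map<String, String> pathToInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            String info = pathToInfo.get(current);
            if (info != null) {
                return info;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // no parent left to try
            }
            current = current.substring(0, slash); // move up one level
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pathToInfo = new HashMap<>();
        // Key registered without a drive spec, as in the exception message.
        pathToInfo.put("pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src",
                "src partition (placeholder)");

        // Path as returned after listStatus() added the drive spec.
        System.out.println(findRecursively(pathToInfo,
                "pfile:/E:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt"));

        // Path as it would look if the drive spec had not been added.
        System.out.println(findRecursively(pathToInfo,
                "pfile:/GitHub/hive-monarch/build/ql/test/data/warehouse/src/kv1.txt"));
    }
}
```

In this model the first lookup finds nothing, because no ancestor of the "E:" path equals the registered key, while the second lookup matches the parent directory.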

      This problem was introduced by HADOOP-8962.

      Prior to the fix for HADOOP-8962 (merged in 0.23.5), the resulting paths were relative if the parent path was relative, e.g., ["file:///mydata/t1.txt", "file:///mydata/t2.txt"...].

      This behavior change is a side effect of the fix in HADOOP-8962, not an intended change. The resulting behavior, even though it is legitimate from a functional point of view, breaks consistency from the caller's point of view. When the caller uses a relative path (without drive spec) to call listStatus(), the resulting paths should be relative. Therefore, I think this should be fixed.
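
The consistency expected by the caller can be stated as a small invariant: each returned child path is the caller-supplied parent path plus the file name, so no drive spec appears unless the caller provided one. The following is a minimal sketch of that invariant using plain strings. ListStatusSketch and childPaths are hypothetical names; this is not the attached patch and does not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class ListStatusSketch {

    /**
     * Builds child paths by appending each file name to the parent path
     * exactly as the caller supplied it (hypothetical helper).
     */
    static List<String> childPaths(String parent, List<String> names) {
        String base = parent.endsWith("/") ? parent : parent + "/";
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(base + name); // parent prefix is preserved unchanged
        }
        return result;
    }
}
```

Under this sketch, a parent of "file:///mydata" yields "file:///mydata/t1.txt", while a parent of "file://E:/mydata" yields "file://E:/mydata/t1.txt"; the drive spec is present in the results only when it was present in the input.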

        Attachments

        1. HADOOP-9774.patch
          0.9 kB
          shanyu zhao
        2. HADOOP-9774-2.patch
          5 kB
          shanyu zhao
        3. HADOOP-9774-3.patch
          6 kB
          shanyu zhao
        4. HADOOP-9774-4.patch
          5 kB
          shanyu zhao
        5. HADOOP-9774-5.patch
          5 kB
          Ivan Mitic
