Hadoop Common
  1. Hadoop Common
  2. HADOOP-8139

Path does not allow metachars to be escaped

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Blocker Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.23.3, 3.0.0
    • Fix Version/s: None
    • Component/s: fs
    • Labels:
      None

      Description

      Path converts "\" into "/", probably for windows support? This means it's impossible for the user to escape metachars in a path name. Glob expansion can have deadly results.

      Here are the most egregious examples. A user accidentally creates a path like "/user/me/*/file". Now they want to remove it.

      "hadoop fs -rmr -skipTrash '/user/me/\*'" becomes...
      "hadoop fs -rmr -skipTrash /user/me/*"
      • User/Admin: Nuked their home directory or any given directory
      "hadoop fs -rmr -skipTrash '\*'" becomes...
      "hadoop fs -rmr -skipTrash /*"
      • User: Deleted everything they have access to on the cluster
      • Admin: Nukes the entire cluster

      Note: FsShell is shown for illustrative purposes, however the problem is in the Path object, not FsShell.

      1. HADOOP-8139.patch
        6 kB
        Daryn Sharp
      2. HADOOP-8139.patch
        4 kB
        Daryn Sharp
      3. HADOOP-8139-2.patch
        6 kB
        Daryn Sharp
      4. HADOOP-8139-3.patch
        7 kB
        Daryn Sharp
      5. HADOOP-8139-4.patch
        12 kB
        Daryn Sharp
      6. HADOOP-8139-5.patch
        10 kB
        Daryn Sharp
      7. HADOOP-8139-6.patch
        5 kB
        Daryn Sharp

        Issue Links

          Activity

          Hide
          Suresh Srinivas added a comment -

          Daryn, are you referring to Path#normalizePath() where "
          " is replaced by "/"?

          Show
          Suresh Srinivas added a comment - Daryn, are you referring to Path#normalizePath() where " " is replaced by "/"?
          Hide
          Suresh Srinivas added a comment -
          "\\" is replaced by "/"
          
          Show
          Suresh Srinivas added a comment - "\\" is replaced by "/"
          Hide
          Daryn Sharp added a comment -

          That appears to be the culprit, but I didn't investigate deeper. A user reported blowing away all the directory contents while trying to delete a sub-directory erroneously named "*". I initially thought it was due to improper quoting by the user, until I discovered what Path does.

          Show
          Daryn Sharp added a comment - That appears to be the culprit, but I didn't investigate deeper. A user reported blowing away all the directory contents while trying to delete a sub-directory erroneously named "*". I initially thought it was due to improper quoting by the user, until I discovered what Path does.
          Hide
          Daryn Sharp added a comment -

          I'm hoping someone with more FileSystem expertise and historical knowledge will fix this bug. I'm unclear if/how we can use \ and / inter-changably. I'd think we should just remove it.

          Show
          Daryn Sharp added a comment - I'm hoping someone with more FileSystem expertise and historical knowledge will fix this bug. I'm unclear if/how we can use \ and / inter-changably. I'd think we should just remove it.
          Hide
          Doug Cutting added a comment -

          I suspect that simply removing this will break things on Windows. Windows directory listings return paths with backslashes as the directory separator, but URIs expect slashes as directory separators, and Java on Windows permits slashes as directory separators.

          An approach to fixing this might be to change RawLocalFileSystem#listStatus() to replace backslash with slash on Windows, then remove this in Path#normalizePath(). Slash is reserved in Windows path names, so this should be safe there.

          There are probably other places that would need fixing too. On Windows, paths from the classpath and environment will contain backslashes that confuse the directory parsing of Path and URI. So we'd need to find each place such a path enters and perform substitution there.

          Show
          Doug Cutting added a comment - I suspect that simply removing this will break things on Windows. Windows directory listings return paths with backslashes as the directory separator, but URIs expect slashes as directory separators, and Java on Windows permits slashes as directory separators. An approach to fixing this might be to change RawLocalFileSystem#listStatus() to replace backslash with slash on Windows, then remove this in Path#normalizePath(). Slash is reserved in Windows path names, so this should be safe there. There are probably other places that would need fixing too. On Windows, paths from the classpath and environment will contain backslashes that confuse the directory parsing of Path and URI. So we'd need to find each place such a path enters and perform substitution there.
          Hide
          Daryn Sharp added a comment -

          I too had thought about RawLocalFileSystem being the place to hide the translation. Does windows resolve paths with forward slashes? If not, there are interesting edge cases like paths containing "useless" escapes. The \ will now look like a windows directory sep so I guess \ gets stripped and then / turned into \. Hopefully nobody expects a path with a component name like "foo\bar" since it works in other filesystems...

          I was under the impression that hadoop in its current state didn't work on windows, although I know an effort is underway to fix it. If it doesn't work, could we consider punting, remove the \ stripping, and let those leading the windows effort think about how to fix it correctly?

          Show
          Daryn Sharp added a comment - I too had thought about RawLocalFileSystem being the place to hide the translation. Does windows resolve paths with forward slashes? If not, there are interesting edge cases like paths containing "useless" escapes. The \ will now look like a windows directory sep so I guess \ gets stripped and then / turned into \. Hopefully nobody expects a path with a component name like "foo\bar" since it works in other filesystems... I was under the impression that hadoop in its current state didn't work on windows, although I know an effort is underway to fix it. If it doesn't work, could we consider punting, remove the \ stripping, and let those leading the windows effort think about how to fix it correctly?
          Hide
          Doug Cutting added a comment -

          > Does windows resolve paths with forward slashes?

          Java on Windows does.

          > I was under the impression that hadoop in its current state didn't work on windows [ ... ]

          It has worked in the past and may still work with Cygwin for shell scripts.

          > could we consider punting, remove the \ stripping, and let those leading the windows effort think about how to fix it correctly?

          I suspect there are folks who manage to run Hadoop on Windows with Cygwin today that such a change would break. This would be a regression.

          Show
          Doug Cutting added a comment - > Does windows resolve paths with forward slashes? Java on Windows does. > I was under the impression that hadoop in its current state didn't work on windows [ ... ] It has worked in the past and may still work with Cygwin for shell scripts. > could we consider punting, remove the \ stripping, and let those leading the windows effort think about how to fix it correctly? I suspect there are folks who manage to run Hadoop on Windows with Cygwin today that such a change would break. This would be a regression.
          Hide
          Daryn Sharp added a comment -

          Any recommendations? Maybe only convert \ to / if the OS is windows? Windows will still have the bug of not being able to escape metas, but unix will work as expected.

          Show
          Daryn Sharp added a comment - Any recommendations? Maybe only convert \ to / if the OS is windows? Windows will still have the bug of not being able to escape metas, but unix will work as expected.
          Hide
          Daryn Sharp added a comment -

          What about something like this?

             private String normalizePath(String path) {
               // remove double slashes & backslashes
               path = StringUtils.replace(path, "//", "/");
          -    path = StringUtils.replace(path, "\\", "/");
          +    if (WINDOWS) {
          +      path = StringUtils.replace(path, "\\", "/");
          +      path = StringUtils.replace(path, "^", "\\");
          +    }
          
          Show
          Daryn Sharp added a comment - What about something like this? private String normalizePath( String path) { // remove double slashes & backslashes path = StringUtils.replace(path, " //" , "/" ); - path = StringUtils.replace(path, "\\" , "/" ); + if (WINDOWS) { + path = StringUtils.replace(path, "\\" , "/" ); + path = StringUtils.replace(path, "^" , "\\" ); + }
          Hide
          Suresh Srinivas added a comment -

          I suspect there are folks who manage to run Hadoop on Windows with Cygwin today that such a change would break. This would be a regression.

          I am not sure how your change this addresses comment from Doug. See the comment below.

          Here are the scenarios to consider (sorry I do not have access to Windows PC).

          1. On windows terminal paths use '/' and WINDOWS is true.
          2. On Cygwin terminal paths use '\' and WINDOWS is (probably) true. In this case the patch might introduce unnecessary behavior?
          Show
          Suresh Srinivas added a comment - I suspect there are folks who manage to run Hadoop on Windows with Cygwin today that such a change would break. This would be a regression. I am not sure how your change this addresses comment from Doug. See the comment below. Here are the scenarios to consider (sorry I do not have access to Windows PC). On windows terminal paths use '/' and WINDOWS is true. On Cygwin terminal paths use '\' and WINDOWS is (probably) true. In this case the patch might introduce unnecessary behavior?
          Hide
          Doug Cutting added a comment -

          > Any recommendations? Maybe only convert \ to / if the OS is windows?

          It'd be nice to fix the problem for Windows users too.

          Why not try fixing it in RawLocalFileSystem?

          Thinking further, perhaps this is not a bug. Perhaps the user error is assuming that backslash is an escape character in paths. The escape character in URIs is %. Does this work with 'rm %2A'?

          Show
          Doug Cutting added a comment - > Any recommendations? Maybe only convert \ to / if the OS is windows? It'd be nice to fix the problem for Windows users too. Why not try fixing it in RawLocalFileSystem? Thinking further, perhaps this is not a bug. Perhaps the user error is assuming that backslash is an escape character in paths. The escape character in URIs is %. Does this work with 'rm %2A'?
          Hide
          Daryn Sharp added a comment -

          It'd be nice to fix the problem for Windows users too.

          True, which I believe it would it in the small snippet above. For windows only, \ is converted to /, and the windows escape char ^ is converted to \.

          Why not try fixing it in RawLocalFileSystem?

          I think that could work, but windows users who might be relying on "\my\hdfs\path" to work will be in for an incompatible surprise. While it might be the right long-term approach, I think presents more risk than conditionalizing the \ to / for now.

          Thinking further, perhaps this is not a bug. Perhaps the user error is assuming that backslash is an escape character in paths. The escape character in URIs is %. Does this work with 'rm %2A'?

          No, it doesn't work because paths are not subject to percent encoding. Path has hybrid behavior of a URI and a File. Path is really only using a URI to hold the scheme and authority. The rest of the URI is considered a literal path. Percent encoding would require many non-alphanumeric characters to be escaped, including almost all of the glob metachars.

          Changing Path to require percent encoding would be extremely disruptive and incompatible.

          Show
          Daryn Sharp added a comment - It'd be nice to fix the problem for Windows users too. True, which I believe it would it in the small snippet above. For windows only, \ is converted to /, and the windows escape char ^ is converted to \. Why not try fixing it in RawLocalFileSystem? I think that could work, but windows users who might be relying on "\my\hdfs\path" to work will be in for an incompatible surprise. While it might be the right long-term approach, I think presents more risk than conditionalizing the \ to / for now. Thinking further, perhaps this is not a bug. Perhaps the user error is assuming that backslash is an escape character in paths. The escape character in URIs is %. Does this work with 'rm %2A'? No, it doesn't work because paths are not subject to percent encoding. Path has hybrid behavior of a URI and a File. Path is really only using a URI to hold the scheme and authority. The rest of the URI is considered a literal path. Percent encoding would require many non-alphanumeric characters to be escaped, including almost all of the glob metachars. Changing Path to require percent encoding would be extremely disruptive and incompatible.
          Hide
          Daryn Sharp added a comment -

          Here are the scenarios to consider (sorry I do not have access to Windows PC).

          1. On windows terminal paths use '/' and WINDOWS is true.
          2. On Cygwin terminal paths use '\' and WINDOWS is (probably) true. In this case the patch might introduce unnecessary behavior?

          Good questions. The pre-existing WINDOWS flag is based on System.getProperty("os.name").startsWith("Windows"); so it should be true for both cygwin and non-cygwin.

          If the "^" to "\" substitution is removed, then no behavior changes for windows. It just won't be able to quote metachars. I'm fine with that.

          Show
          Daryn Sharp added a comment - Here are the scenarios to consider (sorry I do not have access to Windows PC). On windows terminal paths use '/' and WINDOWS is true. On Cygwin terminal paths use '\' and WINDOWS is (probably) true. In this case the patch might introduce unnecessary behavior? Good questions. The pre-existing WINDOWS flag is based on System.getProperty("os.name").startsWith("Windows"); so it should be true for both cygwin and non-cygwin. If the "^" to "\" substitution is removed, then no behavior changes for windows. It just won't be able to quote metachars. I'm fine with that.
          Hide
          Doug Cutting added a comment -

          > For windows only, \ is converted to /, and the windows escape char ^ is converted to \.

          Ah. Now I get it. I like this approach! +1

          Show
          Doug Cutting added a comment - > For windows only, \ is converted to /, and the windows escape char ^ is converted to \. Ah. Now I get it. I like this approach! +1
          Hide
          Daryn Sharp added a comment -

          I haven't run the full suite of tests, but I want to get the patch up for comments this evening. I did have to make a couple of lines to the glob parser due to directly bugs found during testing this patch. I'll make it a separate jira if there are objections to including it.

          Glob will try to build a regexp for each path component. If it doesn't see an unescaped shell metachar, then it falls back to using the raw path component string. In the case of quoted metachars, the quoting is never removed. I fixed that.

          Ironically, the glob quoting would only work if there was also an unquoted metachar. This forced the use of a regexp where the unstripped quoting was valid.

          Show
          Daryn Sharp added a comment - I haven't run the full suite of tests, but I want to get the patch up for comments this evening. I did have to make a couple of lines to the glob parser due to directly bugs found during testing this patch. I'll make it a separate jira if there are objections to including it. Glob will try to build a regexp for each path component. If it doesn't see an unescaped shell metachar, then it falls back to using the raw path component string. In the case of quoted metachars, the quoting is never removed. I fixed that. Ironically, the glob quoting would only work if there was also an unquoted metachar. This forced the use of a regexp where the unstripped quoting was valid.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517501/HADOOP-8139.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.TestPath
          org.apache.hadoop.ipc.TestRPCCallBenchmark
          org.apache.hadoop.fs.viewfs.TestViewFsTrash

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/684//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/684//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517501/HADOOP-8139.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.TestPath org.apache.hadoop.ipc.TestRPCCallBenchmark org.apache.hadoop.fs.viewfs.TestViewFsTrash +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/684//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/684//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Sorry, wrong patch. Finally learned that I can't select my patch file, leave the dialog open, sometime later overwrite the patch & hit submit, and then expect it to pick up the modified file...

          Show
          Daryn Sharp added a comment - Sorry, wrong patch. Finally learned that I can't select my patch file, leave the dialog open, sometime later overwrite the patch & hit submit, and then expect it to pick up the modified file...
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517573/HADOOP-8139.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.viewfs.TestViewFsTrash

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/686//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/686//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517573/HADOOP-8139.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.viewfs.TestViewFsTrash +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/686//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/686//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          Hadoop currently allows both paths with '\' or '/' as separators and some times '\' is used as an escape character. This results in issues when the escape characters is interprted as separator (as this jira describes).

          What we need is (as you have proposed):

          1. Paths with '\' as separator only use '^' as escape character.
          2. Paths with '/' as separator only use '\' as escape character.

          But the main problem is to decipher the intended separator in a path.

          Some possible directions for solving this problem:

          1. Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS. This is my preferred solution - has the following issues.
            • This is backward incompatible. I am guessing the impact of this on existing users is minimmal.
          2. Allow both separators, the approach you have described.
            • An application that uses '\' based path, works from both windows and non windows system, before this change. With this change, an application needs to use the right path based on the OS. This could be very confusing to the users.
            • The behavior on Windows on Cygwin and without Cygwin could be very different.
          Show
          Suresh Srinivas added a comment - Hadoop currently allows both paths with '\' or '/' as separators and some times '\' is used as an escape character. This results in issues when the escape characters is interprted as separator (as this jira describes). What we need is (as you have proposed): Paths with '\' as separator only use '^' as escape character. Paths with '/' as separator only use '\' as escape character. But the main problem is to decipher the intended separator in a path. Some possible directions for solving this problem: Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS. This is my preferred solution - has the following issues. This is backward incompatible. I am guessing the impact of this on existing users is minimmal. Allow both separators, the approach you have described. An application that uses '\' based path, works from both windows and non windows system, before this change. With this change, an application needs to use the right path based on the OS. This could be very confusing to the users. The behavior on Windows on Cygwin and without Cygwin could be very different.
          Hide
          Daryn Sharp added a comment -

          Hadoop currently allows both paths with '\' or '/' as separators and some times '\' is used as an escape character.

          Yes, hadoop currently allows both as separators, which means currently '\' never works as an escape as intended.

          Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS.

          I completely agree! I originally suggested "punting" the windows issue, but Doug has valid compatibility concerns.

          Allow both separators [...] an application needs to use the right path based on the OS. This could be very confusing to the users.

          Agreed. I don't like inconsistency.

          The behavior on Windows on Cygwin and without Cygwin could be very different.

          I thought hadoop currently requires cygwin, so do we need to worry about non-cygwin behavior right now?

          My goal is the most minimal non-invasive change to minimize risk. If you and Doug reach a consensus, I'll implement either approach.

          Show
          Daryn Sharp added a comment - Hadoop currently allows both paths with '\' or '/' as separators and some times '\' is used as an escape character. Yes, hadoop currently allows both as separators, which means currently '\' never works as an escape as intended. Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS. I completely agree! I originally suggested "punting" the windows issue, but Doug has valid compatibility concerns. Allow both separators [...] an application needs to use the right path based on the OS. This could be very confusing to the users. Agreed. I don't like inconsistency. The behavior on Windows on Cygwin and without Cygwin could be very different. I thought hadoop currently requires cygwin, so do we need to worry about non-cygwin behavior right now? My goal is the most minimal non-invasive change to minimize risk. If you and Doug reach a consensus, I'll implement either approach.
          Hide
          Doug Cutting added a comment -

          > Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS.

          I'm fine with this approach too. It'll be more work to get things working on Windows again this way, and that work will have to be done. So it's not so much a back-compatibility issue as it will introduce regressions on Windows that will need to be fixed by someone.

          > The behavior on Windows on Cygwin and without Cygwin could be very different.

          I don't think Cygwin has much to do with it. Cygwin's mostly just used to run shell scripts. Directory listings from java.io.File, classpath entries, resource paths, etc. are all the same with or without Cygwin, since these come from the JVM.

          Show
          Doug Cutting added a comment - > Disallow '\' as separator in paths. I believe, Hadoop should choose a format for the path and not changed based on the OS. I'm fine with this approach too. It'll be more work to get things working on Windows again this way, and that work will have to be done. So it's not so much a back-compatibility issue as it will introduce regressions on Windows that will need to be fixed by someone. > The behavior on Windows on Cygwin and without Cygwin could be very different. I don't think Cygwin has much to do with it. Cygwin's mostly just used to run shell scripts. Directory listings from java.io.File, classpath entries, resource paths, etc. are all the same with or without Cygwin, since these come from the JVM.
          Hide
          Daryn Sharp added a comment -

          Changed patch to eliminate conversion of \ to /. Updated a couple tests in TestPath to no longer expects the conversion. Thanks to Doug & Suresh for discussion on the issue.

          Show
          Daryn Sharp added a comment - Changed patch to eliminate conversion of \ to /. Updated a couple tests in TestPath to no longer expects the conversion. Thanks to Doug & Suresh for discussion on the issue.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517612/HADOOP-8139-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/689//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/689//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517612/HADOOP-8139-2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/689//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/689//console This message is automatically generated.
          Hide
          Doug Cutting added a comment -

          Do you plan to make any changes to RawLocalFileSystem or test anything on Windows? If not, since this introduces regressions on Windows, you should probably at least file some issues to fix these. As regressions, these should probably be blockers on releases that include this change. These might be sub-tasks of HADOOP-8079. Should we link this issue to HADOOP-8079?

          Show
          Doug Cutting added a comment - Do you plan to make any changes to RawLocalFileSystem or test anything on Windows? If not, since this introduces regressions on Windows, you should probably at least file some issues to fix these. As regressions, these should probably be blockers on releases that include this change. These might be sub-tasks of HADOOP-8079 . Should we link this issue to HADOOP-8079 ?
          Hide
          Suresh Srinivas added a comment -

          ... this introduces regressions on Windows

          Doug, not sure what you mean by this. Hadoop can still use through java File Windows right? All we are changing is, the file paths to/through Hadoop now are based on forward slash.

          Show
          Suresh Srinivas added a comment - ... this introduces regressions on Windows Doug, not sure what you mean by this. Hadoop can still use through java File Windows right? All we are changing is, the file paths to/through Hadoop now are based on forward slash.
          Hide
          Doug Cutting added a comment -

          If a directory containing backslashes (from RawLocalFileSystem, CLASSPATH, or somewhere) is turned into a Path and someone tries to access the parent directory of that Path then that not work correctly unless backslash is interpreted as a directory separator. Lots of, e.g., MapReduce unit tests depend on this in order to pass on Windows. That's why the substitution you're removing was added.

          Show
          Doug Cutting added a comment - If a directory containing backslashes (from RawLocalFileSystem, CLASSPATH, or somewhere) is turned into a Path and someone tries to access the parent directory of that Path then that not work correctly unless backslash is interpreted as a directory separator. Lots of, e.g., MapReduce unit tests depend on this in order to pass on Windows. That's why the substitution you're removing was added.
          Hide
          Daryn Sharp added a comment -

          I'm sorry, I misunderstood what I was expected to do. We'd like to get this into 23.2 so I've been trying to avoid an in-depth change. Adding a windows blocker is undesirable too.

          Apparently windows NT has long internally handled / as \, plus the JVM will implicitly do the conversion. I read it on the internet, so it must be true! However, for File objects returned from operations, I'm not sure if it's smart enough to use the same delimiter used in the input path.

          Here's some options. Doug, I think you are essentially proposing #2?

          1. Add a config option to sub \ to /
            • Pros: backward-compatibility for those that want it (ie. no quoting), or Unix and windows can quote.
            • Cons: none?
          2. No subbing of \ to / (latest patch)
            • Pros: Unix and windows can quote metas
            • Cons: File objects returned from Windows might not convert to Path. Requires changes to RLFS
          3. Use the WINDOWS conditional for subbing \ to /
            • Pros: Unix can quote metas.
            • Cons: Unix can't use windows-style paths to access hdfs. Windows can't quota metas. Not a regression because it can't do that today, but now inconsistent with unix feature.
          4. Use a WINDOWS conditional for subbing \ to /, and ^ to \ (prior patch)
            • Pros: Unix and windows can quota metas
            • Cons: The quoting requires different/inconsistent chars that are system-dependent. Unix can't use windows-style paths.

          A new Path ctor might make this easy to fix:

          Path(File file) {
            (File.separatorChar == Path.SEPARATOR_CHAR)
              ? this(file.toString())
              : this(file.toString().replace(File.separatorChar, Path.SEPARATOR_CHAR));
            }
          }
          
          Show
          Daryn Sharp added a comment - I'm sorry, I misunderstood what I was expected to do. We'd like to get this into 23.2 so I've been trying to avoid an in-depth change. Adding a windows blocker is undesirable too. Apparently windows NT has long internally handled / as \, plus the JVM will implicitly do the conversion. I read it on the internet, so it must be true! However, for File objects returned from operations, I'm not sure if it's smart enough to use the same delimiter used in the input path. Here's some options. Doug, I think you are essentially proposing #2? Add a config option to sub \ to / Pros: backward-compatibility for those that want it (ie. no quoting), or Unix and windows can quote. Cons: none? No subbing of \ to / (latest patch) Pros: Unix and windows can quote metas Cons: File objects returned from Windows might not convert to Path. Requires changes to RLFS Use the WINDOWS conditional for subbing \ to / Pros: Unix can quote metas. Cons: Unix can't use windows-style paths to access hdfs. Windows can't quota metas. Not a regression because it can't do that today, but now inconsistent with unix feature. Use a WINDOWS conditional for subbing \ to /, and ^ to \ (prior patch) Pros: Unix and windows can quota metas Cons: The quoting requires different/inconsistent chars that are system-dependent. Unix can't use windows-style paths. A new Path ctor might make this easy to fix: Path(File file) { (File.separatorChar == Path.SEPARATOR_CHAR) ? this (file.toString()) : this (file.toString().replace(File.separatorChar, Path.SEPARATOR_CHAR)); } }
          Hide
          Doug Cutting added a comment -

          I don't understand the motivation for (1) over (3). When would a non-windows user want subbing? When would a Windows user not require it?

          (2) will probably require changes to more than RawLocalFileSystem. You won't know unless you try to run things on Windows. Last time I worked on this (years ago) I was surprised at the number of places that paths with backslashes crept in. This is probably the best long-term approach, but it may be a lot of work to stomp all the bugs.

          (4) Is simplest, but system-dependent escapes would be unfortunate.

          I don't see a clear winner. I prefer (2) or (4) to (1) or (3): if the ability to escape paths is a critical feature for Unix users then it's probably a critical feature for Windows users.

          Show
          Doug Cutting added a comment - I don't understand the motivation for (1) over (3). When would a non-windows user want subbing? When would a Windows user not require it? (2) will probably require changes to more than RawLocalFileSystem. You won't know unless you try to run things on Windows. Last time I worked on this (years ago) I was surprised at the number of places that paths with backslashes crept in. This is probably the best long-term approach, but it may be a lot of work to stomp all the bugs. (4) Is simplest, but system-dependent escapes would be unfortunate. I don't see a clear winner. I prefer (2) or (4) to (1) or (3): if the ability to escape paths is a critical feature for Unix users then it's probably a critical feature for Windows users.
          Hide
          Daryn Sharp added a comment -

          When would a non-windows user want subbing? When would a Windows user not require it?

          I'm just presenting the options, but if we consider hadoop to be generally URI-based, then no subbing should ever occur for URIs. MS deprecated their non-standard use of \ in URIs back in 2006. A valid use case may be C:\dir\file. It's easy to identify the drive letter and convert to file:///C:/dir/file – official file URI per MS support docs. However, once you start using relative paths, it becomes murky whether \ is a dir sep or quoting char. If we want consistent paths and quoting that are system-independent, then we should mandate /. I think if we run external paths through File (at least for windows), it might normalize for us, but I need to find myself a windows machine to test.

          All said, I agree #2 is the ideal for consistency. #4 or #3 is an immediate remedy for unix. #3 would not alter the existing windows behavior, giving us time to figure out how we want to improve it. I'm somewhat concerned that if we let the cat out the bag with a "^" on windows (#4) we are going to have nasty cross-platform script compatibility in the future.

          I'll investigate #2 tomorrow morning, but you've concerned me that it may be harder than it looks, and I don't have the means to stomp down windows bugs. What would you be willing to accept in 23.2 if #2 proves difficult?

          Show
          Daryn Sharp added a comment - When would a non-windows user want subbing? When would a Windows user not require it? I'm just presenting the options, but if we consider hadoop to be generally URI-based, then no subbing should ever occur for URIs. MS deprecated their non-standard use of \ in URIs back in 2006. A valid use case may be C:\dir\file. It's easy to identify the drive letter and convert to file:///C:/dir/file – official file URI per MS support docs. However, once you start using relative paths, it becomes murky whether \ is a dir sep or quoting char. If we want consistent paths and quoting that are system-independent, then we should mandate /. I think if we run external paths through File (at least for windows), it might normalize for us, but I need to find myself a windows machine to test. All said, I agree #2 is the ideal for consistency. #4 or #3 is an immediate remedy for unix. #3 would not alter the existing windows behavior, giving us time to figure out how we want to improve it. I'm somewhat concerned that if we let the cat out the bag with a "^" on windows (#4) we are going to have nasty cross-platform script compatibility in the future. I'll investigate #2 tomorrow morning, but you've concerned me that it may be harder than it looks, and I don't have the means to stomp down windows bugs. What would you be willing to accept in 23.2 if #2 proves difficult?
          Hide
          Daryn Sharp added a comment -

          #2 doesn't look easy. So how about a more natural hybrid solution? If it's windows and it's a local filesystem URI, do the os-specific subbing. For all other URIs, the standard policy of / is dir sep, and \ is quote is used. The patch will be a bit more complex but I don't think it will be too hard.

          Show
          Daryn Sharp added a comment - #2 doesn't look easy. So how about a more natural hybrid solution? If it's windows and it's a local filesystem URI, do the os-specific subbing. For all other URIs, the standard policy of / is dir sep, and \ is quote is used. The patch will be a bit more complex but I don't think it will be too hard.
          Hide
          Doug Cutting added a comment -

          > If it's windows and it's a local filesystem URI, do the os-specific subbing.

          You can't always tell whether it's a local filesystem URI. Lots of relative paths are constructed that do not contain a scheme.

          Show
          Doug Cutting added a comment - > If it's windows and it's a local filesystem URI, do the os-specific subbing. You can't always tell whether it's a local filesystem URI. Lots of relative paths are constructed that do not contain a scheme.
          Hide
          Doug Cutting added a comment -

          > on windows (#4) we are going to have nasty cross-platform script compatibility in the future

          Command-line scripts on Windows tend to be .bat files while shell scripts are used on Unix. What would be required for this to be a problem is a cross-platform scripting environment that doesn't treat backslash as a directory separator on Windows, right? Do you know of any such?

          Show
          Doug Cutting added a comment - > on windows (#4) we are going to have nasty cross-platform script compatibility in the future Command-line scripts on Windows tend to be .bat files while shell scripts are used on Unix. What would be required for this to be a problem is a cross-platform scripting environment that doesn't treat backslash as a directory separator on Windows, right? Do you know of any such?
          Hide
          Suresh Srinivas added a comment -

          If it's windows and it's a local filesystem URI, do the os-specific subbing.

          I already had raised a few issues with this approach. Let me state it again:

          1. Applications with paths using / as separator could be used from/on Unix/Windows:
            • Some times \ becomes escape character and some times separator. That means the problem stated in this jira some times happens and some times not! We could argue that it is better than what we have today - that is - every time \ is misinterpreted as path separator. Still, this will be very confusing for the users. The only way to around this, I can see is, all users running apps on Unix must use / and windows use \ as separators, which is even more confusing.
          2. Applications with paths using \ as separator could be used from/on Unix/Windows:
            • Here on windows, things works. The same path will not work at all for Unix!

          I feel we should consider removing support for paths using \ as separator. Some of the arguments around RawLocalFileSystem, I have hard time following, may be because I have not seen them.

          Show
          Suresh Srinivas added a comment - If it's windows and it's a local filesystem URI, do the os-specific subbing. I already had raised a few issues with this approach. Let me state it again: Applications with paths using / as separator could be used from/on Unix/Windows: Some times \ becomes escape character and some times separator. That means the problem stated in this jira some times happens and some times not! We could argue that it is better than what we have today - that is - every time \ is misinterpreted as path separator. Still, this will be very confusing for the users. The only way to around this, I can see is, all users running apps on Unix must use / and windows use \ as separators, which is even more confusing. Applications with paths using \ as separator could be used from/on Unix/Windows: Here on windows, things works. The same path will not work at all for Unix! I feel we should consider removing support for paths using \ as separator. Some of the arguments around RawLocalFileSystem, I have hard time following, may be because I have not seen them.
          Hide
          Daryn Sharp added a comment -

          You can't always tell whether it's a local filesystem URI. Lots of relative paths are constructed that do not contain a scheme.

          Yes, but I might be able to pull it with some creative path manipulations where ultimately RLFS on windows converts ^ to \...

          Show
          Daryn Sharp added a comment - You can't always tell whether it's a local filesystem URI. Lots of relative paths are constructed that do not contain a scheme. Yes, but I might be able to pull it with some creative path manipulations where ultimately RLFS on windows converts ^ to \...
          Hide
          Daryn Sharp added a comment -

          But I also agree with Suresh that it sure would be easier to say windows has to use / too.

          Show
          Daryn Sharp added a comment - But I also agree with Suresh that it sure would be easier to say windows has to use / too.
          Hide
          Doug Cutting added a comment -

          > I feel we should consider removing support for paths using \ as separator.

          I think this is a good approach but it will break things on Windows and we should file blockers for these regressions or fix them here.

          Show
          Doug Cutting added a comment - > I feel we should consider removing support for paths using \ as separator. I think this is a good approach but it will break things on Windows and we should file blockers for these regressions or fix them here.
          Hide
          Daryn Sharp added a comment -

          Completely untested, requesting comments. Add Path(File) ctor that converts \ to / on windows. RLFS with convert ^ to \ when globbing.

          Show
          Daryn Sharp added a comment - Completely untested, requesting comments. Add Path(File) ctor that converts \ to / on windows. RLFS with convert ^ to \ when globbing.
          Hide
          Daryn Sharp added a comment -

          I was rushing to meeting, so allow me to re-summarize what (I think) I did.

          • Existing Path ctors require the path to have a dir sep of / because they expected to be URIs
          • Path(File) can instantiate path objects for the local fs. For unix, it does the same as Path(file.toString()), so no change. For windows, it allows \ paths by rewriting the File to sub \ to /.
          • RLFS will rewrite windows paths to sub ^ with \ in order to preserve windows quoting semantics. Unix is unchanged.

          This means that windows users can use \ paths only with File objects, and convert to Path with new Path(new File("\foo\bar")) which turns into /foo/bar. However, new Path("\foo\bar") is presumed to be a URI so it remains \foo\bar. RLFS for windows will allow ^* to be rewritten into \ * so the normal globbing code will operate as expected.

          In short:

          1. Behavior is consistent between unix and windows, unless windows uses File objects to obtain local semantics
          2. Unix allows \ as a meta-quote.
          3. Windows RLFS paths may use \ and ^ semantics, while other paths follow the URI semantics of / and \.
          Show
          Daryn Sharp added a comment - I was rushing to meeting, so allow me to re-summarize what (I think) I did. Existing Path ctors require the path to have a dir sep of / because they expected to be URIs Path(File) can instantiate path objects for the local fs. For unix, it does the same as Path(file.toString()), so no change. For windows, it allows \ paths by rewriting the File to sub \ to /. RLFS will rewrite windows paths to sub ^ with \ in order to preserve windows quoting semantics. Unix is unchanged. This means that windows users can use \ paths only with File objects, and convert to Path with new Path(new File("\foo\bar")) which turns into /foo/bar . However, new Path("\foo\bar") is presumed to be a URI so it remains \foo\bar . RLFS for windows will allow ^* to be rewritten into \ * so the normal globbing code will operate as expected. In short: Behavior is consistent between unix and windows, unless windows uses File objects to obtain local semantics Unix allows \ as a meta-quote. Windows RLFS paths may use \ and ^ semantics, while other paths follow the URI semantics of / and \.
          Hide
          Doug Cutting added a comment -

          This patch looks like a good start.

          • Someone should test this on Windows before it is committed. I suspect there will be several other places besides RawLocalFileSystem where Windows paths with backslashes enter the system.
          • I like the new Path(File) constructor. Rather than using toString() maybe it could use toURI()?
          • Rather than removing the
            new Path("c:\\a\\b")

            test, you might change it to something like

            for (File root : File.listRoots())
              assertEquals(???, new Path(new File(root, "a"+File.separator+"b")));
            
          Show
          Doug Cutting added a comment - This patch looks like a good start. Someone should test this on Windows before it is committed. I suspect there will be several other places besides RawLocalFileSystem where Windows paths with backslashes enter the system. I like the new Path(File) constructor. Rather than using toString() maybe it could use toURI()? Rather than removing the new Path( "c:\\a\\b" ) test, you might change it to something like for (File root : File.listRoots()) assertEquals(???, new Path( new File(root, "a" +File.separator+ "b" )));
          Hide
          Doug Cutting added a comment -

          Note that even when things are made to work again on Windows after this change it will still be an incompatible change, since programs that created Paths with backslashes won't work the same. This may be well warranted, but it should still be documented as an incompatible change.

          Show
          Doug Cutting added a comment - Note that even when things are made to work again on Windows after this change it will still be an incompatible change, since programs that created Paths with backslashes won't work the same. This may be well warranted, but it should still be documented as an incompatible change.
          Hide
          Daryn Sharp added a comment -

          Thanks for the feedback, Doug. I'm glad we appear to be zeroing in on a solution.

          I suspect there will be several other places besides RawLocalFileSystem where Windows paths with backslashes enter the system.

          Do you have any idea what or where these may lurk...? Once a user has instantiated a path via a file, further path manipulations should work. I'll have to think about it some more.

          I like the new Path(File) constructor. Rather than using toString() maybe it could use toURI()?

          I tried that first, but it fully qualifies the path. It uses the actual current working directory, instead of the one set in the filesystem. It also percent encodes the path, so I had to use URLDecoder. Messy.

          Re: the tests, good idea. I didn't update the test because I was trying to throw up a proof of concept to see if it was a reasonable approach before I spent too much time on it.

          Show
          Daryn Sharp added a comment - Thanks for the feedback, Doug. I'm glad we appear to be zeroing in on a solution. I suspect there will be several other places besides RawLocalFileSystem where Windows paths with backslashes enter the system. Do you have any idea what or where these may lurk...? Once a user has instantiated a path via a file, further path manipulations should work. I'll have to think about it some more. I like the new Path(File) constructor. Rather than using toString() maybe it could use toURI()? I tried that first, but it fully qualifies the path. It uses the actual current working directory, instead of the one set in the filesystem. It also percent encodes the path, so I had to use URLDecoder. Messy. Re: the tests, good idea. I didn't update the test because I was trying to throw up a proof of concept to see if it was a reasonable approach before I spent too much time on it.
          Hide
          Daryn Sharp added a comment -

          How this change to Path(String): if it's drive-letter:\, then it's converted as if it's a file. The path remembers it's file-based so the 2-arg ctors can convert the child path too. Not perfect, but as about as close to backwards-compatible as reasonably possible.

          Show
          Daryn Sharp added a comment - How this change to Path(String): if it's drive-letter:\, then it's converted as if it's a file. The path remembers it's file-based so the 2-arg ctors can convert the child path too. Not perfect, but as about as close to backwards-compatible as reasonably possible.
          Hide
          Doug Cutting added a comment -

          > Do you have any idea what or where these may lurk...?

          It's hard to know without testing on Windows. Maybe there won't be that any or many. I find a number of places that call new Path(file.getCanonicalPath()) which might cause problems or might not, depending on what's done with the Path returned.

          Show
          Doug Cutting added a comment - > Do you have any idea what or where these may lurk...? It's hard to know without testing on Windows. Maybe there won't be that any or many. I find a number of places that call new Path(file.getCanonicalPath()) which might cause problems or might not, depending on what's done with the Path returned.
          Hide
          Doug Cutting added a comment -

          > if it's drive-letter:\, then it's converted as if it's a file

          That might work if most of the cases are fully-qualified with a drive letter, which they probably are.

          Show
          Doug Cutting added a comment - > if it's drive-letter:\, then it's converted as if it's a file That might work if most of the cases are fully-qualified with a drive letter, which they probably are.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517769/HADOOP-8139-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.TestPath

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/695//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/695//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517769/HADOOP-8139-3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.TestPath +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/695//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/695//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Ok, only change to unix is quotable meta-chars.

          Windows rules, please confirm:

          • c:/... - no conversion
          • c:\... - is converted
          • scheme:... - no conversion
          • \... - is converted (correct??)
          • /... - not converted
          • multi-arg ctors: first path determines how next path is handled; ie. handled the same

          I've added some more tests, although I think more are needed as suggested by Doug. I've tested as much as I can by flipping WINDOWS to true in the source. It dies on cygwin.

          Unfortunately I do not have access to a windows machine. I will be eternally grateful if somone within the community will help me test on windows.

          Show
          Daryn Sharp added a comment - Ok, only change to unix is quotable meta-chars. Windows rules, please confirm: c:/... - no conversion c:\... - is converted scheme:... - no conversion \... - is converted (correct??) /... - not converted multi-arg ctors: first path determines how next path is handled; ie. handled the same I've added some more tests, although I think more are needed as suggested by Doug. I've tested as much as I can by flipping WINDOWS to true in the source. It dies on cygwin. Unfortunately I do not have access to a windows machine. I will be eternally grateful if somone within the community will help me test on windows.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517815/HADOOP-8139-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.viewfs.TestViewFsTrash

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/696//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/696//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517815/HADOOP-8139-4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.viewfs.TestViewFsTrash +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/696//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/696//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          Daryn, why are we trying to provide incomplete support for back slash based paths? Why not just remove support for it. I thought we decided on this approach and agreed that it is backward incompatible change.

          Show
          Suresh Srinivas added a comment - Daryn, why are we trying to provide incomplete support for back slash based paths? Why not just remove support for it. I thought we decided on this approach and agreed that it is backward incompatible change.
          Hide
          Daryn Sharp added a comment -

          Yes, I think we all agreed on the incompatible change, but Doug wanted additional blockers that I interpreted as making this jira unfit for 23.2. Re:

          Doug: I think this is a good approach but it will break things on Windows and we should file blockers for these regressions or fix them here.

          The latest patch should work in the vast majority of cases. I believe the only case it arguably remains incompatible is new Path("dir", "dir2\foo"). The first component of the path does not look like a native windows path so it is interpreted as dir/dir2\foo with the assumption it's a normal URI.

          I need clarification and agreement from you and Doug on what is sufficient to warrant a possible followup jira, but avoid a blocker. Thanks.

          Show
          Daryn Sharp added a comment - Yes, I think we all agreed on the incompatible change, but Doug wanted additional blockers that I interpreted as making this jira unfit for 23.2. Re: Doug: I think this is a good approach but it will break things on Windows and we should file blockers for these regressions or fix them here. The latest patch should work in the vast majority of cases. I believe the only case it arguably remains incompatible is new Path("dir", "dir2\foo") . The first component of the path does not look like a native windows path so it is interpreted as dir/dir2\foo with the assumption it's a normal URI. I need clarification and agreement from you and Doug on what is sufficient to warrant a possible followup jira, but avoid a blocker. Thanks.
          Hide
          Doug Cutting added a comment -

          This should be coordinated with HADOOP-8079, no? Windows is a supported platform for 1.0. HADOOP-8079 plans to further improve Windows support in 1.0 and 0.24. I don't know what the intended degree of Windows support is for 0.23.

          Show
          Doug Cutting added a comment - This should be coordinated with HADOOP-8079 , no? Windows is a supported platform for 1.0. HADOOP-8079 plans to further improve Windows support in 1.0 and 0.24. I don't know what the intended degree of Windows support is for 0.23.
          Hide
          Daryn Sharp added a comment -

          I believe this patch should not introduce incompatibilities with current behavior.

          Had to remove Path(File) logic from RLFS because hadoop can return home dir paths, etc with / in them. Forcing them to parse as local is inappropriate in those cases. The "intelligence" when parsing windows paths should now handle the paths correctly. Moved ^ handling into FileSystem when the path is being converted from native to ensure consistency in path handling.

          Minor change to not parse windows path as native if it contains a /. Corrects inconsistency where the user may be trying to use URIs and quoting the first char of the path.

          I was able to test by manually flipping the value of Path.WINDOWS. The final test case of course failed when trying to call cygwin.

          Please let me know if this is sufficient for 23.2 w/o a blocker. Thanks!

          Show
          Daryn Sharp added a comment - I believe this patch should not introduce incompatibilities with current behavior. Had to remove Path(File) logic from RLFS because hadoop can return home dir paths, etc with / in them. Forcing them to parse as local is inappropriate in those cases. The "intelligence" when parsing windows paths should now handle the paths correctly. Moved ^ handling into FileSystem when the path is being converted from native to ensure consistency in path handling. Minor change to not parse windows path as native if it contains a /. Corrects inconsistency where the user may be trying to use URIs and quoting the first char of the path. I was able to test by manually flipping the value of Path.WINDOWS. The final test case of course failed when trying to call cygwin. Please let me know if this is sufficient for 23.2 w/o a blocker. Thanks!
          Hide
          Suresh Srinivas added a comment -

          This should be coordinated with HADOOP-8079, no?

          Why? Once the decision is made to support only / based paths, HADOOP-8079 will work also with that path format right?

          Given that this is a blocker for 0.23, and can cause accidental deletion of data, we need to fix this ASAP. I propose going with earlier solution proposed by Daryn - substituting "
          " with "/", only if it is Windows. Given that this is an incomplete fix, lets do this change in a new jira. We could keep this jira open for more complete solution/testing.

          Show
          Suresh Srinivas added a comment - This should be coordinated with HADOOP-8079 , no? Why? Once the decision is made to support only / based paths, HADOOP-8079 will work also with that path format right? Given that this is a blocker for 0.23, and can cause accidental deletion of data, we need to fix this ASAP. I propose going with earlier solution proposed by Daryn - substituting " " with "/", only if it is Windows. Given that this is an incomplete fix, lets do this change in a new jira. We could keep this jira open for more complete solution/testing.
          Hide
          Doug Cutting added a comment -

          It looks reasonable to me, but someone should probably run tests on Windows before it is committed, don't you think?

          Show
          Doug Cutting added a comment - It looks reasonable to me, but someone should probably run tests on Windows before it is committed, don't you think?
          Hide
          Alexander Stojanovic added a comment -

          We are only using backslashes () for escaping meta-characters and not as alternative path-separators. So, things should work in a uniform manner.

          Show
          Alexander Stojanovic added a comment - We are only using backslashes () for escaping meta-characters and not as alternative path-separators. So, things should work in a uniform manner.
          Hide
          Doug Cutting added a comment -

          > Why? Once the decision is made to support only / based paths, HADOOP-8079 will work also with that path format right?

          Since the folks working on that will have to pick up the pieces then perhaps they ought to be consulted first, or at least warned?

          Show
          Doug Cutting added a comment - > Why? Once the decision is made to support only / based paths, HADOOP-8079 will work also with that path format right? Since the folks working on that will have to pick up the pieces then perhaps they ought to be consulted first, or at least warned?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518078/HADOOP-8139-5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.fs.viewfs.TestViewFsTrash

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/701//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/701//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518078/HADOOP-8139-5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.viewfs.TestViewFsTrash +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/701//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/701//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          Alexander filed HADOOP-8079, so I think he's been warned. Alexander, if I understand you correctly, the only change needed to Path is to completely remove the \ to / substitution?

          Show
          Daryn Sharp added a comment - Alexander filed HADOOP-8079 , so I think he's been warned. Alexander, if I understand you correctly, the only change needed to Path is to completely remove the \ to / substitution?
          Hide
          Daryn Sharp added a comment -

          I think what it's coming down to is HADOOP-8139-2.patch is the desired implementation. Alexander, please confirm? Doug, is this acceptable?

          We have internal customer testing blocked by this jira, so we please need to find an acceptable closure today.

          Show
          Daryn Sharp added a comment - I think what it's coming down to is HADOOP-8139 -2.patch is the desired implementation. Alexander, please confirm? Doug, is this acceptable? We have internal customer testing blocked by this jira, so we please need to find an acceptable closure today.
          Hide
          Doug Cutting added a comment -

          > Doug, is this acceptable?

          I trust Alexander on this one. If he's okay with it I'm okay with it.

          Show
          Doug Cutting added a comment - > Doug, is this acceptable? I trust Alexander on this one. If he's okay with it I'm okay with it.
          Hide
          Suresh Srinivas added a comment -

          It looks reasonable to me, but someone should probably run tests on Windows before it is committed, don't you think?

          If it is the simpler version of the patch, it would be good run some tests on windows. However, if windows is not available for testing, I am okay with it being not tested on Windows. From what I know for a while windows tests have not been run on Hadoop.

          Show
          Suresh Srinivas added a comment - It looks reasonable to me, but someone should probably run tests on Windows before it is committed, don't you think? If it is the simpler version of the patch, it would be good run some tests on windows. However, if windows is not available for testing, I am okay with it being not tested on Windows. From what I know for a while windows tests have not been run on Hadoop.
          Hide
          Doug Cutting added a comment -

          > From what I know for a while windows tests have not been run on Hadoop.

          If tests have not been run in a while is that a license to knowingly break things and make incompatible changes?

          Show
          Doug Cutting added a comment - > From what I know for a while windows tests have not been run on Hadoop. If tests have not been run in a while is that a license to knowingly break things and make incompatible changes?
          Hide
          Suresh Srinivas added a comment -

          If tests have not been run in a while is that a license to knowingly break things and make incompatible changes?

          Doug, I am confused by this statement.

          The last proposal was to add {{ if (windows) }} check for the code that replaces "
          " with "/". For a larger change of removing support for "
          " altogether, I agree, we need more elaborate tests. So for that one line change, I was suggesting, no windows testing is needed, given no one seems to have windows machines.

          Show
          Suresh Srinivas added a comment - If tests have not been run in a while is that a license to knowingly break things and make incompatible changes? Doug, I am confused by this statement. The last proposal was to add {{ if (windows) }} check for the code that replaces " " with "/". For a larger change of removing support for " " altogether, I agree, we need more elaborate tests. So for that one line change, I was suggesting, no windows testing is needed, given no one seems to have windows machines.
          Hide
          Suresh Srinivas added a comment -

          > code that replaces "
          code that replaces backslash

          Show
          Suresh Srinivas added a comment - > code that replaces " code that replaces backslash
          Hide
          Daryn Sharp added a comment -

          Per request by Suresh, posting a patch that fixes unix, and leaves windows unchanged.

          Show
          Daryn Sharp added a comment - Per request by Suresh, posting a patch that fixes unix, and leaves windows unchanged.
          Hide
          Doug Cutting added a comment -

          Sorry, Suresh, I thought we were discussing just removing the line.

          Wrapping it in 'if (WINDOWS)

          { ... }

          ' fixes the issue for Unix but not for Windows, so a follow up issue for Windows should probably be filed, no? I'll let others determine its priority, but if this is a blocker for Unix users, that one should probably be a blocker for Windows users. Perhaps not for 0.23.2, which is still a beta, but probably for the next stable release, which I expect should support Windows, since prior stable releases do. Does that make sense?

          Show
          Doug Cutting added a comment - Sorry, Suresh, I thought we were discussing just removing the line. Wrapping it in 'if (WINDOWS) { ... } ' fixes the issue for Unix but not for Windows, so a follow up issue for Windows should probably be filed, no? I'll let others determine its priority, but if this is a blocker for Unix users, that one should probably be a blocker for Windows users. Perhaps not for 0.23.2, which is still a beta, but probably for the next stable release, which I expect should support Windows, since prior stable releases do. Does that make sense?
          Hide
          Suresh Srinivas added a comment -

          Wrapping it in 'if (WINDOWS) { ... }' fixes the issue for Unix but not for Windows, so a follow up issue for Windows should probably be filed, no?

          Agreed. I was actually thinking of doing the simple change in a new jira and make this jira blocker for 0.23. That way the history of these discussions are preserved.

          Show
          Suresh Srinivas added a comment - Wrapping it in 'if (WINDOWS) { ... }' fixes the issue for Unix but not for Windows, so a follow up issue for Windows should probably be filed, no? Agreed. I was actually thinking of doing the simple change in a new jira and make this jira blocker for 0.23. That way the history of these discussions are preserved.
          Hide
          Suresh Srinivas added a comment -

          I created HADOOP-8164. This jira will be used for removing the back slash based path support, once the required testing on Windows is done.

          Show
          Suresh Srinivas added a comment - I created HADOOP-8164 . This jira will be used for removing the back slash based path support, once the required testing on Windows is done.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12518120/HADOOP-8139-6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/703//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/703//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518120/HADOOP-8139-6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/703//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/703//console This message is automatically generated.
          Hide
          Daryn Sharp added a comment -

          What do people feel is the next step? Should this jira be resolved as won't fix since Alexander seems to be eliminating support for windows native paths?

          Show
          Daryn Sharp added a comment - What do people feel is the next step? Should this jira be resolved as won't fix since Alexander seems to be eliminating support for windows native paths?
          Hide
          Doug Cutting added a comment -

          > Should this jira be resolved as won't fix [ ... ]

          Can meta-characters be escaped on Windows now? If not, then an issue should remain open indicating this, no? We could perhaps create a new sub-issue of HADOOP-8164, or mark this as a dependency of that one.

          Show
          Doug Cutting added a comment - > Should this jira be resolved as won't fix [ ... ] Can meta-characters be escaped on Windows now? If not, then an issue should remain open indicating this, no? We could perhaps create a new sub-issue of HADOOP-8164 , or mark this as a dependency of that one.
          Hide
          Suresh Srinivas added a comment -

          Should this jira be resolved as won't fix

          This bug should still be fixed. As discussed earlier, we need to remove support for back slash based paths, make the changes needed in RLFS and test it on Windows. Alexander is currently working on 1.0 experimental branch. Not sure if he will be able to do this in 23 time frame.

          Show
          Suresh Srinivas added a comment - Should this jira be resolved as won't fix This bug should still be fixed. As discussed earlier, we need to remove support for back slash based paths, make the changes needed in RLFS and test it on Windows. Alexander is currently working on 1.0 experimental branch. Not sure if he will be able to do this in 23 time frame.
          Hide
          Alexander Stojanovic added a comment -

          @doug and @ daryn - Does the presence of * in paths work as expected in Linux/Unix? This seems to be the parallel issue being partially masked by the discussion of \ vs. / in path specification.

          Show
          Alexander Stojanovic added a comment - @doug and @ daryn - Does the presence of * in paths work as expected in Linux/Unix? This seems to be the parallel issue being partially masked by the discussion of \ vs. / in path specification.
          Hide
          Doug Cutting added a comment -

          Hadoop supports globbing in some places, like the command line 'rm'. The glob pattern is passed in as a Path. The glob patterns are defined in the javadoc:

          http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)

          Show
          Doug Cutting added a comment - Hadoop supports globbing in some places, like the command line 'rm'. The glob pattern is passed in as a Path. The glob patterns are defined in the javadoc: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path )
          Hide
          Daryn Sharp added a comment -

          @Alexander - Yes, nearly every FsShell command supports unix style globs. "tail" is the only exception that comes to mind. The problem is the subbing of \ to / rendered it impossible to quote metachars. For instance, \ * turned into /*!

          One of the problems I encountered with trying to modify RLFS is it seems, based only on reading, that local files may be returned with \ or with /. If true, we probably don't want to blindly convert \ to /.

          If we don't want the as-backwards-compatible-as-possible solution I tried to implement, we can try modifying RLFS to sub \ to / during File->Path conversion if File.separatorChar != Path.SEPARATOR. RLFS will also need to override globStatus to convert \ to ^ for quoting of metachars.

          I think this is doable as long as we are willing to sacrifice c:\dir\... in 23.

          Show
          Daryn Sharp added a comment - @Alexander - Yes, nearly every FsShell command supports unix style globs. "tail" is the only exception that comes to mind. The problem is the subbing of \ to / rendered it impossible to quote metachars. For instance, \ * turned into /*! One of the problems I encountered with trying to modify RLFS is it seems, based only on reading, that local files may be returned with \ or with /. If true, we probably don't want to blindly convert \ to /. If we don't want the as-backwards-compatible-as-possible solution I tried to implement, we can try modifying RLFS to sub \ to / during File->Path conversion if File.separatorChar != Path.SEPARATOR. RLFS will also need to override globStatus to convert \ to ^ for quoting of metachars. I think this is doable as long as we are willing to sacrifice c:\dir\... in 23.
          Hide
          Daryn Sharp added a comment -

          I do not have the resources or knowledge necessary to test on windows. I hope a windows user will find one of my patches useful.

          Show
          Daryn Sharp added a comment - I do not have the resources or knowledge necessary to test on windows. I hope a windows user will find one of my patches useful.

            People

            • Assignee:
              Unassigned
              Reporter:
              Daryn Sharp
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:

                Development