Details
-
Improvement
-
Status: Closed
-
Major
-
Resolution: Fixed
-
3.0.0-alpha1
-
None
-
Reviewed
Description
The current codebase relies on the Unix readlink command to determine the target of a symlink on the local file system. winutils currently does not support this functionality on Windows. Adding the command to winutils will prevent the need to use GnuWin32 or Cygwin for readlink support.
Attachments
Attachments
- HADOOP-9483.patch
- 13 kB
- Arpit Agarwal
- HADOOP-9483.patch
- 13 kB
- Arpit Agarwal
- HADOOP-9483.007.patch
- 12 kB
- Arpit Agarwal
- HADOOP-9483.005.patch
- 14 kB
- Arpit Agarwal
- HADOOP-9483.004.patch
- 14 kB
- Arpit Agarwal
- HADOOP-9483.003.patch
- 13 kB
- Arpit Agarwal
Issue Links
- is depended upon by
-
HADOOP-9527 Add symlink support to LocalFileSystem on Windows
- Closed
- is part of
-
HADOOP-8562 Enhancements to support Hadoop on Windows Server and Windows Azure environments
- Closed
- relates to
-
HADOOP-9043 disallow in winutils creating symlinks with forwards slashes
- Closed
Activity
The trunk code still has a Unix command dependency on the readlink command for determining the target of a symlink. (See RawLocalFs#readLink.) This is the source of some of the test failures.
I propose the addition of a new "winutils readlink" command. The logic of this command would be:
- Call CreateFile with OPEN_EXISTING and FILE_FLAG_OPEN_REPARSE_POINT.
- Call GetFileInformationByHandle.
- Check the BY_HANDLE_FILE_INFORMATION for the presence of FILE_ATTRIBUTE_REPARSE_POINT.
- If not a reparse point, exit with code 1 and print nothing to stdout. (This is what Unix readlink does.)
- Call DeviceIoControl with FSCTL_GET_REPARSE_POINT.
- Get the REPARSE_DATA_BUFFER structure.
- Check if ReparseTag is IO_REPARSE_TAG_SYMLINK.
- If not IO_REPARSE_TAG_SYMLINK, then...?
- Get target from SymbolicLinkReparseBuffer.
- Print target to stdout and exit with code 0. (This is what Unix readlink does.)
Feedback on the above proposal from Ivan:
Thanks Chris for the writeup on readLink! The approach overall looks good. I have a few comments that should help with further progress:
1. There is no need to support junction points. Let's only focus on symlinks. We've seen problems with junction points on Windows for paths longer than 126 chars (IIRC) and finally decided to stay out of supporting them.
2. In winutils we already have SymbolicLinkCheck function which checks if the given path is a symlink. See if you can reconcile this with the new code.
3. DeviceIoControl is one way to retrieve the symlink target path. GetFinalPathNameByHandle provides the similar functionality and might be easier to use. See which one suits you better. (it is fine to assume that we’re running on Vista or later Windows OS).
Feedback on the above proposal from Chuan:
On the readlink implementation I agree with the general approach Chris has described.
Some logic is already implemented in libwinutils as Ivan mentioned.
I also think GetFinalPathNameByHandle() may be easier to use than the DeviceIoControl().
Ivan and Chuan, thank you for the feedback. A couple of follow-up questions:
First, regarding DeviceIoControl vs. GetFinalPathNameByHandle, I have not been able to determine from reading documentation if the "final" path name would also get canonicalized/made absolute. The behavior of Unix readlink is that it does not make the returned value absolute/canonicalized, so a relative path in a symlink would be preserved as is. For example:
chris@Chriss-MacBook-Pro:ttys000] ~ > pwd /Users/chris [chris@Chriss-MacBook-Pro:ttys000] ~ > mkdir testlink [chris@Chriss-MacBook-Pro:ttys000] ~ > cd testlink [chris@Chriss-MacBook-Pro:ttys000] testlink > ln -sfn .././hello [chris@Chriss-MacBook-Pro:ttys000] testlink > ll total 8 lrwxr-xr-x 1 chris staff 10B Apr 17 10:45 hello@ -> .././hello [chris@Chriss-MacBook-Pro:ttys000] testlink > readlink hello .././hello
Link to docs on GetFinalPathNameByHandle: http://msdn.microsoft.com/en-us/library/windows/desktop/aa364962(v=vs.85).aspx
If you're not sure of the answer, then I'll experiment with the 2 functions to determine the behavior.
Second, when we say that we should not support junction points, does that mean that winutils readlink should fail fast and return an error if it's being called on a junction point?
First, regarding DeviceIoControl vs. GetFinalPathNameByHandle, I have not been able to determine from reading documentation if the "final" path name would also get canonicalized/made absolute. The behavior of Unix readlink is that it does not make the returned value absolute/canonicalized, so a relative path in a symlink would be preserved as is.
Good question. I'm also not sure, you'll have to test this out.
Second, when we say that we should not support junction points, does that mean that winutils readlink should fail fast and return an error if it's being called on a junction point?
Thanks, failing early would be nice to have functionality (although not strictly needed imo). There is an existing winutils API that checks whether the given path is a junction point, so this should be easy to add.
MSDN says:
A final path is the path that is returned when a path is fully resolved. For example, for a symbolic link named "C:\tmp\mydir" that points to "D:\yourdir", the final path would be "D:\yourdir".
The string that is returned by this function uses the \\?\ syntax.
So I think it will be full path prefixed with "\\?\".
Sorry for not responding on this Jira earlier. Thanks for the detailed research into this issue Chris. You've laid out the options very clearly.
3. DeviceIoControl is one way to retrieve the symlink target path. GetFinalPathNameByHandle provides the similar functionality and might be easier to use. See which one suits you better. (it is fine to assume that we’re running on Vista or later Windows OS).
Using GetFinakPathNameByHandle is a lossy operation. It retrieves a UNC path which cannot be converted back to a drive letter path in all cases. I don't think we want to start dealing with UNC paths and even the relative-->absolute path conversion seems problematic for us.
FSCTL_GET_REPARSE_POINT retrieves literal link target without normalizing/canonicalizing it. It is closest to the behavior of the POSIX readlink.
Chris, if you haven't started writing code I can take a crack at it.
Arpit, thanks for the additional research on the APIs. It makes sense.
I haven't started any code for this yet, so please feel free to start. I've reassigned the issue to you.
Attaching a patch for "winutils readlink".
I used the FSCTL to get the closest behavior to POSIX readlink.
The initial call to GetFileInformationByHandle is not required as the results of the FSCTL are sufficient to distinguish between sym links and other file types.
Chris's initial research was very helpful.
Thanks Arpit for preparing this patch! This is great. I have some mainly minor comments below, otherwise, this is almost good to go.
readlink.c:
1. Is there an MSDN link you can add here as a reference:
// The Windows SDK does not include the definition of REPARSE_DATA_BUFFER. To // avoid adding a dependency on the WDK we define the structure here.
2. To be able to handle input paths longer then MAX_PATH I think you'll have to add a long path prefix before calling into CreateFile API. We already have an API available in winutils to convert to a long path, check what we do on other places.
3. Can you please place open and closed curly braces on a new line to be consistent with the rest of the code?
4. Would you mind using hFile instead of h (or something else more descriptive)
HANDLE h = INVALID_HANDLE_VALUE;
5. Should this be if argc != 2?
if (argc == 1) { goto Cleanup; }
6. Can you please use LocalAlloc for memory allocation to be consistent with the rest of the codebase
pPrintName = (PWCHAR) malloc(printNameLength);
7. Please use fwprintf(stdout, ...) instead for consistency.
printf("%ls", pPrintName);
8. Can you use EXIT_SUCCESS and EXIT_FAILURE constants instead? I would also rather use a winerror code dword throughout the function as an indicator that something failed (instead of the 'succeeded' boolean) and at the function end check if it is equal to ERROR_SUCCESS or not.
return (succeeded ? 0 : 1);
other:
9. Should we integrate the readlink functionality with RawLocalFs in this Jira?
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581318/HADOOP-9483.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2501//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2501//console
This message is automatically generated.
Thanks for the quick review Ivan!
I have addressed all your feedback except for the below.
2. To be able to handle input paths longer then MAX_PATH I think you'll have to add a long path prefix before calling into CreateFile API. We already have an API available in winutils to convert to a long path, check what we do on other places.
I am not sure it is possible to get a path longer than MAX_PATH as input to Readlink since winutils is invoked via the shell. The situation you have handled elsewhere is when the path is passed in via JNI. Please let me know if I am wrong.
Should this be if argc != 2?
Deliberate for consistency with POSIX readlink. All extra arguments are ignored.
I would also rather use a winerror code dword throughout the function as an indicator that something failed (instead of the 'succeeded' boolean) and at the function end check if it is equal to ERROR_SUCCESS or not
It seems to complicates the state, especially since we don't really care about the exact error code for most Win32 calls here. However I don't feel too strongly about it so let me know what you think.
9. Should we integrate the readlink functionality with RawLocalFs in this Jira?
That is intended to be fixed in HADOOP-9527.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581337/HADOOP-9483.003.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2502//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2502//console
This message is automatically generated.
Thanks Arpit for addressing the comments!
2. I am not sure it is possible to get a path longer than MAX_PATH as input to Readlink since winutils is invoked via the shell. The situation you have handled elsewhere is when the path is passed in via JNI. Please let me know if I am wrong.
Yes, we are doing this for both JNI and command line calls, please fix this.
5. Deliberate for consistency with POSIX readlink. All extra arguments are ignored.
We do a strong param checks today with all other winutils cmd line options. Let's do the same here, otherwise, people might think that the parameter is actually doing something for them.
8. It seems to complicates the state, especially since we don't really care about the exact error code for most Win32 calls here. However I don't feel too strongly about it so let me know what you think.
Fine then. I would have done it the other way around (it can help when you're stepping thru with a debugger, if you don't know about !gle ), but it's not a big deal.
9. That is intended to be fixed in
HADOOP-9527.
Sounds good.
Two additional comments:
10. I see what you're trying to do with the for loop. However, this won't work when you get ERROR_MORE_DATA back, right? I say, let's just allocate a 1024 size array on stack, and if the this is not enough, have the function fail. If in the future this turns out to be a problem, we can fix it (what is very unlikely to happen). Make sense?
11. Can you add a comment in code on CreateFile flags you are passing (or a reference)
Addressed your comments.
10. I see what you're trying to do with the for loop. However, this won't work when you get ERROR_MORE_DATA back, right? I say, let's just allocate a 1024 size array on stack, and if the this is not enough, have the function fail. If in the future this turns out to be a problem, we can fix it (what is very unlikely to happen). Make sense?
It works when the return code is ERROR_MORE_DATA. It can be confirmed with a buffer larger than sizeof(REPARSE_DATA_BUFFER) but not large enough for the path strings. Since we are handling long paths it makes all the more sense to handle this case.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581476/HADOOP-9483.004.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2507//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2507//console
This message is automatically generated.
Thanks Arpit, looks good. Sorry, just realized that I missed the below comment, otherwise I am +1
Should we allocate an array of size printNameLength+1?
printName = (PWCHAR) LocalAlloc(LMEM_FIXED, printNameLength);
Make TestWinUtils changes consistent with HADOOP-9043 to keep the rebase trivial.
Latest patch looks good, +1
I verified that TestWinUtils passes on Windows. Thanks Arpit!
+1 for the patch. Looks good!
One minor thing: my latest patch on HADOOP-9043 puts the assumeTrue call into a @Before method so that the individual tests don't need to check it. Whichever of these patches gets committed first, the other one can rebase easily.
Thanks for the reviewing, Ivan and Chris.
Chris - I noticed that and I tried to restructure the patch but the conflict seems unavoidable so I'll leave it as it is and rebase later.
Committers, please consider committing HADOOP-9043 first.
+1 for the patch.
Thank you Arpit. Thanks to Chris and Ivan for the reviews.
Integrated in Hadoop-trunk-Commit #3712 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3712/)
HADOOP-9483. Adding a file missed in previous commit r1478592 (Revision 1478633)
HADOOP-9483. winutils support for readlink command. Contributed by Arpit Agarwal. (Revision 1478592)
Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478633
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/readlink.c
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478592
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/main.c
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
Integrated in Hadoop-Yarn-trunk #202 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/202/)
HADOOP-9483. Adding a file missed in previous commit r1478592 (Revision 1478633)
HADOOP-9483. winutils support for readlink command. Contributed by Arpit Agarwal. (Revision 1478592)
Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478633
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/readlink.c
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478592
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/main.c
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
Integrated in Hadoop-Hdfs-trunk #1391 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1391/)
HADOOP-9483. Adding a file missed in previous commit r1478592 (Revision 1478633)
HADOOP-9483. winutils support for readlink command. Contributed by Arpit Agarwal. (Revision 1478592)
Result = FAILURE
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478633
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/readlink.c
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478592
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/main.c
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
Integrated in Hadoop-Mapreduce-trunk #1418 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1418/)
HADOOP-9483. Adding a file missed in previous commit r1478592 (Revision 1478633)
HADOOP-9483. winutils support for readlink command. Contributed by Arpit Agarwal. (Revision 1478592)
Result = SUCCESS
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478633
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/readlink.c
suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1478592
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/main.c
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/winutils/winutils.vcxproj
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
This came from discussion on
HADOOP-9043. I'm pasting the relevant comments from that jira into this one. There are test failures on Windows in TestLocalFSFileContextSymlink due to lack of readlink support.