Hadoop Common / HADOOP-11935

Provide optional native implementation of stat syscall.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: fs, native
    • Labels: None

      Description

      Currently, RawLocalFileSystem.DeprecatedRawLocalFileStatus#loadPermissionInfo is implemented as forking an ls command and parsing the output. This was observed to be a bottleneck in YARN-3491. This issue proposes an optional native implementation of a stat syscall through JNI. We would maintain the existing code as a fallback for systems where the native code is not available.

          Activity

          cnauroth Chris Nauroth added a comment -

          It might turn out that what we really want is lstat. We need to review the existing FileSystem semantics and make sure we preserve that behavior.

          http://linux.die.net/man/2/stat

          gopalv Gopal V added a comment -

          Chris Nauroth: http://docs.oracle.com/javase/7/docs/api/java/nio/file/LinkOption.html#NOFOLLOW_LINKS?

          cnauroth Chris Nauroth added a comment -

          Gopal V, thanks for the pointer to getting lstat-like behavior through java.nio.file.

          The last time I tested, the java.nio.file classes were incapable of returning a POSIX permissions view on Windows. The attempt would throw an unchecked exception. I suspect we'll still need a native code path for Windows, where we can call through to our existing libwinutils implementation of the mapping from NTFS ACLs to POSIX permissions.

          Maybe we can get away without a native code path for Linux though. Thanks for the suggestion.
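
For illustration, here is a minimal sketch of the lstat-like lookup discussed above, using only standard java.nio.file calls. The class and method names (NioLstatSketch, describe) are hypothetical and not part of Hadoop. It assumes a file system that exposes a POSIX attribute view; where none exists, readAttributes throws an unchecked UnsupportedOperationException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFileAttributes;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical illustration class; not part of Hadoop.
final class NioLstatSketch {

  /**
   * Describes the path itself as "owner:group:rwxr-xr-x". NOFOLLOW_LINKS gives
   * lstat-like behavior: a trailing symlink is reported, not its target.
   */
  static String describe(Path path) throws IOException {
    // Throws UnsupportedOperationException (unchecked) where no POSIX view exists.
    PosixFileAttributes attrs =
        Files.readAttributes(path, PosixFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    return attrs.owner().getName() + ":" + attrs.group().getName() + ":"
        + PosixFilePermissions.toString(attrs.permissions());
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("nio-lstat", ".tmp");
    try {
      Files.setPosixFilePermissions(file, PosixFilePermissions.fromString("rw-r-----"));
      System.out.println(describe(file));
    } finally {
      Files.delete(file);
    }
  }
}
```

Dropping the LinkOption.NOFOLLOW_LINKS argument would give stat-like behavior instead, which is the semantic choice raised at the start of this thread.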

          cmccabe Colin P. McCabe added a comment -

          +1 for using java.nio.file on Linux. On Windows we can use winutils like always. The nice thing is, we won't need multiple code paths. The native library is mandatory on Windows, and nio will always be available on Linux.

          zxu zhihai xu added a comment -

          It looks like java.nio.file doesn't support the sticky bit in PosixFilePermission.
          But org.apache.hadoop.fs.permission.FsPermission does support the sticky bit.
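
The gap can be seen with a small sketch. PosixFilePermission has exactly nine constants (read, write, and execute for owner, group, and others), so a set of them can only describe modes 0000 through 0777. The class and method names below (PosixModeSketch, toMode) are hypothetical, and the constant 01000 is the conventional POSIX sticky bit value, not something taken from the Hadoop source.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical illustration class; not part of Hadoop or the JDK.
final class PosixModeSketch {
  // Conventional POSIX value of the sticky bit (S_ISVTX).
  static final int STICKY_BIT = 01000;

  /** Packs a permission set into the low nine mode bits, e.g. rwxr-xr-x becomes 0755. */
  static int toMode(Set<PosixFilePermission> perms) {
    int mode = 0;
    // toString always yields nine characters such as "rwxr-xr-x".
    for (char c : PosixFilePermissions.toString(perms).toCharArray()) {
      mode = (mode << 1) | (c == '-' ? 0 : 1);
    }
    return mode;
  }

  public static void main(String[] args) {
    int widest = toMode(PosixFilePermissions.fromString("rwxrwxrwx"));
    System.out.println("constants: " + PosixFilePermission.values().length);
    System.out.println("widest mode: 0" + Integer.toOctalString(widest));
    System.out.println("sticky representable: " + ((widest & STICKY_BIT) != 0));
  }
}
```

Even the fullest permission set stays below 01000, so a directory such as /tmp with mode 1777 would lose its sticky bit if its permissions were read only through this API.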

          zxu zhihai xu added a comment -

          Since java.nio.file doesn't support the sticky bit and NativeIO.POSIX#getFstat does, I compared the performance of NativeIO.POSIX against the current shell-based implementation. With the attached preliminary patch, NativeIO.POSIX is 3-4x faster.

          aw Allen Wittenauer added a comment -
          "forking an ls command and parsing the output."

          Yikes. This is not portable at all when usernames > 8 characters are in play.

          cmccabe Colin P. McCabe added a comment -

          "Yikes. [the ls command] is not portable at all when usernames > 8 characters are in play."

          Actually, we don't use the ls command; we use the stat command. It works fine with usernames greater than 8 characters.

          keter:/home/cmccabe $ stat -Lc "%s,%F,%Y,%X,%a,%U,%G,%N" /tmp/t
          708166,regular file,1434479406,1434479406,644,supercalifragalistic,users,‘/tmp/t’
          

          Parsing is messy, and forking is slow. I was hoping that we could move to Java 7 NIO. Sadly, because of the sticky bit, it looks like we won't be able to. But using JNI for stat would still be a great idea when libhadoop.so is present.
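
To make the parsing concern concrete, here is a rough sketch of splitting one line of output in the format shown above (%s,%F,%Y,%X,%a,%U,%G,%N). The class StatLineSketch is hypothetical and is not Hadoop's actual parser. It assumes GNU stat output and that only the trailing file name can contain commas.

```java
// Hypothetical illustration class; not Hadoop's actual stat parser.
final class StatLineSketch {
  // Conventional POSIX value of the sticky bit (S_ISVTX).
  static final int STICKY_BIT = 01000;

  final long length;            // %s: size in bytes
  final String type;            // %F: e.g. "regular file" or "directory"
  final long modificationTime;  // %Y: seconds since the epoch
  final long accessTime;        // %X: seconds since the epoch
  final int mode;               // %a: octal permissions, sticky bit included
  final String owner;           // %U
  final String group;           // %G
  final String quotedName;      // %N: file name as quoted by stat

  private StatLineSketch(String[] f) {
    length = Long.parseLong(f[0]);
    type = f[1];
    modificationTime = Long.parseLong(f[2]);
    accessTime = Long.parseLong(f[3]);
    mode = Integer.parseInt(f[4], 8); // %a prints octal digits
    owner = f[5];
    group = f[6];
    quotedName = f[7];
  }

  /** Parses one output line; the limit of 8 keeps commas in the file name intact. */
  static StatLineSketch parse(String line) {
    String[] fields = line.trim().split(",", 8);
    if (fields.length != 8) {
      throw new IllegalArgumentException("Unexpected stat output: " + line);
    }
    return new StatLineSketch(fields);
  }

  boolean hasStickyBit() {
    return (mode & STICKY_BIT) != 0;
  }

  public static void main(String[] args) {
    StatLineSketch s = parse(
        "708166,regular file,1434479406,1434479406,644,supercalifragalistic,users,'/tmp/t'");
    System.out.println(s.owner + " " + Integer.toOctalString(s.mode) + " " + s.hasStickyBit());
  }
}
```

Because %a carries the full octal mode, a directory reported as 1777 keeps its sticky bit here, which is the piece the java.nio.file permission set cannot express.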

          cnauroth Chris Nauroth added a comment -

          I just deleted a comment I entered that was really meant for HADOOP-12603. (Sorry for email spam to the watchers.)

          alanburlison Alan Burlison added a comment -

          stat isn't a POSIX command, it is part of GNU coreutils.

          githubbot ASF GitHub Bot added a comment -

          GitHub user makefu opened a pull request:

          https://github.com/apache/hadoop/pull/81

          Shell.java: fix absolute path `/bin/ls`

          This commit uses `ls` from PATH instead of relying on `ls` being stored in `/bin/`. The only file according to the POSIX standard which must be stored in `/bin/` is `sh`.
          This fixes issues plaguing distributions like NixOS which dynamically stitch together a PATH.

          This fix is loosely related to https://issues.apache.org/jira/browse/HADOOP-11935 and more specifically related to https://mail-archives.apache.org/mod_mbox/hadoop-user/201512.mbox/%3CCAH2nEUgsSeoJTJkD4T8z=D18ECnRgQ3Qvz861gU5+NPSsNgT=A@mail.gmail.com%3E

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/makefu/hadoop patch-1

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/hadoop/pull/81.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #81


          commit 4e8f93d8eecce551b1eebd531c71e3c7144c9210
          Author: Felix Richter <github@syntax-fehler.de>
          Date: 2016-02-29T08:08:06Z

          Shell.java: fix absolute path `/bin/ls`

          This commit uses `ls` from PATH instead of relying on `ls` being stored in `/bin/`. The only file according to the POSIX standard which must be stored in `/bin/` is `sh`.
          This fixes issues plaguing distributions like NixOS which dynamically stitch together a PATH.

          This fix is loosely related to https://issues.apache.org/jira/browse/HADOOP-11935 and more specifically related to https://mail-archives.apache.org/mod_mbox/hadoop-user/201512.mbox/%3CCAH2nEUgsSeoJTJkD4T8z=D18ECnRgQ3Qvz861gU5+NPSsNgT=A@mail.gmail.com%3E


          githubbot ASF GitHub Bot added a comment -

          GitHub user asfgit closed the pull request at:

          https://github.com/apache/hadoop/pull/81


            People

            • Assignee: Unassigned
            • Reporter: cnauroth Chris Nauroth
            • Votes: 0
            • Watchers: 17