Hadoop Common
HADOOP-4106

add time, permission and user attribute support to fuse-dfs

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added time, permission and user attribute support to libhdfs.

      Description

      add:

      dfs_chown
      dfs_utime
      dfs_chmod

      Change open to have its own FS on writes (should we do this on reads too?) and use it for writes, disconnecting when closing the file.
      Change mkdir to open the FS itself and then close it.
      Also added comments for dfs_access (which needs FileSystem support/libhdfs support), and added dfs_symlink and truncate, since these 3 are the only things left as far as functionality goes.

      1. HADOOP-4106.txt
        37 kB
        Pete Wyckoff
      2. HADOOP-4106.txt
        36 kB
        Pete Wyckoff
      3. HADOOP-4106.txt
        19 kB
        Pete Wyckoff
      4. HADOOP-4106.txt
        12 kB
        Pete Wyckoff

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-trunk #611 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/611/ )
          Zheng Shao added a comment -

          Committed. Thanks Pete.
          Committed revision 696777.

          Craig Macdonald added a comment -

          +1. Commit.

          C

          Pete Wyckoff added a comment -

          I am basically +1 on committing this w/o access, so could others vote +1 or -1 on committing this w/o being blocked by having the access method (i.e. HADOOP-4108)?

          Again, the problem with this patch is that people will get an I/O error, rather than a permission error, when accessing a file they don't have permission to. W/o this patch, there are no access privileges at all, as fuse always connects as the super user.

          So, I think the patch is an improvement.

          peter

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12390211/HADOOP-4106.txt
          against trunk revision 696130.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3280/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3280/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3280/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3280/console

          This message is automatically generated.

          Pete Wyckoff added a comment -

          Does fuse-dfs currently let me read/write files that I don't have permission to?

          Yes, it does, unconditionally. So this patch will improve things. The only issue is that you will get an IO error rather than a correct error message saying you don't have permissions. So in that respect, this patch is an improvement, albeit getting an IO error rather than a permission error is a little weird. I guess the question is whether it's preferable to let people see files unconditionally or get an IO error when they do.

          I think getting 4108 committed by Friday isn't possible.

          An alternative would be to assume -EPERM when libhdfs gives an error, but that would be pretty hacky.

          Allen Wittenauer added a comment -

          Now that I think about it, access() may not be necessary for chown()/chmod().

          In particular, I'm thinking of a bug that happened in Solaris cpio (or was it pax?) eons ago where a user had a privilege assigned that allowed that user to chown() files. But that user couldn't actually read the files. So doing a raw chown() with no access check was the right thing to do.

          If Hadoop were to mature enough to have a role system, doing access() followed by chown()/chmod() would be the wrong thing to do.

          So, yes, I think it is ok to do chown()/chmod() without access() and just let the EPERM error (or whatever Hadoop is throwing) go back to the user.

          Of course, this begs the question about what FUSE will do under roles, but that's a different discussion for a different day.

          Pete Wyckoff added a comment -

          I am assuming the Hadoop API will prevent this and the user will get an IO error (which has happened to me), but I should add a unit test for this and see if there's a way to return -EPERM.

          Allen Wittenauer added a comment - edited

          I haven't looked at the patch, but how do you determine if the user is permitted to change the ownership/permission of a file without access() being implemented?

          Craig Macdonald added a comment -

          I don't see how this patch can be committed without HADOOP-4104. We know that it is blocked by it.

          Wrt HADOOP-4108, I'm not sure. You need to get (a) a consensus, (b) an acceptable patch, and (c) the patch committed by Friday.
          I think the correct question is whether HADOOP-4108 should be a blocker for this patch.

          Does fuse-dfs currently let me read/write files that I don't have permission to?

          C

          Pete Wyckoff added a comment -

          Hi Craig,

          I could add a #define HDFS_ACCESS_T with a call to an as-yet-unimplemented libhdfs API. It is a little confusing, I agree, to show permissions and be capable of changing them while the FS doesn't respect them.

          Do you want HADOOP-4104 to be a blocker, or do you want to +1 this patch? I'm fine either way; in some sense you are right that it is confusing.

          thanks, pete

          Pete Wyckoff added a comment -

          Yes, getting everything else in other than actually respecting the ACLs is a little bit of a hollow victory.

          I'm not even sure there's complete consensus on https://issues.apache.org/jira/browse/HADOOP-4108. The fuse-dfs change is trivial, as the libhdfs change should be. I don't know if someone can get 4108 in in time for the 0.19 feature freeze, which I think is on Friday.

          Pete Wyckoff added a comment -

          added

          AC_FUNC_GETGROUPS
          AC_TYPE_GETGROUPS

          and check GETGROUPS_T in fuse_dfs.c, as Craig suggested

          Craig Macdonald added a comment - edited

          Looks good.

          1. What are the side-effects of not having dfs_access implemented?
          2. Should HAVE_GETGROUPS be defined by the autoconf macro AC_FUNC_GETGROUPS?

          C

          Pete Wyckoff added a comment - edited

          Fixes all the problems, and I added all the groups using Craig's other patch.

          ant test-contrib -Dfusedfs=1 -Dlibhdfs=1 -Dtestcase=TestFuseDFS

          passes

          (with the HADOOP-4104 patch applied - the patch for libhdfs to expose these APIs).

          Craig Macdonald added a comment -

          Pete,

          1. Is it for connectAsUser or chown that you are having the issue?

          See my attachment for HADOOP-3536. This worked OK for me on Linux, and MacOSX IIRC.

          2. Also, I was thinking about the gid_t uid = context->gid;

          Surely this only gives the default group of the user? How would this work in my setting, where my default group represents my status, while most files are protected by groups which are assigned on a project-level basis?

          In the RPC mechanism used by NFS, the following struct is used to pass authentication information from client to server (source: https://trac.eecs.iu-bremen.de/projects/scotty/browser/branches/tubs/compat/rpc/auth_unix.h?rev=885):

          struct authunix_parms {
                  u_long   aup_time;
                  char    *aup_machname;
                  int      aup_uid;
                  int      aup_gid;
                  u_int    aup_len;
                  int     *aup_gids;
          };
          

          The point is that a process accessing a file in the NFS client has a uid and a gid. However, the NFS client supplements the gid with the other gids that the user belongs to. We should do likewise.

          Pete Wyckoff added a comment -

          I can't make chown work yet - having trouble with getpwuid, which is returning:

                        struct passwd {
                              char    *pw_name;      /* user name */
                              char    *pw_passwd;    /* user password */
                              uid_t   pw_uid;        /* user id */
                              gid_t   pw_gid;        /* group id */
                              char    *pw_gecos;     /* real name */
                              char    *pw_dir;       /* home directory */
                              char    *pw_shell;     /* shell program */
                        };
          

          and pw_name is returning the password field in the /etc/passwd file instead of the login name.

          Could use help if anyone knows about these pwd.h APIs.

          Pete Wyckoff added a comment -

          Fixes getuid/getgid issues and some bugs. Still doesn't pass the utimes test of touching a file and seeing the modification time updated. I'm not sure why, as touch is definitely ending up calling utime, but the mtime doesn't seem to get updated.

          Probably just needs another short debugging session to complete the patch.

          Pete Wyckoff added a comment -

          Yes, duh, I knew that, but for some reason I must have been thinking - or not thinking, rather. I don't know why I did that.

          I was thinking I could, but DFS should have it anyway, really, and it might be better to implement all the rules there. If the DFS community doesn't want to do it (although it is a POSIX API) or cannot do it soon, I will implement it in libhdfs.

          Craig Macdonald added a comment -

          Pete,

          • I don't think doConnectAsUser is correct. You are using the filesystem of the user running fuse_dfs, not the filesystem of the user making an IO operation through fuse_dfs. You should use the fuse_context to determine the uid of the calling user, convert it to user/group names, and then pass these to libhdfs.
          fuse_context * context = fuse_get_context ();
          uid_t uid = context->uid;
          gid_t gid = context->gid;
          
          • Could dfs_access be implemented by checking hdfsFileInfo?
          Pete Wyckoff added a comment -

          Without implementing the POSIX access method, fuse still won't respect permissions. I have opened a JIRA for the FileSystem to implement it and another for fuse to then implement it.

          Pete Wyckoff added a comment -

          This is the code I wrote; it compiles and is what I was thinking. It may need some debugging and definitely needs unit tests, but I thought I would post what I have so far.


            People

            • Assignee:
              Pete Wyckoff
              Reporter:
              Pete Wyckoff
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
