Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: HDFS ACLs (HDFS-4685)
    • Fix Version/s: HDFS ACLs (HDFS-4685)
    • Component/s: namenode
    • Labels:
      None

      Description

      This patch will include the core logic for modification of ACLs. This includes support for all user-facing APIs that modify ACLs. This will cover access ACLs, default ACLs, automatic mask calculations, automatic inference of unprovided default ACL entries, and validation to prevent creation of an invalid ACL.

      1. HDFS-5673.5.patch
        77 kB
        Chris Nauroth
      2. HDFS-5673.4.patch
        72 kB
        Chris Nauroth
      3. HDFS-5673.3.patch
        75 kB
        Chris Nauroth
      4. HDFS-5673.2.patch
        77 kB
        Chris Nauroth
      5. HDFS-5673.1.patch
        96 kB
        Chris Nauroth

        Issue Links

          Activity

          Hide
          Chris Nauroth added a comment -

          I'm attaching the patch that implements the core ACL modification logic. This turned out to be some fairly tricky logic. It's probably the most complex logic in the HDFS ACLs project (certainly a lot more complex than the permission enforcement is going to be).

          Acl - This class is back. This time, it's private to HDFS, intended for use inside the namenode. It's mostly the same as the old version that we used to have in Common before the API changes, but with the addition of a few more helper methods.

          AclTransformation - This class defines the 5 different operations that can change an ACL. There are quite a few edge cases to consider related to validation rules, mask calculation and inferring default entries if not specified. I needed multiple readings of the POSIX ACL docs to fully understand all of the edge cases.

          http://users.suse.com/~agruen/acl/linux-acls/online/

          TestAclTransformation - Approximately 2/3 of this patch is tests. For every test in this suite, I've run the same scenario against Linux setfacl/getfacl to confirm that the AclTransformation code yields the same result. I ran a coverage report, and I saw 100% coverage of the new code in this patch.

          Show
          Chris Nauroth added a comment - I'm attaching the patch that implements the core ACL modification logic. This turned out to be some fairly tricky logic. It's probably the most complex logic in the HDFS ACLs project (certainly a lot more complex than the permission enforcement is going to be). Acl - This class is back. This time, it's private to HDFS, intended for use inside the namenode. It's mostly the same as the old version that we used to have in Common before the API changes, but with the addition of a few more helper methods. AclTransformation - This class defines the 5 different operations that can change an ACL. There are quite a few edge cases to consider related to validation rules, mask calculation and inferring default entries if not specified. I needed multiple readings of the POSIX ACL docs to fully understand all of the edge cases. http://users.suse.com/~agruen/acl/linux-acls/online/ TestAclTransformation - Approximately 2/3 of this patch is tests. For every test in this suite, I've run the same scenario against Linux setfacl/getfacl to confirm that the AclTransformation code yields the same result. I ran a coverage report, and I saw 100% coverage of the new code in this patch.
          Hide
          Chris Nauroth added a comment -

          One more note. From looking at this patch alone, it might not be obvious why I chose to wrap the logic in function objects. The reason for this is that all of the ACL modification methods in FSNamesystem are going to have the same overall workflow: 1) acquire write lock, 2) get existing ACL from inode, 3) apply modification, 4) set new ACL on inode, 5) log edit, 6) release write lock. I'm anticipating that the common workflow can go into a helper method, and each ACL modification method can just pass in a different AclTransformation functor.

          Show
          Chris Nauroth added a comment - One more note. From looking at this patch alone, it might not be obvious why I chose to wrap the logic in function objects. The reason for this is that all of the ACL modification methods in FSNamesystem are going to have the same overall workflow: 1) acquire write lock, 2) get existing ACL from inode, 3) apply modification, 4) set new ACL on inode, 5) log edit, 6) release write lock. I'm anticipating that the common workflow can go into a helper method, and each ACL modification method can just pass in a different AclTransformation functor.
          Hide
          Haohui Mai added a comment -

          Acl:

          Acl.Builder can initialize the entries field along with the declaration.
          The constructor of Acl does not need to copy the list.

          AclTransformation:

          It seems to me that you can pass in a wrapper for Acl to indicate that the Acl list is well-formed. For example, you can have the following class:

          class AclList {
            boolean wellFormed;
            List<AclEntry> acls;
          }
          

          and put Preconditions.check(AclList.wellFormed) in the functions of AclTransformation. Of course you might set the variable wellFormed incorrectly, but it is much easier to debug in the other places (and potentially formally verifying the logic as well).

          I'm not sure that why AclTransformation needs to be a functor. It makes sense if you need to iterate through the ACL list for multiple times, otherwise making them as a function will be simpler and more efficient.

          I think the code is probably overly optimized. For example, in filterAclEntriesByAclSpec(), I would probably write:

          List<AclEntry> entries;
          for (AclEntry existingEntry : existingAcl.getEntries()) {
            if (aclSpec.indexOf(existingEntry) != -1) {
              entries.add();
            } else {
              entries.delete();
            }
          }
          sort(entries);
          return new Acl.Builder().setEntries(entries).build();
          

          Obviously it is turning an O ( n ) algorithm to an O(nlogn) algorithm, but n is fairly small in this case (n < 32), and it is much simpler to read and to reason about. The code is security sensitive therefore I would favor of trading some efficiency to simplicity and readability.

          Show
          Haohui Mai added a comment - Acl: Acl.Builder can initialize the entries field along with the declaration. The constructor of Acl does not need to copy the list. AclTransformation: It seems to me that you can pass in a wrapper for Acl to indicate that the Acl list is well-formed. For example, you can have the following class: class AclList { boolean wellFormed; List<AclEntry> acls; } and put Preconditions.check(AclList.wellFormed) in the functions of AclTransformation. Of course you might set the variable wellFormed incorrectly, but it is much easier to debug in the other places (and potentially formally verifying the logic as well). I'm not sure that why AclTransformation needs to be a functor. It makes sense if you need to iterate through the ACL list for multiple times, otherwise making them as a function will be simpler and more efficient. I think the code is probably overly optimized. For example, in filterAclEntriesByAclSpec(), I would probably write: List<AclEntry> entries; for (AclEntry existingEntry : existingAcl.getEntries()) { if (aclSpec.indexOf(existingEntry) != -1) { entries.add(); } else { entries.delete(); } } sort(entries); return new Acl.Builder().setEntries(entries).build(); Obviously it is turning an O ( n ) algorithm to an O(nlogn) algorithm, but n is fairly small in this case (n < 32), and it is much simpler to read and to reason about. The code is security sensitive therefore I would favor of trading some efficiency to simplicity and readability.
          Hide
          Chris Nauroth added a comment -

          Thanks for the review, Haohui. I'm going to get to work on a v2 patch.

          Acl.Builder can initialize the entries field along with the declaration.

          There are 2 constructors right now: one with specified capacity (used by product code, always set to 32) and one without capacity (convenience for test code). I'll change to a single constructor that always uses 32 capacity.

          The constructor of Acl does not need to copy the list.

          If we don't copy the list, then the immutability guarantee is weaker. After building, a caller could go back and add more entries through the builder, which would sneak in and mutate the list. I'd prefer to keep the copy.

          It seems to me that you can pass in a wrapper for Acl to indicate that the Acl list is well-formed.

          I'll investigate this. If I understand correctly, we'd convert an untrusted ACL spec to an instance of the well-formed object immediately, and that's what we would pass around between the internal helper methods. The type signatures would enforce that we're only passing around properly sorted lists, so the precondition checks might not even be required at that point.

          I'm not sure that why AclTransformation needs to be a functor.

          This was intended to avoid code duplication in all of the ACL modification methods in FSNamesystem. In the current HDFS-5658 patch, we're starting to see some of that code duplication related to holding the namesystem lock and checking permissions. With a functor, we'd be able to have a single helper method in FSNameystem implementing that workflow, and each ACL modification method can pass in a different functor instance.

          I think the code is probably overly optimized... Obviously it is turning an O ( n ) algorithm to an O(nlogn) algorithm, but n is fairly small...

          I believe that's actually O(N^2), or perhaps more accurately O(N*M) if you want to track size of the existing ACL and size of the ACL spec separately as N and M respectively. Still, I understand your point. I had an earlier version of the code that did multiple iterations. I found that it didn't really simplify things though. It seems the complexity is driven not by the iteration, but rather the rules for ACLs around automatic mask calculations, copying of some of the default entries from the access entries if unspecified, and enforcing the validation. I'll play around with this a little more though and see if I can find a balance.

          Show
          Chris Nauroth added a comment - Thanks for the review, Haohui. I'm going to get to work on a v2 patch. Acl.Builder can initialize the entries field along with the declaration. There are 2 constructors right now: one with specified capacity (used by product code, always set to 32) and one without capacity (convenience for test code). I'll change to a single constructor that always uses 32 capacity. The constructor of Acl does not need to copy the list. If we don't copy the list, then the immutability guarantee is weaker. After building, a caller could go back and add more entries through the builder, which would sneak in and mutate the list. I'd prefer to keep the copy. It seems to me that you can pass in a wrapper for Acl to indicate that the Acl list is well-formed. I'll investigate this. If I understand correctly, we'd convert an untrusted ACL spec to an instance of the well-formed object immediately, and that's what we would pass around between the internal helper methods. The type signatures would enforce that we're only passing around properly sorted lists, so the precondition checks might not even be required at that point. I'm not sure that why AclTransformation needs to be a functor. This was intended to avoid code duplication in all of the ACL modification methods in FSNamesystem . In the current HDFS-5658 patch, we're starting to see some of that code duplication related to holding the namesystem lock and checking permissions. With a functor, we'd be able to have a single helper method in FSNameystem implementing that workflow, and each ACL modification method can pass in a different functor instance. I think the code is probably overly optimized... Obviously it is turning an O ( n ) algorithm to an O(nlogn) algorithm, but n is fairly small... I believe that's actually O(N^2), or perhaps more accurately O(N*M) if you want to track size of the existing ACL and size of the ACL spec separately as N and M respectively. Still, I understand your point. I had an earlier version of the code that did multiple iterations. I found that it didn't really simplify things though. It seems the complexity is driven not by the iteration, but rather the rules for ACLs around automatic mask calculations, copying of some of the default entries from the access entries if unspecified, and enforcing the validation. I'll play around with this a little more though and see if I can find a balance.
          Hide
          Haohui Mai added a comment -

          If we don't copy the list, then the immutability guarantee is weaker.

          I guess you can use the same idea of building a wrapper for well-formed AclList to avoid the copy. You can do the copy when constructing the wrapper. That way the wrapper consolidate the validation logic and the immutability guarantee is retained.

          In the current HDFS-5658 patch, we're starting to see some of that code duplication related to holding the namesystem lock and checking permission, ...

          The code you referred is around 10 lines per function, I think it is okay to have some redundancy. On the other hand, The AclTransformation instances are not purely functional – they do have state. It seems that the particular implementation of AclTransformation is losing both the bright side of functional and imperative programming model. Therefore I think it is okay to have some duplication here, it does not make the overall architecture worse. If this is really a concern, it can be fixed easily with other techniques, e.g., aspectJ.

          I believe that's actually O(N^2), or perhaps more accurately O(N*M)

          It seems to me that it requires no more than a merge sort. Using binary search should give you O(nlogn).

          Show
          Haohui Mai added a comment - If we don't copy the list, then the immutability guarantee is weaker. I guess you can use the same idea of building a wrapper for well-formed AclList to avoid the copy. You can do the copy when constructing the wrapper. That way the wrapper consolidate the validation logic and the immutability guarantee is retained. In the current HDFS-5658 patch, we're starting to see some of that code duplication related to holding the namesystem lock and checking permission, ... The code you referred is around 10 lines per function, I think it is okay to have some redundancy. On the other hand, The AclTransformation instances are not purely functional – they do have state. It seems that the particular implementation of AclTransformation is losing both the bright side of functional and imperative programming model. Therefore I think it is okay to have some duplication here, it does not make the overall architecture worse. If this is really a concern, it can be fixed easily with other techniques, e.g., aspectJ. I believe that's actually O(N^2), or perhaps more accurately O(N*M) It seems to me that it requires no more than a merge sort. Using binary search should give you O(nlogn).
          Hide
          Chris Nauroth added a comment -

          Thanks for the feedback, Haohui. I'm uploading version 2 of the patch. Summary of changes:

          1. I ditched the Acl class again. Now that we're going to use inode features, this class is redundant with AclFeature provided on HDFS-5658. The main reason for having the Acl class was to bundle together the ACL entries with the sticky bit. The storage described in the design doc would have reused the permission bits for another purpose, and thus wiped out the current storage for sticky bit. This is no longer relevant, because the inode feature allows us to keep using the permission bits as is. As a side effect, the feedback about list initialization and copying no longer applies.
          2. I moved AclTransformation up in the package hierarchy to org.apache.hadoop.hdfs.server.namenode. There is no need for a separate .acl sub-package anymore now that Acl is gone.
          3. filterAclEntriesByAclSpec, mergeAclEntries and replaceAclEntries are re-implemented using multi-iteration algorithms. Hopefully this code is more readable now. filterAclEntriesByAclSpec is O(NlogN). mergeAclEntries is O(2NlogN). replaceAclEntries is still O(N). (All ACL spec operations still start with an O(NlogN) sort.)
          4. No more functors.

          All tests still pass, and coverage still shows 100% except for the unused private constructor in AclTransformation.

          Show
          Chris Nauroth added a comment - Thanks for the feedback, Haohui. I'm uploading version 2 of the patch. Summary of changes: I ditched the Acl class again. Now that we're going to use inode features, this class is redundant with AclFeature provided on HDFS-5658 . The main reason for having the Acl class was to bundle together the ACL entries with the sticky bit. The storage described in the design doc would have reused the permission bits for another purpose, and thus wiped out the current storage for sticky bit. This is no longer relevant, because the inode feature allows us to keep using the permission bits as is. As a side effect, the feedback about list initialization and copying no longer applies. I moved AclTransformation up in the package hierarchy to org.apache.hadoop.hdfs.server.namenode . There is no need for a separate .acl sub-package anymore now that Acl is gone. filterAclEntriesByAclSpec , mergeAclEntries and replaceAclEntries are re-implemented using multi-iteration algorithms. Hopefully this code is more readable now. filterAclEntriesByAclSpec is O(NlogN). mergeAclEntries is O(2NlogN). replaceAclEntries is still O(N). (All ACL spec operations still start with an O(NlogN) sort.) No more functors. All tests still pass, and coverage still shows 100% except for the unused private constructor in AclTransformation .
          Hide
          Chris Nauroth added a comment -

          Looking at the current state of HDFS-5618 and HDFS-5619, it appears that filterExtendedAclEntries in this patch is not required. Instead, we simply remove the whole AclFeature from the inode. Haohui Mai, do you agree? If yes, then I'll remove filterExtendedAclEntries from this patch.

          Show
          Chris Nauroth added a comment - Looking at the current state of HDFS-5618 and HDFS-5619 , it appears that filterExtendedAclEntries in this patch is not required. Instead, we simply remove the whole AclFeature from the inode. Haohui Mai , do you agree? If yes, then I'll remove filterExtendedAclEntries from this patch.
          Hide
          Chris Nauroth added a comment -

          Here is patch version 3 to remove filterExtendedAclEntries, as per prior comment. It would have been required in the original design, but it's irrelevant now that we can just drop the whole ACL feature from the inode.

          Show
          Chris Nauroth added a comment - Here is patch version 3 to remove filterExtendedAclEntries , as per prior comment. It would have been required in the original design, but it's irrelevant now that we can just drop the whole ACL feature from the inode.
          Hide
          Haohui Mai added a comment -

          Thanks Chris.

          On a high level, I would like to see the code to separate (1) the required functionality and (2) optimization. Correct me if I'm wrong. For example, It seems to me that the merge function can look like the following:

          merge(a,b) {
            // required functionality
            list c = merge list a and b.
            sort(c);
            // optimization
            optimize(c);
          }
          
          optimize(d) {
            list d;
            foreach (entry in c) {
              if (the current entry can be merged (e.g., duplication, different mask) with the previous one) {
                 update previous entry in d;
              } else {
                 add the entry into d;
              }
            }
          }
          

          The advantage of this approach is to make reasoning much easier. The required functionality is relatively straight forward. To reason about optimization, one only needs to check that it provides semantically equivalent results. Furthermore, optimization can break down into several passes to improve readability.

          Show
          Haohui Mai added a comment - Thanks Chris. On a high level, I would like to see the code to separate (1) the required functionality and (2) optimization. Correct me if I'm wrong. For example, It seems to me that the merge function can look like the following: merge(a,b) { // required functionality list c = merge list a and b. sort(c); // optimization optimize(c); } optimize(d) { list d; foreach (entry in c) { if (the current entry can be merged (e.g., duplication, different mask) with the previous one) { update previous entry in d; } else { add the entry into d; } } } The advantage of this approach is to make reasoning much easier. The required functionality is relatively straight forward. To reason about optimization, one only needs to check that it provides semantically equivalent results. Furthermore, optimization can break down into several passes to improve readability.
          Hide
          Chris Nauroth added a comment -

          Actually, the required functionality has a lot of tricky edge cases. Most of what's here is required functionality rather than an attempt to optimize the ACL. Duplicate entries must cause an error. A named mask or named other entry must cause an error. A mask must be calculated automatically only if required, and only if there was no mask provided manually in the ACL spec, and this decision is distinct per scope. Deletion of mask where it is required must cause an error. If there is a default ACL, then the entries for owner, group and other must be present. However, they may be copied from the access ACL if unspecified, but this should only be done if at least one default ACL entry has been provided.

          It sounds like the gist is that you want to see if multiple iterations can make it more readable. We could potentially try something like: 1) merge, 2) insert access mask if needed, 3) copy default entries if needed, 4) insert default mask if needed, 5) sort, 6) validate, where each of those 6 steps is a separate iteration. I'll try that.

          The nice thing is that we have a big test suite that I've already validated against Linux setfacl. There is relatively little risk in experimenting with different approaches in the code.

          Show
          Chris Nauroth added a comment - Actually, the required functionality has a lot of tricky edge cases. Most of what's here is required functionality rather than an attempt to optimize the ACL. Duplicate entries must cause an error. A named mask or named other entry must cause an error. A mask must be calculated automatically only if required, and only if there was no mask provided manually in the ACL spec, and this decision is distinct per scope. Deletion of mask where it is required must cause an error. If there is a default ACL, then the entries for owner, group and other must be present. However, they may be copied from the access ACL if unspecified, but this should only be done if at least one default ACL entry has been provided. It sounds like the gist is that you want to see if multiple iterations can make it more readable. We could potentially try something like: 1) merge, 2) insert access mask if needed, 3) copy default entries if needed, 4) insert default mask if needed, 5) sort, 6) validate, where each of those 6 steps is a separate iteration. I'll try that. The nice thing is that we have a big test suite that I've already validated against Linux setfacl. There is relatively little risk in experimenting with different approaches in the code.
          Hide
          Chris Nauroth added a comment -

          I'm attaching version 4 of the patch. This approach uses multiple iterations. I also removed AclSpecTransformationState and MaskCalculator. The net effect is increased code duplication inside each AclTransformation method, but perhaps it's more reaable.

          Haohui Mai, can you let me know whether you find version 3 or version 4 more readable? Thanks.

          Show
          Chris Nauroth added a comment - I'm attaching version 4 of the patch. This approach uses multiple iterations. I also removed AclSpecTransformationState and MaskCalculator . The net effect is increased code duplication inside each AclTransformation method, but perhaps it's more reaable. Haohui Mai , can you let me know whether you find version 3 or version 4 more readable? Thanks.
          Hide
          Haohui Mai added a comment -

          Thanks Chris. The latest version is much more readable to me. It seems to me that the code is essentially doing the following:

          1. promoteDefaultEntries:

          foreach AclEntryType:
            if there's an {{ACCESS}} entry without a name and no default entry for the type:
              promote it as a default entry.
          

          (However, it seems that for some reasons the code keeps the original entry as well)

          2. calculateMask:

          foreach scope:
            if there exists an entry without a name and no mask entry:
              add a mask entry whose permission equals to or(forall entries.perm)
          

          The code can be further simplified when the ACL entries are sorted. Recall the ACL entries are sorted by scope, name, type, and finally permissions.
          For example:

          promoteDefaultEntries(entries):
            pivot = the index of first entry with type ACCESS
            while (pivot < entries.length && entries[pivot].name() != null)
              if there is no corresponding default entry in entries[0..pivot - 1]:
                entries[pivot].type = DEFAULT
          
          calculateMask(entries):
            pivot = the index of first entry with type ACCESS
            calculateMaskForScope(entries, 0, pivot - 1, DEFAULT)
            calculateMaskForScope(entries, pivot, entries.lengt, ACCESS)
            sort(entries)
          
          calculateMaskForScope(entries, start, end, type):
            if entries[start].type != MASK:
               perm = or(forall entries[start..end].perm())
               entries.append(newMaskEntry(type, perm)) 
          
          merge(existing, new):
            out = concat(existing, new)
            sort(out)
            remove duplicate in out
            validate(out)
            promoteDefaultEntries(out)
            calculateMask(out)
            validate(out)
            return out
          

          That way you don't need to keep track on things like scopeDirty / providedMask. I also propose to move the compareTo() function AclEntry out to a new comparator inside AclTransformationLogic.

          Questions:

          1. Is it possible to have multiple entries whose type equals to ACCESS, and empty names? If the answer is yes, copyDefaultsIfNeeded will always pick up the last entry. Is it expected?
          2. How are you going to use these functions? It is not immediately clear to me that you can implement the APIs using these functions directly.
          Show
          Haohui Mai added a comment - Thanks Chris. The latest version is much more readable to me. It seems to me that the code is essentially doing the following: 1. promoteDefaultEntries: foreach AclEntryType: if there's an {{ACCESS}} entry without a name and no default entry for the type: promote it as a default entry. (However, it seems that for some reasons the code keeps the original entry as well) 2. calculateMask: foreach scope: if there exists an entry without a name and no mask entry: add a mask entry whose permission equals to or(forall entries.perm) The code can be further simplified when the ACL entries are sorted. Recall the ACL entries are sorted by scope, name, type, and finally permissions. For example: promoteDefaultEntries(entries): pivot = the index of first entry with type ACCESS while (pivot < entries.length && entries[pivot].name() != null ) if there is no corresponding default entry in entries[0..pivot - 1]: entries[pivot].type = DEFAULT calculateMask(entries): pivot = the index of first entry with type ACCESS calculateMaskForScope(entries, 0, pivot - 1, DEFAULT) calculateMaskForScope(entries, pivot, entries.lengt, ACCESS) sort(entries) calculateMaskForScope(entries, start, end, type): if entries[start].type != MASK: perm = or(forall entries[start..end].perm()) entries.append(newMaskEntry(type, perm)) merge(existing, new ): out = concat(existing, new ) sort(out) remove duplicate in out validate(out) promoteDefaultEntries(out) calculateMask(out) validate(out) return out That way you don't need to keep track on things like scopeDirty / providedMask. I also propose to move the compareTo() function AclEntry out to a new comparator inside AclTransformationLogic. Questions: Is it possible to have multiple entries whose type equals to ACCESS, and empty names? If the answer is yes, copyDefaultsIfNeeded will always pick up the last entry. Is it expected? How are you going to use these functions? It is not immediately clear to me that you can implement the APIs using these functions directly.
          Hide
          Chris Nauroth added a comment -

          Thanks, Haohui. In that case, let's work towards getting some variant of the v4 patch committed. I'll get to work on a v5. Some of the simplifications you proposed might not be feasible due to a few complexities in the requirements. See details below. Bottom line: if I can get the unit tests passing with these simplifications, then we can keep them.

          promoteDefaultEntries...

          There are a couple of additional complexities to the logic: 1) This promotion must only be triggered if the ACL spec has specified some kind of change to the default ACL. A change to only the access ACL must not trigger promotion. 2) Promotion only occurs for the base entries that are missing from the default ACL. It must not overwrite a default entry that the user provided in the ACL spec. This is why the logic is tracking both access entries and default entries. Example: if user specifies default owner entry and default group entry, but no default other entry, then the logic must use the provided default owner and default group entries and then copy the access ACL owner entry to the default ACL. 3) The access mask is an access entry with no name, but the logic must never promote it to a default mask.

          calculateMask...

          A few of the challenges with this logic: 1) Automatic mask calculation must not trigger for the default ACL if the ACL spec is changing only the access ACL (and vice versa). 2) The logic must know whether a mask entry is coming from the existing ACL or being added by the ACL spec. The version in the ACL spec must override the automatic calculation. 3) In the case of filterAclEntriesByAclSpec, the logic must have knowledge that some kind of change was made to either the access ACL or default ACL. The list of ACL entries alone isn't sufficient to know that, because absence of something in that list does not necessarily imply deletion. All of these requirements drove the state tracking related to scopeDirty/maskDirty/providedMask.

          I also propose to move the compareTo() function AclEntry out to a new comparator inside AclTransformationLogic.

          Yes, we can do that. Just so I understand the motivation, is this to reduce the footprint of code shipped client side, particularly code that is sensitive to server side implementation details? (If so, then I agree with you.)

          Is it possible to have multiple entries whose type equals to ACCESS, and empty names? If the answer is yes, copyDefaultsIfNeeded will always pick up the last entry. Is it expected?

          I didn't quite follow this question, because ACCESS is a scope, and the types are USER, GROUP, MASK and OTHER. Within any ACL, the combination of scope + type + name is unique for each entry. Attempting to create entries with duplicate scope + type + name in the same ACL would get rejected during the validation checks for duplicate entries. Therefore, for any invocation of copyDefaultsIfNeeded, I'd expect each of those code paths to execute at most once.

          How are you going to use these functions?

          Each FSNamesystem method that modifies an ACL will get the current ACL from the inode feature, pass the existing ACL + the ACL spec to the corresponding AclTransformation method, and then set the resulting new ACL back on the inode feature. FSNamesystem#removeAcl is unique in that it can simply remove the ACL feature. There will be some integration with the existing FSNamesystem#setPermission too. If calling setPermission on an inode with an ACL, then it will convert the client-supplied FsPermission to a minimal ACL (containing exactly the 3 base entries), and then call AclTransformation#mergeAclEntries to combine that minimal ACL with the existing ACL. Additionally, FSNamesystem#removeAclEntries, FSNamesystem#removeDefaultAcl, and FSNamesystem#setAcl might result in reducing back to a minimal ACL that can be represented as a 16-bit FsPermission. We'll detect these scenarios and handle it by removing the ACL feature from the inode. That's a summary of everything I can think of right now related to how these functions will be used.

          Show
          Chris Nauroth added a comment - Thanks, Haohui. In that case, let's work towards getting some variant of the v4 patch committed. I'll get to work on a v5. Some of the simplifications you proposed might not be feasible due to a few complexities in the requirements. See details below. Bottom line: if I can get the unit tests passing with these simplifications, then we can keep them. promoteDefaultEntries... There are a couple of additional complexities to the logic: 1) This promotion must only be triggered if the ACL spec has specified some kind of change to the default ACL. A change to only the access ACL must not trigger promotion. 2) Promotion only occurs for the base entries that are missing from the default ACL. It must not overwrite a default entry that the user provided in the ACL spec. This is why the logic is tracking both access entries and default entries. Example: if user specifies default owner entry and default group entry, but no default other entry, then the logic must use the provided default owner and default group entries and then copy the access ACL owner entry to the default ACL. 3) The access mask is an access entry with no name, but the logic must never promote it to a default mask. calculateMask... A few of the challenges with this logic: 1) Automatic mask calculation must not trigger for the default ACL if the ACL spec is changing only the access ACL (and vice versa). 2) The logic must know whether a mask entry is coming from the existing ACL or being added by the ACL spec. The version in the ACL spec must override the automatic calculation. 3) In the case of filterAclEntriesByAclSpec , the logic must have knowledge that some kind of change was made to either the access ACL or default ACL. The list of ACL entries alone isn't sufficient to know that, because absence of something in that list does not necessarily imply deletion. All of these requirements drove the state tracking related to scopeDirty/maskDirty/providedMask. I also propose to move the compareTo() function AclEntry out to a new comparator inside AclTransformationLogic. Yes, we can do that. Just so I understand the motivation, is this to reduce the footprint of code shipped client side, particularly code that is sensitive to server side implementation details? (If so, then I agree with you.) Is it possible to have multiple entries whose type equals to ACCESS, and empty names? If the answer is yes, copyDefaultsIfNeeded will always pick up the last entry. Is it expected? I didn't quite follow this question, because ACCESS is a scope, and the types are USER, GROUP, MASK and OTHER. Within any ACL, the combination of scope + type + name is unique for each entry. Attempting to create entries with duplicate scope + type + name in the same ACL would get rejected during the validation checks for duplicate entries. Therefore, for any invocation of copyDefaultsIfNeeded , I'd expect each of those code paths to execute at most once. How are you going to use these functions? Each FSNamesystem method that modifies an ACL will get the current ACL from the inode feature, pass the existing ACL + the ACL spec to the corresponding AclTransformation method, and then set the resulting new ACL back on the inode feature. FSNamesystem#removeAcl is unique in that it can simply remove the ACL feature. There will be some integration with the existing FSNamesystem#setPermission too. If calling setPermission on an inode with an ACL, then it will convert the client-supplied FsPermission to a minimal ACL (containing exactly the 3 base entries), and then call AclTransformation#mergeAclEntries to combine that minimal ACL with the existing ACL. Additionally, FSNamesystem#removeAclEntries , FSNamesystem#removeDefaultAcl , and FSNamesystem#setAcl might result in reducing back to a minimal ACL that can be represented as a 16-bit FsPermission . We'll detect these scenarios and handle it by removing the ACL feature from the inode. That's a summary of everything I can think of right now related to how these functions will be used.
          Hide
          Chris Nauroth added a comment -

          I just realized that the proposed pseudo-code for promoteDefaultEntries appears to be mutating an access entry into a default entry. Just to clarify, the actual requirement is not mutation, but rather copying. The access entry must be copied to a new entry with scope set to default, but the copied access entry must be retained without change. For this reason, I plan to keep using "copy" in the method name instead of "promote".

          Show
          Chris Nauroth added a comment - I just realized that the proposed pseudo-code for promoteDefaultEntries appears to be mutating an access entry into a default entry. Just to clarify, the actual requirement is not mutation, but rather copying. The access entry must be copied to a new entry with scope set to default, but the copied access entry must be retained without change. For this reason, I plan to keep using "copy" in the method name instead of "promote".
          Hide
          Haohui Mai added a comment -

          for promoteDefaultEntries and calculateMask, what I'm essentially saying is that you can take the advantage of the fact that the ACL list has an order to simplify the code, rather than scanning through the list.

          is this to reduce the footprint of code shipped client side, particularly code that is sensitive to server side implementation details?

          Yes.

          because ACCESS is a scope, and the types are USER, GROUP, MASK and OTHER..

          I'm talking about the following code:

          for (AclEntry entry: aclBuilder) {
           if (entry.getScope() == AclEntryScope.ACCESS) {
              if (entry.getType() == AclEntryType.USER && entry.getName() == null) {
                userEntry = entry;
              }
              ...
          }
          

          I wonder whether userEntry will be assigned twice. Maybe I miss something, I'm unaware of the corresponding checks.

          If calling setPermission on an inode with an ACL, then it will convert the client-supplied FsPermission to a minimal ACL (containing exactly the 3 base entries) ...

          I'm wondering whether this is the right way to approach it. Although it can simplify the enforcement of the permission, there are a couple disadvantages of always using ACLs for FsPermission, including (1) increaseing the memory footprint and serialization costs, (2) calling removeAcl() might incidentally remove the FsPermission.

          Based on my understanding, Linux separates the logic of ACLs and FsPermission, and it has a special bit to indicate whether ACLs are enabled. In my opinion this separates the reasoning between ACLs and FsPermissions. It seems to me that it is a much simpler approach comparing to merging and optimizing ACLs, due to various requirements set by the POSIX drafts.

          Show
          Haohui Mai added a comment - for promoteDefaultEntries and calculateMask , what I'm essentially saying is that you can take the advantage of the fact that the ACL list has an order to simplify the code, rather than scanning through the list. is this to reduce the footprint of code shipped client side, particularly code that is sensitive to server side implementation details? Yes. because ACCESS is a scope, and the types are USER, GROUP, MASK and OTHER.. I'm talking about the following code: for (AclEntry entry: aclBuilder) { if (entry.getScope() == AclEntryScope.ACCESS) { if (entry.getType() == AclEntryType.USER && entry.getName() == null ) { userEntry = entry; } ... } I wonder whether userEntry will be assigned twice. Maybe I miss something, I'm unaware of the corresponding checks. If calling setPermission on an inode with an ACL, then it will convert the client-supplied FsPermission to a minimal ACL (containing exactly the 3 base entries) ... I'm wondering whether this is the right way to approach it. Although it can simplify the enforcement of the permission, there are a couple disadvantages of always using ACLs for FsPermission, including (1) increaseing the memory footprint and serialization costs, (2) calling removeAcl() might incidentally remove the FsPermission. Based on my understanding, Linux separates the logic of ACLs and FsPermission, and it has a special bit to indicate whether ACLs are enabled. In my opinion this separates the reasoning between ACLs and FsPermissions. It seems to me that it is a much simpler approach comparing to merging and optimizing ACLs, due to various requirements set by the POSIX drafts.
          Hide
          Chris Nauroth added a comment -

          I wonder whether userEntry will be assigned twice. Maybe I miss something, I'm unaware of the corresponding checks.

          Scope + type + name is unique. The only way that assignment could execute twice is if a user tried to supply multiple access owner entries in the ACL spec. If that happens, then the whole request would get rejected anyway by the duplicate entry check in AclTransformation#buildAndValidateAcl.

          ...there are a couple disadvantages of always using ACLs for FsPermission...

          To clarify, we will not always translate FsPermission to List<AclEntry>, especially not at enforcement time. That was never the intention. If you look at my HDFS-5612 patch, you'll see that enforcement retains exactly the existing code path for checking permissions against an FsPermission. A separate code path is called only if the ACL feature is defined on the inode. In the case of a namespace that never uses ACLs, the ACL enforcement code path will never run.

          The conversion I discussed above is only required during execution of FSNamesystem#setPermission, and only if the target inode already has an ACL. (Once again, this is consistent with the design approach that namespaces that don't use ACLs don't pay a resource cost for the feature.) It's going to involve creating a small temporary list of 3 ACL entries and then calling the merge function. This happens while holding the write lock, so there aren't going to be multiple concurrent RPC threads allocating more objects. There is no serialization cost, because it's a temporary variable that doesn't transit the RPC interface.

          Based on my understanding, Linux separates the logic of ACLs and FsPermission, and it has a special bit to indicate whether ACLs are enabled.

          This is correct, and our implementation plan for HDFS is in agreement with it. The only thing I'd add is that when a user runs chmod on an inode that already has an ACL, then special handling is required, hence the translation to a minimal ACL and running the merge function. chmod can change the group permissions, and since group permissions contribute to the ACL's mask entry, we need to recalculate the mask during chmod calls.

          Show
          Chris Nauroth added a comment - I wonder whether userEntry will be assigned twice. Maybe I miss something, I'm unaware of the corresponding checks. Scope + type + name is unique. The only way that assignment could execute twice is if a user tried to supply multiple access owner entries in the ACL spec. If that happens, then the whole request would get rejected anyway by the duplicate entry check in AclTransformation#buildAndValidateAcl . ...there are a couple disadvantages of always using ACLs for FsPermission... To clarify, we will not always translate FsPermission to List<AclEntry>, especially not at enforcement time. That was never the intention. If you look at my HDFS-5612 patch, you'll see that enforcement retains exactly the existing code path for checking permissions against an FsPermission . A separate code path is called only if the ACL feature is defined on the inode. In the case of a namespace that never uses ACLs, the ACL enforcement code path will never run. The conversion I discussed above is only required during execution of FSNamesystem#setPermission , and only if the target inode already has an ACL. (Once again, this is consistent with the design approach that namespaces that don't use ACLs don't pay a resource cost for the feature.) It's going to involve creating a small temporary list of 3 ACL entries and then calling the merge function. This happens while holding the write lock, so there aren't going to be multiple concurrent RPC threads allocating more objects. There is no serialization cost, because it's a temporary variable that doesn't transit the RPC interface. Based on my understanding, Linux separates the logic of ACLs and FsPermission, and it has a special bit to indicate whether ACLs are enabled. This is correct, and our implementation plan for HDFS is in agreement with it. The only thing I'd add is that when a user runs chmod on an inode that already has an ACL, then special handling is required, hence the translation to a minimal ACL and running the merge function. chmod can change the group permissions, and since group permissions contribute to the ACL's mask entry, we need to recalculate the mask during chmod calls.
          Hide
          Haohui Mai added a comment -

          If that happens, then the whole request would get rejected anyway by the duplicate entry check in AclTransformation#buildAndValidateAcl.

          If I understand correctly, buildAndValidateAcl is executed at the very end of the function. However, the constraints discussed above seem to be a pre-condition of copyDefaultsIfNeeded, that is, the condition has to be met in order for copyDefaultsIfNeeded to be executed correctly. It might be worthwhile to add the check in copyDefaultsIfNeeded – a simple way to check it is to ensure userEntry has to be null during the assignment.

          For FsPermission, I don't think I have enough information to conclude, I'm okay with the current code, we can revisit it in the next patch.

          Show
          Haohui Mai added a comment - If that happens, then the whole request would get rejected anyway by the duplicate entry check in AclTransformation#buildAndValidateAcl. If I understand correctly, buildAndValidateAcl is executed at the very end of the function. However, the constraints discussed above seem to be a pre-condition of copyDefaultsIfNeeded , that is, the condition has to be met in order for copyDefaultsIfNeeded to be executed correctly. It might be worthwhile to add the check in copyDefaultsIfNeeded – a simple way to check it is to ensure userEntry has to be null during the assignment. For FsPermission, I don't think I have enough information to conclude, I'm okay with the current code, we can revisit it in the next patch.
          Hide
          Chris Nauroth added a comment -

          Here is patch version 5 with the following changes:

          1. AclEntry no longer implements Comparable. Instead, AclTransformation defines its own Comparator.
          2. AclTransformation#copyDefaultsIfNeeded has been recoded to use binary search on sorted sub-lists of the access entries and the default entries.

          I could not find a way to incorporate the feedback on mask calculation or merging and still maintain correctness. We cannot reliably remove duplicates, because after concatenating and sorting, we no longer know which one is correct to remove. The ACL spec entry needs to override the existing entry, but we've lost that information. We'd also end up losing relevant information for mask calculation as stated in my earlier comment.

          Show
          Chris Nauroth added a comment - Here is patch version 5 with the following changes: AclEntry no longer implements Comparable . Instead, AclTransformation defines its own Comparator . AclTransformation#copyDefaultsIfNeeded has been recoded to use binary search on sorted sub-lists of the access entries and the default entries. I could not find a way to incorporate the feedback on mask calculation or merging and still maintain correctness. We cannot reliably remove duplicates, because after concatenating and sorting, we no longer know which one is correct to remove. The ACL spec entry needs to override the existing entry, but we've lost that information. We'd also end up losing relevant information for mask calculation as stated in my earlier comment.
          Hide
          Chris Nauroth added a comment -

          It might be worthwhile to add the check in copyDefaultsIfNeeded – a simple way to check it is to ensure userEntry has to be null during the assignment.

          Now that the method has been recoded to use the binary search approach, this check isn't exactly applicable. The binary search returns exactly one item, and it's non-deterministic which one if there are duplicates. That means that checking for a duplicate in copyDefaultsIfNeeded would require some kind of forward and reverse scan from the binary search hit. Considering edge cases like array out of bounds, I think this ends up being too awkward to be worthwhile. I'm confident that we're catching these conditions in buildAndValidateAcl due to unit tests that assert an exception when the user tries to supply duplicate entries in the ACL spec.

          Show
          Chris Nauroth added a comment - It might be worthwhile to add the check in copyDefaultsIfNeeded – a simple way to check it is to ensure userEntry has to be null during the assignment. Now that the method has been recoded to use the binary search approach, this check isn't exactly applicable. The binary search returns exactly one item, and it's non-deterministic which one if there are duplicates. That means that checking for a duplicate in copyDefaultsIfNeeded would require some kind of forward and reverse scan from the binary search hit. Considering edge cases like array out of bounds, I think this ends up being too awkward to be worthwhile. I'm confident that we're catching these conditions in buildAndValidateAcl due to unit tests that assert an exception when the user tries to supply duplicate entries in the ACL spec.
          Hide
          Haohui Mai added a comment -

          okay. I think the patch looks good then. +1.

          Show
          Haohui Mai added a comment - okay. I think the patch looks good then. +1.
          Hide
          Chris Nauroth added a comment -

          I've committed this to the HDFS-4685 feature branch. Big thanks to Haohui for the thorough code reviews.

          Show
          Chris Nauroth added a comment - I've committed this to the HDFS-4685 feature branch. Big thanks to Haohui for the thorough code reviews.

            People

            • Assignee:
              Chris Nauroth
              Reporter:
              Chris Nauroth
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development