Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: HDFS ACLs (HDFS-4685)
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      The AclManager can maintain a Global ACL Set to store all distinct ACLs in use by the file system. All inodes that have the same ACL entries can share the same ACL instance.

      1. HDFS-5620.1.patch
        5 kB
        Chris Nauroth
      2. aclRamSizingEstimates
        3 kB
        Chris Nauroth
      3. aclRamSizingEstimates.png
        40 kB
        Chris Nauroth

        Issue Links

          Activity

          Hide
          Haohui Mai added a comment -

          I don't think we'll need to implement this one right now. An approach similar to HDFS-5793 (i.e., deduplicating during loading / saving FSImage) might be sufficient for most of the real-world use cases.

          Show
          Haohui Mai added a comment - I don't think we'll need to implement this one right now. An approach similar to HDFS-5793 (i.e., deduplicating during loading / saving FSImage) might be sufficient for most of the real-world use cases.
          Hide
          Chris Nauroth added a comment -

          Hi, Haohui. Thanks for pointing out HDFS-5793. It looks to me like that patch was focused on serialization optimization. The scope I intended for HDFS-5620 was to de-duplicate the AclFeature instances in memory. For example, assuming 10 inodes that all have an ACL with the exact same 10 entries, the current HDFS-4685 codebase results in 10 instances of AclFeature and 100 instances of AclEntry. I'd prefer to reduce that to 1 instance of AclFeature and 10 instances of AclEntry, with the 1 AclFeature shared by all 10 inodes. AclEntry instances are immutable, so it's safe to share them. My use of the word "storage" in the jira title might have been misleading, so I've changed it to "memory".

          It seems like the serialization optimizations in HDFS-5793 aren't directly applicable to this jira, but let me know if I'm missing something. As far as ACL serialization, I agree that we don't need to optimize that on the feature branch. The current code does a full serialization of all entries for each inode in fsimage or OP_SET_ACL in edits. We expect far fewer inodes propotionally to have an ACL (unlike PermissionStatus, which is present on every inode), so we're likely to see less performance improvement and storage reduction from optimizing ACL serialization. However, if we want to do it later, then it looks like HDFS-5793 is a great approach. I expect the ACL entries could share the same string table for user and group names.

          FWIW, I expect HDFS-5620 won't take much work. I'm attaching a prototype patch using the Guava interner. Quick manual tests using jmap -histo:live show that the instances are de-duplicating as I would expect. I need to put it through more testing though.

          Show
          Chris Nauroth added a comment - Hi, Haohui. Thanks for pointing out HDFS-5793 . It looks to me like that patch was focused on serialization optimization. The scope I intended for HDFS-5620 was to de-duplicate the AclFeature instances in memory. For example, assuming 10 inodes that all have an ACL with the exact same 10 entries, the current HDFS-4685 codebase results in 10 instances of AclFeature and 100 instances of AclEntry . I'd prefer to reduce that to 1 instance of AclFeature and 10 instances of AclEntry , with the 1 AclFeature shared by all 10 inodes. AclEntry instances are immutable, so it's safe to share them. My use of the word "storage" in the jira title might have been misleading, so I've changed it to "memory". It seems like the serialization optimizations in HDFS-5793 aren't directly applicable to this jira, but let me know if I'm missing something. As far as ACL serialization, I agree that we don't need to optimize that on the feature branch. The current code does a full serialization of all entries for each inode in fsimage or OP_SET_ACL in edits. We expect far fewer inodes propotionally to have an ACL (unlike PermissionStatus , which is present on every inode), so we're likely to see less performance improvement and storage reduction from optimizing ACL serialization. However, if we want to do it later, then it looks like HDFS-5793 is a great approach. I expect the ACL entries could share the same string table for user and group names. FWIW, I expect HDFS-5620 won't take much work. I'm attaching a prototype patch using the Guava interner. Quick manual tests using jmap -histo:live show that the instances are de-duplicating as I would expect. I need to put it through more testing though.
          Hide
          Chris Nauroth added a comment -

          I'm resolving this issue as won't fix. Haohui Mai was right to question whether or not we really need this optimization.

          Now that ACLs implementation is almost done, I can get a more accurate measurement of the memory impact using jmap. I'm including a plot that estimates memory consumption by ACLs in a presumed 50 GB NameNode, trying to store up to 100 million inodes. I plotted 3 different usage scenarios:

          1. max usage: All inodes have an ACL, and each one has the maximum ACL entries.
          2. mid usage: Half the inodes have an ACL, and each ACL has 10 entries.
          3. low usage: 10% of inodes have an ACL, and each ACL has 2 entries.

          For each scenario, I plotted the unoptimized memory consumption and also the optimized memory consumption, assuming that only 20% of the ACLs are unique. Each scenario uses a different line color. The unoptimized version uses a solid line, and the optimized version uses a dashed line. I'm also attaching the GnuPlot script I wrote to generate this.

          The plot demonstrates that ACL usage really needs to get quite high (very large number of inodes with a very high proportion of them having ACLs) before this memory optimization starts to provide a benefit.

          I am +1 for skipping this for now. We can always resurrect this patch if we observe a need for it based on real world usage of ACLs. I'm also going to put this information into a new revision of the design doc on HDFS-4685.

          BTW, if we do resurrect this patch, then I probably wouldn't commit the exact HDFS-5620.1.patch that I put together quickly as a prototype. The Guava interner has a spinlock inside of it, which we don't really need, because this would only be accessed under the namesystem write lock anyway.

          Show
          Chris Nauroth added a comment - I'm resolving this issue as won't fix. Haohui Mai was right to question whether or not we really need this optimization. Now that ACLs implementation is almost done, I can get a more accurate measurement of the memory impact using jmap. I'm including a plot that estimates memory consumption by ACLs in a presumed 50 GB NameNode, trying to store up to 100 million inodes. I plotted 3 different usage scenarios: max usage: All inodes have an ACL, and each one has the maximum ACL entries. mid usage: Half the inodes have an ACL, and each ACL has 10 entries. low usage: 10% of inodes have an ACL, and each ACL has 2 entries. For each scenario, I plotted the unoptimized memory consumption and also the optimized memory consumption, assuming that only 20% of the ACLs are unique. Each scenario uses a different line color. The unoptimized version uses a solid line, and the optimized version uses a dashed line. I'm also attaching the GnuPlot script I wrote to generate this. The plot demonstrates that ACL usage really needs to get quite high (very large number of inodes with a very high proportion of them having ACLs) before this memory optimization starts to provide a benefit. I am +1 for skipping this for now. We can always resurrect this patch if we observe a need for it based on real world usage of ACLs. I'm also going to put this information into a new revision of the design doc on HDFS-4685 . BTW, if we do resurrect this patch, then I probably wouldn't commit the exact HDFS-5620 .1.patch that I put together quickly as a prototype. The Guava interner has a spinlock inside of it, which we don't really need, because this would only be accessed under the namesystem write lock anyway.

            People

            • Assignee:
              Chris Nauroth
              Reporter:
              Chris Nauroth
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development