I'm resolving this issue as won't fix. Haohui Mai was right to question whether or not we really need this optimization.
Now that ACLs implementation is almost done, I can get a more accurate measurement of the memory impact using jmap. I'm including a plot that estimates memory consumption by ACLs in a presumed 50 GB NameNode, trying to store up to 100 million inodes. I plotted 3 different usage scenarios:
- max usage: All inodes have an ACL, and each one has the maximum ACL entries.
- mid usage: Half the inodes have an ACL, and each ACL has 10 entries.
- low usage: 10% of inodes have an ACL, and each ACL has 2 entries.
For each scenario, I plotted the unoptimized memory consumption and also the optimized memory consumption, assuming that only 20% of the ACLs are unique. Each scenario uses a different line color. The unoptimized version uses a solid line, and the optimized version uses a dashed line. I'm also attaching the GnuPlot script I wrote to generate this.
The plot demonstrates that ACL usage really needs to get quite high (very large number of inodes with a very high proportion of them having ACLs) before this memory optimization starts to provide a benefit.
I am +1 for skipping this for now. We can always resurrect this patch if we observe a need for it based on real world usage of ACLs. I'm also going to put this information into a new revision of the design doc on
BTW, if we do resurrect this patch, then I probably wouldn't commit the exact
HDFS-5620.1.patch that I put together quickly as a prototype. The Guava interner has a spinlock inside of it, which we don't really need, because this would only be accessed under the namesystem write lock anyway.