Hadoop HDFS / HDFS-12130

Optimizing permission check for getContentSummary



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: namenode
    • Labels: None


      Currently, getContentSummary takes two phases to complete:

      • Phase 1: check the permission of the entire subtree. If any subdirectory does not have READ_EXECUTE, an access control exception is thrown and getContentSummary terminates here (unless the caller is the super user).
      • Phase 2: if phase 1 passes, traverse the entire tree recursively to compute the actual content summary.
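
      The two phases above can be sketched roughly as follows. This is a simplified in-memory model, not the NameNode code: DirNode, TwoPhaseSummary, and the use of SecurityException in place of HDFS's access control exception are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical directory node; stands in for a real inode. */
final class DirNode {
    final String path;
    final boolean readExecute;   // does the caller have READ_EXECUTE here?
    final long fileBytes;        // bytes of files directly under this directory
    final List<DirNode> children = new ArrayList<>();

    DirNode(String path, boolean readExecute, long fileBytes) {
        this.path = path;
        this.readExecute = readExecute;
        this.fileBytes = fileBytes;
    }

    DirNode add(DirNode child) {
        children.add(child);
        return this;
    }
}

/** Sketch of the two-phase flow: full permission walk, then full summary walk. */
final class TwoPhaseSummary {
    // Stand-in for the fs lock; held across both phases, as the issue describes.
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    long getContentSummary(DirNode root, boolean superUser) {
        fsLock.readLock().lock();
        try {
            if (!superUser) {
                checkSubtree(root);      // phase 1: never yields the lock
            }
            return summarize(root);      // phase 2: second full traversal
        } finally {
            fsLock.readLock().unlock();
        }
    }

    /** Phase 1: fail on the first directory without READ_EXECUTE. */
    private static void checkSubtree(DirNode dir) {
        if (!dir.readExecute) {
            throw new SecurityException("Permission denied: " + dir.path);
        }
        for (DirNode child : dir.children) {
            checkSubtree(child);
        }
    }

    /** Phase 2: sum the bytes of the whole subtree. */
    private static long summarize(DirNode dir) {
        long total = dir.fileBytes;
        for (DirNode child : dir.children) {
            total += summarize(child);
        }
        return total;
    }
}
```

      Note that the tree is walked twice, and the first walk happens entirely under the lock.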

      The issue is that both phases currently hold the fs lock.

      Phase 2 is already written so that it yields the fs lock periodically, so it does not block other operations for too long. Phase 1, however, does not yield, which means the permission check phase can still block other operations for a long time.

      One fix is to add lock yielding to phase 1. A simpler fix is to merge phase 1 into phase 2. Namely, instead of doing a full traversal for the permission check first, we start with phase 2 directly, but for each directory we check its permission before obtaining its summary. This way we take advantage of the existing lock yielding in the phase 2 code and are still able to check permissions and terminate on an access exception.
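
      A minimal sketch of the merged traversal, under the same kind of simplified model. SketchDir, MergedSummary, the yieldEvery threshold, and SecurityException as a stand-in for the access control exception are assumptions made for illustration; the actual patch may structure this differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical directory node; stands in for a real inode. */
final class SketchDir {
    final String path;
    final boolean readExecute;   // does the caller have READ_EXECUTE here?
    final long fileBytes;        // bytes of files directly under this directory
    final List<SketchDir> children = new ArrayList<>();

    SketchDir(String path, boolean readExecute, long fileBytes) {
        this.path = path;
        this.readExecute = readExecute;
        this.fileBytes = fileBytes;
    }

    SketchDir add(SketchDir child) {
        children.add(child);
        return this;
    }
}

/** Single traversal: check each directory's permission right before summing it. */
final class MergedSummary {
    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final boolean superUser;
    private final int yieldEvery;      // hypothetical: yield after this many directories
    private int visitedSinceYield = 0;
    int yieldCount = 0;                // exposed only so the sketch can be observed

    MergedSummary(boolean superUser, int yieldEvery) {
        this.superUser = superUser;
        this.yieldEvery = yieldEvery;
    }

    long getContentSummary(SketchDir root) {
        fsLock.readLock().lock();
        try {
            return visit(root);
        } finally {
            fsLock.readLock().unlock();
        }
    }

    private long visit(SketchDir dir) {
        // Former phase 1, now done per directory inside the phase 2 walk.
        if (!superUser && !dir.readExecute) {
            throw new SecurityException("Permission denied: " + dir.path);
        }
        long total = dir.fileBytes;
        maybeYield();
        for (SketchDir child : dir.children) {
            total += visit(child);
        }
        return total;
    }

    /** Briefly release the lock so waiting operations can run. */
    private void maybeYield() {
        if (++visitedSinceYield < yieldEvery) {
            return;
        }
        visitedSinceYield = 0;
        yieldCount++;
        fsLock.readLock().unlock();
        // A real implementation must cope with the tree changing here;
        // this sketch ignores concurrent modification.
        fsLock.readLock().lock();
    }
}
```

      Because the permission check rides on the same walk, it inherits the yield points, and an unreadable directory still aborts the call before its contents are counted.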

      Thanks szetszwo for the offline discussions!


        1. HDFS-12130.001.patch
          25 kB
          Chen Liang
        2. HDFS-12130.002.patch
          26 kB
          Chen Liang
        3. HDFS-12130.003.patch
          26 kB
          Chen Liang




              Assignee: vagarychen Chen Liang
              Reporter: vagarychen Chen Liang