Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-4222

NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.23.3, 2.0.0-alpha
    • Fix Version/s: 1.2.0, 0.23.7, 2.1.0-beta
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      For Hadoop clusters configured to access directory information by LDAP, the FSNamesystem calls on behave of DFS clients might hang due to LDAP issues (including LDAP access issues caused by networking issues) while holding the single lock of FSNamesystem. That will result in the NN unresponsive and loss of the heartbeats from DNs.

      The places LDAP got accessed by FSNamesystem calls are the instantiation of FSPermissionChecker, which could be moved out of the lock scope since the instantiation does not need the FSNamesystem lock. After the move, a DFS client hang will not affect other threads by hogging the single lock. This is especially helpful when we use separate RPC servers for ClientProtocol and DatanodeProtocol since the calls for DatanodeProtocol do not need to access LDAP. So even if DFS clients hang due to LDAP issues, the NN will still be able to process the requests (including heartbeats) from DNs.

      1. HDFS-4222.23.patch
        28 kB
        Suresh Srinivas
      2. HDFS-4222.patch
        30 kB
        Suresh Srinivas
      3. HDFS-4222.patch
        27 kB
        Suresh Srinivas
      4. hdfs-4222-branch-0.23.3.patch
        29 kB
        Xiaobo Peng
      5. HDFS-4222-branch-1.patch
        49 kB
        Xiaobo Peng
      6. hdfs-4222-release-1.0.3.patch
        45 kB
        Xiaobo Peng

        Issue Links

          Activity

          Matt Foley made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          Show
          Matt Foley added a comment - Closed upon release of Hadoop 1.2.0.
          Xiaobo Peng made changes -
          Summary NN is unresponsive and lose heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues
          Jing Zhao made changes -
          Link This issue relates to HDFS-4622 [ HDFS-4622 ]
          Hide
          Xiaobo Peng added a comment -

          Thank you very much for the quick action, Suresh.

          Show
          Xiaobo Peng added a comment - Thank you very much for the quick action, Suresh.
          Suresh Srinivas made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 1.2.0 [ 12321657 ]
          Resolution Fixed [ 1 ]
          Hide
          Suresh Srinivas added a comment -

          I committed the patch to branch-1 as well. Thank you Xiaobo!

          Show
          Suresh Srinivas added a comment - I committed the patch to branch-1 as well. Thank you Xiaobo!
          Hide
          Suresh Srinivas added a comment -

          Xiaobo Peng, thanks for working branch-1 patch. +1 for the patch. I will commit it soon.

          Show
          Suresh Srinivas added a comment - Xiaobo Peng , thanks for working branch-1 patch. +1 for the patch. I will commit it soon.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #534 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/534/)
          HDFS-4222. Merge r1448801 from trunk (Revision 1449083)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1449083
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Build #534 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/534/ ) HDFS-4222 . Merge r1448801 from trunk (Revision 1449083) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1449083 Files : /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Xiaobo Peng made changes -
          Attachment HDFS-4222-branch-1.patch [ 12570612 ]
          Hide
          Xiaobo Peng added a comment -

          Many thanks to Suresh, Kihwal, and Daryn for the help. I am uploading the patch for branch-1. Please review.

          I have to move the synchronized keyword from the method level to the block level. So there are many lines got reformatted. I reviewed my patch several times. It was time consuming. Thanks a lot.

          Show
          Xiaobo Peng added a comment - Many thanks to Suresh, Kihwal, and Daryn for the help. I am uploading the patch for branch-1. Please review. I have to move the synchronized keyword from the method level to the block level. So there are many lines got reformatted. I reviewed my patch several times. It was time consuming. Thanks a lot.
          Hide
          Suresh Srinivas added a comment -

          Thanks Kihwal. I committed the patch to 0.23 branch.

          Show
          Suresh Srinivas added a comment - Thanks Kihwal. I committed the patch to 0.23 branch.
          Suresh Srinivas made changes -
          Fix Version/s 0.23.7 [ 12323955 ]
          Hide
          Kihwal Lee added a comment -

          The 0.23 patch looks fine. I also tried to resolve merge conflicts and ended up with pretty much the same patch as yours. The only significant code difference seems to be in getBlockLocations().

          Show
          Kihwal Lee added a comment - The 0.23 patch looks fine. I also tried to resolve merge conflicts and ended up with pretty much the same patch as yours. The only significant code difference seems to be in getBlockLocations().
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1352 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1352/)
          HDFS-4222. NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1352 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1352/ ) HDFS-4222 . NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1324 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1324/)
          HDFS-4222. NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1324 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1324/ ) HDFS-4222 . NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801) Result = FAILURE suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #135 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/135/)
          HDFS-4222. NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Show
          Hudson added a comment - Integrated in Hadoop-Yarn-trunk #135 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/135/ ) HDFS-4222 . NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Suresh Srinivas made changes -
          Attachment HDFS-4222.23.patch [ 12570416 ]
          Hide
          Suresh Srinivas added a comment -

          The merge was not straightforward. Kihwal, can you also do a quick review of 0.23 version of the patch.

          Show
          Suresh Srinivas added a comment - The merge was not straightforward. Kihwal, can you also do a quick review of 0.23 version of the patch.
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Suresh Srinivas added a comment -

          Canceling the patch to attach 0.23 patch.

          Show
          Suresh Srinivas added a comment - Canceling the patch to attach 0.23 patch.
          Hide
          Kihwal Lee added a comment -

          Kihwal, do you want me to commit this to branch 0.23?

          Yes. Thanks Suresh.

          Show
          Kihwal Lee added a comment - Kihwal, do you want me to commit this to branch 0.23? Yes. Thanks Suresh.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3375 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3375/)
          HDFS-4222. NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Show
          Hudson added a comment - Integrated in Hadoop-trunk-Commit #3375 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3375/ ) HDFS-4222 . NN is unresponsive and loses heartbeats from DNs when configured to use LDAP and LDAP has issues. Contributed by Xiaobo Peng and Suresh Srinivas. (Revision 1448801) Result = SUCCESS suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448801 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
          Suresh Srinivas made changes -
          Fix Version/s 2.0.4-beta [ 12324031 ]
          Hide
          Suresh Srinivas added a comment -

          I committed the patch to trunk and branch-2. Keeping this jira open to make this change available for branch-1.

          Thank you Xiaobo Peng for the original patch. Kihwal and Daryn, thanks for the review.

          Kihwal, do you want me to commit this to branch 0.23?

          Show
          Suresh Srinivas added a comment - I committed the patch to trunk and branch-2. Keeping this jira open to make this change available for branch-1. Thank you Xiaobo Peng for the original patch. Kihwal and Daryn, thanks for the review. Kihwal, do you want me to commit this to branch 0.23?
          Hide
          Kihwal Lee added a comment -

          +1 The latest patch looks good to me. Future-proofing can be done separately if needed.

          Show
          Kihwal Lee added a comment - +1 The latest patch looks good to me. Future-proofing can be done separately if needed.
          Hide
          Suresh Srinivas added a comment -

          If unconditionally forced early lookup is not acceptable...

          Current patch could have been much simpler, if FSPermissionChecker was always created. Lets do further optimizations in another patch.

          If someone reviews this and +1s it, I will commit it. I plan on getting this into branch-1 as well.

          Show
          Suresh Srinivas added a comment - If unconditionally forced early lookup is not acceptable... Current patch could have been much simpler, if FSPermissionChecker was always created. Lets do further optimizations in another patch. If someone reviews this and +1s it, I will commit it. I plan on getting this into branch-1 as well.
          Hide
          Kihwal Lee added a comment -

          "lockGroups" would internally fetch the groups and then make them immutable in the UGI.

          This will ensure group lookups happen before entering a critical section. But since mappings are cached in Groups and the UGI is short-lived, simply calling getGroups() early will be enough. One concern in early lookup is whether it will generate excessive extra lookups. Some methods in client protocol don't need permission checks as they are lease-based. For example, addBlock(), complete() and renewLease() are lease-based and their request rate is quite high. In a busy grid, they add up to over 450 requests/sec. But again, caching may reduce actual lookups, so it may not be a big deal.

          If unconditionally forced early lookup is not acceptable, we could selectively do early lookup/cache-filling in NameNodeRpcServer for those calls that actually need group lookups and leave FSNamesystem mostly unchanged.

          Show
          Kihwal Lee added a comment - "lockGroups" would internally fetch the groups and then make them immutable in the UGI. This will ensure group lookups happen before entering a critical section. But since mappings are cached in Groups and the UGI is short-lived, simply calling getGroups() early will be enough. One concern in early lookup is whether it will generate excessive extra lookups. Some methods in client protocol don't need permission checks as they are lease-based. For example, addBlock(), complete() and renewLease() are lease-based and their request rate is quite high. In a busy grid, they add up to over 450 requests/sec. But again, caching may reduce actual lookups, so it may not be a big deal. If unconditionally forced early lookup is not acceptable, we could selectively do early lookup/cache-filling in NameNodeRpcServer for those calls that actually need group lookups and leave FSNamesystem mostly unchanged.
          Hide
          Suresh Srinivas added a comment -

          Perhaps init-ed in the same places where getPermissionChecker is being invoked, or ideally at a higher level to avoid all command methods from having "to do the right".

          Only problem is, if subsequent methods are not passed FSPermissionChecker, they might end up calling getPermissionChecker (due to a bug) as well, that too inside the lock. The likelihood of that with parameter passing is low.

          "lockGroups" would internally fetch the groups and then make them immutable in the UGI....

          We should certainly explore this in a subsequent jira.

          Show
          Suresh Srinivas added a comment - Perhaps init-ed in the same places where getPermissionChecker is being invoked, or ideally at a higher level to avoid all command methods from having "to do the right". Only problem is, if subsequent methods are not passed FSPermissionChecker, they might end up calling getPermissionChecker (due to a bug) as well, that too inside the lock. The likelihood of that with parameter passing is low. "lockGroups" would internally fetch the groups and then make them immutable in the UGI.... We should certainly explore this in a subsequent jira.
          Hide
          Daryn Sharp added a comment -

          Not sure how to make this work. When does thread local variable get initialized and when is it cleared, given a thread gets used for different current users?

          Perhaps init-ed in the same places where getPermissionChecker is being invoked, or ideally at a higher level to avoid all command methods from having "to do the right".

          bq. Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups().

          Not sure I understood this.

          "lockGroups" would internally fetch the groups and then make them immutable in the UGI. It could be invoked where getPermissionChecker is being invoked, or ideally at a higher level chokepoint for calls so it's a one-line change. Maybe in the rpc call's doAs since a call shouldn't be running long enough that the groups will change. This would inoculate future methods or overlooked methods from taking the lookup penalty within a lock.

          In either case, I'm just trying to think of how to simplify the change and future-proof against similar issues. Again though, I really like this change.

          Show
          Daryn Sharp added a comment - Not sure how to make this work. When does thread local variable get initialized and when is it cleared, given a thread gets used for different current users? Perhaps init-ed in the same places where getPermissionChecker is being invoked, or ideally at a higher level to avoid all command methods from having "to do the right". bq. Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups(). Not sure I understood this. "lockGroups" would internally fetch the groups and then make them immutable in the UGI. It could be invoked where getPermissionChecker is being invoked, or ideally at a higher level chokepoint for calls so it's a one-line change. Maybe in the rpc call's doAs since a call shouldn't be running long enough that the groups will change. This would inoculate future methods or overlooked methods from taking the lookup penalty within a lock. In either case, I'm just trying to think of how to simplify the change and future-proof against similar issues. Again though, I really like this change.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570230/HDFS-4222.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3990//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3990//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570230/HDFS-4222.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3990//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3990//console This message is automatically generated.
          Suresh Srinivas made changes -
          Attachment HDFS-4222.patch [ 12570230 ]
          Hide
          Suresh Srinivas added a comment -

          Updated patch to address Daryn's comments.

          Show
          Suresh Srinivas added a comment - Updated patch to address Daryn's comments.
          Hide
          Suresh Srinivas added a comment -

          Nice change. I've quickly scanned the patch and checkSuperuserPrivilege internally instantiates a permission checker, which appears to be getting called within the namespace lock in a number of places too.

          Good catch.

          To reduce the size of the patch, would it maybe make sense to use a thread-local permission checker singleton? Ie. FsPermissionChecker.init() sets the thread-local for the current user, and then FsPermissionChecker.getInstance() instead of new FsPermissionChecker returns the singleton? Just a thought.

          Not sure how to make this work. When does thread local variable get initialized and when is it cleared, given a thread gets used for different current users?

          Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups().

          Not sure I understood this.

          Show
          Suresh Srinivas added a comment - Nice change. I've quickly scanned the patch and checkSuperuserPrivilege internally instantiates a permission checker, which appears to be getting called within the namespace lock in a number of places too. Good catch. To reduce the size of the patch, would it maybe make sense to use a thread-local permission checker singleton? Ie. FsPermissionChecker.init() sets the thread-local for the current user, and then FsPermissionChecker.getInstance() instead of new FsPermissionChecker returns the singleton? Just a thought. Not sure how to make this work. When does thread local variable get initialized and when is it cleared, given a thread gets used for different current users? Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups(). Not sure I understood this.
          Hide
          Daryn Sharp added a comment -

          Nice change. I've quickly scanned the patch and checkSuperuserPrivilege internally instantiates a permission checker, which appears to be getting called within the namespace lock in a number of places too.

          To reduce the size of the patch, would it maybe make sense to use a thread-local permission checker singleton? Ie. FsPermissionChecker.init() sets the thread-local for the current user, and then FsPermissionChecker.getInstance() instead of new FsPermissionChecker returns the singleton? Just a thought.

          Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups().

          Show
          Daryn Sharp added a comment - Nice change. I've quickly scanned the patch and checkSuperuserPrivilege internally instantiates a permission checker, which appears to be getting called within the namespace lock in a number of places too. To reduce the size of the patch, would it maybe make sense to use a thread-local permission checker singleton? Ie. FsPermissionChecker.init() sets the thread-local for the current user, and then FsPermissionChecker.getInstance() instead of new FsPermissionChecker returns the singleton? Just a thought. Another thought might be an option to tell a UGI to "lock-in" it's group list. Something earlier on at a high level, maybe the NN's RPC server, could call UserGroupInformation.getCurrentUser().lockGroups() .
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570045/HDFS-4222.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3986//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3986//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570045/HDFS-4222.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3986//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3986//console This message is automatically generated.
          Suresh Srinivas made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 2.0.0-alpha [ 12320353 ]
          Affects Version/s 1.0.0 [ 12318243 ]
          Suresh Srinivas made changes -
          Attachment HDFS-4222.patch [ 12570045 ]
          Hide
          Suresh Srinivas added a comment -

          Changes in this patch:

          1. Introduced FSNamesystem#fsOwnerShortUserName to avoid getting short username from a UGI FSNamesystem#fsOwner every time FSPermissionChecker is used.
          2. Made changes to create FSPermissionChecker outside the FSNamesystem read and write lock. This is along the lines of the work Xiaobo has done.
          3. Documented the synchronization semantics in FSPermissionChecker and made some members final.
          Show
          Suresh Srinivas added a comment - Changes in this patch: Introduced FSNamesystem#fsOwnerShortUserName to avoid getting short username from a UGI FSNamesystem#fsOwner every time FSPermissionChecker is used. Made changes to create FSPermissionChecker outside the FSNamesystem read and write lock. This is along the lines of the work Xiaobo has done. Documented the synchronization semantics in FSPermissionChecker and made some members final.
          Hide
          Suresh Srinivas added a comment -

          Xiaobo, as I was reviewing the code, I started making some changes. Let me post my patch as well.

          Show
          Suresh Srinivas added a comment - Xiaobo, as I was reviewing the code, I started making some changes. Let me post my patch as well.
          Hide
          Xiaobo Peng added a comment -

          Suresh, Thanks a lot. I should finish what you suggested within 2 days.

          Show
          Xiaobo Peng added a comment - Suresh, Thanks a lot. I should finish what you suggested within 2 days.
          Hide
          Suresh Srinivas added a comment -

          Xiaobo PengThanks for working on this. I will review this shortly. Please make the patch available for trunk first. Along with it, you can also make patch available for branch-1. For branch-2 and branch-0.23.6 the patch can be merged usually. If it cannot be, then I will let you know.

          Show
          Suresh Srinivas added a comment - Xiaobo Peng Thanks for working on this. I will review this shortly. Please make the patch available for trunk first. Along with it, you can also make patch available for branch-1. For branch-2 and branch-0.23.6 the patch can be merged usually. If it cannot be, then I will let you know.
          Hide
          Xiaobo Peng added a comment -

          What branches should my patches be prepared for? trunk or branch-1 or branch-2 or branch-0.23.6? Thanks

          Show
          Xiaobo Peng added a comment - What branches should my patches be prepared for? trunk or branch-1 or branch-2 or branch-0.23.6? Thanks
          Hide
          Xiaobo Peng added a comment -

          Please review my patches. hdfs-4222-branch-0.23.3.patch should be easier to review. hdfs-4222-release-1.0.3.patch is messy since moving synchronized scopes result in reformatting.

          I think we should enclose all FSPermissionChecker instantiations in "if (isPermissionEnabled)" blocks as follows,

          FSPermissionChecker pc = null;
          if (isPermissionEnabled)

          { pc = new FSPermissionChecker(fsOwner.getShortUserName(), supergroup); }

          But I have not done so because I want to limit the impact of this changes. If you know it is totally safe to do such changes, please let me know. Thanks a lot.

          Show
          Xiaobo Peng added a comment - Please review my patches. hdfs-4222-branch-0.23.3.patch should be easier to review. hdfs-4222-release-1.0.3.patch is messy since moving synchronized scopes result in reformatting. I think we should enclose all FSPermissionChecker instantiations in "if (isPermissionEnabled)" blocks as follows, FSPermissionChecker pc = null; if (isPermissionEnabled) { pc = new FSPermissionChecker(fsOwner.getShortUserName(), supergroup); } But I have not done so because I want to limit the impact of this changes. If you know it is totally safe to do such changes, please let me know. Thanks a lot.
          Xiaobo Peng made changes -
          Attachment hdfs-4222-release-1.0.3.patch [ 12568929 ]
          Attachment hdfs-4222-branch-0.23.3.patch [ 12568928 ]
          Hide
          Xiaobo Peng added a comment -

          I manually tested the patch on release-1.0.3. Looks like we have some perf gain also.

          I will test hdfs-4222-branch-0.23.3.patch soon.

          Show
          Xiaobo Peng added a comment - I manually tested the patch on release-1.0.3. Looks like we have some perf gain also. I will test hdfs-4222-branch-0.23.3.patch soon.
          Hide
          Suresh Srinivas added a comment -

          Xiaobo, I think given that you described the change briefly earlier, the best way to cover the complete change is to by posting a patch.

          Show
          Suresh Srinivas added a comment - Xiaobo, I think given that you described the change briefly earlier, the best way to cover the complete change is to by posting a patch.
          Ted Yu made changes -
          Summary NN is unresponsive and lose hearbeats of DNs when Hadoop is configured to use LADP and LDAP has issues NN is unresponsive and lose heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues
          Hide
          Xiaobo Peng added a comment -

          Sorry the former comment did not format well. I'm trying to format it now.

          The following code snippets show a simple way to change FSNamesystem::renameTo in branch-0.23.4. Changes to other methods are similar.

          //////////// existent code

            /** Rename src to dst */
            void renameTo(String src, String dst, Options.Rename... options)
                throws IOException, UnresolvedLinkException {
          ...
              writeLock();
              try {
                renameToInternal(src, dst, options);
                if (auditLog.isInfoEnabled() && isExternalInvocation()) {
                  resultingStat = dir.getFileInfo(dst, false); 
                }
              } finally {
                writeUnlock();
              }
          ...
            }
          
          
            private void renameToInternal(String src, String dst,
                Options.Rename... options) throws IOException {
          ...
              if (isPermissionEnabled) {
                checkParentAccess(src, FsAction.WRITE);
                checkAncestorAccess(dst, FsAction.WRITE);
              }
          ...
            }
          
          
            private FSPermissionChecker checkParentAccess(String path, FsAction access
                ) throws AccessControlException, UnresolvedLinkException {
              return checkPermission(path, false, null, access, null, null);
            }
          
          
            private FSPermissionChecker checkPermission(String path, boolean doCheckOwner,
                FsAction ancestorAccess, FsAction parentAccess, FsAction access,
                FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
              FSPermissionChecker pc = new FSPermissionChecker(
                  fsOwner.getShortUserName(), supergroup);
              if (!pc.isSuper) {
                dir.waitForReady();
                readLock();
                try {
                  pc.checkPermission(path, dir.rootDir, doCheckOwner,
                      ancestorAccess, parentAccess, access, subAccess);
                } finally {
                  readUnlock();
                } 
              }
              return pc;
            }
          

          //////////// proposed changes

            /** Rename src to dst */
            void renameTo(String src, String dst, Options.Rename... options)
                throws IOException, UnresolvedLinkException {
          ...
              FSPermissionChecker pc = new FSPermissionChecker(
                  fsOwner.getShortUserName(), supergroup);
          
              writeLock();
              try {
                renameToInternal(pc, src, dst, options);
                if (auditLog.isInfoEnabled() && isExternalInvocation()) {
                  resultingStat = dir.getFileInfo(dst, false); 
                }
              } finally {
                writeUnlock();
              }
          ...
            }
          
          
            private void renameToInternal(FSPermissionChecker pc, String src, String dst,
                Options.Rename... options) throws IOException {
          ...
              if (isPermissionEnabled) {
                checkParentAccess(pc, src, FsAction.WRITE);
                checkAncestorAccess(pc, dst, FsAction.WRITE);
              }
          ...
            }
          
          
            private FSPermissionChecker checkParentAccess(FSPermissionChecker pc, String path, FsAction access
                ) throws AccessControlException, UnresolvedLinkException {
              return checkPermission(pc, path, false, null, access, null, null);
            }
          
          
            private FSPermissionChecker checkPermission(FSPermissionChecker pc, String path, boolean doCheckOwner,
                FsAction ancestorAccess, FsAction parentAccess, FsAction access,
                FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
              if (!pc.isSuper) {
                dir.waitForReady();
                readLock();
                try {
                  pc.checkPermission(path, dir.rootDir, doCheckOwner,
                      ancestorAccess, parentAccess, access, subAccess);
                } finally {
                  readUnlock();
                } 
              }
              return pc;
            }
          
          Show
          Xiaobo Peng added a comment - Sorry the former comment did not format well. I'm trying to format it now. The following code snippets show a simple way to change FSNamesystem::renameTo in branch-0.23.4. Changes to other methods are similar. //////////// existent code /** Rename src to dst */ void renameTo( String src, String dst, Options.Rename... options) throws IOException, UnresolvedLinkException { ... writeLock(); try { renameToInternal(src, dst, options); if (auditLog.isInfoEnabled() && isExternalInvocation()) { resultingStat = dir.getFileInfo(dst, false ); } } finally { writeUnlock(); } ... } private void renameToInternal( String src, String dst, Options.Rename... options) throws IOException { ... if (isPermissionEnabled) { checkParentAccess(src, FsAction.WRITE); checkAncestorAccess(dst, FsAction.WRITE); } ... } private FSPermissionChecker checkParentAccess( String path, FsAction access ) throws AccessControlException, UnresolvedLinkException { return checkPermission(path, false , null , access, null , null ); } private FSPermissionChecker checkPermission( String path, boolean doCheckOwner, FsAction ancestorAccess, FsAction parentAccess, FsAction access, FsAction subAccess) throws AccessControlException, UnresolvedLinkException { FSPermissionChecker pc = new FSPermissionChecker( fsOwner.getShortUserName(), supergroup); if (!pc.isSuper) { dir.waitForReady(); readLock(); try { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); } finally { readUnlock(); } } return pc; } //////////// proposed changes /** Rename src to dst */ void renameTo( String src, String dst, Options.Rename... options) throws IOException, UnresolvedLinkException { ... FSPermissionChecker pc = new FSPermissionChecker( fsOwner.getShortUserName(), supergroup); writeLock(); try { renameToInternal(pc, src, dst, options); if (auditLog.isInfoEnabled() && isExternalInvocation()) { resultingStat = dir.getFileInfo(dst, false ); } } finally { writeUnlock(); } ... } private void renameToInternal(FSPermissionChecker pc, String src, String dst, Options.Rename... options) throws IOException { ... if (isPermissionEnabled) { checkParentAccess(pc, src, FsAction.WRITE); checkAncestorAccess(pc, dst, FsAction.WRITE); } ... } private FSPermissionChecker checkParentAccess(FSPermissionChecker pc, String path, FsAction access ) throws AccessControlException, UnresolvedLinkException { return checkPermission(pc, path, false , null , access, null , null ); } private FSPermissionChecker checkPermission(FSPermissionChecker pc, String path, boolean doCheckOwner, FsAction ancestorAccess, FsAction parentAccess, FsAction access, FsAction subAccess) throws AccessControlException, UnresolvedLinkException { if (!pc.isSuper) { dir.waitForReady(); readLock(); try { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); } finally { readUnlock(); } } return pc; }
          Xiaobo Peng made changes -
          Field Original Value New Value
          Assignee Xiaobo Peng [ teledriver ]
          Hide
          Xiaobo Peng added a comment -

          The following code snippets show a simple way to change FSNamesystem::renameTo in branch-0.23.4. Changes to other methods are similar.

          //////////// existent code
          /** Rename src to dst */
          void renameTo(String src, String dst, Options.Rename... options)
          throws IOException, UnresolvedLinkException {
          ...
          writeLock();
          try {
          renameToInternal(src, dst, options);
          if (auditLog.isInfoEnabled() && isExternalInvocation())

          { resultingStat = dir.getFileInfo(dst, false); }
          } finally { writeUnlock(); }
          ...
          }


          private void renameToInternal(String src, String dst,
          Options.Rename... options) throws IOException {
          ...
          if (isPermissionEnabled) { checkParentAccess(src, FsAction.WRITE); checkAncestorAccess(dst, FsAction.WRITE); }
          ...
          }


          private FSPermissionChecker checkParentAccess(String path, FsAction access
          ) throws AccessControlException, UnresolvedLinkException { return checkPermission(path, false, null, access, null, null); }


          private FSPermissionChecker checkPermission(String path, boolean doCheckOwner,
          FsAction ancestorAccess, FsAction parentAccess, FsAction access,
          FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
          FSPermissionChecker pc = new FSPermissionChecker(
          fsOwner.getShortUserName(), supergroup);
          if (!pc.isSuper) {
          dir.waitForReady();
          readLock();
          try { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); } finally { readUnlock(); }
          }
          return pc;
          }

          //////////// proposed changes
          /** Rename src to dst */
          void renameTo(String src, String dst, Options.Rename... options)
          throws IOException, UnresolvedLinkException {
          ...
          FSPermissionChecker pc = new FSPermissionChecker(
          fsOwner.getShortUserName(), supergroup);

          writeLock();
          try {
          renameToInternal(pc, src, dst, options);
          if (auditLog.isInfoEnabled() && isExternalInvocation()) { resultingStat = dir.getFileInfo(dst, false); }

          } finally

          { writeUnlock(); }

          ...
          }

          private void renameToInternal(FSPermissionChecker pc, String src, String dst,
          Options.Rename... options) throws IOException {
          ...
          if (isPermissionEnabled)

          { checkParentAccess(pc, src, FsAction.WRITE); checkAncestorAccess(pc, dst, FsAction.WRITE); }

          ...
          }

          private FSPermissionChecker checkParentAccess(FSPermissionChecker pc, String path, FsAction access
          ) throws AccessControlException, UnresolvedLinkException

          { return checkPermission(pc, path, false, null, access, null, null); }

          private FSPermissionChecker checkPermission(FSPermissionChecker pc, String path, boolean doCheckOwner,
          FsAction ancestorAccess, FsAction parentAccess, FsAction access,
          FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
          if (!pc.isSuper) {
          dir.waitForReady();
          readLock();
          try

          { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); }

          finally

          { readUnlock(); }


          }
          return pc;
          }

          Show
          Xiaobo Peng added a comment - The following code snippets show a simple way to change FSNamesystem::renameTo in branch-0.23.4. Changes to other methods are similar. //////////// existent code /** Rename src to dst */ void renameTo(String src, String dst, Options.Rename... options) throws IOException, UnresolvedLinkException { ... writeLock(); try { renameToInternal(src, dst, options); if (auditLog.isInfoEnabled() && isExternalInvocation()) { resultingStat = dir.getFileInfo(dst, false); } } finally { writeUnlock(); } ... } private void renameToInternal(String src, String dst, Options.Rename... options) throws IOException { ... if (isPermissionEnabled) { checkParentAccess(src, FsAction.WRITE); checkAncestorAccess(dst, FsAction.WRITE); } ... } private FSPermissionChecker checkParentAccess(String path, FsAction access ) throws AccessControlException, UnresolvedLinkException { return checkPermission(path, false, null, access, null, null); } private FSPermissionChecker checkPermission(String path, boolean doCheckOwner, FsAction ancestorAccess, FsAction parentAccess, FsAction access, FsAction subAccess) throws AccessControlException, UnresolvedLinkException { FSPermissionChecker pc = new FSPermissionChecker( fsOwner.getShortUserName(), supergroup); if (!pc.isSuper) { dir.waitForReady(); readLock(); try { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); } finally { readUnlock(); } } return pc; } //////////// proposed changes /** Rename src to dst */ void renameTo(String src, String dst, Options.Rename... options) throws IOException, UnresolvedLinkException { ... FSPermissionChecker pc = new FSPermissionChecker( fsOwner.getShortUserName(), supergroup); writeLock(); try { renameToInternal(pc, src, dst, options); if (auditLog.isInfoEnabled() && isExternalInvocation()) { resultingStat = dir.getFileInfo(dst, false); } } finally { writeUnlock(); } ... } private void renameToInternal(FSPermissionChecker pc, String src, String dst, Options.Rename... options) throws IOException { ... if (isPermissionEnabled) { checkParentAccess(pc, src, FsAction.WRITE); checkAncestorAccess(pc, dst, FsAction.WRITE); } ... } private FSPermissionChecker checkParentAccess(FSPermissionChecker pc, String path, FsAction access ) throws AccessControlException, UnresolvedLinkException { return checkPermission(pc, path, false, null, access, null, null); } private FSPermissionChecker checkPermission(FSPermissionChecker pc, String path, boolean doCheckOwner, FsAction ancestorAccess, FsAction parentAccess, FsAction access, FsAction subAccess) throws AccessControlException, UnresolvedLinkException { if (!pc.isSuper) { dir.waitForReady(); readLock(); try { pc.checkPermission(path, dir.rootDir, doCheckOwner, ancestorAccess, parentAccess, access, subAccess); } finally { readUnlock(); } } return pc; }
          Xiaobo Peng created issue -

            People

            • Assignee:
              Xiaobo Peng
              Reporter:
              Xiaobo Peng
            • Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development