Hadoop HDFS / HDFS-4222

NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use LDAP and LDAP has issues


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 0.23.3, 2.0.0-alpha
    • Fix Version/s: 1.2.0, 0.23.7, 2.1.0-beta
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      For Hadoop clusters configured to look up directory information (such as user groups) through LDAP, FSNamesystem calls made on behalf of DFS clients can hang because of LDAP problems (including LDAP access failures caused by network issues) while holding the single FSNamesystem lock. This leaves the NN unresponsive, and heartbeats from DNs are lost.

      FSNamesystem calls access LDAP when they instantiate FSPermissionChecker, whose constructor resolves the caller's groups. That instantiation does not need the FSNamesystem lock, so it can be moved out of the lock scope. After the move, a DFS client call that hangs on LDAP no longer holds the single lock and therefore no longer blocks other threads. Hung client calls can still tie up the handler threads of the RPC server that serves them, so the change is especially helpful when separate RPC servers are used for ClientProtocol and DatanodeProtocol: DatanodeProtocol calls do not access LDAP. Even if DFS client calls hang on LDAP, the NN can still process requests from DNs, including heartbeats.
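      The lock-scope change described above can be illustrated with a small, self-contained Java model. This is a hypothetical sketch, not the actual patch: `Namesystem`, `PermissionChecker`, `GroupResolver` and all method names are invented for illustration, a single `ReentrantLock` stands in for the FSNamesystem lock, and a latch simulates an LDAP server that stops answering. The first variant creates the permission checker (and so performs the group lookup) while holding the lock, as before the fix. The second creates it before taking the lock.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified model of the HDFS-4222 idea; not the real FSNamesystem code.
public class LockScopeDemo {
    // Hypothetical stand-in for an LDAP-backed group mapping; a lookup may block indefinitely.
    interface GroupResolver {
        List<String> groupsFor(String user) throws InterruptedException;
    }

    // Hypothetical stand-in for FSPermissionChecker: the constructor performs the group lookup.
    static final class PermissionChecker {
        final String user;
        final List<String> groups;
        PermissionChecker(String user, GroupResolver resolver) throws InterruptedException {
            this.user = user;
            this.groups = resolver.groupsFor(user); // may hang if LDAP is unhealthy
        }
    }

    // Hypothetical namesystem guarded by one exclusive lock.
    static final class Namesystem {
        private final ReentrantLock lock = new ReentrantLock();
        private final GroupResolver resolver;
        private long heartbeats = 0;
        Namesystem(GroupResolver resolver) { this.resolver = resolver; }

        // Before the fix: the checker (and the LDAP lookup) is created while holding the lock.
        String getFileInfoLookupInsideLock(String user, String path) throws InterruptedException {
            lock.lock();
            try {
                PermissionChecker pc = new PermissionChecker(user, resolver);
                return path + " checked for " + pc.user + " " + pc.groups;
            } finally {
                lock.unlock();
            }
        }

        // After the fix: create the checker first, then hold the lock only for namespace work.
        String getFileInfo(String user, String path) throws InterruptedException {
            PermissionChecker pc = new PermissionChecker(user, resolver); // outside the lock
            lock.lock();
            try {
                return path + " checked for " + pc.user + " " + pc.groups;
            } finally {
                lock.unlock();
            }
        }

        // A DataNode heartbeat needs the lock but never touches LDAP.
        void handleHeartbeat() {
            lock.lock();
            try { heartbeats++; } finally { lock.unlock(); }
        }
    }

    // Returns true if a heartbeat finishes within 1 s while a client call is stuck on LDAP.
    public static boolean heartbeatCompletesDuringHungLookup(boolean lookupInsideLock) {
        CountDownLatch lookupStarted = new CountDownLatch(1);
        CountDownLatch ldapRecovers = new CountDownLatch(1);
        GroupResolver hungLdap = user -> {
            lookupStarted.countDown();
            ldapRecovers.await(); // simulate an LDAP server that stops answering
            return Collections.singletonList("users");
        };
        Namesystem ns = new Namesystem(hungLdap);
        Thread client = new Thread(() -> {
            try {
                if (lookupInsideLock) ns.getFileInfoLookupInsideLock("alice", "/data");
                else ns.getFileInfo("alice", "/data");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            client.start();
            lookupStarted.await(); // the client is now blocked inside the group lookup
            Thread datanode = new Thread(ns::handleHeartbeat);
            datanode.start();
            datanode.join(1000); // give the heartbeat up to one second
            boolean completed = !datanode.isAlive();
            ldapRecovers.countDown(); // let LDAP "recover" so both threads can finish
            client.join();
            datanode.join();
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lookup inside lock:  heartbeat completed = " + heartbeatCompletesDuringHungLookup(true));
        System.out.println("lookup outside lock: heartbeat completed = " + heartbeatCompletesDuringHungLookup(false));
    }
}
```

      On a normally loaded machine, `main` should show the heartbeat stuck behind the hung lookup in the first case and completing in the second. In this model, only the checker construction moves out of the lock; it reads no namespace state, and its result is used only after the lock is taken.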

      Attachments

        1. HDFS-4222-branch-1.patch (49 kB, Xiaobo Peng)
        2. HDFS-4222.23.patch (28 kB, Suresh Srinivas)
        3. HDFS-4222.patch (30 kB, Suresh Srinivas)
        4. HDFS-4222.patch (27 kB, Suresh Srinivas)
        5. hdfs-4222-release-1.0.3.patch (45 kB, Xiaobo Peng)
        6. hdfs-4222-branch-0.23.3.patch (29 kB, Xiaobo Peng)


          People

            Assignee: Xiaobo Peng (teledriver)
            Reporter: Xiaobo Peng (teledriver)
            Votes: 0
            Watchers: 16
