Hadoop HDFS / HDFS-9239

DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode, namenode
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      This release adds a new feature called the DataNode Lifeline Protocol. If configured, then DataNodes can report that they are still alive to the NameNode via a fallback protocol, separate from the existing heartbeat messages. This can prevent the NameNode from incorrectly marking DataNodes as stale or dead in highly overloaded clusters where heartbeat processing is suffering delays. For more information, please refer to the hdfs-default.xml documentation for several new configuration properties: dfs.namenode.lifeline.rpc-address, dfs.namenode.lifeline.rpc-bind-host, dfs.datanode.lifeline.interval.seconds, dfs.namenode.lifeline.handler.ratio and dfs.namenode.lifeline.handler.count.
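
As a rough illustration of the properties named in the release note, an hdfs-site.xml fragment might look like the sketch below. The host name, port, bind address, and interval are assumptions chosen for the example and do not come from this issue; the hdfs-default.xml documentation remains the authority on semantics, defaults, and any constraints between these values and the regular heartbeat settings.

```xml
<!-- hdfs-site.xml fragment: a minimal sketch; every value here is an illustrative assumption -->
<configuration>
  <!-- Address of the separate lifeline RPC endpoint (hypothetical host and port) -->
  <property>
    <name>dfs.namenode.lifeline.rpc-address</name>
    <value>nn1.example.com:8050</value>
  </property>
  <!-- Interface the lifeline RPC server binds to (assumed: all interfaces) -->
  <property>
    <name>dfs.namenode.lifeline.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <!-- Seconds between lifeline messages sent by each DataNode (assumed value) -->
  <property>
    <name>dfs.datanode.lifeline.interval.seconds</name>
    <value>9</value>
  </property>
</configuration>
```

The two handler properties, dfs.namenode.lifeline.handler.ratio and dfs.namenode.lifeline.handler.count, are omitted from this sketch; they size the NameNode's lifeline handler pool and should be set only after consulting hdfs-default.xml.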

    Description

      This issue proposes a new feature: the DataNode Lifeline Protocol. This is an RPC protocol that reports liveness and basic health information about a DataNode to a NameNode. Compared to the existing heartbeat messages, it is lightweight and not prone to the resource contention problems that can currently prevent accurate tracking of DataNode liveness. The attached design document contains more details.

      Attachments

        1. DataNode-Lifeline-Protocol.pdf
          124 kB
          Chris Nauroth
        2. HDFS-9239.001.patch
          75 kB
          Chris Nauroth
        3. HDFS-9239.002.patch
          77 kB
          Chris Nauroth
        4. HDFS-9239.003.patch
          77 kB
          Chris Nauroth

          People

            Assignee: Chris Nauroth (cnauroth)
            Reporter: Chris Nauroth (cnauroth)
            Votes: 0
            Watchers: 36
