Hadoop HDFS / HDFS-9239

DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode, namenode
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      This release adds a new feature called the DataNode Lifeline Protocol. If configured, then DataNodes can report that they are still alive to the NameNode via a fallback protocol, separate from the existing heartbeat messages. This can prevent the NameNode from incorrectly marking DataNodes as stale or dead in highly overloaded clusters where heartbeat processing is suffering delays. For more information, please refer to the hdfs-default.xml documentation for several new configuration properties: dfs.namenode.lifeline.rpc-address, dfs.namenode.lifeline.rpc-bind-host, dfs.datanode.lifeline.interval.seconds, dfs.namenode.lifeline.handler.ratio and dfs.namenode.lifeline.handler.count.
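
      As a rough illustration of where the properties named in the release note would be set, the sketch below renders an hdfs-site.xml style fragment. Only the property names come from the release note. Every value (host name, port, interval, ratio, count) is a hypothetical placeholder rather than a default or a recommendation, and the helper name render_hdfs_site is made up for this example. The hdfs-default.xml documentation remains the authority on what each property means and how the properties interact.

```python
import xml.etree.ElementTree as ET

# Property names are taken from the release note. All values are
# hypothetical placeholders chosen only to make the example concrete.
LIFELINE_PROPERTIES = {
    "dfs.namenode.lifeline.rpc-address": "nn1.example.com:8050",
    "dfs.namenode.lifeline.rpc-bind-host": "0.0.0.0",
    "dfs.datanode.lifeline.interval.seconds": "9",
    "dfs.namenode.lifeline.handler.ratio": "0.10",
    "dfs.namenode.lifeline.handler.count": "5",
}


def render_hdfs_site(properties):
    """Return an hdfs-site.xml style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print when the running Python provides ET.indent (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


HDFS_SITE_XML = render_hdfs_site(LIFELINE_PROPERTIES)
print(HDFS_SITE_XML)
```

      The printed fragment is meant to be merged into an existing hdfs-site.xml by hand. Setting both the ratio and the count here is only for completeness of the listing.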

    Description

      This issue proposes a new feature: the DataNode Lifeline Protocol. This is an RPC protocol responsible for reporting liveness and basic health information about a DataNode to a NameNode. Compared to the existing heartbeat messages, it is lightweight and not prone to the resource contention problems that currently harm accurate tracking of DataNode liveness. The attached design document contains more details.
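
      The idea can be sketched with a toy model. This is not the HDFS implementation or the design from the attached document: the class LivenessTracker, its method names, the 30-second threshold and the timestamps are all assumptions made up for illustration. The sketch only shows the general principle that a separate, lightweight message can keep a node's last-contact time fresh while full heartbeats are delayed.

```python
from dataclasses import dataclass, field


@dataclass
class LivenessTracker:
    """Toy NameNode-side view: last contact time per DataNode (hypothetical)."""
    dead_after_seconds: float
    last_contact: dict = field(default_factory=dict)

    def on_heartbeat(self, datanode_id, now):
        # A real heartbeat carries much more than liveness; this toy
        # version only refreshes the timestamp.
        self.last_contact[datanode_id] = now

    def on_lifeline(self, datanode_id, now):
        # Lightweight message whose only job here is to prove liveness.
        self.last_contact[datanode_id] = now

    def is_dead(self, datanode_id, now):
        last = self.last_contact.get(datanode_id)
        return last is None or now - last > self.dead_after_seconds


def simulate(lifeline_times):
    """One heartbeat at t=0, then heartbeats stall; check liveness at t=60."""
    tracker = LivenessTracker(dead_after_seconds=30.0)
    tracker.on_heartbeat("dn-1", now=0.0)
    for t in lifeline_times:
        tracker.on_lifeline("dn-1", now=t)
    return tracker.is_dead("dn-1", now=60.0)


without_lifeline = simulate([])
with_lifeline = simulate([20.0, 40.0, 60.0])
print(without_lifeline, with_lifeline)  # prints: True False
```

      In the first run the node is declared dead because nothing arrived after t=0. In the second run the lifeline messages alone keep it alive even though no further heartbeat was processed.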

      Attachments

        1. DataNode-Lifeline-Protocol.pdf
          124 kB
          Chris Nauroth
        2. HDFS-9239.001.patch
          75 kB
          Chris Nauroth
        3. HDFS-9239.002.patch
          77 kB
          Chris Nauroth
        4. HDFS-9239.003.patch
          77 kB
          Chris Nauroth

            People

              Assignee: Chris Nauroth (cnauroth)
              Reporter: Chris Nauroth (cnauroth)
              Votes: 0
              Watchers: 36
