HDFS-9239 (Hadoop HDFS)

DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: datanode, namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This release adds a new feature called the DataNode Lifeline Protocol. If configured, then DataNodes can report that they are still alive to the NameNode via a fallback protocol, separate from the existing heartbeat messages. This can prevent the NameNode from incorrectly marking DataNodes as stale or dead in highly overloaded clusters where heartbeat processing is suffering delays. For more information, please refer to the hdfs-default.xml documentation for several new configuration properties: dfs.namenode.lifeline.rpc-address, dfs.namenode.lifeline.rpc-bind-host, dfs.datanode.lifeline.interval.seconds, dfs.namenode.lifeline.handler.ratio and dfs.namenode.lifeline.handler.count.
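
      The properties named in the release note could be set in hdfs-site.xml roughly as follows. This is a minimal sketch, not a recommended configuration: the host name, port, bind address, and numeric values are illustrative assumptions, and dfs.namenode.lifeline.handler.count is left unset here. The authoritative descriptions and defaults are in hdfs-default.xml.

```xml
<!-- hdfs-site.xml fragment. Host, port, and all values are hypothetical
     placeholders chosen for illustration; check hdfs-default.xml before use. -->
<configuration>
  <!-- Address of the NameNode's separate lifeline RPC server (assumed host and port). -->
  <property>
    <name>dfs.namenode.lifeline.rpc-address</name>
    <value>nn1.example.com:8050</value>
  </property>
  <!-- Interface the lifeline RPC server binds to (assumed: all interfaces). -->
  <property>
    <name>dfs.namenode.lifeline.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <!-- Seconds between lifeline messages sent by each DataNode (assumed value). -->
  <property>
    <name>dfs.datanode.lifeline.interval.seconds</name>
    <value>9</value>
  </property>
  <!-- Ratio used to size the lifeline handler thread pool (assumed value). -->
  <property>
    <name>dfs.namenode.lifeline.handler.ratio</name>
    <value>0.10</value>
  </property>
</configuration>
```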

      Description

      This issue proposes a new feature: the DataNode Lifeline Protocol, an RPC protocol responsible for reporting liveness and basic health information about a DataNode to a NameNode. Compared to the existing heartbeat messages, it is lightweight and not prone to the resource contention problems that can currently prevent accurate tracking of DataNode liveness. The attached design document contains more details.

        Attachments

        1. HDFS-9239.003.patch
          77 kB
          Chris Nauroth
        2. HDFS-9239.002.patch
          77 kB
          Chris Nauroth
        3. HDFS-9239.001.patch
          75 kB
          Chris Nauroth
        4. DataNode-Lifeline-Protocol.pdf
          124 kB
          Chris Nauroth

              People

              • Assignee:
                Chris Nauroth (cnauroth)
              • Reporter:
                Chris Nauroth (cnauroth)
              • Votes:
                0
              • Watchers:
                32
