  1. Hadoop HDFS
  2. HDFS-8995

Flaw in registration bookkeeping can make DN die on reconnect

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Normally, datanodes re-register with the namenode when it has been unreachable for longer than the heartbeat expiration interval and then becomes reachable again. Each datanode keeps retrying its last RPC call, such as an incremental block report or a heartbeat, and when the call finally gets through, the namenode tells the datanode to re-register.
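
      The normal reconnect path described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not HDFS code: `ToyNameNode`, `ToyDataNode`, `simulate`, the `HEARTBEAT_EXPIRY` value, and the `"REGISTER"` reply string are all hypothetical names chosen for illustration. It models only the expected behavior (retry until the call gets through, then re-register when told to), not the bookkeeping flaw this issue reports.

```python
from dataclasses import dataclass, field

# Hypothetical value: ticks of silence after which the toy namenode
# considers a datanode dead. Not a real HDFS setting.
HEARTBEAT_EXPIRY = 3


@dataclass
class ToyNameNode:
    """Toy stand-in for a namenode (assumption, not the HDFS class)."""
    now: int = 0
    last_seen: dict = field(default_factory=dict)  # node id -> tick of last contact
    registered: set = field(default_factory=set)

    def register(self, node_id):
        self.registered.add(node_id)
        self.last_seen[node_id] = self.now

    def expire_dead_nodes(self):
        # Drop every node that has been silent for longer than the expiry.
        for node_id in list(self.registered):
            if self.now - self.last_seen[node_id] > HEARTBEAT_EXPIRY:
                self.registered.discard(node_id)

    def heartbeat(self, node_id):
        self.expire_dead_nodes()
        if node_id not in self.registered:
            return "REGISTER"  # tell the datanode to re-register
        self.last_seen[node_id] = self.now
        return "OK"


class ToyDataNode:
    """Toy datanode that retries its last call until it gets through."""

    def __init__(self, node_id, namenode):
        self.node_id = node_id
        self.namenode = namenode
        namenode.register(node_id)

    def tick(self, reachable):
        if not reachable:
            return "RETRY"  # call failed; the same call is retried next tick
        reply = self.namenode.heartbeat(self.node_id)
        if reply == "REGISTER":
            self.namenode.register(self.node_id)
            return "REREGISTERED"
        return "OK"


def simulate(outage_ticks):
    """Run an outage of the given length, then two reachable ticks."""
    nn = ToyNameNode()
    dn = ToyDataNode("dn1", nn)
    events = []
    for t in range(1, outage_ticks + 3):
        nn.now = t
        events.append(dn.tick(reachable=t > outage_ticks))
    return events
```

      In this model, an outage longer than `HEARTBEAT_EXPIRY` ends with the datanode being told to re-register on its first successful call, while a shorter outage ends with an ordinary heartbeat reply.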

      We have observed that some datanodes stay dead in such scenarios. Further investigation revealed that those datanodes had been told to shut down by the namenode.

            People

            • Assignee:
              kihwal Kihwal Lee
              Reporter:
              kihwal Kihwal Lee
