  1. Hadoop HDFS
  2. HDFS-1203

DataNode should sleep before reentering service loop after an exception

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the DataNode (DN) gets an exception in response to a heartbeat, it logs the exception and continues the service loop without sleeping. I've occasionally seen bugs where every heartbeat produces an exception, so the DN floods the NameNode (NN) with bad heartbeats. Adding a 1-second sleep at least throttles the error messages, which makes debugging and error isolation easier.
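
      A minimal sketch of the proposed behavior, not the actual DataNode code or the attached patch. The class `ThrottledServiceLoop`, the `Heartbeat` interface, and the `errorSleepMillis` and `maxIterations` parameters are all hypothetical names chosen for illustration. The real service loop runs until the DN shuts down, whereas this one is bounded so it can be exercised in isolation.

```java
/** Hypothetical stand-in for a DataNode-style service loop; not Hadoop code. */
class ThrottledServiceLoop {
    /** Hypothetical callback representing one heartbeat round trip. */
    interface Heartbeat {
        void send() throws Exception;
    }

    private final Heartbeat heartbeat;
    private final long errorSleepMillis; // the issue proposes 1 second
    private volatile boolean shouldRun = true;
    private int failures = 0;

    ThrottledServiceLoop(Heartbeat heartbeat, long errorSleepMillis) {
        this.heartbeat = heartbeat;
        this.errorSleepMillis = errorSleepMillis;
    }

    void stop() {
        shouldRun = false;
    }

    int failures() {
        return failures;
    }

    /** Sends up to maxIterations heartbeats, sleeping after each failure. */
    void run(int maxIterations) {
        for (int i = 0; shouldRun && i < maxIterations; i++) {
            try {
                heartbeat.send();
            } catch (Exception e) {
                failures++;
                System.err.println("Heartbeat failed: " + e);
                try {
                    // Throttle before reentering the loop so a persistent
                    // failure cannot turn into a tight retry loop.
                    Thread.sleep(errorSleepMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and leave the loop.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

      With `errorSleepMillis` set to 1000, a heartbeat that fails every time would produce roughly one error log line and one request per second instead of an unbounded stream.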

        Attachments

        1. hdfs-1203.txt
          0.6 kB
          Todd Lipcon
        2. hdfs-1203.txt
          0.7 kB
          Todd Lipcon

            People

            • Assignee:
              tlipcon Todd Lipcon
              Reporter:
              tlipcon Todd Lipcon
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: