Hadoop HDFS / HDFS-1203

DataNode should sleep before reentering service loop after an exception


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When the DataNode (DN) gets an exception in response to a heartbeat, it logs the exception and continues, but it does not sleep before retrying. I've occasionally seen bugs where heartbeats continuously produce exceptions, so the DN floods the NameNode (NN) with bad heartbeats. Adding a 1-second sleep at least throttles the error messages, which makes debugging and error isolation easier.
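
      The idea can be illustrated with a minimal, self-contained sketch. This is not the attached patch and does not use the real DataNode code: the class name ThrottledServiceLoop, the run method, and the bounded iteration count are all hypothetical, chosen only so the example can run on its own. It shows a loop that logs a failed heartbeat and then sleeps before reentering, rather than retrying immediately.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the actual DataNode service loop.
public class ThrottledServiceLoop {
    // Assumed back-off, matching the 1 second suggested in the description.
    static final long ERROR_SLEEP_MS = 1000;

    /**
     * Calls heartbeat up to maxIterations times. After each failure the
     * exception is logged and the thread sleeps for errorSleepMs before the
     * next attempt, so a persistent error cannot turn into a tight retry loop.
     * Returns the number of failed attempts.
     */
    static int run(Callable<Void> heartbeat, int maxIterations, long errorSleepMs)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < maxIterations; i++) {
            try {
                heartbeat.call();
            } catch (InterruptedException ie) {
                // Let interruption propagate so the loop can be shut down.
                throw ie;
            } catch (Exception e) {
                failures++;
                System.err.println("Heartbeat failed: " + e);
                // The throttle: pause before reentering the loop.
                Thread.sleep(errorSleepMs);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated heartbeat that fails twice and then succeeds.
        int failures = run(() -> {
            if (calls.incrementAndGet() <= 2) {
                throw new IOException("simulated heartbeat failure");
            }
            return null;
        }, 3, ERROR_SLEEP_MS);
        System.out.println("failures=" + failures + ", calls=" + calls.get());
    }
}
```

      A real service loop would run until shutdown rather than for a fixed number of iterations; the bound here exists only to keep the sketch terminating.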

      Attachments

        1. hdfs-1203.txt
          0.6 kB
          Todd Lipcon
        2. hdfs-1203.txt
          0.7 kB
          Todd Lipcon


          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
