  1. Hadoop HDFS
  2. HDFS-8676

Delayed rolling upgrade finalization can cause heartbeat expiration and write failures


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      In large, busy clusters with a high deletion rate, many blocks can pile up in the datanode trash directories until a rolling upgrade is finalized. When the upgrade is finally finalized, the trash is deleted synchronously in the service actor thread's context. Because that thread also sends heartbeats, the deletion blocks them and can cause heartbeat expiration.

      We have seen a namenode losing hundreds of nodes after a delayed upgrade finalization. The deletion of trash directories should be made asynchronous.
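
The idea of moving trash deletion off the heartbeat path can be illustrated with a minimal Java sketch. This is not the attached patch and does not use Hadoop's internal classes; `AsyncTrashCleaner` and `clearTrashAsync` are hypothetical names chosen for illustration. It assumes the trash is a set of plain directories and hands their recursive deletion to a single background daemon thread, so the calling thread returns immediately.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Hadoop): deletes trash directories off the caller's thread. */
public class AsyncTrashCleaner {
    // One daemon worker thread, so pending cleanup never keeps the JVM alive.
    private static final ExecutorService CLEANER =
        Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "async-trash-cleaner");
            t.setDaemon(true);
            return t;
        });

    /** Returns immediately; the future completes once every directory has been removed. */
    public static CompletableFuture<Void> clearTrashAsync(List<Path> trashDirs) {
        return CompletableFuture.runAsync(() -> {
            for (Path dir : trashDirs) {
                deleteRecursively(dir);
            }
        }, CLEANER);
    }

    private static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so that children are deleted before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, a heartbeat-style loop would call `clearTrashAsync(...)` and carry on without waiting for the returned future. A real fix would also need to consider failure reporting and what happens if a new upgrade starts while an earlier cleanup is still running.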

        Attachments

        1. HDFS-8676.01.patch
          3 kB
          Walter Su
        2. HDFS-8676.02.patch
          3 kB
          Walter Su


            People

            • Assignee: Walter Su (walter.k.su)
            • Reporter: Kihwal Lee (kihwal)
