Hadoop HDFS / HDFS-15725

Lease Recovery never completes for a committed block which the DNs never finalize


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.3.1, 3.4.0, 2.10.2, 3.2.3
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      In a very rare condition, the HDFS client process can get killed right at the time it is completing a block / file.

      The client sends the "complete" call to the namenode, moving the block into a committed state, but it dies before it can send the final packet to the Datanodes telling them to finalize the block.

      This means the blocks are stuck on the datanodes in RBW state and nothing will ever tell them to move out of that state.

      The namenode / lease manager will retry forever to close the file, but it will always complain it is waiting for blocks to reach minimal replication.

      I have a simple test and patch to fix this, but I think it warrants some discussion on whether this is the correct thing to do, or if I need to put the fix behind a config switch.

      My idea is that if lease recovery occurs and the block is still waiting on "minimal replication", the file is simply put back to UNDER_CONSTRUCTION, so that on the next lease recovery attempt BLOCK RECOVERY will happen, close the file, and move the replicas to FINALIZED.
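
The behaviour described above can be sketched as a toy state machine. This is only an illustration of the idea, not the attached patch and not NameNode code: the class `LeaseRecoverySketch`, its enums, the `revertCommittedBlock` flag and the `MIN_REPLICATION` value are all hypothetical names chosen for the example. It assumes a file whose last block is COMMITTED on the namenode while all three replicas are still RBW on the datanodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: every name here is hypothetical and none of it is real HDFS code.
class LeaseRecoverySketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }
    enum ReplicaState { RBW, FINALIZED }

    static final int MIN_REPLICATION = 1; // assumed minimal replication

    static class ToyFile {
        // The client sent "complete" (block COMMITTED) and then died,
        // so no datanode was ever told to finalize its replica.
        BlockState lastBlock = BlockState.COMMITTED;
        final List<ReplicaState> replicas = new ArrayList<>();
        boolean closed = false;
    }

    static int finalizedCount(ToyFile f) {
        return Collections.frequency(f.replicas, ReplicaState.FINALIZED);
    }

    // One lease recovery attempt. Returns true once the file is closed.
    static boolean recoverLease(ToyFile f, boolean revertCommittedBlock) {
        if (f.lastBlock == BlockState.COMPLETE) {
            f.closed = true;
            return true;
        }
        if (f.lastBlock == BlockState.COMMITTED) {
            if (finalizedCount(f) >= MIN_REPLICATION) {
                f.lastBlock = BlockState.COMPLETE;
                f.closed = true;
                return true;
            }
            // Still waiting on minimal replication. Without the proposed
            // change nothing happens here, so every retry ends up here again.
            if (revertCommittedBlock) {
                f.lastBlock = BlockState.UNDER_CONSTRUCTION;
            }
            return false;
        }
        // UNDER_CONSTRUCTION: block recovery runs, replicas get finalized,
        // and the file can be closed.
        Collections.fill(f.replicas, ReplicaState.FINALIZED);
        f.lastBlock = BlockState.COMPLETE;
        f.closed = true;
        return true;
    }

    // Returns the attempt number that closed the file, or -1 if none did.
    static int attemptsUntilClosed(boolean revertCommittedBlock, int maxAttempts) {
        ToyFile f = new ToyFile();
        for (int i = 0; i < 3; i++) {
            f.replicas.add(ReplicaState.RBW);
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (recoverLease(f, revertCommittedBlock)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("current behaviour:  " + attemptsUntilClosed(false, 10));
        System.out.println("proposed behaviour: " + attemptsUntilClosed(true, 10));
    }
}
```

In this model the file never closes without the revert, however many attempts are made, while with the revert it closes on the second attempt: the first attempt only moves the block back to UNDER_CONSTRUCTION, and the second one performs block recovery.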

      Attachments

        1. HDFS-15725.001.patch
          7 kB
          Stephen O'Donnell
        2. lease_recovery_2_10.patch
          8 kB
          Kihwal Lee
        3. HDFS-15725.002.patch
          14 kB
          Stephen O'Donnell
        4. HDFS-15725.003.patch
          14 kB
          Stephen O'Donnell
        5. HDFS-15725.branch-3.2.001.patch
          13 kB
          Stephen O'Donnell
        6. HDFS-15725.branch-2.10.001.patch
          13 kB
          Stephen O'Donnell

          People

            Assignee: Stephen O'Donnell (sodonnell)
            Reporter: Stephen O'Donnell (sodonnell)
            Votes: 0
            Watchers: 11
