  1. Hadoop HDFS
  2. HDFS-17150

EC: Fix the bug of failed lease recovery.


Details

    • Reviewed

    Description

      If the client crashes before writing the minimum number of internal blocks required by the EC policy, lease recovery for the unclosed file can keep failing indefinitely. Taking the RS(6,3) policy as an example, the timeline is as follows:
      1. The client writes some data to only 5 datanodes.
      2. The client crashes.
      3. The NN fails over.
      4. The result of `uc.getNumExpectedLocations()` now depends entirely on block reports, and only 5 datanodes report internal blocks.
      5. When the lease exceeds the hard limit, the NN issues a block recovery command.
      6. The datanode checks the command, finds that the number of internal blocks is insufficient, and raises an error, so the recovery fails.
      7. The lease exceeds the hard limit again and the NN issues another block recovery command, which fails in the same way, and the cycle repeats.
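
The retry loop in the timeline above can be illustrated with a small, self-contained sketch. This is not Hadoop code: `EcRecoveryLoop`, `canRecover`, and `failedAttempts` are hypothetical names, and the only assumption taken from the description is that a datanode rejects recovery when fewer internal blocks are available than the policy's data-unit count (6 for RS(6,3)).

```java
// Illustrative sketch only; every name here is hypothetical, not a Hadoop API.
final class EcRecoveryLoop {
    // RS(6,3): 6 data units and 3 parity units per block group.
    static final int DATA_UNITS = 6;

    /** Models the datanode-side check: recovery needs at least dataUnits internal blocks. */
    static boolean canRecover(int dataUnits, int reportedInternalBlocks) {
        return reportedInternalBlocks >= dataUnits;
    }

    /** Counts how many recovery attempts fail over a number of hard-limit expiries. */
    static int failedAttempts(int dataUnits, int reportedInternalBlocks, int expiries) {
        int failures = 0;
        for (int i = 0; i < expiries; i++) {
            // The NN reissues the same command each time; the set of reported
            // internal blocks does not change between attempts.
            if (!canRecover(dataUnits, reportedInternalBlocks)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // 5 datanodes reported internal blocks, lease hard limit expires 3 times.
        System.out.println(failedAttempts(DATA_UNITS, 5, 3)); // prints 3
    }
}
```

Because nothing changes between attempts in this model, every attempt fails, which matches the repeated failure described in steps 5 to 7.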

      When the client has written fewer than 6 internal blocks (the number of data units in RS(6,3)), the block group is unrecoverable. This situation should be treated like the case where the number of replicas is 0 for a replicated file, i.e., the last block group is removed directly and the file is closed.
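
A minimal sketch of the proposed decision is shown here, under stated assumptions. It is not the actual patch: `LeaseRecoveryDecision`, `Action`, `minLocationsForRecovery`, and `decide` are hypothetical names, and `expectedLocations` stands in for the value that `uc.getNumExpectedLocations()` returns in the description.

```java
// Illustrative sketch only; every name here is hypothetical, not a Hadoop API.
final class LeaseRecoveryDecision {
    enum Action { REMOVE_LAST_BLOCK_AND_CLOSE, ISSUE_BLOCK_RECOVERY }

    /**
     * Minimum number of locations needed before block recovery can succeed:
     * the data-unit count for a striped (EC) block group, one replica otherwise.
     */
    static int minLocationsForRecovery(boolean striped, int dataUnits) {
        return striped ? dataUnits : 1;
    }

    /** Chooses what the NN should do with the last block during lease recovery. */
    static Action decide(boolean striped, int dataUnits, int expectedLocations) {
        if (expectedLocations < minLocationsForRecovery(striped, dataUnits)) {
            // Unrecoverable: same handling as a replicated block with 0 replicas.
            return Action.REMOVE_LAST_BLOCK_AND_CLOSE;
        }
        return Action.ISSUE_BLOCK_RECOVERY;
    }

    public static void main(String[] args) {
        // RS(6,3) with only 5 reported internal blocks.
        System.out.println(decide(true, 6, 5)); // prints REMOVE_LAST_BLOCK_AND_CLOSE
    }
}
```

Expressing the threshold as one function keeps the striped and replicated cases on the same code path, which mirrors the equivalence the description draws between "fewer than 6 internal blocks" and "0 replicas".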

       


          People

            Assignee: Shuyan Zhang (zhangshuyan)
            Reporter: Shuyan Zhang (zhangshuyan)
            Votes: 0
            Watchers: 4

