Hadoop HDFS / HDFS-16739

EC: Reconstruction failed when file has specified StoragePolicy

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 3.1.3
    • 3.1.3
    • None
    • None

    Description

      We found that BlockReconstructionWork uses the same chooseTarget function as Redundancy Block. Because chooseTarget also tries to satisfy the storage policy, the number of targets it returns can be larger than the real additionalReplRequired. This causes various exceptions when the DN performs ECReconstructionWork.

      One of the exceptions seen on the DN is as follows:

      2022-08-24 03:01:39,534 WARN [Command processor] org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct striped block blk_-9223372032283192848_35319673088
      java.lang.IllegalArgumentException: Too much missed striped blocks.
          at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
          at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.<init>(StripedWriter.java:87)
          at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:45)
          at org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:134)
          at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:797)
          at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
          at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1306)
          at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1344)
          at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1280)
          at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1267) 

      The file's EC policy is RS-6-3-1024k. Below is the internal block info: blk_-9223372032283192845 (index 3) needs to be reconstructed. All storages are DISK, but the file's storage policy is ALL_SSD.

      [blk_-9223372032283192848:DatanodeInfoWithStorage[10.x.x.33:50010,DS-e1435341-f43c-42ef-806f-90fsddfsfdcd,DISK],
       blk_-9223372032283192847:DatanodeInfoWithStorage[10.x.x.35:50010,DS-a6dsd16a-676a-4fed-8ffe-fsdfscw23445,DISK],
       blk_-9223372032283192846:DatanodeInfoWithStorage[10.x.x.34:50010,DS-40cdc124-e2e0-40f6-aa47-4d2bdsf3e8e5,DISK],
       blk_-9223372032283192844:DatanodeInfoWithStorage[10.x.x.21:50010,DS-ef9dee4f-dfb2-495c-872a-974dfscds58e,DISK],
       blk_-9223372032283192843:DatanodeInfoWithStorage[10.x.x.40:50010,DS-6dsedfa7-8291-46bb-964d-dfsf34567655,DISK],
       blk_-9223372032283192842:DatanodeInfoWithStorage[10.x.x.36:50010,DS-2dddc387-c38b-427d-9925-15a664d3472b,DISK],
       blk_-9223372032283192841:DatanodeInfoWithStorage[10.x.x.151:50010,DS-fds91a7-89ad-4899-bc44-675dfs32f58e,DISK],
       blk_-9223372032283192840:DatanodeInfoWithStorage[10.x.x.27:50010,DS-77dfs4c1-c23c-4b26-baa3-aadsfdff4118,DISK]] 

      Below is the BlockECReconstructionInfo. Because none of the internal blocks satisfies the storage policy (ALL_SSD), the target length is 9 rather than 1.

      2022-08-24 03:01:39,534 INFO [Command processor] org.apache.hadoop.hdfs.server.datanode.DataNode: processErasureCodingTasks  BlockECReconstructionInfo(
        Recovering BP-390041874-10.x.x.x-1550651014658:blk_-9223372032283192848_35319673088 From: [10.x.x.33:50010, 10.x.x.35:50010, 10.x.x.34:50010, 10.x.x.21:50010, 10.x.x.40:50010, 10.x.x.36:50010, 10.x.x.151:50010, 10.x.x.27:50010] To: [[10.x.x.37:50010, 10.x.x.21:50010, 10.x.x.32:50010, 10.x.x.27:50010, 10.x.x.28:50010, 10.x.x.23:50010, 10.x.x.23:50010, 10.x.x.101:50010, 10.x.x.32:50010])
       Block Indices: [0, 1, 2, 4, 5, 6, 7, 8] 
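
      As a rough illustration of why only one target should be needed here, the following sketch derives the missing internal block indices from the "Block Indices" list in the log above. The class and method names (MissingIndexSketch, missingIndices) are hypothetical and are not part of Hadoop; the data/parity counts 6 and 3 are taken from the RS-6-3-1024k policy.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingIndexSketch {
    // Hypothetical helper (not a Hadoop API): returns the internal block
    // indices in [0, dataBlkNum + parityBlkNum) that are absent from liveIndices.
    public static List<Integer> missingIndices(int[] liveIndices, int dataBlkNum, int parityBlkNum) {
        int total = dataBlkNum + parityBlkNum;
        boolean[] present = new boolean[total];
        for (int idx : liveIndices) {
            present[idx] = true;
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            if (!present[i]) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "Block Indices" as reported in the BlockECReconstructionInfo log line.
        int[] live = {0, 1, 2, 4, 5, 6, 7, 8};
        List<Integer> missing = missingIndices(live, 6, 3);
        // Prints: missing=[3] targetsNeeded=1
        System.out.println("missing=" + missing + " targetsNeeded=" + missing.size());
    }
}
```

      With RS-6-3, only index 3 is missing, so one target would suffice, whereas the command above carries 9.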

      When StripedWriter is initialized in the DN's StripedBlockReconstructor, it checks that targetIndices.length <= parityBlkNum. Here that is 9 <= 3, which fails, so this striped block will never be reconstructed successfully.

      targetIndices = new short[targets.length];
      Preconditions.checkArgument(targetIndices.length <= parityBlkNum,
          "Too much missed striped blocks."); 
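
      The failure can be reproduced in isolation with a minimal sketch, assuming a local stand-in for Guava's Preconditions.checkArgument so that no dependency is needed. The class name TargetCountCheckSketch and the method allocateTargetIndices are hypothetical; only the check itself mirrors the snippet above.

```java
public class TargetCountCheckSketch {
    // Local stand-in for Guava's Preconditions.checkArgument(boolean, Object),
    // used only to keep this sketch dependency-free.
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Hypothetical wrapper around the check quoted from StripedWriter:
    // allocates one index slot per target and rejects more targets than parity blocks.
    public static short[] allocateTargetIndices(int targetCount, int parityBlkNum) {
        short[] targetIndices = new short[targetCount];
        checkArgument(targetIndices.length <= parityBlkNum,
            "Too much missed striped blocks.");
        return targetIndices;
    }

    public static void main(String[] args) {
        int parityBlkNum = 3; // RS-6-3-1024k

        // One target (the expected case for a single missing internal block) passes.
        System.out.println("1 target: ok, length=" + allocateTargetIndices(1, parityBlkNum).length);

        // Nine targets (the case reported in this issue) is rejected.
        try {
            allocateTargetIndices(9, parityBlkNum);
        } catch (IllegalArgumentException e) {
            System.out.println("9 targets: " + e.getMessage());
        }
    }
}
```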

       

          People

            Assignee: Unassigned
            Reporter: MingHui Luo (Luominghui)