Hadoop HDFS / HDFS-11499

Decommissioning stuck because of failing recovery

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Allow a block to complete if the number of replicas on live nodes, decommissioning nodes, and nodes in maintenance mode satisfies the minimum replication factor.
      The fix prevents block recovery from failing when replicas of the last block are on nodes being decommissioned, a failure that in turn left the decommissioning stuck waiting for the last block to be completed. In addition, a file close() operation will no longer fail because replicas of the last block are on decommissioning nodes.

      Description

      Block recovery fails to finalize the file if the nodes holding the last, incomplete block are being decommissioned. Meanwhile, the decommissioning gets stuck, waiting for that last block to be completed.
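      The circular wait described above can be sketched as a toy model. This is not HDFS code: the NodeState enum, lastBlockCanComplete, decommissionCanFinish, and isStuck are hypothetical names, and the real checks live in BlockManager and DecommissionManager. The sketch assumes a minimum replication of 1, a last block whose replicas all sit on nodes being decommissioned, and a pre-fix rule under which only live replicas count toward completion.

```java
import java.util.List;

// Toy model of the pre-fix circular wait. All names are hypothetical;
// this is NOT the HDFS implementation.
public class StuckDecommissionModel {

    enum NodeState { LIVE, DECOMMISSIONING }

    // Pre-fix rule (simplified): only replicas on live nodes count toward
    // the minimum replication needed to complete the last block.
    static boolean lastBlockCanComplete(List<NodeState> replicaNodes, int minReplication) {
        long live = replicaNodes.stream().filter(s -> s == NodeState.LIVE).count();
        return live >= minReplication;
    }

    // Simplified DecommissionManager invariant: a node holding a replica of an
    // under-construction (UC) block cannot finish decommissioning.
    static boolean decommissionCanFinish(boolean nodeHoldsUcBlock) {
        return !nodeHoldsUcBlock;
    }

    // Re-runs both checks for up to maxTicks rounds, as periodic monitors would.
    // Returns true if neither the block nor the decommission ever makes progress.
    static boolean isStuck(List<NodeState> replicaNodes, int minReplication, int maxTicks) {
        boolean blockComplete = false;
        for (int tick = 0; tick < maxTicks; tick++) {
            if (!blockComplete && lastBlockCanComplete(replicaNodes, minReplication)) {
                blockComplete = true;
            }
            // The last block stays UC until it completes, so the nodes holding
            // it cannot finish decommissioning before that.
            if (decommissionCanFinish(!blockComplete)) {
                return false; // progress was made
            }
        }
        return true; // nothing changed in any round
    }

    public static void main(String[] args) {
        List<NodeState> replicas = List.of(NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONING);
        System.out.println("stuck = " + isStuck(replicas, 1, 100)); // stuck = true
    }
}
```

      With at least one replica on a LIVE node, the same model completes the block in the first round and the decommission can proceed. The deadlock only appears when every replica of the last block is on a decommissioning node.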

      org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize INodeFile testRecoveryFile since blocks[255] is non-complete, where blocks=[blk_1073741825_1001, blk_1073741826_1002...
      

      The fix is to count replicas on decommissioning nodes when completing the last block in BlockManager.commitOrCompleteLastBlock. This is safe because the DecommissionManager will not decommission a node that holds under-construction (UC) blocks.
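      A minimal sketch of the fixed counting rule follows. It is based on the description above and on the release note, which also counts nodes in maintenance mode. Every name here (NodeState, countsBeforeFix, countsAfterFix, canCompleteLastBlock) is hypothetical. The actual BlockManager.commitOrCompleteLastBlock logic tracks more state than this sketch does.

```java
import java.util.List;

// Toy sketch of the last-block completion rule before and after the fix.
// Names are hypothetical; this is NOT the BlockManager code.
public class LastBlockCompletionRule {

    // IN_MAINTENANCE stands in for "maintenance mode" in this sketch.
    enum NodeState { LIVE, DECOMMISSIONING, IN_MAINTENANCE }

    // Pre-fix: only replicas on live nodes counted.
    static boolean countsBeforeFix(NodeState s) {
        return s == NodeState.LIVE;
    }

    // Post-fix: replicas on decommissioning and maintenance nodes also count,
    // since a node holding a UC block is not allowed to finish decommissioning.
    static boolean countsAfterFix(NodeState s) {
        return s == NodeState.LIVE
                || s == NodeState.DECOMMISSIONING
                || s == NodeState.IN_MAINTENANCE;
    }

    // True if enough replicas count toward the minimum replication factor.
    static boolean canCompleteLastBlock(List<NodeState> replicaNodes, int minReplication, boolean afterFix) {
        long usable = replicaNodes.stream()
                .filter(s -> afterFix ? countsAfterFix(s) : countsBeforeFix(s))
                .count();
        return usable >= minReplication;
    }

    public static void main(String[] args) {
        // Last block whose replicas are all on nodes being decommissioned.
        List<NodeState> replicas = List.of(NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONING);
        System.out.println("before fix: " + canCompleteLastBlock(replicas, 1, false)); // false
        System.out.println("after fix:  " + canCompleteLastBlock(replicas, 1, true));  // true
    }
}
```

      In this model, once the last block completes it is no longer under construction. The decommissioning nodes holding it are then free to finish, which breaks the circular wait.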

        Attachments

        1. HDFS-11499.patch
          4 kB
          Lukas Majercak
        2. HDFS-11499.branch-2.8.patch
          3 kB
          Wei-Chiu Chuang
        3. HDFS-11499.branch-2.7.patch
          3 kB
          Wei-Chiu Chuang
        4. HDFS-11499.05.patch
          7 kB
          Lukas Majercak
        5. HDFS-11499.04.patch
          8 kB
          Lukas Majercak
        6. HDFS-11499.03.patch
          8 kB
          Manoj Govindassamy
        7. HDFS-11499.02.patch
          8 kB
          Manoj Govindassamy

              People

              • Assignee:
                Lukas Majercak (lukmajercak)
                Reporter:
                Lukas Majercak (lukmajercak)
              • Votes:
                0
                Watchers:
                16

                Dates

                • Created:
                  Updated:
                  Resolved: