Hadoop HDFS / HDFS-14847

Erasure Coding: Blocks are over-replicated while EC decommissioning



    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0, 3.0.3, 3.1.2, 3.3.0
    • Fix Version/s: 3.3.0, 3.1.4, 3.2.2
    • Component/s: ec


      Some blocks become over-replicated while EC decommissioning is in progress. The log shows messages like the following:

      INFO BlockStateChange: Block: blk_-9223372035714984112_363779142, Expected Replicas: 9, live replicas: 8, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 3, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: , Current Datanode:, Is current datanode decommissioning: true, Is current datanode entering maintenance: false

      Decommissioning hangs for a long time.

      Digging into the code shows a problem in ErasureCodingWork.java.
      For example, suppose 2 nodes (dn0, dn1) are decommissioning and an EC block group has internal blocks on both of them. When an ErasureCodingWork is created to reconstruct the group, it creates 2 replication works.
      If the replication from dn0 succeeds and the one from dn1 fails, every later round creates replication work for dn0 again. The block on dn0 becomes over-replicated and the block on dn1 is never replicated.
      Here is the initial patch for this.
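
      The dn0/dn1 scenario above can be illustrated with a small standalone model. This is not the attached patch and not the real ErasureCodingWork code: the class EcDecommissionSketch, the Replica type, and both selection methods are hypothetical names made up for illustration, and the selection logic is an assumption that is merely consistent with the behaviour described in this issue. It contrasts a source selection that only looks at node state with one that also skips internal block indices already present on an in-service node.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified model; NOT the Hadoop ErasureCodingWork class. */
final class EcDecommissionSketch {

  /** One stored internal block of an EC block group (hypothetical type). */
  static final class Replica {
    final String node;
    final int blockIndex;
    final boolean inService; // false means the node is decommissioning

    Replica(String node, int blockIndex, boolean inService) {
      this.node = node;
      this.blockIndex = blockIndex;
      this.inService = inService;
    }
  }

  /** Naive selection: any decommissioning replica is a copy source. */
  static List<String> naiveSources(List<Replica> replicas, int targets) {
    List<String> sources = new ArrayList<>();
    for (Replica r : replicas) {
      if (!r.inService && sources.size() < targets) {
        sources.add(r.node);
      }
    }
    return sources;
  }

  /** Index-aware selection: skip indices already on an in-service node. */
  static List<String> indexAwareSources(List<Replica> replicas, int targets) {
    BitSet live = new BitSet();
    for (Replica r : replicas) {
      if (r.inService) {
        live.set(r.blockIndex);
      }
    }
    List<String> sources = new ArrayList<>();
    for (Replica r : replicas) {
      if (!r.inService && !live.get(r.blockIndex) && sources.size() < targets) {
        sources.add(r.node);
        live.set(r.blockIndex); // do not schedule the same index twice
      }
    }
    return sources;
  }

  /** State after round one: dn0's block was copied to dn9, dn1's copy failed. */
  static List<Replica> afterFirstRound() {
    List<Replica> replicas = new ArrayList<>();
    replicas.add(new Replica("dn0", 0, false));
    replicas.add(new Replica("dn1", 1, false));
    replicas.add(new Replica("dn9", 0, true)); // dn9 is a made-up target node
    return replicas;
  }

  public static void main(String[] args) {
    // One target is available in the second round.
    System.out.println(naiveSources(afterFirstRound(), 1));
    System.out.println(indexAwareSources(afterFirstRound(), 1));
  }
}
```

      In this model the naive selection picks dn0 again in the second round, while the index-aware selection picks dn1, whose block index has no in-service copy yet.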


        1. HDFS-14847.005.patch
          12 kB
          Hui Fei
        2. HDFS-14847.004.patch
          12 kB
          Hui Fei
        3. HDFS-14847.003.patch
          10 kB
          Hui Fei
        4. HDFS-14847.002.patch
          11 kB
          Hui Fei
        5. HDFS-14847.001.patch
          11 kB
          Hui Fei



            • Assignee:
              ferhui Hui Fei

