  1. Hadoop Common
  2. HADOOP-4071

FSNameSystem.isReplicationInProgress should add an underReplicated block to the neededReplication queue using method "add" not "update"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We have a datanode whose decommissioning did not complete for days. It turned out that an under-replicated block was never placed in the neededReplication queue and therefore was never replicated. The following debug line shows the problem:

      " DEBUG org.apache.hadoop.dfs.StateChange: UnderReplicationBlocks.update blk_-7437651423871278837_0 curReplicas 8
      curExpectedReplicas 10 oldReplicas 9 oldExpectedReplicas 10 curPri 2 oldPri 2"

      The block was not in the neededReplication queue. However, the update method computed that the block was under-replicated both before and after the change (oldPri 2, curPri 2), saw that the priority level had not changed, and therefore did not add the block to the neededReplication queue.

      The solution is that, instead of using the update method, the name node should use the add method to add the block to the neededReplication queue. The add method guarantees success if the block is indeed under-replicated.
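
      To illustrate the difference, here is a minimal sketch of a priority-bucketed queue. It is a simplified model and not the actual Hadoop source: the class name UnderReplicatedQueueSketch, the priority rule, and the method signatures are all assumptions made for this example. It replays the values from the debug line above (curReplicas 8, oldReplicas 9, expected 10).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical, simplified model of a priority-bucketed under-replication queue. */
final class UnderReplicatedQueueSketch {
    /** Number of priority levels; also used to mean "no replication needed". */
    static final int LEVEL = 3;

    private final List<TreeSet<String>> queues = new ArrayList<>();

    UnderReplicatedQueueSketch() {
        for (int i = 0; i < LEVEL; i++) {
            queues.add(new TreeSet<>());
        }
    }

    /** Assumed rule: a lower number is more urgent; LEVEL means fully replicated. */
    static int priority(int curReplicas, int expectedReplicas) {
        if (curReplicas < 0 || curReplicas >= expectedReplicas) return LEVEL;
        if (curReplicas == 1) return 0;
        if (curReplicas * 3 < expectedReplicas) return 1;
        return 2;
    }

    boolean contains(String block) {
        for (TreeSet<String> q : queues) {
            if (q.contains(block)) return true;
        }
        return false;
    }

    /** add: queues the block whenever it is under-replicated, regardless of history. */
    boolean add(String block, int curReplicas, int expectedReplicas) {
        int pri = priority(curReplicas, expectedReplicas);
        return pri != LEVEL && queues.get(pri).add(block);
    }

    /** update: acts only when the priority changed, so it assumes the block is already queued. */
    void update(String block, int curReplicas, int expectedReplicas,
                int oldReplicas, int oldExpectedReplicas) {
        int curPri = priority(curReplicas, expectedReplicas);
        int oldPri = priority(oldReplicas, oldExpectedReplicas);
        if (oldPri != LEVEL && oldPri != curPri) {
            queues.get(oldPri).remove(block);
        }
        if (curPri != LEVEL && oldPri != curPri) {
            queues.get(curPri).add(block);
        }
    }

    public static void main(String[] args) {
        UnderReplicatedQueueSketch q = new UnderReplicatedQueueSketch();
        String blk = "blk_-7437651423871278837_0";

        // Values from the debug line: priority stays at 2, so update is a no-op.
        q.update(blk, 8, 10, 9, 10);
        System.out.println("after update: queued = " + q.contains(blk)); // false

        // add does not depend on the old state, so the block is queued.
        q.add(blk, 8, 10);
        System.out.println("after add: queued = " + q.contains(blk)); // true
    }
}
```

      In this model, update is only safe when the caller knows the block was queued under its old priority. A caller that cannot guarantee that, as in the decommission check, is better served by add.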

      Attachments

        1. decommission.patch
          1 kB
          Hairong Kuang

          People

            Assignee: Hairong Kuang (hairong)
            Reporter: Hairong Kuang (hairong)