Hadoop Common / HADOOP-1184

Decommission fails if a block that needs replication has only one replica

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels: None

      Description

      If the only replica of a block resides on a node being decommissioned, then the decommission command does not complete. The blocks do not get added to neededReplication because neededReplications.update() believes that the number of current replicas is zero.
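      For illustration, here is a simplified sketch of the failure mode (hypothetical code using the names above, not the actual Hadoop source). If the replica count passed to the priority calculation counts only replicas on non-decommissioning nodes, a block whose sole replica sits on a decommissioning node computes an out-of-range priority and is never queued:

      // Simplified sketch of the bug described above; illustrative only.
      // LEVEL is the number of priority queues; returning LEVEL means
      // "do not queue this block at all".
      class UnderReplicatedBlocksSketch {
        static final int LEVEL = 3;

        // curReplicas is assumed to count only replicas on nodes that are
        // NOT being decommissioned.
        static int getPriority(int curReplicas, int expectedReplicas) {
          if (curReplicas <= 0 || curReplicas >= expectedReplicas) {
            // A block whose only replica is on a decommissioning node
            // arrives here with curReplicas == 0 and is silently dropped,
            // so it never enters neededReplications and the decommission
            // never completes.
            return LEVEL;
          } else if (curReplicas == 1) {
            return 0;                            // highest priority
          } else if (curReplicas * 3 < expectedReplicas) {
            return 1;
          } else {
            return 2;                            // lowest priority
          }
        }
      }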

      Attachments

      1. decommissionOneReplica6.patch (23 kB, dhruba borthakur)
      2. decommissionOneReplica5.patch (13 kB, dhruba borthakur)
      3. decommissionOneReplica4.patch (13 kB, dhruba borthakur)
      4. decommissionOneReplica3.patch (14 kB, dhruba borthakur)

        Activity

        dhruba borthakur added a comment -

        This patch enhances TestDecommission to test the case when there is only one replica of a block. It also removes method filterDecommissionedNodes() because it was not essential.

        The code change in neededReplications.update() needs some discussion. I did this as a short-cut to not propagate knowledge about "decommissioning" into neededReplications.getPriority().

        Raghu Angadi added a comment -

        > It also removes method filterDecommissionedNodes() because it was not essential.

        HADOOP-988 also does this.

        dhruba borthakur added a comment -

        1. Updated the unit test to test decommissioning a file with one replica.
        2. Made the unit test trigger the decommission-done check every 3 seconds, so the test finishes sooner.
        3. Removed the call to checkDecommissionState() from the ReplicationMonitor thread because the DecommissionMonitor thread is already doing this job.
        4. Changed UnderReplicatedBlocks.getPriority() to return a valid level for blocks that have 0 replicas (sketched below).
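        Change 4 presumably amounts to something like the following (an illustrative sketch against the getPriority() shape above, not the actual diff):

        // Illustrative sketch of change 4 (not the actual diff): stop
        // mapping 0 replicas to the invalid level LEVEL and give such
        // blocks the highest priority instead.
        static int getPriority(int curReplicas, int expectedReplicas) {
          if (curReplicas >= expectedReplicas) {
            return LEVEL;   // fully replicated: do not queue
          } else if (curReplicas <= 1) {
            return 0;       // 0 or 1 live replicas: highest priority
          } else if (curReplicas * 3 < expectedReplicas) {
            return 1;
          } else {
            return 2;
          }
        }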

        Hairong Kuang added a comment -

        For change 4:

        The parameter to UnderReplicatedBlocks.getPriority() is the number of non-decommissioning replicas, so I do not think simply changing the priority of blocks with 0 non-decommissioning replicas to 0 solves the problem. If a block has zero replicas anywhere, it needs to be deleted from the underReplicatedBlocks queue by being given an invalid priority level. Only a block with zero non-decommissioning replicas (but some decommissioning ones) should get priority 0.
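        In sketch form (again hypothetical, not the committed patch), the distinction being drawn is between zero replicas anywhere and zero non-decommissioning replicas:

        // Hypothetical sketch of the semantics described above; LEVEL is
        // again the invalid level meaning "drop from the queue".
        static int getPriority(int nonDecommissioned, int decommissioned,
                               int expected) {
          if (nonDecommissioned + decommissioned == 0) {
            return LEVEL;   // zero replicas anywhere: the block is lost,
                            // not under-replicated, so drop it
          } else if (nonDecommissioned >= expected) {
            return LEVEL;   // fully replicated on live nodes
          } else if (nonDecommissioned == 0) {
            return 0;       // all surviving replicas are on decommissioning
                            // nodes: replicate at the highest priority
          } else if (nonDecommissioned == 1) {
            return 0;
          } else if (nonDecommissioned * 3 < expected) {
            return 1;
          } else {
            return 2;
          }
        }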

        dhruba borthakur added a comment -

        1. Incorporated Hairong's review comments. getPriority() now handles the case where the only replica of a block resides on a node being decommissioned.
        2. Enhanced the unit test to cover decommissioning a node that holds the only replica of a block.
        3. Removed the checkDecommissioned() method from the ReplicationMonitor because there is already a separate thread that checks whether decommissioning is complete.
        4. Fixed a bug introduced in HADOOP-988 that caused pendingTransfers to skip replicating blocks whose only replica is on a node being decommissioned.

        dhruba borthakur added a comment -

        Incorporated Hairong's review comments.

        Hairong Kuang added a comment -

        Here are a few more comments:
        1. The indentation of the patch does not seem to follow the project's convention.
        2. In isReplicationInProgress, in addition to checking whether neededReplications contains a block, shall we also check whether the pendingReplication queue contains the block?
        3. In containingNodeList, the parameter numReplicas works as an output. Would it be clearer to create a static class, containing both a datanode list and an integer, as the return type? Also, since the variable nonCommisiionedNodeList now contains decommissioning nodes, it would be better to change its name.
        4. In TestDecommission, the test case for decommissioning a node that has the only replica of a block does not 100% guarantee that the decommissioning node contains any block. A more deterministic test case would be better. One solution could be to first start a minicluster with one datanode and create a file, then add one more datanode and decommission the old one (see the sketch after this list).
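        A pseudocode-level sketch of that proposal; the Cluster type and all helper methods are hypothetical stand-ins, not the real MiniDFSCluster API:

        // Hypothetical test sketch for the deterministic approach in
        // point 4; all names here are illustrative stand-ins.
        public void testDecommissionOnlyReplica() throws Exception {
          Cluster cluster = startMiniCluster(1);   // exactly one datanode
          writeFile(cluster, "/test/f1", 1);       // replica must land on it
          addDatanode(cluster);                    // now a target exists
          decommissionNode(cluster, 0);            // retire the first node
          // Decommission can complete only after the block has been copied
          // to the new node, which is exactly the behavior under test.
          waitForDecommissionToFinish(cluster, 0);
        }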

        dhruba borthakur added a comment -

        Thanks Hairong for the review comments.

        1. Fixed indentation in a few places.
        2. Yes, I agree with you that we need to check the pendingReplication queue too. I added this change (sketched after this list). This was a good catch! Thanks.
        3. I changed the variable name from nonCommisiionedNodeList to nodeList. I did not create a static class (yet) for the return values. If you feel strongly about it, then I can do it. Please let me know.
        4. I agree that the test should pick a node that has some blocks to be decommissioned. Without changing MiniCluster, one way to go about it is to make getFileHints() return the hostname and the port of the datanode on which the block resides. The test can then decommission only a node on which a block resides. Currently, getFileHints() returns only the hostname of the datanode. As part of HADOOP-1296, we can make getFileHints() return the port number as well, and then I can modify the test to decommission a node only if it has blocks on it. I will add this comment to HADOOP-1296.
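        For point 2, the combined check might look roughly like this (field and method names are illustrative, not the committed code):

        // Illustrative shape of the check from point 2; not the committed
        // code. A node cannot finish decommissioning while any of its
        // blocks is still in either queue.
        boolean isReplicationInProgress(DatanodeInfo node) {
          for (Block b : node.getBlocks()) {
            // Waiting to be scheduled (neededReplications) or scheduled
            // but not yet confirmed (pendingReplication queue).
            if (neededReplications.contains(b)
                || pendingReplications.contains(b)) {
              return true;
            }
          }
          return false;
        }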

        dhruba borthakur added a comment -

        The previous patch no longer applied cleanly to trunk because FSNamesystem has been split into multiple files. Here is a patch that applies cleanly to trunk.

        Hairong Kuang added a comment -

        The new patch changed the APIs of class UnderReplicatedBlocks to take an additional parameter, decommissionedReplicas. This parameter is used only when numReplicas is equal to zero. But since the possibility of numReplicas being 0 is quite slim, I feel that making API changes for this special case is not necessary.

        dhruba borthakur added a comment -

        The change in API to UnderReplicatedBlocks is required because it is no longer an inner class of FSNamesystem. This change was made to facilitate fine-grained locking of the namenode data structures. The locking model is simpler if UnderReplicatedBlocks does not access global variables/methods of FSNamesystem. Please let me know if this sounds reasonable.
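        A sketch of that decoupling (hypothetical signatures, not the committed code): the caller computes both replica counts while holding FSNamesystem's lock and passes them in, so UnderReplicatedBlocks never reaches back into FSNamesystem:

        // Hypothetical sketch of the decoupled API; not the committed code.
        import java.util.*;

        class DecoupledUnderReplicatedBlocks {
          static final int LEVEL = 3;
          private final List<Set<Block>> priorityQueues =
              new ArrayList<Set<Block>>();

          DecoupledUnderReplicatedBlocks() {
            for (int i = 0; i < LEVEL; i++) {
              priorityQueues.add(new HashSet<Block>());
            }
          }

          // Both counts are supplied by the caller (FSNamesystem, under
          // its own lock), so this class reads no FSNamesystem state.
          synchronized void update(Block block, int curReplicas,
                                   int decommissionedReplicas,
                                   int expectedReplicas) {
            int level = getPriority(curReplicas, decommissionedReplicas,
                                    expectedReplicas);
            if (level < LEVEL) {
              priorityQueues.get(level).add(block);
            }
          }

          // Simplified; see the earlier getPriority sketches for the
          // full priority logic.
          private int getPriority(int cur, int decom, int expected) {
            if (cur + decom == 0 || cur >= expected) return LEVEL;
            if (cur <= 1) return 0;
            return (cur * 3 < expected) ? 1 : 2;
          }
        }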

        Hairong Kuang added a comment -

        +1, looks good.

        Hadoop QA added a comment -

        +1. http://issues.apache.org/jira/secure/attachment/12356583/decommissionOneReplica6.patch applied and successfully tested against trunk revision r534975.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/113/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/113/console
        Doug Cutting added a comment -

        I just committed this. Thanks, Dhruba!

        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #82 (see http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/82/ )

          People

          • Assignee: Unassigned
          • Reporter: dhruba borthakur
          • Votes: 0
          • Watchers: 0
