Hadoop HDFS / HDFS-2114

Re-commission of a decommissioned node does not delete excess replicas

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels: None

      Description

      If a decommissioned node is removed from the decommissioned list, namenode does not delete the excess replicas it created while the node was decommissioned.
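
      For illustration only, the reported scenario plays out roughly as in the sketch below. MiniDFSCluster and DFSTestUtil are the usual 0.23-era test utilities, but this outline is an assumed setup, not the attached test code.

        Configuration conf = new HdfsConfiguration();
        MiniDFSCluster cluster =
            new MiniDFSCluster.Builder(conf).numDataNodes(4).build();
        FileSystem fs = cluster.getFileSystem();

        // 1. Create a file with replication 3 on the 4-node cluster.
        Path file = new Path("/recommission.dat");
        DFSTestUtil.createFile(fs, file, 1024L, (short) 3, 0L);

        // 2. Decommission one replica holder (add it to the exclude file,
        //    then refresh nodes): the namenode re-replicates, leaving 3 live
        //    replicas plus 1 on the decommissioned node.

        // 3. Recommission the node (remove it from the exclude file, refresh
        //    nodes again).

        // 4. The bug: the namenode keeps all 4 replicas instead of deleting
        //    the excess one down to the replication factor of 3.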

      Attachments

      1. HDFS-2114-6.patch
        11 kB
        John George
      2. HDFS-2114-5.patch
        9 kB
        John George
      3. HDFS-2114-4.patch
        9 kB
        John George
      4. HDFS-2114-3.patch
        7 kB
        John George
      5. HDFS-2114-2.patch
        7 kB
        John George
      6. HDFS-2114.patch
        7 kB
        John George

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #797 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/797/)
        HDFS-2114. re-commission of a decommissioned node does not delete excess replicas. Contributed by John George.

        mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1148981
        Files :

        • /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
        • /hadoop/common/trunk/hdfs/CHANGES.txt
        • /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/TestDecommission.java
        • /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
        Matt Foley added a comment -

        +1. Committed to trunk. Thanks, John!

        Note: There were garbage characters in the JavaDoc comments for method TestDecommission.checkFile(). (Might be my fault since I emailed the text to you.) I just removed them before committing.

        Note that TestDecommission takes 190 seconds to run on a quad-core Mac. Should consider ways to decrease this time.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12487155/HDFS-2114-6.patch
        against trunk revision 1148348.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/981//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/981//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/981//console

        This message is automatically generated.

        John George added a comment -

        Thanks a lot Matt for the comments and the code snippet. Attaching a new patch with the suggested changes.

        Matt Foley added a comment -

        Hi John, big improvement. Comments:

        1. Thanks for finding the place where decommissioned nodes are sorted to the end of the list. I feel much better now.

        2. I'll go along with use of the isNodeDown flag on the grounds that if a node name were ever null (which should be impossible) it might match a null value of "downnode".

        3. This is a nit, but could you please change local constant NAMENODE_REPLICATION_INTERVAL_KEY to NAMENODE_REPLICATION_INTERVAL ?

        4. checkFile(): You've definitely improved the inner loop a lot. However, on second reading I am concerned about the use of checkFile() in testRecommission(). I believe the use in testDecommission() is intended to be instantaneous, reading the state of the system before replica deletion has time to be done, while the use in testRecommission() is intended to be after it reaches steady state, after the system has readjusted the number of replicas. The use of exceptions to notify problem states doesn't work too well in the latter case, as you had to work around. Also I'm concerned about race conditions while trying to get a particular reading from checkFile() during potential changes in replication state. My suggested fix is too long for a comment, so I've emailed you a code fragment. Thanks.
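
        Matt's emailed code fragment is not on record here. Purely to illustrate the steady-state idea, a test could poll the namenode until replication settles instead of asserting once; the helper below is hypothetical, not his fragment.

          private static void waitForReplicas(FileSystem fs, Path file,
              int expected, long timeoutMs) throws Exception {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (true) {
              boolean ok = true;
              // Ask the namenode for the current block locations on every pass.
              for (BlockLocation b : fs.getFileBlockLocations(
                  fs.getFileStatus(file), 0, Long.MAX_VALUE)) {
                if (b.getHosts().length != expected) {
                  ok = false;
                  break;
                }
              }
              if (ok) {
                return;  // steady state reached
              }
              if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("still not at " + expected + " replicas");
              }
              Thread.sleep(500);  // wait out the next heartbeat/replication cycle
            }
          }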

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12486877/HDFS-2114-5.patch
        against trunk revision 1147762.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestHDFSTrash

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/960//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/960//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/960//console

        This message is automatically generated.

        John George added a comment -

        Attaching a patch with a new name so that Hudson picks it up.

        John George added a comment -

        Modifying checkFile() based on Matt's comments.

        John George added a comment -

        > Matt Foley commented on HDFS-2114:
        > ----------------------------------
        >
        > bq. The first use of {isNodeDown} is necessary because {downnode} could be
        > null in cases when we are checking for "Recommission".
        >
        > Yes, but in this case nodes[j].getName().equals(downnode) will return false
        > without any problem.

        Also, I forgot to mention that the check ensures that hasdown is not incremented. But I agree with you that the code does not look as good as it should. I have attached a modified patch with some more changes.

        >
        > bq. My understanding is that this assert ensures that the current blk has only
        > ONE replica that is in decommissioned state.
        >
        > But it also seems to ensure that the only replica that is down is the LAST one
        > in the array of nodes for the block,
        > due to the way this block works:
        >

        >       int firstDecomNodeIndex = -1;
        >       DatanodeInfo[] nodes = blk.getLocations();
        >       for (int j = 0; j < nodes.length; j++) {     // for each replica of blk
        >         ...
        >         if (nodes[j].isDecommissioned()) {
        >           if (firstDecomNodeIndex == -1) {
        >             firstDecomNodeIndex = j;
        >           }
        >           continue;
        >         }
        >         assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
        >       }
        > 

        > Yet I don't see any code (eg in LocatedBlock.getLocations()) where downed
        > nodes are moved to the end of that list. Maybe somewhere else?

        Yes, I think it checks to ensure that the nodes are moved to the end of that list. The code is in FSNamesystem.java; the last few lines of the snippet below (the Arrays.sort call) are what do it.

          /**
           * Get block locations within the specified range.
           * @see ClientProtocol#getBlockLocations(String, long, long)
           */
          LocatedBlocks getBlockLocations(String clientMachine, String src,
              long offset, long length) throws AccessControlException,
              FileNotFoundException, UnresolvedLinkException, IOException {
            LocatedBlocks blocks = getBlockLocations(src, offset, length, true, true);
            if (blocks != null) {
              //sort the blocks
              DatanodeDescriptor client = host2DataNodeMap.getDatanodeByHost(
                  clientMachine);
              for (LocatedBlock b : blocks.getLocatedBlocks()) {
                clusterMap.pseudoSortByDistance(client, b.getLocations());

                // Move decommissioned datanodes to the bottom
                Arrays.sort(b.getLocations(), DFSUtil.DECOM_COMPARATOR);
              }
            }
            return blocks;
          }
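
        For reference, DFSUtil.DECOM_COMPARATOR amounts to a comparator along the lines of the sketch below (an assumption, not the verbatim source). Since Arrays.sort is stable, decommissioned nodes move to the end while the pseudoSortByDistance() order among the live nodes is preserved.

          static final Comparator<DatanodeInfo> DECOM_COMPARATOR =
              new Comparator<DatanodeInfo>() {
                @Override
                public int compare(DatanodeInfo a, DatanodeInfo b) {
                  // Equal unless exactly one side is decommissioned;
                  // decommissioned nodes sort last.
                  return a.isDecommissioned() == b.isDecommissioned() ? 0
                      : a.isDecommissioned() ? 1 : -1;
                }
              };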

        Matt Foley added a comment -

        > The first use of {isNodeDown} is necessary because {downnode} could be null in cases when we are checking for "Recommission".

        Yes, but in this case nodes[j].getName().equals(downnode) will return false without any problem.

        > My understanding is that this assert ensures that the current blk has only ONE replica that is in decommissioned state.

        But it also seems to ensure that the only replica that is down is the LAST one in the array of nodes for the block,
        due to the way this block works:

              int firstDecomNodeIndex = -1;
              DatanodeInfo[] nodes = blk.getLocations();
              for (int j = 0; j < nodes.length; j++) {     // for each replica of blk
                ...
                if (nodes[j].isDecommissioned()) {
                  if (firstDecomNodeIndex == -1) {
                    firstDecomNodeIndex = j;
                  }
                  continue;
                }
                assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
              }
        

        Yet I don't see any code (eg in LocatedBlock.getLocations()) where downed nodes are moved to the end of that list. Maybe somewhere else?

        > * And in the same block, why is it important to condition it on
        > isNodeDown, since (!isNodeDown) implies there shouldn't be any
        > decommissioned nodes? So the second use of isNodeDown also seems
        > unnecessary.
        > Because, we don't care for this in cases where there is no node that is down.

        Right, but in this case the assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1) will always pass, because no nodes were decommissioned.

        In summary, the use of the "isNodeDown" flag is only serving to delineate a human-interesting state, but has no impact on the outcome of the code. I would remove it, since the checkFile() method works correctly without this flag when downnode == null or downnode != null.
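
        Concretely, the inner loop Matt is suggesting would read roughly as below with the flag dropped; equals(null) is simply false, so the recommission case (downnode == null) falls through on its own. A sketch, not the final patch:

          int firstDecomNodeIndex = -1;
          DatanodeInfo[] nodes = blk.getLocations();
          for (int j = 0; j < nodes.length; j++) {
            // false when downnode == null, i.e. in the recommission case
            if (nodes[j].getName().equals(downnode)) {
              hasdown++;
              LOG.info("Block " + blk.getBlock() + " replica "
                  + nodes[j].getName() + " is decommissioned.");
            }
            if (nodes[j].isDecommissioned()) {
              if (firstDecomNodeIndex == -1) {
                firstDecomNodeIndex = j;
              }
              continue;
            }
            // Passes trivially when nothing is decommissioned.
            assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
          }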

        John George added a comment -

        Matt, thanks a lot for reviewing this patch. I will post a newer patch tomorrow with changes to checkFile(). I have tried to integrate your comments.

        > 1. I know you didn't write TestDecommission.checkFile(), but your addition in
        > this patch of the isNodeDown flag raises a number of questions:
        > * Aren't (isNodeDown && nodes[j].getName().equals(downnode)) and
        > (nodes[j].getName().equals(downnode)) always the same? So is the first use of
        > isNodeDown even necessary?

        The first use of {isNodeDown} is necessary because {downnode} could be null in cases when we are checking for "Recommission".

        > * Can there be any cases where (nodes[j].getName().equals(downnode)) and
        > (nodes[j].isDecommissioned()) are different? So can't the following two blocks
        > be merged? And the location of the "is decommissioned" log message further
        > confuses this issue.
        >

        >         if (isNodeDown && nodes[j].getName().equals(downnode)) {
        >           hasdown++;
        >           LOG.info("Block " + blk.getBlock() + " replica " + 
        > nodes[j].getName()
        >               + " is decommissioned.");
        >         }
        >         if (nodes[j].isDecommissioned()) {
        >           if (firstDecomNodeIndex == -1) {
        >             firstDecomNodeIndex = j;
        >           }
        >           continue;
        >         }
        > 

        I think you are right. I will change the code to assert if they are not the same.

        > * What is the purpose of the assertion
        >

        >         if(isNodeDown)
        >           assertEquals("Decom node is not at the end", firstDecomNodeIndex, 
        > -1);
        > 

        My understanding is that this assert ensures that the current blk has only ONE replica that is in decommissioned state.

        > And why does it even work, since the node to decommission is chosen at random?
        > * And in the same block, why is it important to condition it on
        > isNodeDown, since (!isNodeDown) implies there shouldn't be any
        > decommissioned nodes? So the second use of isNodeDown also seems
        > unnecessary.
        Because, we don't care for this in cases where there is no node that is down.

        >
        > 2. Regarding the timeout: In TestDecommission, you appropriately set
        > BLOCKREPORT_INTERVAL_MSEC down to 1 sec to match the HEARTBEAT_INTERVAL. You
        > may also want to consider DFS_NAMENODE_REPLICATION_INTERVAL_KEY (default 3
        > sec). Adjusting this value to 1 might allow testRecommission() to run in 5
        > sec instead of 10.
        >
        Will try this.

        > 3. I wish there were a clear way to share the duplicated code between
        > testDecommission and testRecommission, but I couldn't see an obviously better
        > choice. Can you think of a way to improve this?
        ok
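
        As a side note on item 2, the suggested tuning amounts to something like the following in the test's configuration setup (key names from DFSConfigKeys; the 1-second values are Matt's suggestion):

          conf.setLong(DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_KEY, 1000L);
          conf.setInt(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1);
          conf.setInt(DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY, 1);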

        Matt Foley added a comment -

        Hi John,
        TestDecommission.java:

        1. I know you didn't write TestDecommission.checkFile(), but your addition in this patch of the isNodeDown flag raises a number of questions:

        • Aren't (isNodeDown && nodes[j].getName().equals(downnode)) and (nodes[j].getName().equals(downnode)) always the same? So is the first use of isNodeDown even necessary?
        • Can there be any cases where (nodes[j].getName().equals(downnode)) and (nodes[j].isDecommissioned()) are different? So can't the following two blocks be merged? And the location of the "is decommissioned" log message further confuses this issue.
                  if (isNodeDown && nodes[j].getName().equals(downnode)) {
                    hasdown++;
                    LOG.info("Block " + blk.getBlock() + " replica " + nodes[j].getName()
                        + " is decommissioned.");
                  }
                  if (nodes[j].isDecommissioned()) {
                    if (firstDecomNodeIndex == -1) {
                      firstDecomNodeIndex = j;
                    }
                    continue;
                  }
          
        • What is the purpose of the assertion
                  if(isNodeDown)
                    assertEquals("Decom node is not at the end", firstDecomNodeIndex, -1);
          

          And why does it even work, since the node to decommission is chosen at random?

        • And in the same block, why is it important to condition it on isNodeDown, since (!isNodeDown) implies there shouldn't be any decommissioned nodes? So the second use of isNodeDown also seems unnecessary.

        2. Regarding the timeout: In TestDecommission, you appropriately set BLOCKREPORT_INTERVAL_MSEC down to 1 sec to match the HEARTBEAT_INTERVAL. You may also want to consider DFS_NAMENODE_REPLICATION_INTERVAL_KEY (default 3 sec). Adjusting this value to 1 might allow testRecommission() to run in 5 sec instead of 10.

        3. I wish there were a clear way to share the duplicated code between testDecommission and testRecommission, but I couldn't see an obviously better choice. Can you think of a way to improve this?

        FSNamesystem and BlockManager:

        The implementation looks okay to me.

        The failure of unit test TestTrash.testTrashEmptier does not seem to be related to this change. It is probably related to HDFS-7326, although the symptoms reported are slightly different.

        John George added a comment -

        Can someone please review this patch?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12485596/HDFS-2114-3.patch
        against trunk revision 1143147.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.TestHDFSTrash

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/893//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/893//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/893//console

        This message is automatically generated.

        John George added a comment -

        Added javadoc and increased the test timeout.

        John George added a comment -

        Yes, it is called in the context of writeLock(). I believe it needs to be, since chooseExcessReplicates() is called, which requires the write lock to be held.
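
        For context, the calling pattern being described is the usual FSNamesystem locking idiom, roughly as sketched below (method names as discussed in this thread; an assumed shape, not the verbatim patch):

          writeLock();
          try {
            // Recommission: take the node off the decommission list, then
            // re-check its blocks for over-replication. chooseExcessReplicates()
            // runs under this same write lock.
            node.stopDecommission();
            blockManager.processOverReplicatedBlocksOnReCommission(node);
          } finally {
            writeUnlock();
          }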

        Matt Foley added a comment -

        Is processOverReplicatedBlocksOnReCommission() called in the context of the writeLock? Does it need to be? Thanks.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12484659/HDFS-2114-2.patch
        against trunk revision 1140939.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/867//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/867//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/867//console

        This message is automatically generated.

        John George added a comment -

        When stop-decommission is called, it makes sure that the excess replicas are identified and put in the right queue for deletion.
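
        The likely shape of that check in BlockManager, sketched from this thread rather than copied from the patch: iterate the recommissioned node's blocks and re-run over-replication processing wherever live replicas exceed the file's expected replication.

          void processOverReplicatedBlocksOnReCommission(DatanodeDescriptor srcNode) {
            final Iterator<? extends Block> it = srcNode.getBlockIterator();
            while (it.hasNext()) {
              final Block block = it.next();
              INodeFile fileINode = blocksMap.getINode(block);
              short expectedReplication = fileINode.getReplication();
              NumberReplicas num = countNodes(block);
              if (num.liveReplicas() > expectedReplication) {
                // Excess replica: pick one to invalidate (queue for deletion).
                processOverReplicatedBlock(block, expectedReplication, null, null);
              }
            }
          }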

        John George added a comment -

        patch based on trunk-latest


          People

          • Assignee: John George
          • Reporter: John George
          • Votes: 0
          • Watchers: 4
