Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 1.2.0, 2.0.3-alpha
    • Component/s: None
    • Labels: None
    • Hadoop Flags:
      Reviewed

      Description

      1. Make stale timeout adaptive to the number of nodes marked stale in the cluster.
      2. Consider having a separate configuration for write skipping the stale nodes.

      1. HDFS-3912.branch-1.patch
        44 kB
        Suresh Srinivas
      2. HDFS-3912-branch-1.patch
        46 kB
        Jing Zhao
      3. HDFS-3912-010.patch
        63 kB
        Jing Zhao
      4. HDFS-3912-branch-1.1-001.patch
        45 kB
        Jing Zhao
      5. HDFS-3912.009.patch
        63 kB
        Jing Zhao
      6. HDFS-3912.008.patch
        61 kB
        Jing Zhao
      7. HDFS-3912.007.patch
        61 kB
        Jing Zhao
      8. HDFS-3912.006.patch
        57 kB
        Jing Zhao
      9. HDFS-3912.005.patch
        59 kB
        Jing Zhao
      10. HDFS-3912.004.patch
        64 kB
        Jing Zhao
      11. HDFS-3912.003.patch
        68 kB
        Jing Zhao
      12. HDFS-3912.002.patch
        65 kB
        Jing Zhao
      13. HDFS-3912.001.patch
        105 kB
        Jing Zhao

        Issue Links

          Activity

          Jing Zhao added a comment -

          Suresh's comments in HDFS-3703:

          However, for the write side, not picking the stale node could result in an issue, especially for small clusters. That is the reason why I think we should do the write-side changes in a related jira. We should consider making the stale timeout adaptive to the number of nodes marked stale in the cluster, as discussed in the previous comments. Additionally, we should consider having a separate configuration for writes skipping the stale nodes.

          The more detailed proposal for handling write is:
          For writes, do not use stale datanodes (if possible). To avoid the scenario where a small T for judging staleness may generate new hotspots on the cluster, T is proposed to be calculated as:
          T = t_c + (number of nodes already marked as stale) / (total number of nodes) * (T_d - t_c),
          where t_c is a constant value initially set in the configuration, and T_d is the time for marking a node as dead (i.e., 10.5 min).

          E.g., t_c can be set to 30s; then, when no or few nodes are marked as stale, we have a small T that satisfies the HBase requirement. In case a large number of nodes are marked as stale, e.g., near the total number of nodes, T will be almost T_d (i.e., ~10 min), and the workload can still be distributed to all the nodes alive.

          When almost all nodes are marked as stale, include stale nodes as write-target candidates when the number of remaining normal alive nodes is less than the replica number.
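          The adaptive timeout above can be sketched as a small helper. (Hypothetical class and method names, not code from any attached patch; t_c and T_d would come from configuration.)

```java
/** Sketch of the proposed adaptive stale-timeout formula (hypothetical names). */
public class AdaptiveStaleTimeout {
  /**
   * @param tc         minimum stale interval from configuration, in ms (e.g. 30s)
   * @param td         dead-node interval in ms (e.g. 10.5 min = 630000)
   * @param staleNodes number of nodes currently marked as stale
   * @param totalNodes total number of nodes in the cluster
   * @return interval T after which a node is considered stale
   */
  public static long staleInterval(long tc, long td, int staleNodes, int totalNodes) {
    if (totalNodes == 0) {
      return tc; // empty cluster: fall back to the configured minimum
    }
    double ratio = (double) staleNodes / totalNodes;
    // T grows linearly from t_c (no stale nodes) toward T_d (all nodes stale)
    return tc + (long) (ratio * (td - tc));
  }
}
```

          With t_c = 30s and T_d = 10.5 min, T stays at 30s while the cluster is healthy and approaches T_d as more nodes go stale, which is exactly the hotspot-avoidance behavior described above.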

          Jing Zhao added a comment -

          Move and summarize part of the comments from HDFS-3703 here to highlight the existing thoughts on writing part.

          Nicolas Liochon added a comment -

          Some thinking, with an HBase bias:

          • if the datanode is too busy and cannot heartbeat in a minute, we will also get timeouts when writing the blocks (if the datanode is dead: 20s connect timeout; if it's not dead, or if we previously had a connection, we will fail on the read timeout for the ack, which is around 1 minute by default).
          • the recovery is on the critical path, so going to a suspicious node is not something you want to do.
          • things are already quite complicated, so I think I would end up with the same value for read & write to keep them simple.

          Then there is the case when many nodes are stale. I think we're in really bad shape at that stage... I feel that just throwing an exception is the best solution. HBase would wait a few seconds and retry. That's better for the cluster than trying a node that is unlikely to execute the write. But it's a change vs. today's behavior.

          To synthesize, this could make sense IMHO:

          • there are enough fully alive nodes: let's use them, whatever the number of stale nodes.
          • there are not enough fully alive nodes, but there are some stale nodes that we could use: let's use the stale nodes; at least the behavior will be backward compatible.
          • there are not enough live nodes: behave as today.
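          The three cases above amount to a simple preference order, which can be sketched as follows. (Hypothetical class, method, and parameter names for illustration only; the real target chooser lives in BlockPlacementPolicy and works on datanode descriptors, not strings.)

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the three-case write-target preference (hypothetical names). */
public class WriteTargetChooser {
  /**
   * Pick up to 'replicas' targets: fully alive nodes first, stale nodes
   * only as a fallback so the behavior stays backward compatible.
   */
  public static List<String> chooseTargets(List<String> alive, List<String> stale, int replicas) {
    // Case 1: enough fully alive nodes — use them, whatever the stale count.
    List<String> targets = new ArrayList<>(alive.subList(0, Math.min(replicas, alive.size())));
    // Case 2: not enough fully alive nodes — fall back to stale nodes.
    for (int i = 0; targets.size() < replicas && i < stale.size(); i++) {
      targets.add(stale.get(i));
    }
    // Case 3: still not enough live nodes — as today: return what we have.
    return targets;
  }
}
```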
          Nicolas Liochon added a comment -

          Hi Jing,

          Are you working on it currently? I would like to try HDFS-3703 branch 1.1 on HBase, but I need the write path as well: without it, most of the time is spent on the write errors...

          Thanks!

          Jing Zhao added a comment -

          Hi Nicolas, I'm currently working on this. Will post something today.

          Nicolas Liochon added a comment -

          Great! So I will test it beginning of next week then. Thanks a lot Jing.

          Jing Zhao added a comment -

          Nicolas:

          So based on your prior comments, we rethought the strategy that dynamically changes the stale interval for writing. One problem with this strategy is that after a datanode is marked as stale, since the stale interval may increase as a result of the increase in the number of stale datanodes, the same datanode may immediately be marked as healthy (i.e., non-stale) again.

          In the current patch, we provide a simpler solution. The stale interval is now a fixed value after loading from the configuration. For read, the strategy is the same as HDFS-3703. For write, we add a switch flag (only for write) so that when a certain proportion of datanodes are marked as stale, the stale datanodes can also be included as write targets. Users can specify this proportion through configuration. For example, if the proportion is set to 0.5, then when more than half of the datanodes have been marked as stale in the cluster, we stop avoiding stale nodes for writing. And when some of the datanodes come back, we resume avoiding stale nodes for writing.
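          The ratio-based switch can be sketched as below. (Hypothetical class and method names; in the actual patch this logic lives around HeartbeatManager/DatanodeManager and the limit comes from configuration.)

```java
/** Sketch of the ratio-based switch for avoiding stale nodes on writes (hypothetical names). */
public class StaleWriteSwitch {
  /**
   * @param staleNodes      number of datanodes currently marked as stale
   * @param totalNodes      total number of datanodes in the cluster
   * @param staleRatioLimit configured proportion, e.g. 0.5
   * @return true if writes should still avoid stale datanodes
   */
  public static boolean shouldAvoidStaleForWrite(int staleNodes, int totalNodes, double staleRatioLimit) {
    if (totalNodes == 0) {
      return false; // no nodes to choose from; nothing to avoid
    }
    // Once the stale proportion reaches the limit, stop avoiding stale nodes;
    // when nodes recover and the proportion drops below it, resume avoiding them.
    return (double) staleNodes / totalNodes < staleRatioLimit;
  }
}
```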

          Jing Zhao added a comment -

          An initial patch for the simpler solution.

          Jing Zhao added a comment -

          Some cleanup for the patch.

          Jing Zhao added a comment -

          Moved stale-node-related information from FSNameSystem back to DatanodeManager.

          Jing Zhao added a comment -

          Removed redundant test cases and corrected part of the comments in the test.

          Nicolas Liochon added a comment -

          I like this approach, it's deterministic.
          I had issues trying branch 1.1 on HBase 0.96. Some (HBase) unit tests were not working with this branch. I lacked the time to understand why, but I will have a look again later (hopefully it will get fixed by just waiting...)

          Suresh Srinivas added a comment -

          Nicolas, did you mean to assign this to yourself?

          Suresh Srinivas added a comment -
          1. Remove HeartbeatManager#checkStaleNodes and use DatanodeManager#checkStaleNodes instead.
          2. What happens when the ratio is configured to an invalid value?
          3. When calculating the ratio in HeartbeatManager, you are accessing datanodes.size() outside the synchronization block.
          4. Can we introduce a method in FSClusterStats to provide the cluster state of whether it is avoiding writes to stale nodes, and avoid having to add DatanodeManager into BlockPlacementPolicy? This way, custom placement policy implementations are not affected.
          5. I think we should create a separate jira to move some relevant methods, such as getLiveNodes, stale nodes, etc., into the DatanodeStatics interface.
          6. We should also add metrics related to stale datanodes.
          Jing Zhao added a comment -

          Thanks for the comments Suresh! I've addressed most of the comments. I will create separate jiras for DatanodeStatics and metrics issues as well.

          Nicolas Liochon added a comment -

          @Suresh
          I was echoing my message from the 21st: I had issues (not yet analyzed) with branch 1.1 on HBase, but I definitely want to try Jing's patch, so I will give it another try later.

          Devaraj Das added a comment -

          I had issues trying branch 1.1 on HBase 0.96. Some (hbase) unit tests were not working with this branch. I was lacking time to understand why, but I will have a look again later (hopefully it will get fixed by just waiting...)

          Hey Nicolas, can you please enumerate the failing tests?

          Nicolas Liochon added a comment -

          I haven't kept the log files, but it was in the small test categories (the ones executed first when you do a mvn test in HBase).

          Nicolas Liochon added a comment -

          Actually it seems it's HBASE-6928, so it's not related to branch-1.1; I've just been unlucky when I compared the test results on branch 1.1 vs. branch 1.0...

          Jing Zhao added a comment -

          Upload the patch with minor updates.

          Suresh Srinivas added a comment -

          Patch looks good. Nicolas, I will wait for HBase validation and your +1 to commit this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547883/HDFS-3912.006.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3271//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3271//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          I will try the patch on HBase 0.96 next week (hopefully).
          I had a look at the patch, and it seems OK to me. The only point is this one:

          +      LOG.warn("The given interval for marking stale datanode = "
          +          + staleInterval + ", which is smaller than the default value "
          +          + DFSConfigKeys.DFS_NAMENODE_STALE_DATANODE_INTERVAL_DEFAULT
          +          + ".");
          

          I think we should not have a warning if we're below the default, because:

          • usually the defaults are just "the most common harmless setting", i.e. it should be possible to go below them without being in danger.
          • a reasonable setting for HBase would be around 20s (so less than the HDFS default), to be sure that the datanode is not used when we start the HBase recovery. So when used with HBase, we will have a warning when using the recommended setting.
          Jing Zhao added a comment -

          The DataNode#heartbeatsDisabledForTests should be declared as volatile, and for the new test cases in TestReplicationPolicy, instead of waiting, I explicitly call the heartbeatCheck() method.

          Jing Zhao added a comment -

          Addressed Nicolas's comments. Now we check whether the stale interval is positive instead of printing the original warning message.

          Suresh Srinivas added a comment -

          I think we should not have a warning if we're below the default, because:

          I think we should have a warning because someone could set this to a way smaller value than what even HBase could be set up with. That said, we could print the warning if the stale period is, say, less than 3 times the heartbeat period. Also, we need to document in hdfs-default.xml the pros and cons of the stale period choices.
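          The 3x-heartbeat rule amounts to a check like the following. (A minimal sketch with hypothetical names; the real check would read the heartbeat interval from DFSConfigKeys rather than hard-code the 3s default.)

```java
/** Sketch of the proposed stale-interval sanity check (hypothetical names). */
public class StaleIntervalCheck {
  // HDFS default heartbeat interval is 3 seconds (dfs.heartbeat.interval)
  static final long HEARTBEAT_INTERVAL_MS = 3_000L;

  /** Warn when the configured stale interval is under 3x the heartbeat period. */
  public static boolean needsWarning(long staleIntervalMs) {
    return staleIntervalMs < 3 * HEARTBEAT_INTERVAL_MS;
  }
}
```

          Under this rule, the 20s setting Nicolas suggests for HBase would not trigger the warning, while a misconfigured interval below ~9s still would.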

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548005/HDFS-3912.007.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3273//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3273//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548007/HDFS-3912.008.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          org.apache.hadoop.hdfs.TestPersistBlocks

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3274//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3274//console

          This message is automatically generated.

          Jing Zhao added a comment -

          Updated based on Suresh's comments.

          Nicolas Liochon added a comment -

          Hi,

          I'm OK with the new logic for the warning; 3 times the heartbeat is a quite common rule. I would like to test the patch on branch 1.1, which differs quite a lot from branch 3.0 regarding block placement policy and so on. Jing, do you want to do the port? If you don't have time I will do it. I've already tested HBase trunk with branch 1.1 without the patch, and it works (but with write errors, as data is sent to the dead box).

          Thanks!

          Nicolas

          Jing Zhao added a comment -

          Hi Nicolas,
          I will work on the branch 1.1 patch. Hopefully I can upload the patch today or tomorrow.
          Thanks,
          -Jing

          Jing Zhao added a comment -

          Patch for branch 1.1. Also did some cleanup for the test code in the patch for trunk.

          Jing Zhao added a comment -

          For the 1.1 patch, I've run local tests and all the test cases passed.

          Suresh Srinivas added a comment -

          Nicolas, when you get some time, can you please give the 1.x version of the patch a go?

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12548684/HDFS-3912-010.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3308//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3308//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          @Suresh, @Jing, Thanks a lot for doing the backport! I'm giving it a try today. With the jet lag, if everything goes well you will have the result when you wake up.

          Nicolas Liochon added a comment -

          Very good news: it works as expected. I don't have any more write or read errors or timeouts during the HBase recovery, so we can now have an MTTR under a minute in HBase.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2909 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2909/)
          HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing Zhao (Revision 1397211)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HeartbeatManager.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2847 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2847/)
          HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing Zhao (Revision 1397211)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
          Files : same file list as in the Hadoop-Hdfs-trunk-Commit #2909 comment above.
          Suresh Srinivas added a comment -

          I committed the patch to trunk and branch-2. I will review the branch-1 patch soon.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2872 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2872/)
          HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing Zhao (Revision 1397211)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
          Files : same file list as in the Hadoop-Hdfs-trunk-Commit #2909 comment above.
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1193 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1193/)
          HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing Zhao (Revision 1397211)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
          Files : same file list as in the Hadoop-Hdfs-trunk-Commit #2909 comment above.
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1224 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1224/)
          HDFS-3912. Detect and avoid stale datanodes for writes. Contributed by Jing Zhao (Revision 1397211)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1397211
          Files : same file list as in the Hadoop-Hdfs-trunk-Commit #2909 comment above.
          Suresh Srinivas added a comment -

          Jing, the branch-1 patch does not apply cleanly. Can you please upload a new patch?

          Jing Zhao added a comment -

          The patch for branch-1.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549392/HDFS-3912-branch-1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3348//console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          Typo in the patch: ecxludedNodes. This also needs to be fixed in trunk.

          With that change +1 for the patch.

          Suresh Srinivas added a comment -

          Canceling patch to prevent Jenkins from running builds for branch-1 patches.

          Suresh Srinivas added a comment -

          Fixed the typo.

          Suresh Srinivas added a comment -

          I committed the patch to branch-1 as well. Thank you Jing.

          Jeremy Carroll added a comment -

          FYI: this issue is missing a branch-2 patch. After applying HDFS-3703 for branch-2, it's missing the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, etc.

          Jeremy Carroll added a comment -

          Basically this patch requires HDFS-3601 (Version 3.0). So there is no Branch 2.0 patch on the ticket.

          Nicolas Liochon added a comment -

          Are you sure? It's committed in branch-1?

          Harsh J added a comment -

          FYI: This patch is missing the branch-2 patch. After applying HDFS-3703 for branch-2, it's missing the DFS_NAMENODE_CHECK_STALE_DATANODE_DEFAULT settings, etc.

          The diff may be dependent on the JIRA you mention, but perhaps not the patch itself. We merged the trunk commit directly into branch-2, as viewable/downloadable here: view at http://svn.apache.org/viewvc?view=revision&revision=1397219 and download at http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java?revision=1397219&view=co

          If you use git locally, you can also add a remote and cherry-pick it out I guess.
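
          The remote-plus-cherry-pick workflow described above can be sketched as follows. This is a minimal, self-contained illustration using two throwaway local repositories in place of the real Apache repository; the repo layout, paths, and the "HDFS-3912 fix" commit are stand-ins, and for the real backport you would substitute the ASF git mirror URL and the git SHA corresponding to the SVN revision.

          ```shell
          # Hypothetical sketch: "upstream" stands in for the Apache repo, where
          # branch-2 lacks the fix and trunk has it; "local" is your checkout.
          set -e
          tmp=$(mktemp -d)

          git init -q "$tmp/upstream"
          cd "$tmp/upstream"
          git config user.email dev@example.com
          git config user.name dev
          echo base > file.txt && git add file.txt && git commit -qm "base"
          git branch branch-2                      # release branch, still without the fix
          echo fix >> file.txt && git add file.txt && git commit -qm "HDFS-3912 fix"
          fix_sha=$(git rev-parse HEAD)            # the trunk commit to backport

          git clone -q -b branch-2 "$tmp/upstream" "$tmp/local"
          cd "$tmp/local"
          git config user.email dev@example.com
          git config user.name dev
          git remote add apache "$tmp/upstream"    # add the remote...
          git fetch -q apache
          git cherry-pick "$fix_sha"               # ...and cherry-pick just that commit
          grep -q fix file.txt && echo "fix is now on branch-2"
          ```

          After the cherry-pick, branch-2 carries only that one extra commit, which is exactly the effect of merging the single trunk revision into the release branch.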

          Harsh J added a comment -

          Are you sure? It's committed in branch-1?

          Yes, this went into branch-1 as a backport commit; the backport patch is attached as well.


            People

            • Assignee: Jing Zhao
            • Reporter: Jing Zhao
            • Votes: 0
            • Watchers: 15