Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.0
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.4, 3.0.0-alpha1
    • Component/s: ipc
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      This fix includes public method interface change.
      A follow-up JIRA issue for this incompatibility for branch-2.7 is HADOOP-13579.
      Show
      This fix includes public method interface change. A follow-up JIRA issue for this incompatibility for branch-2.7 is HADOOP-13579 .

      Description

      The RPC client has a default timeout set to 0 when no timeout is passed in. This means that the network connection created will not timeout when used to write data. The issue has shown in YARN-2578 and HDFS-4858. Timeouts for writes then fall back to the tcp level retry (configured via tcp_retries2) and timeouts between the 15-30 minutes. Which is too long for a default behaviour.

      Using 0 as the default value for timeout is incorrect. We should use a sane value for the timeout and the "ipc.ping.interval" configuration value is a logical choice for it. The default behaviour should be changed from 0 to the value read for the ping interval from the Configuration.

      Fixing it in common makes more sense than finding and changing all other points in the code that do not pass in a timeout.

      Offending code lines:
      https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L488
      and
      https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L350

      1. HADOOP-11252.002.patch
        10 kB
        Masatake Iwasaki
      2. HADOOP-11252.003.patch
        9 kB
        Masatake Iwasaki
      3. HADOOP-11252.004.patch
        9 kB
        Masatake Iwasaki
      4. HADOOP-11252.patch
        5 kB
        Wilfred Spiegelenburg

        Issue Links

          Activity

          Hide
          mingma Ming Ma added a comment -

          Agree with Wilfred that we should fix it in hadoop-common. YARN-2714 is another example. While HDFS-4858 has addressed the issue between DN and NN, we still have issue between client and NN.

          1. Have a client machine make a RPC request to NN.
          2. Power off NN while NN is processing the RPC.
          3. The client machine can detect connection issue in around 16 minutes.

          Show
          mingma Ming Ma added a comment - Agree with Wilfred that we should fix it in hadoop-common. YARN-2714 is another example. While HDFS-4858 has addressed the issue between DN and NN, we still have issue between client and NN. 1. Have a client machine make a RPC request to NN. 2. Power off NN while NN is processing the RPC. 3. The client machine can detect connection issue in around 16 minutes.
          Hide
          kkambatl Karthik Kambatla (Inactive) added a comment -

          Can we make sure this patch undoes the changes in HDFS-4858, as we will be fixing in common? Thanks.

          Show
          kkambatl Karthik Kambatla (Inactive) added a comment - Can we make sure this patch undoes the changes in HDFS-4858 , as we will be fixing in common? Thanks.
          Hide
          kkambatl Karthik Kambatla (Inactive) added a comment -

          I would like to get include this in 2.6 if we finish this before the first RC.

          Show
          kkambatl Karthik Kambatla (Inactive) added a comment - I would like to get include this in 2.6 if we finish this before the first RC.
          Hide
          daryn Daryn Sharp added a comment -

          I agree that an optional write timeout is good, but I don't agree that ipc.ping.interval should be reused. It's for detecting broken connections or timing out when there are outstanding calls. There's an existing ipc.client.connect.timeout key so ipc.client.write.timeout would be a logical choice.

          I understand this change is for reducing failover latency with config-based HA. But it adds fast-fail in cases where it's not desired. Ex. If the NN is in GC, the last thing you want is for clients to repeatedly timeout & reconnect, overflow the listen queue, etc.

          During a network cut of both NNs, clients may burn through their retries prematurely or exponentially fall back too far. Or with IP-failover based HA, you want the clients to wait. When the standby assumes the IP, the connections break, and the clients reconnect.

          Whether you set the write timeout is based on if you favor jobs succeeding at all reasonable costs, or you want fast-fail that many apps won't handle well.

          Show
          daryn Daryn Sharp added a comment - I agree that an optional write timeout is good, but I don't agree that ipc.ping.interval should be reused. It's for detecting broken connections or timing out when there are outstanding calls. There's an existing ipc.client.connect.timeout key so ipc.client.write.timeout would be a logical choice. I understand this change is for reducing failover latency with config-based HA. But it adds fast-fail in cases where it's not desired. Ex. If the NN is in GC, the last thing you want is for clients to repeatedly timeout & reconnect, overflow the listen queue, etc. During a network cut of both NNs, clients may burn through their retries prematurely or exponentially fall back too far. Or with IP-failover based HA, you want the clients to wait. When the standby assumes the IP, the connections break, and the clients reconnect. Whether you set the write timeout is based on if you favor jobs succeeding at all reasonable costs, or you want fast-fail that many apps won't handle well.
          Hide
          cmccabe Colin P. McCabe added a comment -

          I agree that an optional write timeout is good, but I don't agree that ipc.ping.interval should be reused. It's for detecting broken connections or timing out when there are outstanding calls. There's an existing ipc.client.connect.timeout key so ipc.client.write.timeout would be a logical choice.

          +1. I like the idea of adding ipc.client.write.timeout. Also agree that there may be some users who want to disable the timeout (i.e. set it to 0).

          It would be nice if we could specify that this timeout is in milliseconds (i.e. name it ipc.client.write.timeout.ms). The older timeouts are all unspecified, which is confusing since some of them are seconds, others milliseconds.

          Show
          cmccabe Colin P. McCabe added a comment - I agree that an optional write timeout is good, but I don't agree that ipc.ping.interval should be reused. It's for detecting broken connections or timing out when there are outstanding calls. There's an existing ipc.client.connect.timeout key so ipc.client.write.timeout would be a logical choice. +1. I like the idea of adding ipc.client.write.timeout . Also agree that there may be some users who want to disable the timeout (i.e. set it to 0). It would be nice if we could specify that this timeout is in milliseconds (i.e. name it ipc.client.write.timeout.ms ). The older timeouts are all unspecified, which is confusing since some of them are seconds, others milliseconds.
          Hide
          mingma Ming Ma added a comment -

          Should we use another name other than ipc.client.write.timeout given it can cover scenarios besides RPC request write time out?

          • HDFS-4858 covers the case where "The RPC server is unplugged before RPC call is delivered to the RPC server TCP stack". That is where write timeout applies.
          • RPC request has been delivered to the RPC server, but client doesn't get any response. That could happen as in YARN-2714 where RPC server swallows OutOfMemoryError and just drops the response. Or the RPC request is still in RPC server call queue when RPC server is unplugged.

          It seems like we want to define some end to end timeout, measure between the time when the RPC client writes the RPC call to client TCP stack and the time when RPC client reads the RPC response from client TCP stack.

          Show
          mingma Ming Ma added a comment - Should we use another name other than ipc.client.write.timeout given it can cover scenarios besides RPC request write time out? HDFS-4858 covers the case where "The RPC server is unplugged before RPC call is delivered to the RPC server TCP stack". That is where write timeout applies. RPC request has been delivered to the RPC server, but client doesn't get any response. That could happen as in YARN-2714 where RPC server swallows OutOfMemoryError and just drops the response. Or the RPC request is still in RPC server call queue when RPC server is unplugged. It seems like we want to define some end to end timeout, measure between the time when the RPC client writes the RPC call to client TCP stack and the time when RPC client reads the RPC response from client TCP stack.
          Hide
          mingma Ming Ma added a comment -

          Wilfred Spiegelenburg, are you working on this? If not, I can provide the initial patch for discussion.

          Show
          mingma Ming Ma added a comment - Wilfred Spiegelenburg , are you working on this? If not, I can provide the initial patch for discussion.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          A first version of a patch to set a value for the write timeout. I have used the proposed write timeout property name as given by Colin P. McCabe. I have set it to an arbitrary default of 5 minutes, which seems reasonable. We still allow anyone to still set it to 0 (no timeout) if they want it.

          The cases described by Ming Ma above YARN-2714, HDFS-4858 and YARN-2578 all have the same no response to a write cause. Setting this timeout should solve all these issues.
          One point I think needs to be checked is the getTimeOut() in Client.java. It uses the ping interval as a timeout, if ping is not enabled. I think that this should be changed to use the same timeout as this change introduces. Also changing the time out based on the ping interval is not really logical. This jira might not be the correct one to introduce that change so I left it out.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - A first version of a patch to set a value for the write timeout. I have used the proposed write timeout property name as given by Colin P. McCabe . I have set it to an arbitrary default of 5 minutes, which seems reasonable. We still allow anyone to still set it to 0 (no timeout) if they want it. The cases described by Ming Ma above YARN-2714 , HDFS-4858 and YARN-2578 all have the same no response to a write cause. Setting this timeout should solve all these issues. One point I think needs to be checked is the getTimeOut() in Client.java. It uses the ping interval as a timeout, if ping is not enabled. I think that this should be changed to use the same timeout as this change introduces. Also changing the time out based on the ping interval is not really logical. This jira might not be the correct one to introduce that change so I left it out.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12684142/HADOOP-11252.patch
          against trunk revision c1f2bb2.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5130//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5130//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684142/HADOOP-11252.patch against trunk revision c1f2bb2. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5130//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5130//console This message is automatically generated.
          Hide
          mingma Ming Ma added a comment -

          Thanks, Wilfred Spiegelenburg. Yes, it should cover all the cases discussed so far.

          Regarding the name of the parameter, write.timeout seems to indicates the RPC call request doesn't make it to the server. As mentioned above, there are other scenarios where the RPC call request actually makes it to the server only that server doesn't respond. Should we use another name like "ipc.client.call.timeout.ms"?

          For the unit test, we can verify the scenario where RPC call queue is full and thus RPC server doesn't respond in time. Here is the code snippet https://gist.github.com/mingmasplace/ce544ab21bbc4ff17564.

          Show
          mingma Ming Ma added a comment - Thanks, Wilfred Spiegelenburg . Yes, it should cover all the cases discussed so far. Regarding the name of the parameter, write.timeout seems to indicates the RPC call request doesn't make it to the server. As mentioned above, there are other scenarios where the RPC call request actually makes it to the server only that server doesn't respond. Should we use another name like "ipc.client.call.timeout.ms"? For the unit test, we can verify the scenario where RPC call queue is full and thus RPC server doesn't respond in time. Here is the code snippet https://gist.github.com/mingmasplace/ce544ab21bbc4ff17564 .
          Hide
          andrew.wang Andrew Wang added a comment -

          Hi Wilfred, thanks for working on this.

          I want to start by making sure I understand the patch correctly. We're changing the default rpc timeout to be 5min rather than 0. This means that, rather than sending a ping after a read blocks for 60s, we throw an exception after a read blocks for 5 mins. This actually does not involve write timeouts in the SO_SNDTIMEO sense, so it seems misleading to call it a "write timeout". If we get blocked on the socket write, we will still get stuck until the tcp stack bugs out (the tcp_retries2 you've mentioned elsewhere).

          As Daryn Sharp points out above, and also on HDFS-4858 by Aaron T. Myers, we've historically been reticent to change defaults like this because of potential side-effects. I'm not comfortable changing the defaults here either, without sign-off from e.g. Daryn Sharp who knows the RPC stuff better.

          So, a few review comments:

          • Let's rename the config param as Ming recommends above, seems more accurate. Including Ming's unit test would also be great.
          • Let's keep the default value of this at 0 to preserve current behavior, unless Daryn Sharp ok's things.
          • Since getPingInterval is now package-protected, we should also change setPingInterval to package-protected for parity. It's only used in a test.
          • Need to add the new config key to core-default.xml also, with description.
          Show
          andrew.wang Andrew Wang added a comment - Hi Wilfred, thanks for working on this. I want to start by making sure I understand the patch correctly. We're changing the default rpc timeout to be 5min rather than 0. This means that, rather than sending a ping after a read blocks for 60s, we throw an exception after a read blocks for 5 mins. This actually does not involve write timeouts in the SO_SNDTIMEO sense, so it seems misleading to call it a "write timeout". If we get blocked on the socket write, we will still get stuck until the tcp stack bugs out (the tcp_retries2 you've mentioned elsewhere). As Daryn Sharp points out above, and also on HDFS-4858 by Aaron T. Myers , we've historically been reticent to change defaults like this because of potential side-effects. I'm not comfortable changing the defaults here either, without sign-off from e.g. Daryn Sharp who knows the RPC stuff better. So, a few review comments: Let's rename the config param as Ming recommends above, seems more accurate. Including Ming's unit test would also be great. Let's keep the default value of this at 0 to preserve current behavior, unless Daryn Sharp ok's things. Since getPingInterval is now package-protected, we should also change setPingInterval to package-protected for parity. It's only used in a test. Need to add the new config key to core-default.xml also, with description.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          Andrew Wang due to the way the rpc timeout in the client code overwrites the ping time out you are most likely correct. I'll have to step through the code in the client to make sure it behaves as intended. The ping is generated after a SocketTimeoutException is thrown on the input stream which is triggered by the setSoTimeout(pingInterval) on the socket, combined with the overwrite that could be a problem. This might require a further decoupling of the ping and rpc time out.

          I also noticed that the ping output stream is created with a fixed timeout of 0, that means we can still hang up there after the changes.

          Looking at the HDFS code also to see how it is handled there and all references to the timeout that we are setting are called "socket write time out". I am happy to call it something else but this seems to be in line with HDFS also. The SO_SNDTIMEO only comes into play when the send buffers on the OS level on the local machine are full (as far as I am aware). If the buffer was not full when I wrote the data the time out will never trigger and I directly fall through to the tcp retries. That case should be handled by the time out we are setting.

          The default change was a proposal and setting it to 0 is the right choice for backwards compatibility.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - Andrew Wang due to the way the rpc timeout in the client code overwrites the ping time out you are most likely correct. I'll have to step through the code in the client to make sure it behaves as intended. The ping is generated after a SocketTimeoutException is thrown on the input stream which is triggered by the setSoTimeout(pingInterval) on the socket, combined with the overwrite that could be a problem. This might require a further decoupling of the ping and rpc time out. I also noticed that the ping output stream is created with a fixed timeout of 0, that means we can still hang up there after the changes. Looking at the HDFS code also to see how it is handled there and all references to the timeout that we are setting are called "socket write time out". I am happy to call it something else but this seems to be in line with HDFS also. The SO_SNDTIMEO only comes into play when the send buffers on the OS level on the local machine are full (as far as I am aware). If the buffer was not full when I wrote the data the time out will never trigger and I directly fall through to the tcp retries. That case should be handled by the time out we are setting. The default change was a proposal and setting it to 0 is the right choice for backwards compatibility.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12684142/HADOOP-11252.patch
          against trunk revision 3f56a4c.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5735//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5735//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684142/HADOOP-11252.patch against trunk revision 3f56a4c. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5735//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5735//console This message is automatically generated.
          Hide
          rchiang Ray Chiang added a comment -

          +1 (non-binding). The code changes look fine to me. As an easy test of its effect, I'm able to cause problems on my cluster by setting the value to 1ms. Reasonable values run fine for me.

          I'd like to see the new property properly documented. From what I can see, core-default.xml looks like the right place.

          Show
          rchiang Ray Chiang added a comment - +1 (non-binding). The code changes look fine to me. As an easy test of its effect, I'm able to cause problems on my cluster by setting the value to 1ms. Reasonable values run fine for me. I'd like to see the new property properly documented. From what I can see, core-default.xml looks like the right place.
          Hide
          rchiang Ray Chiang added a comment -

          Talked to Wilfred. Canceling patch until he works out the RPC timeout vs ping timeout issues in the next patch.

          Show
          rchiang Ray Chiang added a comment - Talked to Wilfred. Canceling patch until he works out the RPC timeout vs ping timeout issues in the next patch.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Hi Ray Chiang and Wilfred Spiegelenburg, how is this issue going? We hit this issue and look forward to your updates.

          Show
          ajisakaa Akira Ajisaka added a comment - Hi Ray Chiang and Wilfred Spiegelenburg , how is this issue going? We hit this issue and look forward to your updates.
          Hide
          ajithshetty Ajith S added a comment -

          Patch looks good, can we commit this patch and track decoupling ping and timeout in separate jira.?

          Show
          ajithshetty Ajith S added a comment - Patch looks good, can we commit this patch and track decoupling ping and timeout in separate jira.?
          Hide
          rchiang Ray Chiang added a comment -

          Sorry for the delay. I think I'm fine with Ajith's suggestion and keeping this patch as-is and filing a separate JIRA for decoupling ping and timeout.

          Show
          rchiang Ray Chiang added a comment - Sorry for the delay. I think I'm fine with Ajith's suggestion and keeping this patch as-is and filing a separate JIRA for decoupling ping and timeout.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          The patch mostly looks good to me. Hi Wilfred Spiegelenburg, would you reflect the review comments from Andrew? I'm +1 if those are addressed.

          Show
          ajisakaa Akira Ajisaka added a comment - The patch mostly looks good to me. Hi Wilfred Spiegelenburg , would you reflect the review comments from Andrew? I'm +1 if those are addressed.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          Wilfred Spiegelenburg, do you have any update on this? I tested equivalent patch in YARN-2578 and +1(non-binding) for the fix. I would like to update the patch based on Andrew Wang's comment if you don't have time. Thanks.

          Show
          iwasakims Masatake Iwasaki added a comment - Wilfred Spiegelenburg , do you have any update on this? I tested equivalent patch in YARN-2578 and +1(non-binding) for the fix. I would like to update the patch based on Andrew Wang 's comment if you don't have time. Thanks.
          Hide
          wilfreds Wilfred Spiegelenburg added a comment -

          sorry I have been occupied with a number of other things over the last period. I finally have some cycles and will look at this over the coming days.

          Show
          wilfreds Wilfred Spiegelenburg added a comment - sorry I have been occupied with a number of other things over the last period. I finally have some cycles and will look at this over the coming days.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Hi Wilfred Spiegelenburg, how is this issue going? This issue is critical for us, so I'd like to fix it early.

          Show
          ajisakaa Akira Ajisaka added a comment - Hi Wilfred Spiegelenburg , how is this issue going? This issue is critical for us, so I'd like to fix it early.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          I updated the patch based on the review comment. Sorry for breaking in, Wilfred Spiegelenburg.

          • made Client#PingInterval package protected.
          • changed the name of configuration keys and methods.
          • changed the default timeout value to 0.
          • added unit test for the timeout.
          • added entries to core-default.xml.
          Show
          iwasakims Masatake Iwasaki added a comment - I updated the patch based on the review comment. Sorry for breaking in, Wilfred Spiegelenburg . made Client#PingInterval package protected. changed the name of configuration keys and methods. changed the default timeout value to 0. added unit test for the timeout. added entries to core-default.xml.
          Hide
          mohdshahidkhan Mohammad Shahid Khan added a comment -

          +1 for current patch.
          And the Wilfred Spiegelenburg's Comment

          One point I think needs to be checked is the getTimeOut() in Client.java. It uses the ping interval as a timeout, if ping is not enabled. I think that this should be changed to use the same timeout as this change introduces. Also changing the time out based on the ping interval is not really logical. This jira might not be the correct one to introduce that change so I left it out.

          I think we should track this in separate jira. Any thoughts?

          Show
          mohdshahidkhan Mohammad Shahid Khan added a comment - +1 for current patch. And the Wilfred Spiegelenburg 's Comment One point I think needs to be checked is the getTimeOut() in Client.java. It uses the ping interval as a timeout, if ping is not enabled. I think that this should be changed to use the same timeout as this change introduces. Also changing the time out based on the ping interval is not really logical. This jira might not be the correct one to introduce that change so I left it out. I think we should track this in separate jira. Any thoughts?
          Hide
          andrew.wang Andrew Wang added a comment -

          LGTM besides a few nits, +1 pending:

          • Unnecessary whitespace change in CommonConfigurationKeys
          • ipc.client.ping description in core-default.xml, I think you meant "a byte" rather than "byte"

          Thanks Masatake Iwasaki!

          Show
          andrew.wang Andrew Wang added a comment - LGTM besides a few nits, +1 pending: Unnecessary whitespace change in CommonConfigurationKeys ipc.client.ping description in core-default.xml, I think you meant "a byte" rather than "byte" Thanks Masatake Iwasaki !
          Hide
          iwasakims Masatake Iwasaki added a comment -

          Thanks, Andrew Wang. I updated the patch.

          Show
          iwasakims Masatake Iwasaki added a comment - Thanks, Andrew Wang . I updated the patch.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          I think we should track this in separate jira.

          I agree, filed HADOOP-12672 as follow-up.

          Show
          iwasakims Masatake Iwasaki added a comment - I think we should track this in separate jira. I agree, filed HADOOP-12672 as follow-up.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 53s trunk passed
          +1 compile 8m 59s trunk passed with JDK v1.8.0_66
          +1 compile 9m 29s trunk passed with JDK v1.7.0_91
          +1 checkstyle 1m 3s trunk passed
          +1 mvnsite 2m 6s trunk passed
          +1 mvneclipse 0m 30s trunk passed
          +1 findbugs 4m 2s trunk passed
          +1 javadoc 2m 10s trunk passed with JDK v1.8.0_66
          +1 javadoc 3m 0s trunk passed with JDK v1.7.0_91
          +1 mvninstall 2m 32s the patch passed
          +1 compile 9m 4s the patch passed with JDK v1.8.0_66
          -1 javac 18m 50s root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 731, now 731).
          +1 javac 9m 4s the patch passed
          +1 compile 9m 42s the patch passed with JDK v1.7.0_91
          -1 javac 28m 32s root-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 724, now 724).
          +1 javac 9m 42s the patch passed
          -1 checkstyle 1m 5s Patch generated 5 new checkstyle issues in root (total was 381, now 382).
          +1 mvnsite 2m 2s the patch passed
          +1 mvneclipse 0m 29s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 4m 16s the patch passed
          +1 javadoc 2m 5s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 57s the patch passed with JDK v1.7.0_91
          +1 unit 8m 19s hadoop-common in the patch passed with JDK v1.8.0_66.
          -1 unit 63m 15s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          +1 unit 9m 13s hadoop-common in the patch passed with JDK v1.7.0_91.
          -1 unit 59m 16s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          -1 asflicense 0m 30s Patch generated 1 ASF License warnings.
          215m 20s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
            hadoop.hdfs.TestWriteReadStripedFile
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad
            hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
            hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockManager
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040
            hadoop.hdfs.server.namenode.TestNNThroughputBenchmark



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779195/HADOOP-11252.003.patch
          JIRA Issue HADOOP-11252
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux 6177298d8d63 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / bb5df27
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          javac root-jdk1.8.0_66: https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-compile-javac-root-jdk1.8.0_66.txt
          javac root-jdk1.7.0_91: https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_91.txt
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-checkstyle-root.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 53s trunk passed +1 compile 8m 59s trunk passed with JDK v1.8.0_66 +1 compile 9m 29s trunk passed with JDK v1.7.0_91 +1 checkstyle 1m 3s trunk passed +1 mvnsite 2m 6s trunk passed +1 mvneclipse 0m 30s trunk passed +1 findbugs 4m 2s trunk passed +1 javadoc 2m 10s trunk passed with JDK v1.8.0_66 +1 javadoc 3m 0s trunk passed with JDK v1.7.0_91 +1 mvninstall 2m 32s the patch passed +1 compile 9m 4s the patch passed with JDK v1.8.0_66 -1 javac 18m 50s root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 731, now 731). +1 javac 9m 4s the patch passed +1 compile 9m 42s the patch passed with JDK v1.7.0_91 -1 javac 28m 32s root-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 724, now 724). +1 javac 9m 42s the patch passed -1 checkstyle 1m 5s Patch generated 5 new checkstyle issues in root (total was 381, now 382). +1 mvnsite 2m 2s the patch passed +1 mvneclipse 0m 29s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 4m 16s the patch passed +1 javadoc 2m 5s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 57s the patch passed with JDK v1.7.0_91 +1 unit 8m 19s hadoop-common in the patch passed with JDK v1.8.0_66. -1 unit 63m 15s hadoop-hdfs in the patch failed with JDK v1.8.0_66. +1 unit 9m 13s hadoop-common in the patch passed with JDK v1.7.0_91. -1 unit 59m 16s hadoop-hdfs in the patch failed with JDK v1.7.0_91. -1 asflicense 0m 30s Patch generated 1 ASF License warnings. 215m 20s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160   hadoop.hdfs.TestWriteReadStripedFile   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad   hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality   hadoop.hdfs.server.namenode.TestNNThroughputBenchmark   hadoop.hdfs.shortcircuit.TestShortCircuitCache JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.blockmanagement.TestBlockManager   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure040   hadoop.hdfs.server.namenode.TestNNThroughputBenchmark Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779195/HADOOP-11252.003.patch JIRA Issue HADOOP-11252 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux 6177298d8d63 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / bb5df27 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 javac root-jdk1.8.0_66: https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-compile-javac-root-jdk1.8.0_66.txt javac root-jdk1.7.0_91: https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_91.txt checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/diff-checkstyle-root.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/testReport/ asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8304/console This message was automatically generated.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          I addressed checkstyle warnings in 004.

          • javac warnings are not related to the patch.
          • ASF License warning was caused by YARN-4234.
          Show
          iwasakims Masatake Iwasaki added a comment - I addressed checkstyle warnings in 004. javac warnings are not related to the patch. ASF License warning was caused by YARN-4234 .
          Hide
          iwasakims Masatake Iwasaki added a comment -

          The patch does not change the default behavior. Failed tests are flaky regardless of the patch and succeeded except TestReplicationPolicyConsiderLoad for which HDFS-9597 is already filed.

          Show
          iwasakims Masatake Iwasaki added a comment - The patch does not change the default behavior. Failed tests are flaky regardless of the patch and succeeded except TestReplicationPolicyConsiderLoad for which HDFS-9597 is already filed.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 46s trunk passed
          +1 compile 7m 59s trunk passed with JDK v1.8.0_66
          +1 compile 8m 49s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 57s trunk passed
          +1 mvnsite 1m 54s trunk passed
          +1 mvneclipse 0m 28s trunk passed
          +1 findbugs 3m 42s trunk passed
          +1 javadoc 2m 1s trunk passed with JDK v1.8.0_66
          +1 javadoc 2m 49s trunk passed with JDK v1.7.0_91
          +1 mvninstall 2m 35s the patch passed
          +1 compile 7m 58s the patch passed with JDK v1.8.0_66
          -1 javac 17m 17s root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 731, now 731).
          +1 javac 7m 58s the patch passed
          +1 compile 8m 49s the patch passed with JDK v1.7.0_91
          -1 javac 26m 7s root-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 724, now 724).
          +1 javac 8m 49s the patch passed
          +1 checkstyle 0m 58s the patch passed
          +1 mvnsite 1m 55s the patch passed
          +1 mvneclipse 0m 28s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 xml 0m 1s The patch has no ill-formed XML file.
          +1 findbugs 4m 4s the patch passed
          +1 javadoc 1m 57s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 50s the patch passed with JDK v1.7.0_91
          +1 unit 7m 21s hadoop-common in the patch passed with JDK v1.8.0_66.
          -1 unit 51m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          +1 unit 7m 36s hadoop-common in the patch passed with JDK v1.7.0_91.
          -1 unit 50m 34s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          -1 asflicense 0m 25s Patch generated 1 ASF License warnings.
          187m 12s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestNNThroughputBenchmark
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.TestNameNodeMetricsLogger
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779371/HADOOP-11252.004.patch
          JIRA Issue HADOOP-11252
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml
          uname Linux b2baea48b884 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0af492b
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          javac root-jdk1.8.0_66: https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/diff-compile-javac-root-jdk1.8.0_66.txt
          javac root-jdk1.7.0_91: https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_91.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: .
          Max memory used 75MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 7m 46s trunk passed +1 compile 7m 59s trunk passed with JDK v1.8.0_66 +1 compile 8m 49s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 57s trunk passed +1 mvnsite 1m 54s trunk passed +1 mvneclipse 0m 28s trunk passed +1 findbugs 3m 42s trunk passed +1 javadoc 2m 1s trunk passed with JDK v1.8.0_66 +1 javadoc 2m 49s trunk passed with JDK v1.7.0_91 +1 mvninstall 2m 35s the patch passed +1 compile 7m 58s the patch passed with JDK v1.8.0_66 -1 javac 17m 17s root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 731, now 731). +1 javac 7m 58s the patch passed +1 compile 8m 49s the patch passed with JDK v1.7.0_91 -1 javac 26m 7s root-jdk1.7.0_91 with JDK v1.7.0_91 generated 2 new issues (was 724, now 724). +1 javac 8m 49s the patch passed +1 checkstyle 0m 58s the patch passed +1 mvnsite 1m 55s the patch passed +1 mvneclipse 0m 28s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 xml 0m 1s The patch has no ill-formed XML file. +1 findbugs 4m 4s the patch passed +1 javadoc 1m 57s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 50s the patch passed with JDK v1.7.0_91 +1 unit 7m 21s hadoop-common in the patch passed with JDK v1.8.0_66. -1 unit 51m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66. +1 unit 7m 36s hadoop-common in the patch passed with JDK v1.7.0_91. -1 unit 50m 34s hadoop-hdfs in the patch failed with JDK v1.7.0_91. -1 asflicense 0m 25s Patch generated 1 ASF License warnings. 187m 12s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestNNThroughputBenchmark   hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.namenode.TestNameNodeMetricsLogger   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12779371/HADOOP-11252.004.patch JIRA Issue HADOOP-11252 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml uname Linux b2baea48b884 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 0af492b Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 javac root-jdk1.8.0_66: https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/diff-compile-javac-root-jdk1.8.0_66.txt javac root-jdk1.7.0_91: https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_91.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/testReport/ asflicense https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . Max memory used 75MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8308/console This message was automatically generated.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          LGTM, +1.

          Show
          ajisakaa Akira Ajisaka added a comment - LGTM, +1.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Committed v4 patch to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6. Thanks Wilfred Spiegelenburg and Masatake Iwasaki for the contribution, and thanks Andrew Wang for the review.

          Show
          ajisakaa Akira Ajisaka added a comment - Committed v4 patch to trunk, branch-2, branch-2.8, branch-2.7, and branch-2.6. Thanks Wilfred Spiegelenburg and Masatake Iwasaki for the contribution, and thanks Andrew Wang for the review.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9043 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9043/)
          HADOOP-11252. RPC client does not time out by default. Contributed by (aajisaka: rev 64ae85fd2ea91f46ab3b21f007befbeef8c3c947)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9043 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9043/ ) HADOOP-11252 . RPC client does not time out by default. Contributed by (aajisaka: rev 64ae85fd2ea91f46ab3b21f007befbeef8c3c947) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java hadoop-common-project/hadoop-common/src/main/resources/core-default.xml hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java hadoop-common-project/hadoop-common/CHANGES.txt hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRPC.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Closing the JIRA as part of 2.7.3 release.
          Hide
          chiwanpark Chiwan Park added a comment -

          Hi all, I have a question about this issue.

          Since 2.7.3 release, Client.get/setPingInterval is changed from public to package-private. Is there any reason to change visibility? This change breaks *source-level* compatibility. Giraph is one of broken examples for this changes. (https://github.com/apache/giraph/blob/release-1.0/giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java#L202)

          Show
          chiwanpark Chiwan Park added a comment - Hi all, I have a question about this issue. Since 2.7.3 release, Client.get/setPingInterval is changed from public to package-private. Is there any reason to change visibility? This change breaks * source-level * compatibility. Giraph is one of broken examples for this changes. ( https://github.com/apache/giraph/blob/release-1.0/giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java#L202 )
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Chiwan Park sorry for inconvenience. I think it looks our mistake, and it should be reverted from 2.7.x branch based on Hadoop compatibility guide.

          I will revert it from branch-2.7 if there are no opposition until tomorrow.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Chiwan Park sorry for inconvenience. I think it looks our mistake, and it should be reverted from 2.7.x branch based on Hadoop compatibility guide. I will revert it from branch-2.7 if there are no opposition until tomorrow.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Thanks Chiwan Park for the report.

          Is there any reason to change visibility? This change breaks source-level compatibility.

          Client is annotated as @InterfaceStability.Evolving, so we can change the API between minor releases, but not between point releases. I'll file a jira for fix it.

          Giraph is one of broken examples for this changes. (https://github.com/apache/giraph/blob/release-1.0/giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java#L202)

          You can use the following code instead.

          giraphConfiguration.setInt("ipc.ping.interval", 60000 * 5);
          
          Show
          ajisakaa Akira Ajisaka added a comment - Thanks Chiwan Park for the report. Is there any reason to change visibility? This change breaks source-level compatibility. Client is annotated as @InterfaceStability.Evolving , so we can change the API between minor releases, but not between point releases. I'll file a jira for fix it. Giraph is one of broken examples for this changes. ( https://github.com/apache/giraph/blob/release-1.0/giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java#L202 ) You can use the following code instead. giraphConfiguration.setInt( "ipc.ping.interval" , 60000 * 5);
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Filed HADOOP-13579.

          Show
          ajisakaa Akira Ajisaka added a comment - Filed HADOOP-13579 .
          Hide
          ajisakaa Akira Ajisaka added a comment -

          I will revert it from branch-2.7 if there are no opposition until tomorrow.

          I'm thinking we should fix in a separate jira because HADOOP-11252 is included in 2.7.3, which is already released. If we revert this and fix the compatibility in HADOOP-11252, there is a conflict in release note.

          Show
          ajisakaa Akira Ajisaka added a comment - I will revert it from branch-2.7 if there are no opposition until tomorrow. I'm thinking we should fix in a separate jira because HADOOP-11252 is included in 2.7.3, which is already released. If we revert this and fix the compatibility in HADOOP-11252 , there is a conflict in release note.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          Replied on HADOOP-13579.

          Show
          ozawa Tsuyoshi Ozawa added a comment - Replied on HADOOP-13579 .
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Replied on HADOOP-13579.

          Thanks!

          It's our mistake. Sorry for the inconvenience, Chiwan Park.

          Show
          ajisakaa Akira Ajisaka added a comment - Replied on HADOOP-13579 . Thanks! It's our mistake. Sorry for the inconvenience, Chiwan Park .
          Hide
          chiwanpark Chiwan Park added a comment -

          Thanks for explanation all!

          Show
          chiwanpark Chiwan Park added a comment - Thanks for explanation all!
          Hide
          ajisakaa Akira Ajisaka added a comment -

          You can use the following code instead.

          giraphConfiguration.setInt("ipc.ping.interval", 60000 * 5);
          

          Filed GIRAPH-1110 for this.

          Show
          ajisakaa Akira Ajisaka added a comment - You can use the following code instead. giraphConfiguration.setInt( "ipc.ping.interval" , 60000 * 5); Filed GIRAPH-1110 for this.
          Hide
          ozawa Tsuyoshi Ozawa added a comment - - edited

          Akira Ajisaka

          giraphConfiguration.setInt("ipc.ping.interval", 60000 * 5);
          

          If an user need to change the value from program, we should keep the API public. It means that "user can change the IPC ping interval via program in Hadoop". In this case, why the API should be removed?

          If an user shouldn't change the value, we can make the API private. In that case, your suggestion on GIRAPH-1110 can break something - it works by chance for now, but future change which is assuming that the ipc timeout is private can cause a problem). Does it make sense to you?

          Show
          ozawa Tsuyoshi Ozawa added a comment - - edited Akira Ajisaka giraphConfiguration.setInt( "ipc.ping.interval" , 60000 * 5); If an user need to change the value from program, we should keep the API public. It means that "user can change the IPC ping interval via program in Hadoop". In this case, why the API should be removed? If an user shouldn't change the value, we can make the API private. In that case, your suggestion on GIRAPH-1110 can break something - it works by chance for now, but future change which is assuming that the ipc timeout is private can cause a problem). Does it make sense to you?
          Hide
          ajisakaa Akira Ajisaka added a comment - - edited

          If an user need to change the value from program

          Yes. Users may change the value from program.

          In this case, why the API should be removed?

          This API is unused in Hadoop except test code, and Client.java is not annotated with @InterfaceAudience.Stable, so we decided to remove it.
          Andrew Wang and Masatake Iwasaki, please correct me if I'm wrong.

          Show
          ajisakaa Akira Ajisaka added a comment - - edited If an user need to change the value from program Yes. Users may change the value from program. In this case, why the API should be removed? This API is unused in Hadoop except test code, and Client.java is not annotated with @InterfaceAudience.Stable , so we decided to remove it. Andrew Wang and Masatake Iwasaki , please correct me if I'm wrong.
          Hide
          ajisakaa Akira Ajisaka added a comment -

          Now I'm thinking it's good for users to deprecate these APIs and cherry-pick HADOOP-13579 to branch-2/2.8.

          Show
          ajisakaa Akira Ajisaka added a comment - Now I'm thinking it's good for users to deprecate these APIs and cherry-pick HADOOP-13579 to branch-2/2.8.
          Hide
          ozawa Tsuyoshi Ozawa added a comment -

          This API is unused in Hadoop except test code, and Client.java is not annotated with @InterfaceAudience.Stable, so we decided to remove it.

          We should have a design-level discussion here or another jira including:

          • Should we accept the change of the value from program?
            • If the answer is positive, does it affect the policy or maintenance of the value of ipc.client.rpc-timeout.ms? e.g. IPC_PING_INTERVAL_KEY must be smaller than IPC_CLIENT_RPC_TIMEOUT_KEY, etc.
          Show
          ozawa Tsuyoshi Ozawa added a comment - This API is unused in Hadoop except test code, and Client.java is not annotated with @InterfaceAudience.Stable, so we decided to remove it. We should have a design-level discussion here or another jira including: Should we accept the change of the value from program? If the answer is positive, does it affect the policy or maintenance of the value of ipc.client.rpc-timeout.ms? e.g. IPC_PING_INTERVAL_KEY must be smaller than IPC_CLIENT_RPC_TIMEOUT_KEY, etc.
          Hide
          iwasakims Masatake Iwasaki added a comment -

          Should we accept the change of the value from program?

          I think there is no usecase for applications and donstream projects. org.apache.hadoop.ipc is marked as LimitedPrivate and is not publiched in javadoc. Though HBase is in the audience, HBase has its own ipc code now.

          Show
          iwasakims Masatake Iwasaki added a comment - Should we accept the change of the value from program? I think there is no usecase for applications and donstream projects. org.apache.hadoop.ipc is marked as LimitedPrivate and is not publiched in javadoc . Though HBase is in the audience, HBase has its own ipc code now.

            People

            • Assignee:
              iwasakims Masatake Iwasaki
              Reporter:
              wilfreds Wilfred Spiegelenburg
            • Votes:
              2 Vote for this issue
              Watchers:
              36 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development