Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-11552

Allow handoff on the server side for RPC requests

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: ipc
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      An RPC server handler thread is tied up for each incoming RPC request. This isn't ideal, since this essentially implies that RPC operations should be short lived, and most operations which could take time end up falling back to a polling mechanism.

      Some use cases where this is useful.

      • YARN submitApplication - which currently submits, followed by a poll to check if the application is accepted while the submit operation is written out to storage. This can be collapsed into a single call.
      • YARN allocate - requests and allocations use the same protocol. New allocations are received via polling.
        The allocate protocol could be split into a request/heartbeat along with a 'awaitResponse'. The request/heartbeat is sent only when there's a request or on a much longer heartbeat interval. awaitResponse is always left active with the RM - and returns the moment something is available.
        MapReduce/Tez task to AM communication is another example of this pattern.

      The same pattern of splitting calls can be used for other protocols as well. This should serve to improve latency, as well as reduce network traffic since the keep-alive heartbeat can be sent less frequently.

      I believe there's some cases in HDFS as well, where the DN gets told to perform some operations when they heartbeat into the NN.

      1. HADOOP-11552.05.patch
        38 kB
        Siddharth Seth
      2. HADOOP-11552.06.patch
        39 kB
        Siddharth Seth
      3. HADOOP-11552.07.patch
        39 kB
        Siddharth Seth
      4. HADOOP-11552.08.patch
        39 kB
        Jian He
      5. HADOOP-11552.1.wip.txt
        18 kB
        Siddharth Seth
      6. HADOOP-11552.2.txt
        30 kB
        Siddharth Seth
      7. HADOOP-11552.3.txt
        31 kB
        Siddharth Seth
      8. HADOOP-11552.3.txt
        30 kB
        Siddharth Seth
      9. HADOOP-11552.4.txt
        30 kB
        Siddharth Seth

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10882 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10882/)
          HADOOP-11552. Allow handoff on the server side for RPC requests. (jianhe: rev 3d94da1e00fc6238fad458e415219f87920f1fc3)

          • (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngineCallback.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcDetailedMetrics.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcMetrics.java
          • (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpcServerHandoff.java
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/WritableRpcEngine.java
          • (edit) hadoop-common-project/hadoop-common/src/test/proto/test.proto
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
          • (edit) hadoop-common-project/hadoop-common/src/test/proto/test_rpc_service.proto
          • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java
          • (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRpcServerHandoff.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10882 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10882/ ) HADOOP-11552 . Allow handoff on the server side for RPC requests. (jianhe: rev 3d94da1e00fc6238fad458e415219f87920f1fc3) (add) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngineCallback.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcDetailedMetrics.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcMetrics.java (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpcServerHandoff.java (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/WritableRpcEngine.java (edit) hadoop-common-project/hadoop-common/src/test/proto/test.proto (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java (edit) hadoop-common-project/hadoop-common/src/test/proto/test_rpc_service.proto (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/ProtobufRpcEngine.java (add) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestRpcServerHandoff.java
          Hide
          jianhe Jian He added a comment -

          Committed to trunk and branch-2, thanks Siddharth Seth !

          Thanks Arun Suresh for reviewing the patch !

          Show
          jianhe Jian He added a comment - Committed to trunk and branch-2, thanks Siddharth Seth ! Thanks Arun Suresh for reviewing the patch !
          Hide
          jianhe Jian He added a comment -

          We'll get this committed tomorrow, if no more comments from others

          Show
          jianhe Jian He added a comment - We'll get this committed tomorrow, if no more comments from others
          Hide
          asuresh Arun Suresh added a comment -

          +1 from me too.

          Show
          asuresh Arun Suresh added a comment - +1 from me too.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 46s trunk passed
          +1 compile 9m 30s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 23s trunk passed
          +1 javadoc 0m 48s trunk passed
          +1 mvninstall 0m 38s the patch passed
          +1 compile 9m 10s the patch passed
          +1 cc 9m 10s the patch passed
          +1 javac 9m 10s the patch passed
          -0 checkstyle 0m 33s hadoop-common-project/hadoop-common: The patch generated 34 new + 277 unchanged - 1 fixed = 311 total (was 278)
          +1 mvnsite 1m 1s the patch passed
          +1 mvneclipse 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 35s the patch passed
          +1 javadoc 0m 47s the patch passed
          +1 unit 8m 25s hadoop-common in the patch passed.
          +1 asflicense 0m 31s The patch does not generate ASF License warnings.
          45m 26s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-11552
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839732/HADOOP-11552.08.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux f1ae256e6919 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c68dad1
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 6m 46s trunk passed +1 compile 9m 30s trunk passed +1 checkstyle 0m 32s trunk passed +1 mvnsite 1m 2s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 1m 23s trunk passed +1 javadoc 0m 48s trunk passed +1 mvninstall 0m 38s the patch passed +1 compile 9m 10s the patch passed +1 cc 9m 10s the patch passed +1 javac 9m 10s the patch passed -0 checkstyle 0m 33s hadoop-common-project/hadoop-common: The patch generated 34 new + 277 unchanged - 1 fixed = 311 total (was 278) +1 mvnsite 1m 1s the patch passed +1 mvneclipse 0m 18s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 1m 35s the patch passed +1 javadoc 0m 47s the patch passed +1 unit 8m 25s hadoop-common in the patch passed. +1 asflicense 0m 31s The patch does not generate ASF License warnings. 45m 26s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-11552 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839732/HADOOP-11552.08.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc uname Linux f1ae256e6919 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / c68dad1 Default Java 1.8.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11108/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jianhe Jian He added a comment -

          latest patch looks good to me.
          I just fixed the findbug warning myself.

          Show
          jianhe Jian He added a comment - latest patch looks good to me. I just fixed the findbug warning myself.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 43s trunk passed
          +1 compile 9m 30s trunk passed
          +1 checkstyle 0m 34s trunk passed
          +1 mvnsite 1m 4s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 25s trunk passed
          +1 javadoc 0m 48s trunk passed
          +1 mvninstall 0m 39s the patch passed
          +1 compile 9m 17s the patch passed
          +1 cc 9m 17s the patch passed
          +1 javac 9m 17s the patch passed
          -0 checkstyle 0m 33s hadoop-common-project/hadoop-common: The patch generated 38 new + 277 unchanged - 1 fixed = 315 total (was 278)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 35s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 48s the patch passed
          +1 unit 8m 25s hadoop-common in the patch passed.
          +1 asflicense 0m 32s The patch does not generate ASF License warnings.
          45m 36s



          Reason Tests
          FindBugs module:hadoop-common-project/hadoop-common
            Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:[line 946]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-11552
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839647/HADOOP-11552.07.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 2d0c956bc40e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 7584fbf
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 6m 43s trunk passed +1 compile 9m 30s trunk passed +1 checkstyle 0m 34s trunk passed +1 mvnsite 1m 4s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 1m 25s trunk passed +1 javadoc 0m 48s trunk passed +1 mvninstall 0m 39s the patch passed +1 compile 9m 17s the patch passed +1 cc 9m 17s the patch passed +1 javac 9m 17s the patch passed -0 checkstyle 0m 33s hadoop-common-project/hadoop-common: The patch generated 38 new + 277 unchanged - 1 fixed = 315 total (was 278) +1 mvnsite 1m 0s the patch passed +1 mvneclipse 0m 18s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 1m 35s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 0m 48s the patch passed +1 unit 8m 25s hadoop-common in the patch passed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 45m 36s Reason Tests FindBugs module:hadoop-common-project/hadoop-common   Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java: [line 946] Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-11552 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839647/HADOOP-11552.07.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc uname Linux 2d0c956bc40e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 7584fbf Default Java 1.8.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11102/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Thanks for taking a look, and pointing out the issue with setDeferredResponse.
          Modified to not attempt a sendResponse - that's what is done for non deferred calls as well (LOG and forget).

          Show
          sseth Siddharth Seth added a comment - Thanks for taking a look, and pointing out the issue with setDeferredResponse. Modified to not attempt a sendResponse - that's what is done for non deferred calls as well (LOG and forget).
          Hide
          asuresh Arun Suresh added a comment -

          The patch looks good to me and the tests help understand the code path, Thanks Siddharth Seth..

          Minor nit:
          In Server::setDeferedResponse(), if setupResponse() throws an error, you still call sendDefferredResponse(). Wondering if you should just send a canned response.. or atleast something derived from the Exception.

          Show
          asuresh Arun Suresh added a comment - The patch looks good to me and the tests help understand the code path, Thanks Siddharth Seth .. Minor nit: In Server::setDeferedResponse(), if setupResponse() throws an error, you still call sendDefferredResponse(). Wondering if you should just send a canned response.. or atleast something derived from the Exception.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 8m 7s trunk passed
          +1 compile 10m 0s trunk passed
          +1 checkstyle 0m 31s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 1m 25s trunk passed
          +1 javadoc 0m 45s trunk passed
          +1 mvninstall 0m 36s the patch passed
          +1 compile 9m 6s the patch passed
          +1 cc 9m 6s the patch passed
          +1 javac 9m 6s the patch passed
          -0 checkstyle 0m 32s hadoop-common-project/hadoop-common: The patch generated 38 new + 275 unchanged - 1 fixed = 313 total (was 276)
          +1 mvnsite 1m 5s the patch passed
          +1 mvneclipse 0m 19s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 53s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 45s the patch passed
          +1 unit 7m 41s hadoop-common in the patch passed.
          +1 asflicense 0m 32s The patch does not generate ASF License warnings.
          46m 40s



          Reason Tests
          FindBugs module:hadoop-common-project/hadoop-common
            Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:[line 945]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-11552
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839098/HADOOP-11552.06.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 45e626f7a172 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 7ef290c
          Default Java 1.8.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 8m 7s trunk passed +1 compile 10m 0s trunk passed +1 checkstyle 0m 31s trunk passed +1 mvnsite 1m 2s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 1m 25s trunk passed +1 javadoc 0m 45s trunk passed +1 mvninstall 0m 36s the patch passed +1 compile 9m 6s the patch passed +1 cc 9m 6s the patch passed +1 javac 9m 6s the patch passed -0 checkstyle 0m 32s hadoop-common-project/hadoop-common: The patch generated 38 new + 275 unchanged - 1 fixed = 313 total (was 276) +1 mvnsite 1m 5s the patch passed +1 mvneclipse 0m 19s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 1m 53s hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 0m 45s the patch passed +1 unit 7m 41s hadoop-common in the patch passed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 46m 40s Reason Tests FindBugs module:hadoop-common-project/hadoop-common   Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java: [line 945] Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-11552 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12839098/HADOOP-11552.06.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc uname Linux 45e626f7a172 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 7ef290c Default Java 1.8.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11079/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Revised to address review comments, and fix the findbugs warnings.

          Show
          sseth Siddharth Seth added a comment - Revised to address review comments, and fix the findbugs warnings.
          Hide
          jianhe Jian He added a comment -

          Daryn Sharp, would you like to look at the patch ? If things are going well, we plan to commit this soon.

          Show
          jianhe Jian He added a comment - Daryn Sharp , would you like to look at the patch ? If things are going well, we plan to commit this soon.
          Hide
          jianhe Jian He added a comment -

          looks good to me overall, minor comments:

          • rename " updateMetrics(String name, long processingTime) {" to updateDeferredMetrics ?
          • Should rpcMetrics.addRpcProcessingTime(processingTime); be inside the if (!deferredCall) condition, also the logSlowRpcCalls.
                rpcMetrics.addRpcQueueTime(queueTime);
                rpcMetrics.addRpcProcessingTime(processingTime);
                if (!deferredCall) {
                  rpcDetailedMetrics.addProcessingTime(name, processingTime);
                  callQueue.addResponseTime(name, getPriorityLevel(), queueTime,
                      processingTime);
                }
                if (isLogSlowRPC()) {
                  logSlowRpcCalls(name, processingTime);
                }
            
          • Add assert.fail after "BytesWritable response = (BytesWritable) future.get();" ?
                  server.sendError();
                  try {
                    BytesWritable response = (BytesWritable) future.get();
                  } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    Assert.assertTrue(cause instanceof RemoteException);
                    RemoteException re = (RemoteException) cause;
                    Assert.assertTrue(re.toString().contains("DeferredError"));
                  }
            
          • several lines exceed the 80 column limit, mind fixing those?
          • also, the findbugs warnings
          Show
          jianhe Jian He added a comment - looks good to me overall, minor comments: rename " updateMetrics(String name, long processingTime) {" to updateDeferredMetrics ? Should rpcMetrics.addRpcProcessingTime(processingTime); be inside the if (!deferredCall) condition, also the logSlowRpcCalls . rpcMetrics.addRpcQueueTime(queueTime); rpcMetrics.addRpcProcessingTime(processingTime); if (!deferredCall) { rpcDetailedMetrics.addProcessingTime(name, processingTime); callQueue.addResponseTime(name, getPriorityLevel(), queueTime, processingTime); } if (isLogSlowRPC()) { logSlowRpcCalls(name, processingTime); } Add assert.fail after "BytesWritable response = (BytesWritable) future.get();" ? server.sendError(); try { BytesWritable response = (BytesWritable) future .get(); } catch (ExecutionException e) { Throwable cause = e.getCause(); Assert.assertTrue(cause instanceof RemoteException); RemoteException re = (RemoteException) cause; Assert.assertTrue(re.toString().contains( "DeferredError" )); } several lines exceed the 80 column limit, mind fixing those? also, the findbugs warnings
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 1s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 6m 45s trunk passed
          +1 compile 9m 35s trunk passed
          +1 checkstyle 0m 32s trunk passed
          +1 mvnsite 1m 2s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 1m 23s trunk passed
          +1 javadoc 0m 48s trunk passed
          +1 mvninstall 0m 37s the patch passed
          +1 compile 9m 10s the patch passed
          +1 cc 9m 10s the patch passed
          +1 javac 9m 10s the patch passed
          -0 checkstyle 0m 34s hadoop-common-project/hadoop-common: The patch generated 79 new + 276 unchanged - 0 fixed = 355 total (was 276)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 18s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 1m 34s hadoop-common-project/hadoop-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)
          +1 javadoc 0m 49s the patch passed
          +1 unit 8m 22s hadoop-common in the patch passed.
          +1 asflicense 0m 32s The patch does not generate ASF License warnings.
          45m 26s



          Reason Tests
          FindBugs module:hadoop-common-project/hadoop-common
            Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:[line 941]
            Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.setDeferredError(Throwable) At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.setDeferredError(Throwable) At Server.java:[line 985]
            Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.setDeferredResponse(Writable) At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.setDeferredResponse(Writable) At Server.java:[line 959]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue HADOOP-11552
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838809/HADOOP-11552.05.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc
          uname Linux 807b14e2d261 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6efb8c9
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt
          findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html
          Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/testReport/
          modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 1s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 6m 45s trunk passed +1 compile 9m 35s trunk passed +1 checkstyle 0m 32s trunk passed +1 mvnsite 1m 2s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 1m 23s trunk passed +1 javadoc 0m 48s trunk passed +1 mvninstall 0m 37s the patch passed +1 compile 9m 10s the patch passed +1 cc 9m 10s the patch passed +1 javac 9m 10s the patch passed -0 checkstyle 0m 34s hadoop-common-project/hadoop-common: The patch generated 79 new + 276 unchanged - 0 fixed = 355 total (was 276) +1 mvnsite 1m 0s the patch passed +1 mvneclipse 0m 18s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 1m 34s hadoop-common-project/hadoop-common generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) +1 javadoc 0m 49s the patch passed +1 unit 8m 22s hadoop-common in the patch passed. +1 asflicense 0m 32s The patch does not generate ASF License warnings. 45m 26s Reason Tests FindBugs module:hadoop-common-project/hadoop-common   Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.sendDeferedResponse() At Server.java: [line 941]   Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.setDeferredError(Throwable) At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.setDeferredError(Throwable) At Server.java: [line 985]   Invocation of toString on Server$Call.clientId in org.apache.hadoop.ipc.Server$RpcCall.setDeferredResponse(Writable) At Server.java:in org.apache.hadoop.ipc.Server$RpcCall.setDeferredResponse(Writable) At Server.java: [line 959] Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue HADOOP-11552 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838809/HADOOP-11552.05.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc uname Linux 807b14e2d261 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6efb8c9 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt findbugs https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/artifact/patchprocess/new-findbugs-hadoop-common-project_hadoop-common.html Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/testReport/ modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/11058/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Rebased patch for trunk. This makes some changes for metrics as well.

          Show
          sseth Siddharth Seth added a comment - Rebased patch for trunk. This makes some changes for metrics as well.
          Hide
          daryn Daryn Sharp added a comment -

          The patch is interesting (I planted the idea for this type of functionality). HADOOP-10300 is a bit different. It does allow for a return value but not a dynamic response. The ipc processing occurs exactly as normal, down through the engine, back up, response encoded. The difference is the ability to tell the ipc later "hold on, don't send the response, there's another precondition that must be satisfied". Edit logging for hdfs.

          This lets the processing return nothing, allowing something else to later encode an arbitrary response.

          Patch needs extensive rebasing, but some things we'll need to be careful about:

          • ensure encrypted SASL works correctly, there are subtle ordering issues.
          • consider how to deal with the static Server methods that depend on a thread-local for the call.
          • how to make the rpc metrics accurate, otherwise processing time becomes meaningless
          Show
          daryn Daryn Sharp added a comment - The patch is interesting (I planted the idea for this type of functionality). HADOOP-10300 is a bit different. It does allow for a return value but not a dynamic response. The ipc processing occurs exactly as normal, down through the engine, back up, response encoded. The difference is the ability to tell the ipc later "hold on, don't send the response, there's another precondition that must be satisfied". Edit logging for hdfs. This lets the processing return nothing, allowing something else to later encode an arbitrary response. Patch needs extensive rebasing, but some things we'll need to be careful about: ensure encrypted SASL works correctly, there are subtle ordering issues. consider how to deal with the static Server methods that depend on a thread-local for the call. how to make the rpc metrics accurate, otherwise processing time becomes meaningless
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          0 patch 0m 4s The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions.
          -1 patch 0m 7s HADOOP-11552 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue HADOOP-11552
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10514/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. 0 patch 0m 4s The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. -1 patch 0m 7s HADOOP-11552 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HADOOP-11552 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/10514/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jianhe Jian He added a comment -

          Hi Siddharth Seth, I looked at the patch. the approach looks good to me. Could you rebase it ?
          I'm going to use this for the relocalize API in YARN.

          Show
          jianhe Jian He added a comment - Hi Siddharth Seth , I looked at the patch. the approach looks good to me. Could you rebase it ? I'm going to use this for the relocalize API in YARN.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          0 patch 0m 2s The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions.
          -1 patch 0m 5s HADOOP-11552 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt
          JIRA Issue HADOOP-11552
          Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8830/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. 0 patch 0m 2s The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. -1 patch 0m 5s HADOOP-11552 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt JIRA Issue HADOOP-11552 Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/8830/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          cc/ Vinod Kumar Vavilapalli - thoughts on making YARN changes in a branch ?

          Show
          sseth Siddharth Seth added a comment - cc/ Vinod Kumar Vavilapalli - thoughts on making YARN changes in a branch ?
          Hide
          sseth Siddharth Seth added a comment -

          I'm interested in getting this patch into a released version of Hadoop. Having it in a released version does make it easier to consume for downstream projects; and I do intend to use this feature in Tez - and that can serve as another testbed. Was hoping to get this into 2.7, but it's too late for that. Will change the target version to 2.8 - which gives more breathing room to have it reviewed, and tried out in components within Hadoop.

          There isn't that much work in the RPC layer itself. Follow up patches like the shared thread pool will be more disruptive. When this is used by YARN / HDFS - those patches are likely to be more involved, and a larger change set. I can create jiras for some of the YARN tasks, and would request folks in HDFS to create relevant jiras there.

          This could absolutely be done in a branch. If this particular patch is considered 'safe' - it'd be good to get it into 2.8 even if the rest of the work to use it in sub-components isn't done.

          HADOOP-10300 is related, and this patch borrows elements from there - like I mentioned in my first comment. If I'm not mistaken, 10300 doesn't allow for a return value. Daryn could correct me here if I've understood that incorrectly.

          Multiplexing UGIs over a single connection - that's TBD right ? We still use distinct connections per UGI if I'm not mistaken. Don't think the patch affects this path. Are there plans to support multiplexing responses on a connection - i.e. allow a smaller response through, even if the responder isn't done with a previous response on the same connection ?

          Show
          sseth Siddharth Seth added a comment - I'm interested in getting this patch into a released version of Hadoop. Having it in a released version does make it easier to consume for downstream projects; and I do intend to use this feature in Tez - and that can serve as another testbed. Was hoping to get this into 2.7, but it's too late for that. Will change the target version to 2.8 - which gives more breathing room to have it reviewed, and tried out in components within Hadoop. There isn't that much work in the RPC layer itself. Follow up patches like the shared thread pool will be more disruptive. When this is used by YARN / HDFS - those patches are likely to be more involved, and a larger change set. I can create jiras for some of the YARN tasks, and would request folks in HDFS to create relevant jiras there. This could absolutely be done in a branch. If this particular patch is considered 'safe' - it'd be good to get it into 2.8 even if the rest of the work to use it in sub-components isn't done. HADOOP-10300 is related, and this patch borrows elements from there - like I mentioned in my first comment. If I'm not mistaken, 10300 doesn't allow for a return value. Daryn could correct me here if I've understood that incorrectly. Multiplexing UGIs over a single connection - that's TBD right ? We still use distinct connections per UGI if I'm not mistaken. Don't think the patch affects this path. Are there plans to support multiplexing responses on a connection - i.e. allow a smaller response through, even if the responder isn't done with a previous response on the same connection ?
          Hide
          daryn Daryn Sharp added a comment -

          I'm buried with support but I'll try to look at this ASAP. We've offline talked about wanting to do this for quite awhile. We need to make sure this doesn't introduce any behavior that will prevent multiplexing ugis over a single connection - something specifically considered in RPCv9 and quickly becoming important for very large cluster to reduce connections.

          Show
          daryn Daryn Sharp added a comment - I'm buried with support but I'll try to look at this ASAP. We've offline talked about wanting to do this for quite awhile. We need to make sure this doesn't introduce any behavior that will prevent multiplexing ugis over a single connection - something specifically considered in RPCv9 and quickly becoming important for very large cluster to reduce connections.
          Hide
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt
          against trunk revision 79f7f2a.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6035//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6035//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708476/HADOOP-11552.4.txt against trunk revision 79f7f2a. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6035//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6035//console This message is automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment - - edited

          This work is really interesting and I think it could be helpful for us. I guess my concern is mainly that it seems like a big job. Do you think it makes sense to do this in a branch? That way we could experiment without the risk of destabilizing the production RPC code. We could test out some use-cases and see what the best API was.

          I'm also curious if you have taken a look at HADOOP-10300. It seems similar in some ways.

          Show
          cmccabe Colin P. McCabe added a comment - - edited This work is really interesting and I think it could be helpful for us. I guess my concern is mainly that it seems like a big job. Do you think it makes sense to do this in a branch? That way we could experiment without the risk of destabilizing the production RPC code. We could test out some use-cases and see what the best API was. I'm also curious if you have taken a look at HADOOP-10300 . It seems similar in some ways.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708349/HADOOP-11552.3.txt
          against trunk revision 85dc3c1.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6032//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708349/HADOOP-11552.3.txt against trunk revision 85dc3c1. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6032//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708349/HADOOP-11552.3.txt
          against trunk revision 85dc3c1.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6031//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708349/HADOOP-11552.3.txt against trunk revision 85dc3c1. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6031//console This message is automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Updated patch to fix the rat check and javac warning. I don't think the unit test failure is related.

          I would like to see some code actually using this before we add it, to make sure we are getting the APIs right.

          Colin P. McCabe - the APIs are being added as unstable for now. There's at least one follow up jira to make use a common thread pool to handle requests and send responses - or at least have the RPC layer provide a pool for responses. At the mement apps would have to setup threads to send responses in parallel.
          The test does show how this would be used in an application - where the specific method on a protocol would need to indicate intent to send a delayed response, and relinquish control.
          I think it'll be useful to get this in - so that it's possible to start trying out this mechanism.

          Show
          sseth Siddharth Seth added a comment - Updated patch to fix the rat check and javac warning. I don't think the unit test failure is related. I would like to see some code actually using this before we add it, to make sure we are getting the APIs right. Colin P. McCabe - the APIs are being added as unstable for now. There's at least one follow up jira to make use a common thread pool to handle requests and send responses - or at least have the RPC layer provide a pool for responses. At the mement apps would have to setup threads to send responses in parallel. The test does show how this would be used in an application - where the specific method on a protocol would need to indicate intent to send a delayed response, and relinquish control. I think it'll be useful to get this in - so that it's possible to start trying out this mechanism.
          Hide
          sanjay.radia Sanjay Radia added a comment -

          You mean more than tests?

          Show
          sanjay.radia Sanjay Radia added a comment - You mean more than tests?
          Hide
          cmccabe Colin P. McCabe added a comment -

          I would like to see some code actually using this before we add it, to make sure we are getting the APIs right.

          Show
          cmccabe Colin P. McCabe added a comment - I would like to see some code actually using this before we add it, to make sure we are getting the APIs right.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708198/HADOOP-11552.3.txt
          against trunk revision 5358b83.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1148 warnings).

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ipc.TestCallQueueManager

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//artifact/patchprocess/patchReleaseAuditProblems.txt
          Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//artifact/patchprocess/diffJavacWarnings.txt
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708198/HADOOP-11552.3.txt against trunk revision 5358b83. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 javac . The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1148 warnings). +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. -1 release audit . The applied patch generated 1 release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ipc.TestCallQueueManager Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//artifact/patchprocess/patchReleaseAuditProblems.txt Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6026//console This message is automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Updated patch to apply cleanly to trunk.

          Show
          sseth Siddharth Seth added a comment - Updated patch to apply cleanly to trunk.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12708089/HADOOP-11552.2.txt
          against trunk revision 1ed9fb7.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6023//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708089/HADOOP-11552.2.txt against trunk revision 1ed9fb7. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 4 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6023//console This message is automatically generated.
          Hide
          sseth Siddharth Seth added a comment -

          Updated patch with another couple of tests, and error handling.

          Sanjay Radia, Daryn Sharp - please review. Would appreciate it if you could pay additional attention to ensuring this doesn't break the regular flow. I've tried to keep the changes to a minimal for the regular flow.

          Show
          sseth Siddharth Seth added a comment - Updated patch with another couple of tests, and error handling. Sanjay Radia , Daryn Sharp - please review. Would appreciate it if you could pay additional attention to ensuring this doesn't break the regular flow. I've tried to keep the changes to a minimal for the regular flow.
          Hide
          sanjay.radia Sanjay Radia added a comment -

          If we move to an offer-based system like Mesos,

          You are mixing layers. SId is talking about the RPC layer. The layer above RPC such as how Yarn resources are obtained and used will be unaffected.

          have the resource manager make outgoing connections to the executors

          Making outgoing connections as you suggest is another valid approach. For that to work well we need client-side async support while this jira is proposing a server-side "async" (I put async in quotes because in my mind the hand-off is not asycn-rpc since the rpc client blocks till the work is done).

          Another good usecase for this jira is the the write-operations on the NN that write to the journal. Such operations should be handed off to a worker thread who writes to the journal and then replies. The original handler-thread goes back to serving new requests as soon as the hand off is done. If we do this we could drastically reduce the number of handler threads needed in NN (you already noted the reduction in handler threads for the other use case).

          Show
          sanjay.radia Sanjay Radia added a comment - If we move to an offer-based system like Mesos, You are mixing layers. SId is talking about the RPC layer. The layer above RPC such as how Yarn resources are obtained and used will be unaffected. have the resource manager make outgoing connections to the executors Making outgoing connections as you suggest is another valid approach. For that to work well we need client-side async support while this jira is proposing a server-side "async" (I put async in quotes because in my mind the hand-off is not asycn-rpc since the rpc client blocks till the work is done). Another good usecase for this jira is the the write-operations on the NN that write to the journal. Such operations should be handed off to a worker thread who writes to the journal and then replies. The original handler-thread goes back to serving new requests as soon as the hand off is done. If we do this we could drastically reduce the number of handler threads needed in NN (you already noted the reduction in handler threads for the other use case).
          Hide
          sanjay.radia Sanjay Radia added a comment -

          Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing?

          Yes, the intent is to keep the TPC session open and re-use the handlers

          Note our RPC system forces the handler thread to do the response and hence we have to have a large number of handler threads since some of the requests (such a write operation on a NN) takes a longer because it has to write to the journal. Other RPC system and also request-response message passing systems allow hand-off to any thread to do the work and reply. The TCP connection being kept open is not due to the handler thread-binding, but it is instead because our RCP layer depends on a connection close to detect server failures (and i believe we send some heartbeat bytes to detect server failures promptly). So we need to keep the connection open if the RPC is operation is not completed.
          Now the impact on RCP connections that you raised:

          • for normal end-clients (e.g. HDFS clients) the connections will remain open as in the original case - ie the till the request is completed and reply is sent. Hence the number of such connections will be the same.
          • for internal clients where the request is of type "do you have more work for me" (as sent by DN or NM) the number of connections will increase but will be bounded. Here we can have a hybrid approach where the the RM could keep a few requests blocked and reply only when work is available and for other such requests it could say "no work, but try 2 seconds later".
          Show
          sanjay.radia Sanjay Radia added a comment - Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? Yes, the intent is to keep the TPC session open and re-use the handlers Note our RPC system forces the handler thread to do the response and hence we have to have a large number of handler threads since some of the requests (such a write operation on a NN) takes a longer because it has to write to the journal. Other RPC system and also request-response message passing systems allow hand-off to any thread to do the work and reply. The TCP connection being kept open is not due to the handler thread-binding, but it is instead because our RCP layer depends on a connection close to detect server failures (and i believe we send some heartbeat bytes to detect server failures promptly). So we need to keep the connection open if the RPC is operation is not completed. Now the impact on RCP connections that you raised: for normal end-clients (e.g. HDFS clients) the connections will remain open as in the original case - ie the till the request is completed and reply is sent. Hence the number of such connections will be the same. for internal clients where the request is of type "do you have more work for me" (as sent by DN or NM) the number of connections will increase but will be bounded. Here we can have a hybrid approach where the the RM could keep a few requests blocked and reply only when work is available and for other such requests it could say "no work, but try 2 seconds later".
          Hide
          sseth Siddharth Seth added a comment -

          Thanks for the feedback Colin P. McCabe.

          Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? It seems like this approach would alleviate the handler thread bottleneck, but potentially run into another bottleneck by keeping open a lot of open TCP sockets (file descriptors). Given that we've had problems with hitting max file descriptors in the past, couldn't this be problematic?

          Yes, the intent is to keep the TPC session open and re-use the handlers (decided by individual protocol). That's a great point about FDs remaining open. Depending on where this mechanism is used, I believe this could be controlled. Examples.

          • AM-RM communication - one connection per AM running on a cluster. This is typically limited to < #nodes on the cluster.
          • NM-RM communication - if at all it makes sense to add it here. Would be limited to #nodes on the cluster. (Similarly for DN-NN communication)
          • Task-AM communication (MR as an example) - this definitely has the potential to blow up, since the # of containers on a cluster can be large, and multiple AMs could be running on the same node.

          The approach doesn't force all protocols / all methods in a protocol to make use of this, and only selective ones could be setup to use this.
          It does, however, open up the option to design protocols without the #handler limit, where applicable - and individual components would need to exercise this wisely.

          Heartbeats are fundamental to how YARN works. If we move to an offer-based system like Mesos, it seems like the way to do it would be to have the resource manager make outgoing connections to the executors, rather than keeping TCP sessions open a long time.

          I don't think we need to make a drastic change to protocol like moving to the offer model used in Mesos.
          The request-response model used by YARN can be made faster though. Rightnow, heartbeats are serving multiple purposes. Taking the AM-RM example, the allocate request serves as 1) a keep-alive - this isn't required every second, and 2) request-response for container requests - the default 1 second heartbeat is to avoid delaying getting allocations for a previous request.
          Full two-way communication would be ideal - where the RM opens connections when required, but is a much larger change - and I doubt it'll be possible to make this in a compatible manner, if we want to go down that route.

          For individual components to use this - we could even consider adding new methods to make use of this mechanism, and leave the old ones intact - with the option to use either mechanism.

          Show
          sseth Siddharth Seth added a comment - Thanks for the feedback Colin P. McCabe . Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? It seems like this approach would alleviate the handler thread bottleneck, but potentially run into another bottleneck by keeping open a lot of open TCP sockets (file descriptors). Given that we've had problems with hitting max file descriptors in the past, couldn't this be problematic? Yes, the intent is to keep the TPC session open and re-use the handlers (decided by individual protocol). That's a great point about FDs remaining open. Depending on where this mechanism is used, I believe this could be controlled. Examples. AM-RM communication - one connection per AM running on a cluster. This is typically limited to < #nodes on the cluster. NM-RM communication - if at all it makes sense to add it here. Would be limited to #nodes on the cluster. (Similarly for DN-NN communication) Task-AM communication (MR as an example) - this definitely has the potential to blow up, since the # of containers on a cluster can be large, and multiple AMs could be running on the same node. The approach doesn't force all protocols / all methods in a protocol to make use of this, and only selective ones could be setup to use this. It does, however, open up the option to design protocols without the #handler limit, where applicable - and individual components would need to exercise this wisely. Heartbeats are fundamental to how YARN works. If we move to an offer-based system like Mesos, it seems like the way to do it would be to have the resource manager make outgoing connections to the executors, rather than keeping TCP sessions open a long time. I don't think we need to make a drastic change to protocol like moving to the offer model used in Mesos. The request-response model used by YARN can be made faster though. Rightnow, heartbeats are serving multiple purposes. Taking the AM-RM example, the allocate request serves as 1) a keep-alive - this isn't required every second, and 2) request-response for container requests - the default 1 second heartbeat is to avoid delaying getting allocations for a previous request. Full two-way communication would be ideal - where the RM opens connections when required, but is a much larger change - and I doubt it'll be possible to make this in a compatible manner, if we want to go down that route. For individual components to use this - we could even consider adding new methods to make use of this mechanism, and leave the old ones intact - with the option to use either mechanism.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Hi Siddharth Seth, can you explain the approach you're planning to take in more detail? Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? It seems like this approach would alleviate the handler thread bottleneck, but potentially run into another bottleneck by keeping open a lot of open TCP sockets (file descriptors). Given that we've had problems with hitting max file descriptors in the past, couldn't this be problematic?

          Heartbeats are fundamental to how YARN works. If we move to an offer-based system like Mesos, it seems like the way to do it would be to have the resource manager make outgoing connections to the executors, rather than keeping TCP sessions open a long time.

          Show
          cmccabe Colin P. McCabe added a comment - Hi Siddharth Seth , can you explain the approach you're planning to take in more detail? Are you proposing to keep the TCP session open, but reuse the handler thread for something else, while the RPC is progressing? It seems like this approach would alleviate the handler thread bottleneck, but potentially run into another bottleneck by keeping open a lot of open TCP sockets (file descriptors). Given that we've had problems with hitting max file descriptors in the past, couldn't this be problematic? Heartbeats are fundamental to how YARN works. If we move to an offer-based system like Mesos, it seems like the way to do it would be to have the resource manager make outgoing connections to the executors, rather than keeping TCP sessions open a long time.
          Hide
          sseth Siddharth Seth added a comment -

          Work in progress patch to get input on the approach.

          Trying to make this as unobtrusive as possible so that APIs do not break. Ofcourse, changing YARN, HDFS, etc to use something like this will require a fair amount of attention to ensure compatibility.

          Individual methods to decide whether they want to perform the operation within the handler thread, or send the response back at a later point. Borrows elements from HADOOP-10300.

          There's several bits pending - like error handling.
          Also, this ends up sending the response back on the application thread which invokes the reply - which may not work for models like YARN where the central dispatcher would end up blocking. Ideally, we should be able to use the handler threads to send these responses. Alternately, with the current patch - a threadpool for sending responses could be setup.
          Additional considerations - number of outstanding requests, and whether they need to be limited at the RPC layer.

          Show
          sseth Siddharth Seth added a comment - Work in progress patch to get input on the approach. Trying to make this as unobtrusive as possible so that APIs do not break. Ofcourse, changing YARN, HDFS, etc to use something like this will require a fair amount of attention to ensure compatibility. Individual methods to decide whether they want to perform the operation within the handler thread, or send the response back at a later point. Borrows elements from HADOOP-10300 . There's several bits pending - like error handling. Also, this ends up sending the response back on the application thread which invokes the reply - which may not work for models like YARN where the central dispatcher would end up blocking. Ideally, we should be able to use the handler threads to send these responses. Alternately, with the current patch - a threadpool for sending responses could be setup. Additional considerations - number of outstanding requests, and whether they need to be limited at the RPC layer.

            People

            • Assignee:
              sseth Siddharth Seth
              Reporter:
              sseth Siddharth Seth
            • Votes:
              0 Vote for this issue
              Watchers:
              35 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development