Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      On failover, a series of exception stack traces is shown in the log. They are harmless but confusing to users.

        Activity

        jianhe Jian He added a comment -

        A couple of messages are changed to debug level, since the caller will eventually log the failure when the retries end.
        Added a few log messages in RequestHedgingRMFailoverProxyProvider.
        RetryInvocationHandler is also changed so that it does not print the stack trace while it is still retrying.
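The logging policy described in this comment can be illustrated with a minimal sketch. It is not the code from the patch: the class `RetryLogSketch`, the method `callWithRetries`, and the in-memory `LOG` list are hypothetical names chosen for illustration, and the real Hadoop classes use their own logger and retry policies. The idea shown is only that each failed attempt produces a one-line summary while retrying, stack frames appear only when debug output is enabled, and the caller reports once when the retries end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not the Hadoop implementation.
class RetryLogSketch {
    // Log lines are collected in memory so the sketch needs no logging library.
    static final List<String> LOG = new ArrayList<>();

    static <T> T callWithRetries(Supplier<T> call, int maxAttempts, boolean debug) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // While still retrying: one summary line, no stack trace.
                    LOG.add("INFO " + e + " on attempt " + attempt + ", retrying");
                    if (debug) {
                        // Stack frames are emitted only at debug level.
                        for (StackTraceElement frame : e.getStackTrace()) {
                            LOG.add("DEBUG   at " + frame);
                        }
                    }
                }
            }
        }
        // The caller logs once when the retries end.
        LOG.add("WARN giving up after " + maxAttempts + " attempts: " + last);
        throw last;
    }
}
```

With debug disabled, a call that fails twice and then succeeds leaves exactly two INFO lines in `LOG` and no stack frames.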

        jianhe Jian He added a comment -

        Sample log for ConfiguredRMFailoverProxyProvider after the patch:

        17/02/03 21:45:18 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/
        17/02/03 21:45:18 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200
        17/02/03 21:45:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
        17/02/03 21:45:19 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 1 failover attempts. Trying to failover after sleeping for 24348ms.
        17/02/03 21:45:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
        17/02/03 21:45:44 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 2 failover attempts. Trying to failover after sleeping for 20126ms.
        17/02/03 21:46:04 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
        17/02/03 21:46:04 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 3 failover attempts. Trying to failover after sleeping for 44768ms.
        17/02/03 21:46:48 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
        17/02/03 21:46:48 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 4 failover attempts. Trying to failover after sleeping for 20670ms.
        17/02/03 21:47:09 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
        17/02/03 21:47:09 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm2 after 5 failover attempts. Trying to failover after sleeping for 42523ms.
        17/02/03 21:47:52 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm1
        17/02/03 21:47:52 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host/172.22.126.229 to host:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over rm1 after 6 failover attempts. Trying to failover after sleeping for 16803ms.
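The alternation between rm1 and rm2 in the log above is consistent with a simple round-robin over the configured RM ids. The following is a minimal sketch of that idea, assuming a fixed list of ids and an index that wraps around. The class `RoundRobinFailoverSketch` and its methods are hypothetical and are not taken from ConfiguredRMFailoverProxyProvider. The sleep between attempts seen in the log is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of round-robin failover over a fixed list of RM ids.
class RoundRobinFailoverSketch {
    private final List<String> rmIds;
    private int current = 0; // index of the RM currently in use

    RoundRobinFailoverSketch(List<String> rmIds) {
        if (rmIds.isEmpty()) {
            throw new IllegalArgumentException("at least one RM id is required");
        }
        this.rmIds = new ArrayList<>(rmIds); // defensive copy
    }

    String current() {
        return rmIds.get(current);
    }

    // Advance to the next RM id, wrapping around at the end of the list.
    String failover() {
        current = (current + 1) % rmIds.size();
        return rmIds.get(current);
    }
}
```

Starting from rm1 with ids [rm1, rm2], successive calls to `failover()` yield rm2, rm1, rm2, and so on, which matches the order of the "Failing over to" lines.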
        
        jianhe Jian He added a comment -

        Sample log for RequestHedgingRMFailoverProxyProvider after the patch:

        17/02/03 22:34:26 INFO impl.TimelineClientImpl: Timeline service address: http://host:8188/ws/v1/timeline/
        17/02/03 22:34:26 INFO client.RequestHedgingRMFailoverProxyProvider: Created wrapped proxy for [rm1, rm2]
        17/02/03 22:34:26 INFO client.AHSProxy: Connecting to Application History server at host/172.22.126.225:10200
        17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
        17/02/03 22:34:27 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM on [rm2]
        17/02/03 22:34:28 INFO mapreduce.JobSubmitter: number of splits:1
        17/02/03 22:34:29 INFO impl.YarnClientImpl: Submitted application application_1486160572621_0002
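The "Looking for the active RM in [rm1, rm2]..." and "Found active RM on [rm2]" lines reflect request hedging: the request is sent to all candidates at once and the first successful answer wins. Below is a minimal sketch of that general technique using `ExecutorService.invokeAny` from the JDK. The class `HedgingSketch`, the method `findActive`, and the `isActive` predicate are hypothetical stand-ins, and the internals of RequestHedgingRMFailoverProxyProvider are not shown in this thread, so this should not be read as its implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical sketch of request hedging across RM candidates.
class HedgingSketch {
    // rmIds must be non-empty; isActive stands in for a real RPC to each RM.
    static String findActive(List<String> rmIds, Predicate<String> isActive) {
        ExecutorService pool = Executors.newFixedThreadPool(rmIds.size());
        try {
            List<Callable<String>> probes = new ArrayList<>();
            for (String id : rmIds) {
                probes.add(() -> {
                    if (!isActive.test(id)) {
                        throw new IllegalStateException(id + " is not active");
                    }
                    return id;
                });
            }
            // invokeAny returns the result of a task that completed successfully
            // and cancels the tasks that have not finished.
            return pool.invokeAny(probes);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while looking for the active RM", e);
        } catch (ExecutionException e) {
            // Every probe failed, so no candidate answered as active.
            throw new IllegalStateException("no active RM found in " + rmIds, e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because failures of individual probes are absorbed as long as one probe succeeds, a client built this way has no reason to log a stack trace for each standby RM, which fits the short log shown above.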
        
        djp Junping Du added a comment -

        The patch looks reasonable. The previous warn message for connection failures could be unnecessary, given that we have another layer of retry on top of RPC.
        +1. I will commit it tomorrow if there are no further comments from others.

        templedf Daniel Templeton added a comment - - edited

        It would be nice to move the + for concatenation to the end of the line instead of the beginning of the next line, just for consistency. Otherwise, looks good.

        Nevermind. Long day. LGTM. +1

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        0 mvndep 1m 56s Maven dependency ordering for branch
        +1 mvninstall 13m 42s trunk passed
        +1 compile 14m 27s trunk passed
        +1 checkstyle 1m 39s trunk passed
        +1 mvnsite 1m 37s trunk passed
        +1 mvneclipse 0m 35s trunk passed
        +1 findbugs 2m 40s trunk passed
        +1 javadoc 1m 26s trunk passed
        0 mvndep 0m 16s Maven dependency ordering for patch
        +1 mvninstall 1m 11s the patch passed
        +1 compile 11m 53s the patch passed
        +1 javac 11m 53s the patch passed
        -0 checkstyle 1m 37s root: The patch generated 1 new + 108 unchanged - 0 fixed = 109 total (was 108)
        +1 mvnsite 1m 41s the patch passed
        +1 mvneclipse 0m 43s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 2m 56s the patch passed
        +1 javadoc 1m 31s the patch passed
        -1 unit 13m 41s hadoop-common in the patch failed.
        +1 unit 3m 16s hadoop-yarn-common in the patch passed.
        +1 asflicense 0m 43s The patch does not generate ASF License warnings.
        102m 22s



        Reason Tests
        Failed junit tests hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue YARN-6145
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12850906/YARN-6145.1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 4b6f82c08ced 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / deb368b
        Default Java 1.8.0_121
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/14851/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/14851/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/14851/testReport/
        modules C: hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: .
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/14851/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        djp Junping Du added a comment -

        I have committed the patch to trunk and branch-2. Thanks to Jian He for the patch contribution and to Daniel Templeton for the review!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11222 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11222/)
        YARN-6145. Improve log message on fail over. Contributed by Jian He. (junping_du: rev eec52e158b7bc14b2d3d53512323ba05e15e09e3)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RequestHedgingRMFailoverProxyProvider.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
        • (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java

          People

          • Assignee:
            jianhe Jian He
            Reporter:
            jianhe Jian He
          • Votes:
            0
            Watchers:
            9
