  1. Hadoop YARN
  2. YARN-5677

RM should transition to standby when connection is lost for an extended period

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In trunk, there is no maximum number of retries that I see. It appears the connection will be retried forever, with the active never figuring out it's no longer active. In my testing, the active-active state lasted almost 2 hours with no sign of stopping before I killed it. The solution appears to be to cap the number of retries or amount of time spent retrying.

      This issue is significant because of the asynchronous nature of job submission. If the active doesn't know it's not active, it will buffer up job submissions until it finally realizes it has become the standby. Then it will fail all the job submissions in bulk. In high-volume workflows, that behavior can create huge mass job failures.

      This issue is also important because the node managers will not fail over to the new active until the old active realizes it's the standby. Workloads submitted after the old active loses contact with ZK will therefore fail to be executed regardless of which RM the clients contact.
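
      A minimal sketch of the proposed cap, assuming a generic operation rather than the RM's actual ZooKeeper client code: the helper below gives up after a fixed number of attempts or once a time budget is spent, whichever comes first. The names BoundedRetry and RetriesExhaustedException, and all parameters, are hypothetical and do not come from the patch.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Illustrative only (hypothetical helper, not Hadoop code): bounded retries. */
final class BoundedRetry {
    private BoundedRetry() {}

    /** Signals that the attempt cap or the time budget was exhausted. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Calls op until it succeeds, maxAttempts is reached, or another pause
     * would overrun maxElapsed.
     */
    static <T> T call(Callable<T> op, int maxAttempts, Duration maxElapsed, Duration pause)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long deadline = System.nanoTime() + maxElapsed.toNanos();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e; // never swallow interruption
            } catch (Exception e) {
                last = e; // remember the failure and consider retrying
            }
            // Stop if this was the last attempt or the time budget cannot cover another pause.
            if (attempt == maxAttempts || deadline - System.nanoTime() < pause.toNanos()) {
                break;
            }
            Thread.sleep(pause.toMillis());
        }
        throw new RetriesExhaustedException("gave up after bounded retries", last);
    }
}
```

      A caller that catches RetriesExhaustedException would be the natural place to stop acting as the active RM instead of retrying indefinitely.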

      1. YARN-5677.001.patch (4 kB, Daniel Templeton)
      2. YARN-5677.002.patch (4 kB, Daniel Templeton)
      3. YARN-5677.003.patch (10 kB, Daniel Templeton)
      4. YARN-5677.004.patch (12 kB, Daniel Templeton)
      5. YARN-5677.005.patch (12 kB, Daniel Templeton)
      6. YARN-5677.branch-2.001.patch (12 kB, Daniel Templeton)

        Activity

        jianhe Jian He added a comment -

        "when the active RM loses contact with the ZK node"

        Definitely, the leader elector in the RM should not keep retrying in this case. I don't remember whether this issue has been fixed (cc Xuan Gong). IIUC, the leader elector in this case should notice that the connection is lost and signal the RM to transition to standby. Are you testing with the latest code in trunk?

        templedf Daniel Templeton added a comment -

        I was testing with 2.9.0 since that's what I had lying about. I'll build the latest trunk version and give it another go to be sure.

        templedf Daniel Templeton added a comment -

        Test is running now with latest trunk build. Note to self: ZK connection failure started at 11:45.

        templedf Daniel Templeton added a comment -

        The issue is still present in trunk. My cluster was in active-active state for almost 4 hours before I shut it down.

        subru Subru Krishnan added a comment -

        Daniel Templeton/Jian He, this looks like the root cause of what we saw on the client/NM side, where they kept connecting to the original active RM instance and never discovered the new active RM, as reported in YARN-5119?

        FYI, we worked around this issue by setting the client-side RPC timeout, which was introduced in HADOOP-11252.

        templedf Daniel Templeton added a comment -

        I can see that the RM tries to do the right thing:

            CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString(zkHostPort)
                .sessionTimeoutMs(zkSessionTimeout)
                .retryPolicy(new RetryNTimes(numRetries, zkRetryInterval))
                .authorization(authInfos)
                .build();

        I'll have to do some digging to see why that's not cutting it.

        jianhe Jian He added a comment -

        Daniel Templeton, this Curator-based leader election code is new in 2.9, and it's not used by default unless you enable it by setting yarn.resourcemanager.ha.curator-leader-elector.enabled to true.
        If you don't set it, the RM uses the hadoop-common ActiveStandbyElector, and we did see similar issues in the past with ActiveStandbyElector too.
        Which one are you testing with? If you are testing with the hadoop-common one, it would be good if you could test the Curator-based implementation too.

        templedf Daniel Templeton added a comment -

        Just tested the leader election (with the right property enabled), and it works as advertised. The active becomes standby by the time the standby becomes active.

        templedf Daniel Templeton added a comment -

        Jian He, is leader election on by default in Hadoop 3? I'd recommend it. Shall I roll that into the patch for this JIRA?

        templedf Daniel Templeton added a comment -

        This patch fixes the issue in trunk. I opted to be conservative and wait out the ZK session timeout rather than failing over immediately. The delay extends the period of time that the cluster is in active-active, but it hopefully reduces jitter in the face of minor network disturbances.

        I'll need to post an entirely different fix for branch-2.7. Should I open a second JIRA for that?
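
        The approach described above (wait out the session timeout, then step down) could be sketched roughly as follows. This is an illustration under assumptions, not the code in the patch: the class NeutralModeWatchdog, the transitionToStandby callback, and the generation counter are all hypothetical names chosen for this example. The counter is there so that a countdown task that had already started when the connection came back does nothing.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Illustrative only (hypothetical class): step down if the connection stays lost for a full session timeout. */
final class NeutralModeWatchdog implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long sessionTimeoutMs;
    private final Runnable transitionToStandby; // hypothetical callback into the RM

    private ScheduledFuture<?> pending; // guarded by "this"
    private long generation;            // guarded by "this"

    NeutralModeWatchdog(long sessionTimeoutMs, Runnable transitionToStandby) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.transitionToStandby = transitionToStandby;
    }

    /** Connection lost: start the countdown unless one is already running. */
    synchronized void enterNeutralMode() {
        if (pending == null) {
            final long myGeneration = ++generation;
            pending = scheduler.schedule(
                () -> onTimeout(myGeneration), sessionTimeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Connection restored, or this node became active again: stop the countdown. */
    synchronized void connectionRestored() {
        generation++; // invalidates a task that has already been dequeued
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    private synchronized void onTimeout(long myGeneration) {
        // Only the most recently scheduled, not-yet-cancelled countdown may fire.
        if (myGeneration == generation) {
            pending = null;
            transitionToStandby.run();
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

        The trade-off is the one noted above: a longer wait keeps the old active around for up to one session timeout, while a brief network blip that ends before the countdown expires causes no failover at all.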

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 7m 15s trunk passed
        +1 compile 0m 32s trunk passed
        +1 checkstyle 0m 20s trunk passed
        +1 mvnsite 0m 38s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 0m 56s trunk passed
        +1 javadoc 0m 20s trunk passed
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        +1 checkstyle 0m 17s the patch passed
        +1 mvnsite 0m 35s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 3s the patch passed
        +1 javadoc 0m 17s the patch passed
        +1 unit 33m 50s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        48m 41s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12830814/YARN-5677.001.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 378e0f79203f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 47f8092
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13245/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13245/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        jianhe Jian He added a comment -

        "Just tested the leader election (with the right property enabled), and it works as advertised."

        Sorry, I didn't get it. Could you clarify which config you set? Which LeaderElection class are you testing?

        templedf Daniel Templeton added a comment -

        curator=false, embedded=false => completely broken
        curator=false, embedded=true => allows indefinite active-active state
        curator=true, embedded=* => works correctly

        templedf Daniel Templeton added a comment -

        I had initially thought that branch-2.7 was different because I was seeing different errors. It turns out this patch applies to trunk as well as branch-2.7. This patch resolves the issue in trunk. Resolving the issue in branch-2.7 also requires YARN-5694.

        kasha Karthik Kambatla added a comment -

        Meaningful implementation for enterNeutralMode makes a lot of sense. Sorry for not filing a JIRA for the TODO I added years ago.

        The patch here makes sense. My one concern is with letting the outstanding task run even after canceling the timer, especially when canceled as part of becomeActive.

        Daniel Templeton - in an offline conversation, you mentioned running into issues with the VerifyActiveStatusThread being stuck on transition to standby. Is the plan to fix that too in this JIRA? Or, to take care of it as a follow-up?

        templedf Daniel Templeton added a comment -

        This patch addresses the race. I was not planning to tackle the ZKRMStateStore.VerifyActiveStatusThread issues in this patch. Let's work out the right thing to do on YARN-5694 and resolve it there.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 14s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 8m 18s trunk passed
        +1 compile 0m 36s trunk passed
        +1 checkstyle 0m 21s trunk passed
        +1 mvnsite 0m 39s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 0m 58s trunk passed
        +1 javadoc 0m 21s trunk passed
        +1 mvninstall 0m 33s the patch passed
        +1 compile 0m 32s the patch passed
        +1 javac 0m 32s the patch passed
        +1 checkstyle 0m 18s the patch passed
        +1 mvnsite 0m 36s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 5s the patch passed
        +1 javadoc 0m 19s the patch passed
        +1 unit 36m 44s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 21s The patch does not generate ASF License warnings.
        53m 5s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831812/YARN-5677.002.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 76069434e8e6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 00160f7
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13297/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13297/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        templedf Daniel Templeton added a comment -

        Here's a patch that adds tests.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 21s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 23s trunk passed
        +1 compile 0m 34s trunk passed
        +1 checkstyle 0m 21s trunk passed
        +1 mvnsite 0m 41s trunk passed
        +1 mvneclipse 0m 19s trunk passed
        +1 findbugs 1m 0s trunk passed
        +1 javadoc 0m 24s trunk passed
        +1 mvninstall 0m 34s the patch passed
        +1 compile 0m 32s the patch passed
        +1 javac 0m 32s the patch passed
        +1 checkstyle 0m 19s the patch passed
        +1 mvnsite 0m 37s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 4s the patch passed
        +1 javadoc 0m 17s the patch passed
        +1 unit 40m 16s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 25s The patch does not generate ASF License warnings.
        56m 3s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831836/YARN-5677.003.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 2d8435242594 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / e68c7b9
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13300/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13300/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        kasha Karthik Kambatla added a comment -

        The patch looks pretty good. Nice tests. A couple of comments on the tests, though:

        1. Should the sleep be larger than the timeout? Maybe 100 ms?
        2. Would it make sense to abstract out the common parts of the test?
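
        To illustrate the first point, here is a small sketch (the patch's actual tests are not shown in this issue, so every name and value below is an assumption): a check that waits for less time than the timeout cannot observe the timer firing, while waiting on a latch with a generous bound returns as soon as the task runs. TimeoutTestSketch and firedWithin are hypothetical names.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative only (hypothetical helper): why the wait must exceed the timeout. */
final class TimeoutTestSketch {
    private TimeoutTestSketch() {}

    /** Schedules a task after timeoutMs and reports whether it ran within waitMs. */
    static boolean firedWithin(long timeoutMs, long waitMs) throws InterruptedException {
        CountDownLatch fired = new CountDownLatch(1);
        Timer timer = new Timer("timeout-sketch", true); // daemon thread
        try {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    fired.countDown();
                }
            }, timeoutMs);
            // await() returns as soon as the task runs, so a generous bound
            // does not slow down the passing case the way a fixed sleep would.
            return fired.await(waitMs, TimeUnit.MILLISECONDS);
        } finally {
            timer.cancel();
        }
    }
}
```

        With a wait shorter than the timeout, firedWithin returns false even though nothing is broken, which is the kind of false negative the review comment is guarding against.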
        templedf Daniel Templeton added a comment -

        Patch to address comments.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 18s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 40s trunk passed
        +1 compile 0m 30s trunk passed
        +1 checkstyle 0m 20s trunk passed
        +1 mvnsite 0m 37s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 0m 56s trunk passed
        +1 javadoc 0m 20s trunk passed
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 29s the patch passed
        +1 javac 0m 29s the patch passed
        -1 checkstyle 0m 17s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8)
        +1 mvnsite 0m 35s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 0s the patch passed
        +1 javadoc 0m 17s the patch passed
        +1 unit 38m 50s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        53m 4s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831989/YARN-5677.004.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 248fd8aec8ba 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 2cc841f
        Default Java 1.8.0_101
        findbugs v3.0.0
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13310/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13310/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13310/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        templedf Daniel Templeton added a comment -

        And here's a quick update to address the checkstyle complaint.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 6m 39s trunk passed
        +1 compile 0m 31s trunk passed
        +1 checkstyle 0m 20s trunk passed
        +1 mvnsite 0m 37s trunk passed
        +1 mvneclipse 0m 16s trunk passed
        +1 findbugs 0m 57s trunk passed
        +1 javadoc 0m 20s trunk passed
        +1 mvninstall 0m 31s the patch passed
        +1 compile 0m 28s the patch passed
        +1 javac 0m 28s the patch passed
        +1 checkstyle 0m 17s the patch passed
        +1 mvnsite 0m 34s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 1s the patch passed
        +1 javadoc 0m 18s the patch passed
        +1 unit 38m 31s hadoop-yarn-server-resourcemanager in the patch passed.
        +1 asflicense 0m 15s The patch does not generate ASF License warnings.
        52m 41s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12832021/YARN-5677.005.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 44b22861ef6e 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 72a2ae6
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13312/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13312/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        kasha Karthik Kambatla added a comment -

        +1. Checking this in.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10596 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10596/)
        YARN-5677. RM should transition to standby when connection is lost for (kasha: rev 6476934ae5de1be7988ab198b673d82fe0f006e3)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/EmbeddedElectorService.java
        kasha Karthik Kambatla added a comment -

        Committed this to trunk.

        The patch does not compile with branch-2. Looks like some type issues with any() in tests. Daniel Templeton - can you post a branch-2 patch as well?

        templedf Daniel Templeton added a comment -

        Here's a branch-2 patch that adds explicit casts to get around the issue.
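
        For readers unfamiliar with this class of problem, the following is a minimal sketch of why an explicit cast can help. It is not taken from the patch. It assumes the failure was a generic type inference difference between the Java 7 and Java 8 compilers, and anyValue() and handle() are hypothetical stand-ins for a matcher such as any() and for the method being stubbed.

```java
public class AnyCastDemo {
    // Hypothetical stand-in for a matcher like any(): the compiler must infer T.
    public static <T> T anyValue() {
        return null;
    }

    // Hypothetical method that expects a specific argument type.
    public static String handle(String message) {
        return "handled String argument";
    }

    public static void main(String[] args) {
        // A Java 8 compiler infers T = String from the parameter type of handle().
        // A Java 7 compiler infers T = Object in this position and rejects the call:
        //   handle(anyValue());

        // An explicit cast compiles under both compilers.
        System.out.println(handle((String) anyValue()));

        // An explicit type argument is an alternative to the cast.
        System.out.println(handle(AnyCastDemo.<String>anyValue()));
    }
}
```

        Both calls print the same line. The cast only changes what the compiler accepts, not what runs.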

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 7m 15s branch-2 passed
        +1 compile 0m 29s branch-2 passed with JDK v1.8.0_101
        +1 compile 0m 32s branch-2 passed with JDK v1.7.0_111
        +1 checkstyle 0m 23s branch-2 passed
        +1 mvnsite 0m 37s branch-2 passed
        +1 mvneclipse 0m 16s branch-2 passed
        +1 findbugs 1m 10s branch-2 passed
        +1 javadoc 0m 21s branch-2 passed with JDK v1.8.0_101
        +1 javadoc 0m 24s branch-2 passed with JDK v1.7.0_111
        +1 mvninstall 0m 32s the patch passed
        +1 compile 0m 28s the patch passed with JDK v1.8.0_101
        +1 javac 0m 28s the patch passed
        +1 compile 0m 29s the patch passed with JDK v1.7.0_111
        +1 javac 0m 29s the patch passed
        +1 checkstyle 0m 20s the patch passed
        +1 mvnsite 0m 38s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 25s the patch passed
        +1 javadoc 0m 19s the patch passed with JDK v1.8.0_101
        +1 javadoc 0m 22s the patch passed with JDK v1.7.0_111
        +1 unit 39m 6s hadoop-yarn-server-resourcemanager in the patch passed with JDK v1.8.0_101.
        +1 unit 40m 42s hadoop-yarn-server-resourcemanager in the patch passed with JDK v1.7.0_111.
        +1 asflicense 0m 18s The patch does not generate ASF License warnings.
        97m 42s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:b59b8b7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833765/YARN-5677.branch-2.001.patch
        JIRA Issue YARN-5677
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 6eb46e7fe9be 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision branch-2 / 7993fb5
        Default Java 1.7.0_111
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
        findbugs v3.0.0
        JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13409/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13409/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        kasha Karthik Kambatla added a comment -

        Thanks for the branch-2 patch as well, Daniel. Just committed this to branch-2 as well.

        djp Junping Du added a comment -

        It sounds like this patch is pushed into branch-2.8. Adding it to fix version.


          People

          • Assignee:
            templedf Daniel Templeton
            Reporter:
            templedf Daniel Templeton
          • Votes:
            0
            Watchers:
            14
