
NM heartbeat stuck when responseId overflows MAX_INT

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The responseId in the NM-RM heartbeat has an overflow problem. This is the same problem as the one in the AM-RM heartbeat tracked in YARN-6640; please refer to YARN-6640 for details.
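
      For readers without YARN-6640 at hand, here is a minimal sketch of the general failure mode. It assumes the responseId is a Java int that is incremented by one per heartbeat. The class and method names are hypothetical and are not Hadoop APIs, and the masking shown is only one possible way to make the increment safe, not necessarily what the attached patches do.

```java
// Toy illustration only: ResponseIdDemo, naiveNext and safeNext are
// hypothetical names, not part of Hadoop.
class ResponseIdDemo {

  /** Naive increment: silently wraps to a negative value at Integer.MAX_VALUE. */
  static int naiveNext(int responseId) {
    return responseId + 1;
  }

  /** Overflow-safe increment: clears the sign bit so MAX_VALUE wraps to 0. */
  static int safeNext(int responseId) {
    return (responseId + 1) & Integer.MAX_VALUE;
  }

  public static void main(String[] args) {
    int last = Integer.MAX_VALUE;
    // The naive version produces Integer.MIN_VALUE, which breaks any check
    // that assumes ids are non-negative and only ever grow.
    System.out.println("naive: " + naiveNext(last));
    // The masked version stays in the range [0, Integer.MAX_VALUE].
    System.out.println("safe:  " + safeNext(last));
  }
}
```

      Whatever wrap-around rule is chosen, both sides of the heartbeat have to compute the next id the same way, otherwise the ids drift apart after the wrap.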

      1. YARN-7102.v1.patch
        9 kB
        Botong Huang
      2. YARN-7102.v2.patch
        103 kB
        Botong Huang
      3. YARN-7102.v3.patch
        112 kB
        Botong Huang
      4. YARN-7102.v4.patch
        113 kB
        Botong Huang
      5. YARN-7102.v5.patch
        113 kB
        Botong Huang
      6. YARN-7102.v6.patch
        113 kB
        Botong Huang

        Activity

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 41s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
              trunk Compile Tests
        +1 mvninstall 16m 21s trunk passed
        +1 compile 0m 35s trunk passed
        +1 checkstyle 0m 26s trunk passed
        +1 mvnsite 0m 36s trunk passed
        +1 findbugs 0m 59s trunk passed
        +1 javadoc 0m 22s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 34s the patch passed
        +1 compile 0m 31s the patch passed
        +1 javac 0m 31s the patch passed
        +1 checkstyle 0m 23s the patch passed
        +1 mvnsite 0m 34s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 6s the patch passed
        +1 javadoc 0m 18s the patch passed
              Other Tests
        -1 unit 54m 49s hadoop-yarn-server-resourcemanager in the patch failed.
        +1 asflicense 0m 14s The patch does not generate ASF License warnings.
        79m 47s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
          hadoop.yarn.server.resourcemanager.TestResourceTrackerService
          hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
          hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService
          hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
          hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
          hadoop.yarn.server.resourcemanager.TestRMRestart
          hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
          hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
          hadoop.yarn.server.resourcemanager.TestApplicationCleanup
        Timed out junit tests org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
          org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12884828/YARN-7102.v1.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux b53e2243b255 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 91cc070
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17246/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17246/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17246/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        botong Botong Huang added a comment -

        Some explanation, since the v2 patch is much bigger. This change revealed more flaky tests involving MockNM heartbeats to the RM. Every heartbeat triggers events that are dispatched in the RM, and in many cases those events need to be drained. Furthermore, because this change enforces a stricter responseId check, we now need to drain the RM dispatcher events after every MockNM heartbeat. Otherwise, of two sequential MockNM heartbeats, the second one will fail because the RM is still processing the event from the first.

        Instead of going through all the places where nm.nodeHeartbeat is called and adding rm.drainEvent afterwards, I changed the MockNM API to call drain inside the heartbeat method.

        For easy review, the real changes are in these four files: ResourceTrackerService, MockNM, MockRM and TestResourceTrackerService. All other file changes are simply due to the API change in MockNM.
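
        The interaction described above can be sketched with a toy model. This is an illustration under stated assumptions, not the MockNM or MockRM code from the patch: ToyRm, ToyMockNm and their methods are hypothetical, and the model assumes the RM only records the last responseId when the queued heartbeat event is processed.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy stand-in for the RM: the last responseId is only updated when the
// queued heartbeat event is processed (hypothetical model, not Hadoop code).
class ToyRm {
  private final Queue<Integer> pendingEvents = new ArrayDeque<>();
  private int lastResponseId = 0;

  /** Returns the next responseId, or -1 if the node is out of sync. */
  int heartbeat(int responseId) {
    if (responseId != lastResponseId) {
      return -1; // strict check: the id must match the last processed one
    }
    int next = (responseId + 1) & Integer.MAX_VALUE;
    pendingEvents.add(next); // processed later by the dispatcher
    return next;
  }

  /** Processes every queued event, like draining the dispatcher in a test. */
  void drainEvents() {
    while (!pendingEvents.isEmpty()) {
      lastResponseId = pendingEvents.poll();
    }
  }
}

// Toy stand-in for a mock node that can drain the RM after each heartbeat.
class ToyMockNm {
  private final ToyRm rm;
  private final boolean drainAfterHeartbeat;
  private int responseId = 0;

  ToyMockNm(ToyRm rm, boolean drainAfterHeartbeat) {
    this.rm = rm;
    this.drainAfterHeartbeat = drainAfterHeartbeat;
  }

  /** Sends one heartbeat; returns false if the RM rejected it. */
  boolean nodeHeartbeat() {
    int next = rm.heartbeat(responseId);
    if (next < 0) {
      return false;
    }
    responseId = next;
    if (drainAfterHeartbeat) {
      rm.drainEvents(); // the test no longer has to remember to drain
    }
    return true;
  }
}

class HeartbeatDrainDemo {
  public static void main(String[] args) {
    ToyMockNm noDrain = new ToyMockNm(new ToyRm(), false);
    System.out.println(noDrain.nodeHeartbeat() + " " + noDrain.nodeHeartbeat());

    ToyMockNm withDrain = new ToyMockNm(new ToyRm(), true);
    System.out.println(withDrain.nodeHeartbeat() + " " + withDrain.nodeHeartbeat());
  }
}
```

        In this model the second back-to-back heartbeat is rejected unless the events are drained first, which is why putting the drain inside the heartbeat method removes the need to touch every call site.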

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 26 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 15s Maven dependency ordering for branch
        +1 mvninstall 16m 58s trunk passed
        +1 compile 18m 46s trunk passed
        +1 checkstyle 2m 37s trunk passed
        +1 mvnsite 2m 51s trunk passed
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 2m 58s trunk passed
        +1 javadoc 1m 53s trunk passed
              Patch Compile Tests
        0 mvndep 0m 21s Maven dependency ordering for patch
        +1 mvninstall 2m 9s the patch passed
        +1 compile 14m 38s the patch passed
        +1 javac 14m 38s the patch passed
        -0 checkstyle 2m 55s root: The patch generated 3 new + 1161 unchanged - 4 fixed = 1164 total (was 1165)
        +1 mvnsite 2m 32s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 3m 1s the patch passed
        +1 javadoc 1m 49s the patch passed
              Other Tests
        -1 unit 44m 18s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 unit 4m 34s hadoop-yarn-server-tests in the patch failed.
        +1 unit 20m 46s hadoop-yarn-client in the patch passed.
        +1 unit 9m 19s hadoop-mapreduce-client-app in the patch passed.
        +1 asflicense 0m 43s The patch does not generate ASF License warnings.
        176m 14s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          hadoop.yarn.server.TestMiniYarnClusterNodeUtilization
          hadoop.yarn.server.TestContainerManagerSecurity



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12885863/YARN-7102.v2.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 833a2954cfcb 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 13eda50
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/17336/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17336/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17336/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17336/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: .
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17336/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        botong Botong Huang added a comment - - edited

        Uploaded v3, which fixes more unit test failures around MiniYarnCluster. I removed one unit test in TestMiniYarnClusterNodeUtilization because the other test subsumes it.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 28 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 25s Maven dependency ordering for branch
        +1 mvninstall 17m 2s trunk passed
        +1 compile 16m 45s trunk passed
        +1 checkstyle 2m 33s trunk passed
        +1 mvnsite 2m 20s trunk passed
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 2m 47s trunk passed
        +1 javadoc 1m 32s trunk passed
              Patch Compile Tests
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 59s the patch passed
        +1 compile 12m 33s the patch passed
        +1 javac 12m 33s the patch passed
        -0 checkstyle 2m 32s root: The patch generated 1 new + 1230 unchanged - 4 fixed = 1231 total (was 1234)
        +1 mvnsite 2m 34s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 3m 7s the patch passed
        +1 javadoc 1m 48s the patch passed
              Other Tests
        -1 unit 43m 55s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 unit 2m 30s hadoop-yarn-server-tests in the patch failed.
        +1 unit 20m 14s hadoop-yarn-client in the patch passed.
        +1 unit 8m 42s hadoop-mapreduce-client-app in the patch passed.
        +1 asflicense 0m 37s The patch does not generate ASF License warnings.
        166m 38s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
          hadoop.yarn.server.TestContainerManagerSecurity



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12886459/YARN-7102.v3.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 0f965ee9de71 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / de9994b
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/17403/artifact/patchprocess/diff-checkstyle-root.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17403/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17403/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17403/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: .
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17403/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 6s YARN-7102 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12886971/YARN-7102.v4.patch
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17441/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 28 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 18s Maven dependency ordering for branch
        +1 mvninstall 14m 47s trunk passed
        +1 compile 14m 58s trunk passed
        +1 checkstyle 2m 17s trunk passed
        +1 mvnsite 2m 12s trunk passed
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 2m 26s trunk passed
        +1 javadoc 1m 35s trunk passed
              Patch Compile Tests
        0 mvndep 0m 17s Maven dependency ordering for patch
        +1 mvninstall 1m 43s the patch passed
        +1 compile 10m 52s the patch passed
        +1 javac 10m 52s the patch passed
        +1 checkstyle 2m 23s root: The patch generated 0 new + 1233 unchanged - 5 fixed = 1233 total (was 1238)
        +1 mvnsite 2m 25s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 3m 3s the patch passed
        +1 javadoc 1m 50s the patch passed
              Other Tests
        -1 unit 46m 5s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 unit 2m 30s hadoop-yarn-server-tests in the patch failed.
        -1 unit 20m 35s hadoop-yarn-client in the patch failed.
        +1 unit 8m 59s hadoop-mapreduce-client-app in the patch passed.
        +1 asflicense 0m 35s The patch does not generate ASF License warnings.
        166m 19s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation
          hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          hadoop.yarn.server.resourcemanager.TestRMRestart
          hadoop.yarn.server.resourcemanager.scheduler.TestSchedulingWithAllocationRequestId
          hadoop.yarn.server.TestContainerManagerSecurity
          hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12887015/YARN-7102.v5.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 5a4f6ce08328 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / e0b3c64
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17450/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17450/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17450/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17450/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: .
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17450/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 28 new or modified test files.
              trunk Compile Tests
        0 mvndep 0m 15s Maven dependency ordering for branch
        +1 mvninstall 13m 27s trunk passed
        +1 compile 14m 13s trunk passed
        +1 checkstyle 2m 17s trunk passed
        +1 mvnsite 2m 11s trunk passed
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 2m 28s trunk passed
        +1 javadoc 1m 37s trunk passed
              Patch Compile Tests
        0 mvndep 0m 18s Maven dependency ordering for patch
        +1 mvninstall 1m 47s the patch passed
        +1 compile 12m 24s the patch passed
        +1 javac 12m 24s the patch passed
        +1 checkstyle 2m 24s root: The patch generated 0 new + 1233 unchanged - 5 fixed = 1233 total (was 1238)
        +1 mvnsite 2m 29s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        0 findbugs 0m 0s Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
        +1 findbugs 3m 18s the patch passed
        +1 javadoc 1m 44s the patch passed
              Other Tests
        -1 unit 45m 55s hadoop-yarn-server-resourcemanager in the patch failed.
        -1 unit 2m 32s hadoop-yarn-server-tests in the patch failed.
        -1 unit 18m 27s hadoop-yarn-client in the patch failed.
        +1 unit 10m 22s hadoop-mapreduce-client-app in the patch passed.
        +1 asflicense 0m 40s The patch does not generate ASF License warnings.
        164m 51s



        Reason Tests
        Failed junit tests hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
          hadoop.yarn.server.TestContainerManagerSecurity
          hadoop.yarn.client.api.impl.TestAMRMClientContainerRequest
        Timed out junit tests org.apache.hadoop.yarn.client.api.impl.TestAMRMClient



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:71bbb86
        JIRA Issue YARN-7102
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12887146/YARN-7102.v6.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 82384466b689 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 66ca0a6
        Default Java 1.8.0_144
        findbugs v3.1.0-RC1
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17456/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17456/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
        unit https://builds.apache.org/job/PreCommit-YARN-Build/17456/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/17456/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app U: .
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/17456/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        botong Botong Huang added a comment -

        After fighting through the unit tests, here is where the v6 patch stands:
        TestAMRMClientContainerRequest.testOpportunisticAndGuaranteedRequests is already failing in trunk; YARN-7199 has been opened for it.
        TestContainerAllocation.testAMContainerAllocationWhenDNSUnavailable is being tracked under YARN-7044.
        I need help with TestContainerManagerSecurity.testContainerManager: it seems to fail consistently in Yetus, but I cannot reproduce the failure locally at all.

        Tan, Wangda and Jason Lowe, can you please take a look? Some quick notes in summary:

        With the stricter responseId check in the NM heartbeat, we need to drain the RM dispatcher events after every MockNM heartbeat. Otherwise, when MockNM sends two heartbeats back to back, the second one fails because the RM is still processing the event from the first.

        Instead of going through every place where nm.nodeHeartbeat is called and adding rm.drainEvent afterwards (some call sites already do this), I changed the MockNM API so that the heartbeat method itself drains the RM events.
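        The race described above can be sketched with a toy model. This is not YARN code: ToyRM, ToyNM and drainEvents are hypothetical stand-ins, and the "dispatcher" is just a queue that runs only when drained, which makes the "RM is still processing the first heartbeat event" state deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model (not YARN code): the RM records the last responseId asynchronously. */
class DrainSketch {
    /** Hypothetical stand-in for the RM side of the heartbeat. */
    static class ToyRM {
        private final Queue<Runnable> dispatcher = new ArrayDeque<>();
        private int lastResponseId = 0;

        /** Strict check: the request must carry exactly the last recorded id. */
        int heartbeat(int requestResponseId) {
            if (requestResponseId != lastResponseId) {
                throw new IllegalStateException("out of sync: got "
                    + requestResponseId + ", recorded " + lastResponseId);
            }
            int next = requestResponseId + 1;
            // Bookkeeping is deferred, as if handled by an async dispatcher event.
            dispatcher.add(() -> lastResponseId = next);
            return next;
        }

        /** Run every queued event; assumed analogue of draining the dispatcher. */
        void drainEvents() {
            Runnable event;
            while ((event = dispatcher.poll()) != null) {
                event.run();
            }
        }
    }

    /** Hypothetical mock node that can drain the RM after each heartbeat. */
    static class ToyNM {
        private final ToyRM rm;
        private final boolean drainAfterHeartbeat;
        private int responseId = 0;

        ToyNM(ToyRM rm, boolean drainAfterHeartbeat) {
            this.rm = rm;
            this.drainAfterHeartbeat = drainAfterHeartbeat;
        }

        void heartbeat() {
            responseId = rm.heartbeat(responseId);
            if (drainAfterHeartbeat) {
                rm.drainEvents();
            }
        }
    }

    /** Sends two back-to-back heartbeats and reports whether both were accepted. */
    static boolean twoHeartbeatsSucceed(boolean drainAfterHeartbeat) {
        ToyNM nm = new ToyNM(new ToyRM(), drainAfterHeartbeat);
        try {
            nm.heartbeat();
            nm.heartbeat();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("with drain:    " + twoHeartbeatsSucceed(true));
        System.out.println("without drain: " + twoHeartbeatsSucceed(false));
    }
}
```

        In this model the second heartbeat is rejected only when nothing drains the queue in between, which is the situation the MockNM change works around.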

        For easy review, the real changes are in these files: ResourceTrackerService, MockNM, TestResourceTrackerService, MiniYarnCluster and TestMiniYarnClusterNodeUtilization (where I removed a test case because it is subsumed by, or identical to, the other one). All other file changes simply follow from the API change in MockNM.

        Thanks in advance!

        jlowe Jason Lowe added a comment -

        Sorry for the delay.

        With the stricter responseId check in the NM heartbeat, we need to drain the RM dispatcher events after every MockNM heartbeat. Otherwise, when MockNM sends two heartbeats back to back, the second one fails because the RM is still processing the event from the first.

        This worries me. The fact that we have to go update a ton of tests makes me think that we're susceptible to seeing incorrect behavior in a "real" cluster when the RM goes into a full GC cycle. If that GC cycle is long enough then I could see this change causing every nodemanager in the cluster to go through a reboot because the RM mistakenly believes the heartbeats are out of sync with the RM.

        IMHO the response ID needs to be handled inline rather than asynchronously – we should never return a response for the current heartbeat request until we are ready to receive the next heartbeat request. It sounds like that's not the case with this patch. I'm OK if we want to use the RMNode as the place where we store this bookkeeping information for each node, but I don't think the response ID handling should be completely asynchronous as it is today especially since this JIRA is going to clamp down on the allowed values.
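        A minimal sketch of what "inline" handling could look like, again as a toy model rather than the ResourceTrackerService code: the per-node response ID is checked and advanced under a lock before the response is returned, and only the remaining work is queued. The class, field and method names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model (not YARN code): the response id is recorded before the response is returned. */
class InlineResponseIdSketch {
    private final Queue<Runnable> deferredWork = new ArrayDeque<>();
    private int lastResponseId = 0;
    private int handledEvents = 0;

    /** Checks and advances the response id synchronously; defers everything else. */
    synchronized int heartbeat(int requestResponseId) {
        if (requestResponseId != lastResponseId) {
            throw new IllegalStateException("out of sync: got "
                + requestResponseId + ", recorded " + lastResponseId);
        }
        // Recorded inline, so the next heartbeat can be validated immediately.
        lastResponseId = requestResponseId + 1;
        // Stand-in for scheduler or status processing that may lag behind.
        deferredWork.add(() -> handledEvents++);
        return lastResponseId;
    }

    /** Number of deferred events that have not been processed yet. */
    synchronized int pendingEvents() {
        return deferredWork.size();
    }

    public static void main(String[] args) {
        InlineResponseIdSketch rm = new InlineResponseIdSketch();
        int id = rm.heartbeat(0);
        id = rm.heartbeat(id); // accepted even though nothing has been drained
        System.out.println("responseId after two heartbeats: " + id);
        System.out.println("events still pending: " + rm.pendingEvents());
    }
}
```

        Because the bookkeeping no longer depends on the queue being drained, a slow or paused consumer of the deferred work does not make the next heartbeat look out of sync in this model.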

        botong Botong Huang added a comment - - edited

        Thanks, Jason Lowe, for the review; good point. We don't want the RM to resync all nodes just because the RM has become slow.

        How about we take one step back and keep allowing request.responseId > lastResponseId, as is the case now? We would simply fix the overflow problem without changing anything else. Specifically, we add one check: if request.responseId == lastResponseId, skip the other checks. This was my initial proposal in the YARN-6640 v1 patch:

        	if (request.getResponseId() != lastResponse.getResponseId()) {
        	    if (request.getResponseId() + 1 == lastResponse.getResponseId()) {
        	        /* heartbeat is one step old: simply return lastResponse */
        	        return lastResponse;
        	    } else if (request.getResponseId() + 1 < lastResponse.getResponseId()) {
        	        /* heartbeat is more than one step old: resync the NM... */
        	    }
        	}
        	/* otherwise, process the heartbeat... */
        

        A slow RM could still cause an NM resync, but only for NMs whose responseId has just wrapped around. I think this should be acceptable.
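        To make the wrap-around case concrete, here is a small sketch that mirrors the check proposed above. It assumes responseId is a Java int incremented with a plain + 1, so it wraps from Integer.MAX_VALUE to Integer.MIN_VALUE; the Action enum and the classify method are hypothetical names, not part of the patch.

```java
/** Toy model (not YARN code) of the proposed responseId check. */
class WrapCheckSketch {
    /** Hypothetical outcome labels for a heartbeat. */
    enum Action { PROCESS, RESEND_LAST, RESYNC }

    /** Mirrors the proposed check; int arithmetic wraps silently on overflow. */
    static Action classify(int requestId, int lastId) {
        if (requestId != lastId) {
            if (requestId + 1 == lastId) {
                return Action.RESEND_LAST;  // heartbeat is one step old
            } else if (requestId + 1 < lastId) {
                return Action.RESYNC;       // heartbeat looks more than one step old
            }
        }
        return Action.PROCESS;              // equal, or request is ahead of the record
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        int min = Integer.MIN_VALUE;
        System.out.println("equal at MAX:           " + classify(max, max));
        System.out.println("one step old over wrap: " + classify(max, min));
        System.out.println("request ahead, no wrap: " + classify(6, 5));
        System.out.println("request ahead at wrap:  " + classify(min, max));
    }
}
```

        Under these assumptions, the equal and one-step-old cases behave the same at the boundary as anywhere else. The only case that changes is a request that is ahead of the recorded ID exactly at the wrap: away from the boundary it is processed, but at the boundary it is classified as a resync, which matches the residual risk described above.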


          People

          • Assignee:
            botong Botong Huang
            Reporter:
            botong Botong Huang
          • Votes:
            0
            Watchers:
            7
