Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6403

Invalid local resource request can raise NPE and make NM exit

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4, 2.8.2
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Recently we found this problem on our testing environment. The app that caused this problem added a invalid local resource request(have no location) into ContainerLaunchContext like this:

          localResources.put("test", LocalResource.newInstance(location,
              LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
              System.currentTimeMillis()));
          ContainerLaunchContext amContainer =
              ContainerLaunchContext.newInstance(localResources, environment,
                vargsFinal, null, securityTokens, acls);
      

      The actual value of location was null although app doesn't expect that. This mistake cause several NMs exited with the NPE below and can't restart until the nm recovery dirs were deleted.

      FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
              at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
              at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
              at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
              at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
              at java.lang.Thread.run(Thread.java:745)
      

      NPE occured when created LocalResourceRequest instance for invalid resource request.

        public LocalResourceRequest(LocalResource resource)
            throws URISyntaxException {
          this(resource.getResource().toPath(),  //NPE occurred here
              resource.getTimestamp(),
              resource.getType(),
              resource.getVisibility(),
              resource.getPattern());
        }
      

      We can't guarantee the validity of local resource request now, but we could avoid damaging the cluster. Perhaps we can verify the resource both in ContainerLaunchContext and LocalResourceRequest? Please feel free to give your suggestions.

      1. YARN-6403.branch-2.8.004.patch
        11 kB
        Tao Yang
      2. YARN-6403.branch-2.8.004.patch
        11 kB
        Jason Lowe
      3. YARN-6403.branch-2.8.003.patch
        8 kB
        Tao Yang
      4. YARN-6403.004.patch
        10 kB
        Tao Yang
      5. YARN-6403.002.patch
        7 kB
        Tao Yang
      6. YARN-6403.001.patch
        3 kB
        Tao Yang

        Issue Links

          Activity

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - 2.8.1 became a security release. Moving fix-version to 2.8.2 after the fact.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11591 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11591/)
          YARN-6403. Invalid local resource request can raise NPE and make NM (jlowe: rev e8071aa249c7b21b1de084ee5a9ca2a44efd3bf0)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestApplicationClientProtocolRecords.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11591 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11591/ ) YARN-6403 . Invalid local resource request can raise NPE and make NM (jlowe: rev e8071aa249c7b21b1de084ee5a9ca2a44efd3bf0) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestApplicationClientProtocolRecords.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java
          Hide
          Tao Yang Tao Yang added a comment -

          Jason Lowe, thanks for review and committing!

          Show
          Tao Yang Tao Yang added a comment - Jason Lowe , thanks for review and committing!
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11530 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11530/)
          YARN-6403. Invalid local resource request can raise NPE and make NM (jlowe: rev e8071aa249c7b21b1de084ee5a9ca2a44efd3bf0)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestApplicationClientProtocolRecords.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11530 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11530/ ) YARN-6403 . Invalid local resource request can raise NPE and make NM (jlowe: rev e8071aa249c7b21b1de084ee5a9ca2a44efd3bf0) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerManagerWithLCE.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestApplicationClientProtocolRecords.java
          Hide
          jlowe Jason Lowe added a comment -

          Thanks, Tao Yang! I committed this to trunk, branch-2, and branch-2.8.

          Show
          jlowe Jason Lowe added a comment - Thanks, Tao Yang ! I committed this to trunk, branch-2, and branch-2.8.
          Hide
          jlowe Jason Lowe added a comment -

          +1 for the latest trunk and 2.8 patches. Committing this.

          Show
          jlowe Jason Lowe added a comment - +1 for the latest trunk and 2.8 patches. Committing this.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 18s Maven dependency ordering for branch
          +1 mvninstall 8m 44s branch-2.8 passed
          +1 compile 1m 50s branch-2.8 passed with JDK v1.8.0_121
          +1 compile 2m 11s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 34s branch-2.8 passed
          +1 mvnsite 1m 8s branch-2.8 passed
          +1 mvneclipse 0m 29s branch-2.8 passed
          +1 findbugs 2m 0s branch-2.8 passed
          +1 javadoc 0m 44s branch-2.8 passed with JDK v1.8.0_121
          +1 javadoc 0m 50s branch-2.8 passed with JDK v1.7.0_121
          0 mvndep 0m 11s Maven dependency ordering for patch
          +1 mvninstall 0m 51s the patch passed
          +1 compile 1m 47s the patch passed with JDK v1.8.0_121
          +1 javac 1m 47s the patch passed
          +1 compile 2m 11s the patch passed with JDK v1.7.0_121
          +1 javac 2m 11s the patch passed
          -0 checkstyle 0m 31s hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 75 unchanged - 0 fixed = 77 total (was 75)
          +1 mvnsite 0m 59s the patch passed
          +1 mvneclipse 0m 26s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 21s the patch passed
          +1 javadoc 0m 41s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 49s the patch passed with JDK v1.7.0_121
          +1 unit 2m 30s hadoop-yarn-common in the patch passed with JDK v1.7.0_121.
          +1 unit 9m 51s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_121.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          62m 13s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-6403
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861974/YARN-6403.branch-2.8.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e34b9fe793d0 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 9c1cf63
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15527/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15527/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15527/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 3 new or modified test files. 0 mvndep 0m 18s Maven dependency ordering for branch +1 mvninstall 8m 44s branch-2.8 passed +1 compile 1m 50s branch-2.8 passed with JDK v1.8.0_121 +1 compile 2m 11s branch-2.8 passed with JDK v1.7.0_121 +1 checkstyle 0m 34s branch-2.8 passed +1 mvnsite 1m 8s branch-2.8 passed +1 mvneclipse 0m 29s branch-2.8 passed +1 findbugs 2m 0s branch-2.8 passed +1 javadoc 0m 44s branch-2.8 passed with JDK v1.8.0_121 +1 javadoc 0m 50s branch-2.8 passed with JDK v1.7.0_121 0 mvndep 0m 11s Maven dependency ordering for patch +1 mvninstall 0m 51s the patch passed +1 compile 1m 47s the patch passed with JDK v1.8.0_121 +1 javac 1m 47s the patch passed +1 compile 2m 11s the patch passed with JDK v1.7.0_121 +1 javac 2m 11s the patch passed -0 checkstyle 0m 31s hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 75 unchanged - 0 fixed = 77 total (was 75) +1 mvnsite 0m 59s the patch passed +1 mvneclipse 0m 26s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 21s the patch passed +1 javadoc 0m 41s the patch passed with JDK v1.8.0_121 +1 javadoc 0m 49s the patch passed with JDK v1.7.0_121 +1 unit 2m 30s hadoop-yarn-common in the patch passed with JDK v1.7.0_121. +1 unit 9m 51s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_121. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 62m 13s Subsystem Report/Notes Docker Image:yetus/hadoop:5af2af1 JIRA Issue YARN-6403 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861974/YARN-6403.branch-2.8.004.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e34b9fe793d0 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2.8 / 9c1cf63 Default Java 1.7.0_121 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15527/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15527/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15527/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 docker 5m 10s Docker failed to build yetus/hadoop:5af2af1.



          Subsystem Report/Notes
          JIRA Issue YARN-6403
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861974/YARN-6403.branch-2.8.004.patch
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15514/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 docker 5m 10s Docker failed to build yetus/hadoop:5af2af1. Subsystem Report/Notes JIRA Issue YARN-6403 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861974/YARN-6403.branch-2.8.004.patch Console output https://builds.apache.org/job/PreCommit-YARN-Build/15514/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for updating the patch! Looks good to me. Posting the same branch-2.8 patch again so Jenkins can comment on it, as it will only comment on one patch at a time if many are posted at once.

          Show
          jlowe Jason Lowe added a comment - Thanks for updating the patch! Looks good to me. Posting the same branch-2.8 patch again so Jenkins can comment on it, as it will only comment on one patch at a time if many are posted at once.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          0 mvndep 0m 9s Maven dependency ordering for branch
          +1 mvninstall 12m 58s trunk passed
          +1 compile 10m 18s trunk passed
          +1 checkstyle 0m 52s trunk passed
          +1 mvnsite 1m 7s trunk passed
          +1 mvneclipse 0m 39s trunk passed
          +1 findbugs 1m 50s trunk passed
          +1 javadoc 0m 57s trunk passed
          0 mvndep 0m 9s Maven dependency ordering for patch
          +1 mvninstall 0m 50s the patch passed
          +1 compile 8m 32s the patch passed
          +1 javac 8m 32s the patch passed
          -0 checkstyle 0m 51s hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 174 unchanged - 0 fixed = 182 total (was 174)
          +1 mvnsite 1m 4s the patch passed
          +1 mvneclipse 0m 37s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 2s the patch passed
          +1 javadoc 0m 56s the patch passed
          +1 unit 2m 22s hadoop-yarn-common in the patch passed.
          +1 unit 12m 50s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 34s The patch does not generate ASF License warnings.
          68m 35s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:a9ad5d6
          JIRA Issue YARN-6403
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861805/YARN-6403.004.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ff5db228b1b0 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 6eba792
          Default Java 1.8.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15498/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15498/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15498/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 3 new or modified test files. 0 mvndep 0m 9s Maven dependency ordering for branch +1 mvninstall 12m 58s trunk passed +1 compile 10m 18s trunk passed +1 checkstyle 0m 52s trunk passed +1 mvnsite 1m 7s trunk passed +1 mvneclipse 0m 39s trunk passed +1 findbugs 1m 50s trunk passed +1 javadoc 0m 57s trunk passed 0 mvndep 0m 9s Maven dependency ordering for patch +1 mvninstall 0m 50s the patch passed +1 compile 8m 32s the patch passed +1 javac 8m 32s the patch passed -0 checkstyle 0m 51s hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 174 unchanged - 0 fixed = 182 total (was 174) +1 mvnsite 1m 4s the patch passed +1 mvneclipse 0m 37s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 2s the patch passed +1 javadoc 0m 56s the patch passed +1 unit 2m 22s hadoop-yarn-common in the patch passed. +1 unit 12m 50s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 34s The patch does not generate ASF License warnings. 68m 35s Subsystem Report/Notes Docker Image:yetus/hadoop:a9ad5d6 JIRA Issue YARN-6403 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861805/YARN-6403.004.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux ff5db228b1b0 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 6eba792 Default Java 1.8.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15498/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15498/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15498/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          Tao Yang Tao Yang added a comment -

          Thanks Jason Lowe for your suggestions. Client-side test is moved to TestApplicationClientProtocolRecords now and TestContainerManagerWithLCE is updated to avoid failure. Attach new patches for branch-2.8 and trunk.

          Show
          Tao Yang Tao Yang added a comment - Thanks Jason Lowe for your suggestions. Client-side test is moved to TestApplicationClientProtocolRecords now and TestContainerManagerWithLCE is updated to avoid failure. Attach new patches for branch-2.8 and trunk.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          0 mvndep 0m 26s Maven dependency ordering for branch
          +1 mvninstall 6m 36s branch-2.8 passed
          +1 compile 1m 56s branch-2.8 passed with JDK v1.8.0_121
          +1 compile 2m 12s branch-2.8 passed with JDK v1.7.0_121
          +1 checkstyle 0m 35s branch-2.8 passed
          +1 mvnsite 1m 1s branch-2.8 passed
          +1 mvneclipse 0m 27s branch-2.8 passed
          +1 findbugs 2m 0s branch-2.8 passed
          +1 javadoc 0m 46s branch-2.8 passed with JDK v1.8.0_121
          +1 javadoc 0m 52s branch-2.8 passed with JDK v1.7.0_121
          0 mvndep 0m 10s Maven dependency ordering for patch
          +1 mvninstall 0m 52s the patch passed
          +1 compile 2m 19s the patch passed with JDK v1.8.0_121
          +1 javac 2m 19s the patch passed
          +1 compile 2m 20s the patch passed with JDK v1.7.0_121
          +1 javac 2m 20s the patch passed
          -0 checkstyle 0m 32s hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 75 unchanged - 0 fixed = 77 total (was 75)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 24s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 24s the patch passed
          +1 javadoc 0m 48s the patch passed with JDK v1.8.0_121
          +1 javadoc 0m 50s the patch passed with JDK v1.7.0_121
          +1 unit 2m 27s hadoop-yarn-common in the patch passed with JDK v1.7.0_121.
          -1 unit 9m 47s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_121.
          +1 asflicense 0m 21s The patch does not generate ASF License warnings.
          61m 42s



          Reason Tests
          JDK v1.8.0_121 Failed junit tests hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE
          JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:5af2af1
          JIRA Issue YARN-6403
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861396/YARN-6403.branch-2.8.003.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 4857042744ed 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.8 / 14e5a8e
          Default Java 1.7.0_121
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15455/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/15455/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.7.0_121.txt
          JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15455/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15455/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. 0 mvndep 0m 26s Maven dependency ordering for branch +1 mvninstall 6m 36s branch-2.8 passed +1 compile 1m 56s branch-2.8 passed with JDK v1.8.0_121 +1 compile 2m 12s branch-2.8 passed with JDK v1.7.0_121 +1 checkstyle 0m 35s branch-2.8 passed +1 mvnsite 1m 1s branch-2.8 passed +1 mvneclipse 0m 27s branch-2.8 passed +1 findbugs 2m 0s branch-2.8 passed +1 javadoc 0m 46s branch-2.8 passed with JDK v1.8.0_121 +1 javadoc 0m 52s branch-2.8 passed with JDK v1.7.0_121 0 mvndep 0m 10s Maven dependency ordering for patch +1 mvninstall 0m 52s the patch passed +1 compile 2m 19s the patch passed with JDK v1.8.0_121 +1 javac 2m 19s the patch passed +1 compile 2m 20s the patch passed with JDK v1.7.0_121 +1 javac 2m 20s the patch passed -0 checkstyle 0m 32s hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 75 unchanged - 0 fixed = 77 total (was 75) +1 mvnsite 1m 0s the patch passed +1 mvneclipse 0m 24s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 2m 24s the patch passed +1 javadoc 0m 48s the patch passed with JDK v1.8.0_121 +1 javadoc 0m 50s the patch passed with JDK v1.7.0_121 +1 unit 2m 27s hadoop-yarn-common in the patch passed with JDK v1.7.0_121. -1 unit 9m 47s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_121. +1 asflicense 0m 21s The patch does not generate ASF License warnings. 61m 42s Reason Tests JDK v1.8.0_121 Failed junit tests hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE JDK v1.7.0_121 Failed junit tests hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE Subsystem Report/Notes Docker Image:yetus/hadoop:5af2af1 JIRA Issue YARN-6403 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861396/YARN-6403.branch-2.8.003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 4857042744ed 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2.8 / 14e5a8e Default Java 1.7.0_121 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_121 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/15455/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/15455/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.7.0_121.txt JDK v1.7.0_121 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/15455/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn Console output https://builds.apache.org/job/PreCommit-YARN-Build/15455/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for updating the patch!

          TestApplicationClientProtocolRecords is not exist in branch-2.8, so is it ok to place the UT for client-side in TestPBImplRecords#testContainerLaunchContextPBImpl?

          I'd rather the change appears in the same file so if there are subsequent modifications to the code it can be cherry-picked. Therefore I agree we need a new patch for branch-2.8 so it can add the new TestApplicationClientProtocolRecords file. Alternatively we can go with just one patch where it adds a new TestContainerLaunchContextPBImpl file that has the test.

          Otherwise changes in the 2.8 patch look good. There will need to be a patch for trunk at a minimum. We'll need a separate one for branch-2.8 if the test goes in TestApplicationClientProtocolRecords instead of a new TestContainerLaunchContextPBImpl file. Either works for me.

          Show
          jlowe Jason Lowe added a comment - Thanks for updating the patch! TestApplicationClientProtocolRecords is not exist in branch-2.8, so is it ok to place the UT for client-side in TestPBImplRecords#testContainerLaunchContextPBImpl? I'd rather the change appears in the same file so if there are subsequent modifications to the code it can be cherry-picked. Therefore I agree we need a new patch for branch-2.8 so it can add the new TestApplicationClientProtocolRecords file. Alternatively we can go with just one patch where it adds a new TestContainerLaunchContextPBImpl file that has the test. Otherwise changes in the 2.8 patch look good. There will need to be a patch for trunk at a minimum. We'll need a separate one for branch-2.8 if the test goes in TestApplicationClientProtocolRecords instead of a new TestContainerLaunchContextPBImpl file. Either works for me.
          Hide
          Tao Yang Tao Yang added a comment -

          Attach new patch for branch-2.8

          Show
          Tao Yang Tao Yang added a comment - Attach new patch for branch-2.8
          Hide
          Tao Yang Tao Yang added a comment -

          Jason Lowe Thanks for your time!

          I believe it's appropriate to throw NPE in our client check code as well rather than a generic RuntimeException. It's a minor point since the net effect will be similar for the client in either case.

          Make sense, sorry for missing the point before.

          TestApplicationClientProtocolRecords looks like a decent place since it's already has another test for ContainerLaunchContextPBImpl there.

          TestApplicationClientProtocolRecords is not exist in branch-2.8, so is it ok to place the UT for client-side in TestPBImplRecords#testContainerLaunchContextPBImpl?
          In addition, the error message and unit test code will be improved in next patch.
          One patch can't fit for all branches, perhaps it's necessary to submit patches for 2.9(branch-2) and 3.0.0-alpha3(trunk)?

          Show
          Tao Yang Tao Yang added a comment - Jason Lowe Thanks for your time! I believe it's appropriate to throw NPE in our client check code as well rather than a generic RuntimeException. It's a minor point since the net effect will be similar for the client in either case. Make sense, sorry for missing the point before. TestApplicationClientProtocolRecords looks like a decent place since it's already has another test for ContainerLaunchContextPBImpl there. TestApplicationClientProtocolRecords is not exist in branch-2.8, so is it ok to place the UT for client-side in TestPBImplRecords#testContainerLaunchContextPBImpl? In addition, the error message and unit test code will be improved in next patch. One patch can't fit for all branches, perhaps it's necessary to submit patches for 2.9(branch-2) and 3.0.0-alpha3(trunk)?
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 8s YARN-6403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



          Subsystem Report/Notes
          JIRA Issue YARN-6403
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861157/YARN-6403.002.patch
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/15435/console
          Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 8s YARN-6403 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue YARN-6403 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12861157/YARN-6403.002.patch Console output https://builds.apache.org/job/PreCommit-YARN-Build/15435/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          Submitting patch so Jenkins can comment on it as well.

          For the client-side change, IIUIC the generated protobuf code won't throws NPE for this case actually.

          The generated protobuf code won't throw NPE for this particular case, but it does throw NPE for other fields that you try to set directly to null. For example, if one tried to call setLocalResources(null) on the protobuf (not the PBImpl) then the generated protobuf code explicitly throws NPE. As such, I believe it's appropriate to throw NPE in our client check code as well rather than a generic RuntimeException. It's a minor point since the net effect will be similar for the client in either case.

          The localResources != null check in checkLocalResources is not necessary since the calling code explicitly checks for it already.

          The error message should be a bit more specific. It just logs the local resource as a string, but unfortunately that won't log the fact that the resource itself is null. We should change "Got invalid local resource" to something like "Null resource URL for local resource". Throwing NPE instead of RuntimeException would at least hint to the user that there's a problem with a null field here, but we should also be more explicit in the error message to help them along.

          This test code:

              boolean throwsException = false;
              try {
          [....]
                containerLaunchContext.setLocalResources(localResources);
              } catch (Throwable e) {
                throwsException = true;
                Assert.assertTrue(e.getMessage().contains("Got invalid local resource"));
              }
              Assert.assertTrue(throwsException);
          

          can be simplified to this:

              try {
          [....]
                containerLaunchContext.setLocalResources(localResources);
                Assert.fail("setting an invalid local resource should be an error!");
              } catch (RuntimeException e) {
                Assert.assertTrue(e.getMessage().contains("Got invalid local resource"));
              }
          

          Note that we should be checking for the specific exception type we are throwing in the test rather than Throwable, since this is essentially part of the client API.

          testClientFailureWithInvalidResource does not belong in ContainerManagerImpl since it has nothing to do with ContainerManagerImpl. It's really a test for ContainerLaunchContextPBImpl and should be moved to an appropriate place in the yarn-common project. TestApplicationClientProtocolRecords looks like a decent place since it's already has another test for ContainerLaunchContextPBImpl there. The unit test method should be renamed to something more appropriate when moved there, like testCLCPBImplNullResource.

          Show
          jlowe Jason Lowe added a comment - Submitting patch so Jenkins can comment on it as well. For the client-side change, IIUIC the generated protobuf code won't throws NPE for this case actually. The generated protobuf code won't throw NPE for this particular case, but it does throw NPE for other fields that you try to set directly to null. For example, if one tried to call setLocalResources(null) on the protobuf (not the PBImpl) then the generated protobuf code explicitly throws NPE. As such, I believe it's appropriate to throw NPE in our client check code as well rather than a generic RuntimeException. It's a minor point since the net effect will be similar for the client in either case. The localResources != null check in checkLocalResources is not necessary since the calling code explicitly checks for it already. The error message should be a bit more specific. It just logs the local resource as a string, but unfortunately that won't log the fact that the resource itself is null. We should change "Got invalid local resource" to something like "Null resource URL for local resource". Throwing NPE instead of RuntimeException would at least hint to the user that there's a problem with a null field here, but we should also be more explicit in the error message to help them along. This test code: boolean throwsException = false ; try { [....] containerLaunchContext.setLocalResources(localResources); } catch (Throwable e) { throwsException = true ; Assert.assertTrue(e.getMessage().contains( "Got invalid local resource" )); } Assert.assertTrue(throwsException); can be simplified to this: try { [....] containerLaunchContext.setLocalResources(localResources); Assert.fail( "setting an invalid local resource should be an error!" ); } catch (RuntimeException e) { Assert.assertTrue(e.getMessage().contains( "Got invalid local resource" )); } Note that we should be checking for the specific exception type we are throwing in the test rather than Throwable, since this is essentially part of the client API. testClientFailureWithInvalidResource does not belong in ContainerManagerImpl since it has nothing to do with ContainerManagerImpl. It's really a test for ContainerLaunchContextPBImpl and should be moved to an appropriate place in the yarn-common project. TestApplicationClientProtocolRecords looks like a decent place since it's already has another test for ContainerLaunchContextPBImpl there. The unit test method should be renamed to something more appropriate when moved there, like testCLCPBImplNullResource.
          Hide
          Tao Yang Tao Yang added a comment -

          Jason Lowe Thanks for correcting me.
          The last server-side change is not proper and I corrected it as your mentioned.
          For the client-side change, IIUIC the generated protobuf code won't throws NPE for this case actually.
          Unit tests for both the client and server change is added.
          Attach a new patch for review, please correct me if I missed something.

          Show
          Tao Yang Tao Yang added a comment - Jason Lowe Thanks for correcting me. The last server-side change is not proper and I corrected it as your mentioned. For the client-side change, IIUIC the generated protobuf code won't throws NPE for this case actually. Unit tests for both the client and server change is added. Attach a new patch for review, please correct me if I missed something.
          Hide
          jlowe Jason Lowe added a comment -

          Sorry, I completely missed the server-side change in ContainerImpl.

          I'm not sure that's the correct place to make the server-side change because it's happening so late in the container lifecycle. It would be better if we simply failed the container launch request immediately rather than wait until the container transitions all the way to the localizing state. That way the client gets immediate feedback that their request was malformed rather than wondering why their container launch mysteriously failed sometime later.

          I think it's more appropriate to have ContainerManagerImpl#startContainerInternal sanity check the request (which it already does to some degree, just not for the local resources) and throw a YarnException if the request is malformed. That way the client will receive a failed container start response to their start request, so they will immediately know their request was bad.

          It would be good to add unit tests for both the client and server changes.

          Show
          jlowe Jason Lowe added a comment - Sorry, I completely missed the server-side change in ContainerImpl. I'm not sure that's the correct place to make the server-side change because it's happening so late in the container lifecycle. It would be better if we simply failed the container launch request immediately rather than wait until the container transitions all the way to the localizing state. That way the client gets immediate feedback that their request was malformed rather than wondering why their container launch mysteriously failed sometime later. I think it's more appropriate to have ContainerManagerImpl#startContainerInternal sanity check the request (which it already does to some degree, just not for the local resources) and throw a YarnException if the request is malformed. That way the client will receive a failed container start response to their start request, so they will immediately know their request was bad. It would be good to add unit tests for both the client and server changes.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for the patch!

          This patch is changing the client code but not the server code. A client who doesn't have the fix or a malicious client can still construct a malformed protobuf that is missing the resource location. Minimally the server needs to validate the request. The client-side change is nice to have but technically not necessary to fix the issue.

          Nit: Speaking of the client side change, I think NullPointerException is more appropriate to throw in this case. That's what the generated protobuf code already throws when trying to set protobuf fields to null.

          Show
          jlowe Jason Lowe added a comment - Thanks for the patch! This patch is changing the client code but not the server code. A client who doesn't have the fix or a malicious client can still construct a malformed protobuf that is missing the resource location. Minimally the server needs to validate the request. The client-side change is nice to have but technically not necessary to fix the issue. Nit: Speaking of the client side change, I think NullPointerException is more appropriate to throw in this case. That's what the generated protobuf code already throws when trying to set protobuf fields to null.
          Hide
          Tao Yang Tao Yang added a comment -

          Attach a patch for review.

          • Add local resources check in ContainerImpl$RequestResourcesTransition to avoid NM failing, the container with invalid resource will fail to launch in this step.
          • Add local resources check in ContainerLaunchContextPBImpl#setLocalResources to fail the app with invalid resource early in client, as it's a waste for cluster to launch a bound-to-fail app.
          Show
          Tao Yang Tao Yang added a comment - Attach a patch for review. Add local resources check in ContainerImpl$RequestResourcesTransition to avoid NM failing, the container with invalid resource will fail to launch in this step. Add local resources check in ContainerLaunchContextPBImpl#setLocalResources to fail the app with invalid resource early in client, as it's a waste for cluster to launch a bound-to-fail app.
          Hide
          Tao Yang Tao Yang added a comment -

          Naganarasimha G R Yes, I would like to work on this and will submit a patch for review soon.

          Show
          Tao Yang Tao Yang added a comment - Naganarasimha G R Yes, I would like to work on this and will submit a patch for review soon.
          Hide
          Naganarasimha Naganarasimha G R added a comment -

          Agree we need to add checks to ensure that NM not to fail based on client's input. Tao Yang, do you plan to work on this ?

          Show
          Naganarasimha Naganarasimha G R added a comment - Agree we need to add checks to ensure that NM not to fail based on client's input. Tao Yang , do you plan to work on this ?
          Hide
          jlowe Jason Lowe added a comment -

          The NM should definitely be hardened against malformed data being sent from the untrusted clients. The NM should just fail the container launch if fields necessary to perform the launch are missing from the request.

          Show
          jlowe Jason Lowe added a comment - The NM should definitely be hardened against malformed data being sent from the untrusted clients. The NM should just fail the container launch if fields necessary to perform the launch are missing from the request.

            People

            • Assignee:
              Tao Yang Tao Yang
              Reporter:
              Tao Yang Tao Yang
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development