Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      The NM dies with an IllegalArgumentException when localizing a resource.

      2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null }
      2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null }
      2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
      java.lang.IllegalArgumentException: Can not create a Path from an empty string
      at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
      at org.apache.hadoop.fs.Path.<init>(Path.java:135)
      at org.apache.hadoop.fs.Path.<init>(Path.java:94)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
      at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
      at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
      at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
      at java.lang.Thread.run(Thread.java:745)
      2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
      2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
      2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header...

      1. YARN-3011.001.patch
        5 kB
        Varun Saxena
      2. YARN-3011.002.patch
        5 kB
        Varun Saxena
      3. YARN-3011.003.patch
        5 kB
        Varun Saxena
      4. YARN-3011.004.patch
        5 kB
        Varun Saxena

        Activity

        wh831019 Wang Hao added a comment -

        I submitted a job to Oozie. In my workflow.xml, the value of the script tag ends with '/' by mistake.
        <workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
            <start to="create_hive"/>

            <action name="create_hive">
                <hive xmlns="uri:oozie:hive-action:0.2">
                    <job-tracker>${jobTracker}</job-tracker>
                    <name-node>${nameNode}</name-node>
                    <configuration>
                        <property>
                            <name>oozie.action.sharelib.for.hive</name>
                            <value>hive2</value>
                        </property>
                        <property>
                            <name>oozie.launcher.action.main.class</name>
                            <value>org.apache.oozie.action.hadoop.Hive2Main</value>
                        </property>
                        <property>
                            <name>mapreduce.job.queuename</name>
                            <value>${queueName}</value>
                        </property>
                    </configuration>
                    <script>test_ooize_job1.sql/</script>
                    <param>hivevar:dbname=offline</param>
                    <param>hivevar:partition_date=20141228</param>
                </hive>
                <ok to="end"/>
                <error to="fail"/>
            </action>
            <kill name="fail">
                <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
            </kill>
            <end name="end"/>
        </workflow-app>

        When the NM localizes the resource, the file "test_ooize_job1.sql/" causes an exception in getPathForLocalization of LocalResourcesTrackerImpl.

        In getPathForLocalization, when the Path is created, the second argument is an empty string, because the name of a path that ends with '/' is empty:
        Path localPath = new Path(rPath, req.getPath().getName());

        Finally, the exception causes the AsyncDispatcher to shut down the JVM.
        So I think we should handle this exception; otherwise it can cause lots of NMs to die.
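
        The failure mode described above can be reproduced in a minimal standalone sketch. It does not use Hadoop: the class TrailingSlashSketch and its methods lastComponent, requireNonEmpty and isLocalizable are hypothetical names that only mimic the behaviour reported here (the final path component is taken as the text after the last '/', and an empty string is rejected with the message seen in the stack trace). The host name "nn" is made up. The guarded variant shows one possible way to handle the exception and is not necessarily what the attached patches do.

```java
import java.net.URI;

// Standalone simulation; these helpers are hypothetical and are not Hadoop code.
class TrailingSlashSketch {

    // Mimics taking the final path component: everything after the last '/'.
    static String lastComponent(URI uri) {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Mimics an argument check that rejects empty path strings.
    static String requireNonEmpty(String child) {
        if (child == null || child.isEmpty()) {
            throw new IllegalArgumentException(
                "Can not create a Path from an empty string");
        }
        return child;
    }

    // Guarded variant: report a bad resource to the caller instead of
    // letting the exception escape to the event-dispatching thread.
    static boolean isLocalizable(URI uri) {
        try {
            requireNonEmpty(lastComponent(uri));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI good = URI.create("hdfs://nn:8020/user/hive/src/test_ooize_job1.sql");
        URI bad = URI.create("hdfs://nn:8020/user/hive/src/test_ooize_job1.sql/");
        System.out.println(isLocalizable(good));
        System.out.println(isLocalizable(bad));
    }
}
```

        With the trailing '/', lastComponent returns an empty string, so the unguarded check throws, while the guarded variant lets the caller fail only the offending resource.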

        vinodkv Vinod Kumar Vavilapalli added a comment -

        This is a part of YARN-662 - the one about doing sanity-checks. Linking..

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12691543/YARN-3011.001.patch
        against trunk revision ef3c3a8.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6304//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6304//console

        This message is automatically generated.

        varun_saxena Varun Saxena added a comment -

        Junping Du / Vinod Kumar Vavilapalli, kindly review

        varun_saxena Varun Saxena added a comment -

        Someone, kindly review this one

        jianhe Jian He added a comment -

        lgtm overall. IIUC, if yarn.dispatcher.exit-on-error is set to false, the NM will not crash in this case?
        One nit on the patch:
        for next.getResource().getFile(), I feel using ConverterUtils#getPathFromYarnURL to print the full URL will be more debuggable.

        varun_saxena Varun Saxena added a comment -

        IIUC, if yarn.dispatcher.exit-on-error is set to false, NM will not crash in this case?

        Yes, you are correct.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694335/YARN-3011.002.patch
        against trunk revision 8f26d5a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6407//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6407//console

        This message is automatically generated.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694336/YARN-3011.002.patch
        against trunk revision 8f26d5a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6408//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6408//console

        This message is automatically generated.

        jianhe Jian He added a comment -

        Varun Saxena, I tried to commit, but the patch no longer applies. Mind rebasing the patch? Thx.

        varun_saxena Varun Saxena added a comment -

        Rebased the patch

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694666/YARN-3011.003.patch
        against trunk revision 6f9fe76.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6430//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6430//console

        This message is automatically generated.

        jianhe Jian He added a comment -

        I feel using ConverterUtils#getPathFromYarnURL to print the full URL will be more debuggable.

        Varun Saxena, sorry, I didn't realize that ConverterUtils.getPathFromYarnURL can itself throw an exception. For simplicity, I think your first patch is good enough. Would you like to revert to the first approach?

        varun_saxena Varun Saxena added a comment -

        Jenkins seems to have some problem. Trying to kick Jenkins again

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12694780/YARN-3011.004.patch
        against trunk revision 1e2d98a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6435//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6435//console

        This message is automatically generated.

        jianhe Jian He added a comment -

        Committed to trunk and branch-2, thanks Varun Saxena !

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #6941 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6941/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #87 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/87/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #821 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/821/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #84 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/84/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2019 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2019/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #88 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/88/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2038 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2038/)
        YARN-3011. Possible IllegalArgumentException in ResourceLocalizationService might lead NM to crash. Contributed by Varun Saxena (jianhe: rev 4e15fc08411318e11152fcd5a4648ed1d6fbb480)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
        varun_saxena Varun Saxena added a comment -


        Thanks Jian He for the review and commit.

        djp Junping Du added a comment -

        Thanks Varun Saxena and all for fixing this issue.

        IIUC, if yarn.dispatcher.exit-on-error is set to false, NM will not crash in this case?

        Ideally, this should be right. However, I recently ran into this problem in a 2.6 cluster without enabling "yarn.dispatcher.exit-on-error" (disabled by default, except for the dispatcher in JobHistoryServer), yet the NMs shut down without any other exceptions being logged. I think we may be missing something here.

        varun_saxena Varun Saxena added a comment -

        Junping Du, sorry, I had missed your comment.
        I was under a similar impression when I wrote the comment in January.

        But actually all daemons, including the NodeManager, explicitly set yarn.dispatcher.exit-on-error to true in serviceInit.

        conf.setBoolean(Dispatcher.DISPATCHER_EXIT_ON_ERROR_KEY, true);
        

        That means the configured value is completely disregarded.
        The default value of false is meant for test cases, to avoid a JVM exit. This is clearly documented in Dispatcher.java. Being an internal configuration, it is not included in yarn-default.xml either.

          // Configuration to make sure dispatcher crashes but doesn't do system-exit in
          // case of errors. By default, it should be false, so that tests are not
          // affected. For all daemons it should be explicitly set to true so that
          // daemons can crash instead of hanging around.
          public static final String DISPATCHER_EXIT_ON_ERROR_KEY =
              "yarn.dispatcher.exit-on-error";
        

        We could probably set this config to true in daemons only if yarn.dispatcher.exit-on-error is not set in the config file. Thoughts?
        But is there any real use case for it? A recoverable exception should be caught and handled, NOT leaked through to AsyncDispatcher, and a non-recoverable one should lead to a crash anyway.
        cc Junping Du, Jian He
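
To illustrate the point about recoverable exceptions, here is a minimal, self-contained sketch. It is not the committed patch and does not use Hadoop classes: `DispatcherSketch`, `dispatchAll`, `leakyHandler` and `guardedHandler` are hypothetical names, and the loop is a much-simplified stand-in for what AsyncDispatcher does. It only shows the difference between a handler that lets an IllegalArgumentException escape and one that catches it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class DispatcherSketch {

    // Simplified stand-in for a dispatcher loop. Returns false as soon as an
    // exception leaks out of the handler; a real daemon running with
    // exit-on-error enabled would shut down at that point.
    public static boolean dispatchAll(List<String> events, Consumer<String> handler) {
        for (String event : events) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical handler that rejects an empty path, mimicking the argument
    // check seen in the stack trace, and lets the exception escape.
    public static void leakyHandler(String path) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
    }

    // Same work, but the recoverable exception is handled where it occurs,
    // so only the offending resource is skipped.
    public static void guardedHandler(String path) {
        try {
            leakyHandler(path);
        } catch (IllegalArgumentException e) {
            System.err.println("Skipping bad resource: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a.jar", "", "b.jar");
        System.out.println("leaky handler survives:   " + dispatchAll(events, DispatcherSketch::leakyHandler));
        System.out.println("guarded handler survives: " + dispatchAll(events, DispatcherSketch::guardedHandler));
    }
}
```

With the guarded handler the remaining events are still processed, which is the behaviour argued for above: one bad resource should not take the whole daemon down.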

        varun_saxena Varun Saxena added a comment -

        Refer to MAPREDUCE-3634

        djp Junping Du added a comment -

        I see. Thanks, Varun, for the reminder. "all daemons it should be explicitly set to true so that daemons can crash instead of hanging around" is not wrong, but it could make the system more fragile if we fail to catch every possible recoverable, or unrecoverable but not global, exception, as in this JIRA's case. We may need to think more about this.

        varun_saxena Varun Saxena added a comment -

        Junping Du, the only thing we can do here is read this value from the configuration and set it to true in daemons only if it is not configured.
        That way, if an exception in a production cluster makes a daemon crash frequently and we find that it's not a very big issue (i.e. the daemon can still work normally), we can at least set the configuration to false in the config file.
        Right now, even that option is not there.
        Thoughts?

        I can probably raise a JIRA for this, and the discussion (even if it's not fixed) can carry on there.
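
A rough sketch of the difference between the current and the proposed behaviour follows. It is an illustration only: `java.util.Properties` stands in for Hadoop's Configuration so the example runs without Hadoop on the classpath, and `ExitOnErrorSketch`, `currentBehaviour` and `proposedBehaviour` are hypothetical names, not existing YARN code.

```java
import java.util.Properties;

public class ExitOnErrorSketch {

    // Key name as quoted from Dispatcher.java earlier in this thread.
    static final String DISPATCHER_EXIT_ON_ERROR_KEY = "yarn.dispatcher.exit-on-error";

    // Current behaviour as described above: the daemon overwrites the key,
    // so whatever the config file says is ignored.
    public static boolean currentBehaviour(Properties conf) {
        conf.setProperty(DISPATCHER_EXIT_ON_ERROR_KEY, "true");
        return Boolean.parseBoolean(conf.getProperty(DISPATCHER_EXIT_ON_ERROR_KEY));
    }

    // Proposed behaviour: default to true only when the key is absent,
    // so an operator can still opt out by setting it to false.
    public static boolean proposedBehaviour(Properties conf) {
        if (conf.getProperty(DISPATCHER_EXIT_ON_ERROR_KEY) == null) {
            conf.setProperty(DISPATCHER_EXIT_ON_ERROR_KEY, "true");
        }
        return Boolean.parseBoolean(conf.getProperty(DISPATCHER_EXIT_ON_ERROR_KEY));
    }

    public static void main(String[] args) {
        Properties notConfigured = new Properties();
        Properties configuredFalse = new Properties();
        configuredFalse.setProperty(DISPATCHER_EXIT_ON_ERROR_KEY, "false");

        System.out.println("proposed, not configured:  " + proposedBehaviour(notConfigured));
        System.out.println("proposed, configured false: " + proposedBehaviour(configuredFalse));
        System.out.println("current, configured false:  " + currentBehaviour(configuredFalse));
    }
}
```

With Hadoop's Configuration the same check would presumably be a null test on the raw value before calling setBoolean, but that is an assumption and not something tested here.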

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Pulled this into 2.6.1. There were a few conflicts in ResourceLocalizationService.java, which I resolved. Ran compilation and TestResourceLocalizationService before the push.


          People

          • Assignee:
            varun_saxena Varun Saxena
            Reporter:
            wh831019 Wang Hao
          • Votes:
            0
            Watchers:
            11
