Hadoop YARN / YARN-4530

LocalizedResource trigger a NPE Cause the NodeManager exit

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.7.1
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In our cluster, I found that a failed LocalizedResource download triggered an NPE that caused the NodeManager to shut down.

      2015-12-29 17:18:33,706 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://ns3:8020/user/username/projects/user_insight/lookalike/oozie/workflow/conf/hive-site.xml transitioned from DOWNLOADING to FAILED
      2015-12-29 17:18:33,708 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/user_insight_pig_udf-0.0.1-SNAPSHOT-jar-with-dependencies.jar, 1451380519635, FILE, null }
      2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc { { hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar, 1451380519452, FILE, null },pending,[(container_1451039893865_261670_01_000578)],42332661980495938,DOWNLOADING}
      java.io.IOException: Resource hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar changed on src filesystem (expected 1451380519452, was 1451380611793
      	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:276)
      	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://ns3/user/username/projects/user_insight/lookalike/oozie/workflow/lib/unilever_support_udf-0.0.1-SNAPSHOT.jar transitioned from DOWNLOADING to FAILED
      2015-12-29 17:18:33,710 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Error: Shutting down
      java.lang.NullPointerException
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712)
      2015-12-29 17:18:33,710 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
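The IOException in the log above suggests that the download is rejected when the file's modification time on the source filesystem no longer matches the timestamp recorded when the resource was requested. A minimal sketch of that kind of check, assuming nothing about the real FSDownload code: the class `TimestampCheck` and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;

public class TimestampCheck {

  // Hypothetical helper: rejects a resource whose modification time differs
  // from the timestamp that was recorded when the resource was requested.
  static void verifyUnchanged(String uri, long expected, long actual)
      throws IOException {
    if (expected != actual) {
      throw new IOException("Resource " + uri
          + " changed on src filesystem (expected " + expected
          + ", was " + actual + ")");
    }
  }

  // Convenience wrapper that reports the outcome as a boolean.
  static boolean changed(String uri, long expected, long actual) {
    try {
      verifyUnchanged(uri, expected, actual);
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Timestamps taken from the log excerpt; the URI is a placeholder.
    System.out.println(
        changed("hdfs://example/lib.jar", 1451380519452L, 1451380611793L));
  }
}
```

In the reported case the jar was apparently re-uploaded between job submission and localization, so a check of this kind fails and the download future completes exceptionally.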
      

        Activity

        tangshangwen tangshangwen added a comment -

        When assoc is null and completed.get() throws an ExecutionException, this will happen, right?

        ResourceLocalizationService.java
        try {
                    Future<Path> completed = queue.take();
                    LocalizerResourceRequestEvent assoc = pending.remove(completed);
                    try {
                      Path local = completed.get();
                      if (null == assoc) {
                        LOG.error("Localized unkonwn resource to " + completed);
                        // TODO delete
                        return;
                      }
                      LocalResourceRequest key = assoc.getResource().getRequest();
                      publicRsrc.handle(new ResourceLocalizedEvent(key, local, FileUtil
                        .getDU(new File(local.toUri()))));
                      assoc.getResource().unlock();
                    } catch (ExecutionException e) {
                      LOG.info("Failed to download rsrc " + assoc.getResource(),
                          e.getCause());
                      LocalResourceRequest req = assoc.getResource().getRequest();
                      publicRsrc.handle(new ResourceFailedLocalizationEvent(req,
                          e.getMessage()));
                      assoc.getResource().unlock();
                    } catch (CancellationException e) {
                      // ignore; shutting down
                    }
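The failure mode in the snippet above can be reproduced outside Hadoop. The following is a minimal sketch, assuming a plain `String` as a stand-in for both `LocalizerResourceRequestEvent` and `Path`; the class `NullAssocRepro` and its methods are hypothetical. Because `get()` is called before the null check, a failed future that is missing from the pending map reaches the catch block with a null `assoc`.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NullAssocRepro {

  // Hypothetical stand-in for the pre-patch loop body.
  static String handle(Future<String> completed,
                       Map<Future<String>, String> pending)
      throws InterruptedException {
    String assoc = pending.remove(completed); // null if never registered
    try {
      String local = completed.get(); // throws before the null check runs
      if (assoc == null) {
        return "unknown";
      }
      return "localized " + local + " for " + assoc.trim();
    } catch (ExecutionException e) {
      // assoc can still be null here, so this dereference throws an NPE
      return "failed " + assoc.trim();
    }
  }

  // Submits a failing task that is deliberately absent from the pending map.
  static String run() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<String> failing = () -> {
        throw new IOException("changed on src filesystem");
      };
      Future<String> completed = pool.submit(failing);
      Map<Future<String>, String> pending = new HashMap<>();
      return handle(completed, pending);
    } catch (NullPointerException e) {
      return "NPE";
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In the NodeManager there is no surrounding handler that recovers from the NullPointerException, which matches the "Error: Shutting down" line in the log.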
        
        tangshangwen tangshangwen added a comment -

        I think I can fix it

        tangshangwen tangshangwen added a comment -

        I found that 2.7.1 has the same problem, so I submitted a patch.

        rohithsharma Rohith Sharma K S added a comment -

        +1 LGTM, pending jenkins

        rohithsharma Rohith Sharma K S added a comment -

        Changing state to Patch Available to run HadoopQA.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 0s    Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        -1    test4tests  0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    mvninstall  7m 32s   trunk passed
        +1    compile     0m 24s   trunk passed with JDK v1.8.0_66
        +1    compile     0m 28s   trunk passed with JDK v1.7.0_91
        +1    checkstyle  0m 12s   trunk passed
        +1    mvnsite     0m 29s   trunk passed
        +1    mvneclipse  0m 13s   trunk passed
        +1    findbugs    0m 55s   trunk passed
        +1    javadoc     0m 19s   trunk passed with JDK v1.8.0_66
        +1    javadoc     0m 22s   trunk passed with JDK v1.7.0_91
        +1    mvninstall  0m 24s   the patch passed
        +1    compile     0m 21s   the patch passed with JDK v1.8.0_66
        +1    javac       0m 21s   the patch passed
        +1    compile     0m 25s   the patch passed with JDK v1.7.0_91
        +1    javac       0m 25s   the patch passed
        +1    checkstyle  0m 12s   the patch passed
        +1    mvnsite     0m 27s   the patch passed
        +1    mvneclipse  0m 10s   the patch passed
        +1    whitespace  0m 0s    Patch has no whitespace issues.
        +1    findbugs    1m 0s    the patch passed
        +1    javadoc     0m 15s   the patch passed with JDK v1.8.0_66
        +1    javadoc     0m 20s   the patch passed with JDK v1.7.0_91
        +1    unit        8m 34s   hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
        +1    unit        9m 10s   hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
        +1    asflicense  0m 18s   Patch does not generate ASF License warnings.
                          33m 38s



        Subsystem                   Report/Notes
        Docker Image                yetus/hadoop:0ca8df7
        JIRA Patch URL              https://issues.apache.org/jira/secure/attachment/12780090/YARN-4530.1.patch
        JIRA Issue                  YARN-4530
        Optional Tests              asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname                       Linux 8f9dafc8ed22 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool                  maven
        Personality                 /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision                trunk / 4e4b3a8
        Default Java                1.7.0_91
        Multi-JDK versions          /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
        findbugs                    v3.0.0
        JDK v1.7.0_91 Test Results  https://builds.apache.org/job/PreCommit-YARN-Build/10132/testReport/
        modules                     C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Max memory used             75MB
        Powered by                  Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
        Console output              https://builds.apache.org/job/PreCommit-YARN-Build/10132/console

        This message was automatically generated.

        tangshangwen tangshangwen added a comment -

        Hi Rohith Sharma K S, in this patch, if assoc is null we return directly, so when completed.get() throws an ExecutionException, assoc will not be null. I think this patch does not need a new test case.

        ResourceLocalizationService.java
                    try {
                      if (null == assoc) {
                        LOG.error("Localized unknown resource to " + completed);
                        // TODO delete
                        return;
                      }
                      Path local = completed.get();
                      LocalResourceRequest key = assoc.getResource().getRequest();
                      publicRsrc.handle(new ResourceLocalizedEvent(key, local, FileUtil
                        .getDU(new File(local.toUri()))));
                      assoc.getResource().unlock();
                    } catch (ExecutionException e) {
                      LOG.info("Failed to download resource " + assoc.getResource(),
                          e.getCause());
                      LocalResourceRequest req = assoc.getResource().getRequest();
                      publicRsrc.handle(new ResourceFailedLocalizationEvent(req,
                          e.getMessage()));
                      assoc.getResource().unlock();
                    }
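The effect of the reordering above can be checked with a small standalone sketch. It assumes a plain `String` in place of `LocalizerResourceRequestEvent` and `Path`; the class `NullAssocFixed`, its methods, and the name `example.jar` are hypothetical. With the null check placed before `get()`, an unregistered future returns early, and the catch block is only reached when `assoc` is non-null.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class NullAssocFixed {

  // Hypothetical stand-in for the patched loop body.
  static String handle(Future<String> completed,
                       Map<Future<String>, String> pending)
      throws InterruptedException {
    String assoc = pending.remove(completed);
    try {
      if (assoc == null) {
        return "unknown"; // return before get() has a chance to throw
      }
      String local = completed.get();
      return "localized " + local + " for " + assoc.trim();
    } catch (ExecutionException e) {
      // Reached only after the null check passed, so assoc is non-null here.
      return "failed " + assoc.trim() + ": " + e.getCause().getMessage();
    }
  }

  // Runs one unregistered and one registered failing download.
  static String[] run() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Callable<String> failing = () -> {
        throw new IOException("changed on src filesystem");
      };
      Map<Future<String>, String> pending = new HashMap<>();
      Future<String> unregistered = pool.submit(failing);
      Future<String> registered = pool.submit(failing);
      pending.put(registered, "example.jar");
      return new String[] {
        handle(unregistered, pending),
        handle(registered, pending)
      };
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    for (String result : run()) {
      System.out.println(result);
    }
  }
}
```

A unit test for the real patch would likely follow the same two cases: a completed future with no pending entry, and a failed download with one.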
        
        tangshangwen tangshangwen added a comment -

        Hi Rohith Sharma K S, do I need to write a test case?

        rohithsharma Rohith Sharma K S added a comment -

        Committed to trunk/branch-2. Thanks tangshangwen for your contributions!

        I have added you to the contributors list. Keep contributing!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9042 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9042/)
        YARN-4530. LocalizedResource trigger a NPE Cause the NodeManager exit. (rohithsharmaks: rev f9e36dea96f592d09f159e521379e426e7f07ec9)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
        • hadoop-yarn-project/CHANGES.txt
        leftnoteasy Wangda Tan added a comment -

        Hi Rohith Sharma K S,
        Do you think it would be better to add this to branch-2.8?


          People

          • Assignee: tangshangwen
          • Reporter: tangshangwen
          • Votes: 0
          • Watchers: 9