Hadoop YARN / YARN-4924

NM recovery race can lead to container not cleaned up

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      It's probably a small window, but we observed a case where the NM crashed and then a container was not properly cleaned up during recovery.

      I will add details in the first comment.

      Attachments

      1. YARN-4924.01.patch
        16 kB
        sandflee
      2. YARN-4924.02.patch
        16 kB
        sandflee
      3. YARN-4924.03.patch
        15 kB
        sandflee
      4. YARN-4924.04.patch
        15 kB
        sandflee
      5. YARN-4924.05.patch
        16 kB
        sandflee

        Activity

        nroberts Nathan Roberts added a comment -

        Observed the following race with NM recovery.

        1) ContainerManager handles a FINISH_APPS event causing storeFinishedApplication() to be recorded in state store (e.g. if RM kills application)
        2) Prior to cleaning up the containers associated with this application, the NM dies
        3) When NM restarts it attempts to recover the Application, Containers, and FinishedApplication events all associated with this application, in that order
        4) This leads to a NEW to DONE transition for the containers, which will not try to clean up the actual container since this is supposed to be a pre-LAUNCHED transition

        If I understand correctly, this happens because, when the application transitions from NEW to INITING during Application recovery, the containerInitEvents aren't actually dispatched yet; they are delayed until the AppInitDoneTransition. However, the AppInitDoneTransition may not occur until after the recovery code has handled the FinishedApplicationEvent and queued up KILL_CONTAINER events. So, in effect, the containerKillEvents overtake the containerInitEvents, leading to the NEW to DONE transition.

        2016-04-04 18:20:45,513 [main] INFO application.ApplicationImpl: Application application_1458666253602_2367938 transitioned from NEW to INITING
        2016-04-04 18:20:56,437 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Adding container_e08_1458666253602_2367938_01_000004 to application application_1458666253602_2367938
        2016-04-04 18:20:57,062 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1458666253602_2367938 transitioned from INITING to FINISHING_CONTAINERS_WAIT
        2016-04-04 18:20:57,095 [AsyncDispatcher event handler] INFO container.ContainerImpl: Container container_e08_1458666253602_2367938_01_000004 transitioned from NEW to DONE
        2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Removing container_e08_1458666253602_2367938_01_000004 from application application_1458666253602_2367938
        2016-04-04 18:20:57,120 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1458666253602_2367938 transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
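
        To make the ordering concrete, here is a simplified sketch of the recovery sequence described above (a sketch only; the method and type names are illustrative, not the exact NodeManager source):

        // Simplified sketch of NM recovery ordering (illustrative names only).
        private void recover() throws IOException {
          // 1) Recover applications: each app moves NEW -> INITING, but its
          //    container INIT events are deferred until AppInitDoneTransition.
          for (RecoveredApplication app : stateStore.loadApplications()) {
            recoverApplication(app);
          }
          // 2) Recover containers: their INIT events queue behind the app init.
          for (RecoveredContainer container : stateStore.loadContainers()) {
            recoverContainer(container);
          }
          // 3) Recover stored FinishedApplication events: these queue
          //    KILL_CONTAINER right away, so a kill can overtake a deferred
          //    INIT and drive the container NEW -> DONE with no real cleanup.
          for (ApplicationId appId : stateStore.loadFinishedApplications()) {
            handleFinishedApplication(appId);
          }
        }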
        
        sandflee sandflee added a comment -

        In YARN-4051 we also saw containers leak via the NEW to DONE transition during recovery. The difference is that in YARN-4051 the NM received the FINISH_APP event after NM registration, while the containers were not yet recovered.
        As in YARN-4051, pending the kill event is not the preferred approach, so maybe we could delay sending the FINISH_APP event until after the containers are recovered. If this is acceptable, I could update the patch in YARN-4051 to fix this.
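
        As a rough sketch of that ordering (illustrative names; this is not the committed patch):

        // Sketch of the proposed fix: hold recovered FINISH_APP events until
        // container recovery has completed (illustrative, not the final patch).
        protected void serviceStart() throws Exception {
          recoverApplications();
          recoverContainers();
          waitForRecoveredContainers();  // containers have left NEW by now
          // Only now deliver application-finish events; KILL_CONTAINER will
          // reach containers that are past NEW, so real cleanup happens.
          for (ApplicationId appId : pendingFinishedApps) {
            dispatcher.getEventHandler().handle(
                new ApplicationFinishEvent(appId, "application finished during NM recovery"));
          }
          super.serviceStart();
        }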

        jlowe Jason Lowe added a comment -

        I agree with sandflee that postponing the finish app event dispatch until after we've waited for the containers to complete recovering would be an appropriate fix.

        nroberts Nathan Roberts added a comment -

        Thanks sandflee, Jason Lowe for the suggestion. I'll work up a fix soon.

        nroberts Nathan Roberts added a comment -

        Sorry sandflee. I missed your comment about updating YARN-4051. That seems fine with me!

        sandflee sandflee added a comment -

        Thanks Nathan Roberts. Another thought: it seems unnecessary for the NM to store the FINISH_APP event at all, since the RM will check the running apps when the NM registers. We just need to make sure that by the time the NM registers with the RM, it has recovered all containers as best it can. Does that sound right, Jason Lowe?

        jlowe Jason Lowe added a comment -

        Yeah, now that the NM registers with the list of apps it thinks are active and the RM tells it to finish any apps that shouldn't be active, we should be covered. We'll need to leave in some recovery code for finished apps so we can clean up any lingering finished-app events from the state store, but we can remove the code that stores the events.

        sandflee sandflee added a comment -

        Removed the FINISH_APP-related code in the NM.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 6m 45s trunk passed
        +1 compile 0m 23s trunk passed with JDK v1.8.0_77
        +1 compile 0m 26s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 18s trunk passed
        +1 mvnsite 0m 27s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 50s trunk passed
        +1 javadoc 0m 17s trunk passed with JDK v1.8.0_77
        +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 25s the patch passed
        +1 compile 0m 20s the patch passed with JDK v1.8.0_77
        +1 javac 0m 20s the patch passed
        +1 compile 0m 24s the patch passed with JDK v1.7.0_95
        +1 javac 0m 24s the patch passed
        +1 checkstyle 0m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 0 new + 131 unchanged - 1 fixed = 131 total (was 132)
        +1 mvnsite 0m 26s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 1s Patch has no whitespace issues.
        +1 findbugs 1m 1s the patch passed
        +1 javadoc 0m 16s the patch passed with JDK v1.8.0_77
        +1 javadoc 0m 19s the patch passed with JDK v1.7.0_95
        +1 unit 9m 2s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77.
        +1 unit 9m 34s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 18s Patch does not generate ASF License warnings.
        33m 43s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797687/YARN-4924.01.patch
        JIRA Issue YARN-4924
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 237ee677dd1b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 594c70f
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10991/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/10991/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        Thanks for the patch!

        I don't think removeDeprecatedKeys is an appropriate API in the state store. It's too generic of a name, and it's being invoked in a very specific place during recovery that may not be appropriate for other keys that may become deprecated. IMHO this doesn't need to be in the API at all – the state store implementation can clean up these keys as part of loading the application state.

        If we want to get sophisticated we can increment the minor state store version (i.e.: still compatible) and only clean up these keys when we're loading the old version, but I'm not sure we need to go through all that for this. The key delete pass should be very fast during recovery if it's already been done.
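
        For illustration, gating the cleanup on a minor version bump might look like this sketch (hypothetical helper names, not the actual patch):

        // Hypothetical sketch: bump the minor (still-compatible) version and
        // run the deprecated-key cleanup only when loading older state.
        private static final Version CURRENT_VERSION_INFO = Version.newInstance(1, 1);

        private void checkVersionAndCleanup() throws IOException {
          Version loadedVersion = loadVersion();
          if (loadedVersion.getMinorVersion() < CURRENT_VERSION_INFO.getMinorVersion()) {
            cleanupDeprecatedFinishedApps();  // one-time pass on old stores
            storeVersion();                   // record the new minor version
          }
        }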

        Other parts of the patch look good.

        sandflee sandflee added a comment -

        I don't think removeDeprecatedKeys is an appropriate API in the state store. It's too generic of a name, and it's being invoked in a very specific place during recovery that may not be appropriate for other keys that may become deprecated. IMHO this doesn't need to be in the API at all – the state store implementation can clean up these keys as part of loading the application state.

        agree, and update the patch

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 10s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 6m 39s trunk passed
        +1 compile 0m 22s trunk passed with JDK v1.8.0_77
        +1 compile 0m 26s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 17s trunk passed
        +1 mvnsite 0m 27s trunk passed
        +1 mvneclipse 0m 12s trunk passed
        +1 findbugs 0m 48s trunk passed
        +1 javadoc 0m 18s trunk passed with JDK v1.8.0_77
        +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 21s the patch passed with JDK v1.8.0_77
        +1 javac 0m 21s the patch passed
        +1 compile 0m 23s the patch passed with JDK v1.7.0_95
        +1 javac 0m 23s the patch passed
        +1 checkstyle 0m 13s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 0 new + 130 unchanged - 2 fixed = 130 total (was 132)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 1s the patch passed
        +1 javadoc 0m 16s the patch passed with JDK v1.8.0_77
        +1 javadoc 0m 19s the patch passed with JDK v1.7.0_95
        +1 unit 9m 14s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77.
        +1 unit 9m 35s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 17s Patch does not generate ASF License warnings.
        33m 35s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12797811/YARN-4924.02.patch
        JIRA Issue YARN-4924
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 587a70592476 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / ec06957
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11004/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/11004/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        Thanks for updating the patch!

        It may not be clear to others reading the code why we're removing the finished-apps keys. I'd recommend factoring the few lines in loadApplicationState out into a separate function called something like cleanupDeprecatedFinishedApps so it is very clear what's going on.

        This will leak resources associated with the batch if iter.close throws:

          private void removeKeysWithPrefix(String prefix) throws IOException {
            LeveldbIterator iter = null;
            WriteBatch batch = null;
            try {
              iter = new LeveldbIterator(db);
              iter.seek(bytes(prefix));
              batch = db.createWriteBatch();
        [...]
            } catch (DBException e) {
              throw new IOException(e);
            } finally {
              if (iter != null) {
                iter.close();
              }
              if (batch != null) {
                batch.close();
              }
            }
          }

        Something like this would handle it:

          private void removeKeysWithPrefix(String prefix) throws IOException {
            LeveldbIterator iter = null;
            WriteBatch batch = null;
            iter = new LeveldbIterator(db);
            try {
              iter.seek(bytes(prefix));
              try {
                batch = db.createWriteBatch();
        [...]
              } catch (DBException e) {
                throw new IOException(e);
              } finally {
                if (batch != null) {
                  batch.close();
                }
              }
            } finally {
              iter.close();
            }
          }

        Do we really want to have removeKeysWithPrefix log at the info level? We normally don't log the removal of apps, containers, etc. I could see cases where nodes have leaked many thousands of finished app events before YARN-4520 was fixed. I think we should either make this a debug log and/or log a single info log stating all keys with the specified prefix are being removed.
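
        For example, here is a sketch of the removal loop with a single INFO summary and per-key detail only at DEBUG (assuming the iterator has already been seeked to the prefix and JniDBFactory.asString is statically imported):

        // Sketch: one INFO line per prefix, per-key logging only at DEBUG.
        LOG.info("Removing state store keys with prefix " + prefix);
        while (iter.hasNext()) {
          byte[] key = iter.next().getKey();
          if (!asString(key).startsWith(prefix)) {
            break;
          }
          if (LOG.isDebugEnabled()) {
            LOG.debug("Removing state store key " + asString(key));
          }
          batch.delete(key);
        }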

        sandflee sandflee added a comment -

        Thanks Jason Lowe. I had added @Deprecated to FINISHED_APP_KEY_PREFIX, but it is clearer to add a function; I have updated the patch to fix all of the above.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 11s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 6m 40s trunk passed
        +1 compile 0m 22s trunk passed with JDK v1.8.0_77
        +1 compile 0m 25s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 17s trunk passed
        +1 mvnsite 0m 27s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 49s trunk passed
        +1 javadoc 0m 17s trunk passed with JDK v1.8.0_77
        +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 23s the patch passed
        +1 compile 0m 20s the patch passed with JDK v1.8.0_77
        +1 javac 0m 20s the patch passed
        +1 compile 0m 23s the patch passed with JDK v1.7.0_95
        +1 javac 0m 23s the patch passed
        +1 checkstyle 0m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 0 new + 130 unchanged - 2 fixed = 130 total (was 132)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 0m 59s the patch passed
        +1 javadoc 0m 15s the patch passed with JDK v1.8.0_77
        +1 javadoc 0m 20s the patch passed with JDK v1.7.0_95
        +1 unit 9m 0s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77.
        +1 unit 9m 37s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 19s Patch does not generate ASF License warnings.
        33m 23s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798048/YARN-4924.03.patch
        JIRA Issue YARN-4924
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux ce3b25ce0471 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 2a5da97
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11027/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/11027/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        Thanks for updating the patch!

        cleanupKeysWithPrefix can now let the DBException escape when it couldn't before because the createWriteBatch call was moved outside of the try block that translates that exception.

        sandflee sandflee added a comment -

        From the DB interface, createWriteBatch does not declare any thrown exceptions.

        sandflee sandflee added a comment -

        In case createWriteBatch throws a runtime exception, it seems safer to move createWriteBatch into the try block. I'll update the patch.

        jlowe Jason Lowe added a comment -

        org.iq80.leveldb.DBException (the one we're interested in catching) is a RuntimeException, and therefore doesn't have to be declared. Having JniDB throw an IOException-derived database exception only to have it translated into a Runtime-derived exception by the org.iq80 DB wrapper isn't the most ideal API for our use case, since we actually try to handle the I/O errors without them being fatal.

        I don't think createWriteBatch will throw that particular exception in practice, so we're probably OK. However, it's safer to put it in the try block given we're at the mercy of the implementation to not throw that exception in the future.

        sandflee sandflee added a comment -

        Thanks Jason Lowe, I had not noticed that DBException is a RuntimeException; I have updated the patch.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 15s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 6m 48s trunk passed
        +1 compile 0m 21s trunk passed with JDK v1.8.0_77
        +1 compile 0m 25s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 18s trunk passed
        +1 mvnsite 0m 28s trunk passed
        +1 mvneclipse 0m 12s trunk passed
        +1 findbugs 0m 49s trunk passed
        +1 javadoc 0m 17s trunk passed with JDK v1.8.0_77
        +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 24s the patch passed
        +1 compile 0m 20s the patch passed with JDK v1.8.0_77
        +1 javac 0m 20s the patch passed
        +1 compile 0m 23s the patch passed with JDK v1.7.0_95
        +1 javac 0m 23s the patch passed
        +1 checkstyle 0m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 0 new + 130 unchanged - 2 fixed = 130 total (was 132)
        +1 mvnsite 0m 25s the patch passed
        +1 mvneclipse 0m 10s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 0m 59s the patch passed
        +1 javadoc 0m 16s the patch passed with JDK v1.8.0_77
        +1 javadoc 0m 19s the patch passed with JDK v1.7.0_95
        +1 unit 9m 9s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77.
        +1 unit 9m 38s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 18s Patch does not generate ASF License warnings.
        33m 51s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798121/YARN-4924.04.patch
        JIRA Issue YARN-4924
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c0f2a8ed2c6f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 44bbc50
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11037/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/11037/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        Thanks for updating the patch!

        If createWriteBatch does ever throw the runtime DBException we want that translated to the IOException to avoid the exception bubbling up and becoming fatal to the NM. Therefore the createWriteBatch call needs to be in the inner try that will translate DBException->IOException. The sample code I wrote above should cover the cases.

        sandflee sandflee added a comment -

        Thanks Jason Lowe. I had caught all exceptions in cleanupDeprecatedFinishedApps, but letting DBException escape seems like a poor design, and in case others call cleanupKeysWithPrefix in the future, I will put createWriteBatch inside the try block.

        +  private void cleanupDeprecatedFinishedApps() {
        +    try {
        +      cleanupKeysWithPrefix(FINISHED_APPS_KEY_PREFIX);
        +    } catch (Exception e) {
        +      LOG.warn("cleanup keys with prefix " + FINISHED_APPS_KEY_PREFIX
        +          + " from leveldb failed", e);
        +    }
        +  }

        sandflee sandflee added a comment -

        LeveldbIterator may also throw DBException, yes?

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
        +1 mvninstall 7m 47s trunk passed
        +1 compile 0m 32s trunk passed with JDK v1.8.0_77
        +1 compile 0m 28s trunk passed with JDK v1.7.0_95
        +1 checkstyle 0m 19s trunk passed
        +1 mvnsite 0m 30s trunk passed
        +1 mvneclipse 0m 13s trunk passed
        +1 findbugs 0m 55s trunk passed
        +1 javadoc 0m 25s trunk passed with JDK v1.8.0_77
        +1 javadoc 0m 24s trunk passed with JDK v1.7.0_95
        +1 mvninstall 0m 27s the patch passed
        +1 compile 0m 29s the patch passed with JDK v1.8.0_77
        +1 javac 0m 29s the patch passed
        +1 compile 0m 26s the patch passed with JDK v1.7.0_95
        +1 javac 0m 26s the patch passed
        +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: patch generated 0 new + 130 unchanged - 2 fixed = 130 total (was 132)
        +1 mvnsite 0m 28s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 1m 5s the patch passed
        +1 javadoc 0m 23s the patch passed with JDK v1.8.0_77
        +1 javadoc 0m 21s the patch passed with JDK v1.7.0_95
        +1 unit 10m 6s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_77.
        +1 unit 10m 4s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_95.
        +1 asflicense 0m 19s Patch does not generate ASF License warnings.
        37m 33s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:fbe3e86
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798770/YARN-4924.05.patch
        JIRA Issue YARN-4924
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux baaaa5438a45 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 0d1c115
        Default Java 1.7.0_95
        Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
        findbugs v3.0.0
        JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11079/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/11079/console
        Powered by Apache Yetus 0.2.0 http://yetus.apache.org

        This message was automatically generated.

        jlowe Jason Lowe added a comment -

        LeveldbIterator may also throw DBException, yes?

        Yes, if the constructor throws. That's a bug in LeveldbIterator, since the whole point of that class is to wrap the underlying iterators and translate the runtime DBExceptions into IOExceptions. Arguably we should do the same for DB so clients don't have to keep catching and translating DBException, but that's for another JIRA.
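
        A translating DB wrapper in the same spirit might look like this sketch (hypothetical class; no such wrapper exists in the codebase):

        import java.io.IOException;
        import org.iq80.leveldb.DB;
        import org.iq80.leveldb.DBException;

        // Hypothetical sketch of a DB wrapper like LeveldbIterator: translate
        // the runtime DBException into a checked IOException so callers can
        // handle I/O errors without them being fatal to the NM.
        public class LeveldbDB {
          private final DB db;

          public LeveldbDB(DB db) {
            this.db = db;
          }

          public void put(byte[] key, byte[] value) throws IOException {
            try {
              db.put(key, value);
            } catch (DBException e) {
              throw new IOException(e);
            }
          }
          // ...same translation for get, delete, createWriteBatch, etc.
        }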

        +1 for the latest patch. Committing this.

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #9615 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9615/)
        YARN-4924. NM recovery race can lead to container not cleaned up. (jlowe: rev 3150ae8108a1fc40a67926be6254824c1e37cb38)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
        jlowe Jason Lowe added a comment -

        Thanks, sandflee! I committed this to trunk, branch-2, branch-2.8, and branch-2.7.

        jlowe Jason Lowe added a comment -

        Filed YARN-4960 for the LeveldbIterator constructor issue and YARN-4961 for the DB wrapper improvement.

        sandflee sandflee added a comment -

        Thanks Jason Lowe for reviewing and for the suggestions, and thanks Nathan Roberts for reporting this.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        Closing the JIRA as part of the 2.7.3 release.


          People

          • Assignee: sandflee
          • Reporter: Nathan Roberts
          • Votes: 0
          • Watchers: 11
