Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-6846

Nodemanager can fail to fully delete application local directories when applications are killed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.8.1
    • Fix Version/s: 2.9.0, 3.0.0-beta1, 2.8.2
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:

      Description

      When an application is killed all of the running containers are killed and the app waits for the containers to complete before cleaning up. As each container completes the container directory is deleted via the DeletionService. After all containers have completed the app completes and the app directory is deleted. If the app completes quickly enough then the deletion of the container and app directories can race against each other. If the container deletion executor deletes a file just before the application deletion executor then it can cause the application deletion executor to fail, leaving the remaining entries in the application directory lingering.

      1. YARN-6846.001.patch
        8 kB
        Jason Lowe
      2. YARN-6846.002.patch
        8 kB
        Jason Lowe
      3. YARN-6846.003.patch
        8 kB
        Jason Lowe

        Activity

        Hide
        jlowe Jason Lowe added a comment -

        Sample log from a 2.8-based release. In this case I believe nftw is returning FTW_NS as a file type since the file was in the directory list but is no longer stat-able because it has been removed by the other container-executor. FTW_NS is not handled by the switch statement in nftw_cb and results in the "Internal error" message.

        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1496686551678_5664018 transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP
        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_e03_1496686551678_5664018_01_027791
        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1496686551678_5664018
        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO yarn.YarnShuffleService: Stopping container container_e03_1496686551678_5664018_01_027791
        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1496686551678_5664018
        2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO yarn.YarnShuffleService: Stopping application application_1496686551678_5664018
        2017-06-30 13:43:47,397 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1496686551678_5664018 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
        2017-06-30 13:43:47,734 [DeletionService #3] INFO nodemanager.LinuxContainerExecutor: Deleting absolute path : /.../appcache/application_1496686551678_5664018/container_e03_1496686551678_5664018_01_027791
        2017-06-30 13:43:47,746 [DeletionService #0] INFO nodemanager.LinuxContainerExecutor: Deleting absolute path : /.../appcache/application_1496686551678_5664018
        2017-06-30 13:43:48,990 [DeletionService #0] WARN privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 255. Privileged Execution Operation Output: 
        main : command provided 3
        main : run as user is ...
        main : requested yarn user is ...
        Internal error deleting /.../appcache/application_1496686551678_5664018/container_e03_1496686551678_5664018_01_027791
        Error in nftw while deleting /.../appcache/application_1496686551678_5664018
        Couldn't delete directory /.../appcache/application_1496686551678_5664018 - Directory not empty
        

        The deletion code has changed in 2.9, but I believe it too will fail if files are deleted out from underneath it. Minimally we need to make the deletion more robust to errors, and it should try to delete as much of the directory tree as possible rather than giving up on the first error and leaking the rest of the tree.

        Show
        jlowe Jason Lowe added a comment - Sample log from a 2.8-based release. In this case I believe nftw is returning FTW_NS as a file type since the file was in the directory list but is no longer stat-able because it has been removed by the other container-executor. FTW_NS is not handled by the switch statement in nftw_cb and results in the "Internal error" message. 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1496686551678_5664018 transitioned from FINISHING_CONTAINERS_WAIT to APPLICATION_RESOURCES_CLEANINGUP 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_e03_1496686551678_5664018_01_027791 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1496686551678_5664018 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO yarn.YarnShuffleService: Stopping container container_e03_1496686551678_5664018_01_027791 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1496686551678_5664018 2017-06-30 13:43:47,396 [AsyncDispatcher event handler] INFO yarn.YarnShuffleService: Stopping application application_1496686551678_5664018 2017-06-30 13:43:47,397 [AsyncDispatcher event handler] INFO application.ApplicationImpl: Application application_1496686551678_5664018 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED 2017-06-30 13:43:47,734 [DeletionService #3] INFO nodemanager.LinuxContainerExecutor: Deleting absolute path : /.../appcache/application_1496686551678_5664018/container_e03_1496686551678_5664018_01_027791 2017-06-30 13:43:47,746 [DeletionService #0] INFO nodemanager.LinuxContainerExecutor: Deleting absolute path : /.../appcache/application_1496686551678_5664018 2017-06-30 13:43:48,990 [DeletionService #0] WARN privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 255. Privileged Execution Operation Output: main : command provided 3 main : run as user is ... main : requested yarn user is ... Internal error deleting /.../appcache/application_1496686551678_5664018/container_e03_1496686551678_5664018_01_027791 Error in nftw while deleting /.../appcache/application_1496686551678_5664018 Couldn't delete directory /.../appcache/application_1496686551678_5664018 - Directory not empty The deletion code has changed in 2.9, but I believe it too will fail if files are deleted out from underneath it. Minimally we need to make the deletion more robust to errors, and it should try to delete as much of the directory tree as possible rather than giving up on the first error and leaking the rest of the tree.
        Hide
        jlowe Jason Lowe added a comment -

        Attaching a patch that makes the container-executor more tolerant of paths being already deleted when trying to delete a hierarchy. It also changes the deletion code to be best-effort by attempting to delete other entries even if unlinking one of the entries encountered an error.

        Show
        jlowe Jason Lowe added a comment - Attaching a patch that makes the container-executor more tolerant of paths being already deleted when trying to delete a hierarchy. It also changes the deletion code to be best-effort by attempting to delete other entries even if unlinking one of the entries encountered an error.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 19s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
              trunk Compile Tests
        +1 mvninstall 15m 22s trunk passed
        +1 compile 0m 33s trunk passed
        +1 mvnsite 0m 29s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 30s the patch passed
        +1 compile 0m 30s the patch passed
        +1 cc 0m 30s the patch passed
        +1 javac 0m 30s the patch passed
        +1 mvnsite 0m 28s the patch passed
        -1 whitespace 0m 0s The patch 2 line(s) with tabs.
              Other Tests
        +1 unit 13m 20s hadoop-yarn-server-nodemanager in the patch passed.
        +1 asflicense 0m 24s The patch does not generate ASF License warnings.
        32m 20s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue YARN-6846
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878367/YARN-6846.001.patch
        Optional Tests asflicense compile cc mvnsite javac unit
        uname Linux 9e24cebe138f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 8e3a992
        Default Java 1.8.0_131
        whitespace https://builds.apache.org/job/PreCommit-YARN-Build/16512/artifact/patchprocess/whitespace-tabs.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16512/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/16512/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 19s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.       trunk Compile Tests +1 mvninstall 15m 22s trunk passed +1 compile 0m 33s trunk passed +1 mvnsite 0m 29s trunk passed       Patch Compile Tests +1 mvninstall 0m 30s the patch passed +1 compile 0m 30s the patch passed +1 cc 0m 30s the patch passed +1 javac 0m 30s the patch passed +1 mvnsite 0m 28s the patch passed -1 whitespace 0m 0s The patch 2 line(s) with tabs.       Other Tests +1 unit 13m 20s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 24s The patch does not generate ASF License warnings. 32m 20s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6846 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878367/YARN-6846.001.patch Optional Tests asflicense compile cc mvnsite javac unit uname Linux 9e24cebe138f 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 8e3a992 Default Java 1.8.0_131 whitespace https://builds.apache.org/job/PreCommit-YARN-Build/16512/artifact/patchprocess/whitespace-tabs.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16512/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16512/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        jlowe Jason Lowe added a comment -

        Patch to address the whitespace issues.

        Show
        jlowe Jason Lowe added a comment - Patch to address the whitespace issues.
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
              trunk Compile Tests
        +1 mvninstall 14m 34s trunk passed
        +1 compile 0m 29s trunk passed
        +1 mvnsite 0m 27s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 24s the patch passed
        +1 compile 0m 26s the patch passed
        +1 cc 0m 26s the patch passed
        +1 javac 0m 26s the patch passed
        +1 mvnsite 0m 25s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
              Other Tests
        +1 unit 13m 2s hadoop-yarn-server-nodemanager in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        30m 41s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue YARN-6846
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878379/YARN-6846.002.patch
        Optional Tests asflicense compile cc mvnsite javac unit
        uname Linux 62b68d74e024 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 8e3a992
        Default Java 1.8.0_131
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16514/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/16514/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.       trunk Compile Tests +1 mvninstall 14m 34s trunk passed +1 compile 0m 29s trunk passed +1 mvnsite 0m 27s trunk passed       Patch Compile Tests +1 mvninstall 0m 24s the patch passed +1 compile 0m 26s the patch passed +1 cc 0m 26s the patch passed +1 javac 0m 26s the patch passed +1 mvnsite 0m 25s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues.       Other Tests +1 unit 13m 2s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 30m 41s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6846 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878379/YARN-6846.002.patch Optional Tests asflicense compile cc mvnsite javac unit uname Linux 62b68d74e024 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 8e3a992 Default Java 1.8.0_131 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16514/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16514/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        jlowe Jason Lowe added a comment -

        Tweaking the patch to remove some C99-isms so the patch is easier to pull back to older branches.

        Show
        jlowe Jason Lowe added a comment - Tweaking the patch to remove some C99-isms so the patch is easier to pull back to older branches.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
              Prechecks
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
              trunk Compile Tests
        -1 mvninstall 16m 12s root in trunk failed.
        +1 compile 0m 30s trunk passed
        +1 mvnsite 0m 29s trunk passed
              Patch Compile Tests
        +1 mvninstall 0m 24s the patch passed
        +1 compile 0m 26s the patch passed
        +1 cc 0m 26s the patch passed
        +1 javac 0m 26s the patch passed
        +1 mvnsite 0m 27s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
              Other Tests
        +1 unit 13m 11s hadoop-yarn-server-nodemanager in the patch passed.
        +1 asflicense 0m 17s The patch does not generate ASF License warnings.
        32m 34s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue YARN-6846
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878410/YARN-6846.003.patch
        Optional Tests asflicense compile cc mvnsite javac unit
        uname Linux 840b3db025d4 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 465c213
        Default Java 1.8.0_131
        mvninstall https://builds.apache.org/job/PreCommit-YARN-Build/16518/artifact/patchprocess/branch-mvninstall-root.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16518/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/16518/console
        Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.       trunk Compile Tests -1 mvninstall 16m 12s root in trunk failed. +1 compile 0m 30s trunk passed +1 mvnsite 0m 29s trunk passed       Patch Compile Tests +1 mvninstall 0m 24s the patch passed +1 compile 0m 26s the patch passed +1 cc 0m 26s the patch passed +1 javac 0m 26s the patch passed +1 mvnsite 0m 27s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues.       Other Tests +1 unit 13m 11s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 32m 34s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-6846 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12878410/YARN-6846.003.patch Optional Tests asflicense compile cc mvnsite javac unit uname Linux 840b3db025d4 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 465c213 Default Java 1.8.0_131 mvninstall https://builds.apache.org/job/PreCommit-YARN-Build/16518/artifact/patchprocess/branch-mvninstall-root.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16518/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16518/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        ebadger Eric Badger added a comment -

        Patch looks good overall to me. I only have 1 potential concern, which is that the unit test doesn't actually force the race, just relies on enough iterations eventually hitting the race condition. However, in practice the test has always failed for me without the fix and passed with it so I think it's ok. Other than chameleon coding along with some weird existing style (e.g. ret == -ENOENT vs ret == ENOENT), everything else looks good. I'll give this my +1 (non-binding)

        Show
        ebadger Eric Badger added a comment - Patch looks good overall to me. I only have 1 potential concern, which is that the unit test doesn't actually force the race, just relies on enough iterations eventually hitting the race condition. However, in practice the test has always failed for me without the fix and passed with it so I think it's ok. Other than chameleon coding along with some weird existing style (e.g. ret == -ENOENT vs ret == ENOENT ), everything else looks good. I'll give this my +1 (non-binding)
        Hide
        eepayne Eric Payne added a comment -

        The patch looks good in general, but there is one thing I'd like to clear up.

        If I'm reading the man pages correctly for geteuid(), seteuid(), and readdir(), they don't generate ENOENT. If that is true, then the following changes are not necessary:

        @@ -1837,7 +1837,7 @@ static int rmdir_as_nm(const char* path) {
        @@ -1985,7 +2005,7 @@ static int recursive_unlink_helper(int dirfd, const char *name,
        

        Thoughts?

        Show
        eepayne Eric Payne added a comment - The patch looks good in general, but there is one thing I'd like to clear up. If I'm reading the man pages correctly for geteuid() , seteuid() , and readdir() , they don't generate ENOENT . If that is true, then the following changes are not necessary: @@ -1837,7 +1837,7 @@ static int rmdir_as_nm(const char* path) { @@ -1985,7 +2005,7 @@ static int recursive_unlink_helper(int dirfd, const char *name, Thoughts?
        Hide
        ebadger Eric Badger added a comment -

        If I'm reading the man pages correctly for geteuid(), seteuid(), and readdir(), they don't generate ENOENT

        For geteuid() and seteuid(), these aren't the methods that are setting errno in the code change in the first block referenced (1837).

        -    if (rmdir(path) != 0) {
        +    if (rmdir(path) != 0 && errno != ENOENT) {
        

        rmdir(path) is what sets errno here and can return ENOENT.

        As far as readdir() goes, it looks like posix has it returning ENOENT, while Linux doesn't. I think it's better to go with Posix here, but I'll refer to Jason Lowe on that.
        http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
        http://man7.org/linux/man-pages/man3/readdir.3.html

        Show
        ebadger Eric Badger added a comment - If I'm reading the man pages correctly for geteuid(), seteuid(), and readdir(), they don't generate ENOENT For geteuid() and seteuid() , these aren't the methods that are setting errno in the code change in the first block referenced (1837). - if (rmdir(path) != 0) { + if (rmdir(path) != 0 && errno != ENOENT) { rmdir(path) is what sets errno here and can return ENOENT . As far as readdir() goes, it looks like posix has it returning ENOENT , while Linux doesn't. I think it's better to go with Posix here, but I'll refer to Jason Lowe on that. http://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html http://man7.org/linux/man-pages/man3/readdir.3.html
        Hide
        eepayne Eric Payne added a comment -

        these aren't the methods that are setting errno in the code change in the first block referenced (1837).

        -    if (rmdir(path) != 0) {
        +    if (rmdir(path) != 0 && errno != ENOENT) {
        

        rmdir(path) is what sets errno here and can return ENOENT.

        Thanks Eric Badger. You are correct. I misread the code.

        Show
        eepayne Eric Payne added a comment - these aren't the methods that are setting errno in the code change in the first block referenced (1837). - if (rmdir(path) != 0) { + if (rmdir(path) != 0 && errno != ENOENT) { rmdir(path) is what sets errno here and can return ENOENT. Thanks Eric Badger . You are correct. I misread the code.
        Hide
        jlowe Jason Lowe added a comment -

        I think it's better to go with Posix here

        Agreed. It's harmless to the code execution if it does not return ENOENT, but if it does (e.g.: this code is ported to a POSIX-compliant platform) then this prevents the executor from flagging an error from something being already deleted when trying to do a delete.

        Show
        jlowe Jason Lowe added a comment - I think it's better to go with Posix here Agreed. It's harmless to the code execution if it does not return ENOENT, but if it does (e.g.: this code is ported to a POSIX-compliant platform) then this prevents the executor from flagging an error from something being already deleted when trying to do a delete.
        Hide
        eepayne Eric Payne added a comment -

        Agreed. It's harmless to the code execution if it does not return ENOENT, but if it does (e.g.: this code is ported to a POSIX-compliant platform) then this prevents the executor from flagging an error from something being already deleted when trying to do a delete.

        Great. Thanks Jason Lowe for the fix and Eric Badger for the review.

        +1

        I will commit this shortly.

        Show
        eepayne Eric Payne added a comment - Agreed. It's harmless to the code execution if it does not return ENOENT, but if it does (e.g.: this code is ported to a POSIX-compliant platform) then this prevents the executor from flagging an error from something being already deleted when trying to do a delete. Great. Thanks Jason Lowe for the fix and Eric Badger for the review. +1 I will commit this shortly.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12103 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12103/)
        YARN-6846. Nodemanager can fail to fully delete application local (epayne: rev 48899134d2a77935a821072b5388ab1b1b7b399c)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12103 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12103/ ) YARN-6846 . Nodemanager can fail to fully delete application local (epayne: rev 48899134d2a77935a821072b5388ab1b1b7b399c) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c

          People

          • Assignee:
            jlowe Jason Lowe
            Reporter:
            jlowe Jason Lowe
          • Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development