Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-4594

container-executor fails to remove directory tree when chmod required

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha1, 2.8.2
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      test-container-executor.c doesn't work:

      • It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually /usr/bin/ls on many systems.
      • The recursive delete logic in container-executor.c fails – nftw does the wrong thing when confronted with directories with the wrong mode (permission bits), leading to an attempt to run rmdir on a non-empty directory.
      1. YARN-4594.001.patch
        9 kB
        Colin P. McCabe
      2. YARN-4594.002.patch
        9 kB
        Colin P. McCabe
      3. YARN-4594.003.patch
        9 kB
        Colin P. McCabe
      4. YARN-4594.004.patch
        9 kB
        Colin P. McCabe

        Issue Links

          Activity

          Hide
          jlowe Jason Lowe added a comment -

          I committed this to branch-2.8 and branch-2.8.2 as well.

          Show
          jlowe Jason Lowe added a comment - I committed this to branch-2.8 and branch-2.8.2 as well.
          Hide
          aw Allen Wittenauer added a comment -

          As far as I know, INFRA doesn't have a license to run MacOS in a virtual machine, so we can't do precommit testing on the existing infrastructure.

          Fixed.

          https://builds.apache.org/view/H-L/view/Hadoop/job/Precommit-HADOOP-OSX

          Show
          aw Allen Wittenauer added a comment - As far as I know, INFRA doesn't have a license to run MacOS in a virtual machine, so we can't do precommit testing on the existing infrastructure. Fixed. https://builds.apache.org/view/H-L/view/Hadoop/job/Precommit-HADOOP-OSX
          Hide
          jlowe Jason Lowe added a comment -

          It would be nice to have the ability to target a specific Mac OS X SDK version in general. However I agree with Colin that it doesn't really make sense to have the Mac build compiling a program that's never intended to run on the Mac in the first place. It may happen to compile by chance, but we don't want to get into a situation where we keep accidentally breaking it as happened here. Therefore I think the real long-term fix is to patch the Hadoop build to avoid compiling container-executor on the Mac.

          Show
          jlowe Jason Lowe added a comment - It would be nice to have the ability to target a specific Mac OS X SDK version in general. However I agree with Colin that it doesn't really make sense to have the Mac build compiling a program that's never intended to run on the Mac in the first place. It may happen to compile by chance, but we don't want to get into a situation where we keep accidentally breaking it as happened here. Therefore I think the real long-term fix is to patch the Hadoop build to avoid compiling container-executor on the Mac.
          Hide
          cnauroth Chris Nauroth added a comment -

          Colin P. McCabe, no worries, and no finger pointing intended. I only meant to document what I had found here in case other Mac users stumble on the same issue.

          FWIW, I'd prefer not to patch the Hadoop source at all and instead find some external way to target the 10.10 SDK, where the constant is defined.

          Show
          cnauroth Chris Nauroth added a comment - Colin P. McCabe , no worries, and no finger pointing intended. I only meant to document what I had found here in case other Mac users stumble on the same issue. FWIW, I'd prefer not to patch the Hadoop source at all and instead find some external way to target the 10.10 SDK, where the constant is defined.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Sorry for any inconvenience, Chris Nauroth. Unfortunately, I don't own or have access to a Mac. As far as I know, INFRA doesn't have a license to run MacOS in a virtual machine, so we can't do precommit testing on the existing infrastructure. If you post a patch to skip compilation on Mac I will review it.

          Show
          cmccabe Colin P. McCabe added a comment - Sorry for any inconvenience, Chris Nauroth . Unfortunately, I don't own or have access to a Mac. As far as I know, INFRA doesn't have a license to run MacOS in a virtual machine, so we can't do precommit testing on the existing infrastructure. If you post a patch to skip compilation on Mac I will review it.
          Hide
          cnauroth Chris Nauroth added a comment -

          I don't actually run it on Mac. The impact though is that I can no longer do a full distro build of the source tree with -Pnative. libhadoop.dylib is at least partially functional.

          Show
          cnauroth Chris Nauroth added a comment - I don't actually run it on Mac. The impact though is that I can no longer do a full distro build of the source tree with -Pnative . libhadoop.dylib is at least partially functional.
          Hide
          cmccabe Colin P. McCabe added a comment -

          I think this application doesn't work on Mac anyway, since it is using cgroups stuff which Mac won't have. So skipping should be completely fine.

          Show
          cmccabe Colin P. McCabe added a comment - I think this application doesn't work on Mac anyway, since it is using cgroups stuff which Mac won't have. So skipping should be completely fine.
          Hide
          cnauroth Chris Nauroth added a comment -

          This patch broke compilation on Mac OS X 10.9, where the SDK does not include a definition of AT_REMOVEDIR in fcntl.h or unistd.h. If you have the 10.10 SDK installed, then the header does have AT_REMOVEDIR. (See /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/sys/fcntl.h). I haven't yet figured out if there is an easy way Mac users can set it to cross-compile with the 10.10 SDK as a workaround. For now, I'm just going to have to skip this part of the build on Mac.

          Show
          cnauroth Chris Nauroth added a comment - This patch broke compilation on Mac OS X 10.9, where the SDK does not include a definition of AT_REMOVEDIR in fcntl.h or unistd.h. If you have the 10.10 SDK installed, then the header does have AT_REMOVEDIR . (See /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/include/sys/fcntl.h). I haven't yet figured out if there is an easy way Mac users can set it to cross-compile with the 10.10 SDK as a workaround. For now, I'm just going to have to skip this part of the build on Mac.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks, Jason Lowe.

          Show
          cmccabe Colin P. McCabe added a comment - Thanks, Jason Lowe .
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9239 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9239/)
          YARN-4594. container-executor fails to remove directory tree when chmod (jlowe: rev fa328e2d39eda1c479389b99a5c121e640a1e0ad)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
          • hadoop-yarn-project/CHANGES.txt
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9239 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9239/ ) YARN-4594 . container-executor fails to remove directory tree when chmod (jlowe: rev fa328e2d39eda1c479389b99a5c121e640a1e0ad) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c hadoop-yarn-project/CHANGES.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
          Hide
          jlowe Jason Lowe added a comment -

          Thanks, Colin! I committed this to trunk and branch-2.

          Show
          jlowe Jason Lowe added a comment - Thanks, Colin! I committed this to trunk and branch-2.
          Hide
          jlowe Jason Lowe added a comment -

          +1 lgtm. Committing this.

          Show
          jlowe Jason Lowe added a comment - +1 lgtm. Committing this.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 8s Maven dependency ordering for branch
          +1 mvninstall 6m 49s trunk passed
          +1 compile 0m 22s trunk passed with JDK v1.8.0_66
          +1 compile 0m 25s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 28s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          0 mvndep 0m 7s Maven dependency ordering for patch
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 22s the patch passed with JDK v1.8.0_66
          +1 cc 0m 22s the patch passed
          +1 javac 0m 22s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.7.0_91
          +1 cc 0m 24s the patch passed
          +1 javac 0m 24s the patch passed
          +1 mvnsite 0m 26s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 unit 8m 48s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 9m 17s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 16s Patch does not generate ASF License warnings.
          29m 6s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12786037/YARN-4594.004.patch
          JIRA Issue YARN-4594
          Optional Tests asflicense compile cc mvnsite javac unit
          uname Linux 5021c6c9ce40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 1adb64e
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10482/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10482/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 8s Maven dependency ordering for branch +1 mvninstall 6m 49s trunk passed +1 compile 0m 22s trunk passed with JDK v1.8.0_66 +1 compile 0m 25s trunk passed with JDK v1.7.0_91 +1 mvnsite 0m 28s trunk passed +1 mvneclipse 0m 12s trunk passed 0 mvndep 0m 7s Maven dependency ordering for patch +1 mvninstall 0m 24s the patch passed +1 compile 0m 22s the patch passed with JDK v1.8.0_66 +1 cc 0m 22s the patch passed +1 javac 0m 22s the patch passed +1 compile 0m 24s the patch passed with JDK v1.7.0_91 +1 cc 0m 24s the patch passed +1 javac 0m 24s the patch passed +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 unit 8m 48s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. +1 unit 9m 17s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. +1 asflicense 0m 16s Patch does not generate ASF License warnings. 29m 6s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12786037/YARN-4594.004.patch JIRA Issue YARN-4594 Optional Tests asflicense compile cc mvnsite javac unit uname Linux 5021c6c9ce40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 1adb64e Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10482/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-YARN-Build/10482/console This message was automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Sigh. The negation is a really hard habit to break... it's the pattern for how errors are handled in the kernel. This should fix it.

          I also changed it to use "fullpath" when printing error messages, to make it easier to figure out which file had a problem.

          Fixed the whitespace nits as well.

          Thanks

          Show
          cmccabe Colin P. McCabe added a comment - Sigh. The negation is a really hard habit to break... it's the pattern for how errors are handled in the kernel. This should fix it. I also changed it to use "fullpath" when printing error messages, to make it easier to figure out which file had a problem. Fixed the whitespace nits as well. Thanks
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for updating the patch! There's just a couple of remaining bugs, both related to remnants from when error codes were negated:

            ret = recursive_unlink_children(full_path);
            if (ret == ENOENT) {
              return 0;
            }
            if (ret != 0) {
              fprintf(LOGFILE, "Error while deleting %s: %d (%s)\n",
                      full_path, -ret, strerror(-ret));
          

          It's negating ret when it shouldn't at the fprintf call. Same thing for the following instance:

            if (rmdir(full_path) != 0) {
              ret = errno;
              if (ret != ENOENT) {
                fprintf(LOGFILE, "Couldn't delete directory %s - %s\n",
                        full_path, strerror(-ret));
          

          It would also be nice to cleanup the whitespace nits, although it's no trouble cleaning those up as part of the commit.

          Show
          jlowe Jason Lowe added a comment - Thanks for updating the patch! There's just a couple of remaining bugs, both related to remnants from when error codes were negated: ret = recursive_unlink_children(full_path); if (ret == ENOENT) { return 0; } if (ret != 0) { fprintf(LOGFILE, "Error while deleting %s: %d (%s)\n" , full_path, -ret, strerror(-ret)); It's negating ret when it shouldn't at the fprintf call. Same thing for the following instance: if (rmdir(full_path) != 0) { ret = errno; if (ret != ENOENT) { fprintf(LOGFILE, "Couldn't delete directory %s - %s\n" , full_path, strerror(-ret)); It would also be nice to cleanup the whitespace nits, although it's no trouble cleaning those up as part of the commit.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          0 mvndep 0m 12s Maven dependency ordering for branch
          +1 mvninstall 6m 33s trunk passed
          +1 compile 0m 22s trunk passed with JDK v1.8.0_66
          +1 compile 0m 26s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 28s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          0 mvndep 0m 8s Maven dependency ordering for patch
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 20s the patch passed with JDK v1.8.0_66
          +1 cc 0m 20s the patch passed
          +1 javac 0m 20s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.7.0_91
          +1 cc 0m 23s the patch passed
          +1 javac 0m 23s the patch passed
          +1 mvnsite 0m 27s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 unit 8m 35s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 9m 8s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          28m 30s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785636/YARN-4594.003.patch
          JIRA Issue YARN-4594
          Optional Tests asflicense compile cc mvnsite javac unit
          uname Linux 9339e8578e55 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / ed55950
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/10464/artifact/patchprocess/whitespace-eol.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10464/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10464/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. 0 mvndep 0m 12s Maven dependency ordering for branch +1 mvninstall 6m 33s trunk passed +1 compile 0m 22s trunk passed with JDK v1.8.0_66 +1 compile 0m 26s trunk passed with JDK v1.7.0_91 +1 mvnsite 0m 28s trunk passed +1 mvneclipse 0m 13s trunk passed 0 mvndep 0m 8s Maven dependency ordering for patch +1 mvninstall 0m 24s the patch passed +1 compile 0m 20s the patch passed with JDK v1.8.0_66 +1 cc 0m 20s the patch passed +1 javac 0m 20s the patch passed +1 compile 0m 23s the patch passed with JDK v1.7.0_91 +1 cc 0m 23s the patch passed +1 javac 0m 23s the patch passed +1 mvnsite 0m 27s the patch passed +1 mvneclipse 0m 10s the patch passed -1 whitespace 0m 0s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. +1 unit 8m 35s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. +1 unit 9m 8s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. +1 asflicense 0m 17s Patch does not generate ASF License warnings. 28m 30s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785636/YARN-4594.003.patch JIRA Issue YARN-4594 Optional Tests asflicense compile cc mvnsite javac unit uname Linux 9339e8578e55 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / ed55950 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 whitespace https://builds.apache.org/job/PreCommit-YARN-Build/10464/artifact/patchprocess/whitespace-eol.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10464/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-YARN-Build/10464/console This message was automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks for the careful review, Jason Lowe.

          The unit test fails, probably because we now deference NULL instead of terminating the loop in this code. When readdir returns NULL and there's no error, the code will try to dereference a null de.

          Ah, sorry. There is a missing break, let me fix that.

          The code can end up returning success despite an allocation failure because it doesn't initialize ret in this error case:

          Good catch. ret is initialized at the top of the function to prevent undefined behavior, but it should be changed to ENOMEM there to reflect the memory allocation failure.

          This code would be simpler to read if it didn't negate the error when trying to deal with it:

          changed

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for the careful review, Jason Lowe . The unit test fails, probably because we now deference NULL instead of terminating the loop in this code. When readdir returns NULL and there's no error, the code will try to dereference a null de. Ah, sorry. There is a missing break , let me fix that. The code can end up returning success despite an allocation failure because it doesn't initialize ret in this error case: Good catch. ret is initialized at the top of the function to prevent undefined behavior, but it should be changed to ENOMEM there to reflect the memory allocation failure. This code would be simpler to read if it didn't negate the error when trying to deal with it: changed
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for updating the patch!

          I can see places in the code that rely on cgroups, which is a kernel feature that MacOS just doesn't have (and may never have).

          Yeah, it cannot work in practice on MacOS from this and other stuff in there. Sorry I should have noticed that. I suppose there's a chance it happens to compile in case someone is doing a native build on MacOS X, but arguably we should just have those builds skip container-executor if they already aren't doing so.

          Comments on the latest patch:

          The unit test fails, probably because we now deference NULL instead of terminating the loop in this code. When readdir returns NULL and there's no error, the code will try to dereference a null de.

              while (1) {
                struct dirent *de;
                char *new_fullpath = NULL;
                
                errno = 0;
                de = readdir(dfd);
                if (!de) {
                  ret = errno;
                  if (ret) {
                    fprintf(LOGFILE, "readdir(%s) failed: %s\n", name, strerror(ret));
                    goto done;
                  }
                }
                if (!strcmp(de->d_name, ".")) {
          

          The code can end up returning success despite an allocation failure because it doesn't initialize ret in this error case:

                if (asprintf(&new_fullpath, "%s/%s", fullpath, de->d_name) < 0) {
                  fprintf(LOGFILE, "Failed to allocate string for %s/%s.\n",
                          fullpath, de->d_name); 
                  goto done;
                }
          

          recursive_unlink_helper and recursive_unlink_children now returns normal error codes, but this is still expecting negative codes:

            ret = recursive_unlink_children(full_path);
            if (ret == -ENOENT) {
              return 0;
            }
          

          This code would be simpler to read if it didn't negate the error when trying to deal with it:

            if (rmdir(full_path) != 0) {
              ret = -errno;
              if (ret != -ENOENT) {
                fprintf(LOGFILE, "Couldn't delete directory %s - %s\n",
                        full_path, strerror(-ret));
                return -1;
              }
            }
          
          Show
          jlowe Jason Lowe added a comment - Thanks for updating the patch! I can see places in the code that rely on cgroups, which is a kernel feature that MacOS just doesn't have (and may never have). Yeah, it cannot work in practice on MacOS from this and other stuff in there. Sorry I should have noticed that. I suppose there's a chance it happens to compile in case someone is doing a native build on MacOS X, but arguably we should just have those builds skip container-executor if they already aren't doing so. Comments on the latest patch: The unit test fails, probably because we now deference NULL instead of terminating the loop in this code. When readdir returns NULL and there's no error, the code will try to dereference a null de. while (1) { struct dirent *de; char *new_fullpath = NULL; errno = 0; de = readdir(dfd); if (!de) { ret = errno; if (ret) { fprintf(LOGFILE, "readdir(%s) failed: %s\n" , name, strerror(ret)); goto done; } } if (!strcmp(de->d_name, "." )) { The code can end up returning success despite an allocation failure because it doesn't initialize ret in this error case: if (asprintf(&new_fullpath, "%s/%s" , fullpath, de->d_name) < 0) { fprintf(LOGFILE, "Failed to allocate string for %s/%s.\n" , fullpath, de->d_name); goto done; } recursive_unlink_helper and recursive_unlink_children now returns normal error codes, but this is still expecting negative codes: ret = recursive_unlink_children(full_path); if (ret == -ENOENT) { return 0; } This code would be simpler to read if it didn't negate the error when trying to deal with it: if (rmdir(full_path) != 0) { ret = -errno; if (ret != -ENOENT) { fprintf(LOGFILE, "Couldn't delete directory %s - %s\n" , full_path, strerror(-ret)); return -1; } }
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 25s trunk passed
          +1 compile 0m 22s trunk passed with JDK v1.8.0_66
          +1 compile 0m 25s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 29s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 mvninstall 0m 23s the patch passed
          +1 compile 0m 19s the patch passed with JDK v1.8.0_66
          +1 cc 0m 19s the patch passed
          +1 javac 0m 19s the patch passed
          +1 compile 0m 23s the patch passed with JDK v1.7.0_91
          +1 cc 0m 23s the patch passed
          +1 javac 0m 23s the patch passed
          +1 mvnsite 0m 26s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          -1 whitespace 0m 1s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 unit 8m 27s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66.
          -1 unit 9m 0s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 17s Patch does not generate ASF License warnings.
          27m 45s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785255/YARN-4594.002.patch
          JIRA Issue YARN-4594
          Optional Tests asflicense compile cc mvnsite javac unit
          uname Linux 9c29722bdde3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 772ea7b
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10442/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10442/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 6m 25s trunk passed +1 compile 0m 22s trunk passed with JDK v1.8.0_66 +1 compile 0m 25s trunk passed with JDK v1.7.0_91 +1 mvnsite 0m 29s trunk passed +1 mvneclipse 0m 13s trunk passed +1 mvninstall 0m 23s the patch passed +1 compile 0m 19s the patch passed with JDK v1.8.0_66 +1 cc 0m 19s the patch passed +1 javac 0m 19s the patch passed +1 compile 0m 23s the patch passed with JDK v1.7.0_91 +1 cc 0m 23s the patch passed +1 javac 0m 23s the patch passed +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 10s the patch passed -1 whitespace 0m 1s The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. -1 unit 8m 27s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_66. -1 unit 9m 0s hadoop-yarn-server-nodemanager in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 17s Patch does not generate ASF License warnings. 27m 45s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12785255/YARN-4594.002.patch JIRA Issue YARN-4594 Optional Tests asflicense compile cc mvnsite javac unit uname Linux 9c29722bdde3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 772ea7b Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 whitespace https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/whitespace-eol.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-YARN-Build/10442/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10442/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-YARN-Build/10442/console This message was automatically generated.
          Hide
          cmccabe Colin P. McCabe added a comment -

          Thanks for the review, Jason Lowe.

          I noticed we're using openat, fchmodat, and unlinkat for the first time. I suspect most other POSIX-like distributions support this, but I think these were only recently added to MacOS X (in 10.9 Yosemite). I'm not sure if anyone uses container-executor for MacOS X (or if container-executor even compiles/works for MacOS X today). but adding these could break the native build for those using older MacOS X versions.

          Hmm. Correct me if I'm wrong, but I don't think container-executor is supported at all on MacOS. I can see places in the code that rely on cgroups, which is a kernel feature that MacOS just doesn't have (and may never have). You can see a bunch of code in container-executor.c dealing with very linux specific files in /proc, and so forth. My thinking is that if we ever do support MacOS in container-executor, we will support a new enough version that using the newer POSIX functions is not a problem.

          I think this needs to use strerror(-fd):

          Fixed

          There's no check for an error being encountered by readdir and therefore no logging if it does occur

          Fixed

          Sometimes recursive_unlink_helper is returning errno and sometimes it is returning -errno. For example, the "failed to stat" path will set ret=errno and return -ret as -errno, but the "failed to unlink" path will set ret=-errno and thus return errno.

          Good catch. Let's return positive error codes everywhere, except in the very specific case of open_helper where non-negative returns mean "file descriptor".

          Show
          cmccabe Colin P. McCabe added a comment - Thanks for the review, Jason Lowe . I noticed we're using openat, fchmodat, and unlinkat for the first time. I suspect most other POSIX-like distributions support this, but I think these were only recently added to MacOS X (in 10.9 Yosemite). I'm not sure if anyone uses container-executor for MacOS X (or if container-executor even compiles/works for MacOS X today). but adding these could break the native build for those using older MacOS X versions. Hmm. Correct me if I'm wrong, but I don't think container-executor is supported at all on MacOS. I can see places in the code that rely on cgroups, which is a kernel feature that MacOS just doesn't have (and may never have). You can see a bunch of code in container-executor.c dealing with very linux specific files in /proc , and so forth. My thinking is that if we ever do support MacOS in container-executor , we will support a new enough version that using the newer POSIX functions is not a problem. I think this needs to use strerror(-fd): Fixed There's no check for an error being encountered by readdir and therefore no logging if it does occur Fixed Sometimes recursive_unlink_helper is returning errno and sometimes it is returning -errno. For example, the "failed to stat" path will set ret=errno and return -ret as -errno, but the "failed to unlink" path will set ret=-errno and thus return errno. Good catch. Let's return positive error codes everywhere, except in the very specific case of open_helper where non-negative returns mean "file descriptor".
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for the patch, Colin!

          I noticed we're using openat, fchmodat, and unlinkat for the first time. I suspect most other POSIX-like distributions support this, but I think these were only recently added to MacOS X (in 10.9 Yosemite). I'm not sure if anyone uses container-executor for MacOS X (or if container-executor even compiles/works for MacOS X today). but adding these could break the native build for those using older MacOS X versions. One alternative would be a double-pass with ftw where we walk just the directories first changing permissions then followed by the walk it does today. The directory trees involved are going to be very shallow, so it's probably not a problem in practice if we decided to go that route.

          If we stick with the custom walker then here are some comments on the patch:

          • I think this needs to use strerror(-fd):
              if (fd < 0) {
                fprintf(LOGFILE, "error opening %s: %s\n", name, strerror(ret));
                goto done;
              }
            
          • There's no check for an error being encountered by readdir and therefore no logging if it does occur
          • Sometimes recursive_unlink_helper is returning errno and sometimes it is returning -errno. For example, the "failed to stat" path will set ret=errno and return -ret as -errno, but the "failed to unlink" path will set ret=-errno and thus return errno.
          Show
          jlowe Jason Lowe added a comment - Thanks for the patch, Colin! I noticed we're using openat, fchmodat, and unlinkat for the first time. I suspect most other POSIX-like distributions support this, but I think these were only recently added to MacOS X (in 10.9 Yosemite). I'm not sure if anyone uses container-executor for MacOS X (or if container-executor even compiles/works for MacOS X today). but adding these could break the native build for those using older MacOS X versions. One alternative would be a double-pass with ftw where we walk just the directories first changing permissions then followed by the walk it does today. The directory trees involved are going to be very shallow, so it's probably not a problem in practice if we decided to go that route. If we stick with the custom walker then here are some comments on the patch: I think this needs to use strerror(-fd): if (fd < 0) { fprintf(LOGFILE, "error opening %s: %s\n" , name, strerror(ret)); goto done; } There's no check for an error being encountered by readdir and therefore no logging if it does occur Sometimes recursive_unlink_helper is returning errno and sometimes it is returning -errno. For example, the "failed to stat" path will set ret=errno and return -ret as -errno, but the "failed to unlink" path will set ret=-errno and thus return errno.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 9m 16s trunk passed
          +1 compile 0m 26s trunk passed with JDK v1.8.0_66
          +1 compile 0m 26s trunk passed with JDK v1.7.0_91
          +1 mvnsite 0m 30s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 21s the patch passed with JDK v1.8.0_66
          +1 cc 0m 21s the patch passed
          +1 javac 0m 21s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.7.0_91
          +1 cc 0m 24s the patch passed
          +1 javac 0m 24s the patch passed
          +1 mvnsite 0m 26s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 unit 9m 4s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66.
          +1 unit 9m 15s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 18s Patch does not generate ASF License warnings.
          31m 48s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12782374/YARN-4594.001.patch
          JIRA Issue YARN-4594
          Optional Tests asflicense compile cc mvnsite javac unit
          uname Linux e66c5cac93b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / cdf8895
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10292/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/10292/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 9m 16s trunk passed +1 compile 0m 26s trunk passed with JDK v1.8.0_66 +1 compile 0m 26s trunk passed with JDK v1.7.0_91 +1 mvnsite 0m 30s trunk passed +1 mvneclipse 0m 14s trunk passed +1 mvninstall 0m 24s the patch passed +1 compile 0m 21s the patch passed with JDK v1.8.0_66 +1 cc 0m 21s the patch passed +1 javac 0m 21s the patch passed +1 compile 0m 24s the patch passed with JDK v1.7.0_91 +1 cc 0m 24s the patch passed +1 javac 0m 24s the patch passed +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 unit 9m 4s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. +1 unit 9m 15s hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_91. +1 asflicense 0m 18s Patch does not generate ASF License warnings. 31m 48s Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12782374/YARN-4594.001.patch JIRA Issue YARN-4594 Optional Tests asflicense compile cc mvnsite javac unit uname Linux e66c5cac93b8 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / cdf8895 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/10292/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-YARN-Build/10292/console This message was automatically generated.

            People

            • Assignee:
              cmccabe Colin P. McCabe
              Reporter:
              cmccabe Colin P. McCabe
            • Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development