Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-7014

container-executor has off-by-one error which can corrupt the heap

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: 3.0.0-beta1
    • Component/s: yarn
    • Labels:
      None

      Description

      test-container-executor is failing in trunk.

      [INFO] 
      [INFO] --- hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test (test-container-executor) @ hadoop-yarn-server-nodemanager ---
      [INFO] -------------------------------------------------------
      [INFO]  C M A K E B U I L D E R    T E S T
      [INFO] -------------------------------------------------------
      [INFO] test-container-executor: running /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
      [INFO] with extra environment variables {}
      [INFO] STATUS: ERROR CODE 134 after 3714 millisecond(s).
      [INFO] -------------------------------------------------------
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD FAILURE
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time: 13:47 min
      [INFO] Finished at: 2017-08-12T12:58:55+00:00
      [INFO] Final Memory: 19M/296M
      [INFO] ------------------------------------------------------------------------
      [WARNING] The requested profile "parallel-tests" could not be activated because it does not exist.
      [WARNING] The requested profile "yarn-ui" could not be activated because it does not exist.
      [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test (test-container-executor) on project hadoop-yarn-server-nodemanager: Test /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor returned ERROR CODE 134 -> [Help 1]
      [ERROR] 
      [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
      [ERROR] Re-run Maven using the -X switch to enable full debug logging.
      [ERROR] 
      [ERROR] For more information about the errors and possible solutions, please read the following articles:
      [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
      

        Issue Links

          Activity

          Hide
          leftnoteasy Wangda Tan added a comment -

          Thanks Jason Lowe/Shane Kumpf/Eric Badger/Nathan Roberts to fix this issue. It was caused by my previous mistake: native unit test result doesn't show in test-report (we can only see it from unit test output), so I commit the problematic patch, apologize to cause the issue.

          Show
          leftnoteasy Wangda Tan added a comment - Thanks Jason Lowe / Shane Kumpf / Eric Badger / Nathan Roberts to fix this issue. It was caused by my previous mistake: native unit test result doesn't show in test-report (we can only see it from unit test output), so I commit the problematic patch, apologize to cause the issue.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12191 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12191/)
          YARN-7014. Fix off-by-one error causing heap corruption (Jason Lowe via (nroberts: rev d265459024b8e5f5eccf421627f684ca8f162112)

          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.c
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12191 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12191/ ) YARN-7014 . Fix off-by-one error causing heap corruption (Jason Lowe via (nroberts: rev d265459024b8e5f5eccf421627f684ca8f162112) (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.c
          Hide
          nroberts Nathan Roberts added a comment -

          +1 on the patch. I will commit shortly.
          Thanks Jason Lowe for the patch and Eric Badger and Shane Kumpf for the reviews!

          Show
          nroberts Nathan Roberts added a comment - +1 on the patch. I will commit shortly. Thanks Jason Lowe for the patch and Eric Badger and Shane Kumpf for the reviews!
          Hide
          ebadger Eric Badger added a comment -

          I'm +1 (non-binding) for the change. Because it's such a specific test and is already covered by the validate_container_id test failing, I'm on board with using the existing test. However, it might not be a bad idea to just generally add in some tests to test-container-executor to make sure that we aren't leaking memory or susceptible to simple overflows from the invocation

          Show
          ebadger Eric Badger added a comment - I'm +1 (non-binding) for the change. Because it's such a specific test and is already covered by the validate_container_id test failing, I'm on board with using the existing test. However, it might not be a bad idea to just generally add in some tests to test-container-executor to make sure that we aren't leaking memory or susceptible to simple overflows from the invocation
          Hide
          shanekumpf@gmail.com Shane Kumpf added a comment -

          Thanks for identifying the issue and the patch Jason Lowe. I have tested the patch locally and it resolves the issue with the test. +1 from me.

          [INFO]
          [INFO] --- hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test (test-container-executor) @ hadoop-yarn-server-nodemanager ---
          [INFO] -------------------------------------------------------
          [INFO]  C M A K E B U I L D E R    T E S T
          [INFO] -------------------------------------------------------
          [INFO] test-container-executor: running /hadoop_staging/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/target/usr/local/bin/test-container-executor
          [INFO] with extra environment variables {}
          [INFO] STATUS: SUCCESS after 5309 millisecond(s).
          [INFO] -------------------------------------------------------
          
          Show
          shanekumpf@gmail.com Shane Kumpf added a comment - Thanks for identifying the issue and the patch Jason Lowe . I have tested the patch locally and it resolves the issue with the test. +1 from me. [INFO] [INFO] --- hadoop-maven-plugins:3.0.0-beta1-SNAPSHOT:cmake-test (test-container-executor) @ hadoop-yarn-server-nodemanager --- [INFO] ------------------------------------------------------- [INFO] C M A K E B U I L D E R T E S T [INFO] ------------------------------------------------------- [INFO] test-container-executor: running /hadoop_staging/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/ native /target/usr/local/bin/test-container-executor [INFO] with extra environment variables {} [INFO] STATUS: SUCCESS after 5309 millisecond(s). [INFO] -------------------------------------------------------
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 18s Docker mode activated.
                Prechecks
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
                trunk Compile Tests
          +1 mvninstall 18m 15s trunk passed
          +1 compile 0m 54s trunk passed
          +1 mvnsite 0m 37s trunk passed
                Patch Compile Tests
          +1 mvninstall 0m 33s the patch passed
          +1 compile 0m 53s the patch passed
          +1 cc 0m 53s the patch passed
          +1 javac 0m 53s the patch passed
          +1 mvnsite 0m 38s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
                Other Tests
          +1 unit 14m 34s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 19s The patch does not generate ASF License warnings.
          37m 28s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:14b5c93
          JIRA Issue YARN-7014
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12881943/YARN-7014.001.patch
          Optional Tests asflicense compile cc mvnsite javac unit
          uname Linux 561e6f7cca19 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 2e43c28
          Default Java 1.8.0_144
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16909/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/16909/console
          Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 18s Docker mode activated.       Prechecks +1 @author 0m 0s The patch does not contain any @author tags. -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.       trunk Compile Tests +1 mvninstall 18m 15s trunk passed +1 compile 0m 54s trunk passed +1 mvnsite 0m 37s trunk passed       Patch Compile Tests +1 mvninstall 0m 33s the patch passed +1 compile 0m 53s the patch passed +1 cc 0m 53s the patch passed +1 javac 0m 53s the patch passed +1 mvnsite 0m 38s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues.       Other Tests +1 unit 14m 34s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 19s The patch does not generate ASF License warnings. 37m 28s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue YARN-7014 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12881943/YARN-7014.001.patch Optional Tests asflicense compile cc mvnsite javac unit uname Linux 561e6f7cca19 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 2e43c28 Default Java 1.8.0_144 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/16909/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/16909/console Powered by Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          Attaching a patch that uses strdup instead of separate strlen/malloc/strcpy calls. Arguably the existing unit test can be leveraged here since it was failing before.

          Show
          jlowe Jason Lowe added a comment - Attaching a patch that uses strdup instead of separate strlen/malloc/strcpy calls. Arguably the existing unit test can be leveraged here since it was failing before.
          Hide
          jlowe Jason Lowe added a comment -

          The test is failing because there's a memory corruption error in the container-executor code, and that triggers a crash in malloc. At least one problem is in string_utils.c which has this classic off-by-one error:

            char* input_cpy = malloc(strlen(input));
            strcpy(input_cpy, input);
          

          strlen does not account for the terminating NUL character, so we end up allocating one byte less than we need and then promptly scribble past the end of the allocation.

          Looks like this was introduced by YARN-6726.

          Show
          jlowe Jason Lowe added a comment - The test is failing because there's a memory corruption error in the container-executor code, and that triggers a crash in malloc. At least one problem is in string_utils.c which has this classic off-by-one error: char * input_cpy = malloc(strlen(input)); strcpy(input_cpy, input); strlen does not account for the terminating NUL character, so we end up allocating one byte less than we need and then promptly scribble past the end of the allocation. Looks like this was introduced by YARN-6726 .

            People

            • Assignee:
              jlowe Jason Lowe
              Reporter:
              shanekumpf@gmail.com Shane Kumpf
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Due:
                Created:
                Updated:
                Resolved:

                Development