  1. Hadoop YARN
  2. YARN-6967

Limit application attempt's diagnostic message size thoroughly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.1
    • Fix Version/s: 2.9.0, 3.0.0-beta1
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      YARN-6125 implemented BoundedAppender and applied to the field diagnostics to limit the diagnostic message's size.

      However, some code bypasses this limit. In RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(...), a local variable diags will finally be written into ZooKeeper if ZKRMStateStore is used.

      A simple fix is to also use BoundedAppender for the local variable.
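
The idea of bounding the local variable can be sketched as follows. This is a minimal illustration, not the Hadoop implementation: `SimpleBoundedAppender`, `boundedDiagnostics`, and the limit of 64 characters are hypothetical names and values chosen for the example, and the choice to keep the tail of the message is an assumption of this sketch.

```java
/** Hypothetical, simplified illustration of bounding a diagnostics string. */
public class BoundedDiagnosticsSketch {

  /** Keeps at most `limit` characters, dropping the oldest characters first. */
  static final class SimpleBoundedAppender {
    private final int limit;
    private final StringBuilder messages = new StringBuilder();

    SimpleBoundedAppender(int limit) {
      if (limit <= 0) {
        throw new IllegalArgumentException("limit must be positive");
      }
      this.limit = limit;
    }

    SimpleBoundedAppender append(CharSequence csq) {
      if (csq == null) {
        return this;
      }
      messages.append(csq);
      // Trim from the head so the most recent text survives.
      int overflow = messages.length() - limit;
      if (overflow > 0) {
        messages.delete(0, overflow);
      }
      return this;
    }

    @Override
    public String toString() {
      return messages.toString();
    }
  }

  /** Builds the bounded string that would be handed to a state store. */
  static String boundedDiagnostics(String rawDiagnostics, int limit) {
    return new SimpleBoundedAppender(limit).append(rawDiagnostics).toString();
  }

  public static void main(String[] args) {
    // Simulate an oversized exception message followed by the useful part.
    StringBuilder huge = new StringBuilder();
    for (int i = 0; i < 5000; i++) {
      huge.append('x');
    }
    huge.append("root cause: OutOfMemoryError");

    String stored = boundedDiagnostics(huge.toString(), 64);
    System.out.println(stored.length());
    System.out.println(stored.endsWith("root cause: OutOfMemoryError"));
  }
}
```

Routing the local string through the same kind of bounded buffer as the field means every path to the state store sees the same size cap.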

        Activity

        chengbing.liu Chengbing Liu added a comment -

        Attached the patch.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem    Runtime   Comment
        0     reexec       0m 16s    Docker mode activated.
              Prechecks
        +1    @author      0m 0s     The patch does not contain any @author tags.
        -1    test4tests   0m 0s     The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
              trunk Compile Tests
        +1    mvninstall   14m 11s   trunk passed
        +1    compile      0m 41s    trunk passed
        +1    checkstyle   0m 28s    trunk passed
        +1    mvnsite      0m 43s    trunk passed
        +1    findbugs     1m 12s    trunk passed
        +1    javadoc      0m 23s    trunk passed
              Patch Compile Tests
        +1    mvninstall   0m 36s    the patch passed
        +1    compile      0m 33s    the patch passed
        +1    javac        0m 33s    the patch passed
        +1    checkstyle   0m 24s    the patch passed
        +1    mvnsite      0m 36s    the patch passed
        +1    whitespace   0m 0s     The patch has no whitespace issues.
        +1    findbugs     1m 9s     the patch passed
        +1    javadoc      0m 19s    the patch passed
              Other Tests
        +1    unit         44m 28s   hadoop-yarn-server-resourcemanager in the patch passed.
        +1    asflicense   0m 14s    The patch does not generate ASF License warnings.
                           67m 29s



        Subsystem Report/Notes
        Docker Image: yetus/hadoop:14b5c93
        JIRA Issue: YARN-6967
        JIRA Patch URL: https://issues.apache.org/jira/secure/attachment/12880817/YARN-6967.01.patch
        Optional Tests: asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname: Linux 864f1b87af67 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
        Build tool: maven
        Personality: /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision: trunk / 9891295
        Default Java: 1.8.0_131
        findbugs: v3.1.0-RC1
        Test Results: https://builds.apache.org/job/PreCommit-YARN-Build/16770/testReport/
        modules: C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/16770/console
        Powered by: Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        templedf Daniel Templeton added a comment -

        Thanks for catching that, Chengbing Liu. In the original code the diags string is only set, never appended to. In the patch, you're only appending. Seems like that might change the diag messages and tend to make them longer on average (though never longer than the limit).

        chengbing.liu Chengbing Liu added a comment -

        Hi Daniel Templeton, I don't think the message will be any longer. In the patch, the local BoundedAppender diags is independent from the field variable diagnostics of RMAppAttemptImpl, and starts as a new BoundedAppender.

        We have seen cases where a Spark application throws an exception whose message is too large for ZooKeeper. Both the AppMaster and the ResourceManager then keep retrying and failing, which eventually makes the RM unresponsive.
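
The independence argument above can be illustrated with a small sketch. It assumes a simplified bounded append on plain `StringBuilder` objects; `appendBounded`, the sample messages, and the limit of 40 are hypothetical and stand in for the real field and local variable only by analogy.

```java
/** Hypothetical sketch: a fresh local buffer versus a long-lived field buffer. */
public class LocalAppenderSketch {

  /** Appends msg to sink, then trims the oldest characters so sink holds at most limit chars. */
  static void appendBounded(StringBuilder sink, String msg, int limit) {
    sink.append(msg);
    int overflow = sink.length() - limit;
    if (overflow > 0) {
      sink.delete(0, overflow);
    }
  }

  public static void main(String[] args) {
    int limit = 40;

    // Stand-in for the long-lived field: it accumulates text across events.
    StringBuilder field = new StringBuilder();
    appendBounded(field, "AM container launched. ", limit);
    appendBounded(field, "AM container exited with code 1. ", limit);

    // Stand-in for the local variable: created fresh and written once,
    // so a single append behaves like a bounded "set".
    StringBuilder local = new StringBuilder();
    String finalMessage = "Attempt failed: exit code 1";
    appendBounded(local, finalMessage, limit);

    System.out.println(field.length() <= limit);
    System.out.println(local.toString().equals(finalMessage));
  }
}
```

Because the local buffer starts empty, a single append can only shorten the message, never lengthen it.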

        chengbing.liu Chengbing Liu added a comment -

        Daniel Templeton Andras Piros Could you review the patch again? Thanks!

        templedf Daniel Templeton added a comment -

        OK, LGTM +1

        templedf Daniel Templeton added a comment -

        Thanks for the patch, Chengbing Liu. Committed to trunk and branch-2.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12170 (See https://builds.apache.org/job/Hadoop-trunk-Commit/12170/)
        YARN-6967. Limit application attempt's diagnostic message size (templedf: rev 65364defb4a633ca20b39ebc38cd9c0db63a5835)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

          People

          • Assignee: chengbing.liu Chengbing Liu
          • Reporter: chengbing.liu Chengbing Liu
          • Votes: 0
          • Watchers: 5
