Hadoop YARN / YARN-3254

HealthReport should include disk full information

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: 3.0.0-beta1
    • Component/s: nodemanager
    • Labels:
      None

      Description

      When a NodeManager's local disk is almost full, the NodeManager sends a health report to the ResourceManager saying that "local/log dir is bad", and the message is displayed on the ResourceManager Web UI. The message does not say why the dir is bad, so it is difficult for users to find the cause.

      1. Screen Shot 2015-02-24 at 17.57.39.png
        97 kB
        Akira Ajisaka
      2. Screen Shot 2015-02-25 at 14.38.10.png
        113 kB
        Akira Ajisaka
      3. YARN-3254-001.patch
        9 kB
        Akira Ajisaka
      4. YARN-3254-002.patch
        9 kB
        Akira Ajisaka
      5. YARN-3254-003.patch
        8 kB
        Suma Shivaprasad

        Activity

        ajisakaa Akira Ajisaka added a comment -

        Attaching a screenshot when the NodeManager's disk is almost full.

        ajisakaa Akira Ajisaka added a comment -

        Attaching a patch to add a new public method getDisksHealthReport(boolean, boolean), and deprecate the existing method getDisksHealthReport(boolean) for backward compatibility. I'll attach a screen shot later.
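For readers who have not opened the patch, the overload-and-deprecate pattern described above could look roughly like the sketch below. It is not the patch itself: the class name, the parameter names `listGoodDirs` and `includeReason`, and the report wording are all assumptions made for illustration, since the comment only gives the two method signatures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for the NodeManager's disk health service, not the real class.
final class DiskHealthReportSketch {
  private final List<String> goodDirs = new ArrayList<>();
  // Bad directory -> reason; LinkedHashMap keeps the report order stable.
  private final Map<String, String> badDirs = new LinkedHashMap<>();

  void addGoodDir(String dir) { goodDirs.add(dir); }

  void addBadDir(String dir, String reason) { badDirs.put(dir, reason); }

  /** Old signature: kept so existing callers still compile; reports no reasons. */
  @Deprecated
  String getDisksHealthReport(boolean listGoodDirs) {
    return getDisksHealthReport(listGoodDirs, false);
  }

  /** New signature: optionally appends the reason each directory is bad (assumed meaning). */
  String getDisksHealthReport(boolean listGoodDirs, boolean includeReason) {
    if (listGoodDirs) {
      return "good dirs: " + String.join(", ", goodDirs);
    }
    StringJoiner report = new StringJoiner(", ", "bad dirs: ", "");
    for (Map.Entry<String, String> e : badDirs.entrySet()) {
      report.add(includeReason ? e.getKey() + " (" + e.getValue() + ")" : e.getKey());
    }
    return report.toString();
  }
}
```

Because the old method delegates with `includeReason` set to false, callers of the one-argument form keep getting the same text as before in this sketch.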

        ajisakaa Akira Ajisaka added a comment -

        v2 patch fixes log formatting.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12700863/YARN-3254-001.patch
        against trunk revision 5731c0e.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6741//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6741//console

        This message is automatically generated.

        ajisakaa Akira Ajisaka added a comment -

        Attaching a screenshot after applying v2 patch.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12700871/YARN-3254-002.patch
        against trunk revision caa42ad.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

        Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6742//testReport/
        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6742//console

        This message is automatically generated.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12700883/Screen%20Shot%202015-02-25%20at%2014.38.10.png
        against trunk revision caa42ad.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6745//console

        This message is automatically generated.

        ajisakaa Akira Ajisaka added a comment -

        After reconsidering, I think this issue is not really a problem, because:

        • An admin can read the NodeManager's log and find a message like the following:
          2015-02-26 07:34:22,485 WARN org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection: Directory /usr/local/20150225-YARN-3254-2/logs/userlogs error, used space above threshold of 90.0%, removing from list of valid directories

        • The patch is still an incompatible change, because it actually changes the JMX information.
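As a rough illustration of the kind of check that could produce the "used space above threshold" reason in the log message above, here is a minimal sketch. It is not the NodeManager's actual code: the class and method names are hypothetical, and only the reason wording and the 90.0% value are taken from the log line.

```java
// Hypothetical helper, not part of Hadoop: decides whether a disk counts as "full".
final class DiskUsageCheckSketch {
  /** Percentage of the disk in use, computed from total and usable byte counts. */
  static double usedPercent(long totalBytes, long usableBytes) {
    if (totalBytes <= 0) {
      throw new IllegalArgumentException("totalBytes must be positive");
    }
    return 100.0 * (totalBytes - usableBytes) / totalBytes;
  }

  /** True when usage is strictly above the threshold, e.g. 90.0 as in the log line. */
  static boolean isAboveThreshold(long totalBytes, long usableBytes, float thresholdPercent) {
    return usedPercent(totalBytes, usableBytes) > thresholdPercent;
  }

  /** Reason text worded like the log message, or an empty string if the disk is fine. */
  static String reason(long totalBytes, long usableBytes, float thresholdPercent) {
    return isAboveThreshold(totalBytes, usableBytes, thresholdPercent)
        ? "used space above threshold of " + thresholdPercent + "%"
        : "";
  }
}
```

In a real check the byte counts could come from `java.io.File#getTotalSpace()` and `java.io.File#getUsableSpace()`. A reason string like this is the piece of information the health report currently does not carry.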
        ajisakaa Akira Ajisaka added a comment -

        Closing. If someone really wants to fix this, please reopen this.

        suma.shivaprasad Suma Shivaprasad added a comment -

        Akira Ajisaka, although the NodeManager's log shows why directories are unhealthy, it would be useful for the NodeManager's UI health report to show whether the directories have errors or are full. Would you mind if I take over this JIRA if you are not currently working on it? Also, can you please explain why the patch is incompatible with jmx information being changed?

        ajisakaa Akira Ajisaka added a comment -

        Would you mind if I take over this JIRA if you are not working on this currently ?

        No. You can take it over.

        Also, can you please explain why the patch is incompatible with jmx information being changed ?

        This health report is obtained from jmx. According to the documentation (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html#MetricsJMX), changing the jmx information breaks compatibility. I now think you can add the message to the Web UI if you create an additional piece of jmx information and use that instead.
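To make the "additional jmx information" idea concrete, here is a minimal sketch using only the standard `javax.management` API. The bean, the object name `sketch:type=NodeHealth`, the attribute names, and the report strings are all hypothetical; the thread does not say how the real NodeManager bean is named or structured. The point is only that the existing attribute keeps its text while the detail goes into a new attribute.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example, not the NodeManager's real MBean.
final class AdditiveJmxSketch {
  /** Standard MBean interface; JMX derives attribute names from the getters. */
  public interface NodeHealthMBean {
    String getHealthReport();      // stands in for the existing attribute, text unchanged
    String getDiskHealthDetail();  // assumed new attribute carrying the reason
  }

  public static final class NodeHealth implements NodeHealthMBean {
    @Override
    public String getHealthReport() {
      return "1/1 log-dirs are bad: /data/1/logs";
    }

    @Override
    public String getDiskHealthDetail() {
      return "/data/1/logs: used space above threshold of 90.0%";
    }
  }

  /** Registers the bean on first use, then reads one attribute through the MBean server. */
  static Object readAttribute(String attribute) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      ObjectName name = new ObjectName("sketch:type=NodeHealth");
      if (!server.isRegistered(name)) {
        server.registerMBean(new NodeHealth(), name);
      }
      return server.getAttribute(name, attribute);
    } catch (Exception e) {
      throw new IllegalStateException("JMX access failed", e);
    }
  }
}
```

A consumer that only reads `HealthReport` sees no change in this sketch, while the Web UI could read `DiskHealthDetail` for the extra message.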

        leftnoteasy Wangda Tan added a comment -

        Akira Ajisaka,

        I think the original definition of JMX metrics compatibility is overkill. I can understand the "semantic consistency" part, for example not changing a unit (from GB to MB, etc.). For text fields such as the report, we should be able to change them if needed.

        For example, YARN-90/YARN-6302 have already changed the report field. If we strictly follow the rule, we probably need to revert these patches as well.

        I would prefer to continue improving the text field and to start a discussion about modifying the compatibility rule.

        cc: Karthik Kambatla (original author of the compatibility doc) and Daniel Templeton.

        suma.shivaprasad Suma Shivaprasad added a comment -

        Attached a patch that splits the disk health failure report into failed vs. errored disks. This information is already available from the existing disk health checks.

        hadoopqa Hadoop QA added a comment -
        -1 overall

        | Vote | Subsystem | Runtime | Comment |
        | 0 | reexec | 0m 18s | Docker mode activated. |
        | | | | Prechecks |
        | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
        | +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
        | | | | trunk Compile Tests |
        | +1 | mvninstall | 13m 56s | trunk passed |
        | +1 | compile | 0m 34s | trunk passed |
        | +1 | checkstyle | 0m 19s | trunk passed |
        | +1 | mvnsite | 0m 31s | trunk passed |
        | -1 | findbugs | 0m 49s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 5 extant Findbugs warnings. |
        | +1 | javadoc | 0m 22s | trunk passed |
        | | | | Patch Compile Tests |
        | +1 | mvninstall | 0m 28s | the patch passed |
        | +1 | compile | 0m 30s | the patch passed |
        | +1 | javac | 0m 30s | the patch passed |
        | -0 | checkstyle | 0m 18s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 3 new + 35 unchanged - 0 fixed = 38 total (was 35) |
        | +1 | mvnsite | 0m 30s | the patch passed |
        | +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
        | +1 | findbugs | 0m 51s | the patch passed |
        | +1 | javadoc | 0m 15s | the patch passed |
        | | | | Other Tests |
        | +1 | unit | 12m 53s | hadoop-yarn-server-nodemanager in the patch passed. |
        | +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
        | | | 34m 10s | |



        | Subsystem | Report/Notes |
        | Docker | Image:yetus/hadoop:14b5c93 |
        | JIRA Issue | YARN-3254 |
        | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12877669/YARN-3254-003.patch |
        | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
        | uname | Linux d361379b9de5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
        | Build tool | maven |
        | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
        | git revision | trunk / b0e78ae |
        | Default Java | 1.8.0_131 |
        | findbugs | v3.1.0-RC1 |
        | findbugs | https://builds.apache.org/job/PreCommit-YARN-Build/16471/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html |
        | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16471/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
        | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16471/testReport/ |
        | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
        | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16471/console |
        | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

        This message was automatically generated.


          People

          • Assignee: Suma Shivaprasad
          • Reporter: Akira Ajisaka
          • Votes: 0
          • Watchers: 9
