Hadoop Common / HADOOP-6858

Enable rotateable JVM garbage collection logs for Hadoop daemons

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: scripts
    • Labels: None

      Description

      The purpose of this enhancement is to make it easier to collect garbage collection logs and ensure that they persist across restarts, in the same way that the standard output files of Hadoop daemon JVMs currently do.

      Garbage collection logs are a vital debugging tool for administrators and developers. In our production environments, at some point or another, every single type of Hadoop daemon has OOM'ed or experienced other significant issues related to GC and/or lack of heap memory. For a long time, we have enabled garbage collection logging in our HADOOP_NAMENODE_OPTS, HADOOP_JOBTRACKER_OPTS, etc. by using options like "-XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:$HADOOP_LOG_DIR/jobtracker.gc.log".

      Unfortunately, these logs don't survive a restart of the node, so if a node OOMs and is then restarted automatically, or manually by someone who is unaware, we lose the GC logs forever. We also have to manually add GC log options to each daemon. This patch:
      1) Creates a single, optional, off-by-default parameter for specifying GC logging.
      2) If that parameter is set, automatically enables GC logging for all daemons in the cluster. The parameter is flexible enough to allow for the different ways various vendors' JVMs require garbage collection logging to be specified.
      3) If GC logging is on, ensures that the GC log files for each daemon are rotated with up to 5 copies kept, the same as the .out files currently are.
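
      The three points above can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the variable GC_LOG_OPTS_TEMPLATE, the @GC_LOG@ placeholder, and the functions rotate_gc_log and gc_opts_for_daemon are hypothetical names chosen for this example. The rotation keeps up to 5 old copies by default, as described for the .out files.

```shell
#!/usr/bin/env bash
# Sketch only: GC_LOG_OPTS_TEMPLATE, @GC_LOG@, rotate_gc_log and
# gc_opts_for_daemon are hypothetical names, not taken from the patch.

# Rotate "$1", keeping up to "$2" (default 5) old copies.
# After rotation, "$1.1" is the most recent old log.
rotate_gc_log() {
  local log=$1
  local num=${2:-5}
  local prev
  if [ -f "$log" ]; then
    while [ "$num" -gt 1 ]; do
      prev=$((num - 1))
      if [ -f "$log.$prev" ]; then
        mv "$log.$prev" "$log.$num"
      fi
      num=$prev
    done
    mv "$log" "$log.$num"
  fi
}

# Print the JVM GC options for one daemon, or nothing if GC logging is off.
# The template holds the vendor-specific flags, with @GC_LOG@ standing in
# for the log file path, so JVMs with different flag syntax can be supported.
gc_opts_for_daemon() {
  local daemon=$1
  local log_dir=$2
  local template=${GC_LOG_OPTS_TEMPLATE:-}
  if [ -z "$template" ]; then
    return 0   # parameter unset: GC logging stays off (the default)
  fi
  local gc_log="$log_dir/$daemon.gc.log"
  rotate_gc_log "$gc_log" 5
  printf '%s\n' "${template//@GC_LOG@/$gc_log}"
}
```

      Under these assumptions, a daemon start script could set GC_LOG_OPTS_TEMPLATE to something like "-XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:@GC_LOG@" and append the output of gc_opts_for_daemon to the JVM options just before launching the daemon, so the previous run's GC log is moved aside rather than overwritten.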

      We are currently running a variation of this patch in our 0.20 install. This patch actually includes changes to common, mapred, and hdfs, so it obviously cannot be applied as-is, but is included here for review and comments.

        Activity

        Harsh J added a comment -

        Hi Andrew,

        Can you rebase your patch for trunk please? Also, when rotating the logs over, perhaps an additional option, if set, can also help remove older logs (max-backup-index-like), thereby supporting both log roll and retention?

        Harsh J added a comment -

        This would certainly help HBase, given http://hbase.apache.org/book/trouble.log.html

        I'm not sure whether what's being done is the best approach (though it looks reasonable to me), so I'll leave the review to more able people. We'd still like to have this, provided it is also documented well enough that people use it instead of adding options to the daemon opts by hand.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12449386/HADOOP-6858.patch
        against trunk revision 1071364.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/269//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12449386/HADOOP-6858.patch
        against trunk revision 1031422.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/30//console

        This message is automatically generated.

        Andrew Ryan added a comment -

        We chose not to go that route, because date stamping in that way would address the "logs get blown away on startup" problem, but then it creates a "gc logs are never cleaned up automatically by hadoop" problem. But there are certainly many ways to solve this problem. We're hoping to reach some kind of consensus.

        Allen Wittenauer added a comment -

        We just put a date on our logs.

        i.e., gc.log.`date +blahblah`.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12449386/HADOOP-6858.patch
        against trunk revision 963593.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/611/console

        This message is automatically generated.

        Andrew Ryan added a comment -

        Patch submitted for review. This won't apply, because it needs patching common, mapred, and hdfs. If there is agreement on how to move forward, I can regenerate different patches for each.

        Andrew Ryan added a comment -

        This patch requires changes in common, mapred, and hdfs. So this patch will not actually apply to Hudson yet.


          People

          • Assignee: Unassigned
          • Reporter: Andrew Ryan
          • Votes: 2
          • Watchers: 6
