  1. Hadoop Common
  2. HADOOP-14960

Add GC time percentage monitor/alerter


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.10.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, the class org.apache.hadoop.metrics2.source.JvmMetrics provides several GC-related metrics. Unfortunately, none of them is as useful as it could be, because none answers the first and most important question about GC and JVM health: what percentage of time is my JVM paused in GC? This percentage, calculated as the sum of the GC pauses over some period (for example, 1 minute) divided by the length of that period, is the most convenient measure of GC health because:

      • it is just one number, and it is clear that, say, 1-5% is good, but 80-90% is really bad
      • it allows for easy apples-to-apples comparison between runs, even between different apps
      • when this metric reaches some critical value like 70%, it almost always indicates a "GC death spiral", from which the app can recover only by shedding load, for example by dropping some tasks
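
      The calculation described above can be sketched as follows. This is only an illustration, not the code from the attached patches: the class name GcTimePercentage and its methods are hypothetical, and it assumes that the accumulated collection time reported by the standard GarbageCollectorMXBean is an acceptable approximation of pause time (for concurrent collectors it may overstate actual pauses).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop: estimates the share of a time
// window that the JVM spent in garbage collection.
final class GcTimePercentage {

    // Sums the approximate accumulated collection time (ms) over all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time as a whole-number percentage of the window, clamped to 0..100.
    static int percent(long gcMillis, long windowMillis) {
        if (windowMillis <= 0) {
            return 0;
        }
        long p = gcMillis * 100 / windowMillis;
        return (int) Math.max(0, Math.min(100, p));
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMillis = 1000; // assumed observation window
        long before = totalGcTimeMillis();
        Thread.sleep(windowMillis);
        long after = totalGcTimeMillis();
        System.out.println("GC time: " + percent(after - before, windowMillis) + "%");
    }
}
```

      Sampling the counter at the start and end of the window and taking the difference keeps the result independent of how long the JVM has been running.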

      The existing metrics, such as "total GC time" and "total number of GCs", only give numbers that can be used to roughly estimate this percentage. Thus it is suggested to add a new metric to this class, and possibly to allow users to register handlers that are invoked automatically when this metric reaches a specified threshold.
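
      One possible shape for the suggested threshold handlers is sketched below. All names here (GcThresholdAlerter, observe) are hypothetical and chosen for illustration; the sketch does not reflect the API in the attached patches and assumes the GC time percentage is computed elsewhere and passed in.

```java
import java.util.function.IntConsumer;

// Hypothetical sketch, not the Hadoop API: invokes a user-supplied handler
// whenever an observed GC time percentage exceeds a configured threshold.
final class GcThresholdAlerter {
    private final int thresholdPercent;
    private final IntConsumer handler;

    GcThresholdAlerter(int thresholdPercent, IntConsumer handler) {
        if (thresholdPercent < 0 || thresholdPercent > 100) {
            throw new IllegalArgumentException("threshold must be in 0..100");
        }
        if (handler == null) {
            throw new IllegalArgumentException("handler must not be null");
        }
        this.thresholdPercent = thresholdPercent;
        this.handler = handler;
    }

    // Feeds one observation; returns true if the handler was invoked.
    boolean observe(int gcTimePercent) {
        if (gcTimePercent > thresholdPercent) {
            handler.accept(gcTimePercent);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        GcThresholdAlerter alerter = new GcThresholdAlerter(
            70, p -> System.err.println("GC time at " + p + "%, above threshold"));
        alerter.observe(5);  // healthy, handler is not called
        alerter.observe(85); // above 70%, handler is called
    }
}
```

      A handler in a real application might log the event, reject new work, or shut the process down; which reaction is appropriate depends on the application.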

      Attachments

        1. HADOOP-14960.01.patch
          15 kB
          Misha Dmitriev
        2. HADOOP-14960.02.patch
          16 kB
          Misha Dmitriev
        3. HADOOP-14960.03.patch
          18 kB
          Misha Dmitriev
        4. HADOOP-14960.04.patch
          18 kB
          Misha Dmitriev


          People

            Assignee: Misha Dmitriev (misha@cloudera.com)
            Reporter: Misha Dmitriev (misha@cloudera.com)
            Votes: 0
            Watchers: 12
