Hadoop Common
HADOOP-1406

Metrics based on Map-Reduce Counters are not cleaned up

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

      Description

      When map-reduce jobs finish, the metrics corresponding to their counters are not cleaned up. This is a memory leak, but worse, it means that an ever-increasing amount of metric data is sent to the metrics system (if one is enabled).

      The fix is for JobInProgress to clean up the metrics it created when the job is complete.

      1. 1406.patch
        5 kB
        David Bowen

        Activity

        David Bowen created issue -
        David Bowen added a comment -

        As part of fixing this, I would like to propose two small API changes in org.apache.hadoop.metrics:

        [1] Currently, to remove metrics records from the metrics library's internal table you have to iterate over them one by one, since removal requires the record name and all of its tags (name/value pairs). I propose to generalize this so that you can specify the name and a subset of the tags, saving the client code the burden of remembering all the tag/value pairs it has used. This doesn't change any API signatures, but it changes the meaning of MetricsRecord.remove().
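A minimal, self-contained sketch of the subset-matching removal described above. This models a hypothetical internal table, not the actual o.a.h.metrics implementation; the class name RecordTable and its methods are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the metrics library's internal record table.
class RecordTable {
    static class Entry {
        final String recordName;
        final Map<String, String> tags;
        Entry(String recordName, Map<String, String> tags) {
            this.recordName = recordName;
            this.tags = new HashMap<>(tags);
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String recordName, Map<String, String> tags) {
        entries.add(new Entry(recordName, tags));
    }

    // Proposed semantics: remove every entry with a matching name whose
    // tags contain the given subset, instead of requiring an exact match
    // on the full tag set.
    int remove(String recordName, Map<String, String> tagSubset) {
        int before = entries.size();
        entries.removeIf(e -> e.recordName.equals(recordName)
                && e.tags.entrySet().containsAll(tagSubset.entrySet()));
        return before - entries.size();
    }

    int size() { return entries.size(); }
}
```

With this, a caller can sweep all records for one job by passing only the jobId tag, without enumerating every counter-name tag it ever set.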

        [2] Also, as a convenience, I propose to add a method MetricsRecord.removeTag(String tagName). This method would also be added to the MetricsRecordImpl class in the spi package. This enables the following usage pattern:

        On module initialization:
        create a metrics record named foo
        set long-lived tag (e.g. job id)

        Then repeatedly:
        set short-lived tags (e.g. counter name)
        set metric values
        call MetricsRecord.update

        And finally, on cleanup:
        remove the short-lived tags
        call MetricsRecord.remove to clean up all the metric data created

        Both of these should be binary compatible changes for people using the public package o.a.h.metrics.
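The lifecycle above can be sketched with a small stand-in class. SketchRecord is hypothetical, not the real MetricsRecord or MetricsRecordImpl; only the method names (setTag, setMetric, update, removeTag, remove) mirror the proposal:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the proposed MetricsRecord methods.
class SketchRecord {
    static class Row {
        final Map<String, String> tags;
        final Map<String, Long> metrics;
        Row(Map<String, String> tags, Map<String, Long> metrics) {
            this.tags = new HashMap<>(tags);
            this.metrics = new HashMap<>(metrics);
        }
    }

    private final Map<String, String> tags = new HashMap<>();
    private final Map<String, Long> metrics = new HashMap<>();
    final List<Row> table = new ArrayList<>();  // stands in for the library's buffer

    void setTag(String name, String value) { tags.put(name, value); }
    void setMetric(String name, long value) { metrics.put(name, value); }

    // Proposed convenience method: drop one tag from the current tag set.
    void removeTag(String name) { tags.remove(name); }

    // Buffer the current metric values under the current tag set.
    void update() { table.add(new Row(tags, metrics)); }

    // Proposed remove() semantics: drop every buffered row whose tags
    // contain the record's current (long-lived) tags as a subset.
    void remove() {
        table.removeIf(r -> r.tags.entrySet().containsAll(tags.entrySet()));
    }
}
```

After removeTag("counterName") the record's remaining tag is just the job id, so a single remove() call sweeps every counter row the job ever emitted.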

        David Bowen added a comment -

        This patch contains the API changes described above, and code which uses the modified API to clean up the counter metric data after a job is completed.

        Also, it adds an extra "jobId" tag to the counter metric data, to avoid problems when a user has two jobs with the same name running at the same time (on the same jobtracker).

        David Bowen made changes -
        Field Original Value New Value
        Attachment 1406.patch [ 12357825 ]
        David Bowen made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        +1 http://issues.apache.org/jira/secure/attachment/12357825/1406.patch applied and successfully tested against trunk revision r540359.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/174/testReport/
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/174/console
        Doug Cutting added a comment -

        I just committed this. Thanks, David!

        Doug Cutting made changes -
        Fix Version/s 0.14.0 [ 12312474 ]
        Resolution Fixed [ 1 ]
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop QA added a comment -

        Integrated in Hadoop-Nightly #98 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/98/ )
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: David Bowen
          • Reporter: David Bowen
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
            • Updated:
            • Resolved:
