Hadoop Common
  1. Hadoop Common
  2. HADOOP-954

Metrics should offer complete set of static report methods or none at all

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.12.0
    • Component/s: metrics
    • Labels:
      None

      Description

      org.apache.hadoop.metrics.Metrics currently has one report method. I should either have report methods for all underlying MetricsRecord or no report methods at all.

      1. hadoop-954.patch
        30 kB
        David Bowen
      2. HADOOP-954-2.patch
        36 kB
        Nigel Daley
      3. HADOOP-954-3.patch
        37 kB
        Nigel Daley

        Activity

        Hide
        David Bowen added a comment -

        The comments on this Metrics class say that it is a utility class (and the log name is org.apache.hadoop.util.MetricsUtil), but the name of the class implies that it is meant to be the main interface to the package.

        I think that the comments at least should be changed to say that this class is intended to provide a simplified interface to a subset of the package's functionality. It assumes that there is always exactly one tag in a record, defaulting to the hostname.

        It also seems to assume that there is only one metric in a record, and that it is a gauge of type long. This takes the simplified interface approach to a bit of an extreme.

        My preference would be to move this class out of the metrics package and put it in dfs. It doesn't make sense to have the apparent main interface to the metrics package omit most of its functionality.

        Show
        David Bowen added a comment - The comments on this Metrics class say that it is a utility class (and the log name is org.apache.hadoop.util.MetricsUtil), but the name of the class implies that it is meant to be the main interface to the package. I think that the comments at least should be changed to say that this class is intended to provide a simplified interface to a subset of the package's functionality. It assumes that there is always exactly one tag in a record, defaulting to the hostname. It also seems to assume that there is only one metric in a record, and that it is a gauge of type long. This takes the simplified interface approach to a bit of an extreme. My preference would be to move this class out of the metrics package and put it in dfs. It doesn't make sense to have the apparent main interface to the metrics package omit most of its functionality.
        Hide
        Doug Cutting added a comment -

        Since it is used in more than one place (dfs and mapred) then it should probably remain in the metrics package. (The "util" package should only be used as a last resort.) But improving its documentation and making it more general both sound like good moves.

        Show
        Doug Cutting added a comment - Since it is used in more than one place (dfs and mapred) then it should probably remain in the metrics package. (The "util" package should only be used as a last resort.) But improving its documentation and making it more general both sound like good moves.
        Hide
        David Bowen added a comment -

        A bigger question here is whether this simplified interface is
        appropriate even for dfs and mapred. There are a couple of issues:

        1. Most or all of the data being reported should be using counters
        rather than gauges. This would allow us to get graphs of things
        like bytes-read per second, as opposed to simply a graph of the
        cumulative number of bytes read (which will soon overflow).
        2. The current code is a little inefficient if used in inner loops.
        Each call to the report method (for a single metric) will cause
        all the metrics in that record to be copied from the MetricsRecord
        into the library's internal table. Also, the report method cannot
        be easily modified to work with counters, since it would be
        necessary to reinitialize the record every time.

        The preferable way to use the API is to have each "event" (counter
        increment) simply increment a counter, and to use a call-back to report
        the counter values to the metrics library periodically (every 5 seconds
        in our case).

        • David
        Show
        David Bowen added a comment - A bigger question here is whether this simplified interface is appropriate even for dfs and mapred. There are a couple of issues: 1. Most or all of the data being reported should be using counters rather than gauges. This would allow us to get graphs of things like bytes-read per second, as opposed to simply a graph of the cumulative number of bytes read (which will soon overflow). 2. The current code is a little inefficient if used in inner loops. Each call to the report method (for a single metric) will cause all the metrics in that record to be copied from the MetricsRecord into the library's internal table. Also, the report method cannot be easily modified to work with counters, since it would be necessary to reinitialize the record every time. The preferable way to use the API is to have each "event" (counter increment) simply increment a counter, and to use a call-back to report the counter values to the metrics library periodically (every 5 seconds in our case). David
        Hide
        Doug Cutting added a comment -

        > The preferable way to use the API is to have each "event" simply increment a counter

        This class is the result of someone trying to figure out the preferable way. So let's not throw it out, but rather improve it. There should be an object that encapsulates these counters, right? If that's not a MetricsRecord, what should it be? Perhaps we need a standard wrapper on a MetricsRecord? Please advise.

        Show
        Doug Cutting added a comment - > The preferable way to use the API is to have each "event" simply increment a counter This class is the result of someone trying to figure out the preferable way. So let's not throw it out, but rather improve it. There should be an object that encapsulates these counters, right? If that's not a MetricsRecord, what should it be? Perhaps we need a standard wrapper on a MetricsRecord? Please advise.
        Hide
        David Bowen added a comment -

        Another goal is to keep the metrics package usable stand-alone,
        independent of Hadoop. Therefore, if Hadoop currently needs only a
        subset of the functionality, it would make sense to create any
        encapsulation of that subset somewhere outside of the metrics package,
        unless it is clear that a large percentage of non-Hadoop users will want
        that particular subset. But let us put aside this API issue for now,
        and focus on the higher priority issue which is getting useful metric
        data out of Hadoop. For that, we need to use counters, and counters
        won't work with the Metrics.report method.

        We have a few classes which each wrap a MetricsRecord: MapTaskMetrics,
        TaskTrackerMetrics etc. Some use Metrics.report and others use the
        MetricsRecord API directly. The latter should be OK. We should get rid
        of the report method (there are about 2 dozen calls) and use the
        MetricsRecord API instead. As an alternative to using a callback, we
        could just split the MetricsRecords into smaller ones, where each
        corresponds to an "event". E.g. we currently have a MetricsRecord
        called "map" containing metrics input_records, input_bytes,
        output_records and output_bytes. This could be split into two, one for
        input data and one for output data. This makes sense because the input
        related metrics and output related metrics are updated at different times.

        This splitting approach would work for everything, but would be a bit
        inelegant in the case of e.g. JobTrackerMetrics where there are 6
        counters that need to be independently incremented. We could shorten
        the code by using a callback.

        Should I submit a patch with these changes?

        Show
        David Bowen added a comment - Another goal is to keep the metrics package usable stand-alone, independent of Hadoop. Therefore, if Hadoop currently needs only a subset of the functionality, it would make sense to create any encapsulation of that subset somewhere outside of the metrics package, unless it is clear that a large percentage of non-Hadoop users will want that particular subset. But let us put aside this API issue for now, and focus on the higher priority issue which is getting useful metric data out of Hadoop. For that, we need to use counters, and counters won't work with the Metrics.report method. We have a few classes which each wrap a MetricsRecord: MapTaskMetrics, TaskTrackerMetrics etc. Some use Metrics.report and others use the MetricsRecord API directly. The latter should be OK. We should get rid of the report method (there are about 2 dozen calls) and use the MetricsRecord API instead. As an alternative to using a callback, we could just split the MetricsRecords into smaller ones, where each corresponds to an "event". E.g. we currently have a MetricsRecord called "map" containing metrics input_records, input_bytes, output_records and output_bytes. This could be split into two, one for input data and one for output data. This makes sense because the input related metrics and output related metrics are updated at different times. This splitting approach would work for everything, but would be a bit inelegant in the case of e.g. JobTrackerMetrics where there are 6 counters that need to be independently incremented. We could shorten the code by using a callback. Should I submit a patch with these changes?
        Hide
        Doug Cutting added a comment -

        > if Hadoop currently needs only a subset of the functionality, it would make sense to create any
        encapsulation of that subset somewhere outside of the metrics package

        The metrics package is a part of Hadoop. It's architecturally wise to make it standalone, but if there are utilities based on it that are generally useful, they should be included with metrics, and not in some other utility package. If these utilities are poorly designed then they should be fixed or discarded, not simply moved elsewhere.

        > We should get rid of the report method (there are about 2 dozen calls) and use the MetricsRecord API instead.

        I'm okay with that, so long as it doesn't lead to a bunch of duplicated code. If there are more than a few lines that are duplicated by different uses of the metrics API, then these should be shared through a utility class, no?

        > Should I submit a patch with these changes?

        Yes, please. Thanks! It would be great to have the designer of the metrics system make Hadoop's use of it exemplary.

        Show
        Doug Cutting added a comment - > if Hadoop currently needs only a subset of the functionality, it would make sense to create any encapsulation of that subset somewhere outside of the metrics package The metrics package is a part of Hadoop. It's architecturally wise to make it standalone, but if there are utilities based on it that are generally useful, they should be included with metrics, and not in some other utility package. If these utilities are poorly designed then they should be fixed or discarded, not simply moved elsewhere. > We should get rid of the report method (there are about 2 dozen calls) and use the MetricsRecord API instead. I'm okay with that, so long as it doesn't lead to a bunch of duplicated code. If there are more than a few lines that are duplicated by different uses of the metrics API, then these should be shared through a utility class, no? > Should I submit a patch with these changes? Yes, please. Thanks! It would be great to have the designer of the metrics system make Hadoop's use of it exemplary.
        Hide
        David Bowen added a comment -


        Here's a patch that modifies how the metrics package is used. It compiles, but has not been tested yet. Comments and questions are invited.

        Show
        David Bowen added a comment - Here's a patch that modifies how the metrics package is used. It compiles, but has not been tested yet. Comments and questions are invited.
        Hide
        David Bowen added a comment -

        Oops. I used "submit patch" when I meant "attach file".

        Show
        David Bowen added a comment - Oops. I used "submit patch" when I meant "attach file".
        Hide
        Hadoop QA added a comment -

        -1, because the patch command could not apply the latest attachment (http://issues.apache.org) as a patch to trunk revision r502030. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

        Show
        Hadoop QA added a comment - -1, because the patch command could not apply the latest attachment ( http://issues.apache.org ) as a patch to trunk revision r502030. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.
        Hide
        Doug Cutting added a comment -

        > Comments and questions are invited.

        Looks like a good direction to me. Thanks!

        Show
        Doug Cutting added a comment - > Comments and questions are invited. Looks like a good direction to me. Thanks!
        Hide
        Doug Cutting added a comment -

        I'll mark this as "Open" until the patch is ready for commit.

        Show
        Doug Cutting added a comment - I'll mark this as "Open" until the patch is ready for commit.
        Hide
        David Bowen added a comment -

        What's up with this? My workspace is up-to-date at revision 502037.

        Show
        David Bowen added a comment - What's up with this? My workspace is up-to-date at revision 502037.
        Hide
        David Bowen added a comment -

        Oops, I botched a file renaming (Metrics.java ->MetricsUtil.java). I'm not used to svn and nor is Netbeans.

        Show
        David Bowen added a comment - Oops, I botched a file renaming (Metrics.java ->MetricsUtil.java). I'm not used to svn and nor is Netbeans.
        Hide
        Nigel Daley added a comment -

        Updated patch. Includes job and shuffle metrics. Also fixes some bugs in previous patch.

        Show
        Nigel Daley added a comment - Updated patch. Includes job and shuffle metrics. Also fixes some bugs in previous patch.
        Hide
        Nigel Daley added a comment -

        New patch fixes initialization of map and reduce metrics.

        Show
        Nigel Daley added a comment - New patch fixes initialization of map and reduce metrics.
        Hide
        David Bowen added a comment -

        +1. These changes look fine.

        I think we are going to hit overflows with counting bytes. We should change this to counting megabytes (using a float rather than an int), but that can be another Jira ticket.

        Show
        David Bowen added a comment - +1. These changes look fine. I think we are going to hit overflows with counting bytes. We should change this to counting megabytes (using a float rather than an int), but that can be another Jira ticket.
        Hide
        Owen O'Malley added a comment -

        Of course it will overflow while counting bytes. However, I strongly disagree on replacing the int with a float. It is far better to support longs.

        Show
        Owen O'Malley added a comment - Of course it will overflow while counting bytes. However, I strongly disagree on replacing the int with a float. It is far better to support longs.
        Hide
        David Bowen added a comment -

        > Of course it will overflow while counting bytes. However, I strongly disagree on replacing the int
        > with a float. It is far better to support longs.

        This conversation perhaps belongs somewhere else.

        Longs aren't currently supported in hadoop metrics, so perhaps we need a Jira ticket for that. But before deciding to make that a priority, I think it is worth trying to use the functionality that already exists. It may be sufficient.

        Show
        David Bowen added a comment - > Of course it will overflow while counting bytes. However, I strongly disagree on replacing the int > with a float. It is far better to support longs. This conversation perhaps belongs somewhere else. Longs aren't currently supported in hadoop metrics, so perhaps we need a Jira ticket for that. But before deciding to make that a priority, I think it is worth trying to use the functionality that already exists. It may be sufficient.
        Hide
        Hadoop QA added a comment -

        +1, because http://issues.apache.org/jira/secure/attachment/12350600/HADOOP-954-3.patch applied and successfully tested against trunk revision r504682.

        Show
        Hadoop QA added a comment - +1, because http://issues.apache.org/jira/secure/attachment/12350600/HADOOP-954-3.patch applied and successfully tested against trunk revision r504682.
        Hide
        Doug Cutting added a comment -

        I just committed this. Thanks, David & Nigel!

        Show
        Doug Cutting added a comment - I just committed this. Thanks, David & Nigel!

          People

          • Assignee:
            David Bowen
            Reporter:
            Nigel Daley
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development