HBase
  1. HBase
  2. HBASE-3614

Expose per-region request rate metrics

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.95.0
    • Component/s: metrics, regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      We currently export metrics on request rates for each region server, and this can help with identifying uneven load at a high level. But once you see a given server under high load, you're forced to extrapolate based on your application patterns and the data it's serving what the likely culprit is. This can and should be much easier if we just exported request rate metrics per-region on each server.

      Dynamically updating the metrics keys based on assigned regions may pose some minor challenges, but this seems a very valuable diagnostic tool to have available.

      1. HBASE-3614-0.patch
        14 kB
        Elliott Clark
      2. HBASE-3614-1.patch
        15 kB
        Elliott Clark
      3. HBASE-3614-2.patch
        16 kB
        Elliott Clark
      4. HBASE-3614-3.patch
        21 kB
        Elliott Clark
      5. HBASE-3614-4.patch
        22 kB
        Elliott Clark
      6. HBASE-3614-5.patch
        24 kB
        Elliott Clark
      7. HBASE-3614-6.patch
        24 kB
        Elliott Clark
      8. HBASE-3614-7.patch
        24 kB
        Elliott Clark
      9. HBASE-3614-8.patch
        24 kB
        Elliott Clark
      10. HBASE-3614-9.patch
        24 kB
        Elliott Clark
      11. Screen Shot 2012-04-17 at 2.41.27 PM.png
        163 kB
        Elliott Clark

        Issue Links

          Activity

          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #20 (See https://builds.apache.org/job/HBase-0.94-security/20/)
          HBASE-5836 Backport per region metrics from HBASE-3614 to 0.94.1 (Revision 1329958)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #20 (See https://builds.apache.org/job/HBase-0.94-security/20/ ) HBASE-5836 Backport per region metrics from HBASE-3614 to 0.94.1 (Revision 1329958) Result = SUCCESS stack : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #145 (See https://builds.apache.org/job/HBase-0.94/145/)
          HBASE-5836 Backport per region metrics from HBASE-3614 to 0.94.1 (Revision 1329958)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #145 (See https://builds.apache.org/job/HBase-0.94/145/ ) HBASE-5836 Backport per region metrics from HBASE-3614 to 0.94.1 (Revision 1329958) Result = FAILURE stack : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Elliott Clark added a comment -

          @Todd the time period is configurable. It's set through the hadoop-metrics hbase.period. Whenever the thread that publishes metrics comes around the averages are re-set.

          Show
          Elliott Clark added a comment - @Todd the time period is configurable. It's set through the hadoop-metrics hbase.period. Whenever the thread that publishes metrics comes around the averages are re-set.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #176 (See https://builds.apache.org/job/HBase-TRUNK-security/176/)
          HBASE-3614 Expose per-region request rate metrics (Revision 1328140)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
          • /hbase/trunk/src/main/resources/hbase-default.xml
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #176 (See https://builds.apache.org/job/HBase-TRUNK-security/176/ ) HBASE-3614 Expose per-region request rate metrics (Revision 1328140) Result = FAILURE stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java /hbase/trunk/src/main/resources/hbase-default.xml /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Todd Lipcon added a comment -

          Gotcha. Apologies for my cluelessness Seems fine. Though I think Shaneal's percentile stuff would be a nice improvement, what we've got now is at least useful.

          Show
          Todd Lipcon added a comment - Gotcha. Apologies for my cluelessness Seems fine. Though I think Shaneal's percentile stuff would be a nice improvement, what we've got now is at least useful.
          Hide
          stack added a comment -

          @Todd This issue just exposes metrics that were already being collected per region. I believe its over the metrics reporting period (5 seconds?). Want that changed? Metrics could do w/ a revamp/edit for sure.

          Show
          stack added a comment - @Todd This issue just exposes metrics that were already being collected per region. I believe its over the metrics reporting period (5 seconds?). Want that changed? Metrics could do w/ a revamp/edit for sure.
          Hide
          Todd Lipcon added a comment -

          Just getting to this after it's committed.. but: what's the time scale of "avgtime" on these metrics? Averages that are since boot are kind of useless. We should either track the sum of the time, or do a time-biased metric like Shaneal added recently in other places.

          Show
          Todd Lipcon added a comment - Just getting to this after it's committed.. but: what's the time scale of "avgtime" on these metrics? Averages that are since boot are kind of useless. We should either track the sum of the time, or do a time-biased metric like Shaneal added recently in other places.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2788 (See https://builds.apache.org/job/HBase-TRUNK/2788/)
          HBASE-3614 Expose per-region request rate metrics (Revision 1328140)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
          • /hbase/trunk/src/main/resources/hbase-default.xml
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2788 (See https://builds.apache.org/job/HBase-TRUNK/2788/ ) HBASE-3614 Expose per-region request rate metrics (Revision 1328140) Result = FAILURE stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java /hbase/trunk/src/main/resources/hbase-default.xml /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          stack added a comment -

          Committed to trunk (936 tests passed, about same as in previous hangs). Thanks for the patch Elliott. Let me open an issue against 0.94.1 for backporting to see if we can get it in there.

          Show
          stack added a comment - Committed to trunk (936 tests passed, about same as in previous hangs). Thanks for the patch Elliott. Let me open an issue against 0.94.1 for backporting to see if we can get it in there.
          Hide
          Elliott Clark added a comment -

          Address the findbugs warning in HRegion. Not really a concern as that constructor is only used in unit tests.

          Show
          Elliott Clark added a comment - Address the findbugs warning in HRegion. Not really a concern as that constructor is only used in unit tests.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12523401/HBASE-3614-8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12523401/HBASE-3614-8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1582//console This message is automatically generated.
          Hide
          stack added a comment -

          Trying against hadoopqa

          Show
          stack added a comment - Trying against hadoopqa
          Hide
          Elliott Clark added a comment -

          And I beat you to the rename is opMetrics alright.

          Show
          Elliott Clark added a comment - And I beat you to the rename is opMetrics alright.
          Hide
          Elliott Clark added a comment -

          Fixed the spelling.

          The null is there because we can get a multi put where each of the puts is for different cf's. Since we only time the total operation we're not sure which cf to assign the operation time to. It was put in __unknown previously.

          Show
          Elliott Clark added a comment - Fixed the spelling. The null is there because we can get a multi put where each of the puts is for different cf's. Since we only time the total operation we're not sure which cf to assign the operation time to. It was put in __unknown previously.
          Hide
          stack added a comment -

          Since you renamed RegionOperationMetrics, is this right now:

          +  private final OperationMetrics regionMetrics;
          

          Should it be named metrics or operationMetrics?

          Whats 'unknown' in the following? + //null will be treated as unknown.

          We are updating metrics w/o attributing them to a cf?

          Fix misspell 'Inctement' in hbase-site change

          Patch is good to go after addressing above. Good stuff.

          Show
          stack added a comment - Since you renamed RegionOperationMetrics, is this right now: + private final OperationMetrics regionMetrics; Should it be named metrics or operationMetrics? Whats 'unknown' in the following? + //null will be treated as unknown. We are updating metrics w/o attributing them to a cf? Fix misspell 'Inctement' in hbase-site change Patch is good to go after addressing above. Good stuff.
          Hide
          Elliott Clark added a comment -

          Rename the field in HRegion to mirror the new class name.

          Show
          Elliott Clark added a comment - Rename the field in HRegion to mirror the new class name.
          Hide
          Elliott Clark added a comment -

          Added the final to OperationMetrics.

          Show
          Elliott Clark added a comment - Added the final to OperationMetrics.
          Hide
          Elliott Clark added a comment -

          Added more comments about multi-put.
          Fixed a bug that Stack was asking about. (left off getFamilyMap().keySet() when comparing cf sets)
          Renamed RegionOperationMetrics to OperationMetrics since it also exposes metrics per cf.
          Added more tests around Multiputs

          I didn't rename the conf key since it turns off all reporting of operation timing metrics and not just those for cf's.

          Show
          Elliott Clark added a comment - Added more comments about multi-put. Fixed a bug that Stack was asking about. (left off getFamilyMap().keySet() when comparing cf sets) Renamed RegionOperationMetrics to OperationMetrics since it also exposes metrics per cf. Added more tests around Multiputs I didn't rename the conf key since it turns off all reporting of operation timing metrics and not just those for cf's.
          Hide
          stack added a comment -

          This could be final: '+ private RegionOperationMetrics regionMetrics;'?

          100 chars per line.

          Just pass HRegionInfo altogether to the below?

          +    this.regionMetrics = new RegionOperationMetrics(conf, this.regionInfo.getTableNameAsString(), this.regionInfo.getEncodedName());
          

          Err... your replacement is better than what was there previously in the below:

          -    final String metricPrefix = SchemaMetrics.generateSchemaMetricsPrefix(
          -        getTableDesc().getNameAsString(), familyMap.keySet());
          -    if (!metricPrefix.isEmpty()) {
          -      RegionMetricsStorage.incrTimeVaryingMetric(metricPrefix + "delete_", after - now);
          -    }
          +    this.regionMetrics.updateDeleteMetrics(familyMap.keySet(), after-now);
          

          Whats happening here?

          +        if (cfSet == null) {
          +          cfSet = put.getFamilyMap().keySet();
          +        } else {
          +          cfSetConsistent = cfSetConsistent && put.equals(cfSet);
          

          Do we have to get the column family set each time through? It never changes (currently) while the region is open.

          Whats a cfSetConsistent? A comment would help?

          Yeah, I don't follow this stuff:

          +      //See if the column families were consistent through the whole thing.
          +      //if they were then keep them.  If they were not then pass a null.
          +      //null will be treated as unknown.
          

          Should be hbase.metrics.region.exposeOperationTimes instead of hbase.metrics.exposeOperationTimes to convey its on/off for per-region metrics?

          This patch is great.

          Show
          stack added a comment - This could be final: '+ private RegionOperationMetrics regionMetrics;'? 100 chars per line. Just pass HRegionInfo altogether to the below? + this .regionMetrics = new RegionOperationMetrics(conf, this .regionInfo.getTableNameAsString(), this .regionInfo.getEncodedName()); Err... your replacement is better than what was there previously in the below: - final String metricPrefix = SchemaMetrics.generateSchemaMetricsPrefix( - getTableDesc().getNameAsString(), familyMap.keySet()); - if (!metricPrefix.isEmpty()) { - RegionMetricsStorage.incrTimeVaryingMetric(metricPrefix + "delete_" , after - now); - } + this .regionMetrics.updateDeleteMetrics(familyMap.keySet(), after-now); Whats happening here? + if (cfSet == null ) { + cfSet = put.getFamilyMap().keySet(); + } else { + cfSetConsistent = cfSetConsistent && put.equals(cfSet); Do we have to get the column family set each time through? It never changes (currently) while the region is open. Whats a cfSetConsistent? A comment would help? Yeah, I don't follow this stuff: + //See if the column families were consistent through the whole thing. + // if they were then keep them. If they were not then pass a null . + // null will be treated as unknown. Should be hbase.metrics.region.exposeOperationTimes instead of hbase.metrics.exposeOperationTimes to convey its on/off for per-region metrics? This patch is great.
          Hide
          Elliott Clark added a comment -

          Here's a better version. It has the ability to remove the metrics.

          get, set, increment, append, and delete all report their metrics per cf and per region with a pretty good api.

          @Enis you're absolutely correct about SchemaMetrics needing to get static state out. I'll file an issue and try and get some time to take a look.

          Show
          Elliott Clark added a comment - Here's a better version. It has the ability to remove the metrics. get, set, increment, append, and delete all report their metrics per cf and per region with a pretty good api. @Enis you're absolutely correct about SchemaMetrics needing to get static state out. I'll file an issue and try and get some time to take a look.
          Hide
          Elliott Clark added a comment -

          More tests. Still needs the confs.

          Show
          Elliott Clark added a comment - More tests. Still needs the confs.
          Hide
          Elliott Clark added a comment -

          I think per region and per cf metrics being configurable is probably the right way to go. I'll add that in the next go around.

          Show
          Elliott Clark added a comment - I think per region and per cf metrics being configurable is probably the right way to go. I'll add that in the next go around.
          Hide
          Enis Soztutar added a comment -

          This is fine for JMX, but these per-region metrics will be exposed via the Hadoop metrics system as well right? If so, the large number of regions and the fact that regions are dynamic would make it useless for ganglia (or similar). So, I think we should either not expose them through the hadoop metrics, but do itJMX only, or make per-region metrics configurable IMO.

          Show
          Enis Soztutar added a comment - This is fine for JMX, but these per-region metrics will be exposed via the Hadoop metrics system as well right? If so, the large number of regions and the fact that regions are dynamic would make it useless for ganglia (or similar). So, I think we should either not expose them through the hadoop metrics, but do itJMX only, or make per-region metrics configurable IMO.
          Hide
          Elliott Clark added a comment -

          More progress.

          This is not yet ready for integration. It needs tests to cover the new metrics.

          Show
          Elliott Clark added a comment - More progress. This is not yet ready for integration. It needs tests to cover the new metrics.
          Hide
          Elliott Clark added a comment -

          Screen shot of jmx stats after some of Stack's comments.

          Show
          Elliott Clark added a comment - Screen shot of jmx stats after some of Stack's comments.
          Hide
          stack added a comment -

          FYI 100 chars per line max and space around operators (this won't fly: "cfSetConsistent?cfSet:null")

          I like how you are removing metrics stuff from HRegion out to a region scoped metrics class.

          '+public class RegionMetrics {' needs a class comment saying what its all about. Does the class need to be public? Can it be scoped to this package only?

          Collect all the data members at the top of the class. Thats whats usually done in this code base.

          So put the tablename etc. in RegionMetric before the constructor etc. rather than after.

          Does this need to be public generateRegionMetricsPrefix?

          What do these new metrics look like? Is this all it takes to expose them?

          Some regionnames are going to be really long. Should you use the region encoded name instead of the full name? Do you think we even need the table name as prefix?

          Good stuff Elliott.

          Show
          stack added a comment - FYI 100 chars per line max and space around operators (this won't fly: "cfSetConsistent?cfSet:null") I like how you are removing metrics stuff from HRegion out to a region scoped metrics class. '+public class RegionMetrics {' needs a class comment saying what its all about. Does the class need to be public? Can it be scoped to this package only? Collect all the data members at the top of the class. Thats whats usually done in this code base. So put the tablename etc. in RegionMetric before the constructor etc. rather than after. Does this need to be public generateRegionMetricsPrefix? What do these new metrics look like? Is this all it takes to expose them? Some regionnames are going to be really long. Should you use the region encoded name instead of the full name? Do you think we even need the table name as prefix? Good stuff Elliott.
          Hide
          Elliott Clark added a comment -

          This patch fixes the sizing tests. HRegion gets one new object reference.

          Show
          Elliott Clark added a comment - This patch fixes the sizing tests. HRegion gets one new object reference.
          Hide
          Elliott Clark added a comment -

          This is the first cut at getting per region metrics into the metrics framework. It adds metrics related to get/put/multiput/delete/icv. It does not add metrics around fsread times yet.

          Show
          Elliott Clark added a comment - This is the first cut at getting per region metrics into the metrics framework. It adds metrics related to get/put/multiput/delete/icv. It does not add metrics around fsread times yet.
          Hide
          Jonathan Gray added a comment -

          I'm not sure if there is a JIRA yet, but some guys at FB did a bunch of work on doing per-family metrics. They did work to dynamically generate new metric names, etc.

          I think we could work on this at the same time we start to think about using the info for better load balancing and such. This could obviously come first.

          Show
          Jonathan Gray added a comment - I'm not sure if there is a JIRA yet, but some guys at FB did a bunch of work on doing per-family metrics. They did work to dynamically generate new metric names, etc. I think we could work on this at the same time we start to think about using the info for better load balancing and such. This could obviously come first.

            People

            • Assignee:
              Elliott Clark
              Reporter:
              Gary Helmling
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development