HBase
  1. HBase
  2. HBASE-5862

After Region Close remove the Operation Metrics.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixes the case where when a region closed, its metrics did not; they stayed up associated w/ old hosting server even though region may have moved elsewhere.

      Description

      If a region is closed then Hadoop metrics shouldn't still be reporting about that region.

      1. TSD.png
        221 kB
        Elliott Clark
      2. HBASE-5862-94-3.patch
        9 kB
        Elliott Clark
      3. HBASE-5862-4.patch
        10 kB
        Elliott Clark
      4. HBASE-5862-3.patch
        9 kB
        Elliott Clark
      5. HBASE-5862-2.patch
        9 kB
        Elliott Clark
      6. HBASE-5862-1.patch
        9 kB
        Elliott Clark
      7. HBASE-5862-0.patch
        6 kB
        Elliott Clark

        Issue Links

          Activity

          Hide
          Tianying Chang added a comment -

          Elliott,

          I was working on adding some other region level metrics. I think your work around of clearing all dynamic metrics for all the regions when one region is closed works for most of the cases, but for metrics of "numericPersistentMetrics", seems not appropriate. Because clearing the metrics will lose the accumulated values which can never be recovered, especially for the live regions that are not to be closed.

          I want to open a bug for addressing the numericPersistentMetrics metrics, and would like to hear comments from you. Thanks

          Show
          Tianying Chang added a comment - Elliott, I was working on adding some other region level metrics. I think your work around of clearing all dynamic metrics for all the regions when one region is closed works for most of the cases, but for metrics of "numericPersistentMetrics", seems not appropriate. Because clearing the metrics will lose the accumulated values which can never be recovered, especially for the live regions that are not to be closed. I want to open a bug for addressing the numericPersistentMetrics metrics, and would like to hear comments from you. Thanks
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #186 (See https://builds.apache.org/job/HBase-TRUNK-security/186/)
          HBASE-5862 After Region Close remove the Operation Metrics (Revision 1330997)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #186 (See https://builds.apache.org/job/HBase-TRUNK-security/186/ ) HBASE-5862 After Region Close remove the Operation Metrics (Revision 1330997) Result = SUCCESS stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #22 (See https://builds.apache.org/job/HBase-0.94-security/22/)
          HBASE-5862 After Region Close remove the Operation Metrics; ADDENDUM – missing import (Revision 1331040)
          HBASE-5862 After Region Close remove the Operation Metrics (Revision 1330998)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java

          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #22 (See https://builds.apache.org/job/HBase-0.94-security/22/ ) HBASE-5862 After Region Close remove the Operation Metrics; ADDENDUM – missing import (Revision 1331040) HBASE-5862 After Region Close remove the Operation Metrics (Revision 1330998) Result = FAILURE stack : Files : /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java stack : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/OperationMetrics.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionMetricsStorage.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerDynamicMetrics.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #151 (See https://builds.apache.org/job/HBase-0.94/151/)
          HBASE-5862 After Region Close remove the Operation Metrics; ADDENDUM – missing import (Revision 1331040)

          Result = ABORTED
          stack :
          Files :

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #151 (See https://builds.apache.org/job/HBase-0.94/151/ ) HBASE-5862 After Region Close remove the Operation Metrics; ADDENDUM – missing import (Revision 1331040) Result = ABORTED stack : Files : /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionServerMetrics.java
          Hide
          Lars Hofhansl added a comment -

          Thanks Elliot.

          Show
          Lars Hofhansl added a comment - Thanks Elliot.
          Hide
          Elliott Clark added a comment -

          @lars Yes there's a missing api. Hadoop metrics keeps a copy of all metrics created. That copy is used to expose the data to jmx and other consumers.
          There is no remove function. HADOOP-8313 was filed to correct this. However until that changes reflection was the only way.

          Show
          Elliott Clark added a comment - @lars Yes there's a missing api. Hadoop metrics keeps a copy of all metrics created. That copy is used to expose the data to jmx and other consumers. There is no remove function. HADOOP-8313 was filed to correct this. However until that changes reflection was the only way.
          Hide
          Lars Hofhansl added a comment -

          I don't get the Hadoop private field accessor stuff. Why do we need to clear out private fields? Is there an API missing for this in Hadoop?

          Show
          Lars Hofhansl added a comment - I don't get the Hadoop private field accessor stuff. Why do we need to clear out private fields? Is there an API missing for this in Hadoop?
          Hide
          Ted Yu added a comment -
          +    //per hfile.  Figuring out which cfs, hfiles, ...
          

          Should cfs be in expanded form (column families) ?

          +    //and on the next tick of the metrics everything that is still relevant will be
          +    //re-added.
          

          're-added' -> 'added' or 'added again'

          The initialization work in clear() should be moved to RegionServerDynamicMetrics ctor because it is one time operation.

          Show
          Ted Yu added a comment - + //per hfile. Figuring out which cfs, hfiles, ... Should cfs be in expanded form (column families) ? + //and on the next tick of the metrics everything that is still relevant will be + //re-added. 're-added' -> 'added' or 'added again' The initialization work in clear() should be moved to RegionServerDynamicMetrics ctor because it is one time operation.
          Hide
          stack added a comment -

          Committed trunk and 0.94. Thanks for the patch Elliott

          Show
          stack added a comment - Committed trunk and 0.94. Thanks for the patch Elliott
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524470/HBASE-5862-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1658//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524470/HBASE-5862-4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1658//console This message is automatically generated.
          Hide
          Elliott Clark added a comment -

          Patch with more comments.
          Also added a return if reflection was un-successful as there is no need to try more.

          Show
          Elliott Clark added a comment - Patch with more comments. Also added a return if reflection was un-successful as there is no need to try more.
          Hide
          Elliott Clark added a comment -

          Stack wanted a screen shot of what all the recent region metrics work has enabled.

          Here's a shot of TSDB showing average time of a multi put over all the regions of a table. It illustrates regions being split and new regions showing up.

          stack had some comments about places that needed some extra info I'll ge those trivial patches up in a sec.

          Show
          Elliott Clark added a comment - Stack wanted a screen shot of what all the recent region metrics work has enabled. Here's a shot of TSDB showing average time of a multi put over all the regions of a table. It illustrates regions being split and new regions showing up. stack had some comments about places that needed some extra info I'll ge those trivial patches up in a sec.
          Hide
          stack added a comment -

          +1 on patch. Please add more comments around why you are hacking into hadoop metrics. Please also paste some pretty pictures so Lars can see why this has to be in 0.94.

          Show
          stack added a comment - +1 on patch. Please add more comments around why you are hacking into hadoop metrics. Please also paste some pretty pictures so Lars can see why this has to be in 0.94.
          Hide
          Ted Yu added a comment -

          @Stack:
          Do you have suggestions on further improvement for the latest patch ?

          Show
          Ted Yu added a comment - @Stack: Do you have suggestions on further improvement for the latest patch ?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524095/HBASE-5862-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524095/HBASE-5862-3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1650//console This message is automatically generated.
          Hide
          Elliott Clark added a comment -

          The last QA failure was for the 94 branch patch. It failed because QA only tries to patch trunk.

          Show
          Elliott Clark added a comment - The last QA failure was for the 94 branch patch. It failed because QA only tries to patch trunk.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12524329/HBASE-5862-94-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1651//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524329/HBASE-5862-94-3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1651//console This message is automatically generated.
          Hide
          Elliott Clark added a comment -

          Attaching 94 backport of v3

          Show
          Elliott Clark added a comment - Attaching 94 backport of v3
          Hide
          Ted Yu added a comment -

          +1 on patch v3.

          Show
          Ted Yu added a comment - +1 on patch v3.
          Hide
          Elliott Clark added a comment -

          Thanks for the check.

          Show
          Elliott Clark added a comment - Thanks for the check.
          Hide
          Ted Yu added a comment -

          I didn't check MetricsRecord in hadoop 1.0
          Thanks for the confirmation.

          Show
          Ted Yu added a comment - I didn't check MetricsRecord in hadoop 1.0 Thanks for the confirmation.
          Hide
          Elliott Clark added a comment -

          Testing on both 1.0 and 0.23 both work (after applying patch from HBASE-5870). MetricsRecord has always been an interface as far as I can tell. What did you think needed a shim ?

          Show
          Elliott Clark added a comment - Testing on both 1.0 and 0.23 both work (after applying patch from HBASE-5870 ). MetricsRecord has always been an interface as far as I can tell. What did you think needed a shim ?
          Hide
          Ted Yu added a comment -

          MetricsRecord has become an interface in MRv2.

          Please introduce Shim to make the solution work for both hadoop 1.0 and 2.0

          Show
          Ted Yu added a comment - MetricsRecord has become an interface in MRv2. Please introduce Shim to make the solution work for both hadoop 1.0 and 2.0
          Hide
          Elliott Clark added a comment -

          I can write a more comprehensive test but that would require changing several fields to be more accessible. Not sure what the general thought was on that.

          Show
          Elliott Clark added a comment - I can write a more comprehensive test but that would require changing several fields to be more accessible. Not sure what the general thought was on that.
          Hide
          Elliott Clark added a comment -

          HRegionServer.dynamicMetrics registers it's self with the Hadoop Metrics. From then on the metrics are polled by a thread outside of hbase.

          We clear everything in order to clear schema metrics and metrics about cf's. It would be time consuming to iterate through all metrics and see if they are for column families that are still being served. Since the metrics are reset by jmx every time it comes through removing them is not great but saves a lot of string comparisons and ensures that all edge cases are covered.

          I added the bool to make sure that initializing is only tried once.

          Changed the or to a && since only clearing half doesn't get the job done so better not not try if one field was not accessible.

          Show
          Elliott Clark added a comment - HRegionServer.dynamicMetrics registers it's self with the Hadoop Metrics. From then on the metrics are polled by a thread outside of hbase. We clear everything in order to clear schema metrics and metrics about cf's. It would be time consuming to iterate through all metrics and see if they are for column families that are still being served. Since the metrics are reset by jmx every time it comes through removing them is not great but saves a lot of string comparisons and ensures that all edge cases are covered. I added the bool to make sure that initializing is only tried once. Changed the or to a && since only clearing half doesn't get the job done so better not not try if one field was not accessible.
          Hide
          Ted Yu added a comment -
            @SuppressWarnings("unused")
            private RegionServerDynamicMetrics dynamicMetrics;
          

          I tried to find out how HRegionServer.dynamicMetrics is used but wasn't able to.

          +    //Clear all of the dynamic metrics as they are now probably useless
          +    this.dynamicMetrics.clear();
          

          Only encodedName is removed. Why do we clear dynamicMetrics ?

          +      } catch (SecurityException e) {
          +        LOG.debug("Unable to clear metricsRecord");
          

          We don't need to stumble over the same exception(s) again and again. Why not set a boolean to indicate that reflection shouldn't be used in the future ?

          +    if (this.recordMetricMapField != null || this.registryMetricMapField != null) {
          +      try {
          

          Please separate the above two conditions into two if blocks.

          +import com.google.common.collect.Multiset.Entry;
          

          Is the above import used ?
          It's nice to have a test.

          Show
          Ted Yu added a comment - @SuppressWarnings( "unused" ) private RegionServerDynamicMetrics dynamicMetrics; I tried to find out how HRegionServer.dynamicMetrics is used but wasn't able to. + //Clear all of the dynamic metrics as they are now probably useless + this .dynamicMetrics.clear(); Only encodedName is removed. Why do we clear dynamicMetrics ? + } catch (SecurityException e) { + LOG.debug( "Unable to clear metricsRecord" ); We don't need to stumble over the same exception(s) again and again. Why not set a boolean to indicate that reflection shouldn't be used in the future ? + if ( this .recordMetricMapField != null || this .registryMetricMapField != null ) { + try { Please separate the above two conditions into two if blocks. + import com.google.common.collect.Multiset.Entry; Is the above import used ? It's nice to have a test.
          Hide
          Elliott Clark added a comment -

          Here's how I was able to do it. It's not pretty. But it works.

          Basically remove all DynamicStatistics after a region is placed off-line.

          Show
          Elliott Clark added a comment - Here's how I was able to do it. It's not pretty. But it works. Basically remove all DynamicStatistics after a region is placed off-line.
          Hide
          Elliott Clark added a comment -

          Filed HADOOP-8313

          I'm still looking into other ways of doing it.

          Show
          Elliott Clark added a comment - Filed HADOOP-8313 I'm still looking into other ways of doing it.
          Hide
          Elliott Clark added a comment -

          @Stack yes our per region metrics will be messy if regions move or split or if tables are dropped.

          For operation metrics anything reading the jmx can just ignore metrics with _NumOps = 0. That's messy but it works. For something going forward that's a little more general would be nice. I'll look into it.

          Show
          Elliott Clark added a comment - @Stack yes our per region metrics will be messy if regions move or split or if tables are dropped. For operation metrics anything reading the jmx can just ignore metrics with _NumOps = 0. That's messy but it works. For something going forward that's a little more general would be nice. I'll look into it.
          Hide
          stack added a comment -

          @Elliott What happens? The region looks like its on a regionserver its no longer on? The counters just don't change? Whats that mean? That our per-region metrics are going to be messy if regions move? Good on you.

          Show
          stack added a comment - @Elliott What happens? The region looks like its on a regionserver its no longer on? The counters just don't change? Whats that mean? That our per-region metrics are going to be messy if regions move? Good on you.
          Hide
          Ted Yu added a comment -

          Log Hadoop JIRA to request this feature ?

          Show
          Ted Yu added a comment - Log Hadoop JIRA to request this feature ?
          Hide
          Elliott Clark added a comment -

          So with the way hadoop metrics MetricsRegistry works it's not really possible to remove a metric. So this won't do what's needed.

          Show
          Elliott Clark added a comment - So with the way hadoop metrics MetricsRegistry works it's not really possible to remove a metric. So this won't do what's needed.
          Hide
          Ted Yu added a comment -
          +  public void closeMetrics() {
          +    for (String m:metricsPut) {
          

          Please insert spaces around the colon above.

          +      RegionMetricsStorage.deleteTimeVaryingMetric(m);
          +    }
          +    this.metricsPut = new TreeSet<String>();
          

          Calling this.metricsPut.clear() should be enough.

          Show
          Ted Yu added a comment - + public void closeMetrics() { + for ( String m:metricsPut) { Please insert spaces around the colon above. + RegionMetricsStorage.deleteTimeVaryingMetric(m); + } + this .metricsPut = new TreeSet< String >(); Calling this.metricsPut.clear() should be enough.

            People

            • Assignee:
              Elliott Clark
              Reporter:
              Elliott Clark
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development