Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.1
    • Fix Version/s: 0.98.0, 0.96.0
    • Component/s: UI
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      hbase.regionserver.metrics.period can now be used to configure how often the RS metrics are computed. Defaults to 5, was 15 before this patch.

      Description

      I tried 0.95.1 RC1 in standalone, and the requests count in both the master and RS web UIs are wrong. I haven't dug too much in but it seems too low when I'm sending load, and it takes >10 seconds to clear up when the cluster becomes completely idle.

      1. HBASE-8711-1.patch
        4 kB
        James Kinley
      2. RPS-TEST-1.log
        5 kB
        James Kinley
      3. RPS-TEST-2.log
        5 kB
        James Kinley

        Issue Links

          Activity

          Hide
          James Kinley added a comment -

          Hey Elliott. Are you working on this at the moment? If not, I was going to take a look if thats ok with you?

          Show
          James Kinley added a comment - Hey Elliott. Are you working on this at the moment? If not, I was going to take a look if thats ok with you?
          Hide
          Elliott Clark added a comment -

          That would be great. Unfortunately this has fallen off my plate lately.

          Show
          Elliott Clark added a comment - That would be great. Unfortunately this has fallen off my plate lately.
          Hide
          James Kinley added a comment -

          There are a couple of things going on here. The first issue is that the point in time metrics for a region server are only computed every 15 seconds and therefore requestsPerSecond is actually an average for the previous interval:

          requestsPerSecond = (numberRequestsSinceLastRun / secondsSinceLastRun)
          

          The client write buffer also confuses things because the writes are delayed reaching the server, then the server can take up to 15 seconds to include them in the metrics, and another 15 seconds to clear them down.

          The second issue is that the requestCount counter used to compute requestsPerSecond is incremented for every region server operation (i.e. including requests for region info and not just for mutations) and is therefore higher than expected.

          I've attached the results of two tests to show whats happening. The same client was used in both tests to generate 100 req/sec for 60 seconds (6000 writes):

          In the first test (RPS-TEST-1.log) the client write buffer was enabled and you can see that it took 52 seconds before the writes appeared in the metrics, and because they were all sent at once requestsPerSecond was skewed at 09:07:28 (requestsPerSecond: 602.7862951606452). It took a further 30 seconds before requestsPerSecond returned to 0. Also, writeRequestsCount was only incremented once for each flush, and the individual writes were not counted (tempWriteRequestsCount = 4).

          In the second test (RPS-TEST-2.log) the client write buffer was disabled and you can see that the writes started to appear in the metrics after the next metric computation was run (15 seconds), and requestsPerSecond hovered at ~170 for the duration of the test. The writes were also counted individually (tempWriteRequestsCount = 6004).

          Both tests show an increase of over 12000 requests for 6000 writes.

          Computing requestsPerSecond from the requestCount counter makes sense, but the interval should at least be configurable and be lower by default (maybe 5 seconds).

          Show
          James Kinley added a comment - There are a couple of things going on here. The first issue is that the point in time metrics for a region server are only computed every 15 seconds and therefore requestsPerSecond is actually an average for the previous interval: requestsPerSecond = (numberRequestsSinceLastRun / secondsSinceLastRun) The client write buffer also confuses things because the writes are delayed reaching the server, then the server can take up to 15 seconds to include them in the metrics, and another 15 seconds to clear them down. The second issue is that the requestCount counter used to compute requestsPerSecond is incremented for every region server operation (i.e. including requests for region info and not just for mutations) and is therefore higher than expected. I've attached the results of two tests to show whats happening. The same client was used in both tests to generate 100 req/sec for 60 seconds (6000 writes): In the first test (RPS-TEST-1.log) the client write buffer was enabled and you can see that it took 52 seconds before the writes appeared in the metrics, and because they were all sent at once requestsPerSecond was skewed at 09:07:28 (requestsPerSecond: 602.7862951606452). It took a further 30 seconds before requestsPerSecond returned to 0. Also, writeRequestsCount was only incremented once for each flush, and the individual writes were not counted (tempWriteRequestsCount = 4). In the second test (RPS-TEST-2.log) the client write buffer was disabled and you can see that the writes started to appear in the metrics after the next metric computation was run (15 seconds), and requestsPerSecond hovered at ~170 for the duration of the test. The writes were also counted individually (tempWriteRequestsCount = 6004). Both tests show an increase of over 12000 requests for 6000 writes. Computing requestsPerSecond from the requestCount counter makes sense, but the interval should at least be configurable and be lower by default (maybe 5 seconds).
          Hide
          James Kinley added a comment -

          Attached patch to make regionserver metric period configurable, and to lower the default period to 5 seconds.

          Show
          James Kinley added a comment - Attached patch to make regionserver metric period configurable, and to lower the default period to 5 seconds.
          Hide
          Elliott Clark added a comment -

          +1 lgtm

          Show
          Elliott Clark added a comment - +1 lgtm
          Hide
          Jean-Daniel Cryans added a comment -

          Thanks James, committed to branch and trunk.

          Show
          Jean-Daniel Cryans added a comment - Thanks James, committed to branch and trunk.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96 #100 (See https://builds.apache.org/job/hbase-0.96/100/)
          HBASE-8711 Requests count is completely off (jdcryans: rev 1526753)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in hbase-0.96 #100 (See https://builds.apache.org/job/hbase-0.96/100/ ) HBASE-8711 Requests count is completely off (jdcryans: rev 1526753) /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4567 (See https://builds.apache.org/job/HBase-TRUNK/4567/)
          HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-TRUNK #4567 (See https://builds.apache.org/job/HBase-TRUNK/4567/ ) HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754) /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96-hadoop2 #61 (See https://builds.apache.org/job/hbase-0.96-hadoop2/61/)
          HBASE-8711 Requests count is completely off (jdcryans: rev 1526753)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in hbase-0.96-hadoop2 #61 (See https://builds.apache.org/job/hbase-0.96-hadoop2/61/ ) HBASE-8711 Requests count is completely off (jdcryans: rev 1526753) /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #765 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/765/)
          HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #765 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/765/ ) HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754) /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java

            People

            • Assignee:
              James Kinley
              Reporter:
              Jean-Daniel Cryans
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development