Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.1
    • Fix Version/s: 0.98.0, 0.96.0
    • Component/s: UI
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      hbase.regionserver.metrics.period can now be used to configure how often the RS metrics are computed. Defaults to 5, was 15 before this patch.

      Description

      I tried 0.95.1 RC1 in standalone, and the requests count in both the master and RS web UIs are wrong. I haven't dug too much in but it seems too low when I'm sending load, and it takes >10 seconds to clear up when the cluster becomes completely idle.

      1. RPS-TEST-2.log
        5 kB
        James Kinley
      2. RPS-TEST-1.log
        5 kB
        James Kinley
      3. HBASE-8711-1.patch
        4 kB
        James Kinley

        Issue Links

          Activity

          Jean-Daniel Cryans created issue -
          stack made changes -
          Field Original Value New Value
          Assignee Elliott Clark [ eclark ]
          Priority Major [ 3 ] Critical [ 2 ]
          Elliott Clark made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          Elliott Clark made changes -
          Component/s UI [ 12319669 ]
          Hide
          James Kinley added a comment -

          Hey Elliott. Are you working on this at the moment? If not, I was going to take a look if thats ok with you?

          Show
          James Kinley added a comment - Hey Elliott. Are you working on this at the moment? If not, I was going to take a look if thats ok with you?
          Hide
          Elliott Clark added a comment -

          That would be great. Unfortunately this has fallen off my plate lately.

          Show
          Elliott Clark added a comment - That would be great. Unfortunately this has fallen off my plate lately.
          Elliott Clark made changes -
          Assignee Elliott Clark [ eclark ] James Kinley [ jrkinley ]
          stack made changes -
          Fix Version/s 0.96.0 [ 12324822 ]
          Fix Version/s 0.95.2 [ 12320040 ]
          Hide
          James Kinley added a comment -

          There are a couple of things going on here. The first issue is that the point in time metrics for a region server are only computed every 15 seconds and therefore requestsPerSecond is actually an average for the previous interval:

          requestsPerSecond = (numberRequestsSinceLastRun / secondsSinceLastRun)
          

          The client write buffer also confuses things because the writes are delayed reaching the server, then the server can take up to 15 seconds to include them in the metrics, and another 15 seconds to clear them down.

          The second issue is that the requestCount counter used to compute requestsPerSecond is incremented for every region server operation (i.e. including requests for region info and not just for mutations) and is therefore higher than expected.

          I've attached the results of two tests to show whats happening. The same client was used in both tests to generate 100 req/sec for 60 seconds (6000 writes):

          In the first test (RPS-TEST-1.log) the client write buffer was enabled and you can see that it took 52 seconds before the writes appeared in the metrics, and because they were all sent at once requestsPerSecond was skewed at 09:07:28 (requestsPerSecond: 602.7862951606452). It took a further 30 seconds before requestsPerSecond returned to 0. Also, writeRequestsCount was only incremented once for each flush, and the individual writes were not counted (tempWriteRequestsCount = 4).

          In the second test (RPS-TEST-2.log) the client write buffer was disabled and you can see that the writes started to appear in the metrics after the next metric computation was run (15 seconds), and requestsPerSecond hovered at ~170 for the duration of the test. The writes were also counted individually (tempWriteRequestsCount = 6004).

          Both tests show an increase of over 12000 requests for 6000 writes.

          Computing requestsPerSecond from the requestCount counter makes sense, but the interval should at least be configurable and be lower by default (maybe 5 seconds).

          Show
          James Kinley added a comment - There are a couple of things going on here. The first issue is that the point in time metrics for a region server are only computed every 15 seconds and therefore requestsPerSecond is actually an average for the previous interval: requestsPerSecond = (numberRequestsSinceLastRun / secondsSinceLastRun) The client write buffer also confuses things because the writes are delayed reaching the server, then the server can take up to 15 seconds to include them in the metrics, and another 15 seconds to clear them down. The second issue is that the requestCount counter used to compute requestsPerSecond is incremented for every region server operation (i.e. including requests for region info and not just for mutations) and is therefore higher than expected. I've attached the results of two tests to show whats happening. The same client was used in both tests to generate 100 req/sec for 60 seconds (6000 writes): In the first test (RPS-TEST-1.log) the client write buffer was enabled and you can see that it took 52 seconds before the writes appeared in the metrics, and because they were all sent at once requestsPerSecond was skewed at 09:07:28 (requestsPerSecond: 602.7862951606452). It took a further 30 seconds before requestsPerSecond returned to 0. Also, writeRequestsCount was only incremented once for each flush, and the individual writes were not counted (tempWriteRequestsCount = 4). In the second test (RPS-TEST-2.log) the client write buffer was disabled and you can see that the writes started to appear in the metrics after the next metric computation was run (15 seconds), and requestsPerSecond hovered at ~170 for the duration of the test. The writes were also counted individually (tempWriteRequestsCount = 6004). Both tests show an increase of over 12000 requests for 6000 writes. Computing requestsPerSecond from the requestCount counter makes sense, but the interval should at least be configurable and be lower by default (maybe 5 seconds).
          James Kinley made changes -
          Attachment RPS-TEST-2.log [ 12604997 ]
          Attachment RPS-TEST-1.log [ 12604998 ]
          Hide
          James Kinley added a comment -

          Attached patch to make regionserver metric period configurable, and to lower the default period to 5 seconds.

          Show
          James Kinley added a comment - Attached patch to make regionserver metric period configurable, and to lower the default period to 5 seconds.
          James Kinley made changes -
          Attachment HBASE-8711-1.patch [ 12605026 ]
          James Kinley made changes -
          Link This issue is related to HBASE-9656 [ HBASE-9656 ]
          Hide
          Elliott Clark added a comment -

          +1 lgtm

          Show
          Elliott Clark added a comment - +1 lgtm
          Hide
          Jean-Daniel Cryans added a comment -

          Thanks James, committed to branch and trunk.

          Show
          Jean-Daniel Cryans added a comment - Thanks James, committed to branch and trunk.
          Jean-Daniel Cryans made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 0.98.0 [ 12323143 ]
          Resolution Fixed [ 1 ]
          Jean-Daniel Cryans made changes -
          Release Note hbase.regionserver.metrics.period can now be used to configure how often the RS metrics are computed. Defaults to 5, was 15 before this patch.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96 #100 (See https://builds.apache.org/job/hbase-0.96/100/)
          HBASE-8711 Requests count is completely off (jdcryans: rev 1526753)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in hbase-0.96 #100 (See https://builds.apache.org/job/hbase-0.96/100/ ) HBASE-8711 Requests count is completely off (jdcryans: rev 1526753) /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4567 (See https://builds.apache.org/job/HBase-TRUNK/4567/)
          HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-TRUNK #4567 (See https://builds.apache.org/job/HBase-TRUNK/4567/ ) HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754) /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in hbase-0.96-hadoop2 #61 (See https://builds.apache.org/job/hbase-0.96-hadoop2/61/)
          HBASE-8711 Requests count is completely off (jdcryans: rev 1526753)

          • /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - SUCCESS: Integrated in hbase-0.96-hadoop2 #61 (See https://builds.apache.org/job/hbase-0.96-hadoop2/61/ ) HBASE-8711 Requests count is completely off (jdcryans: rev 1526753) /hbase/branches/0.96/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #765 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/765/)
          HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754)

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #765 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/765/ ) HBASE-8711 Requests count is completely off (James Kinley via JD) (jdcryans: rev 1526754) /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionServerWrapperImpl.java
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Resolved Resolved
          111d 1h 46m 1 Jean-Daniel Cryans 27/Sep/13 01:04

            People

            • Assignee:
              James Kinley
              Reporter:
              Jean-Daniel Cryans
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development