HBase / HBASE-10788

Add 99th percentile of latency in PE

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.99.0
    • Fix Version/s: 0.99.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In production environments, the 99th percentile of latency is more important than the average. The 99th percentile helps measure the influence of GC and of slow HDFS reads and writes.

      1. HBASE-10788-trunk-v3.diff (9 kB, Liu Shaohui)
      2. HBASE-10788-trunk-v2.diff (7 kB, Liu Shaohui)
      3. HBASE-10788-trunk-v1.diff (2 kB, Liu Shaohui)

          Activity

          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5047 (See https://builds.apache.org/job/HBase-TRUNK/5047/)
          HBASE-10788 Add 99th percentile of latency in PE (Liu Shaohui) (liangxie: rev 1582583)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
          Liang Xie added a comment -

          Integrated into trunk. Thanks, all, for the review, and thank you for the patch, Liu Shaohui.

          Nick Dimiduk added a comment -

          +1

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12637092/HBASE-10788-trunk-v3.diff
          against trunk revision .
          ATTACHMENT ID: 12637092

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 6 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestTableMapReduceBase.testMultiRegionTable(TestTableMapReduceBase.java:96)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9108//console

          This message is automatically generated.

          Liu Shaohui added a comment -

          Sample output

          2014-03-27 13:02:04,935 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest latency log (ms), on 1024 measures
          2014-03-27 13:02:04,935 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest Min    = 0.0
          2014-03-27 13:02:04,935 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest Avg    = 0.0341796875
          2014-03-27 13:02:04,936 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest StdDev = 0.2722120941128983
          2014-03-27 13:02:04,937 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest 50th   = 0.0
          2014-03-27 13:02:04,938 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest 95th   = 0.0
          2014-03-27 13:02:04,938 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest 99th   = 1.0
          2014-03-27 13:02:04,939 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest 99.9th = 6.850000000000136
          2014-03-27 13:02:04,939 INFO  [TestClient-2] hbase.PerformanceEvaluation: RandomWriteTest Max    = 7.0
          
          Liu Shaohui added a comment -

          Updated per Nick Dimiduk's review.

          Changes:
          a. Clean up some extra whitespace.
          b. Remove the redundant thread name from the log.
          c. Make the latency metric respect the sample rate, and support the sample rate for all tests.

          Liu Shaohui added a comment -

          Nick Dimiduk

          The thread name is not relevant in MR mode, and is redundant in your example output above.

          +      String metricName =
          +          testName + "-Client-" + Thread.currentThread().getName() + "-testRowTime";
          

          The thread name is added to the metric name to distinguish metrics from different threads in non-mapred mode. The thread names in the log are added automatically by setStatus, and I will remove one.
          Thanks for your review. I will update the patch later.

          Nick Dimiduk added a comment -

          nit: this patch has some extra ws. Please clean it up on commit.

          The thread name is not relevant in MR mode, and is redundant in your example output above.

          +      String metricName =
          +          testName + "-Client-" + Thread.currentThread().getName() + "-testRowTime";
          

          This will inaccurately count any results from tests that respect --sampleRate.

          +        long startTime = System.currentTimeMillis();
                   testRow(i);
          +        latency.update(System.currentTimeMillis() - startTime);
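
A minimal sketch of the sampling concern raised above, assuming that a sample rate means only every N-th row is actually executed. The class `SampledLatencyRecorder`, its `maybeRun` method, and the every-N-th rule are hypothetical and are not taken from the patch or from PerformanceEvaluation; the point is only that a latency is recorded when, and only when, the row operation really ran. It also uses `System.nanoTime()` rather than `System.currentTimeMillis()`, which is a substitution for finer resolution, not what the quoted diff does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical helper (not PerformanceEvaluation code): times only the rows that are sampled. */
final class SampledLatencyRecorder {
    private final int everyN;                                  // execute one row out of every N
    private final List<Long> latenciesNanos = new ArrayList<>();

    SampledLatencyRecorder(double sampleRate) {
        if (sampleRate <= 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in (0, 1]");
        }
        // Assumed sampling rule: a rate of 0.1 means every 10th row is executed.
        this.everyN = Math.max(1, (int) Math.round(1.0 / sampleRate));
    }

    /** Runs testRow for this row only if it is sampled, and records a latency only in that case. */
    boolean maybeRun(int row, IntConsumer testRow) {
        if (row % everyN != 0) {
            return false;                                      // skipped rows add no measurement
        }
        long start = System.nanoTime();
        testRow.accept(row);
        latenciesNanos.add(System.nanoTime() - start);
        return true;
    }

    int measurements() {
        return latenciesNanos.size();
    }

    public static void main(String[] args) {
        SampledLatencyRecorder recorder = new SampledLatencyRecorder(0.1);
        int executed = 0;
        for (int i = 0; i < 10_000; i++) {
            // The empty lambda stands in for the real per-row operation.
            if (recorder.maybeRun(i, row -> { })) {
                executed++;
            }
        }
        System.out.println("executed=" + executed + ", measurements=" + recorder.measurements());
    }
}
```

Keeping the timing inside the same branch as the operation means the number of measurements always equals the number of rows executed, whatever the sample rate.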
          
          Liang Xie added a comment -

          +1. will commit tomorrow if no objection

          stack added a comment -

          Patch lgtm (This is excellent)

          Liang Xie added a comment -

          emmm... Liu Shaohui you forgot to click "Submit Patch", so it's not on the radar

          Liu Shaohui added a comment -

          Would someone have time to review this patch? Thanks.

          Liu Shaohui added a comment -

          Add a report of min, avg, 95th, 99th, 99.9th, and max latency for each test.

          Sample output

          2014-03-20 13:59:27,924 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest latency log (ms), on 10240 measures
          2014-03-20 13:59:27,924 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest Min    = 0.0
          2014-03-20 13:59:27,924 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest Avg    = 0.11376953125
          2014-03-20 13:59:27,924 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest StdDev = 4.195329193206241
          2014-03-20 13:59:27,924 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest 50th   = 0.0
          2014-03-20 13:59:27,925 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest 95th   = 0.0
          2014-03-20 13:59:27,925 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest 99th   = 0.7100000000000364
          2014-03-20 13:59:27,925 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest 99.9th = 45.78200000000015
          2014-03-20 13:59:27,925 INFO  [TestClient-2] hbase.PerformanceEvaluation: client-TestClient-2 RandomWriteTest Max    = 289.0
          

          Nick Dimiduk, Lars Hofhansl, Andrew Purtell
          Please help to review this patch. Thx
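
For readers unfamiliar with how such a report can be derived, here is a rough sketch of computing percentiles over a sorted array of latencies. Fractional values in the sample output (for example 0.7100000000000364 for the 99th percentile) suggest some form of interpolation between neighboring samples, but the exact rule used by the patch's metrics library is not shown in this thread. The closest-rank linear interpolation below, and the names `LatencySummary`, `percentile`, and `mean`, are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical summary helper; the interpolation rule is an assumption, not the patch's method. */
final class LatencySummary {

    /** Returns the p-th percentile (p in [0, 100]) of an ascending-sorted array. */
    static double percentile(double[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p < 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in [0, 100]");
        }
        // Fractional position of the percentile within the sorted samples.
        double pos = (p / 100.0) * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double fraction = pos - lower;
        // Interpolate linearly between the two neighboring samples.
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Made-up latencies: mostly fast operations plus a few slow outliers.
        double[] latenciesMs = new double[1000];
        Arrays.fill(latenciesMs, 1.0);
        for (int i = 990; i < 1000; i++) {
            latenciesMs[i] = 200.0;
        }
        Arrays.sort(latenciesMs);

        System.out.println("Min    = " + latenciesMs[0]);
        System.out.println("Avg    = " + mean(latenciesMs));
        System.out.println("50th   = " + percentile(latenciesMs, 50.0));
        System.out.println("95th   = " + percentile(latenciesMs, 95.0));
        System.out.println("99th   = " + percentile(latenciesMs, 99.0));
        System.out.println("99.9th = " + percentile(latenciesMs, 99.9));
        System.out.println("Max    = " + latenciesMs[latenciesMs.length - 1]);
    }
}
```

With a distribution like the made-up one in `main`, the average stays close to the fast case while the high percentiles and the max expose the slow outliers, which is the motivation given in the issue description.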

          Andrew Purtell added a comment -

          Usually I find min, avg, 95th, 99th, and 99.9th percentiles, and max useful.

          Certainly average, max, and 95th are useful information in addition to higher percentiles, +1

          Nick Dimiduk added a comment -

          Could be your use of a real metrics library is the right way to go. My version allocates arrays of doubles, which can become expensive. I'd also like to add a mixed-workload test, in which case it'll be good to isolate read from write metrics, etc. Maybe your use of the yammer metrics library will support this, and also help minimize memory footprint while maintaining statistical significance of the results. If you're adding a new dependency, be sure to include the jar in the mapreduce job.

          Good on you Liu Shaohui.
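
One common way to keep memory bounded while still estimating percentiles is reservoir sampling, which retains a fixed-size uniform random sample of all observed values instead of storing every measurement. The sketch below is a generic illustration of that idea (Algorithm R); the class `LatencyReservoir` and its methods are hypothetical, and this thread does not state whether or how the yammer metrics library does this internally.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical fixed-size reservoir of latencies; an illustration, not the metrics library's code. */
final class LatencyReservoir {
    private final long[] samples;
    private final Random random;
    private long seen;                        // total number of values observed so far

    LatencyReservoir(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.samples = new long[capacity];
        this.random = new Random(seed);
    }

    /** Records one latency; memory use stays fixed at the reservoir capacity. */
    void update(long latencyMs) {
        seen++;
        if (seen <= samples.length) {
            // Fill phase: keep every value until the reservoir is full.
            samples[(int) (seen - 1)] = latencyMs;
        } else {
            // Replacement phase: the new value survives with probability capacity / seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < samples.length) {
                samples[(int) slot] = latencyMs;
            }
        }
    }

    int size() {
        return (int) Math.min(seen, samples.length);
    }

    long seen() {
        return seen;
    }

    /** Returns a sorted copy of the retained samples, ready for percentile lookups. */
    long[] snapshot() {
        long[] copy = Arrays.copyOf(samples, size());
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        LatencyReservoir reservoir = new LatencyReservoir(1024, 1L);
        for (long i = 0; i < 1_000_000; i++) {
            reservoir.update(i % 50);         // made-up latencies
        }
        System.out.println("observed=" + reservoir.seen() + ", retained=" + reservoir.size());
    }
}
```

The trade-off is that percentiles computed from the retained sample are estimates, so very high percentiles become less reliable as the ratio of observed to retained values grows.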

          Liu Shaohui added a comment -

          Nick Dimiduk
          Yes, I would like to add percentiles to the base class, Test, so that all tests report all percentiles.
          Sorry for not noticing the percentiles code in the randomRead test. I will redo the patch based on HBASE-10007.

          Liang Xie added a comment -

          FYI, Nick has already done good latency work inside PE; see HBASE-10007. So is your plan to extend it to more operations?

          Nick Dimiduk added a comment -

          PerfEval already has percentiles, at least for the randomRead test. Would you like to add support to the other tests?

          https://github.com/apache/hbase/blob/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java#L785-L795

                  LOG.info("randomRead latency log (ms), on " + times.length + " measures");
                  LOG.info("99.9999% = " + ds.getPercentile(99.9999d));
                  LOG.info(" 99.999% = " + ds.getPercentile(99.999d));
                  LOG.info("  99.99% = " + ds.getPercentile(99.99d));
                  LOG.info("   99.9% = " + ds.getPercentile(99.9d));
                  LOG.info("     99% = " + ds.getPercentile(99d));
                  LOG.info("     95% = " + ds.getPercentile(95d));
                  LOG.info("     90% = " + ds.getPercentile(90d));
                  LOG.info("     80% = " + ds.getPercentile(80d));
                  LOG.info("Standard Deviation = " + ds.getStandardDeviation());
                  LOG.info("Mean = " + ds.getMean());
          
          Liu Shaohui added a comment -

          Patch for trunk

          Lars Hofhansl added a comment -

          Usually I find min, avg, 95th, 99th, and 99.9th percentiles, and max useful.


            People

            • Assignee: Liu Shaohui
            • Reporter: Liu Shaohui
            • Votes: 0
            • Watchers: 10
