Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Done
Attachments
- flamegraph.svg (144 kB, Duo Zhang)
Activity
Is there a test of the async client alone, w/o going via the sync client, or would hooking up the async client to YCSB result in effectively the same numbers as the sync client?
OK, autoFlush is false by default, so we are actually testing the performance of BufferedMutator...
Let me test single read/write first; there should not be much difference.
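For context, here is a minimal sketch of the two write paths PE exercises (table and column names are made up): with autoFlush=true each row becomes a Table.put() call, while the default autoFlush=false routes writes through BufferedMutator.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WritePathsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      TableName tn = TableName.valueOf("TestTable");
      Put put = new Put(Bytes.toBytes("row1"))
          .addColumn(Bytes.toBytes("info"), Bytes.toBytes("q"), Bytes.toBytes("v"));

      // autoFlush=true: one RPC per put.
      try (Table table = conn.getTable(tn)) {
        table.put(put);
      }

      // autoFlush=false (the default): puts are buffered client-side
      // and flushed in batches.
      try (BufferedMutator mutator = conn.getBufferedMutator(tn)) {
        mutator.mutate(put);
        mutator.flush();
      }
    }
  }
}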
Changed the command to
./bin/hbase pe --nomapred --presplit=10 --autoFlush=true --rows=1000000 randomWrite 10 &>test_write.log < /dev/null
The result for master
2019-06-12 09:48:52,720 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [446683, 451587, 454201, 452055, 450678, 454432, 447952, 451198, 453086, 451225]
2019-06-12 09:48:53,056 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 446683ms Max: 454432ms Avg: 451309ms
2019-06-12 09:48:53,056 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 454
2019-06-12 09:48:53,056 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 22158 row per second
The result for HBASE-21512
2019-06-12 09:58:37,060 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [448824, 453081, 458509, 455361, 456904, 457235, 456791, 455159, 455435, 451813]
2019-06-12 09:58:37,393 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 448824ms Max: 458509ms Avg: 454911ms
2019-06-12 09:58:37,393 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 452
2019-06-12 09:58:37,393 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 21983 row per second
Not much difference.
Let me check for read and scan also.
And I think there is something wrong with the BufferedMutator implementation, or maybe the AsyncBufferedMutator itself has some performance issues that need to be addressed. Will dig more later.
Read
master
2019-06-12 10:50:04,215 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [215237, 211224, 208305, 211377, 211420, 212057, 209782, 209410, 212985, 211331]
2019-06-12 10:50:04,479 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest duration ] Min: 208305ms Max: 215237ms Avg: 211312ms
2019-06-12 10:50:04,479 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 210
2019-06-12 10:50:04,479 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 47327 row per second
HBASE-21512
2019-06-13 11:44:58,937 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [211625, 213264, 212873, 213147, 213560, 212547, 212470, 211604, 212792, 213180]
2019-06-13 11:44:59,220 INFO [main] hbase.PerformanceEvaluation: [RandomReadTest duration ] Min: 211604ms Max: 213560ms Avg: 212706ms
2019-06-13 11:44:59,220 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 212
2019-06-13 11:44:59,220 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 47014 row per second
No big difference.
And for scan, the new implementation is way better.
./bin/hbase pe --nomapred --rows=1000000 scan 10 &>test_scan.log < /dev/null &
master
2019-06-13 13:58:27,159 INFO [main] hbase.PerformanceEvaluation: [ScanTest] Summary of timings (ms): [20897, 20982, 20569, 21168, 20766, 20826, 20231, 21049, 21041, 14057]
2019-06-13 13:58:27,260 INFO [main] hbase.PerformanceEvaluation: [ScanTest duration ] Min: 14057ms Max: 21168ms Avg: 20158ms
2019-06-13 13:58:27,261 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 20
2019-06-13 13:58:27,261 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 503147 row per second
HBASE-21512
2019-06-13 14:00:31,062 INFO [main] hbase.PerformanceEvaluation: [ScanTest] Summary of timings (ms): [15918, 15750, 15383, 15708, 15540, 15530, 15424, 15709, 15688, 10542]
2019-06-13 14:00:31,145 INFO [main] hbase.PerformanceEvaluation: [ScanTest duration ] Min: 10542ms Max: 15918ms Avg: 15119ms
2019-06-13 14:00:31,145 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 15
2019-06-13 14:00:31,145 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 670818 row per second
About 33% faster.
And for scanRange10
./bin/hbase pe --nomapred --rows=1000000 scanRange10 10 &>test_scan10.log < /dev/null &
master
2019-06-13 14:15:39,578 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange10Test] Summary of timings (ms): [431574, 431034, 426640, 436114, 429464, 431428, 434035, 428176, 430324, 431243]
2019-06-13 14:15:39,910 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange10Test duration ] Min: 426640ms Max: 436114ms Avg: 431003ms
2019-06-13 14:15:39,910 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 428
2019-06-13 14:15:39,910 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 23203 row per second
HBASE-21512
2019-06-13 14:07:42,016 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange10Test] Summary of timings (ms): [364311, 365391, 365243, 367459, 365778, 366567, 364024, 363931, 363785, 366916]
2019-06-13 14:07:42,350 INFO [main] hbase.PerformanceEvaluation: [RandomScanWithRange10Test duration ] Min: 363785ms Max: 367459ms Avg: 365340ms
2019-06-13 14:07:42,350 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 362
2019-06-13 14:07:42,350 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 27372 row per second
About 18% faster.
I think the reason is that the scanner now does async prefetch by default. I've also tested the old async prefetch scanner, but the result is worse...
./bin/hbase pe --nomapred --rows=1000000 --asyncPrefetch scan 10 &>test_scan_async.log < /dev/null &
2019-06-13 14:27:54,801 INFO [main] hbase.PerformanceEvaluation: [ScanTest] Summary of timings (ms): [21498, 21307, 20792, 21525, 21006, 21569, 21109, 20920, 21774, 14307]
2019-06-13 14:27:54,858 INFO [main] hbase.PerformanceEvaluation: [ScanTest duration ] Min: 14307ms Max: 21774ms Avg: 20580ms
2019-06-13 14:27:54,858 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 20
2019-06-13 14:27:54,858 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 492967 row per second
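For reference, the old prefetch scanner is opt-in on the Scan object, while the HBASE-21512 scanner prefetches in the background without any flag. A minimal sketch of enabling the old one (table name is made up):

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class PrefetchScanSketch {
  // Count rows using the old client-side async prefetch scanner.
  static long countRows(Connection conn) throws Exception {
    Scan scan = new Scan().setCaching(1000).setAsyncPrefetch(true);
    long rows = 0;
    try (Table table = conn.getTable(TableName.valueOf("TestTable"));
        ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        rows++;
      }
    }
    return rows;
  }
}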
Now, let me focus on the performance of the BufferedMutator.
OK, tried multi put
./bin/hbase pe --nomapred --presplit=10 --autoFlush=true --multiPut=100 --rows=1000000 randomWrite 10 &>test_write.log < /dev/null &
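(--multiPut=100 makes each PE client group 100 puts into a single batched call, i.e. the batch write path; roughly the pattern below, a sketch rather than PE's actual code, with made-up table and column names.)

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiPutSketch {
  static void writeBatch(Connection conn) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("TestTable"))) {
      List<Put> batch = new ArrayList<>(100);
      for (int i = 0; i < 100; i++) {
        batch.add(new Put(Bytes.toBytes("row-" + i))
            .addColumn(Bytes.toBytes("info"), Bytes.toBytes("q"), Bytes.toBytes("v")));
      }
      // One batched call instead of 100 single puts.
      table.put(batch);
    }
  }
}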
master
2019-06-13 15:35:07,037 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [39576, 38874, 39181, 38928, 39234, 39165, 39423, 39356, 39240, 39249]
2019-06-13 15:35:07,047 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 38874ms Max: 39576ms Avg: 39222ms
2019-06-13 15:35:07,047 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 3472
2019-06-13 15:35:07,047 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 254962 row per second
HBASE-21512
2019-06-13 15:36:47,018 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [37016, 36973, 37040, 37176, 37189, 36650, 36994, 37319, 37240, 36923]
2019-06-13 15:36:47,028 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 36650ms Max: 37319ms Avg: 37052ms
2019-06-13 15:36:47,028 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 3320
2019-06-13 15:36:47,028 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 269897 row per second
The new implementation is a bit better here, which means the new implementation of the batch call is not worse than the old AsyncProcess. So I think the problem is only in the new BufferedMutator or the AsyncBufferedMutator implementation. Let me dig...
See the attached flamegraph: I think the problem is BufferedMutatorOverAsyncBufferedMutator.tryCompleteFuture. The implementation is a bit stupid: if there are no completed futures, we copy the whole array again and again.
Let me think how to deal with this...
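Roughly, the shape of the problem and one possible fix, as a sketch rather than the actual HBase code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// The mutator tracks the futures of in-flight mutations and periodically
// drops the completed ones so flush() only waits on what is still pending.
class PendingFutures {
  private List<CompletableFuture<Void>> futures = new ArrayList<>();

  void add(CompletableFuture<Void> f) {
    futures.add(f);
  }

  // Naive version (the shape of the reported problem): rebuilds the list
  // on every call, even when nothing has completed, so a hot write path
  // copies the array over and over for no benefit.
  void tryCompleteNaive() {
    List<CompletableFuture<Void>> remaining = new ArrayList<>(futures.size());
    for (CompletableFuture<Void> f : futures) {
      if (!f.isDone()) {
        remaining.add(f);
      }
    }
    futures = remaining;
  }

  // A cheap guard: scan first and only touch the list when at least one
  // future has actually completed.
  void tryCompleteGuarded() {
    boolean anyDone = false;
    for (CompletableFuture<Void> f : futures) {
      if (f.isDone()) {
        anyDone = true;
        break;
      }
    }
    if (anyDone) {
      futures.removeIf(CompletableFuture::isDone);
    }
  }
}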
With the patch for HBASE-22577 in place, we can get the same performance.
./bin/hbase pe --presplit=10 --nomapred --rows=2000000 randomWrite 10 &>test.log </dev/null &
Tested several times and took the best result for each branch.
master
2019-06-13 22:34:58,623 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [53784, 53411, 52768, 51903, 53248, 53368, 53204, 53318, 53443, 53425]
2019-06-13 22:34:58,717 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 51903ms Max: 53784ms Avg: 53187ms
2019-06-13 22:34:58,717 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 26
2019-06-13 22:34:58,717 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 376063 row per second
HBASE-21512
2019-06-13 22:18:11,039 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [48765, 50178, 50147, 50072, 49698, 49768, 50099, 49815, 50084, 50164]
2019-06-13 22:18:11,147 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 48765ms Max: 50178ms Avg: 49879ms
2019-06-13 22:18:11,147 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 24
2019-06-13 22:18:11,147 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 400998 row per second
We can even do better.
So finally let me also do a YCSB test to finish this.
Ran YCSB workloada, after applying the patch at https://github.com/brianfrankcooper/YCSB/pull/1315...
Created a test table with 50 splits, recordcount=10000000
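For reference, workloada is YCSB's stock 50/50 read/update workload; the effective settings here look roughly like this (operationcount is inferred from the ~10M operations in the results below):

workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
recordcount=10000000
operationcount=10000000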
./bin/ycsb run hbase20 -P workloads/workloada -cp conf/ -s -threads 10 -p columnfamily=cf &>ycsb.log </dev/null &
master
[OVERALL], RunTime(ms), 374253
[OVERALL], Throughput(ops/sec), 26719.892692910944
[TOTAL_GCS_PS_Scavenge], Count, 677
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1974
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.5274506817580621
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 37
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.00988636029637705
[TOTAL_GCs], Count, 678
[TOTAL_GC_TIME], Time(ms), 2011
[TOTAL_GC_TIME_%], Time(%), 0.5373370420544391
[READ], Operations, 4999490
[READ], AverageLatency(us), 354.65162666591993
[READ], MinLatency(us), 155
[READ], MaxLatency(us), 172287
[READ], 95thPercentileLatency(us), 574
[READ], 99thPercentileLatency(us), 713
[READ], Return=OK, 4999490
[CLEANUP], Operations, 20
[CLEANUP], AverageLatency(us), 97.85
[CLEANUP], MinLatency(us), 1
[CLEANUP], MaxLatency(us), 1468
[CLEANUP], 95thPercentileLatency(us), 375
[CLEANUP], 99thPercentileLatency(us), 1468
[UPDATE], Operations, 5000510
[UPDATE], AverageLatency(us), 386.8829985341495
[UPDATE], MinLatency(us), 231
[UPDATE], MaxLatency(us), 168831
[UPDATE], 95thPercentileLatency(us), 663
[UPDATE], 99thPercentileLatency(us), 1057
[UPDATE], Return=OK, 5000510
HBASE-21512
[OVERALL], RunTime(ms), 374329
[OVERALL], Throughput(ops/sec), 26714.4677543017
[TOTAL_GCS_PS_Scavenge], Count, 1035
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 1566
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.41834856503236456
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 35
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.009350063714005593
[TOTAL_GCs], Count, 1036
[TOTAL_GC_TIME], Time(ms), 1601
[TOTAL_GC_TIME_%], Time(%), 0.4276986287463702
[READ], Operations, 4997338
[READ], AverageLatency(us), 354.2875168739837
[READ], MinLatency(us), 155
[READ], MaxLatency(us), 183039
[READ], 95thPercentileLatency(us), 581
[READ], 99thPercentileLatency(us), 723
[READ], Return=OK, 4997338
[CLEANUP], Operations, 20
[CLEANUP], AverageLatency(us), 323.8
[CLEANUP], MinLatency(us), 0
[CLEANUP], MaxLatency(us), 6007
[CLEANUP], 95thPercentileLatency(us), 277
[CLEANUP], 99thPercentileLatency(us), 6007
[UPDATE], Operations, 5002662
[UPDATE], AverageLatency(us), 387.05991630056155
[UPDATE], MinLatency(us), 232
[UPDATE], MaxLatency(us), 172799
[UPDATE], 95thPercentileLatency(us), 664
[UPDATE], 99thPercentileLatency(us), 1130
[UPDATE], Return=OK, 5002662
No difference.
Set up a 5-node HBase cluster with the master code, and tested it with PE using both the master client and the HBASE-21512 client. The numbers are not good; need to find out what the problem is. The command is
And the result for the master client
And the result for the HBASE-21512 client