[HBASE-15619] Performance regression observed: Empty random read(get) performance of branch-1 worse than 0.98 - ASF JIRA

As the title says, I observed a perf regression during the final stress testing before upgrading our online cluster to 1.x. Details follow:

1. HBase version in the comparison test:

0.98: based on 0.98.12 with some backports, among which HBASE-11297 is the most important perf-related one (especially under high stress); referred to below as 0.98.12+
1.x: checked 3 releases in total
1) 1.1.2 with important perf fixes/improvements including HBASE-15031 and HBASE-14465; referred to below as 1.1.2+
2) 1.1.4 release
3) 1.2.1RC1

2. Test environment

YCSB: 0.7.0 with YCSB-651 applied
Client: 4 physical nodes, each with 8 YCSB instances, each instance with 100 threads
Server: 1 Master with 3 RS, each RS with 256 handlers and 64G heap
Hardware: 64-core CPU, 256GB Mem, 10Gb Net, 1 PCIe-SSD and 11 HDD, same hardware for client and server
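
The client load implied by this setup can be worked out from the numbers above. A minimal sketch in Python; the variable names are illustrative, and it assumes every YCSB thread is active at once and that requests spread evenly over the three RS:

```python
# Figures copied from the test-environment description; names are illustrative.
CLIENT_NODES = 4
YCSB_INSTANCES_PER_NODE = 8
THREADS_PER_INSTANCE = 100
REGION_SERVERS = 3
HANDLERS_PER_RS = 256

# Total concurrent client threads across all YCSB instances.
client_threads = CLIENT_NODES * YCSB_INSTANCES_PER_NODE * THREADS_PER_INSTANCE

# Total RPC handler threads across all region servers.
total_handlers = REGION_SERVERS * HANDLERS_PER_RS

# Assumption: load is spread evenly over the three RS.
threads_per_handler = client_threads / total_handlers

print(f"client threads: {client_threads}")
print(f"RS handlers:    {total_handlers}")
print(f"client threads per handler: {threads_per_handler:.2f}")
```

That is 3200 client threads against 768 handlers, so roughly four client threads per handler, which is likely enough to keep the handler pools busy for the whole run.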

3. Test cases

4. Test result

1.1.4 and 1.2.1 perform similarly to 1.1.2+ (less than 2% deviation), so I will only paste the comparison data for 0.98.12+ and 1.1.2+.
Per-RS throughput (ops/s):

| HBaseVersion | case#1 | ~~case#2~~ | ~~case#3~~ |
|---|---|---|---|
| 0.98.12+ | 383562 | ~~257493~~ | ~~47594~~ |
| 1.1.2+ | 363050 | ~~232757~~ | ~~35872~~ |

Average latency (us):

| HBaseVersion | case#1 | ~~case#2~~ | ~~case#3~~ |
|---|---|---|---|
| 0.98.12+ | 2774 | ~~4134~~ | ~~22371~~ |
| 1.1.2+ | 2930 | ~~4572~~ | ~~29690~~ |
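
To put the case#1 numbers in relative terms, here is a small Python sketch that computes the percentage change and cross-checks throughput against latency with Little's law (requests in flight = throughput x latency). It is only a sanity check under stated assumptions: the three RS are taken to carry equal load, the reported latency is treated as the end-to-end average seen by the YCSB clients, and the helper names are made up for illustration.

```python
REGION_SERVERS = 3  # from the test environment

# case#1 figures from the tables: (per-RS ops/s, average latency in us)
results = {
    "0.98.12+": (383562, 2774),
    "1.1.2+": (363050, 2930),
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

def in_flight(per_rs_ops, latency_us, region_servers=REGION_SERVERS):
    """Little's law: concurrency = cluster throughput * average latency (s).

    Assumes all region servers carry the same load.
    """
    return per_rs_ops * region_servers * latency_us / 1_000_000

base_ops, base_lat = results["0.98.12+"]
new_ops, new_lat = results["1.1.2+"]

throughput_change = pct_change(base_ops, new_ops)
latency_change = pct_change(base_lat, new_lat)

print(f"throughput change: {throughput_change:+.2f}%")
print(f"latency change:    {latency_change:+.2f}%")
for version, (ops, lat) in results.items():
    print(f"{version}: ~{in_flight(ops, lat):.0f} requests in flight")
```

Under these assumptions case#1 shows roughly a 5.3% throughput drop and a 5.6% latency increase. Both versions come out near 3190 requests in flight, close to the 3200 configured client threads (4 nodes x 8 instances x 100 threads), which suggests the throughput and latency tables are consistent with each other.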

The regression appears to be on the server side, in RPCServer: we ran a 0.98 client against a 1.x server and observed perf similar to that of a 1.x client.

is superseded by

HBASE-15971 Regression: Random Read/WorkloadC slower in 1.x than 0.98

relates to

HDFS-10690 Optimize insertion/removal of replica in ShortCircuitCache
