[HDFS-15202] HDFS-client: boost ShortCircuit Cache - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.3.1, 3.4.0
Component/s: dfsclient
Labels: None
Environment:

4 nodes: Intel Xeon E5-2698 v4 @ 2.20 GHz, 700 GB RAM.

8 RegionServers (2 per host)

8 tables × 64 regions × 1.88 GB of data per region, roughly 900 GB in total

Random reads from 800 threads via YCSB, plus a small share of updates (10% of reads)

Description

I want to propose a way to improve HDFS-client read performance. The idea: create several ShortCircuitCache instances instead of one.

The key points:

1. Create an array of caches (its size, clientShortCircuitNum, is set by dfs.client.short.circuit.num; see the pull requests below):

private ClientContext(String name, DfsClientConf conf, Configuration config) {
  ...
  // One independent cache per slot; the slot count comes from dfs.client.short.circuit.num.
  shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
  for (int i = 0; i < this.clientShortCircuitNum; i++) {
    this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
  }
}
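A minimal, self-contained sketch of step 1, assuming each cache instance serializes access with its own lock. `LockedMap` (a hypothetical stand-in for ShortCircuitCache) and `CacheArray` are illustrative names, not classes from the patch; the point is only that N instances built by the same factory share no state, so threads working on different instances do not wait on each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for one cache instance: a map guarded by its own lock,
// so threads touching different instances never contend on the same lock.
final class LockedMap<V> {
  private final ReentrantLock lock = new ReentrantLock();
  private final Map<Long, V> map = new HashMap<>();

  V get(long key) {
    lock.lock();
    try {
      return map.get(key);
    } finally {
      lock.unlock();
    }
  }

  void put(long key, V value) {
    lock.lock();
    try {
      map.put(key, value);
    } finally {
      lock.unlock();
    }
  }
}

// Hypothetical holder mirroring the ClientContext change: `num` independent
// instances built by the same factory (the analogue of fromConf(scConf)).
final class CacheArray<C> {
  private final List<C> caches;

  CacheArray(int num, Supplier<C> factory) {
    if (num < 1) {
      throw new IllegalArgumentException("num must be >= 1, got " + num);
    }
    List<C> list = new ArrayList<>(num);
    for (int i = 0; i < num; i++) {
      list.add(factory.get()); // a fresh instance per slot
    }
    this.caches = Collections.unmodifiableList(list);
  }

  int size() {
    return caches.size();
  }

  C get(int i) {
    return caches.get(i);
  }
}
```

For example, `new CacheArray<LockedMap<String>>(3, LockedMap::new)` yields three independent maps: an entry put into cache 0 is not visible in cache 1.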

2. Then distribute blocks across the caches:

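  public ShortCircuitCache getShortCircuitCache(long idx) {
    // floorMod keeps the index in [0, clientShortCircuitNum) even for a negative idx,
    // where plain % would yield a negative (invalid) array index.
    return shortCircuitCache[(int) Math.floorMod(idx, (long) clientShortCircuitNum)];
  }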
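The slot selection in step 2 can be isolated into a small helper. `ShardIndex` is a hypothetical name used only for illustration; it relies on `Math.floorMod` rather than the `%` operator so that a negative block ID, if one ever occurs, still maps to a valid index in `[0, num)`.

```java
// Hypothetical helper: picks a cache slot from a block ID.
final class ShardIndex {
  private ShardIndex() {}

  // Math.floorMod always returns a value in [0, num) for num > 0.
  // Java's % keeps the sign of the dividend instead (e.g. -7 % 3 == -1),
  // which would be an invalid array index.
  static int of(long blockId, int num) {
    if (num < 1) {
      throw new IllegalArgumentException("num must be >= 1, got " + num);
    }
    return (int) Math.floorMod(blockId, (long) num);
  }
}
```

`ShardIndex.of(10L, 3)` returns 1 and `ShardIndex.of(-7L, 3)` returns 2, whereas `-7L % 3` evaluates to -1. Because the slot is a pure function of the block ID, the same block always maps to the same cache.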

3. At the call site, select the cache by block ID:

ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());

Block IDs are spread evenly (their last digit is evenly distributed from 0 to 9), so all caches fill up approximately equally.
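The balance argument can be checked with a short sketch, assuming block IDs are handed out roughly sequentially. `ShardBalance.histogram` is a hypothetical helper that counts how many consecutive IDs fall into each of `num` caches under the same modulo mapping.

```java
// Hypothetical check: counts how many of `count` consecutive block IDs,
// starting at `firstId`, land in each of `num` caches.
final class ShardBalance {
  private ShardBalance() {}

  static long[] histogram(long firstId, int count, int num) {
    if (num < 1 || count < 0) {
      throw new IllegalArgumentException("need num >= 1 and count >= 0");
    }
    long[] perCache = new long[num];
    for (int i = 0; i < count; i++) {
      long blockId = firstId + i;
      // Same slot rule as step 2, using floorMod for a non-negative index.
      perCache[(int) Math.floorMod(blockId, (long) num)]++;
    }
    return perCache;
  }
}
```

`ShardBalance.histogram(1_073_741_825L, 3000, 3)` returns `{1000, 1000, 1000}` (the starting ID is arbitrary); for any run of consecutive IDs the counts differ by at most one. A workload that touched only a strided subset of blocks (for example, every third ID with 3 caches) would land in a single cache, so the even split also depends on which blocks are actually read.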

This improves performance. The attachment below shows a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: performance grows by about 30%, while CPU usage rises by about 15%.

I hope this is of interest to someone.
I am happy to explain any non-obvious details.