Description
I want to propose a way to improve the read performance of the HDFS client. The idea: create several ShortCircuitCache instances instead of one.
The key points:
1. Create an array of caches (its size, clientShortCircuitNum, is set by dfs.client.short.circuit.num; see the pull requests below):
    private ClientContext(String name, DfsClientConf conf, Configuration config) {
        ...
        shortCircuitCache = new ShortCircuitCache[this.clientShortCircuitNum];
        for (int i = 0; i < this.clientShortCircuitNum; i++) {
            this.shortCircuitCache[i] = ShortCircuitCache.fromConf(scConf);
        }
        ...
    }
2. Then distribute blocks across the caches:
    public ShortCircuitCache getShortCircuitCache(long idx) {
        return shortCircuitCache[(int) (idx % clientShortCircuitNum)];
    }
3. And this is how to call it:
ShortCircuitCache cache = clientContext.getShortCircuitCache(block.getBlockId());
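Steps 1 to 3 can be sketched as a self-contained program that runs without Hadoop. `BlockCacheShard` and `ShardedCacheContext` are hypothetical stand-ins for `ShortCircuitCache` and `ClientContext`, a plain `Map<String, String>` replaces Hadoop's `Configuration`, and only the key name `dfs.client.short.circuit.num` is taken from the proposal; the default of 1 is an assumption. The sketch deviates from the snippets above in one place: it uses `Math.floorMod` rather than `%`, because Java's `%` keeps the sign of the dividend and would produce a negative array index for a negative ID. For non-negative IDs both select the same cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ShortCircuitCache: one independent map per instance,
// guarded by its own monitor.
final class BlockCacheShard<V> {
    private final Map<Long, V> entries = new HashMap<>();

    synchronized void put(long blockId, V value) { entries.put(blockId, value); }
    synchronized V get(long blockId) { return entries.get(blockId); }
    synchronized int size() { return entries.size(); }
}

// Hypothetical stand-in for the part of ClientContext that owns the caches.
final class ShardedCacheContext<V> {
    // Key name from the proposal; the default of 1 (a single cache) is an assumption.
    static final String NUM_KEY = "dfs.client.short.circuit.num";
    static final int NUM_DEFAULT = 1;

    private final BlockCacheShard<V>[] shards;

    // Step 1: build clientShortCircuitNum independent caches.
    @SuppressWarnings("unchecked")
    ShardedCacheContext(Map<String, String> conf) {
        int n = Integer.parseInt(conf.getOrDefault(NUM_KEY, Integer.toString(NUM_DEFAULT)));
        if (n < 1) {
            throw new IllegalArgumentException(NUM_KEY + " must be >= 1, got " + n);
        }
        shards = (BlockCacheShard<V>[]) new BlockCacheShard[n];
        for (int i = 0; i < n; i++) {
            shards[i] = new BlockCacheShard<>();
        }
    }

    int numShards() { return shards.length; }

    // Step 2: pick a cache from the block ID. Math.floorMod keeps the index
    // non-negative even for a negative ID, where plain % would not.
    BlockCacheShard<V> getCache(long blockId) {
        return shards[(int) Math.floorMod(blockId, (long) shards.length)];
    }

    public static void main(String[] args) {
        ShardedCacheContext<String> ctx = new ShardedCacheContext<>(Map.of(NUM_KEY, "3"));
        long blockId = 1_000_002L; // arbitrary example ID
        // Step 3: the call site resolves the cache from the block ID, then uses it.
        BlockCacheShard<String> cache = ctx.getCache(blockId);
        cache.put(blockId, "replica-info");
        System.out.println(ctx.getCache(blockId).get(blockId)); // prints replica-info
    }
}
```

The mapping is deterministic, so every lookup for a given block goes to the same cache. In this sketch each shard is guarded by its own monitor, so readers of blocks in different shards never wait on each other's lock, which is the usual motivation for this kind of lock striping.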
The block IDs are spread evenly (their last digit is distributed uniformly from 0 to 9), so idx % clientShortCircuitNum hits every cache about equally often, and all caches fill up at approximately the same rate.
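The balancing argument can be checked with a small simulation, assuming block IDs are handed out consecutively (stride 1). `ShardBalance` is a hypothetical helper, not part of HDFS. The same code also shows the limit of the argument: if IDs arrived with a fixed stride that shares a factor with the number of caches, `id % N` would leave some caches empty.

```java
import java.util.Arrays;

// Hypothetical check of the balancing argument: count how many of `count` IDs,
// starting at firstId and spaced `stride` apart, land in each of numShards caches.
final class ShardBalance {
    private ShardBalance() {}

    static int[] countPerShard(long firstId, int count, long stride, int numShards) {
        int[] counts = new int[numShards];
        for (int k = 0; k < count; k++) {
            long id = firstId + k * stride;
            counts[(int) Math.floorMod(id, (long) numShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3000 consecutive IDs from an arbitrary start, split over 3 caches.
        System.out.println(Arrays.toString(countPerShard(1_000_000L, 3000, 1L, 3)));
        // prints [1000, 1000, 1000]
        // Stride 4 over 4 caches: every ID lands in the same cache.
        System.out.println(Arrays.toString(countPerShard(0L, 100, 4L, 4)));
        // prints [100, 0, 0, 0]
    }
}
```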
This is good for performance. The attachment below shows a load test reading HDFS via HBase with clientShortCircuitNum = 1 vs. 3: performance grows by ~30%, while CPU usage increases by about 15%.
I hope this is of interest to someone.
I am ready to explain any non-obvious details.
Attachments
Issue Links
- is related to: HDFS-15409 Optimization Strategy for choosing ShortCircuitCache (Open)
- relates to: HBASE-23887 New L1 cache : AdaptiveLRU (Resolved)
- links to