While stress testing Short-Circuit Local Reads, we found a bottleneck: allocating a new DfsClientShm blocks many slot allocations that are waiting on it.
Currently, there are 128 slots per shm, which means up to 128 reads can be blocked by a single shm allocation. This is especially bad under stress: domain socket communication with the datanode slows down, and the datanode may also pause for GC. Together these can make allocating one shm take hundreds of milliseconds, and every read waiting on that allocation is delayed by the same amount. This hurts latency-sensitive services such as HBase.
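To make the cost concrete, here is a toy model in Python. It is hypothetical and is not the DfsClientShm code. It assumes slots are never released during the run, at most one shm allocation is in flight at a time, and a read that finds no free slot waits until that allocation finishes. The name `slot_wait_times`, its parameters, and the 300 ms latency in the usage are illustrative only, chosen to stand in for "hundreds of milliseconds".

```python
def slot_wait_times(arrivals_ms, alloc_latency_ms, slots_per_shm=128, preloaded_slots=0):
    """Return how long (ms) each read waits for a slot, in arrival order.

    Hypothetical toy model, not HDFS code:
      * slots are never released during the run;
      * at most one shm allocation is in flight at a time;
      * a read that finds no free slot waits for the in-flight allocation.
    """
    arrivals = sorted(arrivals_ms)
    waits = []
    finish = None  # finish time of the most recently started allocation
    for k, t in enumerate(arrivals):
        if k < preloaded_slots:
            # A free slot already exists in a loaded segment.
            waits.append(0)
            continue
        if (k - preloaded_slots) % slots_per_shm == 0:
            # First read that needs a new segment: its allocation starts now,
            # or once the previous allocation finishes, whichever is later.
            start = t if finish is None else max(t, finish)
            finish = start + alloc_latency_ms
        # Every read served by this segment waits until its allocation completes.
        waits.append(max(0, finish - t))
    return waits


if __name__ == "__main__":
    # A burst of 129 reads at t=0 with no free slots and a 300 ms allocation.
    waits = slot_wait_times([0] * 129, alloc_latency_ms=300)
    print(waits.count(300), waits[-1])  # 128 600
```

In this model a single slow allocation delays all 128 reads it serves by its full latency, and the 129th read pays for a second, serialized allocation on top of that.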
I'm working on a prototype and will upload the code and test results later.