Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
The NameNode takes the global read lock when it processes heartbeat RPCs from DataNodes. This increases lock contention and could reduce the NN's overall throughput. Since heartbeat processing accesses data specific to the DataNode that invokes the RPC, it could instead synchronize on that DataNode and on datanodeMap.
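The locking scheme proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not HDFS code: `HeartbeatLockSketch`, `NodeDescriptor`, `register`, `handleHeartbeat` and `remainingOf` are hypothetical names, and the descriptor carries only a few stand-in fields. The map lock is held just long enough to look up the sender's descriptor, and the per-node lock just long enough to update that node's state, so heartbeats from different DataNodes do not serialize on a namesystem-wide lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class, field and method names are illustrative, not the real HDFS API.
public class HeartbeatLockSketch {

    /** Simplified stand-in for a per-DataNode descriptor. */
    static class NodeDescriptor {
        final String storageId;
        long capacity;
        long remaining;
        long lastUpdateMillis;

        NodeDescriptor(String storageId) {
            this.storageId = storageId;
        }
    }

    /** Simplified stand-in for datanodeMap: storage ID to descriptor. */
    private final Map<String, NodeDescriptor> datanodeMap = new HashMap<>();

    void register(String storageId) {
        synchronized (datanodeMap) {
            datanodeMap.putIfAbsent(storageId, new NodeDescriptor(storageId));
        }
    }

    /**
     * Processes one heartbeat without a global lock: the map lock covers
     * only the lookup, and the node's own lock covers only its field updates.
     */
    boolean handleHeartbeat(String storageId, long capacity, long remaining, long nowMillis) {
        NodeDescriptor node;
        synchronized (datanodeMap) {
            node = datanodeMap.get(storageId);
        }
        if (node == null) {
            return false; // unknown node; a real NN would ask it to re-register
        }
        synchronized (node) {
            node.capacity = capacity;
            node.remaining = remaining;
            node.lastUpdateMillis = nowMillis;
        }
        return true;
    }

    /** Reads a registered node's remaining space under that node's lock. */
    long remainingOf(String storageId) {
        NodeDescriptor node;
        synchronized (datanodeMap) {
            node = datanodeMap.get(storageId);
        }
        synchronized (node) {
            return node.remaining;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatLockSketch nn = new HeartbeatLockSketch();
        nn.register("dn-1");
        nn.register("dn-2");
        // Heartbeats from different DataNodes contend only on the brief map lookup.
        Thread t1 = new Thread(() -> nn.handleHeartbeat("dn-1", 100, 40, 1L));
        Thread t2 = new Thread(() -> nn.handleHeartbeat("dn-2", 200, 90, 2L));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(nn.remainingOf("dn-1") + " " + nn.remainingOf("dn-2")); // prints "40 90"
    }
}
```

Note that the descriptor is used after the map lock is released, so a node removed from datanodeMap between the lookup and the update is one example of the races a correctness argument for this change would have to cover.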
It looks like each DatanodeDescriptor already keeps its own queues of blocks to recover, replicate, and invalidate. Several places would need to change to remove the FSN lock from heartbeat processing.
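Because those queues are per-node, building a heartbeat reply could, in principle, be guarded by the descriptor's own lock. The sketch below assumes exactly that; `DescriptorQueueSketch`, `NodeDescriptor`, the `add*`/`drainCommands` methods and the string command format are all hypothetical and do not mirror the actual DatanodeDescriptor API. Producers enqueue block IDs under the node's monitor, and the heartbeat drains up to `maxBlocks` entries per queue under the same monitor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: names and command format are illustrative, not the real HDFS API.
public class DescriptorQueueSketch {

    /** Per-DataNode work queues, each guarded by this descriptor's own monitor. */
    static class NodeDescriptor {
        private final Deque<Long> recoverBlocks = new ArrayDeque<>();
        private final Deque<Long> replicateBlocks = new ArrayDeque<>();
        private final Deque<Long> invalidateBlocks = new ArrayDeque<>();

        synchronized void addBlockToRecover(long blockId) { recoverBlocks.add(blockId); }
        synchronized void addBlockToReplicate(long blockId) { replicateBlocks.add(blockId); }
        synchronized void addBlockToInvalidate(long blockId) { invalidateBlocks.add(blockId); }

        /** Removes up to maxBlocks entries; callers must hold this node's lock. */
        private static List<Long> poll(Deque<Long> queue, int maxBlocks) {
            List<Long> out = new ArrayList<>();
            while (out.size() < maxBlocks && !queue.isEmpty()) {
                out.add(queue.poll());
            }
            return out;
        }

        /** Builds heartbeat commands from this node's own queues only. */
        synchronized List<String> drainCommands(int maxBlocks) {
            List<String> cmds = new ArrayList<>();
            List<Long> rec = poll(recoverBlocks, maxBlocks);
            if (!rec.isEmpty()) cmds.add("RECOVER " + rec);
            List<Long> rep = poll(replicateBlocks, maxBlocks);
            if (!rep.isEmpty()) cmds.add("REPLICATE " + rep);
            List<Long> inv = poll(invalidateBlocks, maxBlocks);
            if (!inv.isEmpty()) cmds.add("INVALIDATE " + inv);
            return cmds;
        }
    }

    public static void main(String[] args) {
        NodeDescriptor dn = new NodeDescriptor();
        dn.addBlockToReplicate(1L);
        dn.addBlockToReplicate(2L);
        dn.addBlockToInvalidate(7L);
        System.out.println(dn.drainCommands(10)); // prints "[REPLICATE [1, 2], INVALIDATE [7]]"
        System.out.println(dn.drainCommands(10)); // prints "[]"
    }
}
```

This only covers draining the queues; deciding what to enqueue (for example, choosing replication targets) still happens elsewhere and may need whatever broader lock that code requires.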
As mentioned in other JIRAs, we need some mechanism to reason about the correctness of the solution.
Thoughts?