Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The total commands pending for a datanode is the sum of the commands on the NodeManager CommandQueue and the number of commands the DN reported it has in the previous heartbeat.
As things stand, these two pieces of information come from two different methods, each with its own locking, so the combined result is potentially inconsistent.
To allow a consistent view of the commands queued on a datanode, this PR:
1. Adds a read-write lock to SCMNodeManager so it can lock around updates to the command queue, updates to the DN queue count during heartbeat processing, and queries of the counts.
2. Moves the CommandQueueReportProcessing from being asynchronous to being processed as part of the heartbeat in SCM. This avoids a problem where the command queue has been emptied, but the pending count has not yet been updated inside DatanodeInfo.
3. In an earlier PR, a low-priority flag was added to ReplicateContainer commands, so that the balancer can send commands with a lower priority. The DN does not report these low-priority commands in its counts, so the command queue has been adjusted to not count them either.
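
The three changes above can be illustrated with a minimal Java sketch. This is not the actual SCMNodeManager code: the class `PendingCommandTracker`, its methods, and the use of plain string datanode IDs are hypothetical, and the sketch assumes that queued commands are drained and sent to the datanode when its heartbeat is processed. It only shows the idea of guarding both counts with one read-write lock so that a reader always sees a consistent total.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the locking scheme; not the real SCMNodeManager. */
class PendingCommandTracker {
  // One lock guards both maps, so the two counts are always read together.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Commands counted as queued in SCM and not yet sent (low-priority ones excluded).
  private final Map<String, Integer> queuedInScm = new HashMap<>();
  // Command count each datanode reported in its most recent heartbeat.
  private final Map<String, Integer> reportedByDn = new HashMap<>();

  /** Queue a command; low-priority commands are not counted (change 3). */
  void addCommand(String datanodeId, boolean lowPriority) {
    lock.writeLock().lock();
    try {
      if (!lowPriority) {
        queuedInScm.merge(datanodeId, 1, Integer::sum);
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  /**
   * Process a heartbeat: record the reported count and drain the SCM-side
   * queue under the same write lock (changes 1 and 2), so no reader can
   * observe an emptied queue alongside a stale reported count.
   */
  void processHeartbeat(String datanodeId, int reportedCount) {
    lock.writeLock().lock();
    try {
      reportedByDn.put(datanodeId, reportedCount);
      queuedInScm.remove(datanodeId); // assumed: commands are sent with the heartbeat response
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Total pending = commands queued in SCM + commands last reported by the DN. */
  int getTotalPending(String datanodeId) {
    lock.readLock().lock();
    try {
      return queuedInScm.getOrDefault(datanodeId, 0)
          + reportedByDn.getOrDefault(datanodeId, 0);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

In this sketch, multiple threads can query `getTotalPending` concurrently under the read lock, while queue updates and heartbeat processing take the write lock and therefore appear atomic to readers.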