  1. Apache Ozone
  2. HDDS-7759 Improve Ozone Replication Manager
  3. HDDS-8074

Improve synchronization around command queue updates in Node Manager

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SCM

    Description

      The total number of commands pending for a datanode is the sum of the commands on the NodeManager CommandQueue and the number of commands the DN reported as queued in its previous heartbeat.

      As things stand, these two pieces of information come from two different methods, each with its own locking, so the combined result is potentially inconsistent.

      To allow a consistent view of the commands queued for a datanode, this PR:

      1. Adds a read-write lock to SCMNodeManager so that it can lock around three operations: updating the command queue, updating the DN queue count during heartbeat processing, and querying the counts.

      2. Moves the CommandQueueReportProcessing from being asynchronous to being processed as part of the heartbeat in SCM. This avoids a problem where the command queue has been emptied, but the pending count has not yet been updated inside DatanodeInfo.

      3. In an earlier PR, a low priority flag was added to ReplicateContainer commands, so that the balancer can send commands with a lower priority. The DN does not report these low priority commands in its counts, so the command queue has been adjusted to not count them either.
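
The three changes above can be illustrated with a minimal sketch. This is not the actual SCMNodeManager code: the class `PendingCommandTracker`, its `Command` type, and all method names are hypothetical, and the real implementation tracks counts per command type rather than as a single integer. The sketch only shows the locking idea: one read-write lock guards both the queue and the reported count, the heartbeat updates both in a single critical section, and low priority commands are left out of the queued count.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model; not the real SCMNodeManager.
class PendingCommandTracker {

  // Minimal stand-in for an SCM command; only the priority flag matters here.
  static final class Command {
    final boolean lowPriority;
    Command(boolean lowPriority) {
      this.lowPriority = lowPriority;
    }
  }

  // One lock guards both maps, so readers never see a half-applied update.
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Commands queued in SCM, waiting for the datanode's next heartbeat.
  private final Map<String, Queue<Command>> queued = new HashMap<>();
  // Queue size each datanode reported in its most recent heartbeat.
  private final Map<String, Integer> reported = new HashMap<>();

  void addCommand(String dn, Command cmd) {
    lock.writeLock().lock();
    try {
      queued.computeIfAbsent(dn, k -> new ArrayDeque<>()).add(cmd);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Heartbeat processing: store the reported count and drain the queue
  // in the same critical section (synchronous, as in item 2).
  Queue<Command> processHeartbeat(String dn, int reportedCount) {
    lock.writeLock().lock();
    try {
      reported.put(dn, reportedCount);
      Queue<Command> toSend = queued.remove(dn);
      if (toSend == null) {
        toSend = new ArrayDeque<>();
      }
      return toSend;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Total pending = queued in SCM (excluding low priority) + reported by DN.
  int totalPending(String dn) {
    lock.readLock().lock();
    try {
      int inQueue = 0;
      Queue<Command> q = queued.get(dn);
      if (q != null) {
        for (Command c : q) {
          if (!c.lowPriority) {
            inQueue++;
          }
        }
      }
      return inQueue + reported.getOrDefault(dn, 0);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Because `processHeartbeat` holds the write lock while it both records the reported count and empties the queue, a concurrent `totalPending` call sees either the state before the heartbeat or the state after it, never an emptied queue paired with a stale reported count.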

            People

              Assignee: Stephen O'Donnell (sodonnell)
              Reporter: Stephen O'Donnell (sodonnell)