Hadoop HDFS / HDFS-14624

When decommissioning a node, log remaining blocks to replicate periodically


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.1.4, 3.2.2
    • Component/s: namenode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a node is marked for decommission, a monitor thread runs every 30 seconds by default and checks whether the node still has pending blocks that must be replicated before the node can finish decommissioning.
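To make the monitor's behaviour concrete, here is a minimal, self-contained sketch of a periodic check built on java.util.concurrent.ScheduledExecutorService. Everything in it is hypothetical: DecommissionMonitorSketch, its map of pending block counts, and its message text are simplified stand-ins, not the actual DatanodeAdminManager code, and it logs through java.util.logging rather than the `{}`-placeholder logger seen in the snippets below. It logs each node's progress at INFO, the level this issue proposes. The 30-second default corresponds to the HDFS setting dfs.namenode.decommission.interval.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/** Hypothetical periodic decommission monitor; a simplified stand-in, not Hadoop code. */
public class DecommissionMonitorSketch implements Runnable {
  private static final Logger LOG =
      Logger.getLogger(DecommissionMonitorSketch.class.getName());

  // Hypothetical state: node name -> blocks that must still be replicated.
  private final Map<String, Integer> pendingBlocks = new ConcurrentHashMap<>();

  public void startDecommission(String node, int blocks) {
    pendingBlocks.put(node, blocks);
  }

  /** Records replication progress for a node (never goes below zero). */
  public void blocksReplicated(String node, int count) {
    pendingBlocks.computeIfPresent(node, (n, left) -> Math.max(0, left - count));
  }

  public boolean isInProgress(String node) {
    return pendingBlocks.containsKey(node);
  }

  /** One check: reports every node and completes nodes with nothing left. */
  public List<String> check() {
    List<String> messages = new ArrayList<>();
    Iterator<Map.Entry<String, Integer>> it = pendingBlocks.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Integer> e = it.next();
      String msg;
      if (e.getValue() == 0) {
        msg = "Node " + e.getKey() + " has finished decommissioning";
        it.remove(); // ConcurrentHashMap iterators support remove()
      } else {
        msg = "Node " + e.getKey() + " still has " + e.getValue()
            + " blocks to replicate";
      }
      LOG.info(msg); // INFO instead of a debug level, as proposed
      messages.add(msg);
    }
    return messages;
  }

  @Override
  public void run() {
    check();
  }

  /** Schedules check() every intervalSeconds (30 s is the HDFS default). */
  public ScheduledFuture<?> start(ScheduledExecutorService executor, long intervalSeconds) {
    return executor.scheduleWithFixedDelay(
        this, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }
}
```

A caller would typically pass a single-threaded executor, for example `start(Executors.newSingleThreadScheduledExecutor(), 30)`, so that checks never overlap.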

      The monitor thread, DatanodeAdminManager$Monitor.check(), already logs the right information in two DEBUG-level messages. The first is logged while the pending blocks are being replicated:

      LOG.debug("Node {} still has {} blocks to replicate "
          + "before it is a candidate to finish {}.",
          dn, blocks.size(), dn.getAdminState());

      The second is logged after the initial set of blocks has been replicated and a rescan happens:

      LOG.debug("Node {} {} healthy."
          + " It needs to replicate {} more blocks."
          + " {} is still in progress.", dn,
          isHealthy ? "is": "isn't", blocks.size(), dn.getAdminState());

      I propose moving these messages to INFO level so that decommission progress is easier to monitor over time from the NameNode log.

      With the default settings, this would produce at most one log message per decommissioning node every 30 seconds. That is only an upper bound because each run of the monitor thread stops after checking 500K blocks, so in practice it could be as few as one log message per 30 seconds, even if many DataNodes are being decommissioned at the same time.
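The rate argument above can be checked with a small simulation. The sketch below is hypothetical: MonitorTickSketch, its fields, and its message text are illustrative stand-ins, not Hadoop code. It simplifies the budget check (a node's pending blocks are counted against the budget in one step, and the run stops once the budget is reached) and assumes, as part of the sketch, that each run resumes after the last node it checked. The budget of 500,000 blocks per run is the default that HDFS exposes as dfs.namenode.decommission.blocks.per.interval.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of monitor runs with a per-run block budget; not Hadoop code. */
public class MonitorTickSketch {
  private final List<String> nodes;           // nodes being decommissioned, in check order
  private final Map<String, Integer> pending; // hypothetical: blocks left per node
  private final int blocksPerRun;             // e.g. 500_000, the default budget
  private int nextIndex = 0;                  // where the next run resumes

  public MonitorTickSketch(Map<String, Integer> pending, int blocksPerRun) {
    this.pending = new LinkedHashMap<>(pending);
    this.nodes = new ArrayList<>(this.pending.keySet());
    this.blocksPerRun = blocksPerRun;
  }

  /**
   * Simulates one run: visits each node at most once, emitting one log line
   * per visited node, and stops as soon as the blocks checked reach the budget.
   */
  public List<String> runOnce() {
    List<String> logLines = new ArrayList<>();
    long checked = 0;
    int visited = 0;
    while (visited < nodes.size() && checked < blocksPerRun) {
      String node = nodes.get(nextIndex);
      int blocks = pending.get(node);
      checked += blocks;
      logLines.add("Node " + node + " still has " + blocks + " blocks to replicate");
      nextIndex = (nextIndex + 1) % nodes.size();
      visited++;
    }
    return logLines;
  }

  public static void main(String[] args) {
    Map<String, Integer> pending = new LinkedHashMap<>();
    pending.put("dn1", 600_000);
    pending.put("dn2", 200_000);
    pending.put("dn3", 100_000);
    MonitorTickSketch monitor = new MonitorTickSketch(pending, 500_000);
    System.out.println(monitor.runOnce()); // only dn1: the budget is used up
    System.out.println(monitor.runOnce()); // dn2, dn3, then dn1 again
  }
}
```

Because each node is visited at most once per run, the number of lines per run never exceeds the number of decommissioning nodes; when one node alone exhausts the budget, the run reports only that node.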

      Note that the NameNode web UI already displays this information, but having it in the NameNode logs would make progress easier to track.

        Attachments

        1. HDFS-14624.001.patch (1 kB, Stephen O'Donnell)
        2. HDFS-14624.002.patch (2 kB, Stephen O'Donnell)
        3. HDFS-14624.003.patch (2 kB, Stephen O'Donnell)


            People

            • Assignee: Stephen O'Donnell (sodonnell)
            • Reporter: Stephen O'Donnell (sodonnell)
            • Votes: 0
            • Watchers: 3
