Apache Ozone / HDDS-10758

Reduce verbosity of SCM replication manager logs when no nodes are available

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: SCM
    • Labels: None

    Description

      If there is an under-replicated EC container, but no nodes available to service reconstruction, the leader SCM will log the following for each such container (1030 in this case) on each replication manager run:

      2024-04-09 00:33:35,408 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #1030.
      org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
      	at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
      	at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
      	at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
      	at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
      	at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
      	at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
      	at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
      	at java.base/java.lang.Thread.run(Thread.java:834)
      2024-04-09 00:33:35,408 ERROR org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor: Error processing under replicated container ContainerInfo{id=#1030, state=CLOSED, pipelineID=PipelineID=acb9c258-1dfe-46a3-b317-e2231b6acffb, stateEnterTime=2024-04-08T23:55:49.554Z, owner=om2}
      org.apache.hadoop.hdds.scm.exceptions.SCMException: No healthy node found to allocate container.
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodesInternal(SCMCommonPlacementPolicy.java:184)
      	at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRandom.chooseDatanodesInternal(SCMContainerPlacementRandom.java:78)
      	at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:148)
      	at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:266)
      	at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:155)
      	at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:372)
      	at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
      	at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
      	at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:102)
      	at java.base/java.lang.Thread.run(Thread.java:834)
      

      This produces two stack traces for a single error, repeated for every affected container, and can quickly cause the leader's logs to roll over. We should remove the stack traces and log only one message per container per replication manager run.
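
A minimal sketch of what the proposed behavior could look like, under stated assumptions: the class, the `PlacementException` type, and the in-memory `LOG` list are hypothetical stand-ins for illustration, not the actual Ozone classes. The idea is to catch the placement failure once, at the per-container level, and record only the exception message. With SLF4J this would correspond to passing `e.getMessage()` as a format argument instead of passing the exception object as the last argument.

```java
import java.util.ArrayList;
import java.util.List;

public class QuietUnderReplicationLogging {

    /** Hypothetical stand-in for the SCMException thrown by the placement policy. */
    static class PlacementException extends Exception {
        PlacementException(String message) {
            super(message);
        }
    }

    /** Collects log lines in memory; a real service would use its logger instead. */
    static final List<String> LOG = new ArrayList<>();

    /** Simulates a handler that cannot find any healthy node for reconstruction. */
    static void createReconstructionCommands(long containerId) throws PlacementException {
        throw new PlacementException("No healthy node found to allocate container.");
    }

    /**
     * Handles one container. The failure is caught in exactly one place and
     * logged as a single line containing the message only, with no stack trace.
     */
    static void processContainer(long containerId) {
        try {
            createReconstructionCommands(containerId);
        } catch (PlacementException e) {
            LOG.add(String.format(
                "WARN Unable to create EC reconstruction commands for container #%d: %s",
                containerId, e.getMessage()));
        }
    }

    /** Simulates one replication manager run over the given containers. */
    static List<String> processAll(long... containerIds) {
        LOG.clear();
        for (long id : containerIds) {
            processContainer(id);
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        for (String line : processAll(1030L, 1031L)) {
            System.out.println(line);
        }
    }
}
```

In this sketch each failing container yields exactly one line per run, so the log volume grows with the number of affected containers rather than with the stack depth.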

          People

            Assignee: Unassigned
            Reporter: Ethan Rose (erose)
            Votes: 0
            Watchers: 1
