  1. Apache Ozone
  2. HDDS-6462 Phase II : Erasure Coding Offline Recovery & Read/Write Improvements
  3. HDDS-8831

UnsupportedOperationException when there are more replication tasks than limit

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SCM

    Description

      There is an UnsupportedOperationException when there are more re-replication tasks than the hdds.scm.replication.datanode.replication.limit value. 

      For EC reconstruction tasks, if hdds.scm.replication.datanode.replication.limit is set to 2, the reconstruction never completes and the container remains under-replicated when a few of the DNs are down. This is because the EC reconstruction weight is 3, which is higher than the limit of 2.

      For RATIS, or for EC when the limit is 3 or more, the replication tasks complete without issues in subsequent iterations of the processor.
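The weight arithmetic described above can be sketched as follows. This is only an illustrative model, not Ozone's actual code: the class name, the `canAccept` helper, and the weight of 1 for a plain replication command are assumptions. Only the EC reconstruction weight of 3 and the limit values come from this report.

```java
/** Hypothetical model of a per-datanode command budget; not Ozone's implementation. */
public class ReplicationLimitSketch {
    // Assumed for illustration: a plain replication command costs 1.
    public static final int REPLICATION_WEIGHT = 1;
    // Stated in this report: an EC reconstruction command costs 3.
    public static final int EC_RECONSTRUCTION_WEIGHT = 3;

    /**
     * Returns true if a command of the given weight fits on a datanode
     * that already has queuedWeight worth of commands pending.
     */
    public static boolean canAccept(int limit, int queuedWeight, int commandWeight) {
        return queuedWeight + commandWeight <= limit;
    }

    public static void main(String[] args) {
        // Limit 2: an EC reconstruction never fits, even on an idle datanode.
        System.out.println(canAccept(2, 0, EC_RECONSTRUCTION_WEIGHT));
        // Limit 3: the same command fits once the datanode's queue is empty.
        System.out.println(canAccept(3, 0, EC_RECONSTRUCTION_WEIGHT));
        // Limit 2: a plain replication is rejected while the queue is full,
        // but fits in a later iteration once the queue has drained.
        System.out.println(canAccept(2, 2, REPLICATION_WEIGHT));
        System.out.println(canAccept(2, 1, REPLICATION_WEIGHT));
    }
}
```

Under these assumptions, waiting for the queue to drain helps only when the command weight is at most the limit. That matches the observation that the EC case with a limit of 2 stays under-replicated indefinitely.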

       

      2023-06-12 11:28:37,912 [Under Replicated Processor] ERROR org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor: Error processing Health result of class: class org.apache.hadoop.hdds.scm.container.replication.ContainerHealthResult$UnderReplicatedHealthResult for container ContainerInfo{id=#2003, state=CLOSED, stateEnterTime=2023-06-12T11:01:28.843Z, pipelineID=PipelineID=e8fa71c9-7f9a-4b6f-a4ca-8cb01d78a646, owner=om2}
      java.lang.UnsupportedOperationException
          at com.google.common.collect.ImmutableList.set(ImmutableList.java:528)
          at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.validateDatanodes(SCMCommonPlacementPolicy.java:162)
          at org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy.chooseDatanodes(SCMCommonPlacementPolicy.java:208)
          at org.apache.hadoop.hdds.scm.container.replication.ReplicationManagerUtil.getTargetDatanodes(ReplicationManagerUtil.java:83)
          at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.getTargetDatanodes(ECUnderReplicationHandler.java:396)
          at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processMissingIndexes(ECUnderReplicationHandler.java:307)
          at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndSendCommands(ECUnderReplicationHandler.java:161)
          at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:769)
          at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:58)
          at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.sendDatanodeCommands(UnderReplicatedProcessor.java:27)
          at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processContainer(UnhealthyReplicationProcessor.java:148)
          at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.processAll(UnhealthyReplicationProcessor.java:115)
          at org.apache.hadoop.hdds.scm.container.replication.UnhealthyReplicationProcessor.run(UnhealthyReplicationProcessor.java:157)
          at java.base/java.lang.Thread.run(Thread.java:834)
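The top frame of the trace is ImmutableList.set, which Guava documents as always throwing because immutable lists reject in-place mutation. Below is a minimal sketch of that failure mode. It uses the JDK's Collections.unmodifiableList as a stand-in so that it runs without Guava. The class and method names are hypothetical. Copying into an ArrayList is a general pattern for this kind of error, not necessarily the fix applied for this issue, which the report does not describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of mutating a read-only list; not Ozone's code. */
public class ImmutableListSketch {

    /** Returns true if an in-place set() on the given non-empty list is rejected. */
    public static boolean throwsOnSet(List<String> nodes) {
        try {
            nodes.set(0, nodes.get(0)); // rewrite element 0 with itself
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    /** Replaces one entry on a mutable copy, leaving the caller's list untouched. */
    public static List<String> replaceOnCopy(List<String> nodes, int index, String replacement) {
        List<String> copy = new ArrayList<>(nodes);
        copy.set(index, replacement);
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for a list of chosen datanodes handed back as read-only.
        List<String> chosen = Collections.unmodifiableList(Arrays.asList("dn1", "dn2", "dn3"));
        System.out.println(throwsOnSet(chosen));
        System.out.println(replaceOnCopy(chosen, 1, "dn4"));
    }
}
```

Working on a defensive copy keeps the mutation local and does not depend on which List implementation the caller happened to pass in.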

            People

              Assignee: adoroszlai Attila Doroszlai
              Reporter: varsha.ravi Varsha Ravi