  1. Apache Ozone
  2. HDDS-6462 Phase II : Erasure Coding Offline Recovery & Read/Write Improvements
  3. HDDS-7081

EC: ReplicationManager - UnderRep handler should handle duplicate indexes

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: SCM

    Description

      When two replicas with the same replica index are online, e.g. due to decommission, over-replication or maintenance, and the container is under-replicated because another index is missing, an IllegalStateException can be thrown when collecting the source indexes:

      2022-08-02 10:54:26,939 WARN org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler: Exception while processing for creating the EC reconstruction container commands for #2.
      java.lang.IllegalStateException: Duplicate key 3 (attempted merging values ContainerReplica{containerID=#2, state=CLOSED, datanodeDetails=fb63a3c8-2e5b-432e-be63-274c41aab79f{ip: 172.27.124.131, host: quasar-onjdpu-5.quasar-onjdpu.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, placeOfBirth=2ff0ed60-a461-41a4-8fff-9da6cb4e52ad, sequenceId=0, keyCount=1, bytesUsed=34603008,replicaIndex= 3} and ContainerReplica{containerID=#2, state=CLOSED, datanodeDetails=2ff0ed60-a461-41a4-8fff-9da6cb4e52ad{ip: 172.27.193.4, host: quasar-onjdpu-3.quasar-onjdpu.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default, certSerialId: null, persistedOpState: DECOMMISSIONING, persistedOpStateExpiryEpochSec: 0}, placeOfBirth=2ff0ed60-a461-41a4-8fff-9da6cb4e52ad, sequenceId=0, keyCount=1, bytesUsed=34603008,replicaIndex= 3})
              at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
              at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
              at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
              at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
              at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
              at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
              at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1603)
              at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
              at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
              at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
              at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
              at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
              at org.apache.hadoop.hdds.scm.container.replication.ECUnderReplicationHandler.processAndCreateCommands(ECUnderReplicationHandler.java:151)
              at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager.processUnderReplicatedContainer(ReplicationManager.java:366)
              at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processContainer(UnderReplicatedProcessor.java:92)
              at org.apache.hadoop.hdds.scm.container.replication.UnderReplicatedProcessor.processAll(UnderReplicatedProcessor.java:76)
              at org.apache.hadoop.hdds.scm.ha.BackgroundSCMService.run(BackgroundSCMService.java:101)
              at java.base/java.lang.Thread.run(Thread.java:834)
      

      This exception then goes unhandled and causes the under-replication processing thread to exit.
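
      The failure mode can be reproduced outside Ozone. The following is a minimal sketch, assuming the source map is built with the two-argument Collectors.toMap, as the Collectors.uniqKeysMapAccumulator frame in the stack trace suggests. The Replica class, the method names and the rule of preferring an in-service replica are hypothetical and only illustrate one way to tolerate duplicate indexes; they are not taken from the actual fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class DuplicateIndexDemo {

  /** Hypothetical stand-in for a container replica; not Ozone's ContainerReplica. */
  static final class Replica {
    final int index;          // EC replica index
    final String host;        // datanode holding the replica
    final boolean inService;  // false for e.g. DECOMMISSIONING nodes

    Replica(int index, String host, boolean inService) {
      this.index = index;
      this.host = host;
      this.inService = inService;
    }
  }

  /** Two-argument toMap: throws IllegalStateException if two replicas share an index. */
  static Map<Integer, Replica> strictByIndex(List<Replica> replicas) {
    return replicas.stream()
        .collect(Collectors.toMap(r -> r.index, Function.identity()));
  }

  /**
   * Three-argument toMap: the merge function resolves duplicates instead of throwing.
   * Assumption: keep the first replica if it is in service, otherwise take the second.
   */
  static Map<Integer, Replica> tolerantByIndex(List<Replica> replicas) {
    return replicas.stream()
        .collect(Collectors.toMap(r -> r.index, Function.identity(),
            (first, second) -> first.inService ? first : second));
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica(1, "dn1", true),
        new Replica(3, "dn2", false),  // duplicate index 3, being decommissioned
        new Replica(3, "dn3", true));  // duplicate index 3, in service

    try {
      strictByIndex(replicas);
    } catch (IllegalStateException e) {
      System.out.println("strict mapping failed on a duplicate index");
    }
    System.out.println("tolerant mapping keeps index 3 on "
        + tolerantByIndex(replicas).get(3).host);
  }
}
```

      With a merge function supplied, the collector no longer throws on a duplicate index, so the caller gets one source replica per index. Which duplicate to keep is a policy decision; the in-service preference here is only an example.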

            People

              Assignee: Stephen O'Donnell (sodonnell)
              Reporter: Nilotpal Nandi (nilotpalnandi)
              Votes: 0
              Watchers: 1