Apache Ozone / HDDS-11085

EC key get/put fails with "INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException"

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: EC
    • Labels: None

    Description

      The failure mostly occurs after an Ozone restart, and it persists even after
      waiting for ozone.client.exclude.nodes.expiry.time to elapse.
      The test runs a key put using the following freon command:

      2024-06-28 14:35:13,521|INFO|MainThread|machine.py:190 - run()||GUID=afc7da77-87f2-45ee-ac8f-eef1ef0491ae|RUNNING: ssh -l root -i /tmp/hw-qe-keypair.pem -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ccycloud-4.quasar-dyrwcg.xyz "export KRB5CCNAME=/hwqe/hadoopqe/artifacts/kerberosTickets/hrt_qa.kerberos.ticket ;/opt/cloudera/parcels/CDH/bin/ozone freon ozone-client-key-generator --om-service-id=ozone1719563413 --volume vol-test-workload-om-decommission-recommission-1719585300 --bucket buck-test-workload-om-decommission-recommission-1719585300 --size 1048576 --number-of-tests 4265 --threads 20 --prefix dir-cvotw" 

      The command then fails with the error "OMException: Pipeline limit (15) reached (15), none closed":

      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S F F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 4e116cb8-c6d6-4aa2-800d-1f6293b7afa6(ccycloud-1.quasar-dyrwcg.xyz/10.140.117.67)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 4e116cb8-c6d6-4aa2-800d-1f6293b7afa6(ccycloud-1.quasar-dyrwcg.xyz/10.140.117.67)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15012 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15012 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15009 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S F F S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 4e116cb8-c6d6-4aa2-800d-1f6293b7afa6(ccycloud-1.quasar-dyrwcg.xyz/10.140.117.67)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15004 creation failed
      	... 6 more
      24/06/28 14:35:21 WARN io.KeyOutputStream: EC stripe write failed: F S S S S
      24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12008 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12008 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 14
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12011 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12008 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 12008 creation failed
      	... 6 more
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 1
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 12
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 9
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 15
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 7
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 16
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 8
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 WARN io.KeyOutputStream: EC stripe write failed: F S S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15010 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 4e116cb8-c6d6-4aa2-800d-1f6293b7afa6(ccycloud-1.quasar-dyrwcg.xyz/10.140.117.67)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15012 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15012 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S S F S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 3, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15013 creation failed
      	... 6 more
      24/06/28 14:35:22 WARN io.KeyOutputStream: Put block failed: S F S S S
      24/06/28 14:35:22 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
      java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      	... 6 more
      24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 24
      INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
      24/06/28 14:35:22 INFO freon.ProgressBar: Progress: 0.61 % (26 out of 4265)
      One ore more freon test is failed.
      24/06/28 14:35:22 INFO metrics: type=TIMER, name=key-create, count=26, min=198.928607, max=4079.209519, mean=3127.3231251034435, stddev=1512.01082938794, median=4007.51179, p75=4035.089922, p95=4060.004277, p98=4079.209519, p99=4079.209519, p999=4079.209519, mean_rate=5.632678197470965, m1=0.0, m5=0.0, m15=0.0, rate_unit=events/second, duration_unit=milliseconds
      24/06/28 14:35:22 INFO freon.BaseFreonGenerator: Total execution time (sec): 6
      24/06/28 14:35:22 INFO freon.BaseFreonGenerator: Failures: 9
      24/06/28 14:35:22 INFO freon.BaseFreonGenerator: Successful executions: 17 
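
      The "Failure for replica index" lines in the excerpt above name only three datanodes (ccycloud-1, ccycloud-2 and ccycloud-6). A minimal sketch for tallying such failures per host and counting the "Pipeline limit" errors in a saved client log is shown here. The regular expressions are assumptions based only on the line formats quoted in this report, and summarize() is a hypothetical helper, not part of Ozone or freon.

```python
import re
from collections import Counter

# Assumed formats, taken from the log lines quoted in this report.
FAILURE_RE = re.compile(
    r"Failure for replica index: (\d+), DatanodeDetails: [0-9a-f-]+\(([^/)]+)/"
)
LIMIT_RE = re.compile(r"Pipeline limit \((\d+)\) reached \((\d+)\)")


def summarize(log_text):
    """Count replica failures per datanode host and pipeline-limit errors."""
    by_host = Counter()
    limit_errors = 0
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            by_host[match.group(2)] += 1
        if LIMIT_RE.search(line):
            limit_errors += 1
    return by_host, limit_errors


# Four lines copied from the excerpt above, used as sample input.
SAMPLE = """\
24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 2, DatanodeDetails: 1bce9190-f05b-4d44-bcf1-f99112ae4007(ccycloud-6.quasar-dyrwcg.xyz/10.140.217.198)
24/06/28 14:35:21 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: 7781c75b-9572-4be7-b1e9-c2a84c6025a5(ccycloud-2.quasar-dyrwcg.xyz/10.140.137.72)
24/06/28 14:35:22 ERROR freon.BaseFreonGenerator: Error on executing task 14
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Pipeline limit (15) reached (15), none closed
"""

by_host, limit_errors = summarize(SAMPLE)
for host, count in sorted(by_host.items()):
    print(host, count)
print("pipeline limit errors:", limit_errors)
```

      Running the same function over the full client log would show whether the container creation failures stay concentrated on a few datanodes after the restart.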
      

      Earlier, it failed with the same or a similar error at a pipeline limit of 8. I then increased ozone.scm.ec.pipeline.minimum to 15, but it still fails with the error above. The earlier failure looked like this:

      E             cp: Pipeline limit (8) reached (8), none closed
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: EC stripe write failed: S S S F F
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: Failure for replica index: 4, DatanodeDetails: c0f6de55-6c53-4fd6-862b-980d4873ef9c(ccycloud-4.quasar-vwsgfa.xyz/10.140.92.69)
      E             java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      E             Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      E             	... 6 more
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: Failure for replica index: 5, DatanodeDetails: 77c1e12a-b6c6-49bb-8c35-340f31630c59(ccycloud-5.quasar-vwsgfa.xyz/10.140.140.16)
      E             java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      E             Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15014 creation failed
      E             	... 6 more
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: EC stripe write failed: F S S F S
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: Failure for replica index: 1, DatanodeDetails: a2c7ee1b-04bd-4899-b9dd-01cd5716b026(ccycloud-2.quasar-vwsgfa.xyz/10.140.55.0)
      E             java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15015 creation failed
      E             Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15015 creation failed
      E             	... 6 more
      E             24/06/28 02:19:48 WARN io.KeyOutputStream: Failure for replica index: 4, DatanodeDetails: c0f6de55-6c53-4fd6-862b-980d4873ef9c(ccycloud-4.quasar-vwsgfa.xyz/10.140.92.69)
      E             java.io.IOException: Unexpected Storage Container Exception: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15015 creation failed
      E             Caused by: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: ContainerID 15015 creation failed
      E             	... 6 more
      E             cp: Pipeline limit (8) reached (8), none closed 
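
      In the excerpt above, "EC stripe write failed: S S S F F" is followed by failures for replica indexes 4 and 5, and "F S S F S" by failures for replica indexes 1 and 4. This suggests that each position in the status string corresponds to a replica index, with S for success and F for failure. That reading is an assumption drawn from these log lines, not from the Ozone source. A small sketch of the decoding, with failed_replica_indexes() as a hypothetical helper:

```python
def failed_replica_indexes(pattern):
    """Return the 1-based positions marked 'F' in a status string."""
    flags = pattern.split()
    return [index for index, flag in enumerate(flags, start=1) if flag == "F"]


# Status strings copied from the log excerpt above.
print(failed_replica_indexes("S S S F F"))  # [4, 5]
print(failed_replica_indexes("F S S F S"))  # [1, 4]
```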

          People

            Assignee: Unassigned
            Reporter: Pratyush Bhatt (pratyush.bhatt)