Apache Ozone / HDDS-10951

Container is stuck in CLOSING state for more than 12 hours on getting ICR of UNHEALTHY replica


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: SCM

    Description

      Steps to reproduce:

      • Create a volume, bucket, and key.
      • Simulate an UNHEALTHY replica in the container that holds the key.
      • Wait for the container to close.

      Expected behaviour: the container should be closed soon after SCM receives the ICR (incremental container report) for the UNHEALTHY replica.

      Actual behaviour: the container is stuck in the CLOSING state for more than 12 hours after SCM receives the ICR.
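
The "wait for the container to close" step could be automated roughly as follows. This is a minimal sketch, not part of the original report: it assumes the `ozone` CLI is on the PATH and that `ozone admin container info` prints a `Container State:` line as in the output quoted in this issue. The helper names `container_state` and `wait_for_closed`, and the timeout values, are hypothetical.

```python
import re
import subprocess
import time


def container_state(container_id, run=subprocess.run):
    """Return the value of the 'Container State' line, or None if absent.

    Assumes the output format shown in this issue; other Ozone versions
    may print the state differently.
    """
    result = run(
        ["ozone", "admin", "container", "info", str(container_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    match = re.search(r"Container State: (\w+)", result.stdout)
    return match.group(1) if match else None


def wait_for_closed(container_id, timeout_s=600, interval_s=10,
                    run=subprocess.run, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll until the container reports CLOSED or the timeout expires.

    Returns True if CLOSED was observed, False on timeout.
    `run`, `sleep` and `clock` are injectable so the loop can be
    exercised without a live cluster.
    """
    deadline = clock() + timeout_s
    while True:
        if container_state(container_id, run=run) == "CLOSED":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example (requires a running cluster):
#   wait_for_closed(18002, timeout_s=900)
```

With the behaviour described in this issue, such a poll would be expected to time out, because the container never leaves CLOSING.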

      SCM log entry showing when the container close was initiated:

      2023-10-26 19:56:08,079 INFO [FixedThreadPoolWithAffinityExecutor-1-0]-org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: Moving OPEN container #18002 to CLOSING state, datanode f2a6be07-db06-430b-8311-534247744f99(quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site/172.27.112.2) reported UNHEALTHY replica with index 0. 
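
To find when a given container entered CLOSING, log entries like the one above can be parsed mechanically. The following is a sketch only: the regular expression is derived from this single log line and may not match other Ozone versions, and the name `parse_closing_event` is hypothetical.

```python
import re
from datetime import datetime

# Pattern derived from the single log line quoted in this issue
# (an assumption; the message format may differ between versions).
CLOSING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Moving OPEN container #(?P<container>\d+) to CLOSING state, "
    r"datanode (?P<datanode>[0-9a-f-]{36})\(.*?\) "
    r"reported (?P<state>\w+) replica with index (?P<index>\d+)"
)


def parse_closing_event(line):
    """Extract timestamp, container ID, datanode UUID and replica state.

    Returns None if the line is not a 'Moving OPEN container' message.
    The timestamp is returned as a naive datetime; the log line does
    not state a time zone.
    """
    match = CLOSING_RE.search(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match.group("ts"),
                                       "%Y-%m-%d %H:%M:%S,%f"),
        "container_id": int(match.group("container")),
        "datanode": match.group("datanode"),
        "replica_state": match.group("state"),
        "replica_index": int(match.group("index")),
    }


LOG_LINE = (
    "2023-10-26 19:56:08,079 INFO [FixedThreadPoolWithAffinityExecutor-1-0]-"
    "org.apache.hadoop.hdds.scm.container.IncrementalContainerReportHandler: "
    "Moving OPEN container #18002 to CLOSING state, datanode "
    "f2a6be07-db06-430b-8311-534247744f99"
    "(quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site/172.27.112.2) "
    "reported UNHEALTHY replica with index 0. "
)

event = parse_closing_event(LOG_LINE)
print(event["container_id"], event["replica_state"], event["timestamp"])
```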

      Current state of the container:

      root@quasar-yzwbdi-1:~# ozone admin container info 18002
      Container id: 18002
      Pipeline id: 34771df9-8ba5-4a3e-9e48-abb590e67ea2
      Container State: CLOSING
      Datanodes: [f2a6be07-db06-430b-8311-534247744f99/quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site,
      baa35af1-7b51-4275-b465-f750c429c618/quasar-yzwbdi-5.quasar-yzwbdi.root.hwx.site,
      f40aed3a-dddf-4f2b-a30f-035136bfceba/quasar-yzwbdi-4.quasar-yzwbdi.root.hwx.site]
      Replicas: [State: CLOSING; ReplicaIndex: 0; Origin: f40aed3a-dddf-4f2b-a30f-035136bfceba; Location: f40aed3a-dddf-4f2b-a30f-035136bfceba/quasar-yzwbdi-4.quasar-yzwbdi.root.hwx.site,
      State: UNHEALTHY; ReplicaIndex: 0; Origin: f2a6be07-db06-430b-8311-534247744f99; Location: f2a6be07-db06-430b-8311-534247744f99/quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site,
      State: CLOSING; ReplicaIndex: 0; Origin: baa35af1-7b51-4275-b465-f750c429c618; Location: baa35af1-7b51-4275-b465-f750c429c618/quasar-yzwbdi-5.quasar-yzwbdi.root.hwx.site]
      root@quasar-yzwbdi-1:~# date
      Fri 27 Oct 2023 04:57:29 AM UTC 
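
The replica list above is easier to read once summarized per state. The sketch below parses text in the format shown in this issue (an assumption; the layout of `ozone admin container info` output may vary by version) and counts replicas by state. The name `summarize_container` is hypothetical.

```python
import re
from collections import Counter

# One replica entry, as printed in the "Replicas:" list quoted above.
REPLICA_RE = re.compile(
    r"State: (?P<state>\w+); ReplicaIndex: (?P<index>\d+); "
    r"Origin: (?P<origin>[0-9a-f-]{36}); Location: (?P<location>[^,\]\s]+)"
)


def summarize_container(info_text):
    """Return the container state plus a per-state count of its replicas."""
    state = re.search(r"Container State: (\w+)", info_text)
    replicas = [m.groupdict() for m in REPLICA_RE.finditer(info_text)]
    return {
        "container_state": state.group(1) if state else None,
        "replica_states": Counter(r["state"] for r in replicas),
        "unhealthy_origins": [r["origin"] for r in replicas
                              if r["state"] == "UNHEALTHY"],
    }


INFO_TEXT = """\
Container id: 18002
Pipeline id: 34771df9-8ba5-4a3e-9e48-abb590e67ea2
Container State: CLOSING
Datanodes: [f2a6be07-db06-430b-8311-534247744f99/quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site,
baa35af1-7b51-4275-b465-f750c429c618/quasar-yzwbdi-5.quasar-yzwbdi.root.hwx.site,
f40aed3a-dddf-4f2b-a30f-035136bfceba/quasar-yzwbdi-4.quasar-yzwbdi.root.hwx.site]
Replicas: [State: CLOSING; ReplicaIndex: 0; Origin: f40aed3a-dddf-4f2b-a30f-035136bfceba; Location: f40aed3a-dddf-4f2b-a30f-035136bfceba/quasar-yzwbdi-4.quasar-yzwbdi.root.hwx.site,
State: UNHEALTHY; ReplicaIndex: 0; Origin: f2a6be07-db06-430b-8311-534247744f99; Location: f2a6be07-db06-430b-8311-534247744f99/quasar-yzwbdi-8.quasar-yzwbdi.root.hwx.site,
State: CLOSING; ReplicaIndex: 0; Origin: baa35af1-7b51-4275-b465-f750c429c618; Location: baa35af1-7b51-4275-b465-f750c429c618/quasar-yzwbdi-5.quasar-yzwbdi.root.hwx.site]
"""

summary = summarize_container(INFO_TEXT)
print(summary["container_state"], dict(summary["replica_states"]))
```

For the output in this issue, the summary shows two CLOSING replicas and one UNHEALTHY replica, with the UNHEALTHY one originating from the datanode named in the SCM log entry.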

          People

            Assignee: Unassigned
            Reporter: Jyotirmoy Sinha (jyosin)