Apache Ozone / HDDS-3066

SCM startup failed during loading containers from DB


Details

    Description

       This happens because the pipeline scrubber closed a pipeline, removed it from the DB, and triggered close container commands to move its containers to CLOSING. If SCM is restarted before the close container command is handled and the container state is changed to CLOSING, the issue below can occur.

       

      This can also happen in other scenarios, for example when safeModeHandler calls finalizeAndDestroyPipeline and SCM is then restarted.

       

      The root cause is that the pipeline has been removed from the DB while the container is still in OPEN state. When SCM loads containers at startup and tries to look up the pipeline for such a container, it crashes with PipelineNotFoundException.
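
      To illustrate the failure mode and one possible way to tolerate it, here is a minimal, self-contained sketch. It is not the actual Ozone code or the committed fix: ContainerLoadSketch, PipelineMap, Container and the decision to move the orphaned container to CLOSING are simplified, hypothetical stand-ins chosen for illustration. Only the idea is taken from the description above: an OPEN container whose pipeline is missing should not abort startup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of SCM container loading; not the Ozone classes.
final class ContainerLoadSketch {

  enum State { OPEN, CLOSING, CLOSED }

  static final class Container {
    final long id;
    final String pipelineId;
    State state;

    Container(long id, String pipelineId, State state) {
      this.id = id;
      this.pipelineId = pipelineId;
      this.state = state;
    }
  }

  static final class PipelineNotFoundException extends Exception {
    PipelineNotFoundException(String message) {
      super(message);
    }
  }

  // Stand-in for the in-memory pipeline-to-containers map.
  static final class PipelineMap {
    final Map<String, Set<Long>> containersByPipeline = new HashMap<>();

    void addPipeline(String pipelineId) {
      containersByPipeline.put(pipelineId, new HashSet<>());
    }

    void addContainer(String pipelineId, long containerId)
        throws PipelineNotFoundException {
      Set<Long> containers = containersByPipeline.get(pipelineId);
      if (containers == null) {
        // This is the condition that aborts startup in the trace below.
        throw new PipelineNotFoundException(
            "PipelineID=" + pipelineId + " not found");
      }
      containers.add(containerId);
    }
  }

  // Re-attaches OPEN containers to their pipelines. An OPEN container whose
  // pipeline no longer exists is moved to CLOSING instead of failing the load.
  // Returns the IDs of the containers that were moved to CLOSING.
  static List<Long> loadExistingContainers(List<Container> fromDb,
                                           PipelineMap pipelines) {
    List<Long> movedToClosing = new ArrayList<>();
    for (Container container : fromDb) {
      if (container.state != State.OPEN) {
        continue; // only OPEN containers are tracked against a pipeline
      }
      try {
        pipelines.addContainer(container.pipelineId, container.id);
      } catch (PipelineNotFoundException e) {
        // Assumption for this sketch: finish the close that was interrupted.
        container.state = State.CLOSING;
        movedToClosing.add(container.id);
      }
    }
    return movedToClosing;
  }

  public static void main(String[] args) {
    PipelineMap pipelines = new PipelineMap();
    pipelines.addPipeline("p1");

    List<Container> fromDb = new ArrayList<>();
    fromDb.add(new Container(1L, "p1", State.OPEN));
    fromDb.add(new Container(2L, "removed-pipeline", State.OPEN));

    System.out.println("Moved to CLOSING: "
        + loadExistingContainers(fromDb, pipelines));
  }
}
```

      In this sketch the missing pipeline is handled per container, so one orphaned container cannot prevent the remaining containers from being loaded. Whether the container should go to CLOSING, or be handled some other way, is a design decision for the real fix.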


       2020-02-21 13:57:34,888 [main] ERROR org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SCM start failed with exception
       org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e not found
           at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:133)
           at org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.addContainerToPipeline(PipelineStateMap.java:110)
           at org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.addContainerToPipeline(PipelineStateManager.java:59)
           at org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.addContainerToPipeline(SCMPipelineManager.java:309)
           at org.apache.hadoop.hdds.scm.container.SCMContainerManager.loadExistingContainers(SCMContainerManager.java:121)
           at org.apache.hadoop.hdds.scm.container.SCMContainerManager.<init>(SCMContainerManager.java:107)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:412)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:283)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:215)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:612)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:142)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:117)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:66)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:42)
           at picocli.CommandLine.execute(CommandLine.java:1173)
           at picocli.CommandLine.access$800(CommandLine.java:141)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
           at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
           at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
           at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
           at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
           at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
           at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
           at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:55)
       2020-02-21 13:57:34,892 [shutdown-hook-0] INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
       /************************************************************
       SHUTDOWN_MSG: Shutting down StorageContainerManager at om-ha-1.vpc.cloudera.com/10.65.51.49
       ************************************************************/



            People

              Assignee: Bharat Viswanadham
              Reporter: Bharat Viswanadham

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m