Uploaded image for project: 'Apache Ozone'
  1. Apache Ozone
  2. HDDS-5726

Skip remove for already removed pipeline.

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Suspicious logs seen while executing decommission on the datanode side:

       

      [ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline request" *
      ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
      ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
      ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1
      ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
      

      There are duplicate pipeline close commands received on a datanode. So it results in a succeeded pipeline close and a failed one.

      I checked the log of scm and found that one is from the StartAdminOnNodeForStartDatanodeAdminHandler and one from PipelineReportForPipelineReportHandler.

      Because the pipeline is already closed by decommission and the pipeline report is sent before it happens, so there is no such pipeline on the scm side, then scm delivers a second close pipeline command to datanode.

       

       

      logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 62687ea7-7043-4b7c-889a-6a27b1586df9
      logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc
      logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
      

       

      2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found
      2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering [event=Datanode_Command] to executor/handler DatanodeCommandForSCMNodeManager: CommandForDatanode
      

      So we should check for possible duplicate close commands on the datanode side.

       

       

       

       

       

       

       

       

       

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            markgui Mark Gui
            markgui Mark Gui
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Issue deployment