Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
Suspicious logs seen while executing decommission on the datanode side:
[ozoneadmin@3d6bc06ffe3d logs]$ grep -nr "Received SCM close pipeline request" * ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021548:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021550:2021-09-07 06:56:48,927 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3 ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021757:2021-09-07 06:56:50,006 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 ozone-ozoneadmin-datanode-3d6bc06ffe3d.log.1:1021758:2021-09-07 06:56:50,007 [EndpointStateMachine task thread for /17.16.10.51:9861 - 0 ] DEBUG org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: Received SCM close pipeline request PipelineID=98792470-c118-4462-8978-e4edf9b38ba3
There are duplicate pipeline close commands received on a datanode. So it results in a succeeded pipeline close and a failed one.
I checked the log of scm and found that one is from the StartAdminOnNodeForStartDatanodeAdminHandler and one from PipelineReportForPipelineReportHandler.
Because the pipeline is already closed by decommission and the pipeline report is sent before it happens, so there is no such pipeline on the scm side, then scm delivers a second close pipeline command to datanode.
logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70775:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 62687ea7-7043-4b7c-889a-6a27b1586df9 logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70777:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 160bb7aa-dc9b-4817-9910-05fb20c7b2fc logs/ozone-ozoneadmin-scm-3d6bc06ffe3d.log:70779:2021-09-07 06:56:36,938 [EventQueue-StartAdminOnNodeForStartDatanodeAdminHandler] INFO org.apache.hadoop.hdds.scm.pipeline.RatisPipelineProvider: Send pipeline:PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 close command to datanode 9e5d6a30-9149-4d8b-9455-e48db254bfa5
2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] INFO org.apache.hadoop.hdds.scm.pipeline.PipelineReportHandler: Reported pipeline PipelineID=c21f0d3e-a62d-4d34-97a5-9e95b8fbf9f1 is not found 2021-09-07 06:56:48,925 [EventQueue-PipelineReportForPipelineReportHandler] DEBUG org.apache.hadoop.hdds.server.events.EventQueue: Delivering [event=Datanode_Command] to executor/handler DatanodeCommandForSCMNodeManager: CommandForDatanode
So we should check for possible duplicate close commands on the datanode side.
Attachments
Issue Links
- is duplicated by
-
HDDS-7784 ClosePipelineCommandHandler should not process pipeline close command if the pipeline is not present
- Resolved
- links to