Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Issue:
In one of the Cloudera system tests, two Datanodes are scheduled for decommission after the data write completes and the data pipeline is closed.
The LEADER node received the scheduled decommission command, as the test expected, but the FOLLOWER never received the decommission.
Summary logs :
Follower
19:58:04,931 : persistedOpState: DECOMMISSIONING, the value stored in SCM (IN_SERVICE, 0)
19:58:10,016 : persistedOpState: IN_SERVICE, the value stored in SCM (DECOMMISSIONING, 0)
Leader: TimeOut
2023-07-20 19:38:31,689 : persistedOpState: IN_SERVICE, the value stored in SCM (DECOMMISSIONING, 0)
...... multiple retries .......
2023-07-20 19:55:54,323 : persistedOpState: IN_SERVICE, the value stored in SCM (DECOMMISSIONING, 0)
2023-07-20 19:56:24,344 : persistedOpState: IN_SERVICE, the value stored in SCM (DECOMMISSIONING, 0)
2023-07-20 19:58:04,931 : persistedOpState: DECOMMISSIONING, the value stored in SCM (IN_SERVICE, 0)
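To compare the reported and stored states across the attached logs without reading every line, a small script along these lines may help. It is only a sketch: the regular expression is derived from the message text quoted in this issue, and the name `parse_op_states` is hypothetical, not part of Ozone.

```python
import re

# Pattern based on the SCMNodeManager messages quoted in this issue:
# "persistedOpState: <reported> ... the value stored in SCM (<stored>, <expiry>)".
# "persistedOpStateExpiryEpochSec" is skipped because the pattern needs ": "
# directly after "persistedOpState".
STATE_RE = re.compile(
    r"persistedOpState: (?P<reported>\w+)"
    r".*?the value stored in SCM \((?P<stored>\w+), \d+\)"
)


def parse_op_states(text):
    """Return a list of (reported, stored) pairs found in the given log text."""
    return [(m.group("reported"), m.group("stored"))
            for m in STATE_RE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "19:58:04,931 : persistedOpState: DECOMMISSIONING, "
        "the value stored in SCM (IN_SERVICE, 0)"
    )
    for reported, stored in parse_op_states(sample):
        print(f"datanode reported {reported}, SCM stored {stored}")
```

The same function works on the full INFO lines, since it only looks at the `persistedOpState` field and the trailing `(state, expiry)` pair.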
Detailed logs :
FOLLOWER
2023-07-20 19:58:04,931 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Update the operationalState saved in follower SCM for 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: 70976812254805668, persistedOpState: DECOMMISSIONING, persistedOpStateExpiryEpochSec: 0} as the reported value does not match the value stored in SCM (IN_SERVICE, 0)
2023-07-20 19:58:10,016 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Update the operationalState saved in follower SCM for 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: 70976812254805668, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} as the reported value does not match the value stored in SCM (DECOMMISSIONING, 0)
LEADER
2023-07-20 19:56:24,344 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Scheduling a command to update the operationalState persisted on 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: 70976812254805668, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0} as the reported value does not match the value stored in SCM (DECOMMISSIONING, 0)
2023-07-20 19:58:04,931 INFO org.apache.hadoop.hdds.scm.node.SCMNodeManager: Scheduling a command to update the operationalState persisted on 33c95701-aaa5-4b08-a56b-70ac5d237187{ip: 172.27.12.66, host: quasar-zqlpfe-5.quasar-zqlpfe.root.hwx.site, ports: [REPLICATION=9886, RATIS=9858, RATIS_ADMIN=9857, RATIS_SERVER=9856, STANDALONE=9859], networkLocation: /default-rack, certSerialId: 70976812254805668, persistedOpState: DECOMMISSIONING, persistedOpStateExpiryEpochSec: 0} as the reported value does not match the value stored in SCM (IN_SERVICE, 0)
Please find the SCM logs attached for more details.
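The two message texts in the logs suggest that a leader and a follower SCM handle a mismatch differently: the leader schedules a command so the datanode adopts the SCM value, while the follower overwrites its own saved value with whatever the datanode reported. The sketch below models only that reading of the log messages. `ScmView` and `reconcile` are hypothetical names, and this is not the actual `SCMNodeManager` code.

```python
from dataclasses import dataclass


@dataclass
class ScmView:
    """Hypothetical stand-in for one SCM's record of a single datanode."""
    is_leader: bool
    stored_state: str


def reconcile(scm, reported_state):
    """Return the action implied by the log messages for one datanode report."""
    if reported_state == scm.stored_state:
        return "none"
    if scm.is_leader:
        # Leader message: "Scheduling a command to update the
        # operationalState persisted on <datanode>".
        return f"command datanode -> {scm.stored_state}"
    # Follower message: "Update the operationalState saved in follower SCM".
    scm.stored_state = reported_state
    return f"follower stored -> {reported_state}"


if __name__ == "__main__":
    follower = ScmView(is_leader=False, stored_state="IN_SERVICE")
    # The two follower reports from the summary logs, in order.
    for reported in ("DECOMMISSIONING", "IN_SERVICE"):
        print(reconcile(follower, reported))
```

Under this assumption, replaying the two follower reports from the summary (DECOMMISSIONING at 19:58:04, then IN_SERVICE at 19:58:10) flips the follower's saved state to DECOMMISSIONING and back again, which matches the pair of follower log lines.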