[HDDS-3067] Fix Bug in Scrub Pipeline causing destroyed pipelines after SCM restart - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.5.0
Component/s: SCM
Labels:
- OMHATest
- pull-request-available

Target Version/s: 1.0.0

Description

Currently, the scrubber runs as part of pipeline creation.

When SCM is started, the scrubber comes up and cleans up all the containers in SCM. This happens because, when pipelines are loaded, the pipelineCreationTimeStamp is set to the time the pipeline was originally created, not the time SCM started.

Because of this, the condition below is satisfied and all the pipelines are destroyed when SCM is restarted. This is easy to reproduce: start SCM, wait for 10 minutes, and restart SCM.

List<Pipeline> needToSrubPipelines = stateManager.getPipelines(type, factor,
    Pipeline.PipelineState.ALLOCATED).stream()
    .filter(p -> currentTime.toEpochMilli() - p.getCreationTimestamp()
        .toEpochMilli() >= pipelineScrubTimeoutInMills)
    .collect(Collectors.toList());
for (Pipeline p : needToSrubPipelines) {
  LOG.info("srubbing pipeline: id: " + p.getId().toString() +
      " since it stays at ALLOCATED stage for " +
      Duration.between(currentTime, p.getCreationTimestamp()).toMinutes() +
      " mins.");
  finalizeAndDestroyPipeline(p, false);
}
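
The effect of the filter on a restarted SCM can be illustrated with a small standalone sketch. It does not use the Ozone classes: ScrubConditionSketch, AllocatedPipeline and needToScrub are hypothetical names, and the 10-minute timeout and the timestamps are assumed values chosen only for illustration. A pipeline that keeps its original creation time passes the filter on the first scrub after restart, while one whose timestamp is taken at load time does not. Resetting the timestamp on load is shown only as one possible remedy, not necessarily the change made for this issue.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ScrubConditionSketch {

  // Hypothetical stand-in for an ALLOCATED pipeline; not the Ozone Pipeline class.
  static final class AllocatedPipeline {
    final String id;
    final Instant creationTimestamp;

    AllocatedPipeline(String id, Instant creationTimestamp) {
      this.id = id;
      this.creationTimestamp = creationTimestamp;
    }
  }

  // Applies the same age comparison as the filter in the snippet above.
  static List<AllocatedPipeline> needToScrub(List<AllocatedPipeline> allocated,
      Instant currentTime, long scrubTimeoutMillis) {
    return allocated.stream()
        .filter(p -> currentTime.toEpochMilli()
            - p.creationTimestamp.toEpochMilli() >= scrubTimeoutMillis)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-02-20T12:42:18Z");
    long timeout = Duration.ofMinutes(10).toMillis(); // assumed timeout

    // Loaded after restart with its original creation time (1003 minutes old).
    AllocatedPipeline reloaded =
        new AllocatedPipeline("reloaded", now.minus(Duration.ofMinutes(1003)));
    // Same kind of pipeline, but with the timestamp reset at load time (assumption).
    AllocatedPipeline reset = new AllocatedPipeline("reset", now);

    for (AllocatedPipeline p : needToScrub(Arrays.asList(reloaded, reset), now, timeout)) {
      System.out.println("would scrub: " + p.id); // prints "would scrub: reloaded"
    }
  }
}
```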

Log showing scrubbing of pipeline

2020-02-20 12:42:18,946 [RatisPipelineUtilsThread] INFO org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: srubbing pipeline: id: PipelineID=35dff62d-9bfa-449b-b6e8-6f00cc8c1b6e since it stays at ALLOCATED stage for -1003 mins.
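
The negative value in the log line (-1003 mins) is consistent with the argument order in the log statement: Duration.between(start, end) is negative when end is earlier than start, and the snippet passes currentTime first and the creation timestamp second. A minimal sketch, with the hypothetical class ScrubDurationSketch and assumed timestamps, shows both orders:

```java
import java.time.Duration;
import java.time.Instant;

final class ScrubDurationSketch {

  // Same argument order as the log statement: (currentTime, creationTimestamp).
  static long minutesAsLogged(Instant currentTime, Instant creationTimestamp) {
    return Duration.between(currentTime, creationTimestamp).toMinutes();
  }

  // Swapped order: time elapsed since creation, positive for past timestamps.
  static long minutesElapsed(Instant currentTime, Instant creationTimestamp) {
    return Duration.between(creationTimestamp, currentTime).toMinutes();
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-02-20T12:42:18Z");             // assumed
    Instant creation = now.minus(Duration.ofMinutes(1003));          // assumed

    System.out.println(minutesAsLogged(now, creation)); // prints -1003
    System.out.println(minutesElapsed(now, creation));  // prints 1003
  }
}
```

Swapping the arguments only affects the message; it does not change which pipelines the filter selects.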

Issue Links

fixes: HDDS-3004 OM HA stability issues (Resolved)

links to: GitHub Pull Request #598

People

Assignee: Bharat Viswanadham

Reporter: Bharat Viswanadham

Votes: 0

Watchers: 3

Dates

Created: 24/Feb/20 23:50

Updated: 02/Mar/20 21:06

Resolved: 28/Feb/20 04:16

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 20m