[SOLR-11458] Bugs in MoveReplicaCmd handling of failures - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 7.0, 7.0.1, 7.1, 8.0
Fix Version/s: 7.2, 8.0
Component/s: None
Labels: None

Description

There's a section of code in moveNormalReplica that ensures we don't lose a shard leader during a move. There's no corresponding protection in moveHdfsReplica, which means that moving a replica that is also a shard leader may lead to data loss (e.g. when replicationFactor=1).
Also, there's no rollback strategy when moveHdfsReplica partially fails. In moveNormalReplica a failure simply skips deleting the original replica, but moveHdfsReplica has no equivalent, so it seems the code should attempt to restore the original replica in this case. When RF=1 and such a failure occurs, not restoring the original replica means a lost shard.
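
The rollback idea can be sketched as follows. This is only an illustration, not the attached patch and not Solr code: it assumes the HDFS-style move removes the original replica (keeping its data on shared storage) before creating the new one, and `ReplicaOps`, `HdfsMoveSketch` and `FakeCluster` are hypothetical names invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for cluster operations; not a Solr API. */
interface ReplicaOps {
    void addReplica(String shard, String node);
    void deleteReplicaKeepData(String shard, String node);
}

final class HdfsMoveSketch {
    /**
     * Moves a replica whose data lives on shared storage.
     * Returns true if the move succeeded, false if it failed and the
     * original replica was restored.
     */
    static boolean moveWithRollback(ReplicaOps ops, String shard,
                                    String source, String target) {
        // Assumption: the original replica is removed first, data is kept.
        ops.deleteReplicaKeepData(shard, source);
        try {
            ops.addReplica(shard, target);
            return true;
        } catch (RuntimeException addFailure) {
            // Rollback: re-create the replica on the source node so that a
            // shard with replicationFactor=1 is not left without replicas.
            ops.addReplica(shard, source);
            return false;
        }
    }
}

/** In-memory fake used only to exercise the sketch. */
final class FakeCluster implements ReplicaOps {
    final Set<String> replicas = new HashSet<>();
    final Set<String> failingNodes = new HashSet<>();

    @Override
    public void addReplica(String shard, String node) {
        if (failingNodes.contains(node)) {
            throw new IllegalStateException("cannot create replica on " + node);
        }
        replicas.add(shard + "@" + node);
    }

    @Override
    public void deleteReplicaKeepData(String shard, String node) {
        replicas.remove(shard + "@" + node);
    }
}
```

With a `FakeCluster` holding only `shard1@nodeA` and `nodeB` marked as failing, `moveWithRollback(cluster, "shard1", "nodeA", "nodeB")` returns false and leaves `shard1@nodeA` in place, whereas without the catch block the shard would end up with no replicas at all.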

Attachments

SOLR-11458.diff, 05/Dec/17 16:55, 45 kB, Andrzej Bialecki
SOLR-11458.diff, 05/Dec/17 17:44, 46 kB, Andrzej Bialecki

Issue Links

is blocked by: SOLR-11661 New HDFS collection reuses unremoved data from a deleted HDFS collection with same name causes inconsistent view of documents (Closed)

People

Assignee: Andrzej Bialecki

Reporter: Andrzej Bialecki

Votes: 0

Watchers: 6

Dates

Created: 10/Oct/17 10:49

Updated: 08/Jun/19 15:13

Resolved: 06/Dec/17 14:34