[CASSANDRA-8801] Decommissioned nodes are willing to rejoin the cluster if restarted - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0 alpha 1
Component/s: None
Labels: None
Severity: Normal

Description

This issue comes from the Cassandra user group.

If a node that was successfully decommissioned is restarted with its data directory intact, it rejoins the cluster immediately, going to UN and beginning to serve client requests.

This is wrong: the node has consistency issues, having missed every write made while it was offline, because no hints were being kept for it. Even in the best-case scenario (the problem is spotted and remediated immediately), near-100% overstreaming will still occur.

Also, whatever reasons the operator had for decommissioning the node would presumably still be valid, so this action may threaten cluster stability if the node is underpowered or suffering hardware issues.

But what elevates this to critical is that if the node has been offline longer than gc_grace_seconds, it may cause permanent and unrecoverable consistency issues through data resurrection.
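To make the gc_grace_seconds condition concrete, here is a minimal sketch of the comparison involved. The class and method names are hypothetical and are not part of Cassandra; the only fact borrowed from Cassandra is the default gc_grace_seconds of 864000 (10 days), which individual tables may override.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only; not Cassandra code.
final class ResurrectionRisk {
    // Cassandra's default gc_grace_seconds (10 days); tables may override it.
    static final long DEFAULT_GC_GRACE_SECONDS = 864_000L;

    // Returns true when the node was offline for longer than gc_grace_seconds.
    // Past that point, tombstones for deletes the node missed may already have
    // been purged on the other replicas, so the node's stale copies can
    // reappear as live data.
    static boolean mayResurrectData(Duration offline, long gcGraceSeconds) {
        return offline.getSeconds() > gcGraceSeconds;
    }
}
```

Under these assumptions, a node that was down for 3 days with the default setting is below the threshold, while one that was down for 11 days is above it.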

Recommendation:

A node should remember that it was decommissioned and refuse to rejoin a cluster without at least a -Dflag forcing it to.
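One way the recommendation could look is sketched below. This is not the attached patch: the marker file name, the system property name, and the class itself are all hypothetical, chosen only to illustrate "remember the decommission, refuse to join unless forced".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Cassandra code and not the fix attached to this issue.
final class DecommissionGuard {
    // Hypothetical marker file name and override property name.
    static final String MARKER_FILE = "DECOMMISSIONED";
    static final String OVERRIDE_PROPERTY = "example.override_decommission";

    // Persist the fact that this node was decommissioned, so it survives a restart.
    static void recordDecommission(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        Path marker = dataDir.resolve(MARKER_FILE);
        if (!Files.exists(marker)) {
            Files.createFile(marker);
        }
    }

    static boolean wasDecommissioned(Path dataDir) {
        return Files.exists(dataDir.resolve(MARKER_FILE));
    }

    // Called at startup before joining the ring: refuse unless the operator
    // explicitly passed -Dexample.override_decommission=true.
    static void checkMayJoin(Path dataDir) {
        boolean forced = Boolean.getBoolean(OVERRIDE_PROPERTY);
        if (wasDecommissioned(dataDir) && !forced) {
            throw new IllegalStateException(
                "This node was decommissioned and will not rejoin the cluster; "
                + "set -D" + OVERRIDE_PROPERTY + "=true to force it.");
        }
    }
}
```

The point of the design is that the default is the safe behavior: a restart with an intact data directory fails loudly, and rejoining requires a deliberate operator action.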

Attachments

8801.txt, 17/Apr/15 14:33, 4 kB, Brandon Williams
8801-v2.txt, 29/Apr/15 15:45, 4 kB, Carl Yeksigian

Issue Links

is related to

CASSANDRA-5780 nodetool status and ring report incorrect/stale information after decommission

Open

links to

dtest

People

Assignee: Brandon Williams

Reporter: Eric Stevens

Authors: Brandon Williams

Reviewers: Carl Yeksigian

Tester: Jim Witschey

Votes: 1

Watchers: 8

Dates

Created: 13/Feb/15 15:03

Updated: 16/Apr/19 09:31

Resolved: 19/May/15 15:47