[CASSANDRA-18555] Expose decommission state to nodetool info - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 5.0-alpha1, 5.0
Component/s: Observability/JMX
Labels: None
Change Category: Operability
Complexity: Normal
Platform: All
Impacts: None
Source Control Link: https://github.com/apache/cassandra/commit/e2a6c99310aa93ba3506ca8f603ae1039372f533
Test and Documentation Plan: added jvm dtest + ci

Description

Currently, if a failure occurs while a node is being decommissioned, an exception is thrown back to the caller.

However, decommissioning a Cassandra node takes considerable time, ranging from minutes to hours to days. There are several scenarios in which the caller may need to probe the status again:

The caller times out
It is not possible to keep the caller hanging for such a long time

If the caller does not know what happened internally, it cannot decide whether to retry, which leads to other issues.

In this ticket, I am going to add a new nodetool/JMX command that the caller can invoke at any time and that returns the correct status.
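A minimal sketch of how a caller might use such a status command instead of blocking on the decommission call itself. Everything named here is an assumption made for illustration: the `DecommissionState` values, the `wait_for_decommission` function, and the `probe` callable are hypothetical and not part of Cassandra. In practice `probe` would wrap whatever nodetool/JMX call this ticket ends up adding.

```python
import time
from enum import Enum
from typing import Callable


class DecommissionState(Enum):
    # Hypothetical states; the ticket does not list the values the real command reports.
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def wait_for_decommission(
    probe: Callable[[], DecommissionState],
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> DecommissionState:
    """Poll `probe` until it reports a terminal state or `timeout_s` elapses.

    Returns the last observed state. IN_PROGRESS means the wait timed out,
    so the caller can simply probe again later instead of hanging.
    """
    deadline = clock() + timeout_s
    while True:
        state = probe()
        if state is not DecommissionState.IN_PROGRESS:
            return state  # terminal: the caller can now decide whether to retry
        if clock() >= deadline:
            return state  # timed out while the decommission is still running
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Stand-in probe that replays a fixed sequence of states (assumption:
    # a real probe would query the node through nodetool or JMX).
    observed = iter([
        DecommissionState.IN_PROGRESS,
        DecommissionState.IN_PROGRESS,
        DecommissionState.FAILED,
    ])
    result = wait_for_decommission(lambda: next(observed), poll_interval_s=0.0)
    print(result.name)  # FAILED
```

Because the probe is a plain callable, a caller that timed out earlier can reconnect and resume polling without having to know what happened inside the node in the meantime.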

This might look like a small change, but when operating Cassandra at scale across a large fleet, the lack of it becomes a bottleneck and requires constant operator intervention.

Issue Links

relates to: CASSANDRA-18749 Expose bootstrap failure state via JMX (Resolved)

links to: GitHub Pull Request #2390

People

Assignee: Jaydeepkumar Chovatia

Reporter: Jaydeepkumar Chovatia

Authors: Jaydeepkumar Chovatia, Stefan Miklosovic

Reviewers: Brandon Williams, Maxwell Guo

Votes: 0

Watchers: 4

Dates

Created: 27/May/23 04:48

Updated: 16/Nov/23 09:57

Resolved: 20/Jun/23 20:40
