[CASSANDRA-2253] Gossiper Starvation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 0.7.3
Component/s: None
Labels:
None
Environment:

linux, windows

Severity:
Normal

Description

The Gossiper periodic task can be starved when large sstable files need to be deleted.
SSTableDeletingReference uses the same scheduledTasks pool (from StorageService) as the Gossiper and other periodic tasks, but the Gossiper task must run every second to keep the cluster status (node liveness) correct. When the sstable files to delete are large (several GB), the delete operation can take more than 30 seconds, which puts the whole cluster into a wrong state where nodes are marked as down while they are actually alive.
This leads to unnecessary additional load such as hinted handoff, a wrong cluster state, and increased latency.

One possible solution is to use separate pools for periodic and non-periodic tasks.
I've implemented this change and it resolves the problem.
I can provide a patch.
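
To illustrate the idea, here is a minimal, self-contained sketch of the starvation and of the separate-pool approach. It is not the attached patch: the class name, method name, and timings are hypothetical, a 500 ms sleep stands in for the slow sstable delete, and a counter stands in for the Gossiper task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Cassandra.
class SeparatePoolsSketch {

    /**
     * Runs one slow one-off task and a periodic "heartbeat" task, then returns
     * how many heartbeats had run by the time the slow task finished.
     */
    static int heartbeatsDuringSlowTask(boolean separatePools) throws Exception {
        // Single-threaded pool for periodic work (stand-in for scheduledTasks).
        ScheduledExecutorService periodic = Executors.newScheduledThreadPool(1);
        // One-off work either shares that pool or gets a pool of its own.
        ExecutorService oneOff = separatePools ? Executors.newSingleThreadExecutor() : periodic;

        AtomicInteger heartbeats = new AtomicInteger();
        CountDownLatch slowTaskStarted = new CountDownLatch(1);
        try {
            // Stand-in for deleting a multi-GB sstable file.
            Future<Integer> slowTask = oneOff.submit(() -> {
                slowTaskStarted.countDown();
                Thread.sleep(500);
                // Sample the counter while this task still holds its thread.
                return heartbeats.get();
            });

            // Schedule the heartbeat only once the slow task is already running.
            slowTaskStarted.await();
            periodic.scheduleWithFixedDelay(heartbeats::incrementAndGet, 0, 50, TimeUnit.MILLISECONDS);

            return slowTask.get();
        } finally {
            periodic.shutdownNow();
            if (oneOff != periodic) {
                oneOff.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool:    " + heartbeatsDuringSlowTask(false));
        System.out.println("separate pools: " + heartbeatsDuringSlowTask(true));
    }
}
```

With the shared pool the heartbeat cannot run at all while the slow task occupies the only thread, so the first call returns 0. With separate pools the heartbeat keeps firing during the slow task, which is the behavior the Gossiper needs.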

Attachments

CASSANDRA-0.7-2253.txt
28/Feb/11 23:02
16 kB
Mikael Sitruk

Activity

People

Assignee: Mikael Sitruk

Reporter: Mikael Sitruk

Authors: Mikael Sitruk

Reviewers: Jonathan Ellis

Votes: 0

Watchers: 1

Dates

Created: 27/Feb/11 16:03

Updated: 16/Apr/19 09:33

Resolved: 28/Feb/11 23:53
