Cassandra / CASSANDRA-2253

Gossiper Starvation


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • 0.7.3
    • None
    • None
    • Environment: linux, windows

    • Normal

    Description

      The Gossiper's periodic task can be starved when large SSTable files need to be deleted.
      SSTableDeletingReference uses the same scheduledTasks pool (from StorageService) as the Gossiper and the other periodic tasks, but the Gossiper task must run every second to keep the cluster state (node liveness) correct. When large SSTable files (several GB) are deleted, the delete operation can take more than 30 seconds. This puts the whole cluster into a wrong state in which live nodes are marked as dead.
      This leads to unnecessary additional load (for example, hinted handoff), an incorrect cluster state, and increased latency.
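      To see how one slow delete starves the periodic task, here is a minimal, self-contained Java sketch (not Cassandra code). It assumes the shared pool has a single worker thread; the class and method names are hypothetical, and the gossip period is shortened to 100 ms (the real Gossiper runs every second) so the demo finishes quickly.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Cassandra code: one single-threaded scheduler
// shared by a periodic "gossip" task and a slow one-off "delete" task.
public class SharedPoolStarvation {

    // Returns the largest gap (ms) observed between consecutive gossip runs.
    public static long maxGossipGapMillis(long deleteMillis) {
        ScheduledExecutorService scheduledTasks = Executors.newSingleThreadScheduledExecutor();
        List<Long> gossipRuns = new CopyOnWriteArrayList<>();

        // "Gossip" every 100 ms; each run just records its start time.
        scheduledTasks.scheduleWithFixedDelay(
                () -> gossipRuns.add(System.nanoTime()), 0, 100, TimeUnit.MILLISECONDS);
        // A one-off task that blocks the only worker thread, like a multi-GB delete.
        scheduledTasks.schedule(() -> sleepQuietly(deleteMillis), 250, TimeUnit.MILLISECONDS);

        sleepQuietly(deleteMillis + 1000);
        scheduledTasks.shutdownNow();
        return maxGapMillis(gossipRuns);
    }

    // Largest difference between consecutive timestamps, converted to ms.
    static long maxGapMillis(List<Long> nanos) {
        long max = 0;
        for (int i = 1; i < nanos.size(); i++) {
            max = Math.max(max, nanos.get(i) - nanos.get(i - 1));
        }
        return TimeUnit.NANOSECONDS.toMillis(max);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("max gossip gap: " + maxGossipGapMillis(1000) + " ms");
    }
}
```

      Because the pool has one thread, no gossip run can start while the delete is sleeping, so the largest gap is at least as long as the delete itself; with a 30-second delete, the same effect would far exceed the one-second gossip interval.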

      One possible solution is to use separate pools for periodic and non-periodic tasks.
      I've implemented such a change, and it resolves the problem.
      I can provide a patch.
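      The proposed split can be sketched the same way: one scheduler for periodic tasks and another for one-off tasks such as SSTable deletion. This is only an illustration under the same assumptions (single-threaded pools, hypothetical names, a 100 ms period instead of one second); it is not the attached patch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the proposed fix, not the attached patch:
// periodic and one-off tasks each get their own single-threaded scheduler.
public class SeparatePools {

    // Returns the largest gap (ms) observed between consecutive gossip runs.
    public static long maxGossipGapMillis(long deleteMillis) {
        ScheduledExecutorService periodicTasks = Executors.newSingleThreadScheduledExecutor();
        ScheduledExecutorService nonPeriodicTasks = Executors.newSingleThreadScheduledExecutor();
        List<Long> gossipRuns = new CopyOnWriteArrayList<>();

        // "Gossip" every 100 ms on the periodic pool.
        periodicTasks.scheduleWithFixedDelay(
                () -> gossipRuns.add(System.nanoTime()), 0, 100, TimeUnit.MILLISECONDS);
        // The slow delete now blocks only the non-periodic pool's thread.
        nonPeriodicTasks.schedule(() -> sleepQuietly(deleteMillis), 250, TimeUnit.MILLISECONDS);

        sleepQuietly(deleteMillis + 1000);
        periodicTasks.shutdownNow();
        nonPeriodicTasks.shutdownNow();
        return maxGapMillis(gossipRuns);
    }

    // Largest difference between consecutive timestamps, converted to ms.
    static long maxGapMillis(List<Long> nanos) {
        long max = 0;
        for (int i = 1; i < nanos.size(); i++) {
            max = Math.max(max, nanos.get(i) - nanos.get(i - 1));
        }
        return TimeUnit.NANOSECONDS.toMillis(max);
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("max gossip gap: " + maxGossipGapMillis(1000) + " ms");
    }
}
```

      With the work split this way, the gaps between gossip runs stay close to the configured period even while the long delete runs, because the delete no longer occupies the thread that executes the periodic task.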

      Attachments

        1. CASSANDRA-0.7-2253.txt
          16 kB
          Mikael Sitruk


          People

            Mikael Sitruk
            Jonathan Ellis
            Votes: 0
            Watchers: 1


              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified