  1. Cassandra
  2. CASSANDRA-4650

RangeStreamer should be smarter when picking endpoints for streaming in case of N >=3 in each DC.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 4.0-alpha1, 4.0
    • None

    Description

      The getRangeFetchMap method in RangeStreamer should pick unique nodes to stream data from when the number of replicas (N) in each DC is three or more.
      When N >= 3 in a DC, there are two options for streaming each range. Consider an example of 4 nodes in one datacenter and a replication factor of 3.
      If a node goes down, it needs to recover 3 ranges of data. With the current code, only two nodes may be selected as sources, because candidates are ordered by proximity and the closest one is chosen for each range.
      Ideally, we want to select 3 nodes for streaming the data. We can do this by selecting a unique node for each range.
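
The 4-node, RF=3 example above can be sketched in code. The following is a minimal illustration, not the RangeStreamer implementation: the class name UniqueSourcePicker, the methods pickClosest and pickUnique, the node names A to D, and the range labels are all hypothetical. It assumes each range arrives with a non-empty candidate list already sorted by proximity, and it uses simple augmenting-path bipartite matching as one possible way to give each range its own source.

```java
import java.util.*;

// Hypothetical sketch: names and structure are illustrative, not Cassandra's API.
class UniqueSourcePicker {

    // Baseline behaviour described in the ticket: take the closest candidate per range.
    static Map<String, String> pickClosest(Map<String, List<String>> candidates) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : candidates.entrySet())
            result.put(e.getKey(), e.getValue().get(0));
        return result;
    }

    // Proposed behaviour: give each range a distinct source where possible,
    // using augmenting-path bipartite matching (ranges on one side, sources on the other).
    static Map<String, String> pickUnique(Map<String, List<String>> candidates) {
        Map<String, String> rangeBySource = new HashMap<>();
        for (String range : candidates.keySet())
            tryAssign(range, candidates, rangeBySource, new HashSet<>());

        Map<String, String> sourceByRange = new HashMap<>();
        rangeBySource.forEach((source, range) -> sourceByRange.put(range, source));

        Map<String, String> result = new LinkedHashMap<>();
        for (String range : candidates.keySet()) {
            // Assumption: if no unique source exists, fall back to the closest candidate.
            String fallback = candidates.get(range).get(0);
            result.put(range, sourceByRange.getOrDefault(range, fallback));
        }
        return result;
    }

    // Try to give 'range' a source, re-homing an already assigned range if needed.
    private static boolean tryAssign(String range,
                                     Map<String, List<String>> candidates,
                                     Map<String, String> rangeBySource,
                                     Set<String> visited) {
        for (String source : candidates.get(range)) {
            if (!visited.add(source))
                continue; // already considered on this augmenting path
            String current = rangeBySource.get(source);
            if (current == null || tryAssign(current, candidates, rangeBySource, visited)) {
                rangeBySource.put(source, range);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Node D is recovering; each of its 3 ranges has two other replicas,
        // listed closest first (assumed proximity order: A, B, C).
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("range1", Arrays.asList("B", "C"));
        candidates.put("range2", Arrays.asList("A", "C"));
        candidates.put("range3", Arrays.asList("A", "B"));

        System.out.println("closest-first: " + pickClosest(candidates));
        System.out.println("unique:        " + pickUnique(candidates));
    }
}
```

With these inputs, closest-first selection streams from only two distinct nodes (B once and A twice), while the matching-based selection uses all three (B, C, and A). A plain greedy pass that skips already used sources would not be enough here, because the third range would find both of its candidates taken; that is why the sketch re-homes earlier assignments.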

      Advantages:
      This speeds up bootstrapping a node, because its ranges are streamed from more sources, and puts less pressure on each node serving the data.

      Note: This does not change behavior when N < 3 in each DC, because data is then streamed from only 2 nodes.

          People

            kohlisankalp Sankalp Kohli
            kohlisankalp Sankalp Kohli
            Sankalp Kohli
            Marcus Eriksson
            Votes: 0
            Watchers: 10

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified