[CASSANDRA-15774] Improve range reads to query by endpoints instead of vnodes to reduce number of remote requests - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Normal
Resolution: Unresolved
Fix Version/s: 5.x
Component/s: Legacy/Coordination
Labels:
- performance
- read

Change Category:
Performance
Complexity:
Challenging
Platform:

All
Impacts:

None

Description

Currently, range read queries in batches, see StorageProxy.RangeCommandIterator#sendNextRequests(). For each batch, it computes a list of merged vnode ranges up to concurrency factor and query each merged vnode range asynchronously. (note: consecutive vnode ranges can be merged if they share enough replicas to satisfy consistency level requirement)

This works fine in general, but when concurrency factor is high because returned row count is small comparing to query limit or index filtering is used, coordinator may send too many concurrent remote range requests in a batch.

We can improve it by grouping remote range requests by endpoints where each endpoint will return response corresponding to multiple non-consecutive ranges. With endpoint grouping, number of remote range requests should largely reduced and it's always capped by number of nodes in the cluster instead of number of ranges which is capped by concurrency factor.

Let's look at an example on a 5-node cluster with 10 ranges(a,b,c,d,e,f,g,h,i,h) and rf3.

Following is the range to replica mapping using round robin that should work well with consecutive range merger (consecutive range merger doesn't work well with fully random replica mapping, because it's less likely to have overlapping replicas for consecutive ranges)

   range-a replicas: 1, 2, 3
   range-b replicas: 2, 3, 4
   range-c replicas: 3, 4, 5
   range-d replicas: 1, 4, 5
   range-e replicas: 1, 2, 5
   range-f replicas: 1, 2, 3
   range-g replicas: 2, 3, 4
   range-h replicas: 3, 4, 5
   range-i replicas: 1, 4, 5
   range-j replicas: 1, 2, 5

With default range read implementation and consecutive range merger, we need 10 replica read requests(2 for each merged range) for quorum:

     range (a,b] on node [2, 3]
     range (c,d] on node [4, 5]
     range (e,f] on node [1, 2]
     range (g,h] on node [3, 4]
     range (i,j] on node [1, 5]

With group query by endpoints, we only need 4 replica read requests for quorum:

    * node 1: a, d, e, f, i, j
    * node 2: a, b, e, f, g, j
    * node 3: b, c, g, h
    * node 4: c, d, h, i

Note that there are some complexities around short-read protection which needs to know whether replica has more rows available for current range.

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Zhao Yang

Votes:: 1 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 29/Apr/20 16:28

Updated:: 07/Mar/23 10:53