  1. Apache Cassandra
  2. CASSANDRA-18515

Optimize Initial Concurrency Selection for Range Read Algorithm During SAI Queries

Details

    • Improvement
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 5.x
    • Feature/2i Index
    • None

    Description

      The range read algorithm relies on the Index API’s notion of estimated result rows to decide how many replicas to contact in parallel during its first round of requests. The more results expected from a replica for a token range, the fewer replicas the range read will initially try to contact. Like SASI, SAI floors that estimate to a huge negative number to make sure it’s selected over other indexes, and this floors the concurrency factor to 1. The actual formula looks like this:

      // resultsPerRange, from SAI, is a giant negative number
      concurrencyFactor = Math.max(1, Math.min(ranges.rangeCount(), (int) Math.ceil(command.limits().count() / resultsPerRange)));
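To make the flooring effect concrete, here is a minimal, standalone sketch of the same arithmetic. It is not the Cassandra source: the class name, method name, parameter types, and the choice of `Long.MIN_VALUE` as the "giant negative number" are assumptions made for illustration.

```java
public final class InitialConcurrencyDemo {
    // Mirrors the formula quoted above. The signature is hypothetical;
    // resultsPerRange is assumed to be a float here.
    static int initialConcurrencyFactor(int limit, int rangeCount, float resultsPerRange) {
        return Math.max(1, Math.min(rangeCount, (int) Math.ceil(limit / resultsPerRange)));
    }

    public static void main(String[] args) {
        // Assumption: the index reports Long.MIN_VALUE as its estimate.
        float hugeNegativeEstimate = Long.MIN_VALUE;

        // limit / estimate is a tiny negative number, ceil() gives -0.0,
        // the int cast gives 0, and Math.max(1, ...) lifts it to 1.
        System.out.println(initialConcurrencyFactor(10, 3, hugeNegativeEstimate)); // prints 1

        // With a small positive estimate (2 rows per range), the same query
        // would be allowed to contact all 3 ranges in the first round.
        System.out.println(initialConcurrencyFactor(10, 3, 2.0f)); // prints 3
    }
}
```

The number of ranges never matters in the first case: the negative estimate always drives the inner term to 0, so the outer `Math.max` decides the result.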
      

      Although that concurrency factor is updated as actual results stream in, always sending only a single range request to a single replica in the first round is not ideal for SAI. For example, assume I have a 3-node cluster and a keyspace at RF=1, with 10 rows spread across the 3 nodes, without vnodes. Issuing a query that matches all 10 rows with a LIMIT of 10 will take 2 or 3 serial rounds of range requests from the coordinator to reach all 3 nodes.

      This can be fixed by allowing indexes to bypass the initial concurrency calculation, so that SAI queries can contact the entire ring in a single round of queries or, at worst, in the minimum number of rounds permitted by the existing maximum number of ranges per round.
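The effect of such a bypass can be sketched as follows. This is an illustration only, not the change that was committed: the class, both method names, and the cap of 32 ranges per round are hypothetical, and the round count assumes every range must be read at a fixed concurrency (no early exit once the LIMIT is satisfied).

```java
public final class BypassConcurrencySketch {
    // Hypothetical: first-round concurrency when the index opts out of the
    // estimate-based formula. Only the per-round cap bounds it.
    static int bypassedConcurrencyFactor(int rangeCount, int maxRangesPerRound) {
        return Math.max(1, Math.min(rangeCount, maxRangesPerRound));
    }

    // Rounds needed to cover every range if each round contacts
    // `concurrency` ranges (ceiling division).
    static int roundsToCoverRing(int rangeCount, int concurrency) {
        return (rangeCount + concurrency - 1) / concurrency;
    }

    public static void main(String[] args) {
        int cap = 32; // assumed value for the maximum ranges per round

        // 3 ranges (the 3-node, no-vnode example): one round instead of up to three.
        int small = bypassedConcurrencyFactor(3, cap);
        System.out.println(small + " ranges/round, " + roundsToCoverRing(3, small) + " round(s)");

        // 100 ranges: bounded by the cap, so four rounds.
        int large = bypassedConcurrencyFactor(100, cap);
        System.out.println(large + " ranges/round, " + roundsToCoverRing(100, large) + " round(s)");

        // Worst case without the bypass, if the factor stayed at 1 throughout.
        System.out.println("serial worst case: " + roundsToCoverRing(3, 1) + " round(s)");
    }
}
```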

          People

            mike_tr_adamson Mike Adamson
            mike_tr_adamson Mike Adamson
            Mike Adamson
            Andres de la Peña, Berenguer Blasi, Caleb Rackliffe
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 10m