  1. Cassandra
  2. CASSANDRA-2540

Data reads by default

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The intention of digest vs data reads is to save bandwidth in the read path at the cost of latency, but I expect that this has been a premature optimization.

      • Data requested by a read will often be within an order of magnitude of the digest size, and a digest mismatch means extra round trips and more bandwidth
      • The "digest reads succeeded but your data read did not" problem means QUORUM reads can fail because a single node (the one asked for the data) is unavailable; avoiding this would require eagerly re-requesting the data at some fraction of your timeout
      • Saving bandwidth in cross-datacenter use cases comes at a huge cost to latency, but since both constraints change proportionally (enough), the tradeoff is not clear
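
      The bandwidth argument in the bullets above can be made concrete with a rough cost model. This is only a sketch under stated assumptions, not a description of Cassandra's actual read path: it assumes a 16-byte (MD5-sized) digest, counts response payload only, and assumes that on a mismatch every replica that sent a digest is asked again for full data. All function and parameter names are hypothetical.

```python
DIGEST_BYTES = 16  # assumption: an MD5-sized digest


def digest_strategy_bytes(row_bytes, replicas, mismatch):
    """Response bytes when one replica returns data and the others return digests.

    Simplifying assumption: on a mismatch, every replica that sent a digest
    is re-queried for the full row in a second round trip.
    """
    first_round = row_bytes + (replicas - 1) * DIGEST_BYTES
    retry = (replicas - 1) * row_bytes if mismatch else 0
    return first_round + retry


def data_strategy_bytes(row_bytes, replicas):
    """Response bytes when every contacted replica returns the full row."""
    return replicas * row_bytes


def digest_strategy_round_trips(mismatch):
    """A data-only read always needs one round trip; a mismatch costs a second."""
    return 2 if mismatch else 1


if __name__ == "__main__":
    # A 160-byte row (10x the digest size) read from 2 replicas.
    print(digest_strategy_bytes(160, 2, mismatch=False))
    print(digest_strategy_bytes(160, 2, mismatch=True))
    print(data_strategy_bytes(160, 2))
```

      Under these assumptions, a row ten times the digest size read from two replicas costs 176 bytes with digests when they match, 336 bytes plus a second round trip when they do not, and 320 bytes in a single round trip with data reads.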

      Some options:

      1. Add an option to use digest reads
      2. Remove digest reads entirely (and/or punt and make them a runtime optimization based on data size in the future)
      3. Continue to use digest reads, but send them to N - R nodes for (somewhat) more predictable behavior with QUORUM
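
      As an illustration of the "runtime optimization based on data size" mentioned in option 2, the decision could look roughly like the sketch below. Everything here is hypothetical: the function name, the idea that a size estimate is available to the coordinator, and the threshold of ten times the digest size (chosen only to echo the "order of magnitude" remark in the description).

```python
def choose_read_type(estimated_row_bytes, digest_bytes=16, ratio_threshold=10):
    """Pick 'digest' only when the row is expected to dwarf the digest.

    estimated_row_bytes: hypothetical size estimate, or None if unknown.
    digest_bytes: assumed digest size (16 bytes for an MD5-sized digest).
    ratio_threshold: how many times larger than the digest the row must be
        before a digest read is considered worthwhile.
    """
    if estimated_row_bytes is None:
        # No estimate available: fall back to a data read, which never
        # needs a second round trip.
        return "data"
    if estimated_row_bytes > ratio_threshold * digest_bytes:
        return "digest"
    return "data"


if __name__ == "__main__":
    for size in (None, 100, 160, 10_000):
        print(size, choose_read_type(size))
```

      With the default parameters, only reads expected to exceed 160 bytes would use digests; everything else, including reads of unknown size, would be sent as data reads.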


      The outcome of data-reads-by-default should be significantly improved latency, with a moderate increase in bandwidth usage for large reads.

            People

            • Assignee:
              scode Peter Schuller
              Reporter:
              stuhood Stu Hood
              Authors:
              Peter Schuller
            • Votes:
              2
              Watchers:
              5