  1. Cassandra
  2. CASSANDRA-2540

Data reads by default

Details

    • Improvement
    • Status: Resolved
    • Low
    • Resolution: Won't Fix

    Description

      The intention of digest vs data reads is to save bandwidth in the read path at the cost of latency, but I expect that this has been a premature optimization.

      • Data requested by a read will often be within an order of magnitude of the digest size, and a failed digest means extra round trips and more bandwidth
      • The "digest reads succeed but the data read does not" problem means QUORUM reads fail because a single node is unavailable; avoiding that would require eagerly re-requesting at some fraction of the timeout
      • Saving bandwidth in cross-datacenter use cases comes at a huge cost to latency, but since both constraints change proportionally (enough), the tradeoff is not clear
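
      To make the first point concrete, here is a rough back-of-the-envelope sketch of the two read paths. It is only a model under stated assumptions, not Cassandra's actual code: the names `ReadCost`, `digest_read_cost`, and `data_read_cost` are hypothetical, the digest is assumed to be MD5-sized (16 bytes), and a digest mismatch is assumed to trigger a second round in which every contacted replica returns full data.

```python
from dataclasses import dataclass

DIGEST_BYTES = 16  # assumption: an MD5-sized digest


@dataclass
class ReadCost:
    bytes_transferred: int
    round_trips: int


def digest_read_cost(data_bytes: int, replicas: int, mismatch: bool) -> ReadCost:
    """Model: one replica returns full data, the others return digests.

    Assumption: on a mismatch, all contacted replicas are re-queried
    for full data in a second round trip.
    """
    first_round = data_bytes + (replicas - 1) * DIGEST_BYTES
    if not mismatch:
        return ReadCost(first_round, 1)
    return ReadCost(first_round + replicas * data_bytes, 2)


def data_read_cost(data_bytes: int, replicas: int) -> ReadCost:
    """Model: every contacted replica returns full data in one round trip."""
    return ReadCost(replicas * data_bytes, 1)
```

      Under this model, a 100-byte read against 2 replicas moves 116 bytes with matching digests versus 200 bytes with data reads, but a mismatch costs 316 bytes and a second round trip. The saving is small when the data is close to the digest size, which is the concern raised above.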

      Some options:

      1. Add an option to use digest reads
      2. Remove digest reads entirely (and/or punt and make them a runtime optimization based on data size in the future)
      3. Continue to use digest reads, but send them to N - R nodes for (somewhat) more predictable behavior with QUORUM
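
      The "runtime optimization based on data size" mentioned in option 2 could look something like the following. This is purely an illustrative heuristic: the function name, the 16-byte digest size, and the default threshold of 10 (taken from the "order of magnitude" remark above) are all assumptions, not anything proposed in this issue or present in Cassandra.

```python
DIGEST_BYTES = 16  # assumption: an MD5-sized digest


def should_use_digest_reads(expected_data_bytes: int,
                            size_ratio_threshold: int = 10) -> bool:
    """Hypothetical heuristic: use digest reads only when the expected
    result is much larger than a digest, so the bandwidth saved is
    worth the risk of an extra round trip on mismatch."""
    return expected_data_bytes > size_ratio_threshold * DIGEST_BYTES
```

      With these assumed defaults, a 100-byte read would go out as plain data reads, while a 4 KB read would still use digests.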


      The outcome of data-reads-by-default should be significantly improved latency, with a moderate increase in bandwidth usage for large reads.

          People

            scode Peter Schuller
            stuhood Stu Hood
            Peter Schuller