Details
Type: Improvement
Status: Resolved
Priority: Low
Resolution: Won't Fix
Description
The intention of digest vs data reads is to save bandwidth in the read path at the cost of latency, but I expect that this has been a premature optimization.
- The data requested by a read will often be within an order of magnitude of the digest size, and a failed (mismatched) digest means extra round trips and more bandwidth
- The "digest responses arrived but the data response did not" problem means a QUORUM read can fail because a single node is unavailable; avoiding that would require eagerly re-requesting the data at some fraction of the timeout
- Saving bandwidth in cross-datacenter use cases comes at a huge cost to latency, but since both constraints change proportionally (enough), the tradeoff is not clear
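To make the tradeoff in the points above concrete, here is a rough back-of-envelope model, not a description of the actual read path. It assumes a 16-byte (MD5-sized) digest, and that a digest mismatch causes the coordinator to re-request full data from the other contacted replicas in a second round trip. The names `DIGEST_BYTES`, `ReadCost`, `digest_read_cost`, and `data_read_cost` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass

DIGEST_BYTES = 16  # assumption: an MD5-sized digest


@dataclass
class ReadCost:
    bytes_transferred: float  # expected response payload bytes
    round_trips: float        # expected coordinator-to-replica round trips


def digest_read_cost(data_bytes: int, replicas: int, mismatch_rate: float) -> ReadCost:
    """One full data response plus digests from the remaining contacted replicas."""
    first_round = data_bytes + (replicas - 1) * DIGEST_BYTES
    # Assumption: on mismatch, full data is re-requested from the other replicas.
    retry_round = (replicas - 1) * data_bytes
    return ReadCost(first_round + mismatch_rate * retry_round, 1 + mismatch_rate)


def data_read_cost(data_bytes: int, replicas: int) -> ReadCost:
    """Every contacted replica returns full data; no second round is needed."""
    return ReadCost(replicas * data_bytes, 1.0)


if __name__ == "__main__":
    # A read whose payload is 10x the digest size, with two replicas contacted.
    print(digest_read_cost(160, 2, mismatch_rate=0.5))
    print(data_read_cost(160, 2))
```

Under these assumptions the bandwidth saved by digests shrinks as the payload approaches the digest size or as the mismatch rate rises, while the expected number of round trips only ever grows.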
Some options:
- Add an option to use digest reads
- Remove digest reads entirely (and/or punt and make them a runtime optimization based on data size in the future)
- Continue to use digest reads, but send them to N - R nodes for (somewhat) more predictable behavior with QUORUM
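One possible reading of the last option, sketched under assumptions: with N replicas and R required responses, full data requests go to R replicas and digest requests go to the remaining N - R, so that R data responses alone can satisfy the read. The function `plan_requests` and the replica names are hypothetical, and this interpretation is not confirmed by the ticket.

```python
from __future__ import annotations


def plan_requests(replicas: list[str], required: int) -> tuple[list[str], list[str]]:
    """Split replicas into data targets (first R) and digest targets (remaining N - R)."""
    if not 1 <= required <= len(replicas):
        raise ValueError("required must be between 1 and the number of replicas")
    # Assumption: replicas are already ordered by preference (e.g. proximity).
    return replicas[:required], replicas[required:]


if __name__ == "__main__":
    # N = 3, QUORUM means R = 2: two data requests, one digest request.
    data_targets, digest_targets = plan_requests(["a", "b", "c"], required=2)
    print(data_targets, digest_targets)
```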
The outcome of data-reads-by-default should be significantly improved latency, with a moderate increase in bandwidth usage for large reads.