There are a few issues with speculative retry:
1. Time unit bugs. These are from ColumnFamilyStore (v3.0.10):
The left-hand side is in nanos, as the name suggests, while the right-hand side is in millis.
Here coordinatorReadLatency is already in nanos, so we shouldn't multiply the value by 1000. This was a regression in 8896a70, when we switched metrics libraries; the two libraries use different time units.
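To illustrate the unit mismatch, here is a minimal sketch (not the Cassandra code; the class and method names are mine): converting a millisecond timeout to nanos explicitly is safe, while the buggy pattern of multiplying an already-nanosecond snapshot value by 1000 inflates it 1000x.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of the time-unit bug described above;
// nothing here is copied from ColumnFamilyStore.
public class LatencyUnits
{
    // Correct: convert a millisecond timeout to nanoseconds explicitly.
    public static long timeoutMillisToNanos(long timeoutMillis)
    {
        return TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    }

    // Buggy pattern: after the metrics-library switch the snapshot value is
    // already in nanoseconds, so multiplying by 1000 inflates it 1000x.
    public static long buggyScale(long latencyNanos)
    {
        return latencyNanos * 1000L;
    }
}
```

Using TimeUnit for every conversion makes the units visible at the call site, which is exactly what the mixed nanos/millis comparison above lacks.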
2. Conflating overload protection with retry delay. As the name "sampleLatencyNanos" suggests, it should hold the actually sampled read latency. However, in the CUSTOM case we assign it the retry threshold, and then compare that threshold with the read timeout (defaults to 5000ms). This means that with speculative_retry=10ms on a table, the overload check can never trigger, so we lose overload protection. We should compare the actual read latency with the read timeout for overload protection instead. See line 450 of ColumnFamilyStore.java and line 279 of AbstractReadExecutor.java.
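To make the conflation concrete, here is a hypothetical sketch (names are illustrative, not taken from ColumnFamilyStore): once sampleLatencyNanos holds the configured CUSTOM threshold instead of the measured latency, the overload guard compares one piece of configuration with another, so a 10ms threshold always passes against the 5000ms timeout regardless of how slow reads actually are.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the conflated check described above.
public class ConflatedCheck
{
    // Read timeout, defaulting to 5000ms as in the text.
    static final long READ_TIMEOUT_NANOS = TimeUnit.MILLISECONDS.toNanos(5000);

    // Buggy: the argument is the configured CUSTOM retry threshold that was
    // stored into "sampleLatencyNanos", not a measured latency, so this
    // compares configuration with configuration.
    public static boolean mayStillSpeculate(long customThresholdNanos)
    {
        return customThresholdNanos <= READ_TIMEOUT_NANOS;
    }
}
```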
My proposals are:
a. Use the sampled p99 latency and compare it with a customizable threshold (-Dcassandra.overload.threshold) for overload detection.
b. Introduce a separate variable, retryDelayNanos, for the waiting time before a retry. This holds the value from the table setting (PERCENTILE or CUSTOM).
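The two proposals can be sketched together as follows. This is only an outline under my own naming assumptions (the overload threshold is assumed to be given in millis; only sampleLatencyNanos and retryDelayNanos come from the text): the measured p99 drives overload detection, while the table setting alone drives the retry delay.

```java
import java.util.concurrent.TimeUnit;

// Sketch of proposals (a) and (b); field names other than sampleLatencyNanos
// and retryDelayNanos are hypothetical.
public class SpeculationState
{
    // (a) overload threshold, e.g. from -Dcassandra.overload.threshold (assumed millis).
    final long overloadThresholdNanos;
    // (a) actually sampled p99 coordinator read latency, used only for overload detection.
    volatile long sampleLatencyNanos;
    // (b) waiting time before retry, from the table's PERCENTILE or CUSTOM setting.
    volatile long retryDelayNanos;

    public SpeculationState(long overloadThresholdMillis)
    {
        this.overloadThresholdNanos = TimeUnit.MILLISECONDS.toNanos(overloadThresholdMillis);
    }

    // Overload detection compares the measured latency, not the retry setting.
    public boolean overloaded()
    {
        return sampleLatencyNanos > overloadThresholdNanos;
    }

    // Speculate only when healthy; wait retryDelayNanos before the extra request.
    public long speculationDelayNanos()
    {
        return overloaded() ? Long.MAX_VALUE : retryDelayNanos;
    }
}
```

With this split, speculative_retry=10ms no longer feeds the overload check, so a small retry threshold cannot mask an overloaded node.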