[CASSANDRA-16663] Request-Based Native Transport Rate-Limiting - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 4.1-alpha1, 4.1
Component/s: Messaging/Client
Labels: None
Change Category: Operability
Complexity: Normal
Platform: All
Impacts: None
Source Control Link: https://github.com/apache/cassandra/commit/d220d24994400d4342f5281f1a51514a6ae8c2fd
Test and Documentation Plan:

There are two new cassandra.yaml options in this patch:

native_transport_rate_limiting_enabled - Whether or not to apply a rate limit on CQL requests. (Default: false)

native_transport_max_requests_per_second - The limit itself. (Default: 1000000)

In terms of testing, there is a new suite of overload tests in RateLimitingTest, and this provides at least basic coverage along the major axes of message size, back-pressure vs. throw-on-overload, and behavior in the face of configuration changes at runtime (although part of that last bit is in ClientResourceLimitsTest).

As this moves into review, I've also used SimpleClientPerfTest across small/large messages and all 4 active protocol versions to make sure the limiter maintains throughput in a reasonably tight band around the configured limit. More stress testing, likely w/ tlp-stress and with multi-node clusters, will happen concurrently w/ review.

Finally, from an operational perspective, this is a coordinator-level limit, and assuming RF=N, any time a replica is unavailable, cluster-wide capacity to query a given partition (assuming a token-aware client) decreases by a factor of 1/N.
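The capacity arithmetic behind that last point can be written out as follows. This is only a worked illustration under stated assumptions: the symbol R (the per-coordinator limit in requests per second) is introduced here, and the client is assumed to route requests for the partition only to its N replicas.

```latex
\begin{align*}
C_{\text{healthy}}  &= N \cdot R \\
C_{\text{degraded}} &= (N - 1) \cdot R \\
\frac{C_{\text{healthy}} - C_{\text{degraded}}}{C_{\text{healthy}}} &= \frac{R}{N \cdot R} = \frac{1}{N}
\end{align*}
```

For example, with N = 3 and the default limit of 1000000 requests per second on each node, losing one replica would reduce capacity for that partition from 3000000 to 2000000 requests per second.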


Description

Together, CASSANDRA-14855, CASSANDRA-15013, and CASSANDRA-15519 added support for a runtime-configurable, per-coordinator limit on the number of bytes allocated for concurrent requests over the native protocol. It supports channel back-pressure by default, and optionally supports throwing OverloadedException if that is requested in the relevant connection’s STARTUP message.

This can be an effective tool to prevent the coordinator from running out of memory, but it may not correspond to how expensive queries are or provide a direct conceptual mapping to how users think about request capacity. I propose adding the option of request-based (or perhaps more correctly message-based) back-pressure, coexisting with (and reusing the logic that supports) the current bytes-based back-pressure.

We can roll this forward in phases, where the server’s cost accounting becomes more accurate, we segment limits by operation type/keyspace/etc., and the client/driver reacts more intelligently to (especially non-back-pressure) overload, but something minimally viable could look like this:

1.) Reuse most of the existing logic in Limits, et al. to support a simple per-coordinator limit only on native transport requests per second. Under this limit will be CQL reads and writes, but also auth requests, prepare requests, and batches. This is obviously simplistic, and it does not account for the variation in cost between individual queries, but even a fixed cost model should be useful in aggregate.

If the client specifies THROW_ON_OVERLOAD in its STARTUP message at connection time, a breach of the per-node limit will result in an OverloadedException being propagated to the client, and the server will discard the request.
If THROW_ON_OVERLOAD is not specified, the server will stop consuming messages from the channel/socket, which should back-pressure the client, while the message continues to be processed.

2.) This limit is infinite by default (or simply disabled), and can be enabled via the YAML config or JMX at runtime. (It might be cleaner to have a no-op rate limiter that's used when the feature is disabled entirely.)

3.) The current value of the limit is available via JMX, and metrics around coordinator operations/second are already available to compare against it.

4.) Any interaction with existing byte-based limits will intersect. (i.e. A breach of any limit, bytes or request-based, will actuate back-pressure or OverloadedExceptions.)
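To make items 1 and 4 concrete, here is a minimal Java sketch of how a breach of either limit could map to the two overload behaviors. It is not the patch's implementation: the class `OverloadSketch`, the `Outcome` enum, and every method name are hypothetical, the request limit is modeled as a plain token bucket, and letting the permit count go negative under back-pressure is an assumption made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these names are taken from the Cassandra code base. */
public class OverloadSketch {

    /** What the server does with an incoming message. */
    public enum Outcome { PROCESS, PROCESS_AND_PAUSE_READS, DISCARD_AND_THROW_OVERLOADED }

    private final long maxBytesInFlight;    // bytes-based limit
    private final double permitsPerSecond;  // request-based limit
    private final LongSupplier nanoClock;   // injectable so the sketch is testable

    private long bytesInFlight;
    private double permits;
    private long lastRefillNanos;

    public OverloadSketch(long maxBytesInFlight, double permitsPerSecond, LongSupplier nanoClock) {
        this.maxBytesInFlight = maxBytesInFlight;
        this.permitsPerSecond = permitsPerSecond;
        this.nanoClock = nanoClock;
        this.permits = permitsPerSecond; // start full; at most one second of burst
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** A breach of either limit triggers the overload behavior the client asked for. */
    public synchronized Outcome onMessage(long messageBytes, boolean throwOnOverload) {
        refill();
        boolean bytesOk = bytesInFlight + messageBytes <= maxBytesInFlight;
        boolean rateOk = permits >= 1.0;
        if (bytesOk && rateOk) {
            admit(messageBytes);
            return Outcome.PROCESS;
        }
        if (throwOnOverload)
            return Outcome.DISCARD_AND_THROW_OVERLOADED; // nothing is reserved for a discarded message
        // Back-pressure: the message is still processed, but the caller stops reading the channel.
        // Assumption: the permit count may go negative, which delays when reads would resume.
        admit(messageBytes);
        return Outcome.PROCESS_AND_PAUSE_READS;
    }

    /** Releases the bytes reserved for a message once its response has been sent. */
    public synchronized void onComplete(long messageBytes) {
        bytesInFlight -= messageBytes;
    }

    private void admit(long messageBytes) {
        bytesInFlight += messageBytes;
        permits -= 1.0;
    }

    private void refill() {
        long now = nanoClock.getAsLong();
        double earned = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
        permits = Math.min(permitsPerSecond, permits + earned);
        lastRefillNanos = now;
    }
}
```

A caller would pass `throwOnOverload = true` only for connections whose STARTUP message specified THROW_ON_OVERLOAD. The fixed cost of one permit per message mirrors the fixed cost model described in item 1.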

In this first pass, explicitly out of scope would be any work on the client/driver side.
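The no-op limiter idea from item 2 could look roughly like the sketch below, which swaps between a limiter that always admits and a real one when the feature is toggled at runtime. All names here (`NoOpLimiterSketch`, `RequestLimiter`, `Holder`, `fixedBudget`) are hypothetical, and the fixed-budget limiter is only a stand-in for a real rate limiter so that the example stays short.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; none of these names are taken from the Cassandra code base. */
public class NoOpLimiterSketch {

    /** Minimal limiter contract: true means the request may proceed. */
    public interface RequestLimiter {
        boolean tryAcquire();
    }

    /** Used while rate limiting is disabled: always admits and keeps no state. */
    public static final RequestLimiter NO_OP = () -> true;

    /** Stand-in for a real limiter: admits a fixed number of requests, then refuses. */
    public static RequestLimiter fixedBudget(int permits) {
        AtomicInteger remaining = new AtomicInteger(permits);
        // getAndUpdate returns the value before the update, so a positive result means a permit was taken.
        return () -> remaining.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
    }

    /** Holds the active limiter so that a config or JMX change can swap it at runtime. */
    public static final class Holder {
        private volatile RequestLimiter current = NO_OP; // disabled by default

        public void enable(RequestLimiter limiter) { current = limiter; }

        public void disable() { current = NO_OP; }

        public boolean tryAcquire() { return current.tryAcquire(); }
    }
}
```

With this shape, the hot path always calls `tryAcquire()` and never branches on an enabled flag, which is one way to keep the disabled case cheap.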

In terms of validation/testing, our biggest concern with anything that adds overhead on a very hot path is performance. In particular, we want to fully understand how the client and server perform along two axes constituting 4 scenarios. Those are a.) whether or not we are breaching the request limit and b.) whether the server is throwing on overload at the behest of the client. Having said that, query execution should dwarf the cost of limit accounting.

Issue Links

is related to

CASSANDRA-15299 CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

Resolved

relates to

CASSANDRA-16758 Flaky ClientResourceLimitsTest

Resolved

People

Assignee: Caleb Rackliffe

Reporter: Caleb Rackliffe

Authors: Caleb Rackliffe

Reviewers: Benedict Elliott Smith, Josh McKenzie

Tester: Caleb Rackliffe

Votes: 0

Watchers: 6

Dates

Created: 10/May/21 22:15

Updated: 27/May/22 19:25

Resolved: 19/Aug/21 15:58

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

15h