[CASSANDRA-15907] Operational Improvements & Hardening for Replica Filtering Protection - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.22, 3.11.8, 4.0-beta2, 4.0
Component/s: Consistency/Coordination, Feature/2i Index
Labels:
- 2i
- memory

Change Category:
Operability
Complexity:
Normal
Platform:

All
Impacts:

Docs
Source Control Link:

https://github.com/apache/cassandra/commit/e5c3d08a1428d378b6690f0419a2b25724b9736e
Test and Documentation Plan:

Hide

The first line of defense against regression here is the set of dtests built for ~~CASSANDRA-8272~~ in replica_side_filtering. In addition to that, we'll need at minimum a basic battery of in-JVM dtests around the new guardrails.

Once the implementation is reviewed, we'll use the tlp-stress filtering workload to stress things a bit, both to see how things behave with larger sets of query results when filtering protection isn't activated, and to see how the thresholds work when we have severely out-of-sync replicas.

In terms of documentation, we have two new cassandra.yaml options, which have what should be reasonable documentation inline.

There is also a new metric in org.apache.cassandra.metrics:
type=Table
keyspace=<your keyspace>
scope=<your table name>
name=ReplicaFilteringProtectionRowsCachedPerQuery

This is a histogram of the number of rows cached per query that activates RFP.

Also, the existing metric, ReplicaSideFilteringProtectionRequests, is now called ReplicaFilteringProtectionRequests.

Show
The first line of defense against regression here is the set of dtests built for CASSANDRA-8272 in replica_side_filtering . In addition to that, we'll need at minimum a basic battery of in-JVM dtests around the new guardrails. Once the implementation is reviewed, we'll use the tlp-stress filtering workload to stress things a bit, both to see how things behave with larger sets of query results when filtering protection isn't activated, and to see how the thresholds work when we have severely out-of-sync replicas. In terms of documentation, we have two new cassandra.yaml options , which have what should be reasonable documentation inline. There is also a new metric in org.apache.cassandra.metrics : type=Table keyspace=<your keyspace> scope=<your table name> name=ReplicaFilteringProtectionRowsCachedPerQuery This is a histogram of the number of rows cached per query that activates RFP. Also, the existing metric, ReplicaSideFilteringProtectionRequests , is now called ReplicaFilteringProtectionRequests .

Description

~~CASSANDRA-8272~~ uses additional space on the heap to ensure correctness for 2i and filtering queries at consistency levels above ONE/LOCAL_ONE. There are a few things we should follow up on, however, to make life a bit easier for operators and generally de-risk usage:

(Note: Line numbers are based on trunk as of 3cfe3c9f0dcf8ca8b25ad111800a21725bf152cb.)

Minor Optimizations

ReplicaFilteringProtection:114 - Given we size them up-front, we may be able to use simple arrays instead of lists for rowsToFetch and originalPartitions. Alternatively (or also), we may be able to null out references in these two collections more aggressively. (ex. Using ArrayList#set() instead of get() in queryProtectedPartitions(), assuming we pass toFetch as an argument to querySourceOnKey().)
ReplicaFilteringProtection:323 - We may be able to use EncodingStats.merge() and remove the custom stats() method.
DataResolver:111 & 228 - Cache an instance of UnaryOperator#identity() instead of creating one on the fly.
ReplicaFilteringProtection:217 - We may be able to scatter/gather rather than serially querying every row that needs to be completed. This isn't a clear win perhaps, given it targets the latency of single queries and adds some complexity. (Certainly a decent candidate to kick even out of this issue.)

Documentation and Intelligibility

There are a few places (CHANGES.txt, tracing output in ReplicaFilteringProtection, etc.) where we mention "replica-side filtering protection" (which makes it seem like the coordinator doesn't filter) rather than "replica filtering protection" (which sounds more like what we actually do, which is protect ourselves against incorrect replica filtering results). It's a minor fix, but would avoid confusion.
The method call chain in DataResolver might be a bit simpler if we put the repairedDataTracker in ResolveContext.

Testing

I want to bite the bullet and get some basic tests for RFP (including any guardrails we might add here) onto the in-JVM dtest framework.

Guardrails

As it stands, we don't have a way to enforce an upper bound on the memory usage of ReplicaFilteringProtection which caches row responses from the first round of requests. (Remember, these are later used to merged with the second round of results to complete the data for filtering.) Operators will likely need a way to protect themselves, i.e. simply fail queries if they hit a particular threshold rather than GC nodes into oblivion. (Having control over limits and page sizes doesn't quite get us there, because stale results expand the number of incomplete results we must cache.) The fun question is how we do this, with the primary axes being scope (per-query, global, etc.) and granularity (per-partition, per-row, per-cell, actual heap usage, etc.). My starting disposition on the right trade-off between performance/complexity and accuracy is having something along the lines of cached rows per query. Prior art suggests this probably makes sense alongside things like tombstone_failure_threshold in cassandra.yaml.

Attachments

Issue Links

is related to

CASSANDRA-8272 2ndary indexes can return stale data

Resolved

CASSANDRA-8273 Allow filtering queries can return stale data

Resolved

CASSANDRA-15920 SimpleQueryResult should contain client warnings

Resolved

relates to

CASSANDRA-15910 Add multi-partition key commands

Triage Needed

links to

3.0 PR

3.11 PR

trunk PR

(2 links to)

Activity

People

Assignee:: Caleb Rackliffe

Reporter:: Caleb Rackliffe

Authors:: Caleb Rackliffe

Reviewers:: Andres de la Peña, Jordan West

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 29/Jun/20 17:16

Updated:: 16/Mar/22 13:56

Resolved:: 30/Jul/20 13:15

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

9h 20m