Details
- Improvement
- Status: Open
- Normal
- Resolution: Unresolved
- None
- Performance
- Normal
- All
- None
Description
Before the completion of CASSANDRA-16226, upgrading a cluster from 2.1 to 3.0 with compact tables could cause a significant regression in the latency of reads using ClusteringIndexNamesFilter. The details are described in that Jira, but in short, 3.0+ did not skip SSTables it should have during reads, because it thought (wrongly) there might be primary key liveness information in SSTables for compact tables.
CASSANDRA-16226 addressed this behavior for still-compact tables, and also maintained the fix after DROP COMPACT STORAGE was run. However, it also allowed tables that were never compact to drop rows from query results if they contained no live non-key columns, which is normal behavior only for compact tables. CASSANDRA-16671 addresses this by reverting the parts of the CASSANDRA-16226 logic that deal with formerly compact tables (those on which DROP COMPACT STORAGE has been run). The revert was made in the interest of unblocking the 4.0 release and making sure that strictly compact and strictly non-compact tables are queried correctly and return properly formed results.
The goal of this issue is to safely restore the performance of formerly compact tables, which necessarily contain ambiguous primary key liveness info. Roughly, the idea is that we record in a system table (and pull into TableMetadata) the time when DROP COMPACT STORAGE is executed. If a time exists for a table, we can treat it as being formerly compact, and ignore primary key liveness info when determining row completeness in SinglePartitionReadCommand#isComplete(). Otherwise, the normal rules for never-compact tables apply, avoiding any regression in the scenario described by CASSANDRA-16671.
This would obviously not be helpful in the case where a user has already dropped compact storage, but it may logically be the best we can do, given we cannot correctly reconstruct liveness info for SSTables created while a table was compact (i.e. there is no way to tell INSERT and UPDATE apart for those). Especially if CASSANDRA-16671 moves in the direction of disabling DROP COMPACT STORAGE by default, I would also propose that we do this only for 4.0+.
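The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Cassandra code: the classes `TableInfo` and `RowState`, the field `compactStorageDroppedAtMicros`, and the helper methods are all hypothetical stand-ins for TableMetadata and for the per-row state that SinglePartitionReadCommand tracks, and the completeness rule is deliberately simplified.

```java
import java.util.OptionalLong;

public final class FormerlyCompactSketch {

    /** Hypothetical stand-in for the relevant slice of TableMetadata. */
    static final class TableInfo {
        final boolean isCompact;
        // Hypothetical: time at which DROP COMPACT STORAGE ran, as it might be
        // loaded from a system table. Empty if it was never recorded.
        final OptionalLong compactStorageDroppedAtMicros;

        private TableInfo(boolean isCompact, OptionalLong droppedAt) {
            this.isCompact = isCompact;
            this.compactStorageDroppedAtMicros = droppedAt;
        }

        static TableInfo neverCompact() {
            return new TableInfo(false, OptionalLong.empty());
        }

        static TableInfo stillCompact() {
            return new TableInfo(true, OptionalLong.empty());
        }

        static TableInfo formerlyCompact(long droppedAtMicros) {
            return new TableInfo(false, OptionalLong.of(droppedAtMicros));
        }
    }

    /** Hypothetical summary of what a read has merged so far for one row. */
    static final class RowState {
        final boolean hasPrimaryKeyLiveness;
        final boolean hasAllRequestedColumns;

        RowState(boolean hasPrimaryKeyLiveness, boolean hasAllRequestedColumns) {
            this.hasPrimaryKeyLiveness = hasPrimaryKeyLiveness;
            this.hasAllRequestedColumns = hasAllRequestedColumns;
        }
    }

    /** Compact and formerly compact tables carry no reliable primary key liveness. */
    static boolean ignoresPrimaryKeyLiveness(TableInfo table) {
        return table.isCompact || table.compactStorageDroppedAtMicros.isPresent();
    }

    /** Simplified completeness check: can the read stop consulting older SSTables? */
    static boolean isRowComplete(TableInfo table, RowState row) {
        if (!row.hasAllRequestedColumns)
            return false;
        return ignoresPrimaryKeyLiveness(table) || row.hasPrimaryKeyLiveness;
    }

    public static void main(String[] args) {
        // A row whose requested columns are all present but with no liveness info.
        RowState noLiveness = new RowState(false, true);
        System.out.println(isRowComplete(TableInfo.neverCompact(), noLiveness));
        System.out.println(isRowComplete(TableInfo.formerlyCompact(1_620_000_000_000_000L), noLiveness));
    }
}
```

In this sketch, a row without primary key liveness info is treated as incomplete on a never-compact table (so older SSTables are still read, preserving the CASSANDRA-16671 behavior), but as complete on a table with a recorded drop time. A table whose compact storage was dropped before the time was being recorded looks identical to a never-compact table here, which matches the limitation noted above.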
Issue Links
- is related to
  - CASSANDRA-16226 COMPACT STORAGE SSTables created before 3.0 are not correctly skipped by timestamp due to missing primary key liveness info (Resolved)
  - CASSANDRA-16671 Cassandra can return no row when the row columns have been deleted. (Resolved)