Details
Improvement
Status: Open
Normal
Resolution: Unresolved
None
Operability
Normal
All
None
Description
As we know, Cassandra's secondary index is a local secondary index, and every update that touches an indexed column writes a new entry to the index table. The old, redundant entries (stale entries) remain in the index table until the data is read, at which point they are cleaned up (somewhat like read repair).
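To make the behaviour above concrete, here is a toy Python model of it. This is only a sketch, not Cassandra code: the class `ToyIndexedTable` and its methods are hypothetical, and it assumes the previous value has already been flushed, so the write path only adds a new index entry and never removes the old one.

```python
class ToyIndexedTable:
    """Toy model of a table with a local secondary index on c1 (hypothetical, not Cassandra code)."""

    def __init__(self):
        self.base = {}      # pk -> current c1 value
        self.index = set()  # (c1 value, pk) index entries, possibly stale

    def upsert(self, pk, c1):
        # Assumption: the old value is already on disk, so the write only
        # adds the new index entry and leaves the old one behind.
        self.base[pk] = c1
        self.index.add((c1, pk))

    def query_by_c1(self, c1):
        hits = []
        # Iterate over a sorted copy so entries can be dropped safely.
        for entry in sorted(e for e in self.index if e[0] == c1):
            _, pk = entry
            if self.base.get(pk) == c1:
                hits.append(pk)
            else:
                # Stale entry detected at read time: drop it.
                self.index.discard(entry)
        return hits

    def stale_entries(self):
        # Index entries whose value no longer matches the base row.
        return sorted((v, pk) for v, pk in self.index if self.base.get(pk) != v)


table = ToyIndexedTable()
for value in (1, 2, 3):
    table.upsert(pk=1, c1=value)

print(table.stale_entries())  # [(1, 1), (2, 1)]
print(table.query_by_c1(1))   # [] (and the stale entry for c1=1 is dropped)
print(table.stale_entries())  # [(2, 1)]
```

In this model the entry for c1=2 stays in the index forever unless someone queries c1=2, which is the situation the proposed tool is meant to address.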
So stale, useless data can accumulate in the index table if it is never read, and we would like to provide a tool that removes it. See the example below: we create a table with a secondary index on column c1, then update the same pk with different c1 values, flushing after every update, and finally force a major compaction on the index table. We then dump the index sstable (the sstable dump tool cannot normally be used on secondary index sstables, but fortunately we can use CASSANDRA-17698) and can see its content.
Below are the CQL statements and the dump result.
cqlsh> DESC ks.tb

CREATE TABLE ks.tb (
    pk int PRIMARY KEY,
    c1 int
) WITH additional_write_policy = '99p'
    AND allow_auto_snapshot = true
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

CREATE INDEX idx ON ks.tb (c1);

cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 1);
cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 2);
cqlsh> INSERT INTO ks.tb(pk, c1)values (1, 3);
cqlsh>
We flush after every update and force a major compaction on the index table at the end.
bin git:(trunk) ✗ ./nodetool flush
➜ bin git:(trunk) ✗ ./nodetool flush
➜ bin git:(trunk) ✗ ./nodetool flush
➜ bin git:(trunk) ✗ ./nodetool compact ks tb.idx
➜ bin git:(trunk) ✗ ../tools/bin/sstabledump ../data/data/ks/tb-65d902b0b2bc11ed86ed81daebeca99d/.idx/nb-13-big-Data.db
[
  {
    "table kind" : "INDEX",
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 18,
        "clustering" : [ 1 ],
        "liveness_info" : { "tstamp" : "2023-02-23T03:21:57.638558Z" },
        "cells" : [ ]
      }
    ]
  },
  {
    "table kind" : "INDEX",
    "partition" : {
      "key" : [ "2" ],
      "position" : 29
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 47,
        "clustering" : [ 1 ],
        "liveness_info" : { "tstamp" : "2023-02-23T03:22:19.834466Z" },
        "cells" : [ ]
      }
    ]
  },
  {
    "table kind" : "INDEX",
    "partition" : {
      "key" : [ "3" ],
      "position" : 61
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 79,
        "clustering" : [ 1 ],
        "liveness_info" : { "tstamp" : "2023-02-23T03:22:27.532174Z" },
        "cells" : [ ]
      }
    ]
  }
]
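As a rough illustration of what such a cleanup tool would have to detect, the sketch below checks a dump like the one above against the current base-table row. It is only an assumption-laden example in Python: the function `find_stale_index_entries` is hypothetical, the dump is trimmed to the fields it needs, and it assumes (as the dump suggests) that the index partition key is the indexed c1 value and the clustering value is the base table pk.

```python
import json

# Trimmed copy of the index sstable dump above (only the fields used here).
INDEX_DUMP = json.loads("""
[
  {"partition": {"key": ["1"]}, "rows": [{"clustering": [1]}]},
  {"partition": {"key": ["2"]}, "rows": [{"clustering": [1]}]},
  {"partition": {"key": ["3"]}, "rows": [{"clustering": [1]}]}
]
""")

# Current content of ks.tb after the three inserts: pk=1 -> c1=3.
# Values are kept as strings to match how sstabledump prints partition keys.
BASE_ROWS = {"1": "3"}


def find_stale_index_entries(index_dump, base_rows):
    """Return (indexed value, base pk) pairs that no longer match the base table."""
    stale = []
    for partition in index_dump:
        # Assumption: the index partition key is the indexed c1 value.
        indexed_value = partition["partition"]["key"][0]
        for row in partition["rows"]:
            # Assumption: the clustering value is the base table pk.
            base_pk = str(row["clustering"][0])
            if base_rows.get(base_pk) != indexed_value:
                stale.append((indexed_value, base_pk))
    return stale


stale = find_stale_index_entries(INDEX_DUMP, BASE_ROWS)
print(stale)  # [('1', '1'), ('2', '1')]
```

Of the three index entries that survive the major compaction, only the one for c1=3 still matches the base row; the entries for c1=1 and c1=2 are the stale data we would like to remove. A real tool would also have to consider timestamps and tombstones, which this sketch ignores.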