[CASSANDRA-9107] More accurate row count estimates - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 2.1.6, 2.2.0 rc1
Component/s: None
Labels: None

Description

Currently the estimated row count from cfstats is the sum of the number of rows in all the sstables. This becomes very inaccurate with wide rows or heavily updated datasets, since the same partition can exist in many sstables. For example:

create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

create TABLE wide (key text PRIMARY KEY, value text) WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 'max_threshold': 100};
-------------------------------

insert INTO wide (key, value) VALUES ('key', 'value');
// flush
// cfstats output: Number of keys (estimate): 1  (128 in older version from index)

insert INTO wide (key, value) VALUES ('key', 'value');
// flush
// cfstats output: Number of keys (estimate): 2  (256 in older version from index)

... etc

Previously the estimate came from the index, but it was still computed per sstable and summed, so it was inaccurate in the same way (just much worse) and degraded further as the number of sstables grew. With newer sstable versions we can merge the per-sstable cardinalities to resolve this, with a slight hit to accuracy in the case where every sstable contains completely unique partitions.
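To illustrate why merging cardinalities helps, here is a minimal sketch, assuming each sstable carries a HyperLogLog-style estimator of its partition keys. The `HyperLogLog` class below is a hypothetical, simplified stand-in written for this example and is not the code from the patch; it only shows that merging estimators counts a repeated partition once, while summing per-sstable estimates counts it once per sstable.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch (illustrative only, not Cassandra's estimator)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, key):
        # 64-bit hash of the partition key
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max: the result describes the union of both key sets.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


def sketch_of(keys):
    hll = HyperLogLog()
    for k in keys:
        hll.add(k)
    return hll


# Three flushes of the same partition, as in the example above.
sstables = [sketch_of(["key"]) for _ in range(3)]

summed = sum(s.count() for s in sstables)   # 3: one per sstable
merged = sstables[0]
for s in sstables[1:]:
    merged = merged.merge(s)
merged_count = merged.count()               # 1: the partition is counted once
```

When every sstable holds completely unique partitions, the summed value would be at least as good as the merged one, and the merged value carries the sketch's approximation error. That is the slight accuracy hit mentioned above.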

Furthermore, I think it would take fairly minimal effort to include the number of rows in the memtables in this count. We won't have cardinality merging between memtables and sstables, but I would consider that a relatively minor negative.
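The trade-off of adding memtable counts without merging can be sketched as follows. The function and variable names are hypothetical, and plain sets stand in for the real estimators: a partition present both in a memtable and in an sstable is counted twice.

```python
def additive_estimate(sstable_estimate, memtable_partition_counts):
    """Add memtable partition counts to the merged sstable estimate (no merging)."""
    return sstable_estimate + sum(memtable_partition_counts)


# Hypothetical data: partition "c" exists on disk and in the memtable.
sstable_keys = {"a", "b", "c"}
memtable_keys = {"c", "d"}

estimate = additive_estimate(len(sstable_keys), [len(memtable_keys)])  # 5
exact = len(sstable_keys | memtable_keys)                              # 4
overcount = estimate - exact                                           # 1
```

The overcount is bounded by the number of partitions currently held in memtables, which is why it can be considered a minor cost.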

Attachments

9107-cassandra2-1.patch (02/Apr/15 21:51, 6 kB, Chris Lohfink)
9107-v2.txt (28/Apr/15 14:52, 6 kB, Sam Tunnicliffe)

People

Assignee: Chris Lohfink

Reporter: Chris Lohfink

Authors: Chris Lohfink

Reviewers: Sam Tunnicliffe

Votes: 0

Watchers: 5

Dates

Created: 02/Apr/15 21:50

Updated: 16/Apr/19 09:31

Resolved: 21/May/15 12:48