[CASSANDRA-13995] Don't fetch unnecessary data in SliceQueryFilter - ASF JIRA

Details

Type: Improvement
Status: Patch Available
Priority: Normal
Resolution: Unresolved
Fix Version/s: 2.2.x
Component/s: Legacy/Local Write-Read Paths
Labels: None

Description

Link to patch branch on github: https://github.com/apache/cassandra/pull/170

Slice queries currently fetch more data than necessary when there is only one column that is not part of the primary key. Specifically, SliceQueryFilter does not stop reading until it has seen `limit + 1` live cells, even though in this case it could stop after seeing `limit` live cells.

We have a use case where we use wide rows to implement versioning, by including a timestamp as part of the primary key. Every once in a while, we "garbage collect" old versions by deleting them. This results in a single column containing the latest version, followed by many tombstones.

We use a `LIMIT 1` query to select the latest version (which is the first column in the row). However, because SliceQueryFilter does not stop until it has seen `limit + 1` live cells, we have to read all the tombstones following the single live cell. Furthermore, if these tombstones cover data in other sstables, we have to read all the corresponding data when merging the sstable iterators. This can be a massive performance hit, and it is an unexpected consequence of deleting data.

This patch allows the `ColumnCounter` implementation to decide when it has seen enough cells. For counters that don't require grouping, we can stop immediately after finding the first cell.
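To illustrate the idea, here is a minimal, self-contained sketch. It is not the code from the patch and does not use Cassandra's real classes: the `Counter` interface, its `hasSeenEnough` method, and both implementations are hypothetical names made up for this example. A row is modelled as an array of booleans, where `true` is a live cell and `false` is a tombstone.

```java
// Simplified model of the behaviour described above.
// All names here are hypothetical; this is not Cassandra's ColumnCounter API.
class SliceLimitSketch {

    /** Counts cells and decides for itself when the query limit is satisfied. */
    interface Counter {
        void count(boolean isLive);
        boolean hasSeenEnough(int limit);
    }

    /** Models the current behaviour: only stops once limit + 1 live cells were seen. */
    static final class OvershootCounter implements Counter {
        private int live;
        public void count(boolean isLive) { if (isLive) live++; }
        public boolean hasSeenEnough(int limit) { return live > limit; }
    }

    /** Models a counter without grouping: limit live cells are enough. */
    static final class SimpleCounter implements Counter {
        private int live;
        public void count(boolean isLive) { if (isLive) live++; }
        public boolean hasSeenEnough(int limit) { return live >= limit; }
    }

    /** Returns how many cells (live or tombstone) had to be read from the row. */
    static int cellsRead(boolean[] row, Counter counter, int limit) {
        int read = 0;
        for (boolean isLive : row) {
            if (counter.hasSeenEnough(limit)) {
                break; // the counter says we are done, stop iterating
            }
            counter.count(isLive);
            read++;
        }
        return read;
    }

    public static void main(String[] args) {
        // One live cell (the latest version) followed by four tombstones.
        boolean[] row = {true, false, false, false, false};
        System.out.println(cellsRead(row, new OvershootCounter(), 1)); // prints 5
        System.out.println(cellsRead(row, new SimpleCounter(), 1));    // prints 1
    }
}
```

With `LIMIT 1`, the overshooting counter never reaches a second live cell and so walks every tombstone, while the simple counter stops right after the first live cell. A counter that groups cells would presumably still need to look past the limit to know whether the current group is complete, which is why the decision is left to each counter implementation.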

People

Assignee: Unassigned

Reporter: Nathan Ziebart

Votes: 0

Watchers: 2

Dates

Created: 06/Nov/17 17:55

Updated: 16/Apr/19 09:29