[CASSANDRA-2503] Eagerly re-write data at read time ("superseding / defragmenting") - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 1.1.0
Component/s: None
Labels:
- compaction
- performance

Description

[...] This basic approach would improve read performance considerably, but would cause a lot of duplicate data to be written, and would make compaction's work more necessary.

Augmenting the basic idea: if we marked data as superseded whenever we superseded it in a file, the next compaction that touched that file could remove it. Since our file format is immutable, the values that a particular sstable superseded could be recorded in a component of that sstable. If we always supersede at the "block" level (as defined by ~~CASSANDRA-674~~ or ~~CASSANDRA-47~~), then the list of superseded blocks could be represented using a generation number and a bitmap of block numbers. Since CASSANDRA-2498 would already allow sstables to be eliminated based on timestamps, this information would probably only be used at compaction time (by loading all superseding information in the system for the sstables that are being compacted).
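To make the bookkeeping concrete, here is a minimal sketch of the "generation number plus bitmap of block numbers" representation, and of how a compaction might union the superseding information for the sstables it is compacting. Every name here (`SupersededBlocks`, `merge_superseding_info`, `live_blocks`) is hypothetical and chosen for illustration; none of it is taken from the Cassandra codebase or from the attached patches.

```python
from dataclasses import dataclass


@dataclass
class SupersededBlocks:
    """Hypothetical record: blocks of one older sstable that a newer one superseded."""
    generation: int   # generation number of the older (superseded) sstable
    bitmap: int = 0   # bit i set => block i of that sstable is superseded

    def mark(self, block: int) -> None:
        """Record that the given block number has been superseded."""
        self.bitmap |= 1 << block

    def contains(self, block: int) -> bool:
        """Return True if the given block number is marked as superseded."""
        return bool((self.bitmap >> block) & 1)


def merge_superseding_info(components, compacting_generations):
    """Union the bitmaps found in all components, keeping only entries that
    refer to sstables taking part in the current compaction.

    `components` is an iterable of per-sstable components, each of which is
    an iterable of SupersededBlocks entries (an assumed layout).
    """
    merged = {}
    for component in components:
        for entry in component:
            if entry.generation in compacting_generations:
                merged[entry.generation] = merged.get(entry.generation, 0) | entry.bitmap
    return merged


def live_blocks(generation, block_count, merged):
    """Block numbers of an sstable that compaction still has to carry forward."""
    bitmap = merged.get(generation, 0)
    return [b for b in range(block_count) if not (bitmap >> b) & 1]
```

Because each entry is keyed by the generation of the sstable it refers to, the immutable sstable itself never has to change; only the newer, superseding sstable carries the extra component.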

Initially described on CASSANDRA-1608.

Attachments

- 2503-v3.txt (16/Nov/11 18:46, 5 kB, Jonathan Ellis)
- 2503-v2.txt (27/Oct/11 16:39, 2 kB, Jonathan Ellis)
- 2503.txt (24/Oct/11 02:23, 2 kB, Jonathan Ellis)

Issue Links

is blocked by

CASSANDRA-2498 Improve read performance in update-intensive workload

Resolved

People

Assignee: Jonathan Ellis

Reporter: Stu Hood

Authors: Jonathan Ellis

Reviewers: Sylvain Lebresne

Votes: 0

Watchers: 5

Dates

Created: 19/Apr/11 02:16

Updated: 09/Jul/19 22:12

Resolved: 23/Nov/11 14:08