CASSANDRA-2503

Eagerly re-write data at read time ("superseding / defragmenting")

    Details

      Description

      Once CASSANDRA-2498 is implemented, it will be possible to eagerly rewrite ("supersede") data at read time. If a successful read had to hit more than a certain threshold of sstables, we can eagerly rewrite the result into a new sstable, and 2498 will allow later reads to access only that file. This basic approach would improve read performance considerably, but it would cause a lot of duplicate data to be written, and it would make compaction more necessary to clean up those duplicates.
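
The basic approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached patch: `ReadSuperseder`, `read`, and `rewriteCount` are hypothetical names, each sstable's contribution to a row is modelled as a plain column-to-value map, and "newer wins" is decided by list order rather than by column timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the Cassandra code base.
final class ReadSuperseder {
    // Rewrite only when a read touched MORE than this many sstables.
    private final int threshold;
    // Stand-in for the new sstable that receives the superseding (merged) rows.
    private final List<Map<String, String>> rewritten = new ArrayList<>();

    ReadSuperseder(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Merges the fragments of one row, given oldest first, so that newer
     * values overwrite older ones. Empty fragments model sstables that did
     * not contain the row and are not counted as hits.
     */
    Map<String, String> read(List<Map<String, String>> fragmentsOldestFirst) {
        Map<String, String> merged = new TreeMap<>();
        int sstablesHit = 0;
        for (Map<String, String> fragment : fragmentsOldestFirst) {
            if (fragment.isEmpty()) {
                continue;
            }
            merged.putAll(fragment);
            sstablesHit++;
        }
        // The read-time optimization: too many sstables were hit, so the
        // merged row is also written out to a single new location.
        if (sstablesHit > threshold) {
            rewritten.add(merged);
        }
        return merged;
    }

    // Number of rows that were eagerly rewritten so far.
    int rewriteCount() {
        return rewritten.size();
    }
}
```

With a threshold of 2, a read that merges two fragments is served normally, while a read that merges three is also appended to the rewritten set. The duplicate-data cost mentioned above is visible here: the old fragments are left untouched.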

      To augment the basic idea: if we marked data as superseded at the time we superseded it, the next compaction that touched the file containing it could remove the data. Since our file format is immutable, the mark cannot be written into the old file; instead, the values that a particular sstable superseded could be recorded in a component of that sstable. If we always supersede at the "block" level (as defined by CASSANDRA-674 or CASSANDRA-47), then the list of superseded blocks could be represented using a generation number and a bitmap of block numbers. Since 2498 would already allow sstables to be eliminated based on timestamps, this information would probably only be used at compaction time (by loading all superseding information in the system for the sstables that are being compacted).
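
The "generation number plus bitmap of block numbers" representation could look roughly like this. It is a sketch only: `SupersededBlocks` and its methods are hypothetical names, `java.util.BitSet` is used as the bitmap, and nothing here reflects an actual sstable component format.

```java
import java.util.BitSet;

// Hypothetical sketch; class and method names are illustrative only.
final class SupersededBlocks {
    // Generation of the (older) sstable whose blocks have been superseded.
    private final int generation;
    // Bit i is set when block i of that sstable has been superseded.
    private final BitSet blocks = new BitSet();

    SupersededBlocks(int generation) {
        this.generation = generation;
    }

    int generation() {
        return generation;
    }

    void markSuperseded(int blockNumber) {
        blocks.set(blockNumber);
    }

    boolean isSuperseded(int blockNumber) {
        return blocks.get(blockNumber);
    }

    /**
     * Combines two records that refer to the same generation, as a compaction
     * would when it loads superseding information from several sstables.
     */
    SupersededBlocks union(SupersededBlocks other) {
        if (other.generation != generation) {
            throw new IllegalArgumentException("generation mismatch");
        }
        SupersededBlocks result = new SupersededBlocks(generation);
        result.blocks.or(blocks);
        result.blocks.or(other.blocks);
        return result;
    }

    // Number of distinct blocks marked as superseded.
    int supersededCount() {
        return blocks.cardinality();
    }
}
```

At compaction time, the records for each input sstable's generation would be unioned, and any block whose bit is set could be skipped instead of being copied to the output.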

      Initially described on CASSANDRA-1608.

      Attachments

      1. 2503.txt (2 kB, Jonathan Ellis)
      2. 2503-v2.txt (2 kB, Jonathan Ellis)
      3. 2503-v3.txt (5 kB, Jonathan Ellis)

        Issue Links

          Activity

          Stu Hood created issue -
          Stu Hood made changes -
          Field Original Value New Value
          Link This issue is blocked by CASSANDRA-2498 [ CASSANDRA-2498 ]
          Jonathan Ellis made changes -
          Attachment 2503.txt [ 12500392 ]
          Jonathan Ellis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Jonathan Ellis [ jbellis ]
          Reviewer slebresne
          Fix Version/s 1.0.1 [ 12317948 ]
          Jonathan Ellis made changes -
          Attachment 2503-v2.txt [ 12501115 ]
          Jonathan Ellis made changes -
          Attachment 2503-v2.txt [ 12501115 ]
          Jonathan Ellis made changes -
          Attachment 2503-v2.txt [ 12501116 ]
          Sylvain Lebresne made changes -
          Fix Version/s 1.0.2 [ 12318740 ]
          Fix Version/s 1.0.1 [ 12317948 ]
          Jonathan Ellis made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jonathan Ellis made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Jonathan Ellis made changes -
          Fix Version/s 1.1 [ 12317615 ]
          Fix Version/s 1.0.2 [ 12318740 ]
          Jonathan Ellis made changes -
          Attachment 2503-v3.txt [ 12503921 ]
          Jonathan Ellis made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jonathan Ellis made changes -
          Summary Eagerly re-write data at read time ("superseding") Eagerly re-write data at read time ("superseding / defragmenting")
          Gavin made changes -
          Workflow no-reopen-closed, patch-avail [ 12610883 ] patch-available, re-open possible [ 12753508 ]
          Gavin made changes -
          Workflow patch-available, re-open possible [ 12753508 ] reopen-resolved, no closed status, patch-avail, testing [ 12758779 ]

            People

            • Assignee: Jonathan Ellis
            • Reporter: Stu Hood
            • Reviewer: Sylvain Lebresne
            • Votes: 0
            • Watchers: 4
