Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: Core
    • Labels:

      Description

      Now that we have a metadata component it would be better to keep sstable level there, than in a separate manifest. With information per-sstable we don't need to do a full re-level if there is corruption.

      1. 0001-CASSANDRA-4872-move-sstable-level-into-sstable-metad-v1.patch
        69 kB
        Marcus Eriksson
      2. 0001-CASSANDRA-4872-move-sstable-level-into-sstable-metad-v2.patch
        71 kB
        Marcus Eriksson
      3. 0001-CASSANDRA-4872-move-sstable-level-into-sstable-metad-v3.patch
        71 kB
        Marcus Eriksson
      4. 0001-CASSANDRA-4872-move-sstable-level-into-sstable-metad-v4.patch
        73 kB
        Marcus Eriksson
      5. 4872-v5.txt
        65 kB
        Jonathan Ellis
      6. 0001-CASSANDRA-4872-wip-v6.patch
        72 kB
        Marcus Eriksson
      7. 0001-CASSANDRA-4872-wip-v7.patch
        73 kB
        Marcus Eriksson

        Issue Links

          Activity

          Hide
          Marcus Eriksson added a comment -

          CASSANDRA-5271 does that for you

          Show
          Marcus Eriksson added a comment - CASSANDRA-5271 does that for you
          Hide
          Rick Branson added a comment -

          Correct me if I'm wrong, but it looks like there's no longer a straightforward way to force a re-level now (say if you changed sstable_size_in_mb). Previously deleting the JSON manifest file would work.

          Show
          Rick Branson added a comment - Correct me if I'm wrong, but it looks like there's no longer a straightforward way to force a re-level now (say if you changed sstable_size_in_mb). Previously deleting the JSON manifest file would work.
          Hide
          Jonathan Ellis added a comment -

          SGTM, thanks.

          Show
          Jonathan Ellis added a comment - SGTM, thanks.
          Hide
          Ryan McGuire added a comment -

          I believe this is working on windows. I started with a 1.2.2 node, ran stress for some data, upgraded to trunk (b06c45a96d9bd3f3c0807e87bfbccf8f1a9e2459), restarted and I saw no errors in the logs along with this:

           INFO [CompactionExecutor:1] 2013-03-15 12:42:37,109 CompactionTask.java (line 108) Compacting [SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ic-5-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ic-6-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ic-4-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ib-3-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ib-2-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ib-1-Data.db'), SSTableReader(path='\var\lib\cassandra\data\system\local\system-local-ic-7-Data.db')]
           INFO [CompactionExecutor:1] 2013-03-15 12:42:37,187 CompactionTask.java (line 270) Compacted 7 sstables to [\var\lib\cassandra\data\system\local\system-local-ic-9,].  1,215 bytes to 558 (~45% of original) in 63ms = 0.008447MB/s.  7 total rows, 1 unique.  Row merge counts were {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:1, }
          
          Show
          Ryan McGuire added a comment - I believe this is working on windows. I started with a 1.2.2 node, ran stress for some data, upgraded to trunk (b06c45a96d9bd3f3c0807e87bfbccf8f1a9e2459), restarted and I saw no errors in the logs along with this: INFO [CompactionExecutor:1] 2013-03-15 12:42:37,109 CompactionTask.java (line 108) Compacting [SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ic-5-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ic-6-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ic-4-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ib-3-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ib-2-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ib-1-Data.db'), SSTableReader(path='\ var \lib\cassandra\data\system\local\system-local-ic-7-Data.db')] INFO [CompactionExecutor:1] 2013-03-15 12:42:37,187 CompactionTask.java (line 270) Compacted 7 sstables to [\ var \lib\cassandra\data\system\local\system-local-ic-9,]. 1,215 bytes to 558 (~45% of original) in 63ms = 0.008447MB/s. 7 total rows, 1 unique. Row merge counts were {1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:1, }
          Hide
          Jonathan Ellis added a comment -

          Ryan, can you test this on a Windows VM? Either upgrading from 1.2's LCS, or invoking the add new sstables MBean will exercise the possibly problematic code. (Not sure Windows will allow us to rename over an existing file.)

          Show
          Jonathan Ellis added a comment - Ryan, can you test this on a Windows VM? Either upgrading from 1.2's LCS, or invoking the add new sstables MBean will exercise the possibly problematic code. (Not sure Windows will allow us to rename over an existing file.)
          Hide
          Jonathan Ellis added a comment -

          As far as i can see, reseting levels was not done when moving from STCS to LCS or when changing compaction options before this,

          Hmm... That's probably because, thinking about it further, leaving the levels the same for any surviving sstables should be harmless; if they didn't overlap before, they still won't overlap.

          Committed v6. Thanks for the help!

          Show
          Jonathan Ellis added a comment - As far as i can see, reseting levels was not done when moving from STCS to LCS or when changing compaction options before this, Hmm... That's probably because, thinking about it further, leaving the levels the same for any surviving sstables should be harmless; if they didn't overlap before, they still won't overlap. Committed v6. Thanks for the help!
          Hide
          Marcus Eriksson added a comment - - edited

          As far as i can see, reseting levels was not done when moving from STCS to LCS or when changing compaction options before this, the json file was left behind, meaning, for example that a change to max sstable size might leave data with a too high level etc.

          anyway, v7 resets levels when changing to LCS (or when updating compaction strategy options).

          Show
          Marcus Eriksson added a comment - - edited As far as i can see, reseting levels was not done when moving from STCS to LCS or when changing compaction options before this, the json file was left behind, meaning, for example that a change to max sstable size might leave data with a too high level etc. anyway, v7 resets levels when changing to LCS (or when updating compaction strategy options).
          Hide
          Jonathan Ellis added a comment -

          Good solution.

          I think the last thing we need is to force everything to L0 when moving from STCS to LCS, unless I missed where that's happening.

          Show
          Jonathan Ellis added a comment - Good solution. I think the last thing we need is to force everything to L0 when moving from STCS to LCS, unless I missed where that's happening.
          Hide
          Marcus Eriksson added a comment -
          • Removes the max(..) in SSTM
          • Moves the overlapping check to LeveledManifest.create(..) - this also makes sure that we are in a good state when people change compaction strategy to LCS
          • add license header to LegacyLeveledManifest

          btw, git show doesn't include the binary files, git format-patch does

          Show
          Marcus Eriksson added a comment - Removes the max(..) in SSTM Moves the overlapping check to LeveledManifest.create(..) - this also makes sure that we are in a good state when people change compaction strategy to LCS add license header to LegacyLeveledManifest btw, git show doesn't include the binary files, git format-patch does
          Hide
          Jonathan Ellis added a comment -

          (I think the difference in file size in v5 vs v4 is mostly due to "git show" vs "git am.")

          Show
          Jonathan Ellis added a comment - (I think the difference in file size in v5 vs v4 is mostly due to "git show" vs "git am.")
          Hide
          Jonathan Ellis added a comment -

          v5 attached with some minor cleanup.

          Two questions:

          • How do we attach sstables to CFS after calling Table.openWithoutSSTables? Table.open(true) will just use the earlier instance.
          • Why does SSTM.sstableLevel throw in the max() call instead of just accepting the new level?
          Show
          Jonathan Ellis added a comment - v5 attached with some minor cleanup. Two questions: How do we attach sstables to CFS after calling Table.openWithoutSSTables? Table.open(true) will just use the earlier instance. Why does SSTM.sstableLevel throw in the max() call instead of just accepting the new level?
          Hide
          Marcus Eriksson added a comment -

          v4 fixes overlapping sstables within a level on startup

          Show
          Marcus Eriksson added a comment - v4 fixes overlapping sstables within a level on startup
          Hide
          Jonathan Ellis added a comment -

          guess we might need to check for overlapping sstables on startup

          If we use the first/last keys from the metadata this should be tractable. (Of course we only have to check for overlaps w/in the claimed level.)

          Show
          Jonathan Ellis added a comment - guess we might need to check for overlapping sstables on startup If we use the first/last keys from the metadata this should be tractable. (Of course we only have to check for overlaps w/in the claimed level.)
          Hide
          Marcus Eriksson added a comment -

          v3 checks if the metadatafile actually exists before trying to change level

          Show
          Marcus Eriksson added a comment - v3 checks if the metadatafile actually exists before trying to change level
          Hide
          Marcus Eriksson added a comment -

          v2 sends back files to L0 when loadNewSSTables is called

          also realized that people might drop in sstables manually and then restart cassandra, this could make sstables within levels overlap

          guess we might need to check for overlapping sstables on startup

          Show
          Marcus Eriksson added a comment - v2 sends back files to L0 when loadNewSSTables is called also realized that people might drop in sstables manually and then restart cassandra, this could make sstables within levels overlap guess we might need to check for overlapping sstables on startup
          Hide
          Marcus Eriksson added a comment -

          there should be a tool to offline-drop all files back to L0, that is a "common" thing we have done a few times in production (by removing the .json file)

          ill do that as a separate ticket if this gets committed

          Show
          Marcus Eriksson added a comment - there should be a tool to offline-drop all files back to L0, that is a "common" thing we have done a few times in production (by removing the .json file) ill do that as a separate ticket if this gets committed
          Hide
          Marcus Eriksson added a comment - - edited

          moves sstable level into sstablemetadata

          makes SSTableMetadata files mutable to be able to send files back to L0 and to be able to migrate information from old json file into the metadata file without a full compaction/scrub of the sstable

          Show
          Marcus Eriksson added a comment - - edited moves sstable level into sstablemetadata makes SSTableMetadata files mutable to be able to send files back to L0 and to be able to migrate information from old json file into the metadata file without a full compaction/scrub of the sstable
          Hide
          Jonathan Ellis added a comment -

          Agreed, it is a bit messy. I think my preferred option would be to snapshot, then rewrite metadata and remove on startup.

          Show
          Jonathan Ellis added a comment - Agreed, it is a bit messy. I think my preferred option would be to snapshot, then rewrite metadata and remove on startup.
          Hide
          Marcus Eriksson added a comment -

          I looking at this now, i guess a requirement is to not have to do a complete relevel on upgrade? Gets quite messy but i guess the old json-reading-code will only have to stay for a short while though (ie, if this goes into 2.0, we can remove the backwards-compatible code in 2.1?)

          Show
          Marcus Eriksson added a comment - I looking at this now, i guess a requirement is to not have to do a complete relevel on upgrade? Gets quite messy but i guess the old json-reading-code will only have to stay for a short while though (ie, if this goes into 2.0, we can remove the backwards-compatible code in 2.1?)
          Hide
          Jonathan Ellis added a comment -

          I'm not completely sure what you mean by that?

          What I mean is, if you lose or corrupt the .manifest right now you're SOL and have to put everything in L0 and start over. If it's per-sstable then you don't have this extra non-sstable component causing fragility.

          Show
          Jonathan Ellis added a comment - I'm not completely sure what you mean by that? What I mean is, if you lose or corrupt the .manifest right now you're SOL and have to put everything in L0 and start over. If it's per-sstable then you don't have this extra non-sstable component causing fragility.
          Hide
          Sylvain Lebresne added a comment -

          With information per-sstable we don't need to do a full re-level if there is corruption.

          I'm not completely sure what you mean by that? Even if we do that ticket, we will need to reconstruct the manifest in-memory, and in doing that we might get corruption as well.

          I'm not saying I'm completely opposed to the idea, but I'm not I understand the benefits and it does seem to introduce some complexity (you'd have to reconstruct the manifest info from spread out sources).

          Show
          Sylvain Lebresne added a comment - With information per-sstable we don't need to do a full re-level if there is corruption. I'm not completely sure what you mean by that? Even if we do that ticket, we will need to reconstruct the manifest in-memory, and in doing that we might get corruption as well. I'm not saying I'm completely opposed to the idea, but I'm not I understand the benefits and it does seem to introduce some complexity (you'd have to reconstruct the manifest info from spread out sources).
          Hide
          Jonathan Ellis added a comment -

          Possible complexity:

          • loading external or from-snapshot sstables will need to ignore the level and start at L0 instead
          • there may be code paths where we promote an sstable, unmodified. We'd need to update the level then, making the metadata not-quite-immutable.
          Show
          Jonathan Ellis added a comment - Possible complexity: loading external or from-snapshot sstables will need to ignore the level and start at L0 instead there may be code paths where we promote an sstable, unmodified. We'd need to update the level then, making the metadata not-quite-immutable.

            People

            • Assignee:
              Marcus Eriksson
              Reporter:
              Jonathan Ellis
              Reviewer:
              Jonathan Ellis
              Tester:
              Ryan McGuire
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development