Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 0.7 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      As we continue to make changes to the on-disk format of SSTables, I propose we start versioning. The easiest way without breaking backwards compatibility is to store the version in the filename. This would allow us to figure out the version without looking at the SSTable data. After speaking to Jonathan here is the proposed example:

      <CF><ID><VERSION>-<DATA|INDEX|FILTER>

      1. 389-v3.patch
        20 kB
        Jonathan Ellis
      2. 0007-Don-t-serialize-Descriptors-as-filenames.patch
        3 kB
        Stu Hood
      3. 0006-Merge-oopses.patch
        28 kB
        Stu Hood
      4. 0005-The-special-casing-begins.patch
        1 kB
        Stu Hood
      5. 0004-Validate-params.patch
        4 kB
        Stu Hood
      6. 0003-Use-SSTable.Descriptor-for-Streaming.patch
        15 kB
        Stu Hood
      7. 0002-Add-keyspace-name-to-SSTable.Descriptor.patch
        6 kB
        Stu Hood
      8. 0001-Rebase-389-v4-for-trunk.patch
        24 kB
        Stu Hood

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Cassandra #364 (See http://hudson.zones.apache.org/hudson/job/Cassandra/364/)
          sstable versioning. Patch by Stu Hood, reviewed by Gary Dusbabek.

          Show
          Hudson added a comment - Integrated in Cassandra #364 (See http://hudson.zones.apache.org/hudson/job/Cassandra/364/ ) sstable versioning. Patch by Stu Hood, reviewed by Gary Dusbabek.
          Hide
          Gary Dusbabek added a comment -

          r912650

          Show
          Gary Dusbabek added a comment - r912650
          Hide
          Gary Dusbabek added a comment -

          +1 Everything seems to work and there doesn't appear to be any gotchas in the code.

          Show
          Gary Dusbabek added a comment - +1 Everything seems to work and there doesn't appear to be any gotchas in the code.
          Hide
          Gary Dusbabek added a comment -

          Deleted +1 because I found something that needs to be looked into.

          Show
          Gary Dusbabek added a comment - Deleted +1 because I found something that needs to be looked into.
          Hide
          Gary Dusbabek added a comment -

          I'm not sure what the historical context of the streaming directory is. I'll go about looking for a way to remove it today.

          Show
          Gary Dusbabek added a comment - I'm not sure what the historical context of the streaming directory is. I'll go about looking for a way to remove it today.
          Hide
          Jonathan Ellis added a comment -

          Whatever is easier is fine there imo, since we want to get rid of temporary to-be-streamed files entirely for CASSANDRA-579.

          Show
          Jonathan Ellis added a comment - Whatever is easier is fine there imo, since we want to get rid of temporary to-be-streamed files entirely for CASSANDRA-579 .
          Hide
          Stu Hood added a comment -

          Regarding 0008: perhaps rather than having the magical streaming directory, we should just leave files that are being streamed in 'temporary' status, and kill the subdirectory?

          Show
          Stu Hood added a comment - Regarding 0008: perhaps rather than having the magical streaming directory, we should just leave files that are being streamed in 'temporary' status, and kill the subdirectory?
          Hide
          Gary Dusbabek added a comment -

          I am not a fan of the storing-information-in-filenames approach. It is brittle, especially with the way we play fast and loose with streaming.

          Show
          Gary Dusbabek added a comment - I am not a fan of the storing-information-in-filenames approach. It is brittle, especially with the way we play fast and loose with streaming.
          Hide
          Stu Hood added a comment -

          Rebased patchset for trunk, and made sure we don't bump into CASSANDRA-794.

          Show
          Stu Hood added a comment - Rebased patchset for trunk, and made sure we don't bump into CASSANDRA-794 .
          Hide
          Stu Hood added a comment -

          Reminder: don't use File.toString() for serialization.

          Show
          Stu Hood added a comment - Reminder: don't use File.toString() for serialization.
          Hide
          Stu Hood added a comment - - edited

          Rebased for trunk. Since we know that file formats are going to change in 0.7, I'd like to get this in soon, so I don't have to waste any more time rebasing it.

          EDIT: Actually, I'm extracting interfaces for #674, so perhaps we should wait until we are ready to merge the new interfaces before we merge this one.
          EDIT2: See CASSANDRA-777

          Show
          Stu Hood added a comment - - edited Rebased for trunk. Since we know that file formats are going to change in 0.7, I'd like to get this in soon, so I don't have to waste any more time rebasing it. EDIT: Actually, I'm extracting interfaces for #674, so perhaps we should wait until we are ready to merge the new interfaces before we merge this one. EDIT2: See CASSANDRA-777
          Hide
          Stu Hood added a comment -

          Here is rebased version of the latest patchset: nothing significant has changed.

          I have not tested this with bootstrap, although it makes minor changes to Streaming (which pass the unit tests).

          Show
          Stu Hood added a comment - Here is rebased version of the latest patchset: nothing significant has changed. I have not tested this with bootstrap, although it makes minor changes to Streaming (which pass the unit tests).
          Hide
          Jonathan Ellis added a comment -

          > Perhaps version information is the one thing that should be stored outside the file

          +1

          Show
          Jonathan Ellis added a comment - > Perhaps version information is the one thing that should be stored outside the file +1
          Hide
          Stu Hood added a comment -

          I'm fine with storing the version and other metadata in the file (like 521 begins to do), as long as the different implementations of sstables can agree on the location of this information. For instance, the implementation in 674 stores SSTable metadata at the beginning of every block so that the sstable is easily splittable for use by the Streaming API or by Hadoop at some point in the future.

          Perhaps version information is the one thing that should be stored outside the file, because it can define how every other piece of metadata is stored?

          Show
          Stu Hood added a comment - I'm fine with storing the version and other metadata in the file (like 521 begins to do), as long as the different implementations of sstables can agree on the location of this information. For instance, the implementation in 674 stores SSTable metadata at the beginning of every block so that the sstable is easily splittable for use by the Streaming API or by Hadoop at some point in the future. Perhaps version information is the one thing that should be stored outside the file, because it can define how every other piece of metadata is stored?
          Hide
          Jonathan Ellis added a comment -

          as noted in CASSANDRA-521, we need to add the partitioner used to the sstable data file. is there any value to keeping version in filename once we are already adding partitioner to data file itself? (IMO the alternative, cramming partitioner into filename, is silly, and what if version N + 1 needs a 3rd piece of metadata?)

          Show
          Jonathan Ellis added a comment - as noted in CASSANDRA-521 , we need to add the partitioner used to the sstable data file. is there any value to keeping version in filename once we are already adding partitioner to data file itself? (IMO the alternative, cramming partitioner into filename, is silly, and what if version N + 1 needs a 3rd piece of metadata?)
          Hide
          Stu Hood added a comment -

          One more patch for the v5 set: upgrades weren't handled properly, because we began added the version tag to legacy filenames.

          Show
          Stu Hood added a comment - One more patch for the v5 set: upgrades weren't handled properly, because we began added the version tag to legacy filenames.
          Hide
          Stu Hood added a comment -

          New patchset: v5. Rebases v4 for trunk, adds the keyspace to the descriptor, and uses descriptors for streaming.

          No versions are checked: they're just being made available, so that something like #674 can increment, and add conditional handling.

          Show
          Stu Hood added a comment - New patchset: v5. Rebases v4 for trunk, adds the keyspace to the descriptor, and uses descriptors for streaming. No versions are checked: they're just being made available, so that something like #674 can increment, and add conditional handling.
          Hide
          Jonathan Ellis added a comment -

          there is also an implementation of data format versioning in CASSANDRA-521 now

          Show
          Jonathan Ellis added a comment - there is also an implementation of data format versioning in CASSANDRA-521 now
          Hide
          Jonathan Ellis added a comment -

          okay. let's table this until we actually need to break SSTable format again for whatever reason.

          Show
          Jonathan Ellis added a comment - okay. let's table this until we actually need to break SSTable format again for whatever reason.
          Hide
          Stu Hood added a comment -

          I may not get another chance to work on this one for a few days, so I wanted to attach the latest version of the patch.

          BootstrapTest is still broken with this version, because it assumes a lot about filenames, and uses string concatenation to identify what it is transferring. Ideally, it could use SSTable descriptors.

          Show
          Stu Hood added a comment - I may not get another chance to work on this one for a few days, so I wanted to attach the latest version of the patch. BootstrapTest is still broken with this version, because it assumes a lot about filenames, and uses string concatenation to identify what it is transferring. Ideally, it could use SSTable descriptors.
          Hide
          Stu Hood added a comment -

          This is almost finished, but I bumped into a bunch of code related to bootstrapping in db.Table.java that assumes a ton about filenames, that I don't have time to fix today. Hopefully I can get this wrapped up tomorrow.

          Show
          Stu Hood added a comment - This is almost finished, but I bumped into a bunch of code related to bootstrapping in db.Table.java that assumes a ton about filenames, that I don't have time to fix today. Hopefully I can get this wrapped up tomorrow.
          Hide
          Jonathan Ellis added a comment -

          not much point in having a version code if we don't validate it. this needs to start now, not in the future, since even if we did try to preserve indefinite backwards compatibility, there is no way to make 0.4 forwards-compatibile.

          (for the record, both of those options are bad. we'll probably write conversion utilities for contrib/ to migrate from version N-1 to version N. anything else is insane – we have enough trouble debugging a single binary format!)

          Show
          Jonathan Ellis added a comment - not much point in having a version code if we don't validate it. this needs to start now, not in the future, since even if we did try to preserve indefinite backwards compatibility, there is no way to make 0.4 forwards-compatibile. (for the record, both of those options are bad. we'll probably write conversion utilities for contrib/ to migrate from version N-1 to version N. anything else is insane – we have enough trouble debugging a single binary format!)
          Hide
          Stu Hood added a comment -

          > any version that is not the current version. (I used "z")
          Ahhh, I see what you mean. This patch doesn't actually implement handling for the different versions. In the future, when we need to make non-compatible changes to the disk format, we'll increment the version, and either:

          • Add conditionals all over the place to handle older versions differently (yuck),
          • Subclass SSTableReader/Writer for the new version, and modify the factory functions to return different reader subclasses per version.

          > having "legacy" be distinguished from "current" is probably a mistake since you then have to special case it.
          You're right. Since there are no file format differences between the LEGACY and CURRENT versions, they should both be "a". I'll update the patch.

          Show
          Stu Hood added a comment - > any version that is not the current version. (I used "z") Ahhh, I see what you mean. This patch doesn't actually implement handling for the different versions. In the future, when we need to make non-compatible changes to the disk format, we'll increment the version, and either: Add conditionals all over the place to handle older versions differently (yuck), Subclass SSTableReader/Writer for the new version, and modify the factory functions to return different reader subclasses per version. > having "legacy" be distinguished from "current" is probably a mistake since you then have to special case it. You're right. Since there are no file format differences between the LEGACY and CURRENT versions, they should both be "a". I'll update the patch.
          Hide
          Jonathan Ellis added a comment -

          > Which is the "wrong" version: the legacy version?

          any version that is not the current version. (I used "z")

          having "legacy" be distinguished from "current" is probably a mistake since you then have to special case it.

          Show
          Jonathan Ellis added a comment - > Which is the "wrong" version: the legacy version? any version that is not the current version. (I used "z") having "legacy" be distinguished from "current" is probably a mistake since you then have to special case it.
          Hide
          Stu Hood added a comment -

          > but I renamed a set of sstable files in data/ to the "wrong" version and it did not catch this at startup.
          I'm not sure what you mean. Which is the "wrong" version: the legacy version?

          > (does it still ignore tmp files at startup?)
          I'll double check, but I assume so. I didn't change the check that removes tmp files, because it did it based on looking for TEMPFILE_MARKER anywhere in the file name, which should continue to work.

          Show
          Stu Hood added a comment - > but I renamed a set of sstable files in data/ to the "wrong" version and it did not catch this at startup. I'm not sure what you mean. Which is the "wrong" version: the legacy version? > (does it still ignore tmp files at startup?) I'll double check, but I assume so. I didn't change the check that removes tmp files, because it did it based on looking for TEMPFILE_MARKER anywhere in the file name, which should continue to work.
          Hide
          Jonathan Ellis added a comment -

          here is a version with spaces, bracing, and redundant this cleaned up.

          but I renamed a set of sstable files in data/ to the "wrong" version and it did not catch this at startup.

          (does it still ignore tmp files at startup?)

          Show
          Jonathan Ellis added a comment - here is a version with spaces, bracing, and redundant this cleaned up. but I renamed a set of sstable files in data/ to the "wrong" version and it did not catch this at startup. (does it still ignore tmp files at startup?)
          Hide
          Stu Hood added a comment -

          The coding style changes are in this new version of the patch. I also noticed that we weren't ignoring subdirectories in the ColumnFamily directory (BootstrapTest leaves behind a subdirectory in there, which is a separate issue).

          I didn't change the tokenization, because I didn't see how it would make the code simpler... since TEMPFILE_MARKER is optional, checking the count of tokens isn't enough, and the split code would have had a proliferation of nested if statements (or something identical to the current code, with a position int to increment, rather than a 'nextToken' method).

          Show
          Stu Hood added a comment - The coding style changes are in this new version of the patch. I also noticed that we weren't ignoring subdirectories in the ColumnFamily directory (BootstrapTest leaves behind a subdirectory in there, which is a separate issue). I didn't change the tokenization, because I didn't see how it would make the code simpler... since TEMPFILE_MARKER is optional, checking the count of tokens isn't enough, and the split code would have had a proliferation of nested if statements (or something identical to the current code, with a position int to increment, rather than a 'nextToken' method).
          Hide
          Jonathan Ellis added a comment -

          +1 on the general approach and cleanup, thanks!

          Can you redo to follow http://wiki.apache.org/cassandra/CodeStyle?

          In particular the Descriptor final fields should be public and not underscored. Avoid mutating the temporary-ness in favor of creating a new Descriptor. And use split() instead of tokenizing; just counting the fields will show whether it's a "legacy" name.

          Show
          Jonathan Ellis added a comment - +1 on the general approach and cleanup, thanks! Can you redo to follow http://wiki.apache.org/cassandra/CodeStyle? In particular the Descriptor final fields should be public and not underscored. Avoid mutating the temporary-ness in favor of creating a new Descriptor. And use split() instead of tokenizing; just counting the fields will show whether it's a "legacy" name.
          Hide
          Stu Hood added a comment - - edited

          The filename format I settled on was:

          <CF>[-<TMP>]<VERSION><GENERATION>-<DATA|INDEX|FILTER>

          Show
          Stu Hood added a comment - - edited The filename format I settled on was: <CF> [-<TMP>] <VERSION> <GENERATION>-<DATA|INDEX|FILTER>
          Hide
          Stu Hood added a comment -

          This patch moves SSTable filename parsing into a SSTable.Descriptor class which has a special case for unversioned SSTable files. Systems running with legacy unversioned SSTables should write versioned SSTables during their next compaction.

          I think there is a lot more potential for the SSTable.Descriptor class: pretty much anywhere we use a String to describe an open SSTable, but I think it is out of the scope of this patch.

          Show
          Stu Hood added a comment - This patch moves SSTable filename parsing into a SSTable.Descriptor class which has a special case for unversioned SSTable files. Systems running with legacy unversioned SSTables should write versioned SSTables during their next compaction. I think there is a lot more potential for the SSTable.Descriptor class: pretty much anywhere we use a String to describe an open SSTable, but I think it is out of the scope of this patch.
          Hide
          Jonathan Ellis added a comment -

          note that it's important to make versions alphabetic (not numeric or alphanumeric) b/c generation is already numeric, so keeping those in separate vocabularies will help catch potential errors

          Show
          Jonathan Ellis added a comment - note that it's important to make versions alphabetic (not numeric or alphanumeric) b/c generation is already numeric, so keeping those in separate vocabularies will help catch potential errors
          Hide
          Jonathan Ellis added a comment -

          i suggest replacing the ad-hoc split/tokenize code in multiple places with a single method

          SSTableInfo parseSSTableInfo(String filename)

          where SSTI is something like

          class SSTableInfo

          { public final String columnFamilyName, version; public final int generation; }
          Show
          Jonathan Ellis added a comment - i suggest replacing the ad-hoc split/tokenize code in multiple places with a single method SSTableInfo parseSSTableInfo(String filename) where SSTI is something like class SSTableInfo { public final String columnFamilyName, version; public final int generation; }
          Hide
          Chris Goffinet added a comment -

          Version should start alphabetically, single character to start.

          Show
          Chris Goffinet added a comment - Version should start alphabetically, single character to start.

            People

            • Assignee:
              Stu Hood
              Reporter:
              Chris Goffinet
            • Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development