Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: Core
    • Labels:
      None

      Description

      All SSTableIterators have a constructor that assumes the key and length has already been parsed. Moving this logic inside the iterator will improve symmetry and allow the file format to change without iterator consumers knowing it.

        Issue Links

          Activity

          Hide
          Stu Hood added a comment -

          Marking as blocked by 2062, which allows for merging to guarantee that hasNext will not be called until the current item has been fully consumed. This will make it possible to encapsulate row consumption entirely inside the row iterator.

          Show
          Stu Hood added a comment - Marking as blocked by 2062, which allows for merging to guarantee that hasNext will not be called until the current item has been fully consumed. This will make it possible to encapsulate row consumption entirely inside the row iterator.
          Hide
          Stu Hood added a comment -

          Patch to move key/length reading into the iterators. Applies atop the linked issues.

          For convenience I removed a step from scrub in 0002 where if the key/length at the head of a row were corrupt, we retried with the key/length from the index. I'll understand if we want to add this back (would have to push it into the iterator), but either way we end up skipping the row as corrupt (using the index), so I don't think it is much of a loss.

          Show
          Stu Hood added a comment - Patch to move key/length reading into the iterators. Applies atop the linked issues. For convenience I removed a step from scrub in 0002 where if the key/length at the head of a row were corrupt, we retried with the key/length from the index. I'll understand if we want to add this back (would have to push it into the iterator), but either way we end up skipping the row as corrupt (using the index), so I don't think it is much of a loss.
          Hide
          Stu Hood added a comment -

          Rebased... applies atop CASSANDRA-2468 and CASSANDRA-2576. This one is vital to CASSANDRA-674, since the key and length are no longer located where unversioned readers expect them to be: in particular, they will be compressed with nearby buddies.

          Show
          Stu Hood added a comment - Rebased... applies atop CASSANDRA-2468 and CASSANDRA-2576 . This one is vital to CASSANDRA-674 , since the key and length are no longer located where unversioned readers expect them to be: in particular, they will be compressed with nearby buddies.
          Hide
          Stu Hood added a comment -

          Oops... adding back the link to 2062.

          Show
          Stu Hood added a comment - Oops... adding back the link to 2062.
          Hide
          Alan Liang added a comment -

          CompactionManager.java:
          -retrying from key/length from index is useful, we should add this back, as you mentioned in your comments above.
          -move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the IF statement where it is needed
          -in your log warnings, specifying the actual sstable will help with debugging

          SSTableNamesIterator.java:
          -remove "this.key = key;" for both constructors and that means "public DecoratedKey key;" can still be final
          *init() method should be more descriptive
          -remove @param key comment from IFilter.java and SSTableSliceIterator.java

          SSTableWriter.java:
          -calling close() on an SSTableIdentityIterator to go to the end doesn't sound right. Use another name other than "close()"
          -safer to updateCache(iter) AFTER appending to writer

          Show
          Alan Liang added a comment - CompactionManager.java: -retrying from key/length from index is useful, we should add this back, as you mentioned in your comments above. -move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the IF statement where it is needed -in your log warnings, specifying the actual sstable will help with debugging SSTableNamesIterator.java: -remove "this.key = key;" for both constructors and that means "public DecoratedKey key;" can still be final *init() method should be more descriptive -remove @param key comment from IFilter.java and SSTableSliceIterator.java SSTableWriter.java: -calling close() on an SSTableIdentityIterator to go to the end doesn't sound right. Use another name other than "close()" -safer to updateCache(iter) AFTER appending to writer
          Hide
          Stu Hood added a comment -

          move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the IF statement where it is needed

          Done.

          in your log warnings, specifying the actual sstable will help with debugging

          Scrub starts with a log entry to indicate which file is being scrubbed... this should be sufficient unless someone is running without INFO.

          remove "this.key = key;" for both constructors

          Good catch... that line was a noop in the second constructor: we don't know the key until init() is called, so we can't make it final.

          *init() method should be more descriptive

          Was supposed to say "Reads the key and row length from the head of the row.": fixed.

          remove @param key comment from IFilter.java and SSTableSliceIterator.java

          Fixed.

          safer to updateCache(iter) AFTER appending to writer

          I think the whole thing is racey, because the entire rebuilding process can fail, which might mean that the cache was updated when it shouldn't have been.


          I didn't make the close() change: we talked about close a little bit offline, I think we reached the consensus that having two methods that you must always call won't be an improvement over close. I also didn't add back the scrub retry: I agree that we can do better there, but I think it will be easier to improve once we have the checksumming from 674: at the moment the reasons for doing certain things with the API would not be clear at all.

          Thanks for the review!

          Show
          Stu Hood added a comment - move "long rowSizeFromIndex = nextRowPositionFromIndex - currentRowPositionFromIndex;" into the IF statement where it is needed Done. in your log warnings, specifying the actual sstable will help with debugging Scrub starts with a log entry to indicate which file is being scrubbed... this should be sufficient unless someone is running without INFO. remove "this.key = key;" for both constructors Good catch... that line was a noop in the second constructor: we don't know the key until init() is called, so we can't make it final. *init() method should be more descriptive Was supposed to say "Reads the key and row length from the head of the row.": fixed. remove @param key comment from IFilter.java and SSTableSliceIterator.java Fixed. safer to updateCache(iter) AFTER appending to writer I think the whole thing is racey, because the entire rebuilding process can fail, which might mean that the cache was updated when it shouldn't have been. I didn't make the close() change: we talked about close a little bit offline, I think we reached the consensus that having two methods that you must always call won't be an improvement over close. I also didn't add back the scrub retry: I agree that we can do better there, but I think it will be easier to improve once we have the checksumming from 674: at the moment the reasons for doing certain things with the API would not be clear at all. Thanks for the review!
          Hide
          Stu Hood added a comment -

          Rebased for trunk: applies atop #2468 and #2576 (assuming r1138084 is reverted).

          Show
          Stu Hood added a comment - Rebased for trunk: applies atop #2468 and #2576 (assuming r1138084 is reverted).
          Hide
          Stu Hood added a comment -

          Rebased for trunk. This ticket is a blocker for alternate file formats.

          Show
          Stu Hood added a comment - Rebased for trunk. This ticket is a blocker for alternate file formats.
          Hide
          Sylvain Lebresne added a comment -

          For convenience I removed a step from scrub in 0002 where if the key/length at the head of a row were corrupt, we retried with the key/length from the index. I'll understand if we want to add this back (would have to push it into the iterator), but either way we end up skipping the row as corrupt (using the index), so I don't think it is much of a loss.

          We don't end up skipping the row either way. If the index is valid, we end up "repairing the row". I'm for adding this back, as I don't see why this is so unconvenient and doesn't justify the regression.

          Other comments:

          • The rewrite of IncomingStreamReader is actually more buggy that IncomingStreamReader is already, in that it uses AbstractCompactRow for everything which in the cases where LCR will be used is problematic. The code in trunk is already problematic however (CASSANDRA-3003). Nevertheless, let's keep the code there as close as it is in trunk because 1) it is more "correct" in trunk and 2) there is no point in changing this code (more than what this ticket require) that will need to be fixed anyway.
          • In SSTII constructor, dataStart should be set to 0 when the input is a stream. Otherwise an assert of getColumnFamilyWithColumns() that uses headerSize() will fail. I agree that it is not used in streaming right now, but it could, there is no reason to make that fail.

          Some nits:

          • In SSTable {Names|Slice}

            Iterator first constructor, it would be nice to add back the assert that the key given is the same that the one on disk.

          • In SSTSI.close(), the if (file != null) can be an assert.
          • In SSTScanner.KSI.next(), there is a start variable that is unused.
          Show
          Sylvain Lebresne added a comment - For convenience I removed a step from scrub in 0002 where if the key/length at the head of a row were corrupt, we retried with the key/length from the index. I'll understand if we want to add this back (would have to push it into the iterator), but either way we end up skipping the row as corrupt (using the index), so I don't think it is much of a loss. We don't end up skipping the row either way. If the index is valid, we end up "repairing the row". I'm for adding this back, as I don't see why this is so unconvenient and doesn't justify the regression. Other comments: The rewrite of IncomingStreamReader is actually more buggy that IncomingStreamReader is already, in that it uses AbstractCompactRow for everything which in the cases where LCR will be used is problematic. The code in trunk is already problematic however ( CASSANDRA-3003 ). Nevertheless, let's keep the code there as close as it is in trunk because 1) it is more "correct" in trunk and 2) there is no point in changing this code (more than what this ticket require) that will need to be fixed anyway. In SSTII constructor, dataStart should be set to 0 when the input is a stream. Otherwise an assert of getColumnFamilyWithColumns() that uses headerSize() will fail. I agree that it is not used in streaming right now, but it could, there is no reason to make that fail. Some nits: In SSTable {Names|Slice} Iterator first constructor, it would be nice to add back the assert that the key given is the same that the one on disk. In SSTSI.close(), the if (file != null) can be an assert. In SSTScanner.KSI.next(), there is a start variable that is unused.
          Hide
          Stu Hood added a comment -

          Thanks for the review Sylvain: I commented on CASSANDRA-3003. I'll try to make these changes before Monday the 15th.

          Show
          Stu Hood added a comment - Thanks for the review Sylvain: I commented on CASSANDRA-3003 . I'll try to make these changes before Monday the 15th.
          Hide
          Stu Hood added a comment -

          Rebased, but without the changes from Sylvain's review... I'll get back to this once 3067 has had some eyes on it.

          Show
          Stu Hood added a comment - Rebased, but without the changes from Sylvain's review... I'll get back to this once 3067 has had some eyes on it.
          Hide
          Jonathan Ellis added a comment -

          Is this still relevant?

          Show
          Jonathan Ellis added a comment - Is this still relevant?
          Hide
          Sylvain Lebresne added a comment -

          Closing for lack of progress and because the currently attached patches are completely outdated. The row key is still sometimes read outside of the SSTableITerator classes, but I don't see us making the sstable format pluggable anytime soon, so while this is relevant, this would largely be cosmetic. But I'm fine reconsidering that if someone comes up with updated patches.

          Show
          Sylvain Lebresne added a comment - Closing for lack of progress and because the currently attached patches are completely outdated. The row key is still sometimes read outside of the SSTableITerator classes, but I don't see us making the sstable format pluggable anytime soon, so while this is relevant, this would largely be cosmetic. But I'm fine reconsidering that if someone comes up with updated patches.

            People

            • Assignee:
              Stu Hood
              Reporter:
              Stu Hood
              Reviewer:
              Sylvain Lebresne
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development