HBase
  1. HBase
  2. HBASE-3048

unify code for major/minor compactions

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      minor compactions now enforce ttl, maxversions, deletes, etc... but will let through the delete markers themselves

      Description

      Today minor compactions do not process deletes, purge old versions, etc. Only major compactions do. The rationale was probably to save CPU . We should evaluate if major compaction logic indeed runs significantly slower.

      Unifying minor compactions to do the same thing as major compactions has other advantages:

      • If the same keys are deleted/updated repeatedly, the fact that deletes/overwrites are not processed during minor compaction makes each subsequent minor compaction more expensive as the total amount of data keeps growing.
      • We'll have fewer bugs if the logic is as symmetric as possible. Any bugs in TTL enforcement, version enforcement, etc. could cause behavior to be different after a major compaction. Keeping the same logic means these bugs will get caught earlier.

      -

      Note: There will still need to be one difference in the two schemes, and that has to do with delete markers. Any compaction which doesn't compact all files will still need to leave delete markers.

      1. HBASE-3048-0.90-v2.patch
        31 kB
        Jonathan Gray
      2. HBASE-3048-v3.patch
        32 kB
        Jonathan Gray
      3. unify.patch
        19 kB
        Amitanand Aiyer

        Activity

        Hide
        stack added a comment -

        This is fine by me. The one objection I was going to raise was the delete markers story but you got that in your footnote.

        Show
        stack added a comment - This is fine by me. The one objection I was going to raise was the delete markers story but you got that in your footnote.
        Hide
        Pranav Khaitan added a comment -

        This can be done using the following simple change:
        Currently, MajorCompaction uses StoreScanner while MinorCompaction uses MinorCompactionStoreScanner which does a subset of things done by StoreScanner. We now want MinorCompaction to do most of the things being done by StoreScanner. Therefore, I would suggest making a flag in the StoreScanner which says if deletes should be processed or not. This flag will be ON by default and will only be set to OFF during a minor compaction. The only alternative to this idea would result in duplicating code which doesn't seem like a good idea.

        Show
        Pranav Khaitan added a comment - This can be done using the following simple change: Currently, MajorCompaction uses StoreScanner while MinorCompaction uses MinorCompactionStoreScanner which does a subset of things done by StoreScanner. We now want MinorCompaction to do most of the things being done by StoreScanner. Therefore, I would suggest making a flag in the StoreScanner which says if deletes should be processed or not. This flag will be ON by default and will only be set to OFF during a minor compaction. The only alternative to this idea would result in duplicating code which doesn't seem like a good idea.
        Hide
        Kannan Muthukkaruppan added a comment -

        Hi Pranav! yes, the flag approach should work well. The intention is to unify not only the processing details but the code/logic too.

        Show
        Kannan Muthukkaruppan added a comment - Hi Pranav! yes, the flag approach should work well. The intention is to unify not only the processing details but the code/logic too.
        Hide
        Amitanand Aiyer added a comment -

        Hey Guys,
        Here is an initial diff with the changes to implement the unification. Would like to add in some more changes to the TestCompaction code to make it more exhaustive.

        If you have any comments/suggestions that would be great!

        Thanks,
        -Amit

        Show
        Amitanand Aiyer added a comment - Hey Guys, Here is an initial diff with the changes to implement the unification. Would like to add in some more changes to the TestCompaction code to make it more exhaustive. If you have any comments/suggestions that would be great! Thanks, -Amit
        Hide
        stack added a comment -

        @Amit Did you attach a patch? I don't see it. Thanks.

        Show
        stack added a comment - @Amit Did you attach a patch? I don't see it. Thanks.
        Hide
        Amitanand Aiyer added a comment -

        Hi Stack, can you try again. I've uploaded it again now.

        Show
        Amitanand Aiyer added a comment - Hi Stack, can you try again. I've uploaded it again now.
        Hide
        stack added a comment -

        Ohh, radical.

        All tests pass?

        Patch looks good to me. I like the unit tests that assert stuff works as it used to (Thats what you are doing as I read this, right?)

        Good stuff Amit.

        Show
        stack added a comment - Ohh, radical. All tests pass? Patch looks good to me. I like the unit tests that assert stuff works as it used to (Thats what you are doing as I read this, right?) Good stuff Amit.
        Hide
        Kannan Muthukkaruppan added a comment -

        Amit,

        This overload of StoreScanner which takes a set of columns:

          StoreScanner(Store store, Scan scan, final NavigableSet<byte[]> columns,
                                      boolean includeDeletes) throws IOException {
        

        doesn't seem to be ever called with anything other than "false". So, perhaps no need to introduce this.

        You can instead have the old method:

          StoreScanner(Store store, Scan scan, final NavigableSet<byte[]> columns) 
                                      throws IOException {
        

        simply call ScanQueryMatcher with "false" for includeDeletes.

        Show
        Kannan Muthukkaruppan added a comment - Amit, This overload of StoreScanner which takes a set of columns: StoreScanner(Store store, Scan scan, final NavigableSet< byte []> columns, boolean includeDeletes) throws IOException { doesn't seem to be ever called with anything other than "false". So, perhaps no need to introduce this. You can instead have the old method: StoreScanner(Store store, Scan scan, final NavigableSet< byte []> columns) throws IOException { simply call ScanQueryMatcher with "false" for includeDeletes.
        Hide
        ryan rawson added a comment -

        The code is using an additional flag to the StoreScanner constructor, perhaps we should use the InternalScan infrastructure added by HBASE-3082 to keep this as clean as possible? That seems to be the primary criticism of this patch I think. The rest looks good, perhaps instead of "includeDeletes" it should be called "keepDeletesInOutput" or something really descriptive like that? Write one read many and all that.

        Show
        ryan rawson added a comment - The code is using an additional flag to the StoreScanner constructor, perhaps we should use the InternalScan infrastructure added by HBASE-3082 to keep this as clean as possible? That seems to be the primary criticism of this patch I think. The rest looks good, perhaps instead of "includeDeletes" it should be called "keepDeletesInOutput" or something really descriptive like that? Write one read many and all that.
        Hide
        Jonathan Gray added a comment -

        This is a rebased patch for trunk. We committed this to our internal 0.89-based repo.

        Show
        Jonathan Gray added a comment - This is a rebased patch for trunk. We committed this to our internal 0.89-based repo.
        Hide
        HBase Review Board added a comment -

        Message from: "Jonathan Gray" <jgray@apache.org>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.cloudera.org/r/1185/
        -----------------------------------------------------------

        Review request for hbase, stack, Kannan Muthukkaruppan, and Ryan Rawson.

        Summary
        -------

        This is a rebased patch from Amit. This was internally reviewed but we may want some minor modifications for 0.90 with the new InternalScanner, if possible. (suggestion from ryan on jira)

        This addresses bug HBASE-3048.
        http://issues.apache.org/jira/browse/HBASE-3048

        Diffs


        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 1031745
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1031745
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1031745
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1031745
        trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1031745
        trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinorCompactingStoreScanner.java 1031745

        Diff: http://review.cloudera.org/r/1185/diff

        Testing
        -------

        Lot's of good new additions by Amit to TestCompaction which passes.

        Thanks,

        Jonathan

        Show
        HBase Review Board added a comment - Message from: "Jonathan Gray" <jgray@apache.org> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1185/ ----------------------------------------------------------- Review request for hbase, stack, Kannan Muthukkaruppan, and Ryan Rawson. Summary ------- This is a rebased patch from Amit. This was internally reviewed but we may want some minor modifications for 0.90 with the new InternalScanner, if possible. (suggestion from ryan on jira) This addresses bug HBASE-3048 . http://issues.apache.org/jira/browse/HBASE-3048 Diffs trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 1031745 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1031745 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1031745 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1031745 trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1031745 trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinorCompactingStoreScanner.java 1031745 Diff: http://review.cloudera.org/r/1185/diff Testing ------- Lot's of good new additions by Amit to TestCompaction which passes. Thanks, Jonathan
        Hide
        HBase Review Board added a comment -

        Message from: "Jonathan Gray" <jgray@apache.org>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.cloudera.org/r/1185/
        -----------------------------------------------------------

        (Updated 2010-11-06 15:50:30.615330)

        Review request for hbase, stack, Kannan Muthukkaruppan, and Ryan Rawson.

        Changes
        -------

        Minor cleanup of whitespace and stuff.

        Summary
        -------

        This is a rebased patch from Amit. This was internally reviewed but we may want some minor modifications for 0.90 with the new InternalScanner, if possible. (suggestion from ryan on jira)

        This addresses bug HBASE-3048.
        http://issues.apache.org/jira/browse/HBASE-3048

        Diffs (updated)


        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 1032175
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1032175
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1032175
        trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1032175
        trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1032175
        trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinorCompactingStoreScanner.java 1032175

        Diff: http://review.cloudera.org/r/1185/diff

        Testing
        -------

        Lot's of good new additions by Amit to TestCompaction which passes.

        Thanks,

        Jonathan

        Show
        HBase Review Board added a comment - Message from: "Jonathan Gray" <jgray@apache.org> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1185/ ----------------------------------------------------------- (Updated 2010-11-06 15:50:30.615330) Review request for hbase, stack, Kannan Muthukkaruppan, and Ryan Rawson. Changes ------- Minor cleanup of whitespace and stuff. Summary ------- This is a rebased patch from Amit. This was internally reviewed but we may want some minor modifications for 0.90 with the new InternalScanner, if possible. (suggestion from ryan on jira) This addresses bug HBASE-3048 . http://issues.apache.org/jira/browse/HBASE-3048 Diffs (updated) trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 1032175 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1032175 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1032175 trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1032175 trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1032175 trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinorCompactingStoreScanner.java 1032175 Diff: http://review.cloudera.org/r/1185/diff Testing ------- Lot's of good new additions by Amit to TestCompaction which passes. Thanks, Jonathan
        Hide
        HBase Review Board added a comment -

        Message from: "Jonathan Gray" <jgray@apache.org>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.cloudera.org/r/1185/#review1835
        -----------------------------------------------------------

        Looking at this now, it's at completely different level than the new InternalScan we have. That deals with scanner selection, this deals with an option in the QueryMatcher. There are already a bunch of options in SQM so we could consolidate at some point, but it's only a single level so doesn't seem so bad.

        I'm +1 on it (it's not my patch) but want to get someone else to review.

        • Jonathan
        Show
        HBase Review Board added a comment - Message from: "Jonathan Gray" <jgray@apache.org> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1185/#review1835 ----------------------------------------------------------- Looking at this now, it's at completely different level than the new InternalScan we have. That deals with scanner selection, this deals with an option in the QueryMatcher. There are already a bunch of options in SQM so we could consolidate at some point, but it's only a single level so doesn't seem so bad. I'm +1 on it (it's not my patch) but want to get someone else to review. Jonathan
        Hide
        Kannan Muthukkaruppan added a comment -

        +1 on patch. Looks good to me.

        Show
        Kannan Muthukkaruppan added a comment - +1 on patch. Looks good to me.
        Hide
        HBase Review Board added a comment -

        Message from: "Ryan Rawson" <ryanobjc@gmail.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.cloudera.org/r/1185/#review1837
        -----------------------------------------------------------

        Ship it!

        ok lets go with this

        • Ryan
        Show
        HBase Review Board added a comment - Message from: "Ryan Rawson" <ryanobjc@gmail.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1185/#review1837 ----------------------------------------------------------- Ship it! ok lets go with this Ryan
        Hide
        Jonathan Gray added a comment -

        v3 patch is final one for commit.

        Show
        Jonathan Gray added a comment - v3 patch is final one for commit.
        Hide
        Jonathan Gray added a comment -

        Committed to trunk. Thanks for the great work Amit. Thanks for reviews from Ryan and Kannan.

        Show
        Jonathan Gray added a comment - Committed to trunk. Thanks for the great work Amit. Thanks for reviews from Ryan and Kannan.
        Hide
        Kannan Muthukkaruppan added a comment -

        Prakash reported that he was seeing really good benefit of this optimization for his "incr" use case (which essentially rewrites the same keys over and over again). So doing delete processing in minor compactions reduce the working set nicely, and also improves block cache efficiencies and so on.

        Show
        Kannan Muthukkaruppan added a comment - Prakash reported that he was seeing really good benefit of this optimization for his "incr" use case (which essentially rewrites the same keys over and over again). So doing delete processing in minor compactions reduce the working set nicely, and also improves block cache efficiencies and so on.
        Hide
        stack added a comment -

        @Kannan Thanks for reporting back.

        Show
        stack added a comment - @Kannan Thanks for reporting back.

          People

          • Assignee:
            Amitanand Aiyer
            Reporter:
            Kannan Muthukkaruppan
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development