HBase / HBASE-5330

TestCompactSelection - adding 2 test cases to testCompactionRatio

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed

    Description

      There were three existing assertions in TestCompactSelection testCompactionRatio that checked only the maximum number of files selected...

          assertEquals(maxFiles,
              store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact().size());
      

      ... and for references ...

          assertEquals(maxFiles,
              store.compactSelection(sfCreate(true, 7,6,5,4,3,2,1)).getFilesToCompact().size());
      

      ... but they didn't assert against which StoreFiles got selected. While the number of StoreFiles is the same, the files selected are actually different, and I thought that there should be explicit assertions showing that.
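The difference between asserting on the count and asserting on the selected files can be sketched as below. This is a minimal, self-contained illustration and not the HBase test code: the class name, the helper methods `sameCount` and `sameFiles`, the use of plain file sizes in place of StoreFiles, and the value 5 for maxFiles are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class SelectionAssertionSketch {

    /** Count-only check, in the spirit of assertEquals(maxFiles, selection.size()). */
    public static boolean sameCount(List<Long> selected, int expectedCount) {
        return selected.size() == expectedCount;
    }

    /** Content check: the selected sizes must match the expected sizes, in order. */
    public static boolean sameFiles(List<Long> selected, long... expectedSizes) {
        if (selected.size() != expectedSizes.length) {
            return false;
        }
        for (int i = 0; i < expectedSizes.length; i++) {
            if (selected.get(i) != expectedSizes[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int maxFiles = 5; // assumed value for illustration
        // Hypothetical selection results, represented only by file sizes.
        List<Long> plain = Arrays.asList(7L, 6L, 5L, 4L, 3L);
        List<Long> refs = Arrays.asList(5L, 4L, 3L, 2L, 1L);

        // Both selections pass a count-only assertion...
        System.out.println(sameCount(plain, maxFiles) && sameCount(refs, maxFiles)); // true
        // ...but only one of them matches the expected files 7,6,5,4,3.
        System.out.println(sameFiles(plain, 7, 6, 5, 4, 3)); // true
        System.out.println(sameFiles(refs, 7, 6, 5, 4, 3));  // false
    }
}
```

A count-only assertion cannot tell the two selections apart, which is why an explicit check on the selected files is useful.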

          Activity

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12513160/TestCompactSelection_hbase_5330.java.patch
            against trunk revision .

            +1 @author. The patch does not contain any @author tags.

            +1 tests included. The patch appears to include 3 new or modified tests.

            -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            -1 findbugs. The patch appears to introduce 154 new Findbugs (version 1.3.9) warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            -1 core tests. The patch failed these unit tests:
            org.apache.hadoop.hbase.io.hfile.TestHFileBlock
            org.apache.hadoop.hbase.TestInfoServers
            org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion

            Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/899//testReport/
            Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/899//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
            Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/899//console

            This message is automatically generated.


            nspiegelberg Nicolas Spiegelberg added a comment -

            There are actually more subtle shortcomings that these tests expose.

            1) The major compaction is treated differently than the reference use case because a major compaction does not require all the files. A major compaction just requires a contiguous range of the oldest files. We need to change compactSelection to return whether the operation is a major compaction or not.

            2) The reference use case should not look at all the StoreFiles. Instead, it should compact range(oldest_reference, newest_reference).

            3) I don't understand why reference files are [5:1]. It should be [7:3] like the others.

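Point 2 above proposes compacting range(oldest_reference, newest_reference). One way to read that proposal is sketched below. It is only an illustration of the idea, not existing HBase code: the `FileInfo` class, the `referenceRange` and `sizes` methods, and the assumption that the input list is ordered oldest to newest are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReferenceRangeSketch {

    /** Hypothetical stand-in for a StoreFile: a size plus a reference flag. */
    public static final class FileInfo {
        final long size;
        final boolean reference;

        public FileInfo(long size, boolean reference) {
            this.size = size;
            this.reference = reference;
        }
    }

    /**
     * Returns the contiguous run of files from the first reference file to the
     * last reference file, inclusive. Returns an empty list if there are none.
     * Assumes the input is ordered oldest to newest.
     */
    public static List<FileInfo> referenceRange(List<FileInfo> files) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < files.size(); i++) {
            if (files.get(i).reference) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        if (first < 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(files.subList(first, last + 1));
    }

    /** Convenience for inspecting a selection by file size only. */
    public static List<Long> sizes(List<FileInfo> files) {
        List<Long> out = new ArrayList<>();
        for (FileInfo f : files) {
            out.add(f.size);
        }
        return out;
    }
}
```

Non-reference files that fall between two reference files are included so that the selected range stays contiguous.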
            dmeil Doug Meil added a comment -

            Thanks Nicolas.

            I'll give #1 and #2 a shot.

            Especially for #3, the reason I was adding to the tests in the first place was to describe "what is happening" as opposed to "what should be happening." Basically, I want to get a better description of compaction in the book, so I figured that the unit tests were the best place to start.

            dmeil Doug Meil added a comment -

            Hey Nicolas, I initially misread this comment "We need to change compactSelection to return whether the operation is a major compaction or not" and now I see what you're talking about: there is no "isMajorCompaction" in CompactSelection.


            nspiegelberg Nicolas Spiegelberg added a comment -

            I spent a little time on this yesterday. This is correct behavior as written. Some detail:

            #1

            // Change
            compactEquals(store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact(), 7,6,5,4,3);
            // TO:
            compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);

            The original code is doing a compaction, taking the output files, then doing a second compaction on them. Obviously, this is an identity operation, but it is not technically correct since we're "double compacting".

            #2

            store.forceMajor = true;
            compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);

            This should return [3:7] because it's NOT actually doing a major compaction. Currently, the algorithm states that major compactions with too many files are downgraded. This is not really the behavior we want. Instead, for a major compaction, we should try to compact storefiles[0:N] where N >= min(minFiles, sizeof(storefiles)). This will be a little tricky, because the candidate files don't always contain storefile[0], which is necessary for compaction.

            #3

            // Reference compaction
            compactEquals(sfCreate(true, 7, 6, 5, 4, 3, 2, 1), 5, 4, 3, 2, 1);

            This is correct as written, but still needs some improvement. As I recall, the original reasoning was that we'd only hit this case when we had a bug where we kept flushing storefiles. We weren't sure how to handle it at the time (we had production pressure). The problem is that we didn't have the state of previous compactions and we thought we'd have to get the whole candidate set. The idea was that, if we're going to recompact the same files multiple times, it should be the smaller files at the end rather than the last file. Since we only need a shard of the files for major compaction and reference files keep inherent state, we can improve this.

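The two expectations in #1 and #3 above (7,6,5,4,3 versus 5,4,3,2,1 from the same seven candidates) can be reproduced with a simple cap on the number of files. The sketch below is an illustration under assumptions, not the Store selection code: the class and method names are hypothetical, the cap of 5 is inferred from the expected results, file sizes stand in for StoreFiles, and the compaction ratio check is not modeled at all.

```java
import java.util.ArrayList;
import java.util.List;

public class CapSelectionSketch {

    /** Keeps the first maxFiles candidates, as in the 7,6,5,4,3 expectation. */
    public static List<Integer> keepFirst(List<Integer> candidates, int maxFiles) {
        int end = Math.min(candidates.size(), maxFiles);
        return new ArrayList<>(candidates.subList(0, end));
    }

    /**
     * Drops candidates from the front until at most maxFiles remain,
     * as in the 5,4,3,2,1 expectation for reference files.
     */
    public static List<Integer> keepLast(List<Integer> candidates, int maxFiles) {
        int pastMax = Math.max(0, candidates.size() - maxFiles);
        return new ArrayList<>(candidates.subList(pastMax, candidates.size()));
    }
}
```

With candidates 7,6,5,4,3,2,1 and a cap of 5, `keepFirst` yields 7,6,5,4,3 and `keepLast` yields 5,4,3,2,1: the same number of files, but a different set.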
            dmeil Doug Meil added a comment -

            Thanks Nicolas. Mind if I commit the test after I update it with these changes?

            Regarding #2, "Should return [3:7] because it's NOT actually doing a major compaction": this sounds like it should be a separate Jira (bug/improvement), correct?

            dmeil Doug Meil added a comment -

            Committing this update to the unit test

            hudson Hudson added a comment -

            Integrated in HBase-TRUNK-security #103 (See https://builds.apache.org/job/HBase-TRUNK-security/103/)
            hbase-5330. Update to TestCompactSelection unit test for selection SF assertions.

            hudson Hudson added a comment -

            Integrated in HBase-TRUNK #2656 (See https://builds.apache.org/job/HBase-TRUNK/2656/)
            hbase-5330. Update to TestCompactSelection unit test for selection SF assertions.


            People

              Assignee: dmeil Doug Meil
              Reporter: dmeil Doug Meil