... but they didn't assert against which StoreFiles got selected. While the number of StoreFiles is the same, the files selected are actually different, and I thought that there should be explicit assertions showing that.
Hadoop QA
added a comment - -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12513160/TestCompactSelection_hbase_5330.java.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -136 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 154 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.io.hfile.TestHFileBlock
org.apache.hadoop.hbase.TestInfoServers
org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/899//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/899//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/899//console
This message is automatically generated.
Nicolas Spiegelberg
added a comment - There are actually more subtle shortcomings that these tests expose.
1) The major compaction is treated differently than the reference use case because a major compaction does not require all the files. A major compaction just requires a contiguous range of the oldest files. We need to change compactSelection to return whether the operation is a major compaction or not.
2) The reference use case should not look at all the StoreFiles. Instead, it should compact range(oldest_reference, newest_reference).
3) I don't understand why reference files are [5:1]. It should be [7:3] like the others.
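To make point 2 concrete, here is a minimal sketch of selecting range(oldest_reference, newest_reference). It is an illustration only, not HBase code: `ReferenceRangeSketch`, `FileInfo`, and `selectReferenceRange` are hypothetical names, and the input list is assumed to be ordered oldest first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReferenceRangeSketch {

    /** Hypothetical stand-in for a StoreFile: a name plus an "is reference" flag. */
    public static final class FileInfo {
        public final String name;
        public final boolean reference;

        public FileInfo(String name, boolean reference) {
            this.name = name;
            this.reference = reference;
        }
    }

    /**
     * Returns the contiguous run of files from the oldest reference file to the
     * newest reference file, inclusive. Assumes the list is ordered oldest first.
     * Returns an empty list when there are no reference files.
     */
    public static List<FileInfo> selectReferenceRange(List<FileInfo> files) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < files.size(); i++) {
            if (files.get(i).reference) {
                if (first < 0) {
                    first = i; // oldest reference seen so far
                }
                last = i; // newest reference seen so far
            }
        }
        if (first < 0) {
            return Collections.emptyList();
        }
        // Non-reference files between the two endpoints are included so the range stays contiguous.
        return new ArrayList<>(files.subList(first, last + 1));
    }
}
```

Files older than the oldest reference or newer than the newest reference are left out of the selection, which is the difference from looking at all the StoreFiles.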
Doug Meil
added a comment - Thanks Nicholas.
I'll give #1 and #2 a shot.
Especially for #3, the reason I was adding to the tests in the first place was to describe "what is happening" as opposed to "what should be happening." Basically, I want to get a better description of compaction in the book, so I figured that the unit tests were the best place to start.
Doug Meil
added a comment - Hey Nicholas, I initially mis-read this comment "We need to change compactSelection to return whether the operation is a major compaction or not" and now I see what you're talking about: there is no "isMajorCompaction" in CompactSelection.
Nicolas Spiegelberg
added a comment - I spent a little time on this yesterday. This is correct behavior as written. Some detail:
#1
// Change
compactEquals(store.compactSelection(sfCreate(7,6,5,4,3,2,1)).getFilesToCompact(), 7,6,5,4,3);
// TO:
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);
The original code is doing a compaction, taking the output files, then doing a second compaction on them. Obviously, this is an identity operation, but is not technically correct since we're "double compacting".
#2
store.forceMajor = true;
compactEquals(sfCreate(7, 6, 5, 4, 3, 2, 1), 7, 6, 5, 4, 3);
Should return [3:7] because it's NOT actually doing a major compaction. Currently, the algorithm states that majors with too many files are downgraded. This is not really the behavior we want. Instead, for a major compaction, we should try to compact storefiles[0:N] where N >= min(minFiles, sizeof(storefiles)). This will be a little tricky, because the candidate files don't always contain storefile[0], which is necessary for compaction.
#3
// Reference compaction
compactEquals(sfCreate(true, 7, 6, 5, 4, 3, 2, 1), 5, 4, 3, 2, 1);
This is correct as written, but still needs some improvement. As I recall, the original reasoning was that we'd only hit this case when we had a bug where we kept flushing storefiles. We weren't sure how to handle it at the time (we had prod pressure). The problem is that we didn't have the state of previous compactions & we thought we'd have to get the whole candidate set. The idea was that, if we're going to recompact the same files multiple times, it should be the smaller files at the end rather than the last file. Since we only need a shard of the files for major compaction and reference files keep inherent state, we can improve this.
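The storefiles[0:N] idea from #2 could look roughly like the sketch below. This is an assumption-laden illustration, not the existing Store or CompactSelection code: `MajorPrefixSketch`, `Selection`, and `selectOldestPrefix` are hypothetical names, files are modeled only by their sizes in oldest-first order, and capping N at `maxFiles` is my own assumption. The `major` flag stands in for the missing "is this a major compaction" indicator discussed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class MajorPrefixSketch {

    /** Hypothetical result type: the chosen files plus a "major" flag for the caller. */
    public static final class Selection {
        public final List<Long> files;
        public final boolean major;

        public Selection(List<Long> files, boolean major) {
            this.files = files;
            this.major = major;
        }
    }

    /**
     * Picks storeFiles[0:N], the N oldest files, with
     * N >= min(minFiles, storeFiles.size()) and, as an assumption of this sketch,
     * N capped at maxFiles when that cap does not violate the lower bound.
     * Expects non-negative minFiles and a list ordered oldest first.
     */
    public static Selection selectOldestPrefix(List<Long> storeFiles, int minFiles, int maxFiles) {
        int size = storeFiles.size();
        int lowerBound = Math.min(minFiles, size);
        int n = Math.max(lowerBound, Math.min(maxFiles, size));
        List<Long> chosen = new ArrayList<>(storeFiles.subList(0, n));
        // The selection always starts at the oldest file, so any non-empty result is flagged as major.
        return new Selection(chosen, n > 0);
    }
}
```

With seven files of sizes 7, 6, 5, 4, 3, 2, 1 and illustrative limits of minFiles = 3 and maxFiles = 5, the sketch selects 7, 6, 5, 4, 3 and sets the flag. It does not address the tricky case mentioned above, where the candidate list does not contain storefile[0].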
Doug Meil
added a comment - Thanks Nicholas. Mind if I commit the test after I update with these changes?
Regarding, #2 "Should return [3:7] because it's NOT actually doing a major compaction" this sounds like it should be a separate Jira (bug/improvement), correct?
Hudson
added a comment - Integrated in HBase-TRUNK-security #103 (See https://builds.apache.org/job/HBase-TRUNK-security/103/ )
hbase-5330. Update to TestCompactSelection unit test for selection SF assertions.
Hudson
added a comment - Integrated in HBase-TRUNK #2656 (See https://builds.apache.org/job/HBase-TRUNK/2656/ )
hbase-5330. Update to TestCompactSelection unit test for selection SF assertions.