|
If you have fixes for BloomFilter and DynamicBloomFilter, please open a Jira for HBase https://issues.apache.org/jira/browse/HBASE
We should avoid replicating source files. So files are copied from HBase to Core, then they should ideally then be removed from HBase, since HBase relies on Core.
As for the org.onelab classes: shouldn't we import these as a jar rather than as source? Re: replicating the source file - I agree, I just wasn't sure how to solve this.
My experience with these classes is that they are not well-debugged, so often bug fixing is necessary. The original creator was an EU IST project, so it's unlikely we could expect any maintenance from that project. Therefore I don't think that packaging them as a third-party jar is a good option. I think it would be best to import these classes into org.apache.hadoop.util.bloom package, and keep the comment about the original authors in the javadoc, as we do it now, and subsequently remove these classes form HBase. We can't import them as a Jar because they implement Writable
> I think it would be best to import these classes into org.apache.hadoop.util.bloom package [ ...]
+1 The license should also be added to our top-level LICENSE.txt, in the style of Updated patch. This patch imports the Bloom filter classes into org.apache.hadoop.util.bloom, and adds a notice to LICENSE.txt.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378873/bloommap-v2.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 4 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac -1. The applied patch generated 579 javac compiler warnings (more than the trunk's current 568 warnings). release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 3 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2098/testReport/ This message is automatically generated. I'm not sure what to do about the Findbugs warnings. There are two of them, and both don't make sense to me.
It is complaining that your clone methods don't call the super.clone, which is generally considered good style. (So in your case, Filter needs a clone method that copies its fields that is called by the subtypes.)
Yes, please add synchronization. It is far easier to add synchronization to a method that deal with missing synchronization. You also need to address the javac and javadoc warnings. What is the status on this? I was assigned to fix some bugs in the Hbase Bloom Filter code, but moving the code back into Hadoop and removing it from HBase needs to be done first. Can I consider HBase 533 unblocked? There is also a new hashing mechanism now that was added to hbase by Andrzej, MurmurHash, that should also be moved back up into hadoop.
Sorry, I got involved in other stuff - I'd like to prepare a new patch in the next day or two, perhaps we could squeeze it into 0.20 ... For the purpose of this patch I'll take the latest BloomFilter-s as they are now in HBase (plus the MurmurHash).
Thanks, refactoring of the onelab stuff into a hadoop util.bloomfilters would cleanup some awkward code tangles in hbase.
This patch uses the latest versions of Bloom filters and Hash implementations from HBase. It successfully passes the 'ant test-patch' command:
[exec] +1 overall. Any chance that this issue makes it into 0.20 ?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395548/bloommap-v3.patch against trunk revision 725729. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3709/testReport/ This message is automatically generated. Andrzej: It failed its own unit test up on hudson org.apache.hadoop.io.TestBloomMapFile.testMembershipTest.
Error Message hashType must be known Stacktrace java.lang.IllegalArgumentException: hashType must be known at org.apache.hadoop.util.bloom.HashFunction.<init>(HashFunction.java:92) at org.apache.hadoop.util.bloom.Filter.<init>(Filter.java:102) at org.apache.hadoop.util.bloom.DynamicBloomFilter.<init>(DynamicBloomFilter.java:121) at org.apache.hadoop.io.BloomMapFile$Writer.initBloomFilter(BloomMapFile.java:150) at org.apache.hadoop.io.BloomMapFile$Writer.<init>(BloomMapFile.java:143) at org.apache.hadoop.io.TestBloomMapFile.testMembershipTest(TestBloomMapFile.java:37) I ran it locally and got same failure. Reviewing the patch, it looks great. The above failure would seem to be because the last two arguments need to be flipped:
236 + bloomFilter = new DynamicBloomFilter(vectorSize, HASH_COUNT, numKeys,
237 + Hash.getHashType(conf));
....the DBF constructor looks like this: 1528 + public DynamicBloomFilter(int vectorSize, int nbHash, int hashType, int nr) { Test still fails with an assertion error so maybe the above is not right. There is an hbasism still in the code that you probably want to remove: 408 + String name = conf.get("hbase.hash.type", "murmur"); Apart from fumbling the args passed to the constructor, the unit test failed for a valid reason - the calculation of BloomFilter size was wrong. I fixed the calculation and added a config knob to adjust the error rate.
This time it passes the tests and gets +1's from ant test-patch. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395994/bloommap-v4.patch against trunk revision 726129. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3739/testReport/ This message is automatically generated. The failing tests seem unrelated to this patch:
org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.TestFileTailingAdaptors.testLogRotate org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend IMO, these failures shouldn't hold up committing this patch. Edit release note for publication.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
NOTE 1: I wasn't sure how to approach the issue of the org.onelab.* classes that I borrowed from HBase (which originally were a part of Hadoop core
). For now they are included verbatim here in this patch.
NOTE 2: the BloomFilter and DynamicBloomFilter classes contained a few bugs related to their Writable (de)serialization, some of them related to specific assumptions about the environment, some others fatal under all conditions. This patch contains these fixes too.