Details
-
Improvement
-
Status: Resolved
-
Low
-
Resolution: Fixed
-
None
-
None
Description
Attached is a patch which resolves two issues:
First, the table of false positives for a given number of bits per element and a given number of hashes in BloomCalculations is incomplete. This patch includes the calculations from Table 5 of Bloom Filters - The Math, which is referenced in the documentation at the source of the table. This changes the optimal number of hashes for the Bloom filters created by SSTableWriter (with 15 bits per element) from 8 to 10, with a reduction in the false positive rate from 0.000852 to 0.000744. (The patch also extends the table to include false positive rates for 16-20 bits per element for future reference.)
Second, the values of BloomCalculations.optKPerBuckets were not correct, in that some of them did not choose the optimal number of hashes for a given number of bits per element, probably due to transcription errors in the past. This patch replaces that series of constants with an array of optimal hashes generated during BloomCalculation's static initialization.
This patch should apply cleanly to the 0.6 branch and trunk with all tests passing.