Details
-
Improvement
-
Status: Open
-
Low
-
Resolution: Unresolved
-
None
Description
org.apache.cassandra.utils.BloomCalculations performs some table lookups to calculate the bloom filter specification (size, # of hashes). Using the exact maths for that computation brings a better false-positive-ratio (the maths usually returns higher numbers for hash-counts).
TL;DR increasing the number of hash-rounds brings a nice improvement. Finally it's a trade-off between CPU and I/O.
false-positive-chance | elements | capacity | hash count new | false-positive-ratio new | hash count current | false-positive-ratio current | improvement |
---|---|---|---|---|---|---|---|
0.1 | 10000 | 50048 | 3 | 0.0848 | 3 | 0.0848 | 0 |
0.1 | 100000 | 500032 | 3 | 0.09203 | 3 | 0.09203 | 0 |
0.1 | 1000000 | 5000064 | 3 | 0.0919 | 3 | 0.0919 | 0 |
0.1 | 10000000 | 50000064 | 3 | 0.09182 | 3 | 0.09182 | 0 |
0.1 | 100000000 | 500000064 | 3 | 0.091874 | 3 | 0.091874 | 0 |
0.01 | 10000 | 100032 | 7 | 0.0092 | 5 | 0.0107 | 0.1630434783 |
0.01 | 100000 | 1000064 | 7 | 0.00818 | 5 | 0.00931 | 0.1381418093 |
0.01 | 1000000 | 10000064 | 7 | 0.008072 | 5 | 0.009405 | 0.1651387512 |
0.01 | 10000000 | 100000064 | 7 | 0.008174 | 5 | 0.009375 | 0.146929288 |
0.01 | 100000000 | 1000000064 | 7 | 0.008197 | 5 | 0.009428 | 0.150176894 |
0.001 | 10000 | 150080 | 10 | 0.0008 | 7 | 0.001 | 0.25 |
0.001 | 100000 | 1500032 | 10 | 0.0006 | 7 | 0.00094 | 0.5666666667 |
0.001 | 1000000 | 15000064 | 10 | 0.000717 | 7 | 0.000991 | 0.3821478382 |
0.001 | 10000000 | 150000064 | 10 | 0.000743 | 7 | 0.000992 | 0.33512786 |
0.001 | 100000000 | 1500000064 | 10 | 0.000741 | 7 | 0.001002 | 0.3522267206 |
0.0001 | 10000 | 200064 | 13 | 0 | 10 | 0.0002 | #DIV/0! |
0.0001 | 100000 | 2000064 | 13 | 0.00004 | 10 | 0.0001 | 1.5 |
0.0001 | 1000000 | 20000064 | 13 | 0.000075 | 10 | 0.000091 | 0.2133333333 |
0.0001 | 10000000 | 200000064 | 13 | 0.000069 | 10 | 0.000087 | 0.2608695652 |
0.0001 | 100000000 | 2000000064 | 13 | 0.000068 | 10 | 0.00009 | 0.3235294118 |
If we decide to allow more hash-rounds, it could be nicely back-ported even to 2.0 without affecting existing sstables.
Attachments
Issue Links
- links to