Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-9167

Improve bloom-filter false-positive-ratio

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Low
    • Resolution: Unresolved
    • None
    • Legacy/Core

    Description

      org.apache.cassandra.utils.BloomCalculations performs some table lookups to calculate the bloom filter specification (size, # of hashes). Using the exact maths for that computation brings a better false-positive-ratio (the maths usually returns higher numbers for hash-counts).

      TL;DR increasing the number of hash-rounds brings a nice improvement. Finally it's a trade-off between CPU and I/O.

      false-positive-chance elements capacity hash count new false-positive-ratio new hash count current false-positive-ratio current improvement
      0.1 10000 50048 3 0.0848 3 0.0848 0
      0.1 100000 500032 3 0.09203 3 0.09203 0
      0.1 1000000 5000064 3 0.0919 3 0.0919 0
      0.1 10000000 50000064 3 0.09182 3 0.09182 0
      0.1 100000000 500000064 3 0.091874 3 0.091874 0
      0.01 10000 100032 7 0.0092 5 0.0107 0.1630434783
      0.01 100000 1000064 7 0.00818 5 0.00931 0.1381418093
      0.01 1000000 10000064 7 0.008072 5 0.009405 0.1651387512
      0.01 10000000 100000064 7 0.008174 5 0.009375 0.146929288
      0.01 100000000 1000000064 7 0.008197 5 0.009428 0.150176894
      0.001 10000 150080 10 0.0008 7 0.001 0.25
      0.001 100000 1500032 10 0.0006 7 0.00094 0.5666666667
      0.001 1000000 15000064 10 0.000717 7 0.000991 0.3821478382
      0.001 10000000 150000064 10 0.000743 7 0.000992 0.33512786
      0.001 100000000 1500000064 10 0.000741 7 0.001002 0.3522267206
      0.0001 10000 200064 13 0 10 0.0002 #DIV/0!
      0.0001 100000 2000064 13 0.00004 10 0.0001 1.5
      0.0001 1000000 20000064 13 0.000075 10 0.000091 0.2133333333
      0.0001 10000000 200000064 13 0.000069 10 0.000087 0.2608695652
      0.0001 100000000 2000000064 13 0.000068 10 0.00009 0.3235294118

      If we decide to allow more hash-rounds, it could be nicely back-ported even to 2.0 without affecting existing sstables.

      Attachments

        Issue Links

          Activity

            People

              snazy Robert Stupp
              snazy Robert Stupp
              Robert Stupp
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: