Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-8413

Bloom filter false positive ratio is not honoured

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 3.0 alpha 1
    • None
    • None
    • Normal

    Description

      Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a problem with sabotaging our bloom filters when using the murmur3 partitioner. I have performed a very quick test to confirm this risk is real.

      Since a typical cluster uses the same murmur3 hash for partitioning as we do for bloom filter lookups, and we own a contiguous range, we can guarantee that the top X bits collide for all keys on the node. This translates into poor bloom filter distribution. I quickly hacked LongBloomFilterTest to simulate the problem, and the result in these tests is up to a doubling of the actual false positive ratio. The actual change will depend on the key distribution, the number of keys, the false positive ratio, the number of nodes, the token distribution, etc. But seems to be a real problem for non-vnode clusters of at least ~128 nodes in size.

      Attachments

        1. 8413.hack.txt
          11 kB
          Benedict Elliott Smith
        2. 8413.hack-3.0.txt
          6 kB
          Robert Stupp
        3. 8413-patch.txt
          55 kB
          Robert Stupp

        Issue Links

          Activity

            People

              snazy Robert Stupp
              benedict Benedict Elliott Smith
              Robert Stupp
              Benedict Elliott Smith
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: