Cassandra: CASSANDRA-3497

BloomFilter FP ratio should be configurable or size-restricted some other way

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 1.0.7
    • Component/s: Core
    • Labels:
      None

      Description

      When you have a live DC and a purely analytical DC, in many situations you can have fewer nodes on the analytical side, but you end up restricted by having the BloomFilters in memory, even though you have absolutely no use for them. It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.

      1. 0001-Add-bloom_filter_fp_chance-to-cli.patch (2 kB, Yuki Morishita)
      2. 0001-give-default-val-to-fp_chance.patch (0.9 kB, Yuki Morishita)
      3. 3497-v3.txt (21 kB, Jonathan Ellis)
      4. 3497-v4.txt (21 kB, Jonathan Ellis)
      5. CASSANDRA-1.0-3497.txt (26 kB, Yuki Morishita)

          Activity

          Jonathan Ellis added a comment -

          Hmm, that sounds messy. How do you propose to distinguish BF configuration per-datacenter in the schema?

          Brandon Williams added a comment -

          Perhaps as a strategy_option?

          Radim Kolar added a comment -

          BF configuration needs to be per-CF, as in HBase. This would allow CFs used for logs to have a minimal BF if their rows are rarely read back.

          See HBase for an example:
          http://hbase.apache.org/book/blooms.html#d1161e4353

          Radim Kolar added a comment -

          It would be good to be able to shrink the bloom filter during loading: save only the standard Cassandra bloom filters, but shrink them at load time according to the CF settings.

          Yuki Morishita added a comment -

          The problem is that strategy_options for NTS is currently used purely for replication settings, for example {DC1:2, DC2:2}.
          We could do something like strategy_options={DC1:2, DC2:1, DC2:fp(0.5)} or strategy_options={DC1:2, DC2:1,fp(0.5)}, or something else that preserves backward compatibility, but I think it's complicated.

          Maybe the easiest fix is to have a node-wide setting for the FP ratio in cassandra.yaml (with a JMX interface exposed) and use different values in each datacenter?

          Brandon Williams added a comment -

          "Maybe easiest fix is to have node-wide setting for fp ratio in cassandra.yaml (w/ jmx interface exposed) and have different values for each datacenter?"

          Yes, I think that's good enough for the multi-datacenter scenario. However, as Radim mentioned, we also have a good use case for a per-CF threshold. We could do both, and then use whichever value is lower: the one in the CF schema or the one in the node's yaml.

          Jonathan Ellis added a comment -

          Let's just go with a per-CF option. Brandon's right that ideally we'd like to configure it differently (ideally leaving them out entirely) in analytical DCs but I don't want to invent a totally new concept in 1.0.x, and having it per-CF (which we get via schema) is more important than having it per-DC (which we get with strategy_options).

          Yuki Morishita added a comment -

          I added two new Bloom filter options to CFMetadata:

          • filter_enabled
            If set to false, SSTableReader uses an EMPTY bloom filter. Defaults to true.
          • fp_ratio
            If the value is greater than 0, SSTableReader adjusts the Bloom filter based on the FP ratio and uses it. Defaults to 0.

          The BloomFilter is created and saved as usual, but when an SSTableReader is opened, you get one based on the CF setting.

          Note that the change takes effect the next time an SSTableReader is opened, so for existing sstables you need to restart the node or compact/scrub them.

          Jonathan Ellis added a comment -

          Can we do it with a single setting?

          fp_ratio = null: use current 15-buckets-per-element filters
          fp_ratio = 0: no filter
          fp_ratio > 0: BF based on given FP probability

          Further, I think we should split this up so that for 1.0 we only worry about the null and positive cases – let's do a separate ticket for 1.1 about skipping the BF entirely.

          Yuki Morishita added a comment -

          OK, in the attached patch I removed the filter_enabled option.

          Jonathan Ellis added a comment -

          Sorry, I didn't look closely enough the first time. The BloomFilter#modify approach won't work: when we change the BF parameters we change what bits should be set – there's no way to rebuild it with new parameters without re-inserting all the keys.

          Attached v3 that just changes the BloomFilter constructor in SSTableWriter. (So, people will have to scrub to rebuild things, but that's the best we can do.) Also changed the setting to bloom_filter_fp_chance and updated cli help.

          How does that look to you?
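
          The point that changing the BF parameters changes which bits should be set can be illustrated with a toy filter. This is a minimal sketch, not Cassandra's BloomFilter: the salted SHA-256 hashing, the function names (bit_positions, build_filter, might_contain), and the sizes are all assumptions chosen for illustration.

```python
import hashlib

def bit_positions(key, num_bits, num_hashes):
    """Return the bit indexes that `key` maps to for the given filter parameters."""
    positions = []
    for i in range(num_hashes):
        # Illustrative hashing only: salt SHA-256 with the hash index.
        digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
        positions.append(int.from_bytes(digest[:8], "big") % num_bits)
    return positions

def build_filter(keys, num_bits, num_hashes):
    """Insert every key and return the set of bit indexes that ended up set."""
    bits = set()
    for key in keys:
        bits.update(bit_positions(key, num_bits, num_hashes))
    return bits

def might_contain(bits, key, num_bits, num_hashes):
    """A key may be present only if every one of its bit positions is set."""
    return all(p in bits for p in bit_positions(key, num_bits, num_hashes))

keys = [b"row-%d" % i for i in range(100)]

# Filter built with the original parameters: no false negatives.
old_bits = build_filter(keys, 1500, 10)
all_found_same_params = all(might_contain(old_bits, k, 1500, 10) for k in keys)

# Reading the same bits with different parameters loses inserted keys.
misses_new_params = sum(not might_contain(old_bits, k, 300, 2) for k in keys)

# Only re-inserting every key under the new parameters restores correctness.
rebuilt = build_filter(keys, 300, 2)
all_found_rebuilt = all(might_contain(rebuilt, k, 300, 2) for k in keys)

print(all_found_same_params, misses_new_params, all_found_rebuilt)
```

          In this sketch, a filter queried with parameters other than the ones it was built with reports inserted keys as absent, which is why new parameters can only be applied when the filter is rebuilt from the keys (here, when an sstable is rewritten).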

          Yuki Morishita added a comment -

          Jonathan,

          Your approach is what I tried first, before trying to do it in SSTR instead, and I think that is the best we can do for 1.0.x.
          One thing to point out: it throws an NPE at SSTableWriter.java#403 when fpChance is null and is converted to a double.

          Jonathan Ellis added a comment -

          v4 attached with unbox-of-null fixed.

          Yuki Morishita added a comment -

          +1

          Jonathan Ellis added a comment -

          committed

          Radim Kolar added a comment -

          I compiled jars with this patch, and Cassandra does not boot on an existing node:

          Opening /var/lib/cassandra/data/system/Migrations-hc-109 (757635 bytes)
          INFO [SSTableBatchOpen:1] 2011-12-24 18:26:47,326 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/LocationInfo-hc-273 (647 bytes)
          INFO [SSTableBatchOpen:1] 2011-12-24 18:26:47,338 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-1 (275 bytes)
          INFO [SSTableBatchOpen:2] 2011-12-24 18:26:47,338 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-2 (85 bytes)
          INFO [main] 2011-12-24 18:26:47,396 DatabaseDescriptor.java (line 501) Loading schema version ad8d50b0-2cc3-11e1-0000-b1504fb874be
          ERROR [main] 2011-12-24 18:26:47,555 AbstractCassandraDaemon.java (line 372) Exception encountered during startup
          org.apache.avro.AvroTypeException: Found {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.db.migration.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":"boolean","default":false},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"key_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"row_cache_keys_to_save","type":["null","int"],"default":null},{"name":"merge_shards_chance","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS","CUSTOM"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]},{"name":"index_options","type":["null",{"type":"map","values":"string"}],"default":null}]}},"null"]},{"name":"row_cache_provider","type":["string","null"],"default":"org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider"},{"name":"key_alias","type":["null","bytes"],"default":null},{"name":"compaction_strategy","type":["null","string"],"default":null},{"name":"compaction_strategy_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"compression_options","type":["null",{"type":"map","values":"string"}],"default":null}]}, expecting {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.db.migration.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":"boolean","default":false},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"key_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"row_cache_keys_to_save","type":["null","int"],"default":null},{"name":"merge_shards_chance","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS","CUSTOM"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]},{"name":"index_options","type":["null",{"type":"map","values":"string"}],"default":null}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]},{"name":"row_cache_provider","type":["string","null"],"default":"org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider"},{"name":"key_alias","type":["null","bytes"],"default":null},{"name":"compaction_strategy","type":["null","string"],"default":null},{"name":"compaction_strategy_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"compression_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"bloom_filter_fp_chance","type":["double","null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
          at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
          at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
          at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
          at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
          at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:192)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:116)
          at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
          at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
          at org.apache.cassandra.io.SerDeUtils.deserialize(SerDeUtils.java:60)
          at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:98)
          at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:502)
          at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:179)
          at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:355)
          at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)

          Yuki Morishita added a comment -

          Radim,

          Thanks for the report. The problem is that the new bloom_filter_fp_chance field in the Avro interface definition does not have a proper default.
          I attached a patch to fix it.
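
          The failure follows from Avro's schema resolution rule for records: a field that exists only in the reader's schema must carry a default, otherwise reading older data is an error. The sketch below models that one rule on plain dictionaries; it does not use the Avro library, and resolve_record is a hypothetical helper. The fixed field shown (union reordered to put "null" first, with a null default, as the existing merge_shards_chance field does) is an assumption about what a fix could look like, not necessarily what the attached patch does.

```python
def resolve_record(writer_fields, reader_fields, record):
    """Simplified model of Avro record resolution for reader-side fields."""
    writer_names = {f["name"] for f in writer_fields}
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            # Field was written: take the stored value.
            resolved[name] = record[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(
                "reader field %r is missing from the writer schema and has no default" % name
            )
    return resolved

# Schema the existing node wrote its migrations with (abbreviated to two fields).
old_schema = [
    {"name": "keyspace", "type": "string"},
    {"name": "name", "type": "string"},
]
stored = {"keyspace": "ks", "name": "cf"}

# New schema as in the stack trace: the added field has no default.
broken = old_schema + [{"name": "bloom_filter_fp_chance", "type": ["double", "null"]}]

# Assumed shape of a fixed field: "null" first in the union, default null.
fixed = old_schema + [
    {"name": "bloom_filter_fp_chance", "type": ["null", "double"], "default": None}
]

try:
    resolve_record(old_schema, broken, stored)
    broken_failed = False
except ValueError:
    broken_failed = True

fixed_result = resolve_record(old_schema, fixed, stored)
print(broken_failed, fixed_result)
```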

          Jonathan Ellis added a comment -

          committed

          Radim Kolar added a comment -

          The FP ratio is not displayed in the output of the CLI commands show schema and describe.

          Ophir Radnitz added a comment -

          We've tried this patch on 1.0.6 with an fp_ratio of 0.99 (if I understand correctly, after a major compaction leaves a single, albeit large, SSTable, the bloom filter has very little effect). We found that many records that had been inserted could not be fetched with a multiget_slice query. It seemed as if the bloom filters were producing false negatives.

          By the way, the fix patch (0001-give-default-val-to-fp_chance.patch) works for the 1.1 branch but not for 1.0.

          Jonathan Ellis added a comment -

          "the fix patch (0001-give-default-val-to-fp_chance.patch) works for the 1.1 branch but not for 1.0"

          It's already applied to both. (Note that we've switched to git; the old svn repo is no longer maintained.)

          Yuki Morishita added a comment -

          Patch attached so that the CLI show schema and describe commands show bloom_filter_fp_chance if it is set.

          Jonathan Ellis added a comment -

          "We've found that many records that were inserted could not be fetched in a multiget_slice query. It seemed as if the bloom filters resulted in false negatives."

          I have trouble understanding how this could be the case, because if our BF could cause false negatives then surely we'd see that even at today's low default FP rates. This patch didn't change how the BF is used, only the parameters it's created with, nor does it try to retrofit the new BF parameters onto existing sstables.

          You did apply the v4 patch and not an earlier one, right?

          Jonathan Ellis added a comment -

          "Patch attached so that cli show schema or describe commands show bloom_filter_fp_chance if set."

          committed

          Ophir Radnitz added a comment -

          I actually applied the 'CASSANDRA-1.0-3497' patch, which I can see now is not the most updated one. We'll probably revisit this once 1.0.7 is out.

          Jonathan Ellis added a comment -

          Makes sense, that's what I was referring to when I reviewed that patch and said "the BloomFilter#modify approach won't work." v4 / 1.0 branch should be fine.

          Brandon Williams added a comment -

          Note for others trying to disable their BF: despite earlier discussion on this ticket, zero is NOT disabled, but instead sets it back to the default, since 0 false positives is invalid. You actually want to set it to 1 to have the smallest possible filter.
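
          Why a value of 1 gives the smallest filter and 0 is not a valid target can be seen from the textbook sizing formula for an optimally configured Bloom filter, bits per element = ln(1/p) / (ln 2)^2. The sketch below is a rough estimate under that formula only; bits_per_element and filter_bytes are hypothetical helpers, and Cassandra's own bucket calculation may round or cap values differently. Unlike the behaviour described above, this sketch rejects 0 rather than falling back to a default.

```python
import math

def bits_per_element(fp_chance):
    """Bits per key for an optimally sized Bloom filter with the target FP chance."""
    if not 0.0 < fp_chance <= 1.0:
        # p = 0 would need an infinite number of bits, so it cannot be a real target.
        raise ValueError("fp_chance must be in (0, 1]")
    return math.log(1.0 / fp_chance) / (math.log(2) ** 2)

def filter_bytes(num_keys, fp_chance):
    """Approximate in-memory size of the filter for num_keys keys."""
    return math.ceil(num_keys * bits_per_element(fp_chance) / 8)

for p in (0.01, 0.1, 0.5, 1.0):
    print(p, round(bits_per_element(p), 2), filter_bytes(1_000_000, p))
```

          Under this formula the required space falls steadily as the allowed FP chance rises and reaches zero bits per element at a chance of 1.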


            People

            • Assignee: Yuki Morishita
            • Reporter: Brandon Williams
            • Reviewer: Jonathan Ellis
            • Votes: 2
            • Watchers: 8
