diff --git src/main/docbkx/performance.xml src/main/docbkx/performance.xml
index dad9b0c..04ca00c 100644
--- src/main/docbkx/performance.xml
+++ src/main/docbkx/performance.xml
@@ -295,18 +295,131 @@
Bloom Filters
- Bloom Filters can be enabled per-ColumnFamily. Use
- HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) to enable blooms
- per Column Family. Default = NONE for no bloom filters. If
- ROW, the hash of the row will be added to the bloom on each insert. If
- ROWCOL, the hash of the row + column family name + column family
- qualifier will be added to the bloom on each key insert.
- See HColumnDescriptor
- and for more information or this answer up in quora,
+ A Bloom filter, named for its creator, Burton Howard Bloom, is a data structure which is
+ designed to predict whether a given element is a member of a set of data. A positive result
+ from a Bloom filter is not always accurate, but a negative result is guaranteed to be
+ accurate. Bloom filters are designed to be "accurate enough" for sets of data which are so
+ large that conventional hashing mechanisms would be impractical. For more information about
+ Bloom filters in general, refer to .
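+ The asymmetry between positive and negative results can be seen in a toy implementation.
+ The sketch below is illustrative only and is not HBase's implementation: the class name
+ ToyBloomFilter, the bit-array size, and the double-hashing scheme built on
+ String.hashCode() are all assumptions chosen for brevity.

```java
import java.util.BitSet;

/** Toy Bloom filter for illustration only; not HBase's implementation. */
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step between probes is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Sets every bit position derived from the key. Bits are never cleared. */
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means only "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1024, 3);
        filter.add("row-0001");
        // An added key always reports true: there are no false negatives.
        System.out.println(filter.mightContain("row-0001"));
        // A key that was never added usually reports false, but may collide and report true.
        System.out.println(filter.mightContain("row-9999"));
    }
}
```

+ Because bits are only ever set, a key that was added can never be reported as absent,
+ while unrelated keys may happen to map onto bits that are already set.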
+ In terms of HBase, Bloom filters provide a lightweight in-memory structure to reduce the
+ number of disk reads for a given Get operation (Bloom filters do not work with Scans) to only the StoreFiles likely to
+ contain the desired Row. The potential performance gain increases with the number of
+ parallel reads.
+ The Bloom filters themselves are stored in the metadata of each HFile and never need to
+ be updated. When an HFile is opened because a region is deployed to a RegionServer, the
+ Bloom filter is loaded into memory.
+ HBase includes some tuning mechanisms for folding the Bloom filter to reduce the size
+ and keep the false positive rate within a desired range.
+ Bloom filters were introduced in HBASE-1200. Since
+ HBase 0.96, row-based Bloom filters are enabled by default. (HBASE-)
+ For more information on Bloom filters in relation to HBase, see , or the following Quora discussion: How are bloom
filters used in HBase?.
+
+
+ When To Use Bloom Filters
+ Since HBase 0.96, row-based Bloom filters are enabled by default. You may choose to
+ disable them or to change some tables to use row+column Bloom filters, depending on the
+ characteristics of your data and how it is loaded into HBase.
+
+ To determine whether Bloom filters could have a positive impact, check the value of
+ blockCacheHitRatio in the RegionServer metrics. If Bloom filters are enabled, the value of
+ blockCacheHitRatio should increase, because the Bloom filter is filtering out blocks that
+ are definitely not needed.
+ You can choose to enable Bloom filters for a row or for a row+column combination. If
+ you generally scan entire rows, the row+column combination will not provide any benefit. A
+ row-based Bloom filter can operate on a row+column Get, but not the other way around.
+ However, if you have a large number of column-level Puts, such that a row may be present
+ in every StoreFile, a row-based filter will always return a positive result and provide no
+ benefit. Unless you have one column per row, row+column Bloom filters require more space,
+ in order to store more keys. Bloom filters work best when each data entry is
+ at least a few kilobytes in size.
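+ The difference between the two granularities can be sketched with a small model. In the
+ example below, exact sets stand in for the per-StoreFile Bloom filters, so there are no
+ false positives, and the names BloomKeyGranularity, StoreFileFilter, and filesToRead are
+ hypothetical and not part of the HBase API. The model shows a row whose columns were
+ written into three different StoreFiles.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative model only: exact sets stand in for per-StoreFile Bloom filters. */
final class BloomKeyGranularity {

    /** Hypothetical stand-in for the keys one StoreFile's filter would hold. */
    static final class StoreFileFilter {
        final String name;
        final Set<String> rowKeys = new HashSet<>();     // what a ROW filter records
        final Set<String> rowColKeys = new HashSet<>();  // what a ROWCOL filter records

        StoreFileFilter(String name) {
            this.name = name;
        }

        void put(String row, String qualifier) {
            rowKeys.add(row);
            rowColKeys.add(row + "/" + qualifier);
        }
    }

    /** Returns the names of the StoreFiles that a Get for row/qualifier would still read. */
    static List<String> filesToRead(List<StoreFileFilter> files, String row,
                                    String qualifier, boolean useRowCol) {
        List<String> result = new ArrayList<>();
        for (StoreFileFilter f : files) {
            boolean hit = useRowCol
                    ? f.rowColKeys.contains(row + "/" + qualifier)
                    : f.rowKeys.contains(row);
            if (hit) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<StoreFileFilter> files = new ArrayList<>();
        String[] qualifiers = {"a", "b", "c"};
        for (int i = 0; i < qualifiers.length; i++) {
            StoreFileFilter f = new StoreFileFilter("storefile-" + (i + 1));
            f.put("r1", qualifiers[i]); // the same row lands in every file via column-level Puts
            files.add(f);
        }
        // Row-level check: every file contains r1, so nothing is filtered out.
        System.out.println(filesToRead(files, "r1", "b", false));
        // Row+column check: only the file that holds r1/b remains.
        System.out.println(filesToRead(files, "r1", "b", true));
    }
}
```

+ In this model the row-level check cannot exclude any file, which is the situation
+ described above for workloads with many column-level Puts.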
+ Overhead will be reduced when your data is stored in a few larger StoreFiles, to avoid
+ extra disk IO during low-level scans to find a specific row.
+ Bloom filters need to be rebuilt upon deletion, so they may not be appropriate in
+ environments with a large number of deletions.
+
+
+
+ Enabling Bloom Filters
+ Bloom filters are enabled on a Column Family. You can do this by using the
+ setBloomFilterType method of HColumnDescriptor or using the HBase shell. Valid values are
+ NONE, ROW (the default since HBase 0.96), or
+ ROWCOL. See for more information on ROW versus
+ ROWCOL. See also the API documentation for HColumnDescriptor.
+ The following example creates a table and enables a ROWCOL Bloom filter on the
+ colfam1 column family.
+
+hbase> create 'mytable',{NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
+
+
+
+
+ Configuring Server-Wide Behavior of Bloom Filters
+ You can configure the following settings in hbase-site.xml.
+
+
+
+
+
+ Parameter
+ Default
+ Description
+
+
+
+
+ io.hfile.bloom.enabled
+ yes
+ Set to no to disable Bloom filters server-wide if
+ something goes wrong
+
+
+ io.hfile.bloom.error.rate
+ .01
+ The average false positive rate for Bloom filters. Folding is used to
+ maintain the false positive rate. Expressed as a decimal fraction, so .01
+ means 1%.
+
+
+ io.hfile.bloom.max.fold
+ 7
+ The guaranteed maximum fold rate. Changing this setting should not be
+ necessary and is not recommended.
+
+
+ io.storefile.bloom.max.keys
+ 128000000
+ For default (single-block) Bloom filters, this specifies the maximum
+ number of keys.
+
+
+ io.storefile.delete.family.bloom.enabled
+ true
+ Master switch to enable Delete Family Bloom filters and store them in
+ the StoreFile.
+
+
+ io.storefile.bloom.block.size
+ 65536
+ Target Bloom block size. Bloom filter blocks of approximately this size
+ are interleaved with data blocks.
+
+
+ hfile.block.bloom.cacheonwrite
+ false
+ Enables cache-on-write for inline blocks of a compound Bloom filter.
+
+
+
+
+
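+ To get a feel for what a given io.hfile.bloom.error.rate costs in memory, the textbook
+ Bloom filter sizing formulas can be evaluated directly. This is a sketch under stated
+ assumptions: the class BloomSizing and its methods are hypothetical, and the formulas are
+ the standard ones for an optimally configured Bloom filter, which may differ in detail
+ from how HBase sizes and folds its own filters.

```java
/** Textbook Bloom filter sizing; HBase's own sizing code may differ in detail. */
final class BloomSizing {

    /** Bits needed per key for a target false positive rate p: -ln(p) / (ln 2)^2. */
    static double bitsPerKey(double errorRate) {
        double ln2 = Math.log(2.0);
        return -Math.log(errorRate) / (ln2 * ln2);
    }

    /** Hash function count that minimizes false positives: (bits per key) * ln 2. */
    static double optimalHashCount(double errorRate) {
        return bitsPerKey(errorRate) * Math.log(2.0);
    }

    /** Approximate filter size in bytes for the given number of keys. */
    static long approximateBytes(long keyCount, double errorRate) {
        return (long) Math.ceil(keyCount * bitsPerKey(errorRate) / 8.0);
    }

    public static void main(String[] args) {
        double errorRate = 0.01; // the default listed for io.hfile.bloom.error.rate
        System.out.printf("bits per key: %.2f%n", bitsPerKey(errorRate));
        System.out.printf("hash functions: %.2f%n", optimalHashCount(errorRate));
        System.out.printf("bytes for 1,000,000 keys: %d%n", approximateBytes(1_000_000L, errorRate));
    }
}
```

+ At an error rate of .01 this works out to roughly 9.6 bits per key and about 7 hash
+ functions, so lowering the error rate trades additional memory for fewer unnecessary
+ StoreFile reads.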