Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1023290)
+++ src/docbkx/book.xml (working copy)
@@ -33,6 +33,12 @@
+
+ The HBase Shell
+
+
+
+
Filesystem Format
@@ -750,4 +756,129 @@
+
+
+ Bloom Filters
+
+ Bloom filters were developed in HBase-1200
+ (Add bloomfilters).
+ For a description of the development process (why static blooms
+ rather than dynamic), an overview of the unique properties
+ that pertain to blooms in HBase, and possible future
+ directions, see the Development Process section
+ of the document BloomFilters
+ in HBase attached to HBase-1200.
+
+ The bloom filters described here are actually version two of
+ blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
+ option based on work done by the European Commission One-Lab
+ Project 034819. The core of the HBase bloom work was later
+ pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
+ Version 1 of HBase blooms never worked well. Version 2 is a
+ rewrite from scratch, though it again builds on the One-Lab
+ work.
+
+
+
+ Configurations
+
+ Blooms are enabled by specifying options on a column family in the
+ HBase shell or in
+
+
+ HColumnDescriptor option
+
+ Use HColumnDescriptor.setBloomFilterType(NONE | ROW |
+ ROWCOL) to enable blooms per Column Family. The default is
+ NONE (no bloom filters). With
+ ROW, the hash of the row is added to the bloom
+ on each insert. With ROWCOL, the hash of the row +
+ column family + column qualifier is added to the bloom on
+ each key insert.
+
+
+
+ io.hfile.bloom.enabled global kill
+ switch
+
+ io.hfile.bloom.enabled in
+ Configuration serves as the kill switch in case
+ something goes wrong. Default = true.
+
+
+
+ io.hfile.bloom.error.rate
+
+ io.hfile.bloom.error.rate is the average false
+ positive rate. Default = 1%. Halving the rate (e.g. to 0.5%) costs
+ roughly one extra bit per bloom entry.
+
+
+
+ io.hfile.bloom.max.fold
+
+ io.hfile.bloom.max.fold is the guaranteed minimum
+ fold rate. Most people should leave this alone. Default = 7, meaning
+ the bloom can collapse to at least 1/128th of its original size. See the
+ Development Process section of the document BloomFilters
+ in HBase for more on what this option means.
+
+
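+ To make the ROW and ROWCOL options above concrete, the following Java
+ sketch shows which bytes would be hashed under each setting, following
+ the description in the text. The class BloomKeySketch, its BloomType
+ enum and the bloomKey helper are hypothetical names for illustration,
+ not HBase API, and plain concatenation is an assumption; the real key
+ layout may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names are HBase API.
class BloomKeySketch {
    enum BloomType { NONE, ROW, ROWCOL }

    /** Returns the bytes that would be hashed into the bloom, or null when blooms are off. */
    static byte[] bloomKey(BloomType type, byte[] row, byte[] family, byte[] qualifier) {
        switch (type) {
            case ROW:
                // Only the row takes part, so one bloom entry covers every column of the row.
                return row.clone();
            case ROWCOL:
                // Row + column family + qualifier, as the text describes.
                // Assumption: simple concatenation with no length prefixes.
                return concat(row, family, qualifier);
            default:
                return null; // NONE: nothing is added to any bloom
        }
    }

    /** Concatenates the given byte arrays in order. */
    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) {
            len += p.length;
        }
        byte[] out = new byte[len];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "q1".getBytes(StandardCharsets.UTF_8);
        for (BloomType t : BloomType.values()) {
            byte[] key = bloomKey(t, row, family, qualifier);
            System.out.println(t + " -> "
                + (key == null ? "(no bloom entry)" : new String(key, StandardCharsets.UTF_8)));
        }
    }
}
```

+ The practical difference is granularity: a ROW bloom can only answer
+ "might this row be in the file?", while a ROWCOL bloom answers the same
+ question for a specific column, at the cost of more entries.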
+
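+ As a rough guide to what io.hfile.bloom.error.rate costs in space, this
+ sketch evaluates the textbook sizing formula for an optimally configured
+ bloom filter, bits per entry = -ln(p) / (ln 2)^2. BloomSizingSketch and
+ its methods are hypothetical names, and whether HBase sizes its blooms
+ exactly this way is an assumption. Under this formula, 1% needs about
+ 9.6 bits per entry and each halving of the rate adds about 1.44 bits, so
+ the per-halving cost quoted above is best read as a rule of thumb.

```java
// Hypothetical sketch: textbook bloom filter sizing, not HBase code.
class BloomSizingSketch {

    /** Bits per entry for a target false positive rate p, assuming an optimal hash count. */
    static double bitsPerEntry(double errorRate) {
        if (errorRate <= 0.0 || errorRate >= 1.0) {
            throw new IllegalArgumentException("errorRate must be in (0, 1)");
        }
        double ln2 = Math.log(2.0);
        return -Math.log(errorRate) / (ln2 * ln2);
    }

    /** Number of hash functions that minimizes false positives: about -log2(p). */
    static long optimalHashCount(double errorRate) {
        return Math.max(1L, Math.round(-Math.log(errorRate) / Math.log(2.0)));
    }

    public static void main(String[] args) {
        double[] rates = {0.01, 0.005, 0.0025};
        for (double p : rates) {
            System.out.printf("p=%.4f  bits/entry=%.2f  hashes=%d%n",
                p, bitsPerEntry(p), optimalHashCount(p));
        }
    }
}
```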
+
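+ The arithmetic behind io.hfile.bloom.max.fold is that each fold halves
+ the bloom, so 7 folds give 1/2^7 = 1/128 of the original size. The
+ sketch below illustrates one generic way to fold a bit array: OR the
+ upper half onto the lower half, so bit p lands at p mod half.
+ BloomFoldSketch and its methods are hypothetical, and that HBase folds
+ its blooms in exactly this manner is an assumption; see the Development
+ Process document for the actual behavior.

```java
import java.util.BitSet;

// Hypothetical sketch of generic bloom folding, not HBase code.
class BloomFoldSketch {

    /** Folds a bloom of the given bit size in half by OR-ing the upper half onto the lower half. */
    static BitSet foldOnce(BitSet bits, int size) {
        if (size <= 0 || size % 2 != 0) {
            throw new IllegalArgumentException("size must be positive and even");
        }
        int half = size / 2;
        BitSet folded = bits.get(0, half);   // copy of the lower half
        folded.or(bits.get(half, size));     // upper half, re-indexed from 0
        return folded;
    }

    /** Applies foldOnce the given number of times. */
    static BitSet fold(BitSet bits, int size, int times) {
        BitSet current = bits;
        int currentSize = size;
        for (int i = 0; i < times; i++) {
            current = foldOnce(current, currentSize);
            currentSize /= 2;
        }
        return current;
    }

    /** Size in bits after folding the given number of times. */
    static int foldedSize(int originalSize, int folds) {
        return originalSize >> folds;
    }

    public static void main(String[] args) {
        int size = 1024;
        BitSet bloom = new BitSet(size);
        bloom.set(3);
        bloom.set(1000);
        int maxFold = 7; // the default named in the text
        System.out.println("folded size: " + foldedSize(size, maxFold) + " bits");
        System.out.println("bits set after folding: " + fold(bloom, size, maxFold));
    }
}
```

+ After folding, a lookup has to reduce its hash positions modulo the new
+ size. Folding never clears a set bit, so it cannot introduce false
+ negatives, but a smaller bloom has a higher false positive rate.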
+ Bloom StoreFile footprint
+
+ Bloom filters add an entry to the StoreFile
+ general FileInfo data structure and then two
+ extra entries to the StoreFile metadata
+ section.
+
+
+ BloomFilter in the StoreFile
+ FileInfo data structure
+
+
+ BLOOM_FILTER_TYPE
+
+ FileInfo has a
+ BLOOM_FILTER_TYPE entry which is set to
+ NONE, ROW or
+ ROWCOL.
+
+
+
+
+ BloomFilter entries in StoreFile
+ metadata
+
+
+ BLOOM_FILTER_META
+
+ BLOOM_FILTER_META holds the bloom size, the hash
+ function used, etc. It is small and is cached on
+ StoreFile.Reader load.
+
+
+
+ BLOOM_FILTER_DATA
+
+ BLOOM_FILTER_DATA is the actual bloom filter
+ data. It is obtained on demand and stored in the LRU cache, if the cache
+ is enabled (it is enabled by default).
+
+
+
+
+
+
+ Tools
+
+ Here we list HBase tools for administration, analysis, fixup, and
+ debugging.
+