Index: src/docbkx/book.xml =================================================================== --- src/docbkx/book.xml (revision 1023290) +++ src/docbkx/book.xml (working copy) @@ -33,6 +33,12 @@ + + The HBase Shell + + + + Filesystem Format @@ -750,4 +756,129 @@ + + + Bloom Filters + + Bloom filters were developed in HBase-1200 + Add bloomfilters. + For a description of the development process -- why static blooms + rather than dynamic -- and for an overview of the unique properties + that pertain to blooms in HBase, as well as possible future + directions, see the Development Process section + of the document BloomFilters + in HBase attached to HBase-1200. + + The bloom filters described here are actually version two of + blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom + option based on work done by the European Commission One-Lab + Project 034819. The core of the HBase bloom work was later + pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. + Version 1 of HBase blooms never worked that well. Version 2 is a + rewrite from scratch, though again it starts with the one-lab + work. + +
+ Configurations + + Blooms are enabled by specifying options on a column family in the + HBase shell or in + +
+ <code>HColumnDescriptor</code> option + + Use HColumnDescriptor.setBloomFilterType(NONE | ROW | + ROWCOL) to enable blooms per Column Family. Default = + NONE for no bloom filters. If + ROW, the hash of the row will be added to the bloom + on each insert. If ROWCOL, the hash of the row + + column family + column family qualifier will be added to the bloom on + each key insert. +
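To make the difference between ROW and ROWCOL concrete, here is a minimal sketch of which bytes would feed the hash in each mode. It follows the description above (row, or row plus column family plus qualifier). The class `BloomKeySketch`, its `BloomType` enum, and the `bloomKey` method are hypothetical names for illustration and are not HBase classes. The plain concatenation without length prefixes is a simplifying assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: none of these names are HBase classes.
final class BloomKeySketch {
    enum BloomType { NONE, ROW, ROWCOL }

    /**
     * Returns the bytes that would be hashed into the bloom for one insert,
     * or null when blooms are disabled (NONE).
     */
    static byte[] bloomKey(BloomType type, byte[] row, byte[] family, byte[] qualifier) {
        switch (type) {
            case ROW:
                // Only the row key contributes to the bloom.
                return row.clone();
            case ROWCOL: {
                // Row + column family + qualifier, concatenated (assumed layout).
                byte[] key = new byte[row.length + family.length + qualifier.length];
                System.arraycopy(row, 0, key, 0, row.length);
                System.arraycopy(family, 0, key, row.length, family.length);
                System.arraycopy(qualifier, 0, key, row.length + family.length, qualifier.length);
                return key;
            }
            default:
                // NONE: nothing is added to any bloom.
                return null;
        }
    }

    public static void main(String[] args) {
        byte[] row = "r1".getBytes(StandardCharsets.UTF_8);
        byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "q".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(bloomKey(BloomType.ROW, row, family, qualifier), StandardCharsets.UTF_8));
        System.out.println(new String(bloomKey(BloomType.ROWCOL, row, family, qualifier), StandardCharsets.UTF_8));
    }
}
```

A ROW bloom can only answer "might this row be in the file", while a ROWCOL bloom answers the narrower "might this row and column be in the file", at the cost of one bloom entry per distinct column rather than per row.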
+ +
+ <varname>io.hfile.bloom.enabled</varname> global kill + switch + + io.hfile.bloom.enabled in + Configuration serves as the kill switch in case + something goes wrong. Default = true. +
+ +
+ <varname>io.hfile.bloom.error.rate</varname> + + io.hfile.bloom.error.rate = average false + positive rate. Default = 1%. Halving the rate (e.g. to 0.5%) costs about one + extra bit per bloom entry. +
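The trade-off between error rate and space can be estimated with the textbook Bloom filter sizing formula, bits per entry = -ln(p) / (ln 2)^2, which assumes an optimal number of hash functions. The sketch below uses that formula. Whether HBase sizes its blooms exactly this way is an assumption, and `BloomSizingSketch` is a hypothetical name. Under this formula each halving of the rate costs about 1.44 bits per entry, the same order as the figure quoted above.

```java
// Hypothetical illustration of textbook Bloom filter sizing; not HBase code.
final class BloomSizingSketch {
    /**
     * Bits needed per entry for a target false positive rate, assuming the
     * optimal number of hash functions: -ln(p) / (ln 2)^2.
     */
    static double bitsPerEntry(double errorRate) {
        if (errorRate <= 0.0 || errorRate >= 1.0) {
            throw new IllegalArgumentException("errorRate must be in (0, 1)");
        }
        double ln2 = Math.log(2.0);
        return -Math.log(errorRate) / (ln2 * ln2);
    }

    public static void main(String[] args) {
        double atDefault = bitsPerEntry(0.01);   // the 1% default
        double atHalf = bitsPerEntry(0.005);     // the rate halved to 0.5%
        System.out.printf("1%%: %.2f bits/entry, 0.5%%: %.2f bits/entry, delta: %.2f%n",
                atDefault, atHalf, atHalf - atDefault);
    }
}
```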
+ +
+ <varname>io.hfile.bloom.max.fold</varname> + + io.hfile.bloom.max.fold = guaranteed minimum + fold rate. Most people should leave this alone. Default = 7, meaning the bloom + can collapse to at least 1/128th (1/2^7) of its original size. See the + Development Process section of the document BloomFilters + in HBase for more on what this option means. +
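As a rough illustration of what folding means, the sketch below halves a bit array by OR-ing its upper half onto its lower half. A lookup that indexes with `hash % size` still finds every bit after the fold, because the halved size divides the original size. This is the general folding technique, assumed here to resemble what HBase does. `BloomFoldSketch` and its methods are hypothetical names, not HBase APIs.

```java
import java.util.BitSet;

// Hypothetical illustration of bloom folding; not HBase code.
final class BloomFoldSketch {
    /** Folds once: every set bit i moves to i % (sizeInBits / 2). Size must be even. */
    static BitSet foldOnce(BitSet bits, int sizeInBits) {
        if (sizeInBits % 2 != 0) {
            throw new IllegalArgumentException("size must be even to fold");
        }
        int half = sizeInBits / 2;
        BitSet folded = new BitSet(half);
        for (int i = bits.nextSetBit(0); i >= 0 && i < sizeInBits; i = bits.nextSetBit(i + 1)) {
            folded.set(i % half);
        }
        return folded;
    }

    /** Size after the given number of folds; 7 folds shrink to 1/128th. */
    static int foldedSize(int originalBits, int folds) {
        return originalBits >> folds;
    }

    public static void main(String[] args) {
        int size = 1024;
        BitSet bloom = new BitSet(size);
        bloom.set(5);
        bloom.set(700);
        BitSet folded = foldOnce(bloom, size);
        System.out.println("folded bits: " + folded);
        System.out.println("size after 7 folds: " + foldedSize(size, 7));
    }
}
```

Each fold halves the space but roughly doubles the fraction of set bits, so folding only pays off when the bloom was allocated for more keys than were actually inserted.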
+
+ +
+ Bloom StoreFile footprint + + Bloom filters add an entry to the StoreFile + general FileInfo data structure and then two + extra entries to the StoreFile metadata + section. + +
+ BloomFilter in the <classname>StoreFile</classname> + <classname>FileInfo</classname> data structure + +
+ <varname>BLOOM_FILTER_TYPE</varname> + + FileInfo has a + BLOOM_FILTER_TYPE entry which is set to + NONE, ROW or + ROWCOL. +
+
+ +
+ BloomFilter entries in <classname>StoreFile</classname> + metadata + +
+ <varname>BLOOM_FILTER_META</varname> + + BLOOM_FILTER_META holds Bloom Size, Hash + Function used, etc. It is small in size and is cached on + StoreFile.Reader load. +
+ +
+ <varname>BLOOM_FILTER_DATA</varname> + + BLOOM_FILTER_DATA is the actual bloomfilter + data. It is obtained on demand and stored in the LRU cache, if it is enabled + (it is enabled by default). +
+
+
+
+ + + Tools + + Here we list HBase tools for administration, analysis, fixup, and + debugging. +