Index: src/docbkx/performance.xml =================================================================== --- src/docbkx/performance.xml (revision 1201939) +++ src/docbkx/performance.xml (working copy) @@ -140,10 +140,13 @@ The number of regions for an HBase table is driven by the . Also, see the architecture section on - A lower number of regions is preferred, generally in the range of 20 to 200 - per RegionServer. Adjust the regionsize as appropriate to achieve this number. There - are some clusters that set the regionsize to 20Gb, for example, so you may need to - experiment with this setting based on your hardware configuration and application needs. + A lower number of regions is preferred, generally in the range of 20 to the low hundreds + per RegionServer. Adjust the regionsize as appropriate to achieve this number. + + For the 0.90.x codebase, the upper bound of the regionsize is about 4GB. + For the 0.92.x codebase, the HFile v2 change allows much larger regionsizes (e.g., 20GB). + + You may need to experiment with this setting based on your hardware configuration and application needs. @@ -155,12 +158,6 @@ something you want to consider. -
- Compression - Production systems should use compression with their column family definitions. See for more information. - -
-
<varname>hbase.regionserver.handler.count</varname> See . @@ -218,7 +215,52 @@ Key and Attribute Lengths See .
- +
Table RegionSize + The regionsize can be set on a per-table basis via setMaxFileSize on + HTableDescriptor in the + event that certain tables require regionsizes different from the configured default regionsize. + + See for more information. + +
+
+ Bloom Filters + Bloom Filters can be enabled per-ColumnFamily. + Use HColumnDescriptor.setBloomFilterType(NONE | ROW | + ROWCOL) to enable blooms per Column Family. Default = + NONE for no bloom filters. If + ROW, the hash of the row will be added to the bloom + on each insert. If ROWCOL, the hash of the row + + column family + column family qualifier will be added to the bloom on + each key insert. + See HColumnDescriptor and + for more information. + +
+
ColumnFamily BlockSize + The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. + There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting + indexes should be roughly halved). + + See HColumnDescriptor + and for more information. + +
+
+ In-Memory ColumnFamilies + ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. + In-memory blocks have the highest priority in the , but it is not a guarantee that the entire table + will be in memory. + + See HColumnDescriptor for more information. + +
+
+ Compression + Production systems should use compression with their ColumnFamily definitions. See for more information. + +
+
Writing to HBase Index: src/docbkx/book.xml =================================================================== --- src/docbkx/book.xml (revision 1201939) +++ src/docbkx/book.xml (working copy) @@ -545,7 +545,8 @@ admin.enableTable(table); See for more information about configuring client connections. - + Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table + to be disabled.
@@ -739,17 +740,6 @@
-
- - In-Memory ColumnFamilies - - ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. - In-memory blocks have the highest priority in the , but it is not a guarantee that the entire table - will be in memory. - - See HColumnDescriptor for more information. - -
Time To Live (TTL) ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. @@ -775,20 +765,6 @@ See HColumnDescriptor for more information.
-
- Bloom Filters - Bloom Filters can be enabled per-ColumnFamily. - Use HColumnDescriptor.setBloomFilterType(NONE | ROW | - ROWCOL) to enable blooms per Column Family. Default = - NONE for no bloom filters. If - ROW, the hash of the row will be added to the bloom - on each insert. If ROWCOL, the hash of the row + - column family + column family qualifier will be added to the bloom on - each key insert. - See HColumnDescriptor and - for more information. - -
Secondary Indexes and Alternate Query Paths @@ -874,6 +850,11 @@ </para> </section> </section> + <section xml:id="schema.ops"><title>Operational and Performance Configuration Options + See the Performance section for more information on operational and performance + schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes. + +
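Reviewer note: the sizing guidance in this patch (a region count per RegionServer from 20 to the low hundreds, and an inverse relationship between blocksize and StoreFile index size) can be sanity-checked with simple arithmetic. The sketch below is only an illustration. The class RegionSizing and its methods are hypothetical and not part of HBase, the 400 GB per-server data volume is an invented example figure, and "one index entry per block" is a simplifying assumption about the StoreFile index.

```java
// Hypothetical helper, NOT part of HBase: back-of-the-envelope sizing arithmetic only.
final class RegionSizing {
    static final long KB = 1024L;
    static final long GB = 1024L * 1024L * 1024L;

    /** Regions needed to hold dataBytes if each region grows to at most regionSizeBytes. */
    static long regionsNeeded(long dataBytes, long regionSizeBytes) {
        if (dataBytes < 0 || regionSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: a partially filled region still counts as a region.
        return (dataBytes + regionSizeBytes - 1) / regionSizeBytes;
    }

    /** Assumed model: one index entry per block, so entries scale as 1 / blocksize. */
    static long indexEntries(long storeFileBytes, long blockSizeBytes) {
        if (storeFileBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (storeFileBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long perServer = 400 * GB; // example data volume per RegionServer (assumption)

        // 4 GB regionsize (the approximate 0.90.x upper bound quoted in the patch)
        System.out.println(regionsNeeded(perServer, 4 * GB));  // 100
        // 20 GB regionsize (the 0.92.x example quoted in the patch)
        System.out.println(regionsNeeded(perServer, 20 * GB)); // 20

        // Doubling the blocksize halves the estimated number of index entries.
        System.out.println(indexEntries(1 * GB, 64 * KB));     // 16384
        System.out.println(indexEntries(1 * GB, 128 * KB));    // 8192
    }
}
```

Under these assumptions, both example regionsizes keep the per-server region count inside the recommended range, and the blocksize comparison shows the "doubled blocksize, roughly halved index" relationship described in the ColumnFamily BlockSize section.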