Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1242000)
+++ src/docbkx/book.xml (working copy)
@@ -283,7 +283,8 @@
HBase does not modify data in place, and so deletes are handled by creating new markers called tombstones.
These tombstones, along with the dead values, are cleaned up on major compactions.
- See for more information on deleting versions of columns.
+ See for more information on deleting versions of columns, and see
+ for more information on compactions.
@@ -588,10 +589,10 @@
HBase currently does not do well with anything above two or three column families so keep the number
of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so
if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
- will also be flushed though the amount of data they carry is small. Compaction is currently triggered
- by the total number of files under a column family. Its not size based. When many column families the
+ will also be flushed even though the amount of data they carry is small. When there are many column families, the
flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
- changing flushing and compaction to work on a per column family basis).
+ changing flushing and compaction to work on a per column family basis). For more information
+ on compactions, see .
Try to make do with one column family if you can in your schemas. Only introduce a
second and third column family in the case where data access is usually column scoped;
@@ -2136,16 +2137,133 @@
Compaction
There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent
- files and rewrite them as one. Minors do not drop deletes or expired cells, only major compactions do this. Sometimes a minor compaction
- will pick up all the files in the store and in this case it actually promotes itself to being a major compaction.
- For a description of how a minor compaction picks files to compact, see the ascii diagram in the Store source code.
+ StoreFiles and rewrite them as one. Minors do not drop deletes or expired cells; only major compactions do this. Sometimes a minor compaction
+ will pick up all the StoreFiles in the Store and in this case it actually promotes itself to being a major compaction.
- After a major compaction runs there will be a single storefile per store, and this will help performance usually. Caution: major compactions rewrite all of the stores data and on a loaded system, this may not be tenable;
+ After a major compaction runs there will be a single StoreFile per Store, which usually helps performance. Caution: major compactions rewrite all of the Store's data, and on a loaded system this may not be tenable;
major compactions will usually have to be done manually on large systems. See .
Compactions will not perform region merges. See for more information on region merging.
-
+
+ Compaction File Selection
+ To understand the core algorithm for StoreFile selection, refer to the ASCII art in the Store source code,
+ copied below for reference:
+
+/* normal skew:
+ *
+ *         older ----> newer
+ *     _
+ *    | |   _
+ *    | |  | |   _
+ *  --|-|- |-|- |-|---_-------_-------  minCompactSize
+ *    | |  | |  | |  | |  _  | |
+ *    | |  | |  | |  | | | | | |
+ *    | |  | |  | |  | | | | | |
+ */
+
+ Important knobs:
+
+ hbase.hstore.compaction.ratio Ratio used in the compaction
+ file selection algorithm (default 1.2F).
+ hbase.hstore.compaction.min (called hbase.hstore.compactionThreshold in 0.90) (files) Minimum number
+ of StoreFiles per Store to be selected for a compaction to occur.
+ hbase.hstore.compaction.max (files) Maximum number of StoreFiles to compact per minor compaction.
+ hbase.hstore.compaction.min.size (bytes)
+ Any StoreFile smaller than this setting will automatically be a candidate for compaction. Defaults to
+ the region's memstore flush size, hbase.hregion.memstore.flush.size (134,217,728 bytes, i.e. 128 MB).
+ hbase.hstore.compaction.max.size (0.92) (bytes)
+ Any StoreFile larger than this setting will automatically be excluded from compaction.
+
+
+ The minor compaction StoreFile selection logic is size based, and selects a file for compaction when
+ file_size <= sum(newer_files) * hbase.hstore.compaction.ratio, where newer_files are the StoreFiles flushed after it (under normal skew, the smaller ones).
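The selection rule can be sketched in Java as below. This is a simplified, hypothetical model of the behavior described in this section (`CompactionSelectionSketch` and `select` are illustrative names, not HBase APIs). It sums all newer files, as the examples in this section do, and drops every file above the max-size setting; the actual `Store` code may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of minor-compaction StoreFile selection; not HBase's Store code. */
public class CompactionSelectionSketch {

    /**
     * @param sizes StoreFile sizes in bytes, ordered oldest to newest
     * @return sizes of the selected StoreFiles, or an empty list if no compaction runs
     */
    public static List<Long> select(List<Long> sizes, double ratio, int minFiles,
                                    int maxFiles, long minSize, long maxSize) {
        // max.size: files larger than this are never candidates (assumed: all are filtered).
        List<Long> eligible = new ArrayList<>();
        for (long size : sizes) {
            if (size <= maxSize) {
                eligible.add(size);
            }
        }
        int n = eligible.size();

        // Walk from the oldest file while at least minFiles remain; stop at the first
        // file that is under min.size or within ratio * sum(newer files).
        int start = 0;
        while (n - start >= minFiles) {
            long size = eligible.get(start);
            long newerSum = 0;
            for (int j = start + 1; j < n; j++) {
                newerSum += eligible.get(j);
            }
            if (size < minSize || size <= newerSum * ratio) {
                break;
            }
            start++;
        }

        // compaction.max: take at most maxFiles files, starting at the first qualifying one.
        int end = Math.min(n, start + maxFiles);
        // compaction.min: too few files means no compaction for now.
        if (end - start < minFiles) {
            return new ArrayList<>();
        }
        return new ArrayList<>(eligible.subList(start, end));
    }

    public static void main(String[] args) {
        // Parameters and files from Example #1.
        System.out.println(select(List.of(100L, 50L, 23L, 12L, 12L), 1.0, 3, 5, 10L, 1000L));
    }
}
```

Running `main` prints `[23, 12, 12]`. The same call with the Example #2 files returns an empty list, and with the Example #3 files returns `[7, 6, 5, 4, 3]`.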
+
+
+
+ Minor Compaction File Selection - Example #1 (Basic Example)
+ This example mirrors an example from the unit test TestCompactSelection.
+
+ hbase.hstore.compaction.ratio = 1.0F
+ hbase.hstore.compaction.min = 3 (files)
+ hbase.hstore.compaction.max = 5 (files)
+ hbase.hstore.compaction.min.size = 10 (bytes)
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
+ The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest).
+ With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.
+
+ Why?
+
+ 100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
+ 50 --> No, because sum(23, 12, 12) * 1.0 = 47.
+ 23 --> Yes, because sum(12, 12) * 1.0 = 24.
+ 12 --> Yes, because sum(12) * 1.0 = 12.
+ 12 --> Yes, because the previous file had been included, and including this one does not exceed
+ the max-file limit of 5.
+
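The "Why?" arithmetic can be reproduced with a short helper. `NewerSumTrace` is a hypothetical name. It computes sum(newer files) * ratio for every file in one backward pass, keeping a running sum of the newer files instead of re-adding them at each position, and marks every file from the first qualifying one onward as selected. It deliberately ignores the file-count limits and the size settings.

```java
/** Hypothetical helper that reproduces the per-file arithmetic of Example #1. */
public class NewerSumTrace {

    /** For each file (oldest to newest), returns sum(files newer than it) * ratio. */
    static double[] thresholds(long[] sizes, double ratio) {
        double[] out = new double[sizes.length];
        long newer = 0; // running sum of the files already visited, i.e. the newer ones
        for (int i = sizes.length - 1; i >= 0; i--) {
            out[i] = newer * ratio;
            newer += sizes[i];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 50, 23, 12, 12}; // oldest to newest
        double[] t = thresholds(sizes, 1.0);
        boolean selected = false;
        for (int i = 0; i < sizes.length; i++) {
            // Once one file qualifies, every newer file is taken along with it.
            selected = selected || sizes[i] <= t[i];
            System.out.printf("%d --> sum(newer) * 1.0 = %.0f: %s%n",
                    sizes[i], t[i], selected ? "Yes" : "No");
        }
    }
}
```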
+
+
+
+ Minor Compaction File Selection - Example #2 (Not Enough Files To Compact)
+ This example mirrors an example from the unit test TestCompactSelection.
+
+ hbase.hstore.compaction.ratio = 1.0F
+ hbase.hstore.compaction.min = 3 (files)
+ hbase.hstore.compaction.max = 5 (files)
+ hbase.hstore.compaction.min.size = 10 (bytes)
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
+
+ The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest).
+ With the above parameters, no compaction will be started.
+
+ Why?
+
+ 100 --> No, because sum(25, 12, 12) * 1.0 = 49.
+ 25 --> No, because sum(12, 12) * 1.0 = 24.
+ 12 --> No. It is a candidate because sum(12) * 1.0 = 12, but only 2 files would be compacted, which is less than the threshold of 3.
+ 12 --> No. It is a candidate because the previous StoreFile was, but there are not enough files to compact.
+
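The deciding check in this example is the minimum file count. The hypothetical `MinFilesCheck.filesLeft` below skips files from the oldest end while at least `minFiles` remain and reports how many would be left to compact. It applies only the ratio rule, which is sufficient here because every file is larger than the 10-byte min-size; it also does not apply the max-file cap.

```java
/** Hypothetical sketch of the minimum-file-count check from Example #2. */
public class MinFilesCheck {

    // Skips files, oldest first, that fail the ratio rule while at least minFiles
    // remain, then returns how many files are left from that point on.
    static int filesLeft(long[] sizes, double ratio, int minFiles) {
        int start = 0;
        while (sizes.length - start >= minFiles) {
            long newer = 0;
            for (int j = start + 1; j < sizes.length; j++) {
                newer += sizes[j];
            }
            if (sizes[start] <= newer * ratio) {
                break; // oldest qualifying file found
            }
            start++;
        }
        return sizes.length - start;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 25, 12, 12}; // Example #2, oldest to newest
        int minFiles = 3;
        int left = filesLeft(sizes, 1.0, minFiles);
        if (left < minFiles) {
            System.out.println("no compaction: only " + left + " files left");
        } else {
            System.out.println("compact " + left + " files");
        }
    }
}
```

For Example #2 the scan runs out of files after skipping 100 and 25, leaving 2, so no compaction starts; with the Example #1 files it stops at 23 with 3 files left.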
+
+
+
+ Minor Compaction File Selection - Example #3 (Limiting Files To Compact)
+ This example mirrors an example from the unit test TestCompactSelection.
+
+ hbase.hstore.compaction.ratio = 1.0F
+ hbase.hstore.compaction.min = 3 (files)
+ hbase.hstore.compaction.max = 5 (files)
+ hbase.hstore.compaction.min.size = 10 (bytes)
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
+ The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to newest).
+ With the above parameters, the files that would be selected for minor compaction are 7, 6, 5, 4, 3.
+
+ Why?
+
+ 7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than the min-size.
+ 6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than the min-size.
+ 5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the min-size.
+ 4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the min-size.
+ 3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is less than the min-size.
+ 2 --> No. Although 2 is less than the min-size, the maximum number of files to compact has already been reached.
+ 1 --> No. Although 1 is less than the min-size, the maximum number of files to compact has already been reached.
+
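The cap in this example is a simple slice: once the oldest qualifying file is found, at most hbase.hstore.compaction.max files are taken from there. In this hypothetical sketch (`MaxFilesCapSketch` is an illustrative name), the start index is found with the min-size test alone, which is enough here because all seven files are below the 10-byte min-size.

```java
import java.util.Arrays;

/** Hypothetical sketch of the maximum-file-count cap from Example #3. */
public class MaxFilesCapSketch {

    // Takes at most maxFiles StoreFile sizes starting at 'start', keeping oldest-to-newest order.
    static long[] cap(long[] sizes, int start, int maxFiles) {
        int end = Math.min(sizes.length, start + maxFiles);
        return Arrays.copyOfRange(sizes, start, end);
    }

    public static void main(String[] args) {
        long[] sizes = {7, 6, 5, 4, 3, 2, 1}; // Example #3, oldest to newest
        long minSize = 10;
        int maxFiles = 5;
        // Only the min-size test is applied: the first file under it is the start.
        int start = 0;
        while (start < sizes.length && sizes[start] >= minSize) {
            start++;
        }
        System.out.println(Arrays.toString(cap(sizes, start, maxFiles)));
    }
}
```

Running `main` prints `[7, 6, 5, 4, 3]`; the 2- and 1-byte files wait for a later compaction even though they are under the min-size.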
+
+
+
+ Impact of Key Configuration Options
+ hbase.hstore.compaction.ratio. A large ratio (e.g., 10F) will produce a single giant file, because nearly every StoreFile then satisfies the ratio rule. Conversely, a value of 0.25F will
+ produce behavior similar to the BigTable compaction algorithm, resulting in 4 StoreFiles.
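To see why a larger ratio pulls in more data, the hypothetical sketch below finds the oldest file satisfying size <= ratio * sum(newer files) for the Example #1 sizes under three ratios. It ignores the min-size, max-size, and file-count settings, so it only illustrates where selection would start, not the long-run number of StoreFiles.

```java
/** Hypothetical sketch: how the compaction ratio moves the start of the selection. */
public class RatioEffectSketch {

    /** Index of the oldest file with size <= ratio * sum(newer files), or -1 if none. */
    static int firstQualifying(long[] sizes, double ratio) {
        long newer = 0;
        for (long s : sizes) {
            newer += s;
        }
        for (int i = 0; i < sizes.length; i++) {
            newer -= sizes[i]; // now the sum of the files strictly newer than file i
            if (sizes[i] <= newer * ratio) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 50, 23, 12, 12}; // Example #1 sizes, oldest to newest
        for (double r : new double[] {0.25, 1.0, 10.0}) {
            System.out.println("ratio " + r + " -> start index " + firstQualifying(sizes, r));
        }
    }
}
```

With a ratio of 10.0 even the oldest 100-byte file qualifies, so everything is rewritten together; at 1.0 selection starts at the 23-byte file; at 0.25 no file qualifies on the ratio alone.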
+
+ hbase.hstore.compaction.min.size. This defaults to hbase.hregion.memstore.flush.size (128 MB). Because
+ this limit is the "automatic include" threshold (every StoreFile smaller than this value is a candidate), it may need to
+ be lowered in write-heavy environments where many 1 or 2 MB StoreFiles are being flushed: every such file will
+ be targeted for compaction, and the resulting files may still be under the min-size and require further compaction.
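To make the write-heavy scenario concrete, the hypothetical simulation below feeds equal-sized flushes into a single Store and runs at most one minor compaction after each flush, using the selection rule described in this section (sum of all newer files, no max-size limit, compactions finish instantly). These simplifications are assumptions for illustration and do not model how a RegionServer actually schedules compactions; `MinSizeSimulation` and `bytesRewritten` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of how min.size affects repeated rewriting of small files. */
public class MinSizeSimulation {

    // Returns the total bytes rewritten by compactions after 'flushes' flushes of flushSize bytes.
    static long bytesRewritten(int flushes, long flushSize, long minSize,
                               double ratio, int minFiles, int maxFiles) {
        List<Long> files = new ArrayList<>(); // StoreFile sizes, oldest to newest
        long rewritten = 0;
        for (int f = 0; f < flushes; f++) {
            files.add(flushSize);
            int n = files.size();
            // Same selection walk as described above: skip old files that fail both tests.
            int start = 0;
            while (n - start >= minFiles) {
                long newer = 0;
                for (int j = start + 1; j < n; j++) {
                    newer += files.get(j);
                }
                long size = files.get(start);
                if (size < minSize || size <= newer * ratio) {
                    break;
                }
                start++;
            }
            int end = Math.min(n, start + maxFiles);
            if (end - start < minFiles) {
                continue; // not enough files: wait for more flushes
            }
            List<Long> selected = files.subList(start, end);
            long merged = 0;
            for (long s : selected) {
                merged += s;
            }
            selected.clear();         // removes the compacted files from 'files'
            files.add(start, merged); // the output takes the place of its inputs
            rewritten += merged;
        }
        return rewritten;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long withDefault = bytesRewritten(10, 2 * mb, 128 * mb, 1.2, 3, 10);
        long withLowered = bytesRewritten(10, 2 * mb, 1, 1.2, 3, 10);
        System.out.println("min.size 128 MB: " + withDefault / mb + " MB rewritten");
        System.out.println("min.size 1 byte: " + withLowered / mb + " MB rewritten");
    }
}
```

With ten 2 MB flushes, the 128 MB min-size case rewrites 48 MB, because every file stays under the min-size and each compaction rewrites the whole Store; a 1-byte min-size, which leaves selection to the ratio alone, rewrites 26 MB.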
+
+
+
Index: src/docbkx/configuration.xml
===================================================================
--- src/docbkx/configuration.xml (revision 1242000)
+++ src/docbkx/configuration.xml (working copy)
@@ -1569,6 +1569,7 @@
they occur. They can be administered through the HBase shell, or via
HBaseAdmin.
+ For more information about compactions and the compaction file selection process, see