diff --git hbase-common/src/main/resources/hbase-default.xml hbase-common/src/main/resources/hbase-default.xml
index a31e53d..2c3a4b5 100644
--- hbase-common/src/main/resources/hbase-default.xml
+++ hbase-common/src/main/resources/hbase-default.xml
@@ -605,77 +605,154 @@ possible configurations would overwhelm and obscure the important.
hbase.hregion.max.filesize
10737418240
- Maximum HStoreFile size. If any one of a column families' HStoreFiles has
- grown to exceed this value, the hosting HRegion is split in two.
+ Maximum HFile size. If the sum of the sizes of a region's HFiles has grown to exceed this
+ value, the region is split in two.
hbase.hregion.majorcompaction
604800000
- The time (in miliseconds) between 'major' compactions of all
- HStoreFiles in a region. Default: Set to 7 days. Major compactions tend to
- happen exactly when you need them least so enable them such that they run at
- off-peak for your deploy; or, since this setting is on a periodicity that is
- unlikely to match your loading, run the compactions via an external
- invocation out of a cron job or some such.
+ Time between major compactions, expressed in milliseconds. Set to 0 to disable
+ time-based automatic major compactions. User-requested and size-based major compactions will
+ still run. This value is multiplied by hbase.hregion.majorcompaction.jitter to cause
+ compaction to start at a somewhat-random time during a given window of time. The default value
+ is 7 days, expressed in milliseconds. If major compactions are causing disruption in your
+ environment, you can configure them to run at off-peak times for your deployment, or disable
+ time-based major compactions by setting this parameter to 0, and run major compactions in a
+ cron job or by another external mechanism.
+
+
+ hbase.hregion.majorcompaction.jitter
+ 0.50
- Jitter outer bound for major compactions.
- On each regionserver, we multiply the hbase.region.majorcompaction
- interval by some random fraction that is inside the bounds of this
- maximum. We then add this + or - product to when the next
- major compaction is to run. The idea is that major compaction
- does happen on every regionserver at exactly the same time. The
- smaller this number, the closer the compactions come together.
+ A multiplier applied to hbase.hregion.majorcompaction to cause compaction to occur
+ a given amount of time either side of hbase.hregion.majorcompaction. The smaller the number,
+ the closer the compactions will happen to the hbase.hregion.majorcompaction
+ interval.
+
+
+ hbase.hstore.compactionThreshold
+ 3
-
- If more than this number of HStoreFiles in any one HStore
- (one HStoreFile is written per flush of memstore) then a compaction
- is run to rewrite all HStoreFiles files as one. Larger numbers
- put off compaction but when it runs, it takes longer to complete.
+ If more than this number of StoreFiles exist in any one Store
+ (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all
+ StoreFiles into a single StoreFile. Larger values delay compaction, but when compaction does
+ occur, it takes longer to complete.
+
+
+ hbase.hstore.flusher.count
+ 2
-
- The number of flush threads. With less threads, the memstore flushes will be queued. With
- more threads, the flush will be executed in parallel, increasing the hdfs load. This can
- lead as well to more compactions.
-
+ The number of flush threads. With fewer threads, the MemStore flushes will be
+ queued. With more threads, the flushes will be executed in parallel, increasing the load on
+ HDFS, and potentially causing more compactions.
+
+
+ hbase.hstore.blockingStoreFiles
+ 10
-
- If more than this number of StoreFiles in any one Store
- (one StoreFile is written per flush of MemStore) then updates are
- blocked for this HRegion until a compaction is completed, or
- until hbase.hstore.blockingWaitTime has been exceeded.
+ If more than this number of StoreFiles exist in any one Store (one StoreFile
+ is written per flush of MemStore), updates are blocked for this region until a compaction is
+ completed, or until hbase.hstore.blockingWaitTime has been exceeded.
+
+
+ hbase.hstore.blockingWaitTime
+ 90000
-
- The time an HRegion will block updates for after hitting the StoreFile
- limit defined by hbase.hstore.blockingStoreFiles.
- After this time has elapsed, the HRegion will stop blocking updates even
- if a compaction has not been completed.
+ The time for which a region will block updates after reaching the StoreFile limit
+ defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the region will stop
+ blocking updates even if a compaction has not been completed.
+
+
+ hbase.hstore.compaction.min
+ 3
+ The minimum number of StoreFiles which must be eligible for compaction before
+ compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid ending up with
+ too many tiny StoreFiles to compact. Setting this value to 2 would cause a minor compaction
+ each time you have two StoreFiles in a Store, and this is probably not appropriate. If you
+ set this value too high, all the other values will need to be adjusted accordingly. For most
+ cases, the default value is appropriate. In previous versions of HBase, the parameter
+ hbase.hstore.compaction.min was named hbase.hstore.compactionThreshold.
+
+
+ hbase.hstore.compaction.max
+ 10
- Max number of HStoreFiles to compact per 'minor' compaction.
+ The maximum number of StoreFiles which will be selected for a single minor
+ compaction, regardless of the number of eligible StoreFiles. Effectively, the value of
+ hbase.hstore.compaction.max controls the length of time it takes a single compaction to
+ complete. Setting it larger means that more StoreFiles are included in a compaction. For most
+ cases, the default value is appropriate.
+
+
+ hbase.hstore.compaction.min.size
+ 134217728
+ A StoreFile smaller than this size will always be eligible for minor compaction.
+ HFiles this size or larger are evaluated by hbase.store.compaction.ratio to determine if
+ they are eligible. Because this limit represents the "automatic include" limit for all
+ StoreFiles smaller than this value, this value may need to be reduced in write-heavy
+ environments where many StoreFiles in the 1-2 MB range are being flushed, because every
+ StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the
+ minimum size and require further compaction. If this parameter is lowered, the ratio check is
+ triggered more quickly. This addressed some issues seen in earlier versions of HBase but
+ changing this parameter is no longer necessary in most situations. Default: 128 MB expressed
+ in bytes.
+
+
+ hbase.hstore.compaction.max.size
+ 9223372036854775807
+ A StoreFile larger than this size will be excluded from compaction. The effect of
+ raising hbase.hstore.compaction.max.size is fewer, larger StoreFiles that do not get
+ compacted often. If you feel that compaction is happening too often without much benefit, you
+ can try raising this value. Default: the value of Long.MAX_VALUE, expressed in bytes.
+
+
+ hbase.hstore.compaction.ratio
+ 1.2F
+ For minor compaction, this ratio is used to determine whether a given StoreFile
+ which is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its
+ effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio
+ is expressed as a floating-point decimal. A large ratio, such as 10, will produce a single
+ giant StoreFile. Conversely, a low value, such as .25, will produce behavior similar to the
+ BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and
+ 1.4 is recommended. When tuning this value, you are balancing write costs with read costs.
+ Raising the value (to something like 1.4) will have more write costs, because you will
+ compact larger StoreFiles. However, during reads, HBase will need to seek through fewer
+ StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of
+ Bloom filters. Otherwise, you can lower this value to something like 1.0 to reduce the
+ background cost of writes, and use Bloom filters to control the number of StoreFiles touched
+ during reads. For most cases, the default value is appropriate.
+
+
+ hbase.hstore.compaction.ratio.offpeak
+ 5.0F
+ Allows you to set a different (by default, more aggressive) ratio for determining
+ whether larger StoreFiles are included in compactions during off-peak hours. Works in the
+ same way as hbase.hstore.compaction.ratio. Only applies if hbase.offpeak.start.hour and
+ hbase.offpeak.end.hour are also enabled.
+
+
+ hbase.offpeak.start.hour
+ -1
+ The start of off-peak hours, expressed as an integer between 0 and 23, inclusive.
+ Set to -1 to disable off-peak.
+
+
+ hbase.offpeak.end.hour
+ -1
+ The end of off-peak hours, expressed as an integer between 0 and 23, inclusive. Set
+ to -1 to disable off-peak.
+
+
+ hbase.regionserver.thread.compaction.throttle
+ 2560
+ There are two different thread pools for compactions, one for large compactions and
+ the other for small compactions. This helps to keep compaction of lean tables (such as
+ hbase:meta) fast. If a compaction is larger than this threshold, it
+ goes into the large compaction pool. In most cases, the default value is appropriate. Default:
+ 2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size (which defaults to 128).
+ The value field assumes that the value of hbase.hregion.memstore.flush.size is unchanged from
+ the default.
+
+
+ hbase.hstore.compaction.kv.max
+ 10
- How many KeyValues to read and then write in a batch when flushing
- or compacting. Do less if big KeyValues and problems with OOME.
- Do more if wide, small rows.
+ The maximum number of KeyValues to read and then write in a batch when flushing or
+ compacting. Set this lower if you have big KeyValues and problems with Out Of Memory
+ Exceptions. Set this higher if you have wide, small rows.
+
+
+ hbase.storescanner.parallel.seek.enable
@@ -694,7 +771,7 @@ possible configurations would overwhelm and obscure the important.
hfile.block.cache.size
0.4
Percentage of maximum heap (-Xmx setting) to allocate to block cache
- used by HFile/StoreFile. Default of 0.4 means allocate 40%.
+ used by a StoreFile. Default of 0.4 means allocate 40%.
Set to 0 to disable but it's not recommended; you need at least
enough cache to hold the storefile indices.
@@ -1039,7 +1116,7 @@ possible configurations would overwhelm and obscure the important.
hbase.snapshot.restore.failsafe.name
hbase-failsafe-{snapshot.name}-{restore.timestamp}
- Name of the failsafe snapshot taken by the restore operation.
+ Name of the failsafe snapshot taken by the restore operation.
You can use the {snapshot.name}, {table.name} and {restore.timestamp} variables
to create a name based on what you are restoring.
diff --git src/main/docbkx/book.xml src/main/docbkx/book.xml
index 6c4c9ef..277d9ec 100644
--- src/main/docbkx/book.xml
+++ src/main/docbkx/book.xml
@@ -2013,7 +2013,7 @@ rs.close();
again, it upgrades to this priority. It is thus part of the second group considered
during evictions.
-
+ In-memory access priority: If the block's family was configured to be
"in-memory", it will be part of this priority disregarding the number of times it
was accessed. Catalog tables are configured like this. This group is the last one
@@ -2166,7 +2166,7 @@ rs.close();
Enable BucketCache To enable BucketCache, set the value of
hbase.offheapcache.percentage to 0 in the RegionServer's
- hbase-site.xml file. This disables SlabCache.
+ hbase-site.xml file. This disables SlabCache. Just as for SlabCache, the usual deploy of BucketCache is via a
managing class that sets up two caching tiers: an L1 onheap cache
@@ -2177,10 +2177,10 @@ rs.close();
by keeping meta blocks -- INDEX and BLOOM in the L1, onheap LruBlockCache tier -- and DATA
blocks are kept in the L2, BucketCache tier. It is possible to amend this behavior in
HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted onheap in the L1 tier by
- setting cacheDataInL1 via (HColumnDescriptor.setCacheDataInL1(true)
+ setting cacheDataInL1 via
+ HColumnDescriptor.setCacheDataInL1(true)
or in the shell, creating or amending column families setting CACHE_DATA_IN_L1
to true: e.g. hbase(main):003:0> create 't', {NAME => 't', CONFIGURATION => {CACHE_DATA_IN_L1 => 'true'}}
- The BucketCache deploy can be
onheap, offheap, or file based. You set which via the
hbase.bucketcache.ioengine setting it to
@@ -3012,307 +3012,514 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
- Compaction
- Compaction is an operation which reduces the number of
- StoreFiles, by merging them together, in order to increase performance on read
- operations. Compactions can be resource-intensive to perform, and can either help or
- hinder performance depending on many factors.
- Compactions fall into two categories: minor and major.
- Minor compactions usually pick up a small number of small,
- adjacent StoreFiles and rewrite them as a single
- StoreFile. Minor compactions do not drop deletes or expired
- cells. If a minor compaction picks up all the StoreFiles in a
- Store, it promotes itself from a minor to a major compaction.
- If there are a lot of small files to be compacted, the algorithm tends to favor minor
- compactions to "clean up" those small files.
- The goal of a major compaction is to end up with a single
- StoreFile per store. Major compactions also process delete markers and max versions.
- Attempting to process these during a minor compaction could cause side effects.
-
-
+
+ Ambiguous Terminology
+ A StoreFile is a facade of HFile. In terms of compaction, use of
+ StoreFile seems to have prevailed in the past.
+ A Store is the same thing as a ColumnFamily.
+ StoreFiles are related to a Store, or ColumnFamily.
+
+ If you want to read more about StoreFiles versus HFiles and Stores versus
+ ColumnFamilies, see HBASE-11316.
+
+
+ When the MemStore reaches a given size
+ (hbase.hregion.memstore.flush.size), it flushes its contents to a
+ StoreFile. The number of StoreFiles in a Store increases over time.
+ Compaction is an operation which reduces the number of
+ StoreFiles in a Store, by merging them together, in order to increase performance on
+ read operations. Compactions can be resource-intensive to perform, and can either help
+ or hinder performance depending on many factors.
+ Compactions fall into two categories: minor and major. Minor and major compactions
+ differ in the following ways.
+ Minor compactions usually select a small number of small,
+ adjacent StoreFiles and rewrite them as a single StoreFile. Minor compactions do not
+ drop (filter out) deletes or expired versions, because of potential side effects. See and for information on how deletes and versions are
+ handled in relation to compactions. The end result of a minor compaction is fewer,
+ larger StoreFiles for a given Store.
+ The end result of a major compaction is a single StoreFile
+ per Store. Major compactions also process delete markers and max versions. See and for information on how deletes and versions are
+ handled in relation to compactions.
+
+ Compaction and Deletions When an explicit deletion occurs in HBase, the data is not actually deleted.
Instead, a tombstone marker is written. The tombstone marker
prevents the data from being returned with queries. During a major compaction, the
data is actually deleted, and the tombstone marker is removed from the StoreFile. If
- the deletion happens because of an expired TTL, no tombstone is created. Instead, the
- expired data is filtered out and is not written back to the compacted StoreFile.
+ the deletion happens because of an expired TTL, no tombstone is created. Instead, the
+ expired data is filtered out and is not written back to the compacted
+ StoreFile.
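+ The effect can be observed from the HBase shell. The following is a sketch only; the
+ table 't1', row 'row1', and column 'f1:q1' are hypothetical names, and major_compact runs
+ asynchronously, so the second raw scan reflects the state after the compaction completes.
+hbase(main):001:0> delete 't1', 'row1', 'f1:q1'
+hbase(main):002:0> scan 't1', {RAW => true, VERSIONS => 10}   # tombstone and masked cells still visible
+hbase(main):003:0> major_compact 't1'
+hbase(main):004:0> scan 't1', {RAW => true, VERSIONS => 10}   # deleted cells and tombstone are gone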
-
-
+
+ Compaction and Versions
- When you create a column family, you can specify the maximum number of versions
+ When you create a Column Family, you can specify the maximum number of versions
to keep, by specifying HColumnDescriptor.setMaxVersions(int
versions). The default value is 3. If more versions
than the specified maximum exist, the excess versions are filtered out and not written
- back to the compacted StoreFile.
+ back to the compacted StoreFile.
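+ The same limit can also be set from the HBase shell; a sketch, using a hypothetical
+ table 't1' and column family 'f1':
+hbase(main):001:0> alter 't1', NAME => 'f1', VERSIONS => 5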
-
+
Major Compactions Can Impact Query Results
- In some situations, older versions can be inadvertently
- resurrected if a newer version is explicitly deleted. See for a more in-depth explanation. This
- situation is only possible before the compaction finishes.
-
+ In some situations, older versions can be inadvertently resurrected if a newer
+ version is explicitly deleted. See for a more in-depth explanation.
+ This situation is only possible before the compaction finishes.
-
+
In theory, major compactions improve performance. However, on a highly loaded
system, major compactions can require an inappropriate number of resources and adversely
affect performance. In a default configuration, major compactions are scheduled
- automatically to run once in a 7-day period. This is usually inappropriate for systems
+ automatically to run once in a 7-day period. This is sometimes inappropriate for systems
in production. You can manage major compactions manually. See . Compactions do not perform region merges. See for more information on region merging.
+ linkend="ops.regionmgt.merge" /> for more information on region merging.
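+ For example, a major compaction can be requested manually from the HBase shell. This is
+ a sketch; 'orders_table' and 'blobs_cf' are hypothetical names, and the optional second
+ argument limits the compaction to a single column family (Store).
+hbase(main):001:0> major_compact 'orders_table'
+hbase(main):002:0> major_compact 'orders_table', 'blobs_cf'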
- Algorithm for Compaction File Selection - HBase 0.96.x and newer
- The compaction algorithms used by HBase have evolved over time. HBase 0.96
- introduced new algorithms for compaction file selection. To find out about the old
- algorithms, see . The rest of this section describes the new algorithm. File
- selection happens in several phases and is controlled by several configurable
- parameters. These parameters will be explained in context, and then will be given in a
- table which shows their descriptions, defaults, and implications of changing
- them.
-
-
- TheExploringCompaction Policy
- HBASE-7842
- was introduced in HBase 0.96 and represents a major change in the algorithms for
- file selection for compactions. Its goal is to do the most impactful compaction with
- the lowest cost, in situations where a lot of files need compaction. In such a
- situation, the list of all eligible files is "explored", and files are grouped by
- size before any ratio-based algorithms are run. This favors clean-up of large
- numbers of small files before larger files are considered. For more details, refer
- to the link to the JIRA. Most of the code for this change can be reviewed in
- hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java.
-
-
-
- Algorithms for Determining File List and Compaction Type
-
- Create a list of all files which can possibly be compacted, ordered by
- sequence ID.
-
+ Compaction Policy - HBase 0.96.x and newer
+ Compacting large StoreFiles, or too many StoreFiles at once, can cause more IO
+ load than your cluster is able to handle without causing performance problems. The
+ method by which HBase selects which StoreFiles to include in a compaction (and whether
+ the compaction is a minor or major compaction) is called the compaction
+ policy.
+ Prior to HBase 0.96.x, there was only one compaction policy. That original
+ compaction policy is still available as
+ RatioBasedCompactionPolicy. The new default compaction
+ policy, called ExploringCompactionPolicy, was subsequently
+ backported to HBase 0.94 and HBase 0.95, and is the default in HBase 0.96 and newer.
+ It was implemented in HBASE-7842. In
+ short, ExploringCompactionPolicy attempts to select the best
+ possible set of StoreFiles to compact with the least amount of work, while the
+ RatioBasedCompactionPolicy selects the first set that meets
+ the criteria.
+ Regardless of the compaction policy used, file selection is controlled by several
+ configurable parameters and happens in a multi-step approach. These parameters will be
+ explained in context, and then will be given in a table which shows their
+ descriptions, defaults, and implications of changing them.
+
+
+ Being Stuck
+ When the MemStore gets too large, it needs to flush its contents to a StoreFile.
+ However, a Store can only have hbase.hstore.blockingStoreFiles
+ files, so the MemStore needs to wait for the number of StoreFiles to be reduced by
+ one or more compactions. However, if the MemStore grows larger than
+ hbase.hregion.memstore.flush.size, it is not able to flush its
+ contents to a StoreFile. If the MemStore is too large and the number of StoreFiles
+ is also too high, the algorithm is said to be "stuck". The compaction algorithm
+ checks for this "stuck" situation and provides mechanisms to alleviate it.
+
+
+
+ The ExploringCompactionPolicy Algorithm
+ The ExploringCompactionPolicy algorithm considers each possible set of
+ adjacent StoreFiles before choosing the set where compaction will have the most
+ benefit.
+ One situation where the ExploringCompactionPolicy works especially well is when
+ you are bulk-loading data and the bulk loads create larger StoreFiles than the
+ StoreFiles which are holding data older than the bulk-loaded data. This can "trick"
+ HBase into choosing to perform a major compaction each time a compaction is needed,
+ and cause a lot of extra overhead. With the ExploringCompactionPolicy, major
+ compactions happen much less frequently because minor compactions are more
+ efficient.
+ In general, ExploringCompactionPolicy is the right choice for most situations,
+ and thus is the default compaction policy. You can also use
+ ExploringCompactionPolicy along with .
+ The logic of this policy can be examined in
+ hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/ExploringCompactionPolicy.java.
+ The following is a walk-through of the logic of the
+ ExploringCompactionPolicy.
+
+
+ Make a list of all existing StoreFiles in the Store. The rest of the
+ algorithm filters this list to come up with the subset of HFiles which will be
+ chosen for compaction.
+
+
+ If this was a user-requested compaction, attempt to perform the requested
+ compaction type, regardless of what would normally be chosen. Note that even if
+ the user requests a major compaction, it may not be possible to perform a major
+ compaction. This may be because not all StoreFiles in the Column Family are
+ available to compact or because there are too many StoreFiles in the Column
+ Family.
+
+
+ Some StoreFiles are automatically excluded from consideration. These
+ include:
+
+
+ StoreFiles that are larger than
+ hbase.hstore.compaction.max.size
+
+
+ StoreFiles that were created by a bulk-load operation which explicitly
+ excluded compaction. You may decide to exclude StoreFiles resulting from
+ bulk loads, from compaction. To do this, specify the
+ hbase.mapreduce.hfileoutputformat.compaction.exclude
+ parameter during the bulk load operation.
+
+
+
+
+ Iterate through the list from step 1, and make a list of all potential sets
+ of StoreFiles to compact together. A potential set is a grouping of
+ hbase.hstore.compaction.min contiguous StoreFiles in the
+ list. For each set, perform some sanity-checking and figure out whether this is
+ the best compaction that could be done:
+
+
+ If the number of StoreFiles in this set (not the size of the StoreFiles)
+ is fewer than hbase.hstore.compaction.min or more than
+ hbase.hstore.compaction.max, take it out of
+ consideration.
+
+
+ Compare the size of this set of StoreFiles with the size of the smallest
+ possible compaction that has been found in the list so far. If the size of
+ this set of StoreFiles represents the smallest compaction that could be
+ done, store it to be used as a fall-back if the algorithm is "stuck" and no
+ StoreFiles would otherwise be chosen. See .
+
+
+ Do size-based sanity checks against each StoreFile in this set of
+ StoreFiles.
+
+
+ If the size of this StoreFile is larger than
+ hbase.hstore.compaction.max.size, take it out of
+ consideration.
+
+
+ If the size is greater than or equal to
+ hbase.hstore.compaction.min.size, sanity-check it
+ against the file-based ratio to see whether it is too large to be
+ considered. The sanity-checking is successful if:
+
+
+ There is only one StoreFile in this set, or
+
+
+ For each StoreFile, its size multiplied by
+ hbase.store.compaction.ratio (or
+ hbase.hstore.compaction.ratio.offpeak if
+ off-peak hours are configured and it is during off-peak hours) is
+ less than the sum of the sizes of the other HFiles in the
+ set.
+
+
+
+
+
+
+
+
+ If this set of StoreFiles is still in consideration, compare it to the
+ previously-selected best compaction. If it is better, replace the
+ previously-selected best compaction with this one.
+
+
+ When the entire list of potential compactions has been processed, perform
+ the best compaction that was found. If no StoreFiles were selected for
+ compaction, but there are multiple StoreFiles, assume the algorithm is stuck
+ (see ) and if so, perform the smallest
+ compaction that was found in step 3.
+
+
+
+
+
+ RatioBasedCompactionPolicy Algorithm
+ The RatioBasedCompactionPolicy was the only compaction policy prior to HBase
+ 0.96, though ExploringCompactionPolicy has now been backported to HBase 0.94 and
+ 0.95. To use the RatioBasedCompactionPolicy rather than the
+ ExploringCompactionPolicy, set
+ hbase.hstore.defaultengine.compactionpolicy.class to
+ RatioBasedCompactionPolicy in the
+ hbase-site.xml file. To switch back to the
+ ExploringCompactionPolicy, remove the setting from the
+ hbase-site.xml.
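+ A minimal hbase-site.xml sketch of that setting. The fully-qualified class name is
+ assumed from the compactions package referenced earlier in this section:
+<property>
+  <name>hbase.hstore.defaultengine.compactionpolicy.class</name>
+  <!-- assumed fully-qualified name for the RatioBasedCompactionPolicy described above -->
+  <value>org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy</value>
+</property>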
+ The following section walks you through the algorithm used to select StoreFiles
+ for compaction in the RatioBasedCompactionPolicy.
+
+ The first phase is to create a list of all candidates for compaction. A list
- is created of all StoreFiles not already in the compaction queue, and all files
- newer than the newest file that is currently being compacted. This list of files
- is ordered by the sequence ID. The sequence ID is generated when a Put is
- appended to the write-ahead log (WAL), and is stored in the metadata of the
- StoreFile.
-
-
-
- Check to see if major compaction is required because there are too many
- StoreFiles and the memstore is too large.
-
- A store can only have hbase.hstore.blockingStoreFiles. If
- the store has too many files, you cannot flush data. In addition, you cannot
- perform an insert if the memstore is over
- hbase.hregion.memstore.flush.size. Normally, minor
- compactions will alleviate this situation. However, if the normal compaction
- algorithm do not find any normally-eligible StoreFiles, a major compaction is
- the only way to get out of this situation, and is forced.
- If you are using the ExploringCompaction policy, the set of files to
- compact is always selected, and will not trigger a major compaction. See .
-
-
-
- If this compaction was user-requested, perform the requested type of compaction.
-
- Compactions can run on a schedule or can be initiated manually. If a
- compaction is requested manually, HBase always runs that type of compaction. If the
- user requests a major compaction, the major compaction still runs even if the are
- more than hbase.hstore.compaction.max files that need
- compaction.
-
-
-
- Exclude files which are too large.
-
- The purpose of compaction is to merge small files together, and it is
- counterproductive to compact files which are too large. Files larger than
- hbase.hstore.compaction.max.size are excluded from
- consideration.
-
-
-
- If configured, exclude bulk-loaded files.
-
- You may decide to exclude bulk-loaded files from compaction, in the bulk
- load operation, by specifying the
- hbase.mapreduce.hfileoutputformat.compaction.exclude
- parameter. If a bulk-loaded file was excluded, it is removed from
- consideration at this point.
-
-
-
- If there are too many files to compact, do a minor compaction.
-
- The maximum number of files allowed in a major compaction is controlled by
- the hbase.hstore.compaction.max parameter. If the list
- contains more than this number of files, a compaction that would otherwise be a
- major compaction is downgraded to a minor compaction. However, a user-requested
- major compaction still occurs even if there are more than
- hbase.hstore.compaction.max files to compact.
-
-
-
- Only run the compaction if enough files need to be compacted.
-
+ is created of all StoreFiles not already in the compaction queue, and all
+ StoreFiles newer than the newest file that is currently being compacted. This
+ list of StoreFiles is ordered by the sequence ID. The sequence ID is generated
+ when a Put is appended to the write-ahead log (WAL), and is stored in the
+ metadata of the HFile.
+
+
+ Check to see if the algorithm is stuck (see ). If so, a major compaction is forced.
+ This is a key area where the ExploringCompactionPolicy is often a better choice than the
+ RatioBasedCompactionPolicy.
+
+
+ If the compaction was user-requested, try to perform the type of compaction
+ that was requested. Note that a major compaction may not be possible if all
+ HFiles are not available for compaction or if too many StoreFiles exist (more
+ than hbase.hstore.compaction.max).
+
+
+ Some StoreFiles are automatically excluded from consideration. These
+ include:
+
+
+ StoreFiles that are larger than
+ hbase.hstore.compaction.max.size
+
+
+ StoreFiles that were created by a bulk-load operation which explicitly
+ excluded compaction. You may decide to exclude StoreFiles resulting from
+ bulk loads, from compaction. To do this, specify the
+ hbase.mapreduce.hfileoutputformat.compaction.exclude
+ parameter during the bulk load operation.
+
+
+
+
+ The maximum number of StoreFiles allowed in a major compaction is controlled
+ by the hbase.hstore.compaction.max parameter. If the list
+ contains more than this number of StoreFiles, a minor compaction is performed
+ even if a major compaction would otherwise have been done. However, a
+ user-requested major compaction still occurs even if there are more than
+ hbase.hstore.compaction.max StoreFiles to compact.
+
+ If the list contains fewer than
- hbase.hstore.compaction.min files to compact, compaction is
- aborted.
-
-
-
- If this is a minor compaction, determine which files are eligible, based upon
- the hbase.store.compaction.ratio.
-
+ hbase.hstore.compaction.min StoreFiles to compact, a minor
+ compaction is aborted. Note that a major compaction can be performed on a single
+ HFile. Its function is to remove deletes and expired versions, and reset
+ locality on the StoreFile.
+
+ The value of the hbase.store.compaction.ratio parameter
- is multiplied by the sum of files smaller than a given file, to determine
- whether that file is selected for compaction during a minor compaction. For
+ is multiplied by the sum of StoreFiles smaller than a given file, to determine
+ whether that StoreFile is selected for compaction during a minor compaction. For
instance, if hbase.store.compaction.ratio is 1.2, FileX is 5 mb, FileY is 2 mb,
and FileZ is 3 mb:
- 5 <= 1.2 x (2 + 3) or 5 <= 6
+ 5 <= 1.2 x (2 + 3) or 5 <= 6
+ In this scenario, FileX is eligible for minor compaction. If FileX were 7
mb, it would not be eligible for minor compaction. This ratio favors smaller
- files. You can configure a different ratio for use in off-peak hours, using the
- parameter hbase.hstore.compaction.ratio.offpeak, if you also
- configure hbase.offpeak.start.hour and
- hbase.offpeak.end.hour.
-
-
-
- If major compactions are not managed manually, and it has been too long since
- the last major compaction, run a major compaction anyway.
-
+ StoreFiles. You can configure a different ratio for use in off-peak hours, using
+ the parameter hbase.hstore.compaction.ratio.offpeak, if you
+ also configure hbase.offpeak.start.hour and
+ hbase.offpeak.end.hour (see the configuration sketch after this list).
+
+
+ If the last major compaction was too long ago and there is more than one
- file to be compacted, a major compaction is run, even if it would otherwise have
- been minor. By default, the maximum time between major compactions is 7 days,
- plus or minus a 4.8 hour period, and determined randomly within those
- parameters. Prior to HBase 0.96, the major compaction period was 24 hours. This
- is also referred to as a time-based or time-triggered major compaction. See
- hbase.hregion.majorcompaction in the table below to tune or
- disable time-based major compactions.
-
-
-
-
+ StoreFile to be compacted, a major compaction is run, even if it would otherwise
+ have been minor. By default, the maximum time between major compactions is 7
+ days, plus or minus a 4.8 hour period, and determined randomly within those
+ parameters. Prior to HBase 0.96, the major compaction period was 24 hours. See
+ hbase.hregion.majorcompaction in the table below to tune or
+ disable scheduled major compactions.
+
+
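+ The off-peak step above refers to hbase.hstore.compaction.ratio.offpeak,
+ hbase.offpeak.start.hour, and hbase.offpeak.end.hour. A minimal hbase-site.xml sketch
+ enabling an off-peak window; the hours are illustrative, and 5.0 is the documented
+ default off-peak ratio:
+<property>
+  <name>hbase.offpeak.start.hour</name>
+  <!-- illustrative: off-peak starts at midnight -->
+  <value>0</value>
+</property>
+<property>
+  <name>hbase.offpeak.end.hour</name>
+  <!-- illustrative: off-peak ends at 6 AM -->
+  <value>6</value>
+</property>
+<property>
+  <name>hbase.hstore.compaction.ratio.offpeak</name>
+  <value>5.0</value>
+</property>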
+
+ Parameters Used by Compaction Algorithm
-
- This table contains the main configuration parameters for compaction. This
- list is not exhaustive. To tune these parameters from the defaults, edit the
- hbase-default.xml file. For a full list of all
- configuration parameters available, see
-
-
-
-
- Parameter
- Description
- Default
-
-
-
-
- hbase.hstore.compaction.min
- The minimum number of files which must be eligible for compaction
- before compaction can run.
- In previous versions, the parameter
- hbase.hstore.compaction.min was called
- hbase.hstore.compactionThreshold.
-
- 3
-
-
- hbase.hstore.compaction.max
- The maximum number of files which will be selected for a single minor
- compaction, regardless of the number of eligible files.
- 10
-
-
- hbase.hstore.compaction.min.size
- A StoreFile smaller than this size (in bytes) will always be eligible for
- minor compaction.
- 128 MB
-
-
- hbase.hstore.compaction.max.size
- A StoreFile larger than this size (in bytes) will be excluded from minor
- compaction.
- Long.MAX_VALUE
-
-
- hbase.store.compaction.ratio
- For minor compaction, this ratio is used to determine whether a given
- file is eligible for compaction. Its effect is to limit compaction of large
- files. Expressed as a floating-point decimal.
- 1.2F
-
-
- hbase.hstore.compaction.ratio.offpeak
- The compaction ratio used during off-peak compactions, if off-peak is
- enabled. Expressed as a floating-point decimal. This allows for more
- aggressive compaction, because in theory, the cluster is under less load.
- Ignored if off-peak is disabled (default).
- 5.0F
-
-
- hbase.offpeak.start.hour
- The start of off-peak hours, expressed as an integer between 0 and 23,
- inclusive. Set to -1 to disable off-peak.
- -1 (disabled)
-
-
- hbase.offpeak.end.hour
- The end of off-peak hours, expressed as an integer between 0 and 23,
- inclusive. Set to -1 to disable off-peak.
- -1 (disabled)
-
-
- hbase.regionserver.thread.compaction.throttle
- Throttles compaction if too much of a backlog of compaction work
- exists.
- 2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size
- (which defaults to 128)
-
-
- hbase.hregion.majorcompaction
- Time between major compactions, expressed in milliseconds. Set to 0 to
- disable time-based automatic major compactions. User-requested and size-based
- major compactions will still run.
- 7 days (604800000 milliseconds)
-
-
- hbase.hregion.majorcompaction.jitter
- A multiplier applied to majorCompactionPeriod to cause compaction to
- occur a given amount of time either side of majorCompactionPeriod. The smaller
- the number, the closer the compactions will happen to the
- hbase.hregion.majorcompaction interval. Expressed as a
- floating-point decimal.
- .50F
-
-
-
-
+ This table contains the main configuration parameters for compaction. This list
+ is not exhaustive. To tune these parameters from the defaults, edit the
+ hbase-site.xml file. For a full list of all configuration
+ parameters available, see
+
+
+
+
+ Parameter
+ Description
+ Default
+
+
+
+
+ hbase.hstore.compaction.min
+ The minimum number of StoreFiles which must be eligible for
+ compaction before compaction can run.
+ The goal of tuning hbase.hstore.compaction.min
+ is to avoid ending up with too many tiny StoreFiles to compact. Setting
+ this value to 2 would cause a minor compaction each
+ time you have two StoreFiles in a Store, and this is probably not
+ appropriate. If you set this value too high, all the other values will
+ need to be adjusted accordingly. For most cases, the default value is
+ appropriate.
+ In previous versions of HBase, the parameter
+ hbase.hstore.compaction.min was called
+ hbase.hstore.compactionThreshold.
+
+ 3
+
+
+ hbase.hstore.compaction.max
+ The maximum number of StoreFiles which will be selected for a
+ single minor compaction, regardless of the number of eligible
+ StoreFiles.
+ Effectively, the value of
+ hbase.hstore.compaction.max controls the length of
+ time it takes a single compaction to complete. Setting it larger means
+ that more StoreFiles are included in a compaction. For most cases, the
+ default value is appropriate.
+
+ 10
+
+
+ hbase.hstore.compaction.min.size
+ A StoreFile smaller than this size will always be eligible for
+ minor compaction. StoreFiles this size or larger are evaluated by
+ hbase.store.compaction.ratio to determine if they are
+ eligible.
+ Because this limit represents the "automatic include" limit for
+ all StoreFiles smaller than this value, this value may need to be reduced
+ in write-heavy environments where many files in the 1-2 MB range are being
+ flushed, because every StoreFile will be targeted for compaction and the
+ resulting StoreFiles may still be under the minimum size and require
+ further compaction.
+ If this parameter is lowered, the ratio check is triggered more
+ quickly. This addressed some issues seen in earlier versions of HBase but
+ changing this parameter is no longer necessary in most situations.
+
+ 128 MB
+
+
+ hbase.hstore.compaction.max.size
+ A StoreFile larger than this size will be excluded from
+ compaction. The effect of raising
+ hbase.hstore.compaction.max.size is fewer, larger
+ StoreFiles that do not get compacted often. If you feel that compaction is
+ happening too often without much benefit, you can try raising this
+ value.
+ Long.MAX_VALUE
+
+
+ hbase.hstore.compaction.ratio
+ For minor compaction, this ratio is used to determine whether a
+ given StoreFile which is larger than
+ hbase.hstore.compaction.min.size is eligible for
+ compaction. Its effect is to limit compaction of large StoreFiles. The
+ value of hbase.hstore.compaction.ratio is expressed as
+ a floating-point decimal.
+ A large ratio, such as 10, will produce a
+ single giant StoreFile. Conversely, a value of .25,
+ will produce behavior similar to the BigTable compaction algorithm,
+ producing four StoreFiles.
+ A moderate value of between 1.0 and 1.4 is recommended. When
+ tuning this value, you are balancing write costs with read costs. Raising
+ the value (to something like 1.4) will have more write costs, because you
+ will compact larger StoreFiles. However, during reads, HBase will need to seek
+ through fewer StoreFiles to accomplish the read. Consider this approach if you
+ cannot take advantage of Bloom filters.
+ Alternatively, you can lower this value to something like 1.0 to
+ reduce the background cost of writes, and use Bloom filters to limit the number of
+ StoreFiles touched during reads.
+ For most cases, the default value is appropriate.
+
+ 1.2F
+
+
+ hbase.hstore.compaction.ratio.offpeak
+ The compaction ratio used during off-peak compactions, if off-peak
+ hours are also configured (see below). Expressed as a floating-point
+ decimal. This allows for more aggressive (or less aggressive, if you set it
+ lower than hbase.hstore.compaction.ratio) compaction
+ during a set time period. Ignored if off-peak is disabled (default). This
+ works the same as hbase.hstore.compaction.ratio.
+ 5.0F
+
+
+ hbase.offpeak.start.hour
+ The start of off-peak hours, expressed as an integer between 0 and 23,
+ inclusive. Set to -1 to disable off-peak.
+ -1 (disabled)
+
+
+ hbase.offpeak.end.hour
+ The end of off-peak hours, expressed as an integer between 0 and 23,
+ inclusive. Set to -1 to disable off-peak.
+ -1 (disabled)
+
+
+ hbase.regionserver.thread.compaction.throttle
+ There are two different thread pools for compactions, one for
+ large compactions and the other for small compactions. This helps to keep
+ compaction of lean tables (such as hbase:meta)
+ fast. If a compaction is larger than this threshold, it goes into the
+ large compaction pool. In most cases, the default value is
+ appropriate.
+ 2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size
+ (which defaults to 128)
+
+
+ hbase.hregion.majorcompaction
+ Time between major compactions, expressed in milliseconds. Set to
+ 0 to disable time-based automatic major compactions. User-requested and
+ size-based major compactions will still run. This value is multiplied by
+ hbase.hregion.majorcompaction.jitter to cause
+ compaction to start at a somewhat-random time during a given window of
+ time.
+ 7 days (604800000 milliseconds)
+
+
+ hbase.hregion.majorcompaction.jitter
+ A multiplier applied to
+ hbase.hregion.majorcompaction to cause compaction to
+ occur a given amount of time either side of
+ hbase.hregion.majorcompaction. The smaller the
+ number, the closer the compactions will happen to the
+ hbase.hregion.majorcompaction interval. Expressed as
+ a floating-point decimal.
+ .50F
+
+
+
+
+
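+ These parameters are overridden in hbase-site.xml. A minimal sketch with illustrative
+ values, not tuning recommendations:
+<property>
+  <name>hbase.hstore.compaction.min</name>
+  <!-- illustrative value; the default is 3 -->
+  <value>4</value>
+</property>
+<property>
+  <name>hbase.hregion.majorcompaction</name>
+  <!-- disable time-based major compactions and run them via cron or another external mechanism -->
+  <value>0</value>
+</property>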
-
+
- Compaction File Selection
- To understand the core algorithm for StoreFile selection, there is some ASCII-art in the Store source code that
- will serve as useful reference. It has been copied below:
-
+ Compaction File Selection
+
+ Legacy Information
+ This section has been preserved for historical reasons and refers to the way
+ compaction worked prior to HBase 0.96.x. You can still use this behavior if you
+ enable the RatioBasedCompactionPolicy. For information on
+ the way that compactions work in HBase 0.96.x and later, see .
+
+ To understand the core algorithm for StoreFile selection, there is some ASCII-art
+ in the Store
+ source code that will serve as useful reference. It has been copied below:
+
/* normal skew:
*
* older ----> newer
@@ -3325,200 +3532,403 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
* | | | | | | | | | | | |
*/
- Important knobs:
-
- hbase.store.compaction.ratio Ratio used in compaction
- file selection algorithm (default 1.2f).
- hbase.hstore.compaction.min (.90 hbase.hstore.compactionThreshold) (files) Minimum number
- of StoreFiles per Store to be selected for a compaction to occur (default 2).
- hbase.hstore.compaction.max (files) Maximum number of StoreFiles to compact per minor compaction (default 10).
- hbase.hstore.compaction.min.size (bytes)
- Any StoreFile smaller than this setting with automatically be a candidate for compaction. Defaults to
- hbase.hregion.memstore.flush.size (128 mb).
- hbase.hstore.compaction.max.size (.92) (bytes)
- Any StoreFile larger than this setting with automatically be excluded from compaction (default Long.MAX_VALUE).
+ Important knobs:
+
+ hbase.store.compaction.ratio Ratio used in compaction file
+ selection algorithm (default 1.2f).
+
+
+ hbase.hstore.compaction.min (.90
+ hbase.hstore.compactionThreshold) (files) Minimum number of StoreFiles per Store
+ to be selected for a compaction to occur (default 2).
+
+
+ hbase.hstore.compaction.max (files) Maximum number of
+ StoreFiles to compact per minor compaction (default 10).
+
+
+ hbase.hstore.compaction.min.size (bytes) Any StoreFile smaller
+ than this setting will automatically be a candidate for compaction. Defaults to
+ hbase.hregion.memstore.flush.size (128 mb).
+
+
+ hbase.hstore.compaction.max.size (.92) (bytes) Any StoreFile
+ larger than this setting will automatically be excluded from compaction (default
+ Long.MAX_VALUE).
+
+
+
+ The minor compaction StoreFile selection logic is size based, and selects a file
+ for compaction when the file <= sum(smaller_files) *
+ hbase.hstore.compaction.ratio.
+
+
+ Minor Compaction File Selection - Example #1 (Basic Example)
+ This example mirrors an example from the unit test
+ TestCompactSelection.
+
+
+ hbase.store.compaction.ratio = 1.0f
+
+
+ hbase.hstore.compaction.min = 3 (files)
+
+
+ hbase.hstore.compaction.max = 5 (files)
+
+
+ hbase.hstore.compaction.min.size = 10 (bytes)
+
+
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
-
- The minor compaction StoreFile selection logic is size based, and selects a file for compaction when the file
- <= sum(smaller_files) * hbase.hstore.compaction.ratio.
-
-
-
- Minor Compaction File Selection - Example #1 (Basic Example)
- This example mirrors an example from the unit test TestCompactSelection.
-
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
-
-
- The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest).
- With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.
-
- Why?
-
- 100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
- 50 --> No, because sum(23, 12, 12) * 1.0 = 47.
- 23 --> Yes, because sum(12, 12) * 1.0 = 24.
- 12 --> Yes, because the previous file has been included, and because this
- does not exceed the the max-file limit of 5
- 12 --> Yes, because the previous file had been included, and because this
- does not exceed the the max-file limit of 5.
-
-
-
-
- Minor Compaction File Selection - Example #2 (Not Enough Files To Compact)
- This example mirrors an example from the unit test TestCompactSelection.
-
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
-
-
- The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest).
- With the above parameters, no compaction will be started.
-
- Why?
-
- 100 --> No, because sum(25, 12, 12) * 1.0 = 47
- 25 --> No, because sum(12, 12) * 1.0 = 24
- 12 --> No. Candidate because sum(12) * 1.0 = 12, there are only 2 files to compact and that is less than the threshold of 3
- 12 --> No. Candidate because the previous StoreFile was, but there are not enough files to compact
-
-
-
-
- Minor Compaction File Selection - Example #3 (Limiting Files To Compact)
- This example mirrors an example from the unit test TestCompactSelection.
-
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
-
- The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to newest).
- With the above parameters, the files that would be selected for minor compaction are 7, 6, 5, 4, 3.
-
- Why?
-
- 7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than the min-size
- 6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than the min-size.
- 5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the min-size.
- 4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the min-size.
- 3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is less than the min-size.
- 2 --> No. Candidate because previous file was selected and 2 is less than the min-size, but the max-number of files to compact has been reached.
- 1 --> No. Candidate because previous file was selected and 1 is less than the min-size, but max-number of files to compact has been reached.
-
-
-
-
- Impact of Key Configuration Options
- hbase.store.compaction.ratio. A large ratio (e.g., 10) will produce a single giant file. Conversely, a value of .25 will
- produce behavior similar to the BigTable compaction algorithm - resulting in 4 StoreFiles.
-
- hbase.hstore.compaction.min.size. Because
- this limit represents the "automatic include" limit for all StoreFiles smaller than this value, this value may need to
- be adjusted downwards in write-heavy environments where many 1 or 2 mb StoreFiles are being flushed, because every file
- will be targeted for compaction and the resulting files may still be under the min-size and require further compaction, etc.
-
-
-Experimental: stripe compactions
-
-Stripe compactions is an experimental feature added in HBase 0.98 which aims to improve compactions for large regions or non-uniformly distributed row keys. In order to achieve smaller and/or more granular compactions, the store files within a region are maintained separately for several row-key sub-ranges, or "stripes", of the region. The division is not visible to the higher levels of the system, so externally each region functions as before.
-
-This feature is fully compatible with default compactions - it can be enabled for existing tables, and the table will continue to operate normally if it's disabled later.
-
-When to use
-You might want to consider using this feature if you have:
-
-large regions (in that case, you can get the positive effect of much smaller regions without additional memstore and region management overhead); or
-
-
-non-uniform row keys, e.g. time dimension in a key (in that case, only the stripes receiving the new keys will keep compacting - old data will not compact as much, or at all).
-
-
-
-According to perf testing performed, in these case the read performance can improve somewhat, and the read and write performance variability due to compactions is greatly reduced. There's overall perf improvement on large, non-uniform row key regions (hash-prefixed timestamp key) over long term. All of these performance gains are best realized when table is already large. In future, the perf improvement might also extend to region splits.
-
-How to enable
-
-To use stripe compactions for a table or a column family, you should set its hbase.hstore.engine.class to org.apache.hadoop.hbase.regionserver.StripeStoreEngine. Due to the nature of compactions, you also need to set the blocking file count to a high number (100 is a good default, which is 10 times the normal default of 10). If changing the existing table, you should do it when it is disabled. Examples:
-
-alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
+ The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to
+ newest). With the above parameters, the files that would be selected for minor
+ compaction are 23, 12, and 12.
+ Why?
+
+ 100 --> No, because sum(50, 23, 12, 12) * 1.0 = 97.
+
+
+ 50 --> No, because sum(23, 12, 12) * 1.0 = 47.
+
+
+ 23 --> Yes, because sum(12, 12) * 1.0 = 24.
+
+
+ 12 --> Yes, because the previous file has been included, and because this
+ does not exceed the max-file limit of 5.
+
+
+ 12 --> Yes, because the previous file had been included, and because this
+ does not exceed the max-file limit of 5.
+
+
+
+
+
+ Minor Compaction File Selection - Example #2 (Not Enough Files To
+ Compact)
+ This example mirrors an example from the unit test
+ TestCompactSelection.
+
+ hbase.store.compaction.ratio = 1.0f
+
+
+ hbase.hstore.compaction.min = 3 (files)
+
+
+ hbase.hstore.compaction.max = 5 (files)
+
+
+ hbase.hstore.compaction.min.size = 10 (bytes)
+
+
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
+
+
+ The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to
+ newest). With the above parameters, no compaction will be started.
+ Why?
+
+ 100 --> No, because sum(25, 12, 12) * 1.0 = 47
+
+
+ 25 --> No, because sum(12, 12) * 1.0 = 24
+
+
+ 12 --> No. Candidate because sum(12) * 1.0 = 12, there are only 2 files
+ to compact and that is less than the threshold of 3
+
+
+ 12 --> No. Candidate because the previous StoreFile was, but there are
+ not enough files to compact
+
+
+
+
+
+ Minor Compaction File Selection - Example #3 (Limiting Files To Compact)
+ This example mirrors an example from the unit test
+ TestCompactSelection.
+
+ hbase.store.compaction.ratio = 1.0f
+
+
+ hbase.hstore.compaction.min = 3 (files)
+
+
+ hbase.hstore.compaction.max = 5 (files)
+
+
+ hbase.hstore.compaction.min.size = 10 (bytes)
+
+
+ hbase.hstore.compaction.max.size = 1000 (bytes)
+
+ The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece
+ (oldest to newest). With the above parameters, the files that would be selected for
+ minor compaction are 7, 6, 5, 4, 3.
+ Why?
+
+ 7 --> Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than
+ the min-size.
+
+
+ 6 --> Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than
+ the min-size.
+
+
+ 5 --> Yes, because sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the
+ min-size.
+
+
+ 4 --> Yes, because sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the
+ min-size.
+
+
+ 3 --> Yes, because sum(2, 1) * 1.0 = 3. Also, 3 is less than the
+ min-size.
+
+
+ 2 --> No. Candidate because previous file was selected and 2 is less than
+ the min-size, but the max-number of files to compact has been reached.
+
+
+ 1 --> No. Candidate because previous file was selected and 1 is less than
+ the min-size, but max-number of files to compact has been reached.
+
+
+
+
+ Impact of Key Configuration Options
+
+ This information is now included in the configuration parameter table in .
+
+
+
-alter 'orders_table', {NAME => 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}}
+
+ Experimental: Stripe Compactions
+ Stripe compactions is an experimental feature added in HBase 0.98 which aims to
+ improve compactions for large regions or non-uniformly distributed row keys. In order
+ to achieve smaller and/or more granular compactions, the StoreFiles within a region
+ are maintained separately for several row-key sub-ranges, or "stripes", of the region.
+ The stripes are transparent to the rest of HBase, so other operations on the HFiles or
+ data work without modification.
+ Stripe compactions change the HFile layout, creating sub-regions within regions.
+ These sub-regions are easier to compact, and should result in fewer major compactions.
+ This approach alleviates some of the challenges of larger regions.
+ Stripe compaction is fully compatible with and works in conjunction with either the
+ ExploringCompactionPolicy or RatioBasedCompactionPolicy. It can be enabled for
+ existing tables, and the table will continue to operate normally if it is disabled
+ later.
+
+
+ When To Use Stripe Compactions
+ Consider using stripe compaction if you have either of the following:
+
+
+ Large regions. You can get the positive effects of smaller regions without
+ additional MemStore and region management overhead.
+
+
+ Non-uniform keys, such as a time dimension in a key. Only the stripes receiving
+ the new keys will need to compact. Old data will not compact as often, if at
+ all.
+
+
+
+ Performance Improvements
+ Performance testing has shown that the performance of reads improves somewhat,
+ and variability of performance of reads and writes is greatly reduced. An overall
+ long-term performance improvement is seen on large non-uniform-row key regions, such
+ as a hash-prefixed timestamp key. These performance gains are the most dramatic on a
+ table which is already large. In the future, the performance improvement might also
+ extend to region splits.
+
+
+ Enabling Stripe Compaction
+ You can enable stripe compaction for a table or a column family, by setting its
+ hbase.hstore.engine.class to
+ org.apache.hadoop.hbase.regionserver.StripeStoreEngine. You
+ also need to set the hbase.hstore.blockingStoreFiles to a high
+ number, such as 100 (rather than the default value of 10).
+
+ Enable Stripe Compaction
+
+ If the table already exists, disable the table.
+
+
+ Run one of following commands in the HBase shell. Replace the table name
+ orders_table with the name of your table.
+
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
+alter 'orders_table', {NAME => 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}}
+create 'orders_table', 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
+
+
+
+ Configure other options if needed. See for more information.
+
+
+ Enable the table.
+
+
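+          As a concrete end-to-end sketch, the full shell sequence for the table-wide
+          variant might look like the following (the table name orders_table
+          is illustrative):
+disable 'orders_table'
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
+enable 'orders_table'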
-create 'orders_table', 'blobs_cf', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', 'hbase.hstore.blockingStoreFiles' => '100'}
-
-
-Then, you can configure the other options if needed (see below) and enable the table.
-To switch back to default compactions, set hbase.hstore.engine.class to nil to unset it; or set it explicitly to "org.apache.hadoop.hbase.regionserver.DefaultStoreEngine" (this also needs to be done on a disabled table).
-
-When you enable a large table after changing the store engine either way, a major compaction will likely be performed on most regions. This is not a problem with new tables.
-
-How to configure
-
-All of the settings described below are best set on table/cf level (with the table disabled first, for the settings to apply), similar to the above, e.g.
-
+
+ Disable Stripe Compaction
+
+ Disable the table.
+
+
+ Set the hbase.hstore.engine.class option to either nil or
+ org.apache.hadoop.hbase.regionserver.DefaultStoreEngine.
+ Either option has the same effect.
+
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => ''}
+
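+              Equivalently, you can set the class explicitly rather than unsetting it. This
+              is a sketch of the explicit form, which, as noted above, has the same effect:
+alter 'orders_table', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.DefaultStoreEngine'}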
+
+
+ Enable the table.
+
+
+          When you enable a large table after changing the store engine in either direction, a
+          major compaction will likely be performed on most regions. This is not an issue for
+          new tables.
+
+
+ Configuring Stripe Compaction
+          Each of the settings for stripe compaction should be configured at the table or
+          column family level, after disabling the table. If you use the HBase shell, the general
+          command pattern is as follows:
+
+
alter 'orders_table', CONFIGURATION => {'key' => 'value', ..., 'key' => 'value'}
-
-
-Region and stripe sizing
-
-Based on your region sizing, you might want to also change your stripe sizing. By default, your new regions will start with one stripe. When the stripe is too big (16 memstore flushes size), on next compaction it will be split into two stripes. Stripe splitting will continue in a similar manner as the region grows, until the region itself is big enough to split (region split will work the same as with default compactions).
-
-You can improve this pattern for your data. You should generally aim at stripe size of at least 1Gb, and about 8-12 stripes for uniform row keys - so, for example if your regions are 30 Gb, 12x2.5Gb stripes might be a good idea.
-
-The settings are as follows:
-
-SettingNotes
-
-
-hbase.store.stripe.initialStripeCount
-
-Initial stripe count to create. You can use it as follows:
-
-
-for relatively uniform row keys, if you know the approximate target number of stripes from the above, you can avoid some splitting overhead by starting w/several stripes (2, 5, 10...). Note that if the early data is not representative of overall row key distribution, this will not be as efficient.
-
-
-for existing tables with lots of data, you can use this to pre-split stripes.
-
-
-for e.g. hash-prefixed sequential keys, with more than one hash prefix per region, you know that some pre-splitting makes sense.
-
-
-
-hbase.store.stripe.sizeToSplit
-
-Maximum stripe size before it's split. You can use this in conjunction with the next setting to control target stripe size (sizeToSplit = splitPartsCount * target stripe size), according to the above sizing considerations.
-
-hbase.store.stripe.splitPartCount
-
-The number of new stripes to create when splitting one. The default is 2, and is good for most cases. For non-uniform row keys, you might experiment with increasing the number somewhat (3-4), to isolate the arriving updates into narrower slice of the region with just one split instead of several.
-
-
-
-
-Memstore sizing
-
-By default, the flush creates several files from one memstore, according to existing stripe boundaries and row keys to flush. This approach minimizes write amplification, but can be undesirable if memstore is small and there are many stripes (the files will be too small).
-
-In such cases, you can set hbase.store.stripe.compaction.flushToL0 to true. This will cause flush to create a single file instead; when at least hbase.store.stripe.compaction.minFilesL0 such files (by default, 4) accumulate, they will be compacted into striped files.
-Normal compaction configuration
-
-All the settings that apply to normal compactions (file size limits, etc.) apply to stripe compactions. The exception are min and max number of files, which are set to higher values by default because the files in stripes are smaller. To control these for stripe compactions, use hbase.store.stripe.compaction.minFiles and .maxFiles.
-
-
-
-
-
+
+
+ Region and stripe sizing
+            You can configure your stripe sizing based upon your region sizing. By
+            default, new regions start with one stripe. On the next compaction after
+            the stripe has grown too large (16 times the MemStore flush size), it is split into two
+ stripes. Stripe splitting continues as the region grows, until the region is large
+ enough to split.
+ You can improve this pattern for your own data. A good rule is to aim for a
+ stripe size of at least 1 GB, and about 8-12 stripes for uniform row keys. For
+ example, if your regions are 30 GB, 12 x 2.5 GB stripes might be a good starting
+ point.
+
+
+ Stripe Sizing Settings
+
+
+
+
+
+ Setting
+ Notes
+
+
+
+
+
+ hbase.store.stripe.initialStripeCount
+
+
+ The number of stripes to create when stripe compaction is enabled.
+ You can use it as follows:
+
+ For relatively uniform row keys, if you know the approximate
+ target number of stripes from the above, you can avoid some
+ splitting overhead by starting with several stripes (2, 5, 10...).
+ If the early data is not representative of overall row key
+ distribution, this will not be as efficient.
+
+
+ For existing tables with a large amount of data, this setting
+ will effectively pre-split your stripes.
+
+
+ For keys such as hash-prefixed sequential keys, with more than
+ one hash prefix per region, pre-splitting may make sense.
+
+
+
+
+
+
+ hbase.store.stripe.sizeToSplit
+
+                    The maximum size a stripe can grow to before it is split. Use this in
+                    conjunction with hbase.store.stripe.splitPartCount to
+                    control the target stripe size (sizeToSplit = splitPartCount * target
+                    stripe size), according to the above sizing considerations. A worked
+                    example is sketched after this table.
+
+
+
+ hbase.store.stripe.splitPartCount
+
+ The number of new stripes to create when splitting a stripe. The
+ default is 2, which is appropriate for most cases. For non-uniform row
+                    keys, you can experiment with increasing the number to 3 or 4, to isolate
+                    the arriving updates into a narrower slice of the region without additional
+ splits being required.
+
+
+
+
+
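+            As a worked example of the sizing guidance and the sizeToSplit formula above
+            (the table name and numbers are illustrative, and the sizes are assumed to be
+            expressed in bytes, as with other HBase size settings): for roughly 30 GB regions,
+            targeting 2.5 GB stripes with the default splitPartCount of 2 gives
+            sizeToSplit = 2 * 2.5 GB = 5 GB, and starting with 12 stripes avoids some
+            splitting overhead:
+alter 'orders_table', CONFIGURATION => {'hbase.store.stripe.initialStripeCount' => '12', 'hbase.store.stripe.sizeToSplit' => '5368709120', 'hbase.store.stripe.splitPartCount' => '2'}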
+
+ MemStore Size Settings
+            By default, a MemStore flush creates several files, according to the
+            existing stripe boundaries and the row keys being flushed. This approach minimizes write
+            amplification, but can be undesirable if the MemStore is small and there are many
+            stripes, because the resulting files will be too small.
+ In this type of situation, you can set
+ hbase.store.stripe.compaction.flushToL0 to
+ true. This will cause a MemStore flush to create a single
+ file instead. When at least
+ hbase.store.stripe.compaction.minFilesL0 such files (by
+ default, 4) accumulate, they will be compacted into striped files.
+
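+            For example, a sketch of enabling single-file flushes for a small MemStore with
+            many stripes (the table name is illustrative, and 4 is simply the documented
+            default for minFilesL0, shown here for clarity):
+alter 'orders_table', CONFIGURATION => {'hbase.store.stripe.compaction.flushToL0' => 'true', 'hbase.store.stripe.compaction.minFilesL0' => '4'}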
+
+ Normal Compaction Configuration and Stripe Compaction
+ All the settings that apply to normal compactions (see ) apply to stripe compactions.
+ The exceptions are the minimum and maximum number of files, which are set to
+ higher values by default because the files in stripes are smaller. To control
+ these for stripe compactions, use
+ hbase.store.stripe.compaction.minFiles and
+ hbase.store.stripe.compaction.maxFiles, rather than
+ hbase.hstore.compaction.min and
+ hbase.hstore.compaction.max.
+
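+            For example, a sketch of tuning the stripe-specific file-count limits (the
+            values here are illustrative only, not recommended defaults):
+alter 'orders_table', CONFIGURATION => {'hbase.store.stripe.compaction.minFiles' => '4', 'hbase.store.stripe.compaction.maxFiles' => '12'}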
+
+
+
@@ -3700,16 +4110,33 @@ public enum Consistency {
In terms of semantics, TIMELINE consistency as implemented by HBase differs from pure eventual consistency in these respects:
-
- Single homed and ordered updates: Region replication or not, on the write side, there is still only 1 defined replica (primary) which can accept writes. This replica is responsible for ordering the edits and preventing conflicts. This guarantees that two different writes are not committed at the same time by different replicas and the data diverges. With this, there is no need to do read-repair or last-timestamp-wins kind of conflict resolution.
-
- The secondaries also apply the edits in the order that the primary committed them. This way the secondaries will contain a snapshot of the primaries data at any point in time. This is similar to RDBMS replications and even HBase’s own multi-datacenter replication, however in a single cluster.
-
- On the read side, the client can detect whether the read is coming from up-to-date data or is stale data. Also, the client can issue reads with different consistency requirements on a per-operation basis to ensure its own semantic guarantees.
-
- The client can still observe edits out-of-order, and can go back in time, if it observes reads from one secondary replica first, then another secondary replica. There is no stickiness to region replicas or a transaction-id based guarantee. If required, this can be implemented later though.
-
-
+
+        Single-homed and ordered updates: Whether or not region replication is enabled, on the write side,
+        there is still only one defined replica (the primary) which can accept writes. This
+        replica is responsible for ordering the edits and preventing conflicts. This
+        guarantees that two different writes are never committed at the same time by different
+        replicas, causing the data to diverge. Because of this, there is no need for read-repair or
+        last-timestamp-wins style conflict resolution.
+
+
+        The secondaries also apply the edits in the order that the primary committed
+        them. This way the secondaries contain a snapshot of the primary’s data at any
+        point in time. This is similar to RDBMS replication and even to HBase’s own
+        multi-datacenter replication, but within a single cluster.
+
+
+        On the read side, the client can detect whether the read result is coming from
+        up-to-date or stale data. Also, the client can issue reads with different
+        consistency requirements on a per-operation basis to ensure its own semantic
+        guarantees (see the shell sketch after this list).
+
+
+        The client can still observe edits out of order, and can effectively go back in time, if it
+        first reads from one secondary replica and then from another.
+        There is no stickiness to region replicas, nor a transaction-id based guarantee. If
+        required, this can be implemented later.
+
+
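+      As a brief sketch of issuing a timeline-consistent read from the HBase shell
+      (assuming a table 't1' with region replication enabled and a shell version that
+      supports the CONSISTENCY option; the table and row names are illustrative):
+get 't1', 'r6', {CONSISTENCY => "TIMELINE"}
+      Without the CONSISTENCY option, the read defaults to STRONG consistency and is
+      served only by the primary replica. The Java client exposes the same per-operation
+      choice through the Consistency enum shown above.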