diff --git hbase-common/src/main/resources/hbase-default.xml hbase-common/src/main/resources/hbase-default.xml
index 1362693..c7bc7e5 100644
--- hbase-common/src/main/resources/hbase-default.xml
+++ hbase-common/src/main/resources/hbase-default.xml
@@ -1448,4 +1448,33 @@ possible configurations would overwhelm and obscure the important.
hbase.http.staticuser.userdr.stack
+
+
+ hbase.mob.file.cache.size
+ 1000
+
+ Number of opened file handlers to cache.
+ A larger value will benefit reads by providing more file handlers per mob
+ file cache and would reduce frequent file opening and closing.
+ However, if this is set too high, this could lead to a "too many opened file handlers" error.
+ The default value is 1000.
+
+
+
+ hbase.mob.cache.evict.period
+ 3600
+
+ The amount of time in seconds before the mob cache evicts cached mob files.
+ The default value is 3600 seconds.
+
+
+
+ hbase.mob.cache.evict.remain.ratio
+ 0.5f
+
+ The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+ is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+ The default value is 0.5f.
+
+
diff --git src/main/docbkx/book.xml src/main/docbkx/book.xml
index 01c6b41..15f1cea 100644
--- src/main/docbkx/book.xml
+++ src/main/docbkx/book.xml
@@ -4676,8 +4676,10 @@ if (result.isStale()) {
+
+
diff --git src/main/docbkx/hbase_mob.xml src/main/docbkx/hbase_mob.xml
new file mode 100644
index 0000000..df6baa7
--- /dev/null
+++ src/main/docbkx/hbase_mob.xml
@@ -0,0 +1,247 @@
+
+
+
+
+ HBase Medium Object (MOB) Storage
+ Data comes in many sizes, and saving all of your data in HBase, including binary data such
+ as images and documents, is ideal. HBase can technically handle binary objects
+ with cells that are 1 byte to 10MB in size. However, HBase's normal read and write paths are
+ optimized for values smaller than 100KB in size. When HBase deals with large numbers of
+ values larger than 100KB and up to 10MB, referred to here as medium
+ objects, or MOBs, performance is degraded due to
+ write amplification caused by splits and compactions. HBase 2.0+ adds support for better
+ managing large numbers of MOBs while maintaining performance, consistency, and low
+ operational overhead. MOB support is provided by the work done in HBASE-11339.
+
+ To take advantage of MOB, first configure the MOB cache settings for each RegionServer,
+ then configure specific columns to hold MOB data. Currently, you also need to configure a
+ periodic re-optimization of MOB data layout, but this requirement is expected to be removed
+ at a later date.
+ Client code does not need to change to take advantage of HBase MOB support. The feature is
+ transparent to the client.
+
+ Configure the MOB Cache
+ To configure the MOB Cache on each RegionServer, add the following properties to the
+ RegionServer's hbase-site.xml, customize the configuration to suit your environment, and
+ restart or rolling restart the RegionServer.
+
+ hbase.mob.file.cache.size
+ 1000
+
+ Number of opened file handlers to cache.
+ A larger value will benefit reads by providing more file handlers per mob
+ file cache and would reduce frequent file opening and closing.
+ However, if this is set too high, this could lead to a "too many opened file handlers" error.
+ The default value is 1000.
+
+
+
+ hbase.mob.cache.evict.period
+ 3600
+
+ The amount of time in seconds before the mob cache evicts cached mob files.
+ The default value is 3600 seconds.
+
+
+
+ hbase.mob.cache.evict.remain.ratio
+ 0.5f
+
+ The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+ is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+ The default value is 0.5f.
+
+]]>
+
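+ As a rough illustration of how the cache settings above interact, the following Java
+ sketch computes how many cached files one eviction would drop. It assumes, as a reading
+ of the property descriptions rather than the actual cache implementation, that eviction
+ starts once the cached count exceeds hbase.mob.file.cache.size and trims the cache to
+ size * ratio entries. The class and method names are hypothetical.

```java
public class MobCacheEvictionSketch {

    // Number of files dropped by one eviction, under the assumption that the
    // cache is trimmed to cacheSize * remainRatio entries once it overflows.
    static int filesToEvict(int cachedFiles, int cacheSize, float remainRatio) {
        if (cachedFiles <= cacheSize) {
            return 0; // cache has not overflowed, nothing to evict
        }
        int remaining = (int) (cacheSize * remainRatio);
        return cachedFiles - remaining;
    }

    public static void main(String[] args) {
        // Defaults from the properties above: size 1000, ratio 0.5f.
        System.out.println(filesToEvict(1001, 1000, 0.5f)); // prints 501
    }
}
```

+ With the defaults, an overflowing cache keeps 500 files, so a lower ratio frees more
+ file handlers per eviction at the cost of more re-opens afterwards.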
+
+
+
+ Configure Columns for MOB
+ You can configure columns to support MOB during table creation or alteration, either
+ in HBase Shell or via the Java API. The two relevant properties are the boolean
+ IS_MOB and the MOB_THRESHOLD, which is the number of bytes
+ at which an object is considered to be a MOB.
+
+ Configure a Column for MOB Using HBase Shell
+
+hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
+
+
+
+ Configure a Column for MOB Using the API
+
+...
+HColumnDescriptor hcd = new HColumnDescriptor("f");
+hcd.setValue(MobConstants.IS_MOB, Bytes.toBytes(Boolean.TRUE));
+...
+HColumnDescriptor hcd;
+hcd.setValue(MobConstants.MOB_THRESHOLD, Bytes.toBytes(102400L));
+...
+
+
+
+
+
+ Read Raw Values From the MOB
+ Client Gets and Puts do not need to change to use HBase MOB. However, a new Scanner
+ mode is provided, which allows you to read raw values from the MOB.
+
+Scan scan = new Scan();
+scan.setAttribute(MobConstants.MOB_SCAN_RAW, Bytes.toBytes(Boolean.TRUE));
+InternalScanner scanner = (InternalScanner) region.getScanner(scan);
+scanner.next(result, limit);
+
+
+
+
+ Testing MOB
+ The utility org.apache.hadoop.hbase.IntegrationTestIngestMOB is
+ provided to assist with testing the MOB feature. The utility is run as follows:
+ $ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
+ -threshold 100*1024 \
+ -minMobDataSize 100*1024*4/5 \
+ -maxMobDataSize 100*1024*50
+
+
+ threshold is the threshold at which cells are considered to
+ be MOBs. The default is 100 KB.
+
+
+ minMobDataSize is the minimum value for the size of MOB
+ data. The default is 80 KB.
+
+
+ maxMobDataSize is the maximum value for the size of MOB
+ data. The default is 5 MB.
+
+
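+ The arithmetic expressions in the command above can be checked with a small Java
+ sketch. It only evaluates the three expressions to byte counts; the class and constant
+ names are hypothetical, and how the utility itself parses these arguments is not shown
+ here.

```java
public class MobTestSizes {

    // Byte values of the expressions passed to the test utility above.
    static final long THRESHOLD = 100L * 1024;             // -threshold
    static final long MIN_MOB_SIZE = 100L * 1024 * 4 / 5;  // -minMobDataSize
    static final long MAX_MOB_SIZE = 100L * 1024 * 50;     // -maxMobDataSize

    public static void main(String[] args) {
        System.out.println("threshold      = " + THRESHOLD + " bytes");
        System.out.println("minMobDataSize = " + MIN_MOB_SIZE + " bytes");
        System.out.println("maxMobDataSize = " + MAX_MOB_SIZE + " bytes");
    }
}
```

+ The values are 102400, 81920, and 5120000 bytes, which correspond to the 100 KB,
+ 80 KB, and roughly 5 MB defaults listed above.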
+
+
+
+ Set Up MOB Re-Optimization Tasks
+ The MOB feature introduces a new read and write path to HBase and currently requires
+ two external tools for housekeeping and optimization. The
+ expiredMobFileCleaner handles TTLs and time-based expiry of data.
+ The sweep tool coalesces small MOB files or MOB files with many
+ deletions or updates.
+
+ Configure and Run the expiredMobFileCleaner
+
+ First, configure the MOB clean delay, by setting the following property in the
+ RegionServer's hbase-site.xml. The default is 1
+ hour.
+
+ hbase.mob.cleaner.delay
+ 60 * 60 * 1000
+
+ ]]>
+
+
+
+ Next, start the expiredMobFileCleaner processes. Start one process
+ for each column that is configured for MOB.
+ $ org.apache.hadoop.hbase.mob.compactions.expiredMobFileCleaner \
+ tableName \
+ familyName
+
+
+
+
+ Configure and Run the sweeper Tool
+
+ First, configure the sweeper's properties in the
+ RegionServer's hbase-site.xml file. Adjust these properties
+ to suit your environment.
+
+
+ If there are too many deleted cells in a mob file, it's regarded
+ as an invalid file and needs to be re-written/merged.
+ If (mobFileSize-existingCellsSize)/mobFileSize>=ratio, it's regarded
+ as an invalid file. The default value is 0.3f.
+
+ hbase.mob.compaction.invalid.file.ratio
+ 0.3f
+
+
+
+ If the size of a mob file is less than the threshold, it's regarded as a small
+ file and needs to be merged. The default value is 64MB.
+
+ hbase.mob.compaction.small.file.threshold
+ 67108864
+
+
+
+ The flush size for the memstore used by the sweep job. Each sweep reducer owns such a memstore.
+ The default value is 128MB.
+
+ hbase.mob.compaction.memstore.flush.size
+ 134217728
+
+
+ ]]>
+
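+ The two file-selection rules described in the properties above can be sketched in Java
+ as follows. This is a minimal illustration of the stated formulas using the default
+ values, not the sweeper's actual code; the class and method names are hypothetical.

```java
public class MobSweepCheckSketch {

    // Defaults taken from the property descriptions above.
    static final float INVALID_FILE_RATIO = 0.3f;        // hbase.mob.compaction.invalid.file.ratio
    static final long SMALL_FILE_THRESHOLD = 67108864L;  // hbase.mob.compaction.small.file.threshold

    // A file is invalid when (mobFileSize - existingCellsSize) / mobFileSize >= ratio.
    static boolean isInvalid(long mobFileSize, long existingCellsSize) {
        if (mobFileSize <= 0) {
            return false; // guard against division by zero for empty files
        }
        float deletedFraction = (float) (mobFileSize - existingCellsSize) / mobFileSize;
        return deletedFraction >= INVALID_FILE_RATIO;
    }

    // A file is small, and a candidate for merging, when it is below the threshold.
    static boolean isSmall(long mobFileSize) {
        return mobFileSize < SMALL_FILE_THRESHOLD;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 100 MB file with 60 MB of live cells: 40% deleted, so it is invalid.
        System.out.println(isInvalid(100 * mb, 60 * mb));
        // 100 MB file with 90 MB of live cells: 10% deleted, so it is kept as-is.
        System.out.println(isInvalid(100 * mb, 90 * mb));
        // 10 MB file: below the 64 MB threshold, so it is a merge candidate.
        System.out.println(isSmall(10 * mb));
    }
}
```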
+ The worst case scenario when using the sweeper tool is
+ when the compaction of MOB files succeeds but the update of the references
+ (a Put operation) fails. In this case, new MOB files have been created, but the
+ new MOB file paths have not been put into HBase, so these MOB files will not
+ be referenced by HBase.
+
+
+
+ Next, add the HBase install directory, $HBASE_HOME/*, and HBase
+ library directory to yarn-site.xml. Adjust this example to
+ suit your environment.
+
+ Classpath for typical applications.
+ yarn.application.classpath
+
+ $HADOOP_CONF_DIR,
+ $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
+ $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
+ $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
+ $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
+ $HBASE_HOME/*,$HBASE_HOME/lib/*
+
+
+ ]]>
+
+
+ Finally, run the sweeper tool for each column that is
+ configured for MOB.
+ $ org.apache.hadoop.hbase.mob.compactions.Sweeper \
+ tableName \
+ familyName
+
+
+
+
+
diff --git src/main/docbkx/ops_mgt.xml src/main/docbkx/ops_mgt.xml
index 21045bd..88ffa06 100644
--- src/main/docbkx/ops_mgt.xml
+++ src/main/docbkx/ops_mgt.xml
@@ -2128,7 +2128,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
If you cannot estimate the size of your tables well, when starting off, it's probably
best to stick to the default region size, perhaps going smaller for hot tables (or
manually split hot regions to spread the load over the cluster), or go with larger region
- sizes if your cell sizes tend to be largish (100k and up).
+ sizes if your cell sizes tend to be largish (100k and up). See also the new feature , introduced in HBase 2.0+.
In HBase 0.98, experimental stripe compactions feature was added that would allow for
larger regions, especially for log data. See .
diff --git src/main/docbkx/schema_design.xml src/main/docbkx/schema_design.xml
index efbcb55..6961f46 100644
--- src/main/docbkx/schema_design.xml
+++ src/main/docbkx/schema_design.xml
@@ -464,6 +464,15 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
less than the number of row versions.
+
+ Cell Size
+ HBase is optimized to handle cell sizes up to 100 KB very well, though it can technically
+ handle cell sizes from 1 KB to 10 MB. Objects between 10 MB and 64 MB are referred to as
+ Medium Objects (MOBs), and support for storing those objects directly in HBase is provided in
+ HBase 2.0+. See .
+ For storing objects larger than 64 MB or larger than 10 MB without MOB support, store the
+ objects directly in HDFS, and store a reference to the file path in HBase.
+ Supported Datatypes