diff --git hbase-common/src/main/resources/hbase-default.xml hbase-common/src/main/resources/hbase-default.xml
index 1362693..efcfd7c 100644
--- hbase-common/src/main/resources/hbase-default.xml
+++ hbase-common/src/main/resources/hbase-default.xml
@@ -1448,4 +1448,33 @@ possible configurations would overwhelm and obscure the important.
hbase.http.staticuser.userdr.stack
+
+
+ hbase.mob.file.cache.size
+ 1000
+
+ Number of opened file handlers to cache.
+ A larger value will benefit reads by providing more file handlers per mob
+ file cache and would reduce frequent file opening and closing.
+ However, if this is set too high, this could lead to a "too many opened file handlers" error.
+ The default value is 1000.
+
+
+
+ hbase.mob.cache.evict.period
+ 3600
+
+ The amount of time in seconds before the mob cache evicts cached mob files.
+ The default value is 3600 seconds.
+
+
+
+ hbase.mob.cache.evict.remain.ratio
+ 0.5f
+
+ The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+ is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+ The default value is 0.5f.
+
+
diff --git src/main/docbkx/book.xml src/main/docbkx/book.xml
index 8fc2f7a..8d7d8eb 100644
--- src/main/docbkx/book.xml
+++ src/main/docbkx/book.xml
@@ -4875,8 +4875,10 @@ if (result.isStale()) {
+
+
diff --git src/main/docbkx/hbase_mob.xml src/main/docbkx/hbase_mob.xml
new file mode 100644
index 0000000..7d77803
--- /dev/null
+++ src/main/docbkx/hbase_mob.xml
@@ -0,0 +1,227 @@
+
+
+
+
+ HBase Medium Object (MOB) Storage
+ Data comes in many sizes, and saving all of your data in HBase, including binary data such
+ as images and documents, is ideal. HBase can technically handle binary objects with cells
+ that are up to 10MB in size. However, HBase's normal read and write paths are optimized for
+ values smaller than 100KB in size. When HBase deals with large numbers of values up to 10MB,
+ referred to here as medium objects, or MOBs,
+ performance is degraded due to write amplification caused by splits and compactions. HBase
+ 2.0+ adds support for better managing large numbers of MOBs while maintaining performance,
+ consistency, and low operational overhead. MOB support is provided by the work done in HBASE-11339.
+
+ To take advantage of MOB, you need to use HFile version 3. Optionally, configure the MOB
+ file reader's cache settings for each RegionServer (see ), then configure specific columns to hold MOB data. Currently, you also need to configure
+ a periodic re-optimization of MOB data layout, but this requirement is expected to be
+ removed at a later date.
+ Client code does not need to change to take advantage of HBase MOB support. The feature is
+ transparent to the client.
+
+
+ Limitations of MOB Functionality
+ Work on HBase MOB is ongoing. Support is still needed for snapshots (HBASE-11645),
+ metrics (HBASE-11683), and a native compaction mechanism (HBASE-11861).
+
+
+
+ Configure Columns for MOB
+ You can configure columns to support MOB during table creation or alteration, either
+ in HBase Shell or via the Java API. The two relevant properties are the boolean
+ isMob and the mobThreshold, which is the number of bytes
+ at which an object is considered to be a MOB. Only isMob is required. If
+ you do not specify the mobThreshold, the default threshold value of 100 KB
+ is used.
+
+ Configure a Column for MOB Using HBase Shell
+
+hbase> create 't1', 'f1', {isMob => true, mobThreshold => 102400}
+hbase> alter 't1', {NAME => 'f1', isMob => true, mobThreshold => 102400}
+
+
+
+ Configure a Column for MOB Using the API
+
+...
+HColumnDescriptor hcd = new HColumnDescriptor("f");
+hcd.setValue(MobConstants.IS_MOB, Bytes.toBytes(Boolean.TRUE));
+...
+HColumnDescriptor hcd;
+hcd.setValue(MobConstants.MOB_THRESHOLD, Bytes.toBytes(102400L));
+...
+
+
+
+
+
+ Testing MOB
+ The utility org.apache.hadoop.hbase.IntegrationTestIngestMOB is
+ provided to assist with testing the MOB feature. The utility is run as follows:
+ $ sudo -u hbase hbase org.apache.hadoop.hbase.IntegrationTestIngestMOB \
+ -threshold 100*1024 \
+ -minMobDataSize 100*1024*4/5 \
+ -maxMobDataSize 100*1024*50
+
+
+ threshold is the threshold at which cells are considered to
+ be MOBs. The default is 100 KB.
+
+
+ minMobDataSize is the minimum value for the size of MOB
+ data. The default is 80 KB.
+
+
+ maxMobDataSize is the maximum value for the size of MOB
+ data. The default is 5 MB.
+
+
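As a quick check on the arguments above, the following sketch computes the three sizes in bytes. It assumes the expressions are evaluated as ordinary integer arithmetic; the source does not show how the utility parses them. The class name `MobTestSizes` and its methods are hypothetical and are not part of HBase.

```java
// Hypothetical helper, not part of HBase: evaluates the size expressions
// passed to IntegrationTestIngestMOB, assuming plain integer arithmetic.
public class MobTestSizes {
    static final long KB = 1024L;

    // -threshold 100*1024
    static long threshold() {
        return 100 * KB;
    }

    // -minMobDataSize 100*1024*4/5
    static long minMobDataSize() {
        return 100 * KB * 4 / 5;
    }

    // -maxMobDataSize 100*1024*50
    static long maxMobDataSize() {
        return 100 * KB * 50;
    }

    public static void main(String[] args) {
        System.out.println("threshold      = " + threshold() + " bytes");
        System.out.println("minMobDataSize = " + minMobDataSize() + " bytes");
        System.out.println("maxMobDataSize = " + maxMobDataSize() + " bytes");
    }
}
```

Under that assumption the values are 102400 bytes (100 KB), 81920 bytes (80 KB), and 5120000 bytes (roughly 5 MB), which matches the defaults listed above.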
+
+
+
+ Set Up MOB Re-Optimization Tasks
+ The MOB feature introduces a new read and write path to HBase and currently requires
+ an external tool, the sweeper tool, for housekeeping and
+ optimization. The sweeper tool uses MapReduce to coalesce small MOB
+ files or MOB files with many deletions or updates.
+
+
+ Configure and Run the sweeper Tool
+
+ First, configure the sweeper's properties in the
+ RegionServer's hbase-site.xml file. Adjust these properties
+ to suit your environment.
+
+ hbase.mob.sweep.tool.compaction.ratio
+ 0.5f
+
+ If too many cells are deleted in a mob file, it is regarded
+ as an invalid file and needs to be merged.
+ If existingCellsSize/mobFileSize is less than this ratio, the file is regarded
+ as invalid. The default value is 0.5f.
+
+
+
+ hbase.mob.sweep.tool.compaction.mergeable.size
+ 134217728
+
+ If the size of a mob file is less than this value, it's regarded as a small
+ file and needs to be merged. The default value is 128MB.
+
+
+
+ hbase.mob.sweep.tool.compaction.memstore.flush.size
+ 134217728
+
+ The flush size for the memstore used by the sweep job. Each sweep reducer owns such a memstore.
+ The default value is 128MB.
+
+
+
+ hbase.mob.cleaner.interval
+ 86400000
+
+ The interval at which ExpiredMobFileCleaner runs, in milliseconds.
+ The default value is one day.
+
+]]>
+
+
+
+ Next, add the HBase install directory, $HBASE_HOME/*, and the HBase
+ library directory to yarn-site.xml. Adjust this example to
+ suit your environment.
+
+ Classpath for typical applications.
+ yarn.application.classpath
+
+ $HADOOP_CONF_DIR,
+ $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
+ $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
+ $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
+ $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
+ $HBASE_HOME/*,$HBASE_HOME/lib/*
+
+]]>
+
+
+
+ Finally, run the sweeper tool for each column that is
+ configured for MOB.
+ $ org.apache.hadoop.hbase.mob.compactions.Sweeper \
+ tableName \
+ familyName
+
+
+
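The two merge criteria described in the sweeper properties can be sketched as follows. This is an illustration of the rules as the property descriptions state them, not the sweeper's actual code; the class `SweepRules` and its method names are hypothetical, and the constants are the documented defaults.

```java
// Hypothetical illustration of the documented sweeper rules; not HBase code.
public class SweepRules {
    // Default of hbase.mob.sweep.tool.compaction.ratio
    static final float COMPACTION_RATIO = 0.5f;
    // Default of hbase.mob.sweep.tool.compaction.mergeable.size (128 MB)
    static final long MERGEABLE_SIZE = 134217728L;

    // A file is "invalid" when existingCellsSize/mobFileSize is below the ratio.
    static boolean isInvalid(long existingCellsSize, long mobFileSize) {
        return (double) existingCellsSize / mobFileSize < COMPACTION_RATIO;
    }

    // A file is "small" when its size is below the mergeable size.
    static boolean isSmall(long mobFileSize) {
        return mobFileSize < MERGEABLE_SIZE;
    }

    // Either condition marks the file as a merge candidate.
    static boolean needsMerge(long existingCellsSize, long mobFileSize) {
        return isSmall(mobFileSize) || isInvalid(existingCellsSize, mobFileSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(needsMerge(50 * mb, 200 * mb));  // mostly deleted cells
        System.out.println(needsMerge(150 * mb, 200 * mb)); // large and mostly live
        System.out.println(needsMerge(64 * mb, 64 * mb));   // small file
    }
}
```

With the defaults, a 200 MB file holding 50 MB of live cells is a merge candidate (ratio 0.25), while a 200 MB file holding 150 MB of live cells is not.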
+
+ Configure the MOB Cache
+ Because there can be a large number of MOB files at any time, as compared to the
+ number of HFiles, MOB files are not always kept open. The MOB file reader cache is an LRU
+ cache which keeps the most recently used MOB files open. To configure the MOB file
+ reader's cache on each RegionServer, add the following properties to the RegionServer's
+ hbase-site.xml, customize the configuration to suit your
+ environment, and restart the RegionServer or perform a rolling restart.
+
+ hbase.mob.file.cache.size
+ 1000
+
+ Number of opened file handlers to cache.
+ A larger value will benefit reads by providing more file handlers per mob
+ file cache and would reduce frequent file opening and closing.
+ However, if this is set too high, this could lead to a "too many opened file handlers" error.
+ The default value is 1000.
+
+
+
+ hbase.mob.cache.evict.period
+ 3600
+
+ The amount of time in seconds before the mob cache evicts cached mob files.
+ The default value is 3600 seconds.
+
+
+
+ hbase.mob.cache.evict.remain.ratio
+ 0.5f
+
+ The ratio (between 0.0 and 1.0) of files that remains cached after an eviction
+ is triggered when the number of cached mob files exceeds the hbase.mob.file.cache.size.
+ The default value is 0.5f.
+
+
+]]>
+
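To make the interaction between hbase.mob.file.cache.size and hbase.mob.cache.evict.remain.ratio concrete, the sketch below computes how many files would stay cached after an eviction. It assumes the remaining count is the cache size multiplied by the ratio and truncated to an integer; the exact rounding used by HBase is not stated in the source. The class `MobCacheEviction` is hypothetical.

```java
// Hypothetical arithmetic sketch; not the HBase cache implementation.
public class MobCacheEviction {
    // Number of files kept after an eviction, assuming
    // remaining = cacheSize * remainRatio, truncated toward zero.
    static int filesRemaining(int cacheSize, float remainRatio) {
        if (remainRatio < 0.0f || remainRatio > 1.0f) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0");
        }
        return (int) (cacheSize * remainRatio);
    }

    public static void main(String[] args) {
        // Documented defaults: cache size 1000, remain ratio 0.5f
        System.out.println(filesRemaining(1000, 0.5f));
    }
}
```

With the defaults, an eviction triggered at more than 1000 cached files would leave about 500 files open under this assumption.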
+
+
\ No newline at end of file
diff --git src/main/docbkx/ops_mgt.xml src/main/docbkx/ops_mgt.xml
index f882646..51fd8f4 100644
--- src/main/docbkx/ops_mgt.xml
+++ src/main/docbkx/ops_mgt.xml
@@ -2144,7 +2144,8 @@ hbase> restore_snapshot 'myTableSnapshot-122112'
If you cannot estimate the size of your tables well, when starting off, it's probably
best to stick to the default region size, perhaps going smaller for hot tables (or
manually split hot regions to spread the load over the cluster), or go with larger region
- sizes if your cell sizes tend to be largish (100k and up).
+ sizes if your cell sizes tend to be largish (100k and up). See also the new feature , introduced in HBase 2.0+.
In HBase 0.98, experimental stripe compactions feature was added that would allow for
larger regions, especially for log data. See .
diff --git src/main/docbkx/schema_design.xml src/main/docbkx/schema_design.xml
index efbcb55..1e86ee7 100644
--- src/main/docbkx/schema_design.xml
+++ src/main/docbkx/schema_design.xml
@@ -464,6 +464,15 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
less than the number of row versions.
+
+ Cell Size
+ HBase is optimized to handle cell sizes up to 100 KB very well, though it can technically
+ handle cell sizes from 1 KB to 10 MB. Objects between 1 MB and 64 MB are referred to as
+ Medium Objects (MOBs), and support for storing those objects directly in HBase is provided in
+ HBase 2.0+. See .
+ To store objects larger than 64 MB, or larger than 10 MB without MOB support, store the
+ objects directly in HDFS, and store a reference to the file path in HBase.
+ Supported Datatypes