Index: src/docbkx/performance.xml
===================================================================
--- src/docbkx/performance.xml (revision 1156398)
+++ src/docbkx/performance.xml (working copy)
@@ -336,4 +336,25 @@
+
+
+ Deleting from HBase
+
+ Using HBase Tables as Queues
+ HBase tables are sometimes used as queues. In this case, special care must be taken to regularly perform major compactions on tables used in
+ this manner. As is documented in , marking rows as deleted creates additional StoreFiles which then need to be processed
+ on reads. Tombstones only get cleaned up with major compactions.
+
+ See also and HBaseAdmin.majorCompact.
+
+
+
+ Delete RPC Behavior
+ Be aware that htable.delete(Delete) doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation.
+ For a large number of deletes, consider htable.delete(List).
+
+ See
+
+
+
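A note on the Delete RPC paragraph above: when the number of deletes is very large, one option is to hand them to htable.delete(List) in bounded batches rather than as one huge list. The following is a minimal sketch of that batching step only. The class name DeleteBatches, the method partition, and the batch size are hypothetical and not part of HBase; plain strings stand in for Delete objects so the sketch runs without an HBase client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits pending deletes into bounded batches.
class DeleteBatches {

    // Returns consecutive copies of sublists, each holding at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Row keys stand in for Delete objects (assumption for illustration).
        List<String> rowKeys = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        for (List<String> batch : partition(rowKeys, 2)) {
            // With a real client, each batch would be converted to Delete objects
            // and passed to htable.delete(List) in a single call.
            System.out.println(batch);
        }
    }
}
```

A suitable batch size depends on the cluster and workload; the value 2 above is only for demonstration.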
Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1158223)
+++ src/docbkx/book.xml (working copy)
@@ -108,10 +108,10 @@
job // job instance
);
...and the mapper instance would extend TableMapper...
- public class MyMapper extends TableMapper<Text, LongWritable> {
-public void map(ImmutableBytesWritable row, Result value, Context context)
-throws InterruptedException, IOException {
-// process data for the row from the Result instance.
+
+public class MyMapper extends TableMapper<Text, LongWritable> {
+ public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
+ // process data for the row from the Result instance.
@@ -211,7 +211,7 @@
Try to minimize row and column sizes
- Or why are my storefile indices large?
+ Or why are my StoreFile indices large?
 In HBase, values are always freighted with their coordinates; as a
cell value passes through the system, it'll be accompanied by its
row, column name, and timestamp - always. If your rows and column names
@@ -230,9 +230,25 @@
Compression will also make for larger indices. See
the thread a question storefileIndexSize
up on the user mailing list.
- `
- In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for the clarity in storage and increased I/O - use shorter attribute names and constants.
- Also, try to keep the row-keys as small as possible too.
+
+ Most of the time, small inefficiencies don't matter all that much. Unfortunately,
+ this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
+ several billion times in your data.
+ Column Families
+ Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+
+
+ Attributes
+ Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+ to store in HBase.
+
+
+ Row Key
+ Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
+ A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
+ when designing rowkeys.
+
+
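To put rough numbers on the naming advice in this hunk, here is a back-of-the-envelope sketch. It counts only the coordinate bytes stored with each cell (row key, ColumnFamily name, attribute name, and an 8-byte timestamp) and ignores fixed framing bytes, compression, and replication. The class KeyOverhead, the 16-byte row key, the family name "data", and the figure of one billion cells are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical estimator, not part of HBase: approximates per-cell coordinate size.
class KeyOverhead {

    // The cell timestamp is a long, so it takes 8 bytes.
    static final int TIMESTAMP_BYTES = 8;

    // Bytes of row key + ColumnFamily name + attribute name + timestamp for one cell.
    static long coordinateBytes(int rowKeyBytes, String family, String qualifier) {
        return rowKeyBytes
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length
                + TIMESTAMP_BYTES;
    }

    public static void main(String[] args) {
        long cells = 1_000_000_000L; // assumed number of cells

        long verbose = coordinateBytes(16, "data", "myVeryImportantAttribute"); // 52
        long compact = coordinateBytes(16, "d", "via");                         // 28

        long savedBytes = (verbose - compact) * cells;
        System.out.println("verbose bytes per cell: " + verbose);
        System.out.println("compact bytes per cell: " + compact);
        System.out.println("bytes saved across all cells: " + savedBytes);
    }
}
```

Under these assumptions the shorter names save 24 bytes per cell, which is why the choice matters once it is repeated billions of times.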
@@ -289,6 +305,14 @@
See HColumnDescriptor for more information.
+
+ Time To Live (TTL)
+ ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
+ This applies to all versions of a row, even the current one. The TTL time encoded in HBase for the row is specified in UTC.
+
+ See HColumnDescriptor for more information.
+
+
Secondary Indexes and Alternate Query Paths