Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1181879)
+++ src/docbkx/book.xml (working copy)
@@ -609,8 +609,8 @@
Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide
tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 columns apiece.
- Winner: Rows (generally speaking). To be clear, this guideline is in the context is in extremely wide cases, not where
- one needs to store a few dozen or hundred columns.
+ Winner: Rows (generally speaking). To be clear, this guideline applies to extremely wide cases, not to the
+ standard use-case where one needs to store a few dozen or a few hundred columns.
@@ -1687,6 +1687,38 @@
For more information, see the KeyValue source code.
+ Example
+ To emphasize the points above, examine what happens with two Puts for two different columns for the same row:
+
+ Put #1: rowkey=row1, cf:attr1=value1
+ Put #2: rowkey=row1, cf:attr2=value2
+
+ Even though these are for the same row, a KeyValue is created for each column:
+ Key portion for Put #1:
+
+ rowlength (4)
+ row (row1)
+ columnfamilylength (2)
+ columnfamily (cf)
+ columnqualifier (attr1)
+ timestamp (server time of Put)
+ keytype (Put)
+
+
+ Key portion for Put #2:
+
+ rowlength (4)
+ row (row1)
+ columnfamilylength (2)
+ columnfamily (cf)
+ columnqualifier (attr2)
+ timestamp (server time of Put)
+ keytype (Put)
+
+
+
+ It is critical to understand that the rowkey, ColumnFamily, and column (aka columnqualifier) are embedded within
+ the KeyValue instance. The longer these identifiers are, the bigger the KeyValue is.
Compaction
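Note on the KeyValue example added by this patch: the size effect it describes can be made concrete with a small calculation. The sketch below is not HBase code and does not use the HBase API. The class `KeyValueKeySize` and its methods are hypothetical names, and the fixed field widths (2-byte row length, 1-byte column family length, 8-byte timestamp, 1-byte key type) are stated here as assumptions about the key layout; check them against the KeyValue source before relying on them. The value and the key/value length prefixes are left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only: estimates the size of the key
// portion of a KeyValue from the field list in the example above.
class KeyValueKeySize {
    // Assumed fixed-width fields of the key portion, in bytes.
    static final int ROW_LENGTH_FIELD = 2;    // short holding the row length
    static final int FAMILY_LENGTH_FIELD = 1; // byte holding the column family length
    static final int TIMESTAMP_FIELD = 8;     // long timestamp
    static final int KEY_TYPE_FIELD = 1;      // byte key type (e.g. Put)

    // Sum of the fixed-width fields plus the bytes of the three identifiers.
    static int keyLength(String row, String family, String qualifier) {
        return ROW_LENGTH_FIELD + utf8Length(row)
             + FAMILY_LENGTH_FIELD + utf8Length(family)
             + utf8Length(qualifier)
             + TIMESTAMP_FIELD + KEY_TYPE_FIELD;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The two Puts from the example: same row, two different columns.
        System.out.println(keyLength("row1", "cf", "attr1")); // prints 23
        System.out.println(keyLength("row1", "cf", "attr2")); // prints 23
        // Same row with longer (made-up) family and qualifier names.
        System.out.println(keyLength("row1", "columnfamily", "attribute1")); // prints 38
    }
}
```

Under these assumptions, each of the two Puts carries its own 23-byte key even though both target the same row, and lengthening only the family and qualifier names grows every key to 38 bytes. That per-cell overhead is repeated for every column stored.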