Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1196854)
+++ src/docbkx/book.xml (working copy)
@@ -567,8 +567,8 @@
Cardinality of ColumnFamilies
Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).
- If ColumnFamily-A has 1000,000 rows and ColumnFamily-B has 1 billion rows, ColumnFamily-A's data will likely be spread
- across many, many regions (and RegionServers). This makes mass scans for ColumnFamily-A less efficient.
+ If ColumnFamilyA has 1,000,000 rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
+ across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
@@ -631,11 +631,32 @@
when designing rowkeys.
- Numeric Example
+ Byte Patterns
A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
If you stored this number as a String -- presuming a byte per character -- you need nearly 3x the bytes.
- This is a perfect example of a small inefficiency that may not seem like much, but can add up in HBase when
- used as rowkeys.
+
+ Not convinced? Below is some sample code that you can run on your own.
+
+import java.security.MessageDigest;
+import org.apache.hadoop.hbase.util.Bytes;
+
+// long
+//
+long l = 1234567890L;
+byte[] lb = Bytes.toBytes(l);
+System.out.println("long bytes length: " + lb.length); // returns 8
+
+String s = String.valueOf(l);
+byte[] sb = Bytes.toBytes(s);
+System.out.println("long as string length: " + sb.length); // returns 10
+
+// hash
+//
+MessageDigest md = MessageDigest.getInstance("MD5");
+byte[] digest = md.digest(Bytes.toBytes(s));
+System.out.println("md5 digest bytes length: " + digest.length); // returns 16
+
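+// new String(byte[]) decodes with the JVM's default charset, and Bytes.toBytes(String)
+// re-encodes as UTF-8, so the length printed below depends on the platform.
+String sDigest = new String(digest);
+byte[] sbDigest = Bytes.toBytes(sDigest);
+System.out.println("md5 digest as string length: " + sbDigest.length); // e.g. 26; varies with the default charset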
+
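The sample in the patch needs HBase's `Bytes` class on the classpath. Its last figure also depends on the JVM's default charset. Below is a minimal JDK-only sketch of the same comparison under a few assumptions. `ByteBuffer.putLong` stands in for `Bytes.toBytes(long)`; both produce 8 bytes, and only the length matters here. Hex encoding is used as a deterministic string form of the MD5 digest, a choice the patch itself does not make. The class name `ByteCounts` and its helper methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; a JDK-only stand-in for the HBase Bytes-based sample.
class ByteCounts {

    // Fixed-width binary form: a long always occupies Long.BYTES (8) bytes.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decimal text form: one UTF-8 byte per ASCII digit.
    static byte[] longToStringBytes(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // MD5 always yields a 16-byte digest.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Deterministic text form of arbitrary bytes: two lowercase hex chars per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long l = 1234567890L;
        System.out.println("long bytes length: " + longToBytes(l).length);            // 8
        System.out.println("long as string length: " + longToStringBytes(l).length);  // 10

        // Largest unsigned 64-bit value written out as decimal text.
        System.out.println("max unsigned long as string length: "
                + Long.toUnsignedString(-1L).length());                                // 20

        byte[] digest = md5(longToStringBytes(l));
        System.out.println("md5 digest bytes length: " + digest.length);              // 16

        byte[] hex = toHex(digest).getBytes(StandardCharsets.UTF_8);
        System.out.println("md5 digest as hex string length: " + hex.length);         // 32
    }
}
```

Unlike `new String(digest)`, hex encoding gives the same length on every JVM. It still doubles the 16 raw digest bytes to 32, which makes the same point about string forms costing more than binary ones.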