Index: src/docbkx/book.xml
===================================================================
--- src/docbkx/book.xml (revision 1087323)
+++ src/docbkx/book.xml (working copy)
@@ -315,6 +315,69 @@
 via the table row key -- its primary key.
+
Conceptual View
+
+ The following example is a slightly modified form of the one on page 2 of the BigTable paper.
+ There is a table called "webtable" that contains two ColumnFamilies named "contents:" and "anchor:". In this example, "anchor:" contains two columns (cnnsi.com, my.look.ca) and "contents:" contains one column (html).
+ Table "webtable"
+
+ Row Key        Time Stamp  ColumnFamily "contents:"      ColumnFamily "anchor:"
+ "com.cnn.www"  t9                                        anchor:cnnsi.com = "CNN"
+ "com.cnn.www"  t8                                        anchor:my.look.ca = "CNN.com"
+ "com.cnn.www"  t6          contents:html = "<html>..."
+ "com.cnn.www"  t5          contents:html = "<html>..."
+ "com.cnn.www"  t3          contents:html = "<html>..."
+
+
+
+
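The conceptual view above can be reproduced from a flat list of stored cells. The following is a minimal sketch, not HBase code: the `cells` list, the integer timestamps standing in for t3..t9, and the `conceptual_rows` helper are all assumptions made for illustration.

```python
# Hypothetical model: each stored cell is (row key, timestamp, column, value).
# Integer timestamps stand in for t3..t9 from the "webtable" example.
cells = [
    ("com.cnn.www", 9, "anchor:cnnsi.com", "CNN"),
    ("com.cnn.www", 8, "anchor:my.look.ca", "CNN.com"),
    ("com.cnn.www", 6, "contents:html", "<html>..."),
    ("com.cnn.www", 5, "contents:html", "<html>..."),
    ("com.cnn.www", 3, "contents:html", "<html>..."),
]


def conceptual_rows(cells, families=("contents", "anchor")):
    """Group cells into one display row per (row key, timestamp).

    A family with no cell at that timestamp is shown as None, which is
    the "empty cell" of the conceptual view.
    """
    rows = {}
    for row_key, ts, column, value in cells:
        family = column.split(":", 1)[0]
        slot = rows.setdefault((row_key, ts), dict.fromkeys(families))
        slot[family] = f'{column} = "{value}"'
    # Order by row key, then newest timestamp first, as in the table.
    return sorted(rows.items(), key=lambda item: (item[0][0], -item[0][1]))


table = conceptual_rows(cells)
for (row_key, ts), columns in table:
    print(row_key, f"t{ts}", columns["contents"], columns["anchor"])
```

The `None` entries exist only in this display structure; nothing corresponding to them appears in the `cells` list itself.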
Physical View
+
+ Although at a conceptual level tables may be viewed as a sparse set of rows, physically they are stored on a per-ColumnFamily basis. New columns (i.e., "columnfamily:column") can be added to any ColumnFamily without pre-announcing them.
+ ColumnFamily "anchor:"
+
+ Row Key        Time Stamp  ColumnFamily "anchor:"
+ "com.cnn.www"  t9          anchor:cnnsi.com = "CNN"
+ "com.cnn.www"  t8          anchor:my.look.ca = "CNN.com"
+
+ ColumnFamily "contents:"
+
+ Row Key        Time Stamp  ColumnFamily "contents:"
+ "com.cnn.www"  t6          contents:html = "<html>..."
+ "com.cnn.www"  t5          contents:html = "<html>..."
+ "com.cnn.www"  t3          contents:html = "<html>..."
+
+It is important to note that the empty cells shown in the conceptual view are not stored at all, since a column-oriented storage format has no need to store them. Thus a request for the value of the "contents:html" column at time stamp t8 would return no value. Similarly, a request for an "anchor:my.look.ca" value at time stamp t9 would return no value.
+
+However, if no timestamp is supplied, the most recent value for a particular column is returned. Because versions are stored in descending timestamp order, the most recent value is also the first one found. Thus a request for the values of all columns in the row "com.cnn.www" with no timestamp specified would return: the value of "contents:html" from time stamp t6, the value of "anchor:cnnsi.com" from time stamp t9, and the value of "anchor:my.look.ca" from time stamp t8.
+
+
+
+
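The lookup behaviour described in the Physical View section can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the HBase client API: `FamilyStore`, its `put` and `get` methods, and the integer timestamps standing in for t3..t9 are hypothetical names, and an explicit timestamp is treated as an exact match, as the text above describes.

```python
from collections import defaultdict


class FamilyStore:
    """Hypothetical store for one ColumnFamily: only written cells exist."""

    def __init__(self):
        # (row key, column) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return (timestamp, value), or None if no matching cell is stored."""
        versions = self._cells.get((row, column), {})
        # Scan versions newest first, so with no timestamp the first
        # version found is also the most recent one.
        for ts in sorted(versions, reverse=True):
            if timestamp is None or ts == timestamp:
                return ts, versions[ts]
        return None


# One store per ColumnFamily, mirroring the two physical tables.
anchor = FamilyStore()
contents = FamilyStore()
anchor.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
anchor.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")
for ts in (3, 5, 6):
    contents.put("com.cnn.www", "contents:html", ts, "<html>...")

print(contents.get("com.cnn.www", "contents:html", 8))  # no cell stored at t8
print(contents.get("com.cnn.www", "contents:html"))     # newest version, t6
```

Keeping a separate store per ColumnFamily means a request for "contents:html" never touches the "anchor:" data, which is the practical consequence of per-ColumnFamily storage.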
Table