Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
0.20.0
None
None
Implemented in a recent book.xml fix (FAQ section)
Description
One conceptual gulf that the HBase documentation needs to address: people who look at the Hadoop website will read that HDFS is (paraphrasing) "high throughput but does not promise low latency and is not suited for random reads."
HBase runs on top of HDFS, and it promises both low-latency and random reads.
How?
I'm not disputing that HBase does it, but not much is written down anywhere about how, other than references to "caching."
Lars George put together a great page on some of the HBase file structures as they are stored in HDFS. Information like that would be useful to have in the HBase documentation.